Publications and Manuscripts Under Review:
A Bayesian Dual-Network Clustering Approach for Selecting Data and Parameter Granularities (with Eric T. Bradlow and Raghuram Iyengar)
Job Market Paper, Revising for Resubmission, Marketing Science
Abstract: While there are well-established methods for model selection (e.g., BIC, marginal likelihood), they generally condition on an a priori selected data granularity (e.g., SKU-level data) and parameter granularity (e.g., brand-level parameters). That is, researchers think they are doing model selection, but what they are really doing is model selection conditional on their choices of data and parameter granularities. In this research, we propose a Bayesian dual-network clustering method as a novel way to make these two decisions simultaneously. To accomplish this, the method represents data and parameters as two separate networks whose nodes are the units of analysis (e.g., SKUs). The method then (a) clusters the two networks using a covariate-driven distance function that allows for a high degree of interpretability and (b) infers the data and parameter granularities that offer the best in-sample fit, akin to standard model selection methods. We apply our method to SKU-level demand analysis. The results show that the data and parameter granularities selected by our method, as compared with those from extant approaches (e.g., latent class analysis), impact the estimated demand elasticities and the optimal pricing of SKUs. We conclude by highlighting the generalizability of our framework to a broad array of marketing problems.
Selecting Data Granularity and Model Specification Using the Scaled Power Likelihood with Multiple Weights (with Eric T. Bradlow and Raghuram Iyengar)
Marketing Science, 2022
Abstract: Firms employ temporal data to predict sales and make managerial decisions accordingly. To use such data appropriately, managers must make two major analysis decisions: (a) the temporal granularity (e.g., weekly, monthly) and (b) an accompanying demand model. In most empirical contexts, however, model selection, sales forecasts, and the resulting managerial decisions are sensitive to both of these choices. While the extant literature has proposed methods that can select the best-fitting model (e.g., BIC) or provide predictions robust to model misspecification (e.g., weighted likelihood), most methods either assume that the granularity is correctly specified or pre-specify it. Our research fills this gap by proposing a method, the scaled power likelihood with multiple weights (SPLM), that not only jointly identifies the best-fitting granularity-model combination but also provides doubly robust (granularity and model) predictions against potentially incorrect selection of either. An extensive set of simulations shows that SPLM has higher statistical power than extant approaches for selecting the best-fitting granularity-model combination and provides doubly robust predictions under a wide variety of misspecified conditions. We apply our framework to predict sales in a scanner dataset and find that, consistent with our simulations, SPLM improves sales forecasts through its ability to select the best-fitting pair via its dual weights.
Working Papers and Work in Progress:
The Cure Is Worse than the Disease: Individual-Level Fixed Effects in Hazard Models Induce Spurious Peer Effects (with Christophe Van den Bulte)
Graph Representation Learning for Inferring Market Structure (with Ryan Dew)
Abstract: This paper aims to uncover market structure, with a focus on complementary and substitutable relationships, within a large set of products. While understanding market structure plays a crucial role in designing new products, repositioning existing products, and planning marketing actions such as pricing, the extant literature has mostly focused on learning market structure for a small subset of products or at an aggregate level (e.g., brand, category). We seek to overcome this limitation by using a modern graph representation learning technique, the Variational Graph Auto-Encoder (VGAE). Specifically, we plan to extend the VGAE, which has primarily been used to learn synergistic and antagonistic effects among large sets of molecules in computational biology, to learn complementary and substitutable relationships among a large set of products.