Bayesian Semiparametric Density Deconvolution and Regression in the Presence of Measurement Errors






Although the literature on measurement error problems is extensive, solutions to even the most fundamental of these problems, such as density deconvolution and regression with errors-in-covariates, are available only under numerous simplifying and unrealistic assumptions. This dissertation demonstrates that Bayesian methods, by accommodating measurement errors through natural hierarchies, can provide a very powerful framework for solving these important measurement error problems under more realistic scenarios. However, the very presence of measurement errors often renders techniques that succeed in measurement-error-free scenarios inefficient, numerically unstable, computationally challenging, or intractable. Additionally, measurement error problems often have unique features that compound the modeling and computational challenges.

In this dissertation, we develop novel Bayesian semiparametric approaches that cater to these unique challenges of measurement error problems and allow us to break free from many restrictive parametric assumptions of previously existing approaches. We first consider the problem of univariate density deconvolution when replicated proxies are available for each unknown value of the variable of interest. Existing deconvolution methods often make restrictive and unrealistic assumptions about the density of interest and the distribution of the measurement errors, such as normality and homoscedasticity of the errors and, hence, their independence from the variable of interest. We relax these assumptions and develop robust and efficient deconvolution approaches based on Dirichlet process mixture models and mixtures of B-splines in the presence of conditionally heteroscedastic measurement errors. We then extend the methodology to nonlinear univariate regression with errors-in-covariates when the densities of the covariate, the regression errors, and the measurement errors are all unknown, and both the regression errors and the measurement errors are conditionally heteroscedastic. The final section of this dissertation is devoted to the development of flexible multivariate density deconvolution approaches. The methods available in the sparse existing literature all assume the measurement error density to be fully specified. In contrast, we develop multivariate deconvolution approaches for scenarios in which the measurement error density is unknown but replicated proxies are available for each subject. We consider scenarios in which the measurement errors are distributed independently of the vector-valued variable of interest as well as scenarios in which they are conditionally heteroscedastic. To meet the significantly harder modeling and computational challenges of the multivariate problem, we exploit properties of finite mixture models, multivariate normal kernels, latent factor models, and exchangeable priors in many novel ways.
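
The data structure behind the univariate problem can be made concrete with a small simulation. The sketch below is only an illustration of replicated proxies with conditionally heteroscedastic errors, not the dissertation's model or code: it assumes an additive structure (proxy = latent value + error), and the mixture density for the latent variable, the error standard deviation function, the sample sizes, and the function name `simulate_replicated_proxies` are all hypothetical choices made for this example.

```python
import numpy as np


def simulate_replicated_proxies(n_subjects=2000, n_replicates=3, seed=0):
    """Simulate latent values x and replicated proxies w (hypothetical setup)."""
    rng = np.random.default_rng(seed)

    # Latent variable of interest: a two-component normal mixture
    # (an arbitrary non-normal density chosen for illustration).
    first_component = rng.random(n_subjects) < 0.5
    x = np.where(
        first_component,
        rng.normal(1.0, 0.5, n_subjects),
        rng.normal(3.0, 0.7, n_subjects),
    )

    # Conditionally heteroscedastic errors: the error standard deviation
    # grows with |x| (assumed form, not taken from the dissertation).
    error_sd = 0.2 + 0.3 * np.abs(x)
    u = error_sd[:, None] * rng.standard_normal((n_subjects, n_replicates))

    # Additive measurement error model (assumed): w[i, j] = x[i] + u[i, j].
    w = x[:, None] + u
    return x, w


x, w = simulate_replicated_proxies()

# Subject-level summaries computed from the replicates alone.
subject_mean = w.mean(axis=1)
subject_var = w.var(axis=1, ddof=1)

# Under heteroscedastic errors the within-subject variance tracks the mean.
mean_var_correlation = np.corrcoef(subject_mean, subject_var)[0, 1]

# Treating subject means as if they were x overstates the spread of x.
naive_variance = subject_mean.var(ddof=1)
latent_variance = x.var(ddof=1)

print(f"corr(subject mean, subject variance): {mean_var_correlation:.3f}")
print(f"variance of subject means: {naive_variance:.3f}")
print(f"variance of latent x:      {latent_variance:.3f}")
```

In this toy setting, the association between within-subject means and variances is what an assumption of homoscedastic errors would miss, and the inflated variance of the subject means is why the density of the latent variable cannot simply be estimated from averaged proxies.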

We provide theoretical results showing the flexibility of the proposed models. In simulation experiments, the proposed semiparametric methods vastly outperform previously existing approaches. Our methods also significantly outperform theoretically more flexible nonparametric alternatives, even when the true data-generating process closely conforms to these alternatives. The methods automatically encompass a variety of simplified parametric scenarios as special cases and often outperform their competitors even in those special scenarios for which the competitors were specifically designed. We illustrate the practical usefulness of the proposed methodology by successfully applying the methods to problems in nutritional epidemiology. The methods can be readily adapted and applied to similar problems from other areas of applied research. They also provide the foundation for many interesting extensions and analyses.