Efficient inference in general semiparametric regression models
Abstract
Semiparametric regression has become very popular in statistics over the years. As increasingly sophisticated models are developed, the associated theory and estimation procedures have become correspondingly more involved. This work addresses efficient inferential procedures in general semiparametric regression problems. We first discuss efficient estimation of population-level summaries in general semiparametric regression models. Here our focus is on estimating general population-level quantities that combine the parametric and nonparametric parts of the model (e.g., population means and probabilities). We place this problem in a general context, provide a general kernel-based methodology, and derive the asymptotic distributions of estimates of these population-level quantities, showing that in many cases the estimates are semiparametric efficient. Next, motivated by the problem of testing for genetic effects on complex traits in the presence of gene-environment interaction, we develop score tests in general semiparametric regression problems that involve a Tukey-style 1 d.f. form of interaction between parametrically and nonparametrically modeled covariates. We develop adjusted score statistics that are unbiased and asymptotically efficient and that can be computed using standard bandwidth selection methods. In addition, to overcome the difficulty of solving functional equations, we give simple interpretations of the target functions, which in turn allow us to develop estimation procedures that can be easily implemented using standard computational methods. Finally, we take up the important problem of estimation in a general semiparametric regression model when covariates are measured with additive, normally distributed measurement error.
In contrast to methods that require solving integral equations whose dimension equals that of the covariate measured with error, we propose a methodology based on Monte Carlo corrected scores to estimate the model components and investigate the asymptotic behavior of the estimates. For each of the problems, we present simulation studies to assess the performance of the proposed inferential procedures. In addition, we apply our proposed methodology to nontrivial real-life data sets and present the results.