Efficient Semiparametric Estimators for Biological, Genetic, and Measurement Error Applications

Date

2012-10-19

Abstract

Many statistical models, such as measurement error models, a general class of survival models, and a mixture data model with random censoring, are semiparametric: interest lies in estimating finite-dimensional parameters in the presence of infinite-dimensional nuisance parameters. Developing efficient estimators for the parameters of interest in these models is important because such estimators yield more precise inference.

For a general regression model with measurement error, we use semiparametric theory to develop a new estimation procedure that delivers consistent estimators even when the model error and latent variable distributions are misspecified. Until now, root-n consistent estimators for this setting were attainable only in special cases, such as a polynomial relationship between the response and the mismeasured variables. Through simulation studies and a nutrition study application, we demonstrate that our method outperforms existing methods that ignore measurement error or require a correctly specified model error distribution.

In randomized clinical trials, scientists often compare two-sample survival data with a log-rank test. The two groups typically have nonproportional hazards, however, and in that case the log-rank test suffers substantial power loss. To address this issue and improve efficiency, we propose a model-free strategy for incorporating auxiliary covariates in a general class of survival models. Our approach produces an unbiased, asymptotically normal estimator with significant efficiency gains over current methods.

Lastly, we apply semiparametric theory to mixture data models common in kin-cohort designs of Huntington's disease, where interest lies in comparing the estimated age-at-death distributions for disease gene carriers and non-carriers. The distribution of the observed, possibly censored, outcome is a mixture of the genotype-specific distributions, where the mixing proportions are computed based on the genotypes, which are independent of the trait outcomes. Current methods for such data include a Cox proportional hazards model, which is susceptible to model misspecification, and two types of nonparametric maximum likelihood estimators, which are either inefficient or inconsistent. Using semiparametric theory, we propose an inverse probability weighting (IPW) estimator, a nonparametrically imputed estimator, and an optimal augmented IPW estimator, which provide more reasonable estimates of the age-at-death distributions and are susceptible to neither model misspecification nor poor efficiency.
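
The mixture structure described above can be illustrated with a minimal sketch. It is not the IPW, imputed, or augmented IPW estimator proposed in this work, and it ignores censoring entirely. It assumes each subject i has a known carrier probability q_i, so that P(T_i > t) = q_i S_carrier(t) + (1 - q_i) S_noncarrier(t), and recovers the two survival curves by ordinary least squares at each time point. The function names, the set of mixing proportions, and the exponential age-at-death scales are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_mixture(n, rng):
    """Draw n uncensored outcomes from a two-component mixture
    with known, subject-specific mixing proportions (all values hypothetical)."""
    # q[i]: assumed probability that subject i carries the disease gene
    q = rng.choice([0.0, 0.25, 0.5, 1.0], size=n)
    carrier = rng.random(n) < q
    # Hypothetical exponential scales: 50 for carriers, 80 for non-carriers
    t = np.where(carrier, rng.exponential(50.0, n), rng.exponential(80.0, n))
    return q, t

def ls_survival(q, t, grid):
    """Least-squares estimate of the carrier and non-carrier survival
    functions at each point of `grid`, using E[1(T > t) | q] = q*S1(t) + (1-q)*S2(t)."""
    Q = np.column_stack([q, 1.0 - q])                # n x 2 design of mixing proportions
    Y = (t[:, None] > grid[None, :]).astype(float)   # n x len(grid) survival indicators
    coef, *_ = np.linalg.lstsq(Q, Y, rcond=None)
    return coef                                      # row 0: carriers, row 1: non-carriers

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, t = simulate_mixture(20000, rng)
    grid = np.array([40.0, 60.0])
    print(ls_survival(q, t, grid))
    print(np.exp(-grid / 50.0), np.exp(-grid / 80.0))  # true curves for comparison
```

With censored outcomes the indicator 1(T > t) is not always observed, which is the gap that weighting or imputation approaches such as those named above are meant to fill.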
