Semiparametric Estimation and Inference with Mis-measured, Correlated or Mixed Observations, and the Application in Ecology, Medicine and Neurology

Date

2013-10-21

Abstract

This dissertation considers semiparametric regression models motivated by statistical problems in ecological, medical and neurological studies. In these models, interest usually lies in estimating a finite-dimensional set of parameters, and the difficulty lies in handling unknown distribution functions or other unknown structures. Developing novel semiparametric treatments and deriving a class of consistent and efficient estimators not only yields better inference but also provides a general framework for such studies.

In capture-recapture models for closed populations, the goal is to estimate population abundance. When multiple error-prone measurements of a covariate are available, we find that no suitable complete and sufficient statistic exists, because the number of captures is identical to the number of measurements. Hence the existing treatments that rely on such a statistic no longer apply. Our investigation indicates that the familiar generalized method of moments resolves the issue only when capture probabilities are high. A further complication is the loss of the surrogacy assumption, which is made in most measurement error problems. We devise a novel semiparametric treatment to overcome these difficulties. Simulation studies and a real data analysis show that our method performs well.

In HIV research, we study errors-in-variables problems in which the response is binary and instrumental variables are available. We construct consistent estimators by exploiting the predictive relation between the unobservable variables and the instruments. The asymptotic properties of the new estimator are established and illustrated through simulation studies. We also demonstrate that the method extends readily to generalized linear models and beyond. The usefulness of the method is illustrated through a real data example.

Lastly, we nonparametrically estimate distribution functions for multiple populations in kin-cohort studies. The data are mixed: each observation is known to belong to a specific population only with a certain probability. Some of the observations may also be correlated, and they are subject to censoring. We estimate the distributions optimally by using optimal base estimators and then combining them optimally as well. Here optimality implies both consistency and minimum estimation variability. One clear advantage is that our estimator does not assume any parametric form for the distributions and does not require knowing or modeling the potential correlation structure. An analysis of Huntington's disease data illustrates the effectiveness of the method.
