Semiparametric functional data analysis for longitudinal/clustered data: theory and application





Texas A&M University


Semiparametric models play an important role in biostatistics. This dissertation studies two types of semiparametric models. The first is the partially linear model, in which the parametric part is a linear function. We investigate the two common estimation methods for partially linear models when the data are correlated, that is, longitudinal or clustered. The second is a semiparametric model in which a latent covariate is incorporated into a mixed effects model. We propose a semiparametric approach for estimating this model and apply it to a study of colon carcinogenesis.

First, we study the profile-kernel and backfitting methods in partially linear models for clustered/longitudinal data. For independent data, despite the potential root-n inconsistency of the backfitting estimator noted by Rice (1986), the two estimators have the same asymptotic variance matrix, as shown by Opsomer and Ruppert (1999). In this work, we compare the two estimators theoretically for multivariate responses. We show that, for correlated data, backfitting often produces a larger asymptotic variance than the profile-kernel method; that is, in addition to its bias problem, the backfitting estimator does not have the same asymptotic efficiency as the profile-kernel estimator when the data are correlated. Consequently, the common practice of using the backfitting method to compute profile-kernel estimates is no longer advised. We illustrate this in detail by following Zeger and Diggle (1994) and Lin and Carroll (2001), with a working independence covariance structure for nonparametric estimation and a correlated covariance structure for parametric estimation. The numerical performance of the two estimators is investigated through a simulation study, and their application to an ophthalmology dataset is also described.

Next, we study a mixed effects model in which the main response and the covariate are linked through the positions where they are measured but, for technical reasons, are not measured at the same positions. We propose a semiparametric approach for this misaligned measurements problem and derive the asymptotic properties of the semiparametric estimators under reasonable conditions. An application of the semiparametric method to a colon carcinogenesis study is provided. We find that, compared with a corn oil supplemented diet, a fish oil supplemented diet tends to inhibit the increase in expression of the bcl-2 oncogene in rats as the amount of DNA damage increases, and thus promotes apoptosis.
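
As a rough illustration of the two estimators compared in the abstract, the sketch below computes profile-kernel and backfitting estimates of the linear coefficients in a partially linear model y = X beta + theta(t) + error. It is a minimal sketch for independent data with an unweighted Nadaraya-Watson smoother, not the correlated-data estimators developed in the dissertation. The function names, the Gaussian kernel, the bandwidth, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def nw_smoother(t, h):
    """Nadaraya-Watson smoother matrix S (Gaussian kernel, bandwidth h).

    Row i of S holds the kernel weights used to smooth at t[i];
    each row sums to one.
    """
    d = (t[:, None] - t[None, :]) / h
    K = np.exp(-0.5 * d ** 2)
    return K / K.sum(axis=1, keepdims=True)


def profile_kernel(y, X, S):
    """Profile-kernel estimate: regress (I - S) y on (I - S) X."""
    R = np.eye(len(y)) - S
    Xt, yt = R @ X, R @ y
    return np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)


def backfitting(y, X, S):
    """Converged backfitting estimate: solve X'(I - S) X b = X'(I - S) y."""
    R = np.eye(len(y)) - S
    return np.linalg.solve(X.T @ R @ X, X.T @ R @ y)


# Hypothetical simulated data (independent observations).
rng = np.random.default_rng(0)
n = 300
t = rng.uniform(0.0, 1.0, n)
# The first covariate depends on t, so the smooth part cannot be ignored.
X = np.column_stack([t + rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=n)

S = nw_smoother(t, h=0.05)
beta_pk = profile_kernel(y, X, S)
beta_bf = backfitting(y, X, S)
print("profile-kernel:", beta_pk)
print("backfitting:   ", beta_bf)
```

In this independent-data setting the two estimates are typically close to each other. The abstract's point is that this equivalence in asymptotic variance does not carry over to correlated data, which this sketch does not attempt to reproduce.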