Topics in functional data analysis with biological applications

dc.contributor: Carroll, Raymond J.
dc.contributor: Hsing, Tailen
dc.creator: Li, Yehua
dc.date.accessioned: 2010-01-15T00:16:00Z
dc.date.accessioned: 2010-01-16T02:17:37Z
dc.date.accessioned: 2017-04-07T19:56:50Z
dc.date.available: 2010-01-15T00:16:00Z
dc.date.available: 2010-01-16T02:17:37Z
dc.date.available: 2017-04-07T19:56:50Z
dc.date.created: 2006-08
dc.date.issued: 2009-06-02
dc.description.abstract: Functional data analysis (FDA) is an active field of statistics in which the primary objects of study are curves. My dissertation consists of two innovative applications of functional data analysis in biology. The data that motivated the research broadened the scope of FDA and demanded new methodology. I develop new nonparametric estimation methods, and I focus on developing large-sample theory for the proposed estimators. The first project is motivated by a colon carcinogenesis study, the goal of which is to study the function of a protein (p27) in colon cancer development. In this study, a number of colonic crypts (units) were sampled from each rat (subject) at random locations along the colon, and repeated measurements of the protein expression level were then made on each cell (subunit) within the selected crypts. In this problem, the measurements within each crypt can be viewed as a function, since they can be indexed by the cell locations. The functions from the same subject are spatially correlated along the colon, and my goal is to estimate this correlation function using nonparametric methods. We use this data set as motivation and propose a kernel estimator of the correlation function in a more general framework. We derive the pointwise asymptotic normal distribution of the proposed estimator when the number of subjects is fixed and the number of units within each subject goes to infinity. Based on the asymptotic theory, we propose a weighted block bootstrap method for making inferences about the correlation function, where the weights account for the inhomogeneity of the distribution of the unit locations. Simulation studies are also provided to illustrate the numerical performance of the proposed method. My second project concerns lipoprotein profile data, where the goal is to use lipoprotein profile curves to predict the cholesterol level in human blood. Again motivated by the data, we consider a more general problem: the functional linear model (Ramsay and Silverman, 1997) with a functional predictor and a scalar response. The literature offers various methods for this model, but little theory to support them. Therefore, we focus on the theoretical properties of this model. There is other contemporary theoretical work on methods based on principal component regression. Our work differs in that we base our method on a roughness penalty approach and consider the more realistic scenario in which the functional predictor is observed only at discrete points. To reduce the difficulty of the theoretical derivations, we restrict the functions to satisfy a periodic boundary condition and develop an asymptotic convergence rate for this problem in Chapter III. A more general result based on splines is a topic for future research, which I discuss in Chapter IV.
dc.identifier.uri: http://hdl.handle.net/1969.1/ETD-TAMU-1867
dc.language.iso: en_US
dc.subject: Functional Data Analysis
dc.subject: Nonparametric statistics
dc.title: Topics in functional data analysis with biological applications
dc.type: Book
dc.type: Thesis
