Browsing by Subject "Measurement error"
Now showing 1 - 3 of 3
Item Bayesian Methods in Nutrition Epidemiology and Regression-based Predictive Models in Healthcare (2012-02-14) Zhang, Saijuan
This dissertation has two main parts. In the first part, we propose a bivariate nonlinear measurement error model to understand the distribution of dietary intake and extend it to a multivariate model to capture dietary patterns in nutrition epidemiology. In the second part, we propose regression-based predictive models to accurately predict surgery duration in healthcare. Understanding the distribution of episodically consumed dietary components is an important problem in public health. Short-term measurements of episodically consumed dietary components have zero-inflated, skewed distributions. So-called two-part models have been developed for such data. However, there is much greater public health interest in the usual intake adjusted for caloric intake. Recently a nonlinear mixed effects model has been developed and fit by maximum likelihood using nonlinear mixed effects programs. However, the fitting is slow and unstable. We develop a Monte-Carlo-based fitting method in Chapter II. We demonstrate numerically that our methods lead to increased speed of computation, converge to reasonable solutions, and have the flexibility to be used in either a frequentist or a Bayesian manner. Diet consists of numerous foods, nutrients and other components, each of which has distinctive attributes. Increasingly, nutritionists are interested in exploring them collectively to capture overall dietary patterns. We thus extend the bivariate model described in Chapter II to the multivariate level. We use survey-weighted MCMC computations to fit the model, with uncertainty estimation coming from balanced repeated replication.
The methodology is illustrated by estimating the population distribution of the Healthy Eating Index-2005 (HEI-2005), a multi-component dietary quality index, among children aged 2-8 in the United States. The second part of this dissertation addresses the accurate prediction of surgery duration. Prior research has identified the current procedural terminology (CPT) codes as the most important factor when predicting surgical case durations, but there has been little reporting of a general predictive methodology that uses them effectively. In Chapter IV, we propose two regression-based predictive models. However, the naively constructed design matrix is singular. We thus devise a systematic procedure to construct a full-rank design matrix. Using surgical data from a central Texas hospital, we compare the proposed models with a few benchmark methods and demonstrate that our models lead to a remarkable reduction in prediction errors.

Item New Advances in Logistic Regression for Handling Missing and Mismeasured Data with Applications in Biostatistics (2014-05-30) Miao, Jingang
As a probabilistic statistical classification model, logistic regression (or logit regression) is widely used to model the outcome of a categorical dependent variable based on one or more predictor variables/features. We study two problems related to logistic regression with applications in biostatistics. In the first problem, we study multivariate disease classification in the presence of partially missing disease traits. In modern cancer epidemiology, diseases are classified based on pathologic and molecular traits, and different combinations of these traits give rise to many disease subtypes. The effect of predictor variables can be measured by fitting a polytomous logistic model to such data. The differences (heterogeneity) among the relative risk parameters associated with subtypes are of great interest for better understanding disease etiology.
Due to the heterogeneity of the relative risk parameters, when a risk factor is changed, the prevalence of one subtype may change more than that of another. Estimation of the heterogeneity parameters is difficult when disease trait information is only partially observed and the number of disease subtypes is large. We consider a robust semiparametric approach based on the pseudo conditional likelihood for estimating these heterogeneity parameters. Through simulation studies, we compare the robustness and efficiency of our approach with the maximum likelihood approach. The method is then applied to analyze data from the American Cancer Society Cancer Prevention Study (CPS) II Nutrition Cohort. Weight gain was associated with the risk of breast cancer, and the association varied by disease subtype. In the second problem, we use a semiparametric Bayesian method to handle measurement errors. In nutritional epidemiological studies, nutrient intakes are often measured via food frequency questionnaires and 24-hour dietary recalls. Due to self-reporting, recall error, and other reasons, the measured nutrient intakes can involve a substantial amount of noise. While independence between the measurement error and the true predictor is likely to be a reasonable assumption for the main effect of the predictors, it is not tenable for the interaction effect of two predictors measured with error. Although there are a number of flexible methods for handling additive, homogeneous measurement error in predictors in logistic regression models, relatively little attention has been paid to handling measurement error that depends on the unobserved predictor. Therefore, we propose a semiparametric Bayesian method for handling this unorthodox measurement error scenario in logistic regression models in the presence of the interaction term. The proposed method is also designed to handle partially missing values for the error-prone surrogate variables.
Through simulation studies, we assess some operating characteristics of the proposed method and compare it with the simulation extrapolation and regression calibration methods. Our method has smaller biases than the other methods. In addition, we analyze the NHANES data and assess the association between some important nutrients and high cholesterol level. Total fat and protein reinforce each other's association with the risk of having a high cholesterol level.

Item Semiparametric Estimation and Inference with Mis-measured, Correlated or Mixed Observations, and the Application in Ecology, Medicine and Neurology (2013-10-21) Xu, Kun
The dissertation considers semiparametric regression models inspired by statistical problems in ecological, medical and neurological studies. In those models, the interest is usually in estimating a finite set of parameters, with the difficulty of handling some unknown distribution functions or other unknown structures. Developing novel semiparametric treatments and deriving a class of consistent and efficient estimators can provide us not only with better inferences, but also with a general framework for those studies. In capture-recapture models for closed populations, the goal is to estimate the population abundance. When multiple error-prone measurements of a covariate are available, we discover that no suitable complete and sufficient statistic exists due to the identity between the number of captures and the number of measurements. Hence the existing treatment utilizing such a statistic no longer applies. Our investigation indicates that the familiar strategy of generalized method of moments can resolve the issue only when capture probabilities are high. A further complexity is the loss of the surrogacy assumption, commonly made in most measurement error problems. We devise a novel semiparametric treatment to overcome those difficulties. Simulation studies and real data analysis show good performance of our method.
In HIV research, we study errors-in-variables problems when the response is binary and instrumental variables are available. We construct consistent estimators by taking advantage of the prediction relation between the unobservable variables and the instruments. The asymptotic properties of the new estimator are established and illustrated through simulation studies. We also demonstrate that the method can be readily generalized to generalized linear models and beyond. The usefulness of the method is illustrated through a real data example. Lastly, we nonparametrically estimate distribution functions for multiple populations in kin-cohort studies. The data are mixed: each observation is known to belong to a specific population only with a certain probability. Some of the observations can be further correlated, and are subject to censoring. We estimate the distributions by using optimal base estimators and then combining them optimally as well. The optimality implies both estimation consistency and minimum estimation variability. One obvious advantage is that our estimator does not assume any parametric form for the distributions and does not require knowing or modeling the potential correlation structure. Analysis of the Huntington's disease data is performed to illustrate the effectiveness of the method.
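The errors-in-variables setting with instruments mentioned in the last abstract can be illustrated with a small simulation. The sketch below is not the dissertation's estimator: it swaps in a linear response (the abstract treats a binary one) and the textbook ratio-of-covariances instrumental variable estimator. The function name, sample size, coefficients, and error variances are all assumptions chosen only for illustration.

```python
import numpy as np

def simulate_iv_correction(n=20000, beta=2.0, seed=0):
    """Compare a naive slope with an instrumental-variable slope.

    Hypothetical setup (all values are illustrative assumptions):
      z : instrument, related to the true covariate but not to the errors
      x : true covariate, never observed directly
      w : error-prone measurement of x (classical additive error)
      y : linear response driven by x
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = 0.8 * z + rng.normal(size=n)
    w = x + rng.normal(size=n)
    y = beta * x + rng.normal(size=n)

    # Naive slope: regress y on the mismeasured w; attenuated toward zero.
    naive = np.cov(w, y)[0, 1] / np.var(w, ddof=1)

    # IV slope: ratio of covariances with the instrument.
    iv = np.cov(z, y)[0, 1] / np.cov(z, w)[0, 1]
    return naive, iv

if __name__ == "__main__":
    naive, iv = simulate_iv_correction()
    print(f"true slope 2.0, naive {naive:.3f}, IV {iv:.3f}")
```

Under these assumptions the naive slope is pulled toward zero because the measurement error inflates the variance of the observed covariate, while the instrument-based ratio stays close to the true slope.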