Browsing by Subject "Bayesian estimation"
Analysis and Optimization of Classifier Error Estimator Performance within a Bayesian Modeling Framework (2012-07-16)
Dalton, Lori Anne

With the advent of high-throughput genomic and proteomic technologies, in conjunction with the difficulty in obtaining even moderately sized samples, small-sample classifier design has become a major issue in the biological and medical communities. Training-data error estimation becomes mandatory, yet none of the popular error estimation techniques have been rigorously designed via statistical inference or optimization. In this investigation, we place classifier error estimation in a framework of minimum mean-square error (MMSE) signal estimation in the presence of uncertainty, where uncertainty is relative to a prior over a family of distributions. This results in a Bayesian approach to error estimation that is optimal and unbiased relative to the model. The prior addresses a trade-off between estimator robustness (modeling assumptions) and accuracy. Closed-form representations for Bayesian error estimators are provided for two important models: discrete classification with Dirichlet priors (the discrete model) and linear classification of Gaussian distributions with fixed, scaled identity or arbitrary covariances and conjugate priors (the Gaussian model). We examine robustness to false modeling assumptions and demonstrate that Bayesian error estimators perform especially well for moderate true errors. The Bayesian modeling framework facilitates both optimization and analysis. It naturally gives rise to a practical expected measure of performance for arbitrary error estimators: the sample-conditioned mean-square error (MSE). Closed-form expressions are provided for both Bayesian models. We examine the consistency of Bayesian error estimation and illustrate a salient application in censored sampling, where sample points are collected one at a time until the conditional MSE reaches a stopping criterion.
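As an illustration of the discrete model described above, the following is a minimal sketch of a posterior-mean (MMSE) error estimate for a fixed discrete classifier. It is not the thesis's own code. It assumes independent Dirichlet priors on the bin probabilities of each class, a Beta prior on the class-0 probability, and random sampling; the function name, argument names, and example counts are all hypothetical.

```python
def bayesian_error_estimate(counts0, counts1, labels,
                            alpha0=None, alpha1=None, c_prior=(1.0, 1.0)):
    """Posterior expected error of a fixed discrete classifier.

    counts0[i], counts1[i]: training points from class 0 / class 1 in bin i.
    labels[i]: class (0 or 1) the classifier assigns to bin i.
    alpha0, alpha1: Dirichlet hyperparameters per bin (assumed uniform = 1).
    c_prior: Beta(a, b) hyperparameters for the class-0 probability (assumed).
    """
    b = len(labels)
    alpha0 = alpha0 if alpha0 is not None else [1.0] * b
    alpha1 = alpha1 if alpha1 is not None else [1.0] * b
    n0, n1 = sum(counts0), sum(counts1)

    # Posterior mean of the class-0 probability under a Beta prior.
    a, bb = c_prior
    c_hat = (n0 + a) / (n0 + n1 + a + bb)

    # Posterior mean of each bin probability under Dirichlet priors.
    p0 = [(u + al) / (n0 + sum(alpha0)) for u, al in zip(counts0, alpha0)]
    p1 = [(u + al) / (n1 + sum(alpha1)) for u, al in zip(counts1, alpha1)]

    # Class-0 mass in bins labeled 1, and class-1 mass in bins labeled 0.
    err0 = sum(p for p, lab in zip(p0, labels) if lab == 1)
    err1 = sum(p for p, lab in zip(p1, labels) if lab == 0)
    return c_hat * err0 + (1.0 - c_hat) * err1


# Hypothetical two-bin example: bin 0 is labeled class 0, bin 1 is labeled class 1.
estimate = bayesian_error_estimate(counts0=[3, 1], counts1=[0, 2], labels=[0, 1])
print(estimate)
```

Because the true error is linear in the class probability and the bin probabilities, and these are independent under the assumed priors, the posterior mean of the error reduces to plugging in the posterior means, which is why the sketch needs no numerical integration.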
We address practical considerations for gene-expression microarray data, including the suitability of the Gaussian model, a methodology for calibrating normal-inverse-Wishart priors from unused data, and an approximation method for non-linear classification. We observe superior performance on synthetic high-dimensional data and real data, especially for moderate to high expected true errors and small feature sizes. Finally, arbitrary error estimators may be optimally calibrated assuming a fixed Bayesian model, sample size, classification rule, and error estimation rule. Using a calibration function mapping error estimates to their optimally calibrated values off-line, error estimates may be calibrated on the fly whenever the assumptions apply.

Bayesian ridge estimation of age-period-cohort models (2014-08)
Xu, Minle; Powers, Daniel A.

Age-period-cohort models offer a useful framework for studying trends of time-specific phenomena in various areas. Yet the perfect linear relationship among age, period, and cohort induces a singular design matrix and creates the identification problem of age-period-cohort models, due to the identity Cohort = Period - Age. Over the last few decades, multiple methods have been proposed to cope with the identification problem, e.g., the intrinsic estimator (IE), which may be viewed as a limiting form of ridge regression. This study views the ridge estimator from a Bayesian perspective by introducing one or more prior distributions for the ridge parameter(s). Data used in this study describe the incidence rate of cervical cancer among Ontario women from 1960 to 1994. Results indicate that a Bayesian ridge model with a common prior for the ridge parameter yields estimates of age, period, and cohort effects similar to those based on the intrinsic estimator and to those based on a ridge estimator.
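The identification problem described above can be illustrated with a small sketch. It is a simplification, not the study's model: age, period, and cohort enter as single linear columns (real age-period-cohort models typically use categorical indicators), the ages and periods are made-up values rather than the Ontario data, and the helper names are hypothetical. It shows that the design matrix loses one rank because Cohort = Period - Age, and that adding a ridge penalty to the Gram matrix restores full rank.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def apc_design(ages, periods):
    """Rows of [intercept, age, period, cohort] with cohort = period - age."""
    return [[1, a, p, p - a] for a in ages for p in periods]


def ridge_gram(rows, lam):
    """X'X + lam * I, the matrix inverted by a ridge estimator."""
    k = len(rows[0])
    gram = [[sum(Fraction(r[i]) * r[j] for r in rows) for j in range(k)]
            for i in range(k)]
    for i in range(k):
        gram[i][i] += Fraction(lam)
    return gram


X = apc_design(ages=[20, 25, 30, 35], periods=[1960, 1965, 1970])
print(matrix_rank(X))                 # 3: four columns, one is redundant
print(matrix_rank(ridge_gram(X, 0)))  # 3: X'X is singular
print(matrix_rank(ridge_gram(X, 1)))  # 4: the ridge penalty restores full rank
```

Exact rational arithmetic is used here only so that the rank is not affected by floating-point round-off.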
The performance of Bayesian models with distinct priors for the ridge parameters of age, period, and cohort effects is affected more by the choice of prior distributions. In sum, a Bayesian ridge model is an alternative way to deal with the identification problem of age-period-cohort models. Future studies should further investigate the influence of different prior choices on Bayesian ridge models.

Estimating phylogenetic trees from discrete morphological data (2015-05)
Wright, April Marie; Hillis, David M., 1958-; Cannatella, David C; Jansen, Robert K; Linder, Craig R; Smith, Martha K

Morphological characters have a long history of use in the estimation of phylogenetic trees. Datasets consisting of morphological characters are most often analyzed using the maximum parsimony criterion, which seeks to minimize the amount of character change across a phylogenetic tree. When combined with molecular data, characters are often analyzed using model-based methods, such as maximum likelihood or, more commonly, Bayesian estimation. The efficacy of likelihood and Bayesian methods using a common model for estimating topology from discrete morphological characters, the Mk model, is poorly explored. In Chapter One, I explore the efficacy of Bayesian estimation of phylogeny, using the Mk model, under conditions that are commonly encountered in paleontological studies. Using simulated data, I describe the relative performances of parsimony and the Mk model under a range of realistic conditions that include common scenarios of missing data and rate heterogeneity. I further examine the use of the Mk model in Chapter Two. Like any model, the Mk model makes a number of assumptions. One is that transitions between character states are symmetric (i.e., there is an equal probability of changing from state 0 to state 1 and from state 1 to state 0). Many characters, including alleged Dollo characters and extremely labile characters, may not fit this assumption.
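The symmetry assumption can be made concrete with a short sketch of the transition probabilities of a symmetric k-state Mk model. This is an illustration rather than code from the dissertation: it uses the standard closed form with the branch length t measured in expected changes per character, and the function name is hypothetical.

```python
import math


def mk_transition_matrix(k, t):
    """Transition probabilities of a symmetric k-state Mk model.

    k: number of character states (k >= 2).
    t: branch length in expected changes per character (assumed scaling).
    """
    decay = math.exp(-k * t / (k - 1))
    same = 1.0 / k + (k - 1) / k * decay   # probability of ending in the start state
    diff = 1.0 / k - decay / k             # probability of ending in any other given state
    return [[same if i == j else diff for j in range(k)] for i in range(k)]


# Binary character: P(0 -> 1) equals P(1 -> 0) for every branch length.
P = mk_transition_matrix(k=2, t=0.5)
print(P[0][1] == P[1][0])
```

Relaxing the assumption means allowing the off-diagonal entries to differ, for example by letting the stationary state frequencies be unequal, which this symmetric sketch deliberately does not do.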
I tested methods for relaxing this assumption in a Bayesian context. Using empirical datasets, I performed model fitting to demonstrate cases in which modelling asymmetric transitions among characters is preferred. I used simulated datasets to demonstrate that choosing the best-fit model of transition state symmetry can improve model fit and phylogenetic estimation. In my final chapter, I looked at the use of partitions to model datasets more appropriately. Common in molecular studies, partitioning breaks up the dataset into pieces that evolve according to similar mechanisms. These pieces, called partitions, are then modeled separately. This practice has not been widely adopted in morphological studies. I extended the PartitionFinder software, which is used in molecular studies to score different possible partition schemes and find the one that best models the dataset. I used empirical datasets to demonstrate the effects of partitioning datasets on model likelihoods and on the phylogenetic trees estimated from those datasets.

Predicting influenza hospitalizations (2012-08)
Ramakrishnan, Anurekha; Meyers, Lauren Ancel; Damien, Paul, 1960-

Seasonal influenza epidemics are a major public health concern, causing three to five million cases of severe illness and about 250,000 to 500,000 deaths worldwide. Given the unpredictability of these epidemics, hospitals and health authorities are often left unprepared to handle the sudden surge in demand. Hence, early detection of disease activity is fundamental to reducing the burden on the healthcare system, providing the most effective care for infected patients, and optimizing the timing of control efforts. Early detection requires reliable forecasting methods that make efficient use of surveillance data. We developed a dynamic Bayesian estimator to predict weekly hospitalizations due to influenza-related illnesses in the state of Texas.
The prediction of peak hospitalizations using our model is accurate both in terms of the number of hospitalizations and the time at which the peak occurs. For 1- to 8-week predictions, the predicted number of hospitalizations was within 8% of the actual value and the predicted time of occurrence was within a week of the actual peak.