Browsing by Subject "Bayesian inference"
Now showing 1 - 15 of 15
Item Bayesian approaches for modeling protein biophysics (2014-08) Hines, Keegan; Aldrich, R. W. (Richard W.)
Proteins are the fundamental unit of computation and signal processing in biological systems. A quantitative understanding of protein biophysics is of paramount importance, since even slight malfunction of proteins can lead to diverse and severe disease states. However, developing accurate and useful mechanistic models of protein function can be strikingly elusive. I demonstrate that the adoption of Bayesian statistical methods can greatly aid in modeling protein systems. I first discuss the pitfall of parameter non-identifiability and how a Bayesian approach to modeling can yield reliable and meaningful models of molecular systems. I then delve into a particular case of non-identifiability within the context of an emerging experimental technique called single-molecule photobleaching. I show that the interpretation of these data is non-trivial, and I provide a rigorous inference model for the analysis of this pervasive experimental tool. Finally, I introduce the use of nonparametric Bayesian inference for the analysis of single-molecule time series. These methods aim to circumvent problems of model selection and parameter identifiability and are demonstrated with diverse applications in single-molecule biophysics. The adoption of sophisticated inference methods will lead to a more detailed understanding of biophysical systems.

Item Bayesian variable selection in clustering via Dirichlet process mixture models (Texas A&M University, 2007-09-17) Kim, Sinae
The increased collection of high-dimensional data in various fields has raised a strong interest in clustering algorithms and variable selection procedures. In this dissertation, I propose a model-based method that addresses the two problems simultaneously. I use Dirichlet process mixture models to define the cluster structure and introduce into the model a latent binary vector to identify discriminating variables. I update the variable selection index using a Metropolis algorithm and obtain inference on the cluster structure via a split-merge Markov chain Monte Carlo technique. I evaluate the method on simulated data and illustrate an application with a DNA microarray study. I also show that the methodology can be adapted to the problem of clustering functional high-dimensional data. There I employ wavelet thresholding methods in order to reduce the dimension of the data and to remove noise from the observed curves. I then apply variable selection and sample clustering methods in the wavelet domain. Thus my methodology is wavelet-based and aims at clustering the curves while identifying wavelet coefficients that describe discriminating local features. I exemplify the method on high-dimensional and high-frequency tidal volume traces measured under an induced panic attack model in normal humans.
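The bit-flip Metropolis update for the latent selection vector described above can be sketched compactly. The following is a minimal illustration, not the dissertation's code: the Dirichlet process machinery and split-merge moves are replaced by fixed cluster labels and a stand-in Gaussian score, and all names and settings are assumptions made for the example.

```python
# Minimal sketch (not the dissertation's code): Metropolis updates of a binary
# variable-inclusion vector gamma for model-based clustering. Cluster labels
# are held fixed and a stand-in Gaussian score replaces the DP mixture
# marginal likelihood, so only the bit-flip Metropolis step is shown.
import numpy as np

rng = np.random.default_rng(0)

def log_score(X, gamma, labels):
    """Stand-in for the marginal likelihood: selected variables are fit with
    cluster-specific means, unselected ones with a single global mean."""
    score = 0.0
    for j in range(X.shape[1]):
        xj = X[:, j]
        if gamma[j]:
            sse = sum(((xj[labels == k] - xj[labels == k].mean()) ** 2).sum()
                      for k in np.unique(labels))
        else:
            sse = ((xj - xj.mean()) ** 2).sum()
        score -= 0.5 * sse
    return score

def metropolis_sweep(X, gamma, labels, prior_incl=0.1):
    """Propose flipping each coordinate of gamma once; accept by the MH ratio."""
    log_odds = np.log(prior_incl) - np.log(1.0 - prior_incl)
    for j in range(len(gamma)):
        prop = gamma.copy()
        prop[j] = 1 - prop[j]
        log_accept = (log_score(X, prop, labels) - log_score(X, gamma, labels)
                      + (log_odds if prop[j] else -log_odds))
        if np.log(rng.uniform()) < log_accept:
            gamma = prop
    return gamma

X = rng.normal(size=(60, 10))
X[:30, 0] += 3.0                    # only variable 0 separates the two groups
labels = np.repeat([0, 1], 30)
gamma = rng.integers(0, 2, size=10)
for _ in range(50):
    gamma = metropolis_sweep(X, gamma, labels)
print("selected variables:", np.flatnonzero(gamma))   # expect: [0]
```

In the full model the score would be the marginal likelihood under the Dirichlet process mixture, and the selection vector would be updated jointly with the cluster structure.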
Item Characterization of Nonlinear Material Response in the Presence of Large Uncertainties: A Bayesian Approach (2013-12-06) Doraiswamy, Srikrishna
The aim of the current work is to develop a Bayesian approach to model and simulate the behavior of materials with nonlinear mechanical response in the presence of significant uncertainties in the experimental data as well as in the applicability of models. The core idea of this approach is to combine deterministic, physics-based models with ideas from Bayesian inference to account for such uncertainties. Traditionally, parameters of models in mechanics have been identified through deterministic approaches to obtain single point estimates. Such methods perform very well for linear models and are the preferred approach to identifying model parameters, especially for precisely engineered systems such as structures and machinery. But in the presence of large variations, such as in the response of biological materials, deterministic approaches do not sufficiently capture the uncertainty in the response. We propose that the model parameters need to encode the spread observed in the data in addition to modeling the physics of the system. To this end, we propose probability distributions for the model parameters in order to incorporate the uncertainty in the data. We demonstrate this probabilistic approach to identifying model parameters with two example problems: the characterization of sheep arteries using data from inflation experiments, and the detection of an inhomogeneity in a cantilever beam. In the artery characterization problem, the parameters are those of the constitutive models; in the cantilever problem, they are the stiffnesses of the inhomogeneity and of the beam material. For each problem, we compute the probability distribution of the parameters using Bayesian inference. We show that these probability distributions can be used for two kinds of diagnostics: assigning a probability to a hypothesis (the inhomogeneity detection problem) and classifying newly obtained data (the artery characterization problem). For the inhomogeneity detection problem, the hypothesis is a statement about the ratio of the stiffnesses, and the computed probability of the hypothesis agrees well with the data. In the artery characterization problem, new data were successfully classified using the probability distributions computed from training data.

Item A comparison of two Markov Chain Monte Carlo methods for sampling from unnormalized discrete distributions (2015-05) Gillett, Carlos Townes; Walker, Stephen G., 1945-; Scott, James
This report compares the convergence behavior of the Metropolis-Hastings algorithm and an alternative Markov chain Monte Carlo sampling algorithm targeting unnormalized, discrete distributions with countably infinite sample spaces. The two methods are compared through a simulation study in which each is used to generate samples from a known distribution. We find that the alternative sampler generates increasingly independent samples as its scale parameter is increased, in contrast to Metropolis-Hastings. These results suggest that, regardless of the target distribution, the alternative algorithm can generate Markov chains with less autocorrelation than even an optimally scaled Metropolis-Hastings algorithm. We conclude that this alternative algorithm is a valuable addition to existing Markov chain Monte Carlo methods.
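The report's alternative sampler is not reproduced here, but the Metropolis-Hastings baseline it is compared against is easy to sketch for an unnormalized discrete target. In this hedged example the target weights w(k) ∝ 2^k / k! on the nonnegative integers (the normalizer is never computed), the symmetric uniform proposal with half-width `scale`, and the lag-1 autocorrelation check are all illustrative assumptions, not the report's setup.

```python
# Hedged sketch: random-walk Metropolis-Hastings targeting an unnormalized
# distribution on the nonnegative integers, here w(k) = 2^k / k! (Poisson(2)
# up to a normalizer that is never needed, since MH uses only ratios).
import math
import numpy as np

rng = np.random.default_rng(1)

def log_w(k):
    """Unnormalized log target; -inf outside the support."""
    return -math.inf if k < 0 else k * math.log(2.0) - math.lgamma(k + 1)

def mh_chain(n, scale=3):
    """Symmetric uniform proposal on {-scale, ..., scale}; accept by MH ratio."""
    k, out = 0, np.empty(n, dtype=int)
    for i in range(n):
        prop = k + int(rng.integers(-scale, scale + 1))
        if math.log(rng.uniform()) < log_w(prop) - log_w(k):
            k = prop
        out[i] = k
    return out

def lag1_autocorr(x):
    x = x - x.mean()
    return float((x[:-1] * x[1:]).sum() / (x * x).sum())

samples = mh_chain(20_000, scale=3)
print("sample mean (true mean is 2):", samples.mean())
print("lag-1 autocorrelation:", lag1_autocorr(samples))
```

Comparisons like the one in the report amount to running two such chains on the same target and tracking how the autocorrelation responds to the scale parameter.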
Item A computational framework for the solution of infinite-dimensional Bayesian statistical inverse problems with application to global seismic inversion (2015-08) Martin, James Robert, Ph. D.; Ghattas, Omar N.; Biros, George; Demkowicz, Leszek; Fomel, Sergey; Marzouk, Youssef; Moser, Robert
Quantifying uncertainties in large-scale forward and inverse PDE simulations has emerged as a central challenge facing the field of computational science and engineering. The promise of modeling and simulation for prediction, design, and control cannot be fully realized unless uncertainties in models are rigorously quantified, since this uncertainty can potentially overwhelm the computed result. While statistical inverse problems can be solved today for smaller models with a handful of uncertain parameters, this task is computationally intractable with contemporary algorithms for complex systems characterized by large-scale simulations and high-dimensional parameter spaces. In this dissertation, I address the theoretical formulation, numerical approximation, and algorithms for the solution of infinite-dimensional Bayesian statistical inverse problems, and apply the entire framework to a problem in global seismic wave propagation. Classical (deterministic) approaches to solving inverse problems attempt to recover the “best-fit” parameters that match given observation data, as measured in a particular metric. In the statistical inverse problem, we go one step further, returning not only a point estimate of the best medium properties but also a complete statistical description of the uncertain parameters. The result is a posterior probability distribution that describes our state of knowledge after learning from the available data and provides a complete description of parameter uncertainty. This dissertation describes a computational framework for such problems that wraps around existing forward solvers for a given physical problem, provided they are appropriately equipped. A collection of tools, insights, and numerical methods may then be applied to solve the problem and to interrogate the resulting posterior distribution, which describes our final state of knowledge. We demonstrate the framework with numerical examples, including inference of a heterogeneous compressional wavespeed field for a problem in global seismic wave propagation with 10⁶ parameters.

Item The effects of three different priors for variance parameters in the normal-mean hierarchical model (2010-05) Chen, Zhu, 1985-; Greenberg, Betsy S.; Sager, Thomas W.
Many prior distributions have been suggested for variance parameters in hierarchical models. The conventional “non-informative” setting of the conjugate inverse-gamma prior can cause problems. I consider three priors for the variance parameters (conjugate inverse-gamma, log-normal, and truncated normal) and perform a numerical analysis on Gelman’s eight-schools data. Using the posterior draws, I compare the Bayesian credible intervals of the parameters under the three priors. I then use predictive distributions to make predictions and discuss the differences among the three suggested priors.
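A small-scale version of this comparison can be run on a grid. The sketch below assumes the standard eight-schools estimates and standard errors and illustrative hyperparameters for the three priors; it places each prior on the variance τ², converts to the between-school standard deviation τ with the Jacobian 2τ, and uses the marginal likelihood in which the common mean has been integrated out under a flat prior.

```python
# Sketch: grid posteriors for the between-school sd tau under three priors on
# the variance tau^2. Data are the standard eight-schools values; the prior
# hyperparameters are illustrative assumptions, not the thesis's choices.
import numpy as np
from scipy import stats

y = np.array([28., 8., -3., 7., -1., 1., 18., 12.])      # school effects
s = np.array([15., 10., 16., 11., 9., 11., 10., 18.])    # standard errors

def log_marginal(tau):
    """log p(y | tau) with the common mean integrated out (flat prior)."""
    v = s**2 + tau**2
    V_mu = 1.0 / np.sum(1.0 / v)
    mu_hat = V_mu * np.sum(y / v)
    return 0.5*np.log(V_mu) - 0.5*np.sum(np.log(v)) - 0.5*np.sum((y - mu_hat)**2 / v)

tau = np.linspace(0.01, 40.0, 2000)
dtau = tau[1] - tau[0]
log_like = np.array([log_marginal(t) for t in tau])

priors = {  # log densities on tau^2, moved to tau with the Jacobian 2*tau
    "inverse-gamma(1,1)": stats.invgamma(1, scale=1).logpdf(tau**2) + np.log(2*tau),
    "log-normal(0,2)":    stats.lognorm(s=2, scale=1).logpdf(tau**2) + np.log(2*tau),
    "trunc-normal(0,200)": stats.truncnorm(0, np.inf, loc=0, scale=200).logpdf(tau**2) + np.log(2*tau),
}
for name, log_prior in priors.items():
    lp = log_like + log_prior
    post = np.exp(lp - lp.max())
    post /= post.sum() * dtau                             # normalize on the grid
    print(f"{name:20s} posterior mean of tau ≈ {(tau * post).sum() * dtau:5.2f}")
```

Running this shows how strongly the posterior for τ depends on the prior family, which is the thesis's central point.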
Item Forward and inverse modeling of fire physics towards fire scene reconstructions (2013-05) Overholt, Kristopher James; Ezekoye, Ofodike A.
Fire models are routinely used to evaluate life safety aspects of building design projects and are increasingly used in fire and arson investigations, as well as in reconstructions of firefighter line-of-duty deaths and injuries. A fire within a compartment effectively leaves behind a record of fire activity and history (i.e., fire signatures). Fire and arson investigators can utilize these fire signatures to determine cause and origin during fire reconstruction exercises, and researchers conducting fire experiments can utilize this record of fire activity to better understand the underlying physics. In all of these applications, the fire heat release rate (HRR), the location of the fire, and smoke production are important parameters that govern the evolution of thermal conditions within a fire compartment. These input parameters can be a large source of uncertainty in fire models, especially in scenarios where experimental data or detailed information on fire behavior are not available. To better understand fire behavior indicators related to soot, the deposition of soot onto surfaces was considered, and improvements to a soot deposition submodel were implemented in a computational fluid dynamics (CFD) fire model. To better understand fire behavior indicators related to fire size, an inverse HRR methodology was developed that calculates a transient HRR in a compartment based on measured temperatures resulting from a fire source. To address issues related to the uncertainty of input parameters, an inversion framework was developed with applications to fire scene reconstructions. Rather than using point estimates of input parameters, a statistical inversion framework based on the Bayesian inference approach was used to determine probability distributions of the input parameters. These probability distributions contain uncertainty information about the input parameters and can be propagated through fire models to obtain uncertainty information about predicted quantities of interest. The Bayesian inference approach was applied to various fire problems and coupled with zone and CFD fire models to extend the physical capability and accuracy of the inversion framework. Example applications include the estimation of steady-state and transient fire sizes in a compartment, material properties related to pyrolysis, and the location of a fire in a compartment.

Item Mixtures of triangular densities with applications to Bayesian mode regressions (2014-08) Ho, Chi-San; Damien, Paul, 1960-
The main focus of this thesis is to develop full parametric and semiparametric Bayesian inference for data arising from triangular distributions. A natural consequence of working with such distributions is that they allow one to consider regression models in which the response variable is the mode of the data distribution. A new family of nonparametric prior distributions is developed for a certain class of convex densities of particular relevance to mode regressions. Triangular distributions arise in several contexts, such as geosciences, econometrics, finance, health care management, sociology, reliability engineering, and decision and risk analysis. In many fields, experts typically have a reasonable idea of the range and the most likely values that define a data distribution; eliciting these quantities is thus generally easier than eliciting the moments of other commonly used distributions. Using simulated and actual data, applications of triangular distributions, with and without mode regressions, in some of the aforementioned areas are tackled.
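To make the mode-regression idea concrete, here is a hedged sketch in which the response is triangular on a known support [a, b] and its mode is linear in a covariate. The thesis develops full Bayesian and semiparametric inference; for brevity this example only maximizes the likelihood, which is enough to show how the regression enters the mode (scipy's shape parameter c). The data and coefficients are synthetic assumptions.

```python
# Sketch of mode regression with a triangular response: the mode of
# Triangular(a, mode, b) is beta0 + beta1 * x, fit here by maximum likelihood
# as a stand-in for the thesis's Bayesian treatment.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(2)
a, b = 0.0, 10.0                                  # known support
x = rng.uniform(0, 1, size=200)
true_mode = 2.0 + 5.0 * x                         # synthetic truth
y = np.array([stats.triang.rvs((m - a) / (b - a), loc=a, scale=b - a,
                               random_state=rng) for m in true_mode])

def neg_log_lik(beta):
    mode = beta[0] + beta[1] * x
    # scipy's c is the mode's relative position in [0, 1] on the support
    c = np.clip((mode - a) / (b - a), 1e-6, 1 - 1e-6)
    return -np.sum(stats.triang.logpdf(y, c, loc=a, scale=b - a))

res = optimize.minimize(neg_log_lik, x0=[1.0, 1.0], method="Nelder-Mead")
print("estimated (beta0, beta1):", res.x)         # should land near (2, 5)
```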
Item Modeling unobserved heterogeneity of spatially correlated count data using finite-mixture random parameters (2015-05) Buddhavarapu, Prasad Naga Venkata Siva Rama; Scott, James Gordon; Prozzi, Jorge A
The main goal of this research is to propose a specification for modeling unobserved heterogeneity in count outcomes. A negative binomial likelihood is used to model the count data, and unobserved heterogeneity is modeled using random model parameters with a finite multivariate normal mixture prior. The model simultaneously accounts for potential spatial correlation of crash counts from neighboring units. It extracts the inherent groups of road segments whose crash counts are, on average, equally sensitive to the road attributes, while still allowing heterogeneity within these groups. This research employs a computationally efficient Bayesian estimation framework to perform statistical inference for the proposed model. A Markov chain Monte Carlo (MCMC) sampling strategy is proposed that leverages recent theoretical developments in data-augmentation algorithms and elegantly sidesteps many of the computational difficulties usually associated with Bayesian inference for count models.

Item On the representation of model inadequacy: a stochastic operator approach (2016-05) Morrison, Rebecca Elizabeth; Moser, Robert deLancey; Oden, John Tinsley; Ghattas, Omar; Henkelman, Graeme; Oliver, Todd A; Simmons, Christopher S
Mathematical models of physical systems are subject to many sources of uncertainty, such as measurement errors and uncertain initial and boundary conditions. After accounting for these uncertainties, it is often revealed that some discrepancy remains between the model output and the observations; if so, the model is said to be inadequate. In practice, the inadequate model may be the best that is available or tractable, and so, despite its inadequacy, it may be used to make predictions of unobserved quantities. In this case, a representation of the inadequacy is necessary so that the impact of the observed discrepancy can be determined. We investigate this problem in the context of chemical kinetics and propose a new technique for accounting for model inadequacy that is both probabilistic and physically meaningful. Chemical reactions are generally modeled by a set of nonlinear ordinary differential equations (ODEs) for the species concentrations and the temperature. In this work, a stochastic inadequacy operator S is introduced that comprises three parts. The first is a random matrix embedded within the ODEs for the concentrations; the matrix is required to satisfy several physical constraints, and its most general form exhibits some useful properties, such as having only non-positive eigenvalues. The second is a smaller but specific set of nonlinear terms that also modifies the species concentrations, and the third is an operator that properly accounts for the resulting changes to the energy equation. The entries of S are governed by probability distributions, which in turn are characterized by a set of hyperparameters. The model parameters and hyperparameters are calibrated using high-dimensional hierarchical Bayesian inference, with data from a range of initial conditions. This allows the inadequacy operator to be used across a wide range of scenarios, rather than correcting a particular realization of the model with a corresponding data set. We apply the method to typical problems in chemical kinetics, including the reaction mechanisms of hydrogen and methane combustion, and also study how the inadequacy representation affects an unobserved quantity of interest: the flame speed of a one-dimensional laminar hydrogen flame.
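The constraint structure of the random-matrix part of S can be illustrated in isolation. The sketch below is a loose stand-in, not the dissertation's construction: it draws a generator-like random matrix whose columns sum to zero and whose eigenvalues therefore have non-positive real parts (Gershgorin's theorem applied to the columns). The actual operator satisfies further physical constraints and includes the nonlinear and energy-equation terms omitted here.

```python
# Loose sketch of the structural idea: a random linear operator whose columns
# sum to zero (mass conservation) and whose spectrum lies in the left half
# plane, as a generator-like matrix does.
import numpy as np

rng = np.random.default_rng(3)

def random_inadequacy_matrix(n, scale=0.1):
    off = rng.gamma(shape=1.0, scale=scale, size=(n, n))  # non-negative rates
    np.fill_diagonal(off, 0.0)
    return off - np.diag(off.sum(axis=0))                 # columns sum to zero

S = random_inadequacy_matrix(5)
print("column sums ~ 0:", np.allclose(S.sum(axis=0), 0.0))
print("max Re(eigenvalue):", np.linalg.eigvals(S).real.max())  # <= 0 up to roundoff
```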
Item Phylogenetic relationships of five members of the family Vespertilionidae (Chiroptera) from Malaysian Borneo (2013-05-24) Pacheco, Pablo Ricardo Rodriguez; Ammerman, Loren K.; Dowler, Robert C.; Amos, Bonnie B.; Braden, Heather; Angelo State University. Department of Biology.
Several studies have been conducted to refine the historically unclear phylogeny of chiropterans within the family Vespertilionidae. However, the phylogenetic affinities of some taxa remain poorly resolved. My objective was to clarify the classification and phylogenetic affinities of five species (Pipistrellus petersi, Glischropus tylopus, Hesperoptenus tomesi, Philetor brachypterus, and Arielulus cuprosus) using DNA sequence data from the mitochondrial 12S rRNA gene and the nuclear RAG2 gene. A total of 587 nucleotides of the 12S rRNA gene were aligned for 35 taxa, and 1231 nucleotides of the nuclear marker RAG2 were aligned for 40 taxa. I performed maximum likelihood and Bayesian inference phylogenetic analyses on these taxa. Although resolution was poor overall, A. cuprosus and H. tomesi clustered with the tribe Nycticeiini/Eptesicini, and Philetor brachypterus clustered with Hesperoptenus. Furthermore, Pipistrellus petersi clustered within the Hypsugine group instead of the predicted tribe Pipistrellini. Lastly, G. tylopus formed a polytomy with members of various tribes. Recent literature shows a consistent lack of resolution for this family, and the results presented here likewise leave these relationships unresolved.
Item Quantifying and mitigating wind power variability (2015-12) Niu, Yichuan; Santoso, Surya; Arapostathis, Aristotle; Baldick, Ross; Longoria, Raul G.; Tiwari, Mohit
Understanding the variability and unpredictability of wind power is essential for improving power system reliability and energy dispatch in transmission and distribution systems. The research presented herein addresses a major challenge in managing and utilizing wind energy with mitigated fluctuation and intermittency. Driven by varying wind speed, power variability manifests as power imbalances, which create a power surplus or deficiency with respect to the desired demand. To ameliorate this issue, the fluctuating wind energy needs to be properly quantified, controlled, and re-distributed to the grid. The first major study in this dissertation develops accurate wind turbine models and model reductions to generate wind power time series in a time-efficient manner in the laboratory. Reliable wind turbine models can also perform power control events and capture dynamic responses closer to real-world conditions. Therefore, a Type 4 direct-drive wind turbine with power electronic converters has been modeled and designed with detailed aerodynamic and electric parameters based on a given generator. Using averaging and approximation techniques for power electronic circuits, the order of the original model is then lowered to boost the computational efficiency of simulating long-term wind speed data. To quantify the wind power time series, efforts are made to enhance the adaptability and robustness of the original conditional range metric (CRM) algorithm, proposed in the literature for quantitatively assessing power variability within a certain time frame. The improved CRM performs better on scarce and noisy time-series data and has reduced computational complexity: rather than using a discrete probability model, the improved method fits a continuous gamma distribution with parameters estimated by maximum likelihood. With the leverage from the aforementioned work, wind-farm-level behavior can be revealed by analyzing data from long-term simulations of the individual wind turbine models. Mitigating the power variability with reserve generation sources is attempted, and the generation scenarios are generalized using an unsupervised machine learning algorithm based on the power correlations of the individual wind turbines. Finally, a systematic blueprint for reducing intra-hour power variations by coordinating fast- and slow-response energy storage systems (ESS) is proposed, with methods for sizing, coordination control, ESS regulation, and power dispatch schemes illustrated in detail. Applied to real-world data, these methods are shown to reduce short-term wind power variability to an expected level.
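One ingredient of the improved CRM, replacing the discrete probability model with a gamma distribution fit by maximum likelihood, can be sketched briefly. The ramp-magnitude data below are synthetic and the exceedance query is illustrative; the dissertation works with measured wind power time series.

```python
# Sketch of the continuous-distribution step: fit a gamma distribution to
# power-ramp magnitudes by maximum likelihood instead of binning them.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
ramps = rng.gamma(shape=2.0, scale=0.5, size=5000)    # synthetic |ΔP| data

shape, loc, scale = stats.gamma.fit(ramps, floc=0.0)  # MLE with location pinned at 0
print(f"fitted gamma: shape={shape:.2f}, scale={scale:.2f}")

# Example query: probability that a ramp exceeds 2 (per-unit power)
print("P(ramp > 2) ≈", 1 - stats.gamma.cdf(2.0, shape, loc=0.0, scale=scale))
```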
Item The relationships between crime rate and income inequality: evidence from China (2013-08) Zhang, Wenjie, active 2013; Scott, James Gordon
The main purpose of this study is to determine whether a Bayesian approach can better capture and provide reasonable predictions for the complex linkage between crime and income inequality. In this research, we conduct a model comparison between classical inference and Bayesian inference. Conventional studies of the relationship between crime and income inequality usually employ regression analysis to demonstrate whether the two are associated; Bayesian approaches, however, appear to be underused in this context. Studying panel data for China from 1993 to 2009, we found that, in addition to a linear mixed-effects model, a Bayesian hierarchical model with an informative prior is also a good model for describing the linkage between crime rate and income inequality. The choice of model ultimately depends on research needs and data availability.

Item Selection, calibration, and validation of coarse-grained models of atomistic systems (2015-05) Farrell, Kathryn Anne; Oden, J. Tinsley (John Tinsley), 1936-; Prudhomme, Serge M.; Babuska, Ivo; Bui-Thanh, Tan; Demkowicz, Leszek; Elber, Ron
This dissertation examines the development of coarse-grained models of atomistic systems for the purpose of predicting target quantities of interest in the presence of uncertainties. It addresses fundamental questions in computational science and engineering concerning the model selection, calibration, and validation processes used to construct predictive reduced-order models through a unified Bayesian framework. This framework, enhanced with concepts from information theory, sensitivity analysis, and Occam's razor, provides a systematic means of constructing coarse-grained models suitable for use in a prediction scenario. A novel application of a general framework of statistical calibration and validation to molecular systems is presented. Atomistic models, which themselves contain uncertainties, are treated as the ground truth and provide data for the Bayesian updating of model parameters. The open problem of selecting appropriate coarse-grained models is addressed through the powerful notion of Bayesian model plausibility, and a new, adaptive algorithm for model validation is presented. The Occam-Plausibility ALgorithm (OPAL), so named for its adherence to Occam's razor and its use of Bayesian model plausibilities, identifies, among a large set of models, the simplest model that passes the Bayesian validation tests and may therefore be used to predict the chosen quantities of interest. By discarding or ignoring unnecessarily complex models, the algorithm has the potential to reduce computational expense through its systematic consideration of subsets of models and its implementation of the prediction scenario with the simplest valid model. An application to the construction of a coarse-grained model of polyethylene demonstrates the molecular modeling techniques; the process of Bayesian selection, calibration, and validation of reduced-order models; and OPAL itself. The polyethylene example illustrates the potential of the Bayesian framework for coarse-graining and of OPAL as a means of determining a computationally conservative valid model.

Item Value of information and the accuracy of discrete approximations (2010-08) Ramakrishnan, Arjun; Bickel, J. Eric; Lake, Larry W.
Value of information (VOI) is one of the key concepts of decision analysis. This work provides a consistent and functional methodology for determining the VOI of proposed well tests in the presence of uncertainties. It aims to show that VOI analysis using discretized versions of continuous probability distributions in conventional decision trees can be very accurate, provided the optimal method of discrete approximation is chosen, without resorting to methods such as Monte Carlo simulation; simplifying the probability calculations need not come at the cost of accuracy. Both the prior and posterior probability distributions are assumed to be continuous and are discretized to find the VOI, resulting in two discretization steps in the decision tree. Another interesting feature is that a level of decision making lies between the two discrete approximations. This sets the problem apart from conventional discretized models, since the intervening decision means the accuracy no longer follows the usual rules and conventions for discrete models. The initial part of the work varies the number of points in the discrete model and tests the resulting accuracy against different correlation coefficients between the information and the actual values. The latter part compares existing discretization methods and establishes the conditions under which each is optimal. The problem is treated comprehensively for both risk-neutral and risk-averse decision makers.
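The kind of discrete approximation the report evaluates can be illustrated with two standard three-point methods, the extended Pearson-Tukey and extended Swanson-Megill discretizations. The lognormal uncertainty below is an illustrative assumption; the report's actual analysis chains two such discretizations around an intermediate decision node in a VOI tree.

```python
# Sketch: two standard three-point discretizations of a continuous
# distribution, compared on how well they reproduce the mean of a skewed
# (lognormal) uncertainty of the kind that arises in well-test VOI problems.
import numpy as np
from scipy import stats

dist = stats.lognorm(s=0.8, scale=100.0)       # illustrative uncertain quantity

# Extended Pearson-Tukey: 5th/50th/95th percentiles, weights .185/.630/.185
ept_x = dist.ppf([0.05, 0.50, 0.95])
ept_w = np.array([0.185, 0.630, 0.185])

# Extended Swanson-Megill: 10th/50th/90th percentiles, weights .30/.40/.30
esm_x = dist.ppf([0.10, 0.50, 0.90])
esm_w = np.array([0.30, 0.40, 0.30])

print("true mean:           ", dist.mean())
print("Pearson-Tukey mean:  ", ept_x @ ept_w)
print("Swanson-Megill mean: ", esm_x @ esm_w)
```

Replacing each continuous node of a decision tree with such a weighted three-point gamble is what makes the VOI computable by exact rollback rather than simulation; the report's question is which discretization keeps that rollback accurate.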