# Browsing by Subject "MCMC"

Now showing 1 - 17 of 17


Item: Assessment of Eagle Ford Shale Oil and Gas Resources (2013-07-30). Gong, Xinglai.
The Eagle Ford play in south Texas is currently one of the hottest plays in the United States. In 2012, the average Eagle Ford rig count (269 rigs) was 15% of the total US rig count. Assessment of the oil and gas resources and their associated uncertainties in the early stages is critical for optimal development. The objectives of my research were to develop a probabilistic methodology that can reliably quantify the reserves and resources uncertainties in unconventional oil and gas plays, and to assess Eagle Ford shale oil and gas reserves, contingent resources, and prospective resources. I first developed a Bayesian methodology based on Markov chain Monte Carlo (MCMC) to generate probabilistic decline curves that can quantify the reserves and resources uncertainties in unconventional oil and gas plays. I then divided the Eagle Ford play from the Sligo Shelf Margin to the San Marcos Arch into 8 production regions based on fluid type, performance, and geology. I used the Duong model switching to the Arps model with b = 0.3 at the minimum decline rate to capture the linear flow to boundary-dominated flow behavior often observed in shale plays. Cumulative production after 20 years predicted from Monte Carlo simulation combined with reservoir simulation was used as prior information in the Bayesian decline-curve methodology. Probabilistic type decline curves for oil and gas were then generated for all production regions. The wells were aggregated probabilistically within each production region and arithmetically between production regions. The total oil reserves and resources range from a P90 of 5.3 to a P10 of 28.7 billion barrels of oil (BBO), with a P50 of 11.7 BBO; the total gas reserves and resources range from a P90 of 53.4 to a P10 of 313.5 trillion cubic feet (TCF), with a P50 of 121.7 TCF.
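The Duong-to-Arps switching model described in this abstract can be sketched in a few lines. This is an illustrative sketch only: the parameter values (q1, a, m, and the terminal decline rate) are invented for the example, not taken from the thesis.

```python
import math

def duong_rate(t, q1, a, m):
    # Duong rate model for long-duration linear flow (t > 0, e.g. months).
    return q1 * t ** (-m) * math.exp(a / (1.0 - m) * (t ** (1.0 - m) - 1.0))

def duong_decline(t, a, m):
    # Instantaneous decline rate D(t) = -q'(t)/q(t) of the Duong model.
    return m / t - a * t ** (-m)

def switch_time(a, m, d_min, lo=1.0, hi=1000.0):
    # Bisection for the time at which the Duong decline rate drops to d_min
    # (D(t) is monotone decreasing over this bracket for these parameters).
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if duong_decline(mid, a, m) > d_min:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hybrid_rate(t, q1, a, m, d_min, b=0.3):
    # Duong behavior early; Arps hyperbolic decline (b = 0.3) after the
    # decline rate reaches the terminal value d_min, anchored at the
    # switch-point rate so the curve is continuous.
    ts = switch_time(a, m, d_min)
    if t <= ts:
        return duong_rate(t, q1, a, m)
    qs = duong_rate(ts, q1, a, m)  # rate at the switch point
    return qs / (1.0 + b * d_min * (t - ts)) ** (1.0 / b)
```

With, say, `q1=1000.0, a=0.8, m=1.2, d_min=0.01`, the rate follows the Duong curve until the decline rate falls to 1% per period and then declines hyperbolically.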
These reserves and resources estimates are much higher than the U.S. Energy Information Administration's 2011 recoverable resource estimates of 3.35 BBO and 21 TCF. The results of this study provide a critical update on the reserves and resources estimates and their associated uncertainties for the Eagle Ford shale formation of South Texas.

Item: Bayesian Estimation of Material Properties in Case of Correlated and Insufficient Data (2013-04-09). Giugno, Matteo.
Identification of material properties has been widely discussed in recent times thanks to better technology availability and its application to the field of experimental mechanics. Bayesian approaches such as Markov chain Monte Carlo (MCMC) methods have proven to be reliable and suitable tools for processing data, describing probability distributions and uncertainty bounds for investigated parameters in the absence of explicit inverse analytical expressions. Though it is necessary to repeat experiments multiple times for good estimation, this might not always be feasible due to practical limitations: this thesis addresses the problem of material property estimation in the presence of correlated and insufficient data, which results in multivariate error modeling and high sample covariance matrix instability. To recover from the lack of information about the true covariance, we analyze two different methodologies: first, hierarchical covariance modeling is investigated; then a method based on covariance shrinkage is employed. A numerical study comparing both approaches and employing finite element analysis within MCMC iterations is presented, showing how the method based on covariance shrinkage is more suitable for post-processing data for the range of problems under investigation.

Item: Bayesian Model Selection for High-dimensional High-throughput Data (2012-07-16). Joshi, Adarsh.
Bayesian methods are often criticized on the grounds of subjectivity.
Furthermore, misspecified priors can have a deleterious effect on Bayesian inference. Noting that model selection is effectively a test of many hypotheses, Dr. Valen E. Johnson sought to eliminate the need for prior specification by computing Bayes factors from frequentist test statistics. In his pioneering work published in 2005, Dr. Johnson proposed using so-called local priors for computing Bayes factors from test statistics. Dr. Johnson and Dr. Jianhua Hu used Bayes factors for model selection in a linear model setting. In independent work, Dr. Johnson and another colleague, David Rossell, investigated two families of non-local priors for testing the regression parameter in a linear model setting. These non-local priors enable greater separation between the null and alternative hypotheses. In this dissertation, I extend model selection based on Bayes factors and use non-local priors to define Bayes factors based on test statistics. With these priors, I have been able to reduce the problem of prior specification to setting just one scaling parameter, which can be easily chosen, for example, on the basis of the frequentist operating characteristics of the corresponding Bayes factors. Furthermore, the loss of information from basing a Bayes factor on a test statistic is minimal. Along with Dr. Johnson and Dr. Hu, I used Bayes factors based on the likelihood ratio statistic to develop a method for clustering gene expression data. This method has performed well in both simulated examples and real datasets. An outline of that work is also included in this dissertation. Further, I extend the clustering model to a subclass of the decomposable graphical model class, which is more appropriate for genotype data sets, such as single-nucleotide polymorphism (SNP) data. Efficient FORTRAN programming has enabled me to apply the methodology to hundreds of nodes.
For problems that produce computationally harder probability landscapes, I propose a modification of the Markov chain Monte Carlo algorithm to extract information regarding the important network structures in the data. This modified algorithm performs well in inferring complex network structures. I use this method to develop a prediction model for disease based on SNP data. My method performs well in cross-validation studies.

Item: Combining Strategies for Parallel Stochastic Approximation Monte Carlo Algorithm of Big Data (2014-10-15). Lin, Fang-Yu.
Modeling and mining with massive volumes of data have become popular in recent decades. However, such data are difficult to analyze on a single commodity computer because of their size, so parallel computing is widely used. As a natural methodology, the divide-and-combine (D&C) method has been applied in parallel computing; the general approach is to run an MCMC algorithm on each divided data set. However, the MCMC algorithm is computationally expensive because it requires a large number of iterations and is prone to becoming trapped in local optima. On the other hand, the stochastic approximation Monte Carlo (SAMC) algorithm, a sophisticated algorithm in both theory and applications, can avoid becoming trapped in local optima and produces more accurate estimates than the conventional MCMC algorithm does. Motivated by the success of SAMC, we propose a parallel SAMC algorithm that can be applied to massive data and is workable in parallel computing; it can also be applied to model selection and optimization problems. The main challenge of the parallel SAMC algorithm is how to combine the results from each parallel subset. In this work, three strategies to overcome the combining difficulties are proposed. Simulation results show that these strategies yield significant time savings and accurate estimation.
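The divide-and-combine idea behind such parallel samplers can be illustrated with a toy sketch. The thesis's three combining strategies are not spelled out in the abstract, so the sketch below uses one common choice, a consensus-style inverse-variance-weighted combination of subset draws; the conjugate-normal "sampler" stands in for a real SAMC/MCMC run on each shard.

```python
import random
import statistics

def subset_posterior_draws(data, n_draws=2000):
    # Stand-in for a per-subset sampler: for a normal mean with known
    # variance 1 and a flat prior, the subset posterior is
    # N(mean(data), 1/len(data)), so we draw from it directly for brevity.
    mu, sd = statistics.fmean(data), (1.0 / len(data)) ** 0.5
    return [random.gauss(mu, sd) for _ in range(n_draws)]

def consensus_combine(draw_sets):
    # Inverse-variance-weighted average of aligned draws across subsets
    # (consensus Monte Carlo-style combination; one of several options).
    weights = [1.0 / statistics.variance(d) for d in draw_sets]
    total = sum(weights)
    n = min(len(d) for d in draw_sets)
    return [sum(w * d[i] for w, d in zip(weights, draw_sets)) / total
            for i in range(n)]

random.seed(1)
full = [random.gauss(3.0, 1.0) for _ in range(3000)]
shards = [full[i::4] for i in range(4)]  # divide the data across 4 workers
draws = consensus_combine([subset_posterior_draws(s) for s in shards])
```

The combined draws concentrate around the full-data posterior mean even though no worker ever saw the whole data set.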
Synthetic aperture radar interferometry (InSAR) is a technique for analyzing deformation caused by geophysical processes. However, it is limited by signal losses arising from topographic residuals. In order to analyze surface deformation, we have to distinguish these signal losses. Many methods assume the noise has a second-order stationary structure without testing this assumption. The objective of this study is to examine the second-order stationarity assumption for InSAR noise and to develop a parametric nonstationary model in order to demonstrate the effect of making an incorrect assumption about the random field. The results indicate that a wrong stationarity assumption leads to biased estimation and large variation.

Item: Continuous Model Updating and Forecasting for a Naturally Fractured Reservoir (2013-07-26). Almohammadi, Hisham.
Recent developments in instrumentation, communication and software have enabled the integration of real-time data into the decision-making process of hydrocarbon production. Applications of real-time data integration in drilling operations and horizontal-well lateral placement are becoming industry common practice. In reservoir management, the use of real-time data has been shown to be advantageous in tasks such as improving smart-well performance and in pressure-maintenance programs. Such capabilities allow for a paradigm change in which reservoir management can be looked at as a strategy that enables a semi-continuous process of model updates and decision optimizations, instead of being periodic or reactive. This is referred to as closed-loop reservoir management (CLRM). Due to the complexity of the dynamic physical processes, the large model sizes, and the huge uncertainties associated with reservoir description, continuous model updating is a large-scale problem with a high-dimensional parameter space and high computational costs.
The need for an algorithm that is both feasible for practical applications and capable of generating reliable estimates of reservoir uncertainty is a key element in CLRM. This thesis investigates the validity of Markov chain Monte Carlo (MCMC) sampling used in a Bayesian framework as an uncertainty quantification and model-updating tool suitable for real-time applications. A 3-phase, dual-porosity, dual-permeability reservoir model is used in a synthetic experiment. Continuous probability density functions of cumulative oil production for two cases with different model updating frequencies and reservoir maturity levels are generated and compared to a case with a known geology, i.e., the truth case. Results show continuously narrowing ranges for cumulative oil production, with mean values approaching the truth case as model updating advances and the reservoir becomes more mature. To deal with the sensitivity of MCMC sampling to increasing numbers of observed measurements, as in the case of real-time applications, a new formulation of the likelihood function is proposed. Changing the likelihood function significantly improved chain convergence, chain mixing and forecast uncertainty quantification. Further, methods to validate the sampling quality and to judge the prior model for the MCMC process in real applications are suggested.

Item: Continuous reservoir simulation model updating and forecasting using a Markov chain Monte Carlo method (2009-05-15). Liu, Chang.
Currently, effective reservoir management systems play a very important part in exploiting reservoirs. Fully exploring all the possible events for a petroleum reservoir is a challenge because of the infinite combinations of reservoir parameters. Much is unknown about the underlying reservoir model, which has many uncertain parameters. MCMC (Markov chain Monte Carlo) is a statistically rigorous sampling method, with a stronger theoretical base than many other methods.
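As a reference point for the MCMC samplers discussed throughout these abstracts, a minimal random-walk Metropolis sampler looks like the following. This is a generic illustration on a one-dimensional toy posterior, not the history-matching code of either thesis.

```python
import math
import random

def metropolis(log_post, x0, steps=20000, scale=1.0, seed=0):
    # Random-walk Metropolis: propose x' ~ N(x, scale^2) and accept with
    # probability min(1, pi(x')/pi(x)), computed on the log scale.
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(steps):
        prop = x + rng.gauss(0.0, scale)
        lp_prop = log_post(prop)
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Target: a standard normal "posterior", deliberately started far away.
draws = metropolis(lambda x: -0.5 * x * x, x0=5.0)
burned = draws[5000:]  # discard burn-in before summarizing
```

In the reservoir setting, `log_post` would wrap a (much more expensive) simulator-based likelihood plus the prior, which is exactly why the continuous and parallel variants described in these abstracts matter.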
The performance of the MCMC method on high-dimensional problems is a timely topic in the statistics field. This thesis suggests a way to quantify uncertainty for high-dimensional problems by using the MCMC sampling process within the Bayesian framework. Based on the improved method, this thesis reports a new approach to using the continuous MCMC method for automatic history matching. The assimilation of the data in a continuous process is done sequentially rather than simultaneously. Moreover, the continuous process makes the MCMC method more applicable for industry use: long run times for a single realization are no longer a major problem during the sampling process, and newly observed data are incorporated as soon as they become available, leading to better estimates. The PUNQ-S3 reservoir model is used to test two methods in this thesis: a static (traditional) simulation process and a continuous simulation process. The continuous process provides continuously updated probabilistic forecasts of well and reservoir performance, accessible at any time, and can be used to optimize long-term reservoir performance at field scale.

Item: Detecting calcium flux in T cells using a Bayesian model (2015-08). Hu, Zicheng; Müller, Peter; Ehrlich, Lauren.
Upon antigen recognition, T cells are activated to carry out their effector functions. A hallmark of T cell activation is the dramatic increase of the intracellular calcium concentration (calcium influx). Indo-1 is a calcium indicator dye widely used to detect T cell activation events in in vitro assays. The use of Indo-1 to detect T cell activation events in live tissues remains a challenge, due to the low signal-to-noise ratio of the data generated. Here, we developed a Bayesian probabilistic model to identify T cell activation events from noisy Indo-1 data.
The model was able to detect T cell activation events accurately from simulated data, as well as from real biological data in which the times of T cell activation events are known. We then used the model to detect OTII T cells that are activated by dendritic cells in the thymic medulla of the Rip-OVAhi transgenic mouse. We found that dendritic cells contribute 60% of all T cell activations in this mouse model.

Item: Distributed inference in Bayesian nonparametric models using partially collapsed MCMC (2016-05). Zhang, Michael Minyi; Williamson, Sinead; Lin, Lizhen.
Bayesian nonparametric models are an elegant way of discovering underlying latent features within a data set, but inference in such models can be slow. Inferring latent components using Markov chain Monte Carlo either relies on an uncollapsed representation, which leads to poor mixing, or on a collapsed representation, which is usually slow. We take advantage of the fact that the latent components are conditionally independent under the given stochastic process (we apply our technique to the Dirichlet process and the Indian buffet process). Because of this conditional independence, we can partition the latent components into two parts: one part containing only the finitely many instantiated components and the other containing the infinite tail of uninstantiated components. For the finite partition, parallel inference is simple given the instantiation of components. But for the infinite tail, performing uncollapsed MCMC leads to poor mixing, and thus we collapse out the components.
The resulting hybrid sampler, while parallel, produces samples asymptotically from the true posterior.

Item: Effects of sample size, ability distribution, and the length of Markov Chain Monte Carlo burn-in chains on the estimation of item and testlet parameters (2011-05). Orr, Aline Pinto; Dodd, Barbara Glenzing; Suh, Youngsuk.
Item Response Theory (IRT) models are the basis of modern educational measurement. In order to increase testing efficiency, modern tests make ample use of groups of questions associated with a single stimulus (testlets). This violates the IRT assumption of local independence. However, a set of measurement models, testlet response theory (TRT), has been developed to address such dependency issues. This study investigates the effects of varying sample sizes and Markov chain Monte Carlo burn-in chain lengths on the accuracy of estimation of a TRT model's item and testlet parameters. The following outcome measures are examined: descriptive statistics, Pearson product-moment correlations between known and estimated parameters, and indices of measurement effectiveness for final parameter estimates.

Item: Hessian-based response surface approximations for uncertainty quantification in large-scale statistical inverse problems, with applications to groundwater flow (2013-08). Flath, Hannah Pearl; Ghattas, Omar N.
Subsurface flow phenomena characterize many important societal issues in energy and the environment. A key feature of these problems is that subsurface properties are uncertain, due to the sparsity of direct observations of the subsurface. The Bayesian formulation of this inverse problem provides a systematic framework for inferring uncertainty in the properties given uncertainties in the data, the forward model, and prior knowledge of the properties.
We address the problem: given noisy measurements of the head, the pdf describing the noise, prior information in the form of a pdf of the hydraulic conductivity, and a groundwater flow model relating the head to the hydraulic conductivity, find the posterior probability density function (pdf) of the parameters describing the hydraulic conductivity field. Unfortunately, conventional sampling of this pdf to compute statistical moments is intractable for problems governed by large-scale forward models and high-dimensional parameter spaces. We construct a Gaussian process surrogate of the posterior pdf based on Bayesian interpolation between a set of "training" points. We employ a greedy algorithm to find the training points by solving a sequence of optimization problems where each new training point is placed at the maximizer of the error in the approximation. Scalable Newton optimization methods solve this "optimal" training point problem. We tailor the Gaussian process surrogate to the curvature of the underlying posterior pdf according to the Hessian of the log posterior at a subset of training points, made computationally tractable by a low-rank approximation of the data misfit Hessian. A Gaussian mixture approximation of the posterior is extracted from the Gaussian process surrogate, and used as a proposal in a Markov chain Monte Carlo method for sampling both the surrogate and the true posterior. The Gaussian process surrogate is used as a first-stage approximation in a two-stage delayed acceptance MCMC method. We provide evidence for the viability of the low-rank approximation of the Hessian through numerical experiments on a large-scale atmospheric contaminant transport problem and analysis of an infinite-dimensional model problem. We provide similar results for our groundwater problem.
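The two-stage delayed acceptance rule used above can be sketched generically: a cheap surrogate screens proposals, and only survivors pay for the expensive posterior, with a second-stage correction that keeps the chain exact. The Gaussian targets below are placeholders for the expensive posterior and its surrogate, invented for the example.

```python
import math
import random

def delayed_acceptance(log_post, log_surr, x0, steps=20000, scale=0.8, seed=0):
    # Two-stage MCMC: a symmetric proposal must first pass a Metropolis
    # test on the cheap surrogate; survivors are then tested against the
    # expensive posterior with a correction for the surrogate error, so
    # the chain still targets the true posterior.
    rng = random.Random(seed)
    x, lp, ls = x0, log_post(x0), log_surr(x0)
    out = []
    for _ in range(steps):
        y = x + rng.gauss(0.0, scale)
        ls_y = log_surr(y)
        if math.log(rng.random()) < ls_y - ls:                      # stage 1
            lp_y = log_post(y)
            if math.log(rng.random()) < (lp_y - lp) - (ls_y - ls):  # stage 2
                x, lp, ls = y, lp_y, ls_y
        out.append(x)
    return out

# True posterior N(0, 1); the surrogate is deliberately slightly biased.
draws = delayed_acceptance(lambda x: -0.5 * x * x,
                           lambda x: -0.5 * (x - 0.1) ** 2, x0=0.0)
burned = draws[5000:]
```

Despite the biased surrogate, the second-stage correction leaves the chain sampling the true posterior; the surrogate only affects efficiency, not correctness.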
We then present results from the proposed MCMC algorithms.

Item: On the separation of preferences among marked point process wager alternatives (2009-05-15). Park, Jee Hyuk.
A wager is a one-time bet, staking money on one among a collection of alternatives having uncertain reward. Wagers represent a common class of engineering decision, where "bets" are placed on the design, deployment, and/or operation of technology. Often such wagers are characterized by alternatives having value that evolves according to some future cash flow. Here, the values of specific alternatives are derived from a cash flow modeled as a stochastic marked point process. A principal difficulty with these engineering wagers is that the probability laws governing the dynamics of the random cash flow typically are not (completely) available; hence, separating the gambler's preference among wager alternatives is quite difficult. In this dissertation, we investigate a computational approach for separating preferences among alternatives of a wager where the alternatives have values that evolve according to a marked point process. We are particularly concerned with separating a gambler's preferences when the probability laws on the available alternatives are not completely specified.

Item: Parallel Markov Chain Monte Carlo Methods for Large Scale Statistical Inverse Problems (2014-04-18). Wang, Kainan.
The Bayesian method has proven to be a powerful way of modeling inverse problems. The solution to a Bayesian inverse problem is the posterior distribution of the estimated parameters, which provides not only estimates of the inferred parameters but also the uncertainty of these estimates. Markov chain Monte Carlo (MCMC) is a useful technique for sampling the posterior distribution, and information can then be extracted from the sampled ensemble.
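Extracting information from a sampled ensemble is straightforward once the draws are in hand. The sketch below is generic and not tied to any particular thesis: it reads a point estimate and a central 90% credible interval directly off the samples.

```python
import random
import statistics

def summarize(draws, lo=0.05, hi=0.95):
    # Point estimate plus a central 90% credible interval, taken as
    # empirical quantiles of the sampled ensemble.
    s = sorted(draws)
    n = len(s)
    return {
        "mean": statistics.fmean(s),
        "sd": statistics.stdev(s),
        "ci": (s[int(lo * n)], s[int(hi * n)]),
    }

# Stand-in ensemble; in practice these would be (thinned) MCMC draws.
random.seed(0)
ensemble = [random.gauss(2.0, 0.5) for _ in range(10000)]
stats = summarize(ensemble)
```

For correlated MCMC output the same summaries apply, though the effective sample size, not the raw draw count, governs their accuracy.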
However, MCMC is very expensive to compute, especially for inverse problems where the underlying forward problems involve solving differential equations. Even worse, MCMC is difficult to parallelize due to its sequential nature; that is, under the current framework, we can barely accelerate MCMC with parallel computing. We develop a new framework of parallel MCMC algorithms, the Markov chain preconditioned Monte Carlo (MCPMC) method, for sampling Bayesian inverse problems. With the help of a fast auxiliary MCMC chain running on computationally cheaper approximate models, which serves as a stochastic preconditioner for the target distribution, the sampler randomly selects candidates from the preconditioning chain for further processing on the accurate model. As this accurate-model processing can be executed in parallel, the algorithm is suitable for parallel systems. We implement it using a modified master-slave architecture, analyze its potential to accelerate sampling, and apply it to three examples. A two-dimensional Gaussian mixture example shows that the new sampler can bring statistical efficiency in addition to increased sampling speed. Through a 2D inverse problem with an elliptic equation as the forward model, we demonstrate the use of an enhanced error model to build the preconditioner. For a 3D optical tomography problem we use adaptive finite element methods to build the approximate model. In both examples, MCPMC successfully samples the posterior distributions with multiple processors, demonstrating efficient speedups compared to traditional MCMC algorithms.
In addition, the 3D optical tomography example shows the feasibility of applying MCPMC to real-world, large-scale statistical inverse problems.

Item: Statistical methods for the analysis of DSMC simulations of hypersonic shocks (2012-05). Strand, James Stephen; Goldstein, David Benjamin; Moser, Robert; Varghese, Philip; Ezekoye, Ofodike; Prudencio, Ernesto.
In this work, statistical techniques were employed to study the modeling of a hypersonic shock with the direct simulation Monte Carlo (DSMC) method, and to gain insight into how the model interacts with a set of physical parameters. DSMC is a particle-based method which is useful for simulating gas dynamics in rarefied and/or highly non-equilibrium flowfields. A DSMC code was written and optimized for use in this research. The code was developed with shock tube simulations in mind, and it includes a number of improvements which allow for the efficient simulation of 1D, hypersonic shocks. Most importantly, a moving sampling region is used to obtain an accurate steady shock profile from an unsteady, moving shock wave. The code is MPI-parallel, and an adaptive load balancing scheme ensures that the workload is distributed properly between processors over the course of a simulation. Global, Monte Carlo based sensitivity analyses were performed in order to determine which of the parameters examined in this work most strongly affect the simulation results for two scenarios: a 0D relaxation from an initial high-temperature state and a hypersonic shock. The 0D relaxation scenario was included in order to examine whether, with appropriate initial conditions, it can be viewed in some regards as a substitute for the 1D shock in a statistical sensitivity analysis. In both analyses, sensitivities were calculated based on both the square of the Pearson correlation coefficient and the mutual information.
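A correlation-based Monte Carlo sensitivity measure of the kind described here can be sketched as follows. The toy "model" and the parameter names `p1` and `p2` are invented for illustration; the real study's parameters are DSMC rate constants and its QoI is a simulated density profile.

```python
import random

def pearson_r2(xs, ys):
    # Squared Pearson correlation between a sampled parameter and a QoI:
    # r^2 = cov(x, y)^2 / (var(x) * var(y)).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# Toy model: the QoI depends strongly on p1 and only weakly on p2.
random.seed(0)
p1 = [random.uniform(0.5, 1.5) for _ in range(2000)]
p2 = [random.uniform(0.5, 1.5) for _ in range(2000)]
qoi = [3.0 * a + 0.1 * b + random.gauss(0.0, 0.1) for a, b in zip(p1, p2)]
scores = {"p1": pearson_r2(p1, qoi), "p2": pearson_r2(p2, qoi)}
```

Ranking parameters by such scores (here `p1` far above `p2`) is the global, sampling-based screening step; mutual information plays the analogous role for nonlinear, non-monotone dependence.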
The quantity of interest (QoI) chosen for these analyses was the NO density profile. This vector QoI was broken into a set of scalar QoIs, each representing the density of NO at a specific point in time (for the relaxation) or a specific streamwise location (for the shock), and sensitivities were calculated for each scalar QoI based on both measures of sensitivity. The sensitivities were then integrated over the set of scalar QoIs to determine an overall sensitivity for each parameter. A weighting function was used in the integration in order to emphasize sensitivities in the region of greatest thermal and chemical non-equilibrium. The six parameters which most strongly affect the NO density profile were found to be the same for both scenarios, which provides justification for the claim that a 0D relaxation can in some situations be used as a substitute model for a hypersonic shock. These six parameters are the pre-exponential constants in the Arrhenius rate equations for the N2 dissociation reaction N2 + N ⇄ 3N, the O2 dissociation reaction O2 + O ⇄ 3O, the NO dissociation reactions NO + N ⇄ 2N + O and NO + O ⇄ N + 2O, and the exchange reactions N2 + O ⇄ NO + N and NO + O ⇄ O2 + N. After identification of the most sensitive parameters, a synthetic data calibration was performed to demonstrate that the statistical inverse problem could be solved for the 0D relaxation scenario. The calibration was performed using the QUESO code, developed at the PECOS center at UT Austin, which employs the delayed rejection adaptive Metropolis (DRAM) algorithm. The six parameters identified by the sensitivity analysis were calibrated successfully with respect to a group of synthetic datasets.

Item: Statistical Models for Next Generation Sequencing Data (2013-04-01). Wang, Yiyi.
Three statistical models are developed to address problems in next-generation sequencing data. The first two models are designed for RNA-Seq data and the third is designed for ChIP-Seq data.
The first RNA-Seq model uses a Bayesian nonparametric model to detect genes that are differentially expressed across treatments. A negative binomial sampling distribution is used for each gene's read count, such that each gene may have its own parameters. Despite the consequent large number of parameters, parsimony is imposed by the clustering inherent in the Bayesian nonparametric framework. A Bayesian discovery procedure is adopted to calculate the probability that each gene is differentially expressed. A simulation study and real data analysis show that this method performs at least as well as existing leading methods in some cases. The second RNA-Seq model shares the framework of the first model, but replaces the usual random partition prior from the Dirichlet process with a random partition prior indexed by distances from the Gene Ontology (GO). The use of this external biological information yields improvements in statistical power over the original Bayesian discovery procedure. The third model addresses the problem of identifying protein binding sites in ChIP-Seq data. An exact test via a stochastic approximation is used to test the hypothesis that the treatment effect is independent of the sequence count intensity effect. The sliding window procedure for ChIP-Seq data is followed, and the p-value and the adjusted false discovery rate are calculated for each window. For the sites identified as peak regions, three candidate models are proposed for characterizing the bimodality of the ChIP-Seq data, and the stochastic approximation Monte Carlo (SAMC) method is used for selecting the best of the three.
Real data analysis shows that this method produces results comparable to other existing methods and is advantageous in identifying bimodality of the data.

Item: Uncertainty quantification using multiscale methods for porous media flows (2009-05-15). Dostert, Paul Francis.
In this dissertation we discuss numerical methods used for uncertainty quantification applications to flow in porous media. We consider stochastic flow equations that contain both a spatial and a random component, which must be resolved in our numerical models. When solving the flow and transport through heterogeneous porous media, some type of upscaling or coarsening is needed due to scale disparity. We describe multiscale techniques used for solving the spatial component of the stochastic flow equations. These techniques allow us to simulate the flow and transport processes on the coarse grid and thus reduce the computational cost. Additionally, we discuss techniques to combine multiscale methods with stochastic solution techniques, specifically polynomial chaos methods and sparse grid collocation methods. We apply the proposed methods to uncertainty quantification problems where the goal is to sample porous media properties given an integrated response. We propose several efficient sampling algorithms based on Langevin diffusion and the Markov chain Monte Carlo method. Analysis and detailed numerical results are presented for applications in multiscale immiscible flow and water infiltration into a porous medium.

Item: Understanding approximate Bayesian computation (ABC) (2013-05). Lim, Boram; Müller, Peter.
The Bayesian approach has been developed in various areas and has come to be part of mainstream statistical research. Markov chain Monte Carlo (MCMC) methods have freed us from computational constraints for a wide class of models, and several MCMC methods are now available for sampling from posterior distributions.
However, when data are large, models are complex, and the likelihood function is intractable, the use of MCMC is limited, especially in evaluating the likelihood function. As a solution to this problem, researchers have put forward approximate Bayesian computation (ABC), also known as a likelihood-free method. In this report I introduce the ABC algorithm and show an implementation for a stochastic volatility (SV) model. Even though there are alternative methods for analyzing SV models, such as particle filters and other MCMC methods, I demonstrate the ABC method on an SV model and compare it, using the same data and the same SV model, to an approach based on a mixture of normals and MCMC.

Item: Water Budget Analysis and Groundwater Inverse Modeling (2012-07-16). Farid Marandi, Sayena.
The thesis contains two studies: the first is a water budget analysis using groundwater modeling, and the second is groundwater inverse modeling using an MCMC scheme. The case study for the water budget analysis was the Norman Landfill site in Oklahoma, which has quite complex hydrology. This site contains a wetland that controls the groundwater-surface water interaction. This study reports a simulation study for better understanding of the local water balance at the landfill site using MODFLOW-2000. Inputs to the model are based on local climate, soil, geology, vegetation and seasonal hydrological dynamics of the system to determine the groundwater-surface water interaction, the water balance components in various hydrologic reservoirs, and the complexity and seasonality of local/regional hydrological processes. The model involved a transient two-dimensional hydrogeological simulation of the multi-layered aquifer. In the second part of the thesis, a Markov chain Monte Carlo (MCMC) method was developed to estimate the hydraulic conductivity field conditioned on measurements of hydraulic conductivity and hydraulic head for saturated flow in randomly heterogeneous porous media.
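The likelihood-free ABC idea introduced in the ABC report above can be sketched as a simple rejection sampler. The normal-mean example below is an invented stand-in for a real application (an SV model or a conductivity field would be far more involved): draws from the prior are kept whenever data simulated under them reproduce the observed summary statistic to within a tolerance.

```python
import random
import statistics

def abc_rejection(observed, prior_sample, simulate, summary, eps,
                  n_keep=200, seed=0):
    # Likelihood-free rejection ABC: keep a prior draw whenever the summary
    # of data simulated under it lands within eps of the observed summary.
    rng = random.Random(seed)
    s_obs = summary(observed)
    kept = []
    while len(kept) < n_keep:
        theta = prior_sample(rng)
        if abs(summary(simulate(theta, rng)) - s_obs) < eps:
            kept.append(theta)
    return kept

# Toy example: infer the mean of a normal with known sd = 1.
random.seed(0)
obs = [random.gauss(1.5, 1.0) for _ in range(200)]
post = abc_rejection(
    obs,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda th, r: [r.gauss(th, 1.0) for _ in range(200)],
    summary=statistics.fmean,
    eps=0.1,
)
```

The kept draws approximate the posterior without ever evaluating a likelihood; shrinking `eps` tightens the approximation at the cost of a lower acceptance rate.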
The groundwater modeling approach was found to be efficient in identifying the dominant hydrological processes at the Norman Landfill site, including evapotranspiration, recharge, regional groundwater flow, and groundwater-surface water interaction. The MCMC scheme also proved to be a robust tool for inverse groundwater modeling, but its strength depends on the precision of the prior covariance matrix.