Bayesian Analysis for Large Spatial Data






The Gaussian geostatistical model has been widely used in Bayesian modeling of spatial data. A core difficulty with this model lies in inverting the n x n covariance matrix, where n is the sample size: the computational complexity of matrix inversion grows as O(n^3). This difficulty arises in almost all statistical inference approaches for the model, such as kriging and Bayesian modeling. In Bayesian inference, the inverse of the covariance matrix must be evaluated at each iteration of the posterior simulation, so the Bayesian approach is infeasible for large n given current computational power.
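To make the cost concrete, the following minimal sketch evaluates a zero-mean Gaussian log-likelihood through a dense Cholesky factorization, which is the O(n^3) step that would be repeated at every posterior iteration. The exponential covariance function and the parameter names (sigma2, phi, tau2) are illustrative assumptions, not taken from the dissertation.

```python
import numpy as np

def exponential_cov(coords, sigma2=1.0, phi=0.3, tau2=0.1):
    """Dense n x n covariance: sigma2 * exp(-d / phi), plus a nugget tau2 on the diagonal."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return sigma2 * np.exp(-dist / phi) + tau2 * np.eye(len(coords))

def gaussian_loglik(y, cov):
    """Zero-mean Gaussian log-likelihood; the Cholesky factorization costs O(n^3)."""
    n = len(y)
    chol = np.linalg.cholesky(cov)            # cov = chol @ chol.T
    # A triangular solver would be cheaper; np.linalg.solve keeps the sketch NumPy-only.
    z = np.linalg.solve(chol, y)              # z @ z equals y' cov^{-1} y
    logdet = 2.0 * np.log(np.diag(chol)).sum()
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + z @ z)

# Hypothetical data: 200 random sites in the unit square.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(200, 2))
cov = exponential_cov(coords)
y = rng.multivariate_normal(np.zeros(200), cov)
ll = gaussian_loglik(y, cov)
```

Doubling n multiplies the factorization work by roughly eight, which is why this evaluation becomes the bottleneck inside a sampler.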

In this dissertation, we propose two approaches to address this computational issue: the auxiliary lattice model (ALM) approach and the Bayesian site selection (BSS) approach. The key feature of ALM is the introduction of a latent regular lattice that links a Gaussian Markov random field (GMRF) with the Gaussian field (GF) of the observations. The GMRF on the auxiliary lattice serves as an approximation to the Gaussian process. What distinguishes ALM from other approximations is that it completely avoids matrix inversion by using the analytical likelihood of the GMRF. The computational complexity of ALM is attractive: it increases linearly with the sample size.
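As an illustration of how a GMRF likelihood can be evaluated without any matrix inversion, the sketch below computes the log-density of a first-order GMRF on a periodic (torus) lattice, where the eigenvalues of the precision matrix are known in closed form. This is not the ALM itself: the torus boundary, the precision structure Q = kappa * I + Laplacian, and the function name are assumptions chosen only because they give a well-known analytical form.

```python
import numpy as np

def torus_gmrf_logdensity(x, kappa=0.5):
    """Log-density of a zero-mean first-order GMRF on an m x m torus.

    Assumed precision: Q = kappa * I + (graph Laplacian of the periodic lattice).
    No matrix is formed, factorized, or inverted.
    """
    m = x.shape[0]
    # Quadratic form x' Q x built from squared neighbor differences: O(n) work.
    quad = kappa * (x ** 2).sum()
    quad += ((x - np.roll(x, 1, axis=0)) ** 2).sum()
    quad += ((x - np.roll(x, 1, axis=1)) ** 2).sum()
    # Closed-form eigenvalues of Q on the torus give log det Q directly.
    lam = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(m) / m)
    logdet = np.log(kappa + lam[:, None] + lam[None, :]).sum()
    n = m * m
    return 0.5 * logdet - 0.5 * quad - 0.5 * n * np.log(2.0 * np.pi)

# Hypothetical lattice values on a 64 x 64 grid.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 64))
logp = torus_gmrf_logdensity(x)
```

Every step touches each lattice node a fixed number of times, so the cost in this sketch grows linearly with the number of nodes.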

The second approach, Bayesian site selection (BSS), attempts to reduce the dimension of the data by selecting a representative subset of the observations. BSS first splits the observations into two parts: the observations near the target prediction sites (part I) and the remaining observations (part II). Then, treating the observations in part I as the response variable and those in part II as explanatory variables, BSS forms a regression model that relates all observations through a conditional likelihood derived from the original model. The dimension of the data can then be reduced by applying a stochastic variable selection procedure to the regression model, which selects only a subset of the part II data as explanatory variables. Because BSS works directly on the original process without any approximation, it can provide more insight into the underlying true Gaussian process.
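The regression view can be sketched with the standard conditional distribution of a zero-mean Gaussian: given part II, part I has mean B y2 with B = S12 S22^{-1} and covariance S11 - B S21. The code below computes these quantities for the full part II and for a subset of it. The subset here is chosen by distance purely as a stand-in; it is not the stochastic variable selection procedure of BSS, and the covariance function, parameter values, and names are assumptions.

```python
import numpy as np

def conditional_regression(cov, idx1, idx2):
    """Return (B, cond_cov) such that y1 | y2 ~ N(B @ y2, cond_cov) for a zero-mean Gaussian."""
    s11 = cov[np.ix_(idx1, idx1)]
    s12 = cov[np.ix_(idx1, idx2)]
    s22 = cov[np.ix_(idx2, idx2)]
    coef = np.linalg.solve(s22, s12.T).T      # S12 @ inv(S22), regression coefficients
    cond_cov = s11 - coef @ s12.T             # residual covariance of part I
    return coef, cond_cov

# Hypothetical setup: 300 sites, one target prediction location at the center.
rng = np.random.default_rng(1)
coords = rng.uniform(size=(300, 2))
order = np.argsort(np.linalg.norm(coords - np.array([0.5, 0.5]), axis=1))
idx1, idx2 = order[:10], order[10:]           # part I (near target), part II (the rest)

diff = coords[:, None, :] - coords[None, :, :]
cov = np.exp(-np.sqrt((diff ** 2).sum(axis=-1)) / 0.3) + 0.1 * np.eye(300)

B_full, C_full = conditional_regression(cov, idx1, idx2)
subset = idx2[:40]                            # stand-in subset, NOT the BSS selection
B_sub, C_sub = conditional_regression(cov, idx1, subset)
```

Conditioning on the subset requires solving a 40 x 40 system instead of a 290 x 290 one; comparing the diagonals of C_sub and C_full shows how much predictive precision the reduction gives up.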

The practical performance of ALM and BSS is illustrated with simulated and real data sets.