Three Papers Addressing Migration Induced Autocorrelation in Spatial Analysis Within a Bayesian Modelling Framework
Abstract
Migration within a regional system influences demographic, epidemiological, and socioeconomic processes. At its core is the spatial redistribution of populations and their associated characteristics. This redistribution induces spatial dependencies either [a] from a spatio-temporal perspective, by making it impossible to directly observe an underlying data generating process, or [b] by leading to network autocorrelation among the observed migration flows, which connect the origins and destinations. The first two papers investigate these migration-induced spatial dependencies and demonstrate the methodologies with empirical data sets. The third paper evaluates the proposed models in a set of controlled experiments for their feasibility and their sensitivity to model misspecifications.
The generating process of diseases with long latency periods consists of three key components: the disease frequency in the population at risk, the bio-behavioral risk factors related to the underlying population at risk, and the environmental risk factors tied to the regions of residence but detached from the population at risk. The demographic migration process disperses the regional inhabitants throughout the regional system and therefore prevents a direct identification of the underlying disease generating process. The first paper proposes a stochastic implementation of the demographic population projection. A hybrid spatial moving average and lag model is adopted to control for the migration-induced uncertainties and to identify the underlying disease generating process. The empirical application is based on the 508 State Economic Areas (SEAs) and their interregional migration matrix. Smoking as the behavioral risk factor, and population density and indoor radon levels as the environmental risk factors, are used to model the historic male lung cancer mortality rates (1970-1994) at the SEA level. The proposed model controls for the migration effects and improves the model substantially.
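The way migration masks an origin-based disease generating process can be illustrated with a minimal sketch. It is not the hybrid spatial moving average and lag model of the first paper; it only shows the mixing mechanism that such a model has to undo. The three regions, the flow counts, and the origin rates are hypothetical values chosen for illustration, and the function name `observed_rates` is likewise an assumption.

```python
# Hypothetical toy system: 3 regions, all numbers invented for illustration.
# flows[o][d] = persons exposed in origin o who are observed in destination d
# (the diagonal holds the stayers).
flows = [
    [900, 50, 50],
    [100, 800, 100],
    [0, 200, 800],
]

# Assumed "true" disease rates generated at the places of exposure (origins).
origin_rates = [0.010, 0.020, 0.040]


def observed_rates(flows, origin_rates):
    """Rate seen at each destination: origin rates weighted by migrant shares."""
    n = len(flows)
    rates = []
    for d in range(n):
        residents = sum(flows[o][d] for o in range(n))
        cases = sum(flows[o][d] * origin_rates[o] for o in range(n))
        rates.append(cases / residents)
    return rates


mixed = observed_rates(flows, origin_rates)
for region, (true_rate, seen_rate) in enumerate(zip(origin_rates, mixed)):
    print(f"region {region}: origin rate {true_rate:.4f}, observed rate {seen_rate:.4f}")
```

In this toy setting the observed rates are weighted averages of the origin rates, so regional contrasts are smoothed toward each other and the link between a region's environmental risk factors and its observed rate is blurred.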
The second paper identifies the inherent network autocorrelation structure in a spatial interaction model. It proposes five novel specifications and empirically identifies the most appropriate network autocorrelation structure. The 2005-2010 interprovincial migration pattern in China and a set of origin- and destination-specific provincial variables are used to explain the spatial interaction with a negative binomial regression model. The novel Bayesian integrated nested Laplace approximation (INLA) algorithm is compared with the spatial eigenvector filtering algorithm, whose objective function minimizes the residual network autocorrelation.
The non-stationary estimates of the spatial autocorrelation parameters in both previous papers motivate the third paper, which investigates their causes by conducting controlled experiments that systematically misspecify the estimator of the underlying disease generating process. Three uncorrelated environmental risk factors and three uncorrelated bio-behavioral risk factors are chosen from the eigenvectors of the spatial adjacency matrix. Each set of eigenvectors captures a positively autocorrelated, a spatially independent, and a negatively autocorrelated pattern. The disease rates are simulated using the logistic regression model with a linear combination of eigenvectors as predictor. The proposed model is able to identify the disease generating process.
All three papers use the INLA algorithm for the estimation of the posterior distributions and parallel computation to improve the computational efficiency. To satisfy the fundamental academic paradigm of reproducibility, all relevant data and specialized algorithms are bundled into publicly available R-packages.
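The design of the controlled experiments in the third paper, with risk factors taken from eigenvectors of the spatial adjacency matrix and rates simulated through a logistic model, can be sketched on a toy example. The sketch below assumes a hypothetical ring of 40 regions, chosen only because its adjacency eigenvectors are known in closed form (cosine waves); the study's actual tessellation, eigenvector selection, coefficients, and R implementation are not reproduced here, and all names and numeric values are illustrative assumptions.

```python
import math
import random

N = 40  # hypothetical number of regions arranged on a ring


def ring_adjacency(n):
    """Binary adjacency matrix: each region neighbours the two next to it."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        c[i][(i + 1) % n] = 1
        c[i][(i - 1) % n] = 1
    return c


def ring_eigenvector(n, k):
    """Cosine wave of frequency k, an eigenvector of the ring adjacency matrix."""
    return [math.cos(2 * math.pi * k * i / n) for i in range(n)]


def morans_i(x, c):
    """Moran's I of pattern x under the binary adjacency matrix c."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    w_sum = sum(sum(row) for row in c)
    num = sum(c[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / w_sum) * num / sum(v * v for v in z)


C = ring_adjacency(N)
# Frequencies picked so the patterns are positively autocorrelated,
# spatially independent, and negatively autocorrelated, respectively.
e_pos = ring_eigenvector(N, 1)
e_zero = ring_eigenvector(N, N // 4)
e_neg = ring_eigenvector(N, N // 2)

# Assumed intercept and coefficients; inverse logit of the linear predictor.
INTERCEPT, COEFS = -4.0, (0.5, 0.3, 0.2)
rates = [
    1 / (1 + math.exp(-(INTERCEPT + sum(b * e[i] for b, e in zip(COEFS, (e_pos, e_zero, e_neg))))))
    for i in range(N)
]

# Binomial case counts for an assumed population of 1000 per region.
rng = random.Random(1)
POP = 1000
cases = [sum(rng.random() < p for _ in range(POP)) for p in rates]

for name, e in (("positive", e_pos), ("independent", e_zero), ("negative", e_neg)):
    print(f"{name:>11}: Moran's I = {morans_i(e, C):+.3f}")
```

On a ring, Moran's I of the frequency-k cosine wave equals cos(2πk/N), which is why the three chosen frequencies give a strongly positive, a near-zero, and a strongly negative value. A misspecification experiment would then fit a model that omits or swaps some of these eigenvectors and compare the estimates with the assumed coefficients.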