Combining Strategies for Parallel Stochastic Approximation Monte Carlo Algorithm of Big Data






Modeling and mining massive volumes of data have become popular in recent decades. However, such data are difficult to analyze on a single commodity computer because they are too large, so parallel computing is widely used. As a natural methodology, the divide-and-combine (D&C) method has been applied in parallel computing. The general D&C approach is to run an MCMC algorithm on each divided data set. However, MCMC is computationally expensive because it requires a large number of iterations and is prone to getting trapped in local optima. In contrast, the Stochastic Approximation Monte Carlo (SAMC) algorithm, which is well established in both theory and applications, can avoid getting trapped in local optima and produce more accurate estimates than conventional MCMC algorithms. Motivated by the success of SAMC, we propose a parallel SAMC algorithm that can be applied to massive data and runs in a parallel computing environment. It can also be applied to model selection and optimization problems. The main challenge of the parallel SAMC algorithm is how to combine the results from the parallel subsets. In this work, three strategies to overcome this combining difficulty are proposed. Simulation results show that these strategies yield significant time savings and accurate estimates.
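The divide-and-combine idea described above can be illustrated with a minimal sketch. This is not one of the three combining strategies proposed in this work, which the abstract does not specify. It assumes a simple inverse-variance weighted average as the combining rule, and it uses a sample mean as a hypothetical stand-in for the per-subset sampler run. The function names `split_into_subsets`, `subset_estimate`, and `combine_inverse_variance` are illustrative only.

```python
import random
import statistics


def split_into_subsets(data, k):
    """Divide step: partition the data into k roughly equal subsets."""
    return [data[i::k] for i in range(k)]


def subset_estimate(subset):
    """Hypothetical stand-in for a per-subset sampler run (assumption).

    Returns the subset's estimate of the mean and the variance of that estimate.
    """
    mean = statistics.fmean(subset)
    var_of_mean = statistics.variance(subset) / len(subset)
    return mean, var_of_mean


def combine_inverse_variance(results):
    """Combine step (assumed rule): inverse-variance weighted average."""
    weights = [1.0 / var for _, var in results]
    weighted_sum = sum(w * est for w, (est, _) in zip(weights, results))
    return weighted_sum / sum(weights)


# Simulated data with true mean 2.0 (illustrative only).
rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(10_000)]

subsets = split_into_subsets(data, k=4)
# In a real setting each subset would be processed on a separate worker;
# here the subsets are processed sequentially for simplicity.
results = [subset_estimate(s) for s in subsets]
combined = combine_inverse_variance(results)
print(f"combined estimate: {combined:.3f}")
```

The combining rule is where the design choices lie: a weighted average is adequate for a scalar mean, while combining the output of subset-level samplers generally requires more care, which is the difficulty the three proposed strategies address.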

Synthetic Aperture Radar Interferometry (InSAR) is a technique for analyzing deformation caused by geophysical processes. However, it is limited by signal losses arising from topographic residuals. To analyze surface deformation, we have to distinguish these signal losses. Many methods assume that the noise has a second-order stationary structure without testing this assumption. The objective of this study is to examine the second-order stationarity assumption for InSAR noise and to develop a parametric nonstationary model in order to demonstrate the effect of an incorrect assumption about the random field. The results indicate that a wrong stationarity assumption leads to biased estimates and large variation.