Bayesian Gaussian Graphical models using sparse selection priors and their mixtures

Date

2012-10-19

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

We propose Bayesian methods for estimating the precision matrix in Gaussian graphical models. The methods lead to sparse and adaptively shrunk estimators of the precision matrix, and thus conduct model selection and estimation simultaneously. Our methods are based on selection and shrinkage priors leading to parsimonious parameterization of the precision (inverse covariance) matrix, which is essential in several applications in learning relationships among the variables. In Chapter I, we employ the Laplace prior on the off-diagonal element of the precision matrix, which is similar to the lasso model in a regression context. This type of prior encourages sparsity while providing shrinkage estimates. Secondly we introduce a novel type of selection prior that develops a sparse structure of the precision matrix by making most of the elements exactly zero, ensuring positive-definiteness.

In Chapter II we extend the above methods to perform classification. Reverse-phase protein array (RPPA) analysis is a powerful, relatively new platform that allows for high-throughput, quantitative analysis of protein networks. One of the challenges that currently limits the potential of this technology is the lack of methods that allows for accurate data modeling and identification of related networks and samples. Such models may improve the accuracy of biological sample classification based on patterns of protein network activation, and provide insight into the distinct biological relationships underlying different cancers. We propose a Bayesian sparse graphical modeling approach motivated by RPPA data using selection priors on the conditional relationships in the presence of class information. We apply our methodology to an RPPA data set generated from panels of human breast cancer and ovarian cancer cell lines. We demonstrate that the model is able to distinguish the different cancer cell types more accurately than several existing models and to identify differential regulation of components of a critical signaling network (the PI3K-AKT pathway) between these cancers. This approach represents a powerful new tool that can be used to improve our understanding of protein networks in cancer.

In Chapter III we extend these methods to mixtures of Gaussian graphical models for clustered data, with each mixture component being assumed Gaussian with an adaptive covariance structure. We model the data using Dirichlet processes and finite mixture models and discuss appropriate posterior simulation schemes to implement posterior inference in the proposed models, including the evaluation of normalizing constants that are functions of parameters of interest which are a result of the restrictions on the correlation matrix. We evaluate the operating characteristics of our method via simulations, as well as discuss examples based on several real data sets.

Description

Citation