Bayesian model-based approaches with MCMC computation to some bioinformatics problems




Texas A&M University


Bioinformatics applications can address the transfer of information at several stages of the central dogma of molecular biology, including transcription and translation. This dissertation uses Bayesian models to interpret biological data, with Markov chain Monte Carlo (MCMC) as the inference method. First, we apply this approach to data at the transcription level. We propose a two-level hierarchical Bayesian model for variable selection on cDNA microarray data. A cDNA microarray quantifies the mRNA levels of thousands of genes simultaneously in a single sample. By observing the expression patterns of genes under various treatment conditions, important clues about gene function can be obtained. We consider a multivariate Bayesian regression model and assign priors that favor sparseness in the number of variables (genes) used. Different priors promote different degrees of sparseness within a unified two-level hierarchical Bayesian model. Second, we apply our method to a problem related to the translation level. We develop hidden Markov models for the linker and non-linker regions of a protein sequence. We use a linker index to exploit differences in amino acid composition between these regions, working from sequence information alone. A goal of protein structure prediction is to take an amino acid sequence (represented as a sequence of letters) and predict its tertiary structure; identifying linker regions in a protein sequence is valuable for predicting that three-dimensional structure. Because both models are complex in practice, we employ MCMC, particularly Gibbs sampling (Gelfand and Smith, 1990), for parameter estimation.
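As a generic illustration of Gibbs sampling only, and not the dissertation's hierarchical regression or hidden Markov models, the sketch below alternates draws from the two full conditional distributions of a much simpler model: normally distributed observations with unknown mean and precision. The function name `gibbs_normal`, the priors, and all default hyperparameter values are assumptions chosen for this example.

```python
import random


def gibbs_normal(data, n_iter=2000, burn_in=500, seed=0,
                 mu0=0.0, tau0=1e-3, a0=1e-2, b0=1e-2):
    """Gibbs sampler for y_i ~ Normal(mu, 1/prec).

    Assumed (illustrative) priors:
      mu   ~ Normal(mu0, 1/tau0)
      prec ~ Gamma(shape=a0, rate=b0)
    Returns a list of (mu, prec) draws kept after burn-in.
    """
    rng = random.Random(seed)
    n = len(data)
    ybar = sum(data) / n
    mu, prec = ybar, 1.0  # starting values
    draws = []
    for t in range(n_iter):
        # Full conditional of mu given prec: Normal(post_mean, 1/post_prec).
        post_prec = tau0 + n * prec
        post_mean = (tau0 * mu0 + prec * n * ybar) / post_prec
        mu = rng.gauss(post_mean, post_prec ** -0.5)

        # Full conditional of prec given mu:
        # Gamma(shape = a0 + n/2, rate = b0 + 0.5 * sum (y - mu)^2).
        shape = a0 + n / 2
        rate = b0 + 0.5 * sum((y - mu) ** 2 for y in data)
        # random.gammavariate takes a scale parameter, i.e. 1 / rate.
        prec = rng.gammavariate(shape, 1.0 / rate)

        if t >= burn_in:
            draws.append((mu, prec))
    return draws


if __name__ == "__main__":
    sim = random.Random(1)
    y = [sim.gauss(2.0, 1.5) for _ in range(200)]  # simulated data
    kept = gibbs_normal(y)
    print("posterior mean of mu:", sum(m for m, _ in kept) / len(kept))
```

The same alternating pattern, sampling each block of unknowns from its distribution conditional on the current values of all the others, is what makes Gibbs sampling practical for richer models whose joint posterior cannot be sampled directly.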