Browsing by Subject "Computational biology"
Item: Algorithms for next generation sequencing data analysis (2015-12)
Das, Shreepriya; Vikalo, Haris; Dhillon, Inderjit S; Ravikumar, Pradeep; Sanghavi, Sujay; Tewfik, Ahmed

The field of genomics has witnessed tremendous achievements in the past two decades. Advances in sequencing technology have enabled the acquisition of massive amounts of data that reveals information about an individual's genetic blueprint and is revolutionizing the field of molecular biology. Interpretation of such data requires solving mathematical (statistical and computational) problems rendered difficult by the complex interacting processes that are characteristic of biological systems; the data is high dimensional, typically noisy and often incomplete. Algorithm design in these settings requires a deep understanding of the underlying biological principles, good mathematical abstractions that permit tractable inference, and fast, scalable and accurate solutions using ideas from diverse fields such as optimization, probability, statistics and algorithms. This dissertation deals with two such problems occurring in the field of bioinformatics/computational biology. First, for the problem of basecalling for sequencing-by-synthesis (Illumina) platforms, I describe novel computationally tractable statistical models and signal processing schemes that are fast and have lower error rates than existing state-of-the-art basecallers. Extensions to a soft information exchange setup for joint basecalling and SNP calling are also explored. Next, I describe two novel single individual haplotyping inference schemes for diploid and polyploid species, one using an (optimal) branch and bound framework and the other using (scalable) low rank semidefinite programming ideas.
In addition to improving the quality of basecalling, SNP calling, genotyping and haplotyping, I also developed user-friendly software that the biological research community can use for various purposes, including cancer genomics and metagenomics studies.

Item: Modeling the interaction and energetics of biological molecules with a polarizable force field (2013-05)
Shi, Yue, active 21st century; Ren, Pengyu

Accurate prediction of protein-ligand binding affinity is essential to computational drug discovery. Current approaches are limited by the accuracy of the underlying potential energy model that describes atomic interactions. A more rigorous physical model is critical for evaluating molecular interactions to chemical accuracy. The objective of this thesis research is to develop a polarizable force field with an accurate representation of electrostatic interactions, and to apply this model to protein-ligand recognition, ultimately solving practical problems in computer-aided drug discovery. By calculating the hydration free energies of a series of small organic molecules, an optimal protocol is established for deriving the electrostatic parameters from quantum mechanics calculations. Next, the systematic development and parameterization procedure of the AMOEBA protein force field is presented. The derived force field has undergone extensive validation in both the gas phase and the condensed phase. The last part of the thesis involves the application of AMOEBA to study protein-ligand interactions. The binding free energies of benzamidine analogs to trypsin are calculated using molecular dynamics alchemical perturbation, with encouraging accuracy. AMOEBA is also used to study the thermodynamic effect of constraining and hydrophobicity on binding energetics between phosphotyrosine (pY)-containing tripeptides and the SH2 domain of growth factor receptor-bound protein 2 (Grb2).
The underlying mechanism of an "entropic paradox" associated with ligand preorganization is explored.

Item: Molecular investigation of polypyrrole and surface recognition by affinity peptides (2011-12)
Fonner, John Michael; Ren, Pengyu; Schmidt, Christine E.; Elber, Ron; Roy, Krishnendu; Georgiou, George

Successful tissue engineering strategies in the nervous system must be carefully crafted to interact favorably with the complex biochemical signals of the native environment. To date, all chronic implants incorporating electrical conductivity degrade in performance over time as the foreign body reaction and subsequent fibrous encapsulation isolate them from the host tissue. Our goal is to develop a peptide-based interfacial biomaterial that will non-covalently coat the surface of the conducting polymer polypyrrole, allowing the implant to interact with the nervous system through both electrical and chemical cues. Starting with a candidate peptide sequence discovered through phage display, we used computational simulations of the peptide on polypyrrole to describe the bound peptide structure, explore the mechanism of binding, and suggest new, better-binding peptide sequences. After experimentally characterizing the polymer, we created a molecular mechanics model of polypyrrole using quantum mechanics calculations and compared its in silico properties to experimental observables such as density and chain packing. Using replica exchange molecular dynamics, we then modeled the behavior of affinity binding peptides on the surface of polypyrrole in explicit water and saline environments.
The relative contribution of each amino acid was estimated using distance measurements and computational alanine scanning.

Item: A systems approach to computational protein identification (2010-05)
Ramakrishnan, Smriti Rajan; Miranker, Daniel P.; Dhillon, Inderjit S.; Marcotte, Edward M.; Mooney, Raymond J.; Press, William H.

Proteomics is the science of understanding the dynamic protein content of an organism's cells (its proteome), which is one of the largest current challenges in biology. Computational proteomics is an active research area that involves in silico methods for the analysis of high-throughput protein identification data. Current methods are based on a technology called tandem mass spectrometry (MS/MS) and suffer from low coverage and accuracy, reliably identifying only 20-40% of the proteome. This dissertation addresses recall, precision, speed and scalability of computational proteomics experiments. This research goes beyond the traditional paradigm of analyzing MS/MS experiments in isolation, instead learning priors of protein presence from the joint analysis of various systems biology data sources. This integrative "systems" approach to protein identification is very effective, as demonstrated by two new methods. The first, MSNet, introduces a social model for protein identification and leverages functional dependencies from genome-scale, probabilistic, gene functional networks. The second, MSPresso, learns a gene expression prior from a joint analysis of mRNA and proteomics experiments on similar samples. These two sources of prior information result in more accurate estimates of protein presence, and increase protein recall by as much as 30% in complex samples, while also increasing precision. A comprehensive suite of benchmarking datasets is introduced for evaluation in yeast. Methods to assess statistical significance in the absence of ground truth are also introduced and employed whenever applicable.
This dissertation also describes a database indexing solution to improve the speed and scalability of protein identification experiments. The method, MSFound, customizes a metric-space database index and its associated approximate k-nearest-neighbor search algorithm with a semi-metric distance designed to match noisy spectra. MSFound achieves an order-of-magnitude speedup over traditional spectra database searches while maintaining scalability.
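
As a rough illustration of the kind of search the last abstract describes, the sketch below runs an exact, brute-force k-nearest-neighbor query over a small spectral library. It is not MSFound: the abstract does not specify the index structure or the distance, so the sparse m/z-bin representation, the cosine-based distance and the function names `cosine_distance` and `k_nearest` are all assumptions made for illustration. Cosine distance is used only as a familiar example of a distance that is symmetric and non-negative but does not satisfy the triangle inequality in general.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Hypothetical representation: a spectrum maps an integer m/z bin to an intensity.
Spectrum = Dict[int, float]


def cosine_distance(a: Spectrum, b: Spectrum) -> float:
    """Return 1 - cosine similarity of two sparse spectra (assumed distance)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty spectrum as maximally distant
    # Clamp so floating-point rounding never yields a tiny negative distance.
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


def k_nearest(query: Spectrum, library: Dict[str, Spectrum], k: int) -> List[Tuple[float, str]]:
    """Brute-force k-NN: score every library spectrum and keep the k closest."""
    scored = ((cosine_distance(query, spec), name) for name, spec in library.items())
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    # Toy library with made-up peaks; the names are placeholders.
    library = {
        "peptide_a": {100: 1.0, 200: 1.0},
        "peptide_b": {100: 1.0},
        "peptide_c": {300: 1.0},
    }
    query = {100: 1.0, 200: 0.9}
    for dist, name in k_nearest(query, library, k=2):
        print(f"{name}: {dist:.4f}")
```

A brute-force scan like this compares the query against every library entry, which is the cost that an index with an approximate search is meant to avoid.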