Automated Prediction of Human Disease Genes

Date

2012-12

Authors

Blom, Martin

Abstract

The completion of the human genome project has led to a flood of new genetic data that has proved surprisingly hard to interpret. Network "guilt by association" (GBA) is a proven approach for identifying novel disease genes, based on the observation that similar mutational phenotypes arise from functionally related genes.

However, GBA has been shown to work poorly in genome-wide association studies (GWAS), where many genes are somewhat implicated but few are known with high certainty. In the first part of this work, I resolve this by explicitly modeling the uncertainty of the associations and incorporating the uncertainty of the seed set into the GBA framework. I demonstrate a significant boost in the power to detect validated candidate genes for Crohn’s disease and type 2 diabetes by comparing the predictions from my method to results from follow-up meta-analyses, with incorporation of the network serving to highlight the JAK-STAT pathway and associated adaptors GRB2/SHC1 in Crohn’s disease and BACH2 in type 2 diabetes. Consideration of the network during GWAS thus conveys some of the benefits of enrolling more participants in the study. More generally, I demonstrate that a functional network of human genes provides a valuable statistical framework for prioritizing candidate disease genes in GWAS-based studies.
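The idea of a GBA score with an uncertain seed set can be illustrated with a minimal sketch. It is not the method of the thesis. It assumes a simple neighbor-sum form of GBA in which each seed gene contributes in proportion to a hypothetical probability of being truly disease-associated. The function name `gba_scores`, the gene labels, the edge weights, and the seed probabilities are all made up for illustration.

```python
def gba_scores(network, seed_weights):
    """Score each gene by the summed edge weight to its seed neighbors.

    Each seed's contribution is scaled by seed_weights[seed], an assumed
    probability that the seed is truly associated with the disease.
    Genes absent from seed_weights contribute nothing.
    """
    scores = {}
    for gene, neighbors in network.items():
        scores[gene] = sum(
            edge_weight * seed_weights.get(neighbor, 0.0)
            for neighbor, edge_weight in neighbors.items()
        )
    return scores


# Toy functional network (hypothetical genes and edge weights).
network = {
    "geneA": {"geneB": 1.0, "geneC": 0.5},
    "geneB": {"geneA": 1.0, "geneC": 2.0},
    "geneC": {"geneA": 0.5, "geneB": 2.0, "geneD": 1.0},
    "geneD": {"geneC": 1.0},
}

# Soft seed set: geneA is a strong GWAS hit, geneB a weak one (assumed values).
seed_weights = {"geneA": 0.9, "geneB": 0.2}

scores = gba_scores(network, seed_weights)
ranked = sorted(scores, key=scores.get, reverse=True)
```

With a hard seed set, every seed would count equally. Here the weakly implicated `geneB` still contributes evidence to its neighbors, but less than the strongly implicated `geneA`.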

Furthermore, functional gene networks are not the only kind of information that can be used to predict gene-phenotype associations. In the second part of this thesis, I show that gene-phenotype associations in model species, including species as distantly related to humans as E. coli, are another valuable source of information that can be mined using methods similar to those used in recommender systems.
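The recommender-system analogy can be sketched as a neighbor-based collaborative filter. This is an illustrative assumption rather than the algorithm used in the thesis. Model-species phenotypes whose gene sets overlap the known genes of a human disease "vote" for their remaining genes, weighted by Jaccard similarity. The sketch assumes all genes have already been mapped to human orthologs. The function names, gene labels, and phenotype labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_genes(disease_genes, model_phenotypes):
    """Score candidate genes for a disease from model-species phenotypes.

    Each phenotype adds its similarity to the disease gene set to the score
    of every gene it contains that is not already a known disease gene.
    """
    scores = {}
    for genes in model_phenotypes.values():
        similarity = jaccard(disease_genes, genes)
        for gene in genes - disease_genes:
            scores[gene] = scores.get(gene, 0.0) + similarity
    return scores


# Known genes for a human disease (hypothetical ortholog identifiers).
disease_genes = {"g1", "g2"}

# Gene sets of model-species phenotypes (hypothetical).
model_phenotypes = {
    "phenotype_x": {"g1", "g2", "g3"},
    "phenotype_y": {"g2", "g4", "g5"},
    "phenotype_z": {"g6"},
}

scores = recommend_genes(disease_genes, model_phenotypes)
best = max(scores, key=scores.get)
```

Under these assumptions, `g3` ranks highest because it belongs to the phenotype whose gene set overlaps most with the known disease genes.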

Finally, in the last part of this thesis, I present a machine learning formalism that combines the functional gene network and model species phenotype information. I show, using cross-validation, that this approach outperforms state-of-the-art methods for gene-phenotype association prediction.

Description

text

Keywords

Bioinformatics, Systems biology
