Exploiting Instance Similarity in Applications of Deep Learning in Bioinformatics
Abstract
Predicting the state or function of a biological organism from gross observation is a recurrent challenge in biology. Because biological systems are highly complex, identifying rules and principles from observations is extremely hard. In recent years, machine learning has been employed to deal with the complexity of biological data. Machine learning techniques often rely on hand-crafted features. However, designing and leveraging such features is not always straightforward. Over the last decade, deep learning has emerged as a new area in machine learning and has revolutionized our understanding of biology. Unfortunately, the sparsity of training data in some specific domains has restricted the use of deep learning. One way to overcome this limitation is to utilize instance similarities. In many applications, however, incorporating similarity information into deep learning models is not straightforward. Utilizing instance similarity in biology has been the main motivation of this research. In this dissertation, we address the problem of incorporating instance similarity into deep learning in two main applications: a) drug-target interaction prediction, and b) learning morphological similarity in histopathology images. In the first application, we introduce a deep learning framework that predicts drug-target interactions by learning topological features from a bipartite drug-target interaction graph. Furthermore, we exploit drug and protein similarity information by extending our framework to learn from a semi-bipartite graph. We show that our approach achieves state-of-the-art performance in predicting new drug-target interactions. In the second application, we propose a novel deep metric learning methodology that learns morphological similarities in histopathology images. Thanks to a new task and metric learning design, our approach does not require any labeled data.
We demonstrate that our approach learns more general-purpose features than discriminative approaches and therefore performs better in downstream tasks. We show that our framework can be used as a backbone in a few tasks, namely image retrieval, identifying morphological and biological correlations, and transfer learning. Furthermore, we show that our approach can also be used to reduce the batch effect in the histopathology domain. We also show that our approach’s strength and unsupervised nature make it a powerful tool for biological exploration and discovery.