Using Prior Knowledge in the Design of Classifiers
Small samples are commonplace in genomic/proteomic classification, resulting in inadequate classifier design and poor error estimation. A promising way to alleviate the problem is to use prior knowledge, and a great deal of biological information is encoded in signaling pathways. This dissertation addresses the problem of classifier design that utilizes both the available prior knowledge and the training data. Specifically, it employs the concrete notion of regularization from signal processing and statistics to combine prior knowledge with various data-based or data-ignorant criteria.
In the first part, we address optimal discrete classification where prior knowledge is restricted to an uncertainty class of feature distributions, absent a prior distribution on that class. This problem arises directly in biological classification using pathway information: labeling future observations obtained in the steady state by utilizing both the available prior knowledge and the training data. We propose an optimization-based paradigm for utilizing prior knowledge to design better-performing classifiers when sample sizes are limited. We derive approximate expressions for the first and second moments of the true error rate of the proposed classifier under two widely used models for the uncertainty class: ε-contamination and p-point classes. We examine the proposed paradigm on networks containing NF-κB pathways, where it shows significant improvement over purely data-driven methods.
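To make the ε-contamination model concrete: each member of the uncertainty class is a mixture (1 − ε)·P₀ + ε·Q of a nominal distribution P₀ and an arbitrary contaminant Q. The sketch below is a minimal illustration, not the dissertation's construction; the nominal N(0, 1) and contaminating N(0, 3²) components are assumed for demonstration only.

```python
import numpy as np

def sample_contaminated(n, eps, rng):
    """Draw n points from one member of an epsilon-contamination class:
    (1 - eps) * P0 + eps * Q, where P0 is a nominal N(0, 1) and Q is
    an (assumed, illustrative) contaminating N(0, 3^2) distribution."""
    mask = rng.random(n) < eps                   # which points come from Q
    x = rng.normal(0.0, 1.0, n)                  # nominal component P0
    x[mask] = rng.normal(0.0, 3.0, mask.sum())   # contaminant Q
    return x

rng = np.random.default_rng(0)
x = sample_contaminated(10_000, eps=0.1, rng=rng)
# mixture variance (1-eps)*1 + eps*9 = 1.8 exceeds the nominal 1.0
print(x.var())
```

A classifier designed only against P₀ can degrade badly on other members of this class, which is why the moments of the true error are analyzed over the whole uncertainty class.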
In the second part of this dissertation, we focus on Bayesian classification. Although the problem of designing the optimal Bayesian classifier under known prior distributions has been fully addressed, a critical issue remains: how to incorporate biological knowledge into the prior distribution. In genomic/proteomic classification, the most common kind of knowledge takes the form of signaling pathways. Thus, it behooves us to find methods of transforming pathway knowledge into knowledge of the feature-label distribution governing the classification problem. To incorporate the available prior knowledge, the interactions in the pathways are first quantified from a Bayesian perspective. Then we address the problem of prior probability construction by proposing a series of optimization paradigms that utilize the incomplete prior information contained in pathways, both topological and regulatory. The optimization paradigms are derived for both the Gaussian case with a Normal-inverse-Wishart prior and discrete classification with a Dirichlet prior.
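The mechanics of combining a pathway-informed prior with training data can be illustrated by the standard conjugate Normal-inverse-Wishart update. This is a generic sketch of that update, not the dissertation's optimization paradigm; the hyperparameter values in the usage example are assumptions for illustration.

```python
import numpy as np

def niw_update(mu0, kappa0, nu0, Psi0, X):
    """Standard conjugate Normal-inverse-Wishart posterior update.
    (mu0, kappa0, nu0, Psi0) is the prior (which could encode pathway
    knowledge); X is an (n x d) matrix of training samples."""
    n, d = X.shape
    xbar = X.mean(axis=0)
    S = (X - xbar).T @ (X - xbar)                 # scatter matrix
    kappan = kappa0 + n
    nun = nu0 + n
    mun = (kappa0 * mu0 + n * xbar) / kappan      # shrinks toward mu0
    diff = (xbar - mu0).reshape(-1, 1)
    Psin = Psi0 + S + (kappa0 * n / kappan) * (diff @ diff.T)
    return mun, kappan, nun, Psin

# illustrative prior and a small sample far from the prior mean
mu0 = np.zeros(2)
kappa0, nu0 = 10.0, 5.0
Psi0 = np.eye(2)
X = np.array([[2.0, 2.0], [3.0, 1.0], [2.5, 1.5]])
mun, kappan, nun, Psin = niw_update(mu0, kappa0, nu0, Psi0, X)
# with kappa0 = 10 and only n = 3 samples, mun stays close to mu0
```

The small-sample regime is exactly where the prior dominates, which is why the quality of the pathway-derived prior matters so much for classifier performance.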
Simulation results, using both synthetic and real pathways, show that the proposed paradigms yield improved classifiers that outperform traditional classifiers which use only training data.