A novel method for finding small, highly discriminant gene sets
In a typical microarray classification problem there are many genes, on the order of thousands, and few samples, on the order of tens. This imbalance necessitates a drastic reduction of the feature space before classification can take place. While much time and effort has gone into evaluating and comparing the performance of different classifiers, less attention has been paid to the problem of efficient feature space reduction.
The microarray classification literature contains several widely used heuristic feature reduction algorithms that do find small feature subsets over which to classify. These methods work in a broad sense, but we find that they often require too much computation, return overly large gene sets, or do not generalize well. We therefore believe that a systematic study of feature reduction, as it relates to microarray classification, is in order.
In this thesis we review current feature space reduction algorithms and propose a new, mixed-model algorithm. This mixed-model algorithm combines the best aspects of filter algorithms with the best aspects of wrapper algorithms to find very small yet highly discriminant gene sets. We also discuss methods for evaluating alternative, ambiguous gene sets. Applying our new mixed-model algorithm to several published datasets, we find that it outperforms current gene-finding methods.
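The abstract does not specify how the mixed-model algorithm works, so the following is only a generic illustration of how a filter stage and a wrapper stage can be combined, not the thesis's method. It assumes two classes labelled 0 and 1 (each with at least two samples), uses an absolute t-statistic as the filter score, and uses greedy forward selection scored by the leave-one-out accuracy of a nearest-centroid classifier as the wrapper. The function names (`t_score`, `loo_accuracy`, `hybrid_select`) and parameters (`n_filter`, `max_genes`) are hypothetical.

```python
import math
from statistics import mean, pstdev

# Illustrative filter + wrapper hybrid; NOT the thesis's mixed-model algorithm.
# Assumes binary labels 0/1 and at least two samples per class.

def t_score(values, labels):
    """Filter score: absolute t-like statistic for one gene between the two classes."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    denom = math.sqrt(pstdev(a) ** 2 / len(a) + pstdev(b) ** 2 / len(b)) or 1e-12
    return abs(mean(a) - mean(b)) / denom

def loo_accuracy(X, labels, genes):
    """Wrapper score: leave-one-out accuracy of a nearest-centroid classifier on `genes`."""
    n = len(X)
    correct = 0
    for i in range(n):
        # Build class centroids from every sample except the held-out one.
        centroids = {}
        for cls in (0, 1):
            rows = [X[j] for j in range(n) if j != i and labels[j] == cls]
            centroids[cls] = [mean(r[g] for r in rows) for g in genes]
        x = [X[i][g] for g in genes]
        dist = {cls: sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                for cls, c in centroids.items()}
        correct += min(dist, key=dist.get) == labels[i]
    return correct / n

def hybrid_select(X, labels, n_filter=20, max_genes=5):
    """Filter to the top `n_filter` genes, then grow a gene set greedily (wrapper)."""
    n_genes = len(X[0])
    scores = [t_score([row[g] for row in X], labels) for g in range(n_genes)]
    candidates = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:n_filter]
    selected, best_acc = [], 0.0
    while len(selected) < max_genes:
        best_gene, best_new = None, best_acc
        for g in candidates:
            if g in selected:
                continue
            acc = loo_accuracy(X, labels, selected + [g])
            if acc > best_new:  # only accept strict improvements
                best_gene, best_new = g, acc
        if best_gene is None:
            break  # no remaining candidate improves accuracy
        selected.append(best_gene)
        best_acc = best_new
        if best_acc == 1.0:
            break  # perfect LOO accuracy: stop with the smallest set found
    return selected, best_acc
```

The cheap filter prunes thousands of genes down to a short candidate list, so the expensive wrapper (one leave-one-out evaluation per candidate per step) only runs on that list. Stopping as soon as accuracy stops improving biases the search toward small gene sets.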