Small sample feature selection

Sima, Chao

dc.contributor	Dougherty, Edward R.
dc.creator	Sima, Chao
dc.date.accessioned	2007-09-17T19:33:11Z
dc.date.accessioned	2017-04-07T19:53:19Z
dc.date.available	2007-09-17T19:33:11Z
dc.date.available	2017-04-07T19:53:19Z
dc.date.created	2003-05
dc.date.issued	2007-09-17
dc.description.abstract	High-throughput technologies for rapid measurement of vast numbers of biological variables offer the potential for highly discriminatory diagnosis and prognosis; however, high dimensionality together with small samples creates the need for feature selection while at the same time making feature-selection algorithms less reliable. Feature selection is required to avoid overfitting, and the combinatorial nature of the problem demands a suboptimal feature-selection algorithm. In this dissertation, we show through three different approaches that feature selection is problematic in small-sample settings. First, we examined the feature-ranking performance of several kinds of error estimators for different classification rules, considering all feature subsets and using two measures of performance. The results show that the rankings are strongly affected by inaccurate error estimation. Second, because enumerating all feature subsets is computationally infeasible in practice, a suboptimal feature-selection algorithm is often employed to find, from a large set of potential features, a small subset with which to classify the samples. If such an algorithm requires error estimation, then the impact of error estimation can be greater than that of the choice of algorithm. Lastly, we took a regression approach, comparing the classification errors of the optimal feature sets with the errors of the feature sets found by feature-selection algorithms. Our study shows that feature selection is unlikely to yield a feature set whose error is close to that of the optimal feature set, and that the inability to find a good feature set should not lead to the conclusion that good feature sets do not exist.
dc.identifier.uri	http://hdl.handle.net/1969.1/5796
dc.language.iso	en_US
dc.publisher	Texas A&M University
dc.subject	feature selection
dc.subject	classification
dc.subject	microarray
dc.subject	small sample
dc.title	Small sample feature selection
dc.type	Book
dc.type	Thesis

Collections

Texas A&M University at College Station
