- S. Hochreiter and K. Obermayer. Gene Selection for Microarray Data.
In B. Schölkopf, K. Tsuda, and J.-P. Vert, editors, Kernel Methods in
Computational Biology, pages 319-356. MIT Press, Cambridge,
Massachusetts, 2004.
(FTP Gzipped PostScript, 52 pages, 374 kb)
In this chapter we discuss methods for gene selection on data
obtained from the microarray technique. Gene selection is very important for
microarray data, (a) as a preprocessing step to improve the performance of
classifiers or other predictors for sample attributes, (b) in order to
discover relevant genes, that is, genes that show specific expression
patterns across the given set of samples, and (c) to save costs, for example
if the microarray technique is used for diagnostic purposes. We introduce a
new feature selection method which is based on the support vector machine
technique. The new feature selection method extracts a sparse set of genes,
whose expression levels are important for predicting the class of a sample
(for example "positive" vs. "negative" therapy outcome for tumor samples
from patients). For this purpose the support vector technique is used in a
novel way: instead of constructing a classifier from a minimal set of most
informative samples (the so-called support vectors), the classifier is
constructed using a minimal set of most informative features. In contrast to
previously proposed methods, however, features rather than samples now
formally assume the role of support vectors. We introduce a protocol for
preprocessing, feature selection, and evaluation of microarray data. Using
this protocol we demonstrate the superior performance of our feature
selection method on data sets obtained from patients with certain types of
cancer (brain tumor, lymphoma, and breast cancer), where the outcome of
chemotherapy or radiation therapy must be predicted based on the gene expression
profile. The feature selection method extracts genes (the so-called support
genes) which are correlated with therapy outcome. For classifiers based on
these genes, generalization performance is improved compared to previously
proposed methods.