- M. Hasenjäger, H. Ritter, and K. Obermayer. Active Learning in Self-Organizing Maps.
In Kohonen Maps, pages 57-70. Elsevier, 1999.
(FTP Gzipped PostScript, 14 pages, 675 kb)
The self-organizing map (SOM) was originally proposed by T. Kohonen
in 1982 on biological grounds and has since become a widespread tool for
exploratory data analysis. Although introduced as a heuristic, SOMs have been
related to statistical methods in recent years, which led to a theoretical
foundation in terms of cost functions as well as to extensions to the
analysis of pairwise data, in particular of dissimilarity data. In our
contribution, we first relate SOMs to probabilistic autoencoders, re-derive
the SOM version for dissimilarity data, and review part of the
above-mentioned work. Then we turn our attention to the fact that
dissimilarity-based algorithms scale with O(D^2), where D denotes the
number of data items, and may therefore become impractical for real-world
datasets. We find that the majority of the elements of a dissimilarity matrix
are redundant and that a sparse matrix with more than 80% missing
values suffices to learn a SOM representation of low cost. We then describe a
strategy for selecting the most informative dissimilarities for a given set
of objects. We suggest selecting (and measuring) only those elements whose
knowledge maximizes the expected reduction in the SOM cost function. We find
that active data selection is computationally expensive, but may reduce the
number of necessary dissimilarities by more than a factor of two compared to
a random selection strategy. This makes active data selection a viable
alternative when the cost of actually measuring dissimilarities between data
objects is high.
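
To make the scaling argument concrete, the following minimal Python sketch counts the pairwise dissimilarities among D items and draws a random subset of them, as a random selection baseline would. It is only an illustration, not the authors' method: the function names, the choice of D = 100, and the kept fraction of 0.2 are assumptions, and the active selection criterion described in the abstract is not implemented here.

```python
import random


def num_pairs(d):
    """Number of distinct unordered pairs among d items: d * (d - 1) / 2."""
    return d * (d - 1) // 2


def sample_pairs(d, keep_fraction, seed=0):
    """Randomly keep a fraction of the unordered pairs (i, j) with i < j.

    Hypothetical helper: a random selection baseline, not the active
    strategy from the paper.
    """
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(d) for j in range(i + 1, d)]
    k = round(keep_fraction * len(pairs))
    return rng.sample(pairs, k)


# Illustrative values (assumptions): 100 items, 20% of the pairs measured.
total = num_pairs(100)
kept = sample_pairs(100, 0.2)
print(total, len(kept))
```

Doubling D roughly quadruples the number of pairs, which is why measuring only a subset matters when each dissimilarity is costly to obtain.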