A hierarchical unsupervised growing neural network for clustering gene expression patterns

Citation
J. Herrero et al., A hierarchical unsupervised growing neural network for clustering gene expression patterns, BIOINFORMAT, 17(2), 2001, pp. 126-136
Number of citations
30
Language
English
Article type
Article
Subject categories
Multidisciplinary
Journal title
BIOINFORMATICS
ISSN journal
1367-4803
Volume
17
Issue
2
Year of publication
2001
Pages
126 - 136
Database
ISI
SICI code
1367-4803(200102)17:2<126:AHUGNN>2.0.ZU;2-Y
Abstract
Motivation: We describe a new approach to the analysis of gene expression data coming from DNA array experiments, using an unsupervised neural network. DNA array technologies allow monitoring thousands of genes rapidly and efficiently. One of the interests of these studies is the search for correlated gene expression patterns, and this is usually achieved by clustering them. The Self-Organising Tree Algorithm (SOTA) (Dopazo, J. and Carazo, J.M. (1997) J. Mol. Evol., 44, 226-233) is a neural network that grows adopting the topology of a binary tree. The result of the algorithm is a hierarchical cluster obtained with the accuracy and robustness of a neural network.
Results: SOTA clustering confers several advantages over classical hierarchical clustering methods. SOTA is a divisive method: the clustering process is performed from top to bottom, i.e. the highest hierarchical levels are resolved before going to the details of the lowest levels. The growing can be stopped at the desired hierarchical level. Moreover, a criterion to stop the growing of the tree, based on the approximate distribution of probability obtained by randomisation of the original data set, is provided. By means of this criterion, a statistical support for the definition of clusters is proposed. In addition, obtaining average gene expression patterns is a built-in feature of the algorithm. Different neurons defining the different hierarchical levels represent the averages of the gene expression patterns contained in the clusters. Since SOTA runtimes are approximately linear with the number of items to be classified, it is especially suitable for dealing with huge amounts of data. The method proposed is very general and applies to any data, provided that they can be coded as a series of numbers and that a computable measure of similarity between data items can be used.
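The top-down growth described in the abstract can be illustrated with a minimal sketch. It is not the authors' SOTA implementation: the neuron training step is replaced by a plain 2-means split, and the names `grow_tree`, `split`, `heterogeneity`, `threshold` and `max_leaves` are hypothetical. The `threshold` argument is an ordinary parameter standing in for the value that the paper derives from randomised data.

```python
from statistics import fmean


def distance(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def centroid(profiles):
    """Component-wise average profile of a cluster."""
    return [fmean(column) for column in zip(*profiles)]


def heterogeneity(profiles):
    """Mean distance of the cluster members to their average profile."""
    centre = centroid(profiles)
    return fmean([distance(p, centre) for p in profiles])


def split(profiles, iterations=20):
    """Divide one cluster in two with a simple 2-means pass.

    Assumption: this stands in for the training of two daughter neurons.
    """
    centre = centroid(profiles)
    a = max(profiles, key=lambda p: distance(p, centre))  # first seed
    b = max(profiles, key=lambda p: distance(p, a))       # second seed
    left, right = [], []
    for _ in range(iterations):
        new_left = [p for p in profiles if distance(p, a) <= distance(p, b)]
        new_right = [p for p in profiles if distance(p, a) > distance(p, b)]
        if not new_left or not new_right:
            break  # keep the last valid partition
        left, right = new_left, new_right
        a, b = centroid(left), centroid(right)
    return left, right


def grow_tree(profiles, threshold, max_leaves=16):
    """Grow clusters top-down, always dividing the most heterogeneous leaf."""
    leaves = [list(profiles)]
    while len(leaves) < max_leaves:
        worst = max(range(len(leaves)), key=lambda i: heterogeneity(leaves[i]))
        if heterogeneity(leaves[worst]) <= threshold:
            break  # every leaf is homogeneous enough: stop growing
        left, right = split(leaves[worst])
        if not left or not right:
            break  # the leaf could not be divided any further
        leaves[worst:worst + 1] = [left, right]
    return leaves
```

Calling `grow_tree(profiles, threshold)` on a list of numeric profiles returns the leaf clusters, and `centroid(leaf)` gives the average pattern of each one, mirroring the built-in averaging mentioned in the abstract. Lowering `max_leaves` stops the growth at a coarser hierarchical level.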