Rate matrices for analyzing large families of protein sequences

Citation
C. Devauchelle et al., Rate matrices for analyzing large families of protein sequences, J COMPUT BI, 8(4), 2001, pp. 381-399
Number of citations
19
Language
English
Article type
Article
Subject categories
Biochemistry & Biophysics
Journal title
JOURNAL OF COMPUTATIONAL BIOLOGY
Journal ISSN
1066-5277
Volume
8
Issue
4
Year of publication
2001
Pages
381 - 399
Database
ISI
SICI code
1066-5277(2001)8:4<381:RMFALF>2.0.ZU;2-9
Abstract
We propose and study a new approach for the analysis of families of protein sequences. This method is related to the LogDet distances used in phylogenetic reconstructions; it can be viewed as an attempt to embed these distances into a multidimensional framework. The proposed method starts by associating a Markov matrix with each pairwise alignment deduced from a given multiple alignment. The central objects under consideration here are matrix-valued logarithms L of these Markov matrices, which exist under conditions that are compatible with fairly large divergence between the sequences. These logarithms allow us to compare data from a family of aligned proteins with simple models (in particular, continuous reversible Markov models) and to test the adequacy of such models. If one neglects fluctuations arising from the finite length of sequences, any continuous reversible Markov model with a single rate matrix Q over an arbitrary tree predicts that all the observed matrices L are multiples of Q. Our method exploits this fact without relying on any tree estimation. We test this prediction on a family of proteins encoded by the mitochondrial genome of 26 multicellular animals, which include vertebrates, arthropods, echinoderms, molluscs, and nematodes. A principal component analysis of the observed matrices L shows that a single rate model can be used as a rough approximation to the data, but that systematic deviations from any such model are unmistakable and related to the evolutionary history of the species under consideration.
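The proportionality prediction can be illustrated numerically. For a stationary reversible model with rate matrix Q, the Markov matrix linking two leaves separated by total branch length d is M = exp(dQ), so L = log M = dQ whatever the tree topology. The Python sketch that follows is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the Markov matrix is built from a pairwise alignment, so here it is assumed to be the row-normalized matrix of symmetrized pair counts. All function names (random_reversible_rate_matrix, simulate_pair, log_markov_matrix, pca_explained_variance) are hypothetical. The alignments are simulated from one hypothetical 20-state rate matrix; they are not taken from the mitochondrial data set.

```python
import numpy as np

K = 20  # size of the amino-acid alphabet


def random_reversible_rate_matrix(rng, k=K):
    """Hypothetical reversible rate matrix: q_ij = s_ij * pi_j with s symmetric."""
    pi = rng.dirichlet(np.full(k, 10.0))          # stationary frequencies
    s = rng.uniform(0.5, 1.5, size=(k, k))
    s = (s + s.T) / 2.0                           # symmetric exchangeabilities
    q = s * pi[None, :]                           # pi_i * q_ij is symmetric => reversible
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))           # each row of a rate matrix sums to 0
    q /= -(pi * np.diag(q)).sum()                 # scale to one substitution per unit time
    return q, pi


def transition_matrix(q, pi, d):
    """exp(d*Q) for reversible Q, via the symmetric matrix Pi^(1/2) Q Pi^(-1/2)."""
    r = np.sqrt(pi)
    b = r[:, None] * q / r[None, :]
    mu, u = np.linalg.eigh((b + b.T) / 2.0)       # symmetrize away rounding error
    e = (u * np.exp(d * mu)) @ u.T
    return e * r[None, :] / r[:, None]


def simulate_pair(rng, q, pi, d, n):
    """Hypothetical gap-free pairwise alignment of length n at divergence d."""
    p = np.clip(transition_matrix(q, pi, d), 0.0, None)
    p /= p.sum(axis=1, keepdims=True)
    a = rng.choice(len(pi), size=n, p=pi)         # first sequence drawn from pi
    b = np.empty_like(a)
    for x in range(len(pi)):                      # evolve each site for time d
        sites = a == x
        b[sites] = rng.choice(len(pi), size=sites.sum(), p=p[x])
    return a, b


def log_markov_matrix(a, b, k=K):
    """L = log M, with M = D^-1 F built from symmetrized pair counts (an assumption)."""
    counts = np.zeros((k, k))
    np.add.at(counts, (a, b), 1.0)
    f = (counts + counts.T) / (2.0 * len(a))      # joint frequencies F
    r = np.sqrt(f.sum(axis=1))                    # D^(1/2), D = diag of row sums of F
    s = f / (r[:, None] * r[None, :])             # D^(-1/2) F D^(-1/2) is symmetric
    lam, v = np.linalg.eigh(s)
    if lam.min() <= 0.0:
        raise ValueError("no real logarithm: sequences too divergent")
    log_s = (v * np.log(lam)) @ v.T
    return log_s * r[None, :] / r[:, None]        # M = D^(-1/2) S D^(1/2)


def pca_explained_variance(ls):
    """Fraction of variance carried by each principal component of the flattened L's."""
    x = np.array([lm.ravel() for lm in ls])
    x -= x.mean(axis=0)
    sv = np.linalg.svd(x, compute_uv=False)
    return sv**2 / (sv**2).sum()


rng = np.random.default_rng(0)
q, pi = random_reversible_rate_matrix(rng)
divergences = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
ls = [log_markov_matrix(*simulate_pair(rng, q, pi, d, 100_000)) for d in divergences]

for d, lm in zip(divergences, ls):
    rel = np.linalg.norm(lm / d - q) / np.linalg.norm(q)
    print(f"d = {d:.1f}: ||L/d - Q|| / ||Q|| = {rel:.3f}")
print("PCA explained variance:", np.round(pca_explained_variance(ls), 4))
```

Because every simulated pair shares the same Q, each ratio L/d should be close to Q and the first principal component should carry nearly all of the variance; systematic departures of the kind the abstract reports would instead show up as further components with non-negligible variance. Symmetrizing the counts is a design choice that matches the reversibility assumption and guarantees real eigenvalues, and the positive-eigenvalue check stands in for the existence condition on the logarithm.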