The CATH database of protein structures contains approximately 18,000
domains organized according to their (C)lass, (A)rchitecture, (T)opology
and (H)omologous superfamily [1]. Relationships between evolutionarily
related structures (homologues) within the database have been used to test
the sensitivity of various sequence search methods in order to identify
relatives in GenBank and other sequence databases [2]. Subsequent
application of the most sensitive and efficient algorithms, gapped BLAST
and the profile-based method Position-Specific Iterated Basic Local
Alignment Search Tool (PSI-BLAST) [3], could be used to assign structural
data to between 22% and 36% of microbial genomes in order to improve
functional annotation and enhance understanding of biological mechanism.
However, on a cautionary note, an analysis of functional conservation
within fold groups and homologous superfamilies in the CATH database
revealed that whilst function was conserved in nearly 55% of enzyme
families, it had diverged considerably in some highly populated families.
In these families, functional properties should be inherited far more
cautiously and the probable effects of substitutions in key functional
residues carefully assessed.