Simmons, Mark P. [1], Richardson, Dale [1], Reddy, Anireddy S. N. [1].

Incorporation of Gap Characters and Lineage-Specific Regions into Phylogenetic Analyses of Gene Families from Divergent Clades: An Example from the Kinesin Superfamily across Eukaryotes.

The kinesin superfamily across eukaryotes was used to examine how two choices affect gene-tree inference for the gene family as a whole: incorporating gap characters scored from conserved regions shared by all members of a gene family (the motor domain in kinesins), and incorporating amino acid and gap characters scored from lineage-specific regions. We addressed these two questions in the context of two densities of sequence sampling (100 vs. 525 sequences), four alignment programs (ClustalX vs. DIALIGN-T vs. MAFFT vs. MUSCLE), and two methods of tree construction (Bayesian vs. parsimony). Our results are useful for developing guidelines as to which alignment program should be used, whether gap characters should be incorporated, and whether lineage-specific regions should be included in gene-tree inference for gene families sampled from divergent taxa. Taken together, our findings suggest the following. First, gap characters should be incorporated into gene-tree inference. However, the most appropriate way of treating these characters in parametric gene-tree inference methods remains unclear. Second, gene regions that are not conserved among all or most sampled sequences should not be automatically discarded without evaluating the potential phylogenetic signal contained in their gap and/or sequence characters. Third, among the four alignment programs evaluated using their default alignment parameters, ClustalX may be expected to output alignments that yield the greatest gene-tree resolution and support. Yet this high resolution and support should be regarded as optimistic, rather than conservative, estimates. Fourth, the same conclusion regarding resolution and support holds for Bayesian gene-tree analyses relative to parsimony-jackknife gene-tree analyses.
We suggest that a more conservative approach, such as aligning the sequences using DIALIGN-T or MAFFT, analyzing the appropriate characters using maximum likelihood and/or parsimony, and assessing branch support using the bootstrap or the jackknife, is most appropriate for inferring gene trees of divergent gene families.
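The abstract does not specify how gap characters were scored. As a hedged illustration only, the sketch below shows one simplified way to turn the gaps in an amino acid alignment into binary characters, loosely modeled on simple indel coding: each distinct gap (a maximal run of `-` with a particular start and end column) becomes one character, scored `1` if a sequence has exactly that gap, `?` if that region falls inside a longer, different gap in the sequence, and `0` otherwise. The function names `find_gaps` and `code_gaps`, the example sequences, and the treatment of nested gaps are assumptions for illustration, not the authors' procedure.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a gap is (start, end) in alignment columns, end exclusive.
Gap = Tuple[int, int]


def find_gaps(seq: str) -> List[Gap]:
    """Return the maximal runs of '-' in one aligned sequence as (start, end) pairs."""
    gaps: List[Gap] = []
    start: Optional[int] = None
    for i, ch in enumerate(seq):
        if ch == "-" and start is None:
            start = i  # a new gap opens here
        elif ch != "-" and start is not None:
            gaps.append((start, i))  # the gap closes before column i
            start = None
    if start is not None:
        gaps.append((start, len(seq)))  # the gap runs to the end of the alignment
    return gaps


def code_gaps(alignment: Dict[str, str]) -> Tuple[List[Gap], Dict[str, str]]:
    """Score each distinct gap in the alignment as one binary character (simplified scheme).

    '1' = the sequence has exactly this gap
    '?' = the gap region lies inside a longer, different gap in that sequence
    '0' = otherwise
    """
    per_seq = {name: find_gaps(seq) for name, seq in alignment.items()}
    # One character per distinct (start, end) gap observed in any sequence.
    characters = sorted({g for gaps in per_seq.values() for g in gaps})
    matrix: Dict[str, str] = {}
    for name, gaps in per_seq.items():
        row = []
        for start, end in characters:
            if (start, end) in gaps:
                row.append("1")
            elif any(s <= start and end <= e for s, e in gaps):
                row.append("?")  # nested inside a larger gap: state unknown
            else:
                row.append("0")
        matrix[name] = "".join(row)
    return characters, matrix


if __name__ == "__main__":
    toy = {"seqA": "MKV--LLE", "seqB": "MK----LE", "seqC": "MKVAALLE"}
    chars, gap_matrix = code_gaps(toy)
    print(chars)       # [(2, 6), (3, 5)]
    print(gap_matrix)  # {'seqA': '01', 'seqB': '1?', 'seqC': '00'}
```

In a setup like this, the resulting `0`/`1`/`?` columns would be appended to the amino acid matrix so that parsimony (or a binary-character model in a parametric analysis) can use them alongside the residues.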

1 - Colorado State University, Department of Biology and Program in Molecular Plant Biology, Fort Collins, Colorado, 80523-1878, USA

alignment methods
character sampling
gap characters
indel coding
phylogenetic signal

Presentation Type: Oral Paper: Papers for Sections
Session: 24-2
Location: Auditorium/Laxson
Date: Monday, July 31st, 2006
Time: 2:00 PM
Abstract ID: 85

