Systematics Section / ASPT
Müller, Kai; Wall, P. Kerr; Cui, Liying; Leebens-Mack, Jim; Makalowska, Izabela; dePamphilis, Claude W.
AlignMate - A database and pipeline for comparative analysis of chloroplast genes and genomes.
The number of sequenced organelle genomes is increasing exponentially, providing rapidly growing opportunities for whole-genome phylogenetic and evolutionary analyses. This growth has also created a huge practical challenge for researchers attempting to assemble genome data into accurately aligned, usable datasets. Here we present a pipeline that fully integrates the crucial steps: i) easy identification and compilation of target data, ii) sequence alignment, iii) automated assessment of alignment quality, iv) coding of structural mutation events, and v) phylogenetic analysis. The pipeline is integrated with other informatic tools previously developed for ChloroplastDB (http://chloroplast.cbio.psu.edu/), an interactive, web-based database of fully sequenced plastid genomes that includes pages for genomic, protein, DNA, and RNA sequences, RNA-editing sites, unified annotations, and homologous protein sets generated using TribeMCL. Alignments are built from user-specified gene and taxon sets and include amino acid alignments, nucleotide alignments, and nucleotide datasets forced onto protein alignments. Taxon-specific low-scoring alignment sections can be identified under the control of a set of parameters, visualized, and optionally masked prior to further analysis. Insertion/deletion events are coded using various methods that take alignment quality into account. The user can apply any subset of these functions to make custom batch files for phylogenetic analyses. We report initial comparisons between analyses that identify low-scoring alignment sections and/or code indels and standard analyses that exclude large portions of the data (e.g., all gapped positions) and/or retain questionable alignment sections inside or outside the gapped regions. In addition to promoting phylogenetic analyses of complete plastid genomes, this alignment pipeline is helpful in the analysis of any large dataset.
AlignMate greatly facilitates high-throughput comparative analyses of changes in gene content, gene regulation, structure-function relationships, and context-dependent substitution processes.
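To illustrate what forcing a nucleotide dataset onto a protein alignment can look like, the following is a minimal Python sketch and not AlignMate's actual implementation. It assumes each coding sequence is in frame and colinear with its protein, and that gaps are written as "-". The function name thread_codons, the taxon names, and the toy sequences are hypothetical.

```python
def thread_codons(aligned_protein: str, nucleotides: str, gap: str = "-") -> str:
    """Map an unaligned coding sequence onto its aligned protein sequence.

    Each residue in the protein alignment consumes one codon (three
    nucleotides); each gap column becomes three gap characters.
    Assumes the coding sequence is in frame and colinear with the protein.
    """
    residues = sum(1 for c in aligned_protein if c != gap)
    if len(nucleotides) < 3 * residues:
        raise ValueError("coding sequence is too short for the aligned protein")

    columns = []
    pos = 0  # index of the next unused nucleotide
    for c in aligned_protein:
        if c == gap:
            columns.append(gap * 3)
        else:
            columns.append(nucleotides[pos:pos + 3])
            pos += 3
    # Nucleotides left over after the last residue (e.g. a stop codon) are dropped.
    return "".join(columns)


# Toy data (hypothetical): taxonA lacks the third residue found in taxonB.
protein_alignment = {"taxonA": "MK-LV", "taxonB": "MKTLV"}
coding_sequences = {"taxonA": "ATGAAACTTGTT", "taxonB": "ATGAAAACTCTTGTT"}

codon_alignment = {
    name: thread_codons(protein_alignment[name], coding_sequences[name])
    for name in protein_alignment
}

for name, seq in codon_alignment.items():
    print(name, seq)
```

Because gaps are placed in whole-codon units, the resulting nucleotide alignment keeps the reading frame intact, which is the usual motivation for aligning at the protein level first.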
1 - Pennsylvania State University, Department of Biology, Institute of Molecular Evolutionary Genetics, and The Huck Institutes of the Life Sciences, University Park, Pennsylvania, 16802, USA
2 - University of Georgia, Department of Plant Biology, Athens, Georgia, 30602, USA
3 - Pennsylvania State University, Center for Computational Genomics, The Huck Institutes of the Life Sciences, University Park, Pennsylvania, 16802, USA
Presentation Type: Oral Paper: Papers for Sections
Date: Monday, July 31st, 2006
Time: 2:45 PM