A combined approach for successful reannotation of animal mitochondrial tRNAs based on pattern-matching and tRNA-predictor programs
Poster
Publication date:
2010
Citation:
A combined approach for successful reannotation of animal mitochondrial tRNAs based on pattern-matching and tRNA-predictor programs / R. Lupi, C. Gissi - In: Annual Meeting of the Bioinformatics Italian Society : bioinformatics and computational biology for life sciences / [edited by] M. Attimonelli, D. D'Elia, G. Pesole. - Bari : Progedit, 2010. - ISBN 978-88-6194-079-6. - pp. 196-198 (Paper presented at the 7th Annual Meeting of the Bioinformatics Italian Society : bioinformatics and computational biology for life sciences, held in Bari in 2010).
Abstract:
Motivation.
Transfer RNAs encoded by the mitochondrial genome (mtDNA) of Metazoa show strong deviations from the classical cloverleaf secondary structure, including the loss or size variation of either the D- or the T-domain. In addition, some taxa show “bizarre” tRNA structures: nematodes possess unconventional mt-tRNAs lacking either the T or D stem (1); spiders (Araneae, Chelicerata) and gall midges (Cecidomyiidae, Insecta) have many “truncated” tRNAs, i.e. tRNAs lacking a well-paired aminoacyl stem, which can also lose the T-arm (2,3); annelids belonging to the family Questidae have a full set of truncated tRNAs (4).
These peculiarities hamper the annotation of mt-tRNAs in mtDNA sequences, since conventional tRNA detection programs either perform poorly (e.g. tRNAscan-SE) or detect a significant number of false positives (e.g. Arwen) (5,6). Finally, mt-tRNA annotations are affected by numerous errors in gene name, boundaries and strand definition introduced during sequence submission to primary databases (7). In the effort to construct a curated database of complete mtDNAs of Metazoa, we have developed a specific pipeline including both pattern-matching and tRNA-predictor programs, aimed at automatically checking and rectifying the annotation of both standard and “bizarre” mt-tRNAs.
Methods.
The mt-tRNA reannotation pipeline analyses each tRNA sequence with two different programs: PatSearch, a pattern-matching program (8); and Arwen, an mt-tRNA secondary structure predictor (6). Two modules, each made of several in-house Python scripts, parse the results of PatSearch and Arwen using several empirically determined criteria.
As for PatSearch, two main tRNA patterns were specifically set for each mt-tRNA category. These patterns detect the overall tRNA secondary structure by identifying only the aminoacyl (AA) and anticodon (AC) arms: the first pattern assumes perfectly annotated tRNAs with correct limits and a single 3' discriminator base after the AA stem, while the second pattern searches for tRNAs with incorrect boundaries. In addition, patterns looking for a perfect AC arm in the correct tRNA position were defined in order to find “truncated” tRNAs, only in taxa where such unusual tRNA structures are expected to be present (Araneae, Cecidomyiidae and Questidae). All mt-tRNA patterns assume the presence of canonical anticodon sequences.
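To illustrate what a search for a "perfect AC arm" carrying a canonical anticodon might involve, the following is a minimal Python sketch. It is not PatSearch syntax and not the authors' patterns: the 5 bp stem, the 7 nt loop with the anticodon in its three central positions, the Watson-Crick-only pairing rule, and the names `is_perfect_ac_arm` and `find_ac_arms` are all assumptions made for illustration.

```python
from typing import List

# Pairing rules on the DNA alphabet (assumption: mtDNA gene sequences, U written as T).
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
GU_WOBBLE = {("G", "T"), ("T", "G")}

# Assumed canonical anticodon-arm geometry: 5 bp stem + 7 nt loop.
STEM_LEN = 5
LOOP_LEN = 7
ARM_LEN = 2 * STEM_LEN + LOOP_LEN  # 17 nt


def is_perfect_ac_arm(window: str, anticodon: str, allow_gu: bool = False) -> bool:
    """True if `window` folds into a fully paired AC stem with `anticodon` mid-loop."""
    if len(window) != ARM_LEN:
        return False
    allowed = (WATSON_CRICK | GU_WOBBLE) if allow_gu else WATSON_CRICK
    # Base i of the 5' strand must pair with the mirrored base of the 3' strand.
    for i in range(STEM_LEN):
        if (window[i], window[ARM_LEN - 1 - i]) not in allowed:
            return False
    loop = window[STEM_LEN:STEM_LEN + LOOP_LEN]
    # The anticodon occupies the three central positions of the 7 nt loop.
    return loop[2:5] == anticodon


def find_ac_arms(seq: str, anticodon: str, allow_gu: bool = False) -> List[int]:
    """Return 0-based start offsets of every window that passes is_perfect_ac_arm."""
    seq = seq.upper().replace("U", "T")
    anticodon = anticodon.upper().replace("U", "T")
    return [
        i
        for i in range(len(seq) - ARM_LEN + 1)
        if is_perfect_ac_arm(seq[i:i + ARM_LEN], anticodon, allow_gu)
    ]
```

A real pattern would also constrain where the arm sits relative to the annotated gene limits ("in the correct tRNA position"), which this sketch leaves out.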
In this pipeline, Arwen was preferred to tRNAscan-SE because it has a detection rate close to 100% for mt-tRNAs. However, given its high false positive rate, Arwen results were taken into account only for mt-tRNAs not identified by the PatSearch patterns. Arwen also has the advantage of finding tRNAs with unusual anticodons and uncommon secondary structures. The program was run with specific options and with the original tRNA boundaries extended by 5 to 45 bp on both sides of the gene, using an incremental step of 5 or 15 bp. Extending the original boundaries made it possible for the program to identify mt-tRNAs with erroneous gene limits.
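The fallback logic and the boundary extension described above can be sketched as follows. This is a hedged illustration, not the authors' scripts: the function names `extension_windows` and `resolve`, the 0-based half-open coordinates, the clamping at the sequence ends (circularity of the mtDNA is ignored), and the choice to start every schedule at 5 bp are assumptions. The predictor is passed in as a plain callable so that no real Arwen command line is implied.

```python
from typing import Callable, List, Optional, Tuple

Window = Tuple[int, int]  # (start, end), 0-based, end-exclusive (assumption)


def extension_windows(start: int, end: int, genome_len: int,
                      min_ext: int = 5, max_ext: int = 45,
                      step: int = 5) -> List[Window]:
    """Widen the annotated gene by min_ext..max_ext bp on both sides.

    Windows are clamped to [0, genome_len]; wrap-around on a circular
    genome is deliberately not handled in this sketch.
    """
    windows = []
    for ext in range(min_ext, max_ext + 1, step):
        windows.append((max(0, start - ext), min(genome_len, end + ext)))
    return windows


def resolve(start: int, end: int, genome_len: int,
            patsearch_hit: Optional[str],
            run_predictor: Callable[[Window], Optional[str]],
            step: int = 5) -> Tuple[str, Optional[str]]:
    """Prefer the pattern-matching result; fall back to the predictor.

    `patsearch_hit` and the value returned by `run_predictor` are
    hypothetical stand-ins for parsed program output.
    """
    if patsearch_hit is not None:
        return ("patsearch", patsearch_hit)
    # Try progressively wider windows until the predictor reports a tRNA.
    for window in extension_windows(start, end, genome_len, step=step):
        hit = run_predictor(window)
        if hit is not None:
            return ("predictor", hit)
    return ("unresolved", None)
```

Consulting the predictor only when the pattern search fails mirrors the rationale given in the text: the predictor is sensitive but prone to false positives, so its output is used only where the stricter patterns find nothing.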
Results.
A total dataset of 42,617 mt-tRNA sequences collected in the MitoZoa database v2.0 (9) was analyzed by our pipeline: 95.9% of the mt-tRNAs were identified/corrected by the PatSearch module; 3.8% were identified/corrected only by the Arwen module; 0.3% were not identified by the end of the whole pipeline and correspond mainly to erroneously annotated tRNAs. Thus, our pipeline represents a reliable tool for improving the annotation quality of metazoan mt-tRNAs in both complete and partial mtDNA sequences, since it was able to resolve (i.e. correct or validate) the annotation of >99% of the analyzed sequences, taking into account either taxon-specific and seco
IRIS type:
03 - Contribution in volume
Authors:
R. Lupi, C. Gissi
Link to full record:
Book title:
Annual Meeting of the Bioinformatics Italian Society : bioinformatics and computational biology for life sciences