Journal Article
Research Support, U.S. Gov't, Non-P.H.S.
Add like
Add dislike
Add to saved papers

Incorporation of splice site probability models for non-canonical introns improves gene structure prediction in plants.

Bioinformatics 2005 November 2
MOTIVATION: The vast majority of introns in protein-coding genes of higher eukaryotes have a GT dinucleotide at their 5'-terminus and an AG dinucleotide at their 3' end. About 1-2% of introns are non-canonical, with the most abundant subtype of non-canonical introns being characterized by GC and AG dinucleotides at their 5'- and 3'-termini, respectively. Most current gene prediction software, whether based on ab initio or spliced alignment approaches, does not include explicit models for non-canonical introns or may exclude their prediction altogether. With present amounts of genome and transcript data, it is now possible to apply statistical methodology to non-canonical splice site prediction. We pursued one such approach and describe the training and implementation of GC-donor splice site models for Arabidopsis and rice, with the goal of exploring whether specific modeling of non-canonical introns can enhance gene structure prediction accuracy.

RESULTS: Our results indicate that the incorporation of non-canonical splice site models yields dramatic improvements in annotating genes containing GC-AG and AT-AC non-canonical introns. Comparison of models shows differences between monocot and dicot species, but also suggests GC intron-specific biases independent of taxonomic clade. We also present evidence that GC-AG introns occur preferentially in genes with atypically high exon counts.

AVAILABILITY: Source code for the updated versions of GeneSeqer and SplicePredictor (distributed with the GeneSeqer code) isavailable at https://bioinformatics.iastate.edu/bioinformatics2go/gs/download.html. Web servers for Arabidopsis, rice and other plant species are accessible at https://www.plantgdb.org/PlantGDB-cgi/GeneSeqer/AtGDBgs.cgi, https://www.plantgdb.org/PlantGDB-cgi/GeneSeqer/OsGDBgs.cgi and https://www.plantgdb.org/PlantGDB-cgi/GeneSeqer/PlantGDBgs.cgi, respectively. A SplicePredictor web server is available at https://bioinformatics.iastate.edu/cgi-bin/sp.cgi. Software to generate training data and parameterizations for Bayesian splice site models is available at https://gremlin1.gdcb.iastate.edu/~volker/SB05B/BSSM4GSQ/

Full text links

We have located links that may give you full text access.
Can't access the paper?
Try logging in through your university/institutional subscription. For a smoother one-click institutional access experience, please use our mobile app.

Related Resources

For the best experience, use the Read mobile app

Mobile app image

Get seemless 1-tap access through your institution/university

For the best experience, use the Read mobile app

All material on this website is protected by copyright, Copyright © 1994-2024 by WebMD LLC.
This website also contains material copyrighted by 3rd parties.

By using this service, you agree to our terms of use and privacy policy.

Your Privacy Choices Toggle icon

You can now claim free CME credits for this literature searchClaim now

Get seemless 1-tap access through your institution/university

For the best experience, use the Read mobile app