List of gene prediction software
This is a list of software tools and web portals used for gene prediction.
| Name | Description | Species | References |
| --- | --- | --- | --- |
| FINDER | Automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences | Eukaryotes | [1] |
| FragGeneScan | Predicts genes in complete genomes and sequencing reads | Prokaryotes, Metagenomes | [2] |
| ATGpr | Identifies translational initiation sites in cDNA sequences | Human | [3] |
| Prodigal | Its name stands for Prokaryotic Dynamic Programming Genefinding Algorithm. It is based on log-likelihood functions and does not use hidden or interpolated Markov models. | Prokaryotes, Metagenomes (metaProdigal) | [4] |
| AUGUSTUS | Eukaryote gene predictor | Eukaryotes | [5] |
| BGF | Hidden Markov model (HMM) and dynamic programming based ab initio gene prediction program | | [6] |
| DIOGENES | Fast detection of coding regions in short genome sequences | | |
| Dragon Promoter Finder | Program to recognize vertebrate RNA polymerase II promoters | Vertebrates | [7] |
| EasyGene | Gene finder based on a hidden Markov model (HMM) that is automatically estimated for a new genome | Prokaryotes | [8] [9] |
| EuGene | Integrative gene finding | Prokaryotes, Eukaryotes | [10] [11] |
| FGENESH | HMM-based gene structure prediction: multiple genes, both chains | Eukaryotes | [12] |
| FrameD | Finds genes and frameshifts in G+C-rich prokaryote sequences | Prokaryotes, Eukaryotes | [13] |
| GeMoMa | Homology-based gene prediction based on amino acid and intron position conservation as well as RNA-Seq data | | [14] [15] |
| GENIUS II | Links ORFs in complete genomes to protein 3D structures | Prokaryotes, Eukaryotes | [16] |
| geneid | Program to predict genes, exons, splice sites, and other signals along DNA sequences | Eukaryotes | [17] |
| GeneParser | Parses DNA sequences into introns and exons | Eukaryotes | [18] |
| GeneMark | Family of self-training gene prediction programs | Prokaryotes, Eukaryotes, Metagenomes | [19] [20] [21] [22] |
| GeneTack | Predicts genes with frameshifts in prokaryote genomes | Prokaryotes | [23] |
| GenomeScan | Predicts the locations and exon-intron structures of genes in genome sequences from a variety of organisms; it is the successor to the GENSCAN server | Vertebrate, Arabidopsis, Maize | [24] |
| GENSCAN | Predicts the locations and exon-intron structures of genes in genome sequences from a variety of organisms | Vertebrate, Arabidopsis, Maize | [25] [26] [27] |
| GLIMMER | Finds genes in microbial DNA | Prokaryotes | [28] [29] [30] |
| GLIMMERHMM | Eukaryotic gene-finding system | Eukaryotes | [31] |
| GrailEXP | Predicts exons, genes, promoters, poly(A) sites, CpG islands, EST similarities, and repeat elements in DNA sequence | Human, Mus musculus, Arabidopsis thaliana, Drosophila melanogaster | [32] [33] |
| mGene | Support-vector machine (SVM) based system to find genes | Eukaryotes | [34] |
| mGene.ngs | SVM-based system to find genes using heterogeneous information: RNA-seq, tiling arrays | Eukaryotes | [35] |
| MORGAN | Decision tree system to find genes in vertebrate DNA | Eukaryotes | [36] |
| BioNIX | Web tool to combine results from different programs: GRAIL, FEX, HEXON, MZEF, GENEMARK, GENEFINDER, FGENE, BLAST, POLYAH, REPEATMASKER, TRNASCAN | Prokaryotes, Eukaryotes | [37] |
| NNPP | Neural network promoter prediction | Prokaryotes, Eukaryotes | [38] |
| NNSPLICE | Neural network splice site prediction | Drosophila, Human | [39] |
| ORFfinder | Graphical analysis tool to find all open reading frames | Prokaryotes, Eukaryotes | [40] |
| Regulatory Sequence Analysis Tools | Series of modular computer programs to detect regulatory signals in non-coding sequences | Fungi, Prokaryotes, Metazoa, Protist, Plants | [41] [42] |
| PHANOTATE | Tool to annotate phage genomes | Phages | [43] |
| SplicePredictor | Method to identify potential splice sites in (plant) pre-mRNA by sequence inspection using Bayesian statistical models | Eukaryotes | [44] |
| VEIL | Hidden Markov model to find genes in vertebrate DNA | Eukaryotes | [45] |
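Several entries above, ORFfinder most directly, rest on locating open reading frames (ORFs). As an illustration only, the following Python sketch scans all six reading frames for stretches that begin with ATG and end at the first in-frame stop codon. It is not the algorithm of any tool in this list: the function names, the `min_codons` threshold, and the choice of the standard genetic code's stop codons are assumptions made for the example, and real gene finders add statistical models, alternative start codons, and frameshift handling on top of this idea.

```python
from typing import List, Tuple

# Assumption: standard genetic code; other translation tables use different stops.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[str, int, int, int]]:
    """Naive six-frame ORF scan (hypothetical helper, for illustration).

    Returns (strand, frame, start, end) tuples. Coordinates are 0-based,
    end-exclusive, and relative to the strand that was scanned. The ORF
    length in codons includes both the start and the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # resume the search after this stop
    return orfs


if __name__ == "__main__":
    # ATG AAA TAG sits in frame 2 of the forward strand.
    print(find_orfs("CCATGAAATAGGG"))
```

An ORF that starts but never reaches a stop codon before the end of the sequence is not reported in this sketch; a tool aimed at short reads or fragmented assemblies would need to treat such partial ORFs differently.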
References
1. Banerjee S, Bhandary P, Woodhouse M, Sen TZ, Wise RP, Andorf CM (April 2021). "FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences". BMC Bioinformatics. 22 (1): 205. doi:10.1186/s12859-021-04120-9. PMC 8056616. PMID 33879057.
2. Rho M, Tang H, Ye Y (November 2010). "FragGeneScan: predicting genes in short and error-prone reads". Nucleic Acids Research. 38 (20): e191. doi:10.1093/nar/gkq747. PMC 2978382. PMID 20805240.
3. Nishikawa, Tetsuo; Ota, Toshio; Isogai, Takao (2000-11-01). "Prediction whether a human cDNA sequence contains initiation codon by combining statistical information and similarity with protein sequences". Bioinformatics. 16 (11): 960–967. doi:10.1093/bioinformatics/16.11.960. ISSN 1367-4803. PMID 11159307.
4. Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ (March 2010). "Prodigal: prokaryotic gene recognition and translation initiation site identification". BMC Bioinformatics. 11: 119. doi:10.1186/1471-2105-11-119. PMC 2848648. PMID 20211023.
5. Keller O, Kollmar M, Stanke M, Waack S (March 2011). "A novel hybrid gene prediction method employing protein multiple sequence alignments". Bioinformatics. 27 (6): 757–63. doi:10.1093/bioinformatics/btr010. hdl:11858/00-001M-0000-0011-F244-D. PMID 21216780.
6. Li, Heng; Liu, Jin-Song; Xu, Zhao; Jin, Jiao; Fang, Lin; Gao, Lei; Li, Yu-Dong; Xing, Zi-Xing; Gao, Shao-Gen; Liu, Tao; Li, Hai-Hong (2005-07-01). "Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome". Journal of Computer Science and Technology. 20 (4): 446–453. doi:10.1007/s11390-005-0446-x. ISSN 1860-4749. S2CID 13497894.
7. Bajic, Vladimir B.; Seah, Seng Hong; Chong, Allen; Zhang, Guanglan; Koh, Judice L. Y.; Brusic, Vladimir (2002-01-01). "Dragon Promoter Finder: recognition of vertebrate RNA polymerase II promoters". Bioinformatics. 18 (1): 198–199. doi:10.1093/bioinformatics/18.1.198. ISSN 1367-4803. PMID 11836231.
8. Nielsen, P.; Krogh, A. (2005-12-15). "Large-scale prokaryotic gene prediction and comparison to genome annotation". Bioinformatics. 21 (24): 4322–4329. doi:10.1093/bioinformatics/bti701. ISSN 1367-4803. PMID 16249266.
9. Larsen, Thomas Schou; Krogh, Anders (2003-06-03). "EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance". BMC Bioinformatics. 4 (1): 21. doi:10.1186/1471-2105-4-21. ISSN 1471-2105. PMC 521197. PMID 12783628.
10. Foissac S, Gouzy J, Rombauts S, Mathé C, Amselem J, Sterck L, de Peer YV, Rouzé P, Schiex T (May 2008). "Genome annotation in plants and fungi: EuGene as a model platform". Current Bioinformatics. 3 (2): 87–97. doi:10.2174/157489308784340702.
11. Sallet, Erika; Gouzy, Jérôme; Schiex, Thomas (2019), Kollmar, Martin (ed.), "EuGene: An Automated Integrative Gene Finder for Eukaryotes and Prokaryotes", Gene Prediction: Methods and Protocols, Methods in Molecular Biology, vol. 1962, New York, NY: Springer, pp. 97–120, doi:10.1007/978-1-4939-9173-0_6, ISBN 978-1-4939-9173-0, PMID 31020556, S2CID 131776381, retrieved 2021-11-24
12. Salamov AA, Solovyev VV (April 2000). "Ab initio gene finding in Drosophila genomic DNA". Genome Research. 10 (4): 516–22. doi:10.1101/gr.10.4.516. PMC 310882. PMID 10779491.
13. Schiex T, Gouzy J, Moisan A, de Oliveira Y (July 2003). "FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences". Nucleic Acids Research. 31 (13): 3738–41. doi:10.1093/nar/gkg610. PMC 169016. PMID 12824407.
14. Keilwagen J, Wenk M, Erickson JL, Schattat MH, Grau J, Hartung F (May 2016). "Using intron position conservation for homology-based gene prediction". Nucleic Acids Research. 44 (9): e89. doi:10.1093/nar/gkw092. PMC 4872089. PMID 26893356.
15. Keilwagen J, Hartung F, Paulini M, Twardziok SO, Grau J (May 2018). "Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi". BMC Bioinformatics. 19 (1): 189. doi:10.1186/s12859-018-2203-5. PMC 5975413. PMID 29843602.
16. Yabuki, Yukimitsu; Mukai, Yuri; Swindells, Mark B.; Suwa, Makiko (2004-03-01). "GENIUS II: a high-throughput database system for linking ORFs in complete genomes to known protein three-dimensional structures". Bioinformatics. 20 (4): 596–598. doi:10.1093/bioinformatics/btg478. ISSN 1367-4803. PMID 14751990.
17. Blanco, Enrique; Parra, Genís; Guigó, Roderic (June 2007), "Using geneid to Identify Genes", Current Protocols in Bioinformatics, Chapter 4, John Wiley & Sons, Inc.: 4.3.1–4.3.28, doi:10.1002/0471250953.bi0403s18, ISBN 978-0471250951, PMID 18428791
18. Snyder, Eric E.; Stormo, Gary D. (1995-04-21). "Identification of Protein Coding Regions In Genomic DNA". Journal of Molecular Biology. 248 (1): 1–18. doi:10.1006/jmbi.1995.0198. ISSN 0022-2836. PMID 7731036.
19. Lukashin AV, Borodovsky M (February 1998). "GeneMark.hmm: new solutions for gene finding". Nucleic Acids Research. 26 (4): 1107–15. doi:10.1093/nar/26.4.1107. PMC 147337. PMID 9461475.
20. Besemer J, Lomsadze A, Borodovsky M (June 2001). "GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions". Nucleic Acids Research. 29 (12): 2607–18. doi:10.1093/nar/29.12.2607. PMC 55746. PMID 11410670.
21. Lomsadze A, Burns PD, Borodovsky M (September 2014). "Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm". Nucleic Acids Research. 42 (15): e119. doi:10.1093/nar/gku557. PMC 4150757. PMID 24990371.
22. Zhu W, Lomsadze A, Borodovsky M (July 2010). "Ab initio gene identification in metagenomic sequences". Nucleic Acids Research. 38 (12): e132. doi:10.1093/nar/gkq275. PMC 2896542. PMID 20403810.
23. Antonov I, Borodovsky M (June 2010). "Genetack: frameshift identification in protein-coding sequences by the Viterbi algorithm". Journal of Bioinformatics and Computational Biology. 8 (3): 535–51. doi:10.1142/S0219720010004847. PMID 20556861.
24. Yeh, Ru-Fang; Lim, Lee P.; Burge, Christopher B. (2001-05-01). "Computational Inference of Homologous Gene Structures in the Human Genome". Genome Research. 11 (5): 803–816. doi:10.1101/gr.175701. ISSN 1088-9051. PMC 311055. PMID 11337476.
25. Burge, Chris; Karlin, Samuel (1997-04-25). "Prediction of complete gene structures in human genomic DNA". Journal of Molecular Biology. 268 (1): 78–94. doi:10.1006/jmbi.1997.0951. ISSN 0022-2836. PMID 9149143.
26. Burge, Christopher B. (1998-01-01), Salzberg, Steven L.; Searls, David B.; Kasif, Simon (eds.), "Chapter 8 - Modeling dependencies in pre-mRNA splicing signals", New Comprehensive Biochemistry, Computational Methods in Molecular Biology, vol. 32, Elsevier, pp. 129–164, doi:10.1016/S0167-7306(08)60465-2, ISBN 978-0-444-82875-0, retrieved 2021-11-24
27. Burge, Christopher B; Karlin, Samuel (1998-06-01). "Finding the genes in genomic DNA". Current Opinion in Structural Biology. 8 (3): 346–354. doi:10.1016/S0959-440X(98)80069-9. ISSN 0959-440X. PMID 9666331.
28.
29. Delcher, A. (1999-12-01). "Improved microbial gene identification with GLIMMER". Nucleic Acids Research. 27 (23): 4636–4641. doi:10.1093/nar/27.23.4636. ISSN 1362-4962. PMC 148753. PMID 10556321.
30. Salzberg, S. L.; Delcher, A. L.; Kasif, S.; White, O. (1998-01-01). "Microbial gene identification using interpolated Markov models". Nucleic Acids Research. 26 (2): 544–548. doi:10.1093/nar/26.2.544. ISSN 0305-1048. PMC 147303. PMID 9421513.
31. Majoros WH, Pertea M, Salzberg SL (November 2004). "TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders". Bioinformatics. 20 (16): 2878–9. doi:10.1093/bioinformatics/bth315. PMID 15145805.
32. Uberbacher, Edward C.; Hyatt, Doug; Shah, Manesh (2004). "GrailEXP and Genome Analysis Pipeline for Genome Annotation". Current Protocols in Bioinformatics. 8 (1): 4.9.1–4.9.15. doi:10.1002/0471250953.bi0409s04. ISSN 1934-340X. PMID 18428726.
33. Uberbacher, Edward C.; Hyatt, Doug; Shah, Manesh (2003). "GrailEXP and Genome Analysis Pipeline for Genome Annotation". Current Protocols in Human Genetics. 39 (1): 6.5.1–6.5.15. doi:10.1002/0471142905.hg0605s39. ISSN 1934-8258. PMID 18428363. S2CID 21431978.
34. Schweikert G, Zien A, Zeller G, Behr J, Dieterich C, Ong CS, et al. (November 2009). "mGene: accurate SVM-based gene finding with an application to nematode genomes". Genome Research. 19 (11): 2133–43. doi:10.1101/gr.090597.108. PMC 2775605. PMID 19564452.
35. Gan X, Stegle O, Behr J, Steffen JG, Drewe P, Hildebrand KL, et al. (August 2011). "Multiple reference genomes and transcriptomes for Arabidopsis thaliana". Nature. 477 (7365): 419–23. Bibcode:2011Natur.477..419G. doi:10.1038/nature10414. PMC 4856438. PMID 21874022.
36. "MORGAN". sites.stat.washington.edu. Retrieved 2021-11-24.
37. Bedő, Justin; Di Stefano, Leon; Papenfuss, Anthony T (November 2020). "Unifying package managers, workflow engines, and containers: Computational reproducibility with BioNix". GigaScience. 9 (11). doi:10.1093/gigascience/giaa121. ISSN 2047-217X. PMC 7672450. PMID 33205815.
38. Reese, Martin G (2001-12-01). "Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome". Computers & Chemistry. 26 (1): 51–56. doi:10.1016/S0097-8485(01)00099-7. ISSN 0097-8485. PMID 11765852.
39. Reese, Martin G.; Eeckman, Frank H.; Kulp, David; Haussler, David (1997-01-01). "Improved Splice Site Detection in Genie". Journal of Computational Biology. 4 (3): 311–323. doi:10.1089/cmb.1997.4.311. PMID 9278062.
40. "Home - ORFfinder - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2021-11-24.
41. Santana-Garcia, Walter; Rocha-Acevedo, Maria; Ramirez-Navarro, Lucia; Mbouamboua, Yvon; Thieffry, Denis; Thomas-Chollier, Morgane; Contreras-Moreira, Bruno; van Helden, Jacques; Medina-Rivera, Alejandra (2019-01-01). "RSAT variation-tools: An accessible and flexible framework to predict the impact of regulatory variants on transcription factor binding". Computational and Structural Biotechnology Journal. 17: 1415–1428. doi:10.1016/j.csbj.2019.09.009. ISSN 2001-0370. PMC 6906655. PMID 31871587.
42. Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques (2018-05-02). "RSAT 2018: regulatory sequence analysis tools 20th anniversary". Nucleic Acids Research. 46 (W1): W209–W214. doi:10.1093/nar/gky317. ISSN 0305-1048. PMC 6030903. PMID 29722874.
43. McNair, Katelyn; Zhou, Carol; Dinsdale, Elizabeth A.; Souza, Brian; Edwards, Robert A. (2019-11-01). "PHANOTATE: a novel approach to gene identification in phage genomes". Bioinformatics. 35 (22): 4537–4542. doi:10.1093/bioinformatics/btz265. ISSN 1367-4803. PMC 6853651. PMID 31329826.
44. Brendel, V.; Xing, L.; Zhu, W. (2004-02-05). "Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus". Bioinformatics. 20 (7): 1157–1169. doi:10.1093/bioinformatics/bth058. ISSN 1367-4803. PMID 14764557.
45. Henderson, John; Salzberg, Steven; Fasman, Kenneth H. (1997-01-01). "Finding Genes in DNA with a Hidden Markov Model". Journal of Computational Biology. 4 (2): 127–141. doi:10.1089/cmb.1997.4.127. hdl:1903/8004. PMID 9228612.