NCBI compliant multinode and multicore BLAST wrapper. Distributed with the latest version of BLAST, this wrapper facilitates parallelization of the algorithm on modern hybrid architectures with many nodes and many cores within each node.[2]
Sequence-context specific BLAST, more sensitive than BLAST, FASTA, and SSEARCH. The position-specific iterative version, CSI-BLAST, is more sensitive than PSI-BLAST.
Combines the Smith-Waterman search algorithm with the PSI-BLAST profile construction strategy to find distantly related protein sequences while preventing homologous over-extension errors.
Protein
Li W, McWilliam H, Goujon M, Cowley A, Lopez R, Pearson WR[13]
2012
R&R
Retrieve and Relate (R&R) is a high-performance yet sensitive multi-database search engine, capable of searching in parallel through DNA, RNA, and protein sequences.
First parallel Smith-Waterman algorithm exploiting Intel Xeon Phi clusters to accelerate the alignment of long DNA sequences
DNA
Liu Y, Tran TT, Lauenroth F, Schmidt B
2014
SWIMM
Smith-Waterman implementation for Intel Multicore and Manycore architectures
Protein
Rucci E, García C, Botella G, De Giusti A, Naiouf M and Prieto-Matías M[16]
2015
SWIMM2.0
Enhanced Smith-Waterman on Intel's Multicore and Manycore architectures based on AVX-512 vector extensions
Protein
Rucci E, García C, Botella G, De Giusti A, Naiouf M and Prieto-Matías M[17]
2018
SWIPE
Fast Smith-Waterman search using SIMD parallelization
Both
Rognes T
2011
*Sequence type: protein or nucleotide
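Several entries in the table above (SWIMM, SWIPE, and others) accelerate the Smith-Waterman local alignment algorithm. As a point of reference, the underlying recurrence can be sketched in plain Python. This is a minimal, unoptimized illustration and not the code of any listed tool: the function name, the linear gap penalty, and the scoring values are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Assumed scoring (illustrative only): fixed match/mismatch scores and a
    linear gap penalty. Real tools typically use substitution matrices and
    affine gap penalties.
    """
    # prev holds the previous row of the dynamic programming matrix.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at zero is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells along an anti-diagonal are independent of one another. That independence is what SIMD, GPU, and many-core implementations exploit.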
Pairwise alignment
Name
Description
Sequence type*
Alignment type**
Author
Year
ACANA
Fast heuristic anchor based pairwise alignment
Both
Both
Huang, Umbach, Li
2005
AlignMe
Alignments for membrane protein sequences
Protein
Both
M. Stamm, K. Khafizov, R. Staritzbichler, L.R. Forrest
2013
ALLALIGN
For DNA, RNA and protein molecules up to 32MB, aligns all sequences of size K or greater. Similar alignments are grouped together for analysis. Automatic repetitive sequence filter.
Protein sequence to structure alignment that includes secondary structure, structural conservation, structure-derived sequence profiles, and consensus alignment scores
Multiple, non-overlapping, local similarity (same algorithm as SIM)
Both
Local non-overlapping
W. Pearson
1991 (algorithm)
NW-align
Standard Needleman-Wunsch dynamic programming algorithm
Protein
Global
Y Zhang
2012
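The Needleman-Wunsch dynamic programming algorithm named in the NW-align entry can be sketched as follows. This is a minimal illustration under assumed parameters (unit match and mismatch scores, a linear gap penalty, and a hypothetical function name). It does not reproduce NW-align's own scoring scheme.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score between strings a and b.

    Scoring values are illustrative assumptions, not those of any listed tool.
    """
    # First row: aligning a prefix of b against an empty prefix of a.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # First column: aligning a prefix of a against an empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Unlike local alignment, scores are never clamped at zero.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

Only two rows are kept in memory because the sketch returns the score alone. Recovering the alignment itself would require storing the full matrix or traceback pointers.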
matcher
Waterman-Eggert local alignment (based on LALIGN)
Both
Local
I. Longden (modified from W. Pearson)
1999
MCALIGN2
Uses explicit models of indel evolution
DNA
Global
J. Wang et al.
2006
MegAlign Pro (Lasergene Molecular Biology)
Software to align DNA, RNA, protein, or DNA + protein sequences via pairwise and multiple sequence alignment algorithms including MUSCLE, Mauve, MAFFT, Clustal Omega, Jotun Hein, Wilbur-Lipman, Martinez Needleman-Wunsch, Lipman-Pearson and Dotplot analysis.
For DNA, RNA and protein molecules up to 32MB, aligns all sequences of size K or greater, MSA or within a single molecule. Similar alignments are grouped together for analysis. Automatic repetitive sequence filter.
Computes Smith-Waterman gapped alignments and mapping qualities on one or more GPUs. Supports BS-seq alignments. Processes 100,000 to 500,000 reads per second (varies with data, hardware, and configured sensitivity).
A GPGPU accelerated Burrows–Wheeler transform (FM-index) short read alignment program based on BWA, supports alignment of indels with gap openings and extensions.
Uses short k-mers to rapidly index the genome; no size or scaffold count limit. Higher sensitivity and specificity than Burrows–Wheeler aligners, with similar or greater speed. Performs affine-transform-optimized global alignment, which is slower but more accurate than Smith-Waterman. Handles Illumina, 454, PacBio, Sanger, and Ion Torrent data. Splice-aware; capable of processing long indels and RNA-seq. Pure Java; runs on any platform. Used by the Joint Genome Institute.
Explicit time and accuracy tradeoff with a prior accuracy estimation, supported by indexing the reference sequences. Optimally compresses indexes. Can handle billions of short reads. Can handle insertions, deletions, SNPs, and color errors (can map ABI SOLiD color space reads). Performs a full Smith-Waterman alignment.
Runs the Burrows–Wheeler Aligner (BWA) on a Hadoop cluster. It supports the algorithms BWA-MEM, BWA-ALN, and BWA-SW, working with paired and single reads. Running on a Hadoop cluster substantially reduces computation time and adds scalability and fault tolerance.
BLAST's nucleotide alignment program. Slow and not accurate for short reads; uses a sequence database (EST, Sanger sequence) rather than a reference genome.
Uses a Burrows–Wheeler transform to create a permanent, reusable index of the genome; 1.3 GB memory footprint for human genome. Aligns more than 25 million Illumina reads in 1 CPU hour. Supports Maq-like and SOAP-like alignment policies.
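Several aligners in this list build a reusable genome index from the Burrows–Wheeler transform. The idea can be sketched with a toy FM-index that counts exact occurrences of a pattern by backward search. This is an illustrative sketch only: the function names are hypothetical, the suffix array is built by naive sorting, and the occurrence table is stored uncompressed, none of which would be practical for a real genome.

```python
def bwt_index(text):
    """Build a toy FM-index (first-column offsets and occurrence table)."""
    text += "$"  # sentinel that sorts before A, C, G, T
    # Naive suffix array; real tools use far more efficient construction.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)

    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # first[c]: number of characters in text that sort before c.
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, symbol in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == symbol else 0)
    return first, occ, len(bwt)


def count_occurrences(pattern, index):
    """Count exact matches of a non-empty pattern using backward search."""
    first, occ, n = index
    lo, hi = 0, n  # current suffix array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    idx = bwt_index("ACGTACGTAC")
    print(count_occurrences("ACG", idx))
```

The index is built once and then reused for every read, and each query takes a number of steps proportional to the pattern length rather than the genome length.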
A probabilistic short read aligner based on the use of position specific scoring matrices (PSSM). The aligner is adaptable in the sense that it can take into account the quality scores of the reads and models of data specific biases, such as those observed in Ancient DNA, PAR-CLIP data or genomes with biased nucleotide compositions.[39]
Quantify and manage large quantities of short-read sequence data. CASHX pipeline contains a set of tools that can be used together, or separately as modules. This algorithm is very accurate for perfect hits to a reference genome.
Gapped short-read and long-read alignment based on maximal exact match seeds. This aligner supports both base-space (e.g. from Illumina, 454, Ion Torrent and PacBio sequencers) and ABI SOLiD color-space read alignments.
Read mapping alignment software that implements cache obliviousness to minimize main/cache memory transfers, like mrFAST and mrsFAST, but designed for the SOLiD sequencing platform (color space reads). It also returns all possible map locations for improved structural variation discovery.
High-quality alignment engine (exhaustive mapping with substitutions and indels). More accurate and several times faster than BWA or Bowtie 1/2. Many standalone biological applications (mapper, split mapper, mappability, and others) provided.
Fast, accurate overlap assembler with the ability to handle any combination of sequencing technology, read length, any pairing orientations, with any spacer size for the pairing, with or without a reference genome.
Complete framework with a user-friendly GUI to analyse NGS data. It integrates a proprietary high-quality alignment algorithm and a plug-in capability for integrating various public aligners into a framework that can import short reads, align them, detect variants, and generate reports. It is made for resequencing projects, namely in a diagnostic setting.
Robust, fast short-read alignment. GMAP: longer reads, with multiple indels and splices (see entry above under Genomics analysis); GSNAP: shorter reads, with one indel or up to two splices per read. Useful for digital gene expression, SNP and indel genotyping. Developed by Thomas Wu at Genentech. Used by the National Center for Genome Resources (NCGR) in Alpheus.
Accurately performs gapped alignment of sequence data obtained from next-generation sequencing machines (specifically Solexa-Illumina) back to a genome of any size. Includes adaptor trimming, SNP calling, and bisulfite sequence analysis.
Yes, also supports Illumina *_int.txt and *_prb.txt files with all 4 quality scores for each base
Uses a hash table and bloom matrix to create and filter potential positions on the genome. For higher efficiency, it uses cross-similarity between short reads and avoids realigning non-unique redundant sequences. It is faster than Bowtie and BWA and allows indels and divergent sensitive alignments on viruses, bacteria, and more conservative eukaryotic alignments.
Yes
Yes
Yes
Yes
Proprietary, freeware for academic and noncommercial users registered to HIVE deployment instance
Improved Meta-aligner and Minimap2 On Spark. A long read distributed aligner on Apache Spark platform with linear scalability w.r.t. single node execution.
Yes
Yes
Yes
Free
Isaac
Fully uses all the computing power available on one server node; thus, it scales well over a broad range of hardware architectures, and alignment performance improves with hardware abilities
Uses adaptive seeds and copes more efficiently with repeat-rich sequences (e.g. genomes). For example, it can align reads to genomes without repeat-masking and without becoming overwhelmed by repetitive hits.
Gapped (mrFAST) and ungapped (mrsFAST) alignment software that implements cache obliviousness to minimize main/cache memory transfers. They are designed for the Illumina sequencing platform and they can return all possible map locations for improved structural variation discovery.
Fast gapped aligner and reference-guided assembler. Aligns reads using a banded Smith-Waterman algorithm seeded by results from a k-mer hashing scheme. Supports reads ranging in size from very short to very long.
Yes
MPscan
Fast aligner based on a filtration strategy (no indexing; uses q-grams and Backward Nondeterministic DAWG Matching)
Gapped alignment of single end and paired end Illumina GA I & II, ABI Colour space & Ion Torrent reads. High sensitivity and specificity, using base qualities at all steps in the alignment. Includes adapter trimming, base quality calibration, Bi-Seq alignment, and options for reporting multiple alignments per read. Use of ambiguous IUPAC codes in reference for common SNPs can improve SNP recall and remove allelic bias.
Yes
Yes
Yes
Multi-threading and MPI versions available with paid license
Proprietary, freeware single threaded version for academic and noncommercial use
NextGENe
Developed for use by biologists performing analysis of next generation sequencing data from Roche Genome Sequencer FLX, Illumina GA/HiSeq, Life Technologies Applied BioSystems’ SOLiD System, PacBio and Ion Torrent platforms.
Flexible and fast read mapping program (twice as fast as BWA), achieves a mapping sensitivity comparable to Stampy. Internally uses a memory efficient index structure (hash table) to store positions of all 13-mers present in the reference genome. Mapping regions where pairwise alignments are required are dynamically determined for each read. Uses fast SIMD instructions (SSE) to accelerate alignment calculations on CPU. If available, alignments are computed on GPU (using OpenCL/CUDA) further reducing runtime 20-50%.
Includes highly sensitive and highly accurate tools for detecting SNPs and indels. It offers a solution to map NGS short reads with a moderate distance (up to 30% sequence divergence) from reference genomes. It poses no restrictions on the size of the reference, which, combined with its high sensitivity, makes the Variant Toolkit well-suited for targeted sequencing projects and diagnostics.
Efficiently computes both spliced and unspliced alignments at high accuracy. Relying on a machine learning strategy combined with a fast mapping based on a banded Smith-Waterman-like algorithm, it aligns around 7 million reads per hour on one CPU. It refines the originally proposed QPALMA approach.
For use by biologists and bioinformaticians. It supports ungapped, gapped, and splice-junction alignment of single and paired-end reads from Illumina, Life Technologies SOLiD™, Roche 454, and Ion Torrent raw data (with or without quality information). It integrates powerful quality control at the FASTQ/Qual level and on aligned data. Additional functionality includes trimming and filtering of raw reads, SNP and indel detection, mRNA and microRNA quantification, and fusion gene detection.
Yes
Yes
Yes
Multiprocessor-core, client-server installation possible
Indexes the genome, then extends seeds using pre-computed alignments of words. Works with base space and color space (SOLiD), and can align genomic and spliced RNA-seq reads.
Indexes the genome with periodic seeds to quickly find alignments with full sensitivity up to four mismatches. It can map Illumina and SOLiD reads. Unlike most mapping programs, speed increases for longer read lengths.
Indexes the genome with a k-mer lookup table with full sensitivity up to an adjustable number of mismatches. It is best for mapping 15-60 bp sequences to a genome.
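The k-mer lookup table used by several of the hash-based mappers above can be sketched as follows. This is a generic illustration with hypothetical function names and not the implementation of any listed tool. It handles exact seed matches only, whereas real mappers add mismatch tolerance and a verification alignment step.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k):
    """Rank candidate read start positions by the number of supporting seeds.

    Returns (reference_start, seed_count) pairs, best supported first.
    """
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            # A seed at read offset `offset` hitting reference position `pos`
            # implies the read starts at `pos - offset`.
            votes[pos - offset] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    ref = "TTGACCAGTACGGATCCA"
    idx = build_kmer_index(ref, 4)
    print(candidate_positions("CAGTACGG", idx, 4))
```

Candidate positions found this way would normally be passed to a more expensive alignment step, such as a banded Smith-Waterman, for verification.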
Can use quality scores, intron lengths, and computational splice site predictions to perform an unbiased alignment. Can be trained to the specifics of an RNA-seq experiment and genome. Useful for splice site/intron discovery and for gene model building. (See PALMapper for a faster version.)
REAL is an efficient, accurate, and sensitive tool for aligning short reads obtained from next-generation sequencing. The programme can handle an enormous amount of single-end reads generated by the next-generation Illumina/Solexa Genome Analyzer. cREAL is a simple extension of REAL for aligning short reads obtained from next-generation sequencing to a genome with circular structure.
Can map reads with or without error probability information (quality scores) and supports paired-end reads or bisulfite-treated read mapping. There are no limitations on read length or number of mismatches.
Extremely fast, tolerant to high indel and substitution counts. Includes full read alignment. Product includes comprehensive pipelines for variant detection and metagenomic analysis with any combination of Illumina, Complete Genomics and Roche 454 data.
Slider is an application for the Illumina Sequence Analyzer output that uses the "probability" files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences.
SOAP: robust with a small (1-3) number of gaps and mismatches. Speed improvement over BLAT; uses a 12-letter hash table. SOAP2: uses a bidirectional BWT to build the index of the reference and is much faster than the first version. SOAP3: GPU-accelerated version that can find all 4-mismatch alignments in tens of seconds per one million reads. SOAP3-dp, also GPU-accelerated, supports an arbitrary number of mismatches and gaps according to affine gap penalty scores.
For ABI SOLiD technologies. Significant increase in time to map reads with mismatches (or color errors). Uses an iterative version of the Rabin-Karp string search algorithm.
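The Rabin-Karp string search algorithm mentioned above relies on a rolling hash, which can be sketched as below. This is the textbook single-pattern, exact-match form, not the iterative color-space variant the tool is described as using. The function name, hash base, and modulus are assumptions made for the example.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return all start positions where pattern occurs exactly in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Weight of the leading character in an m-character window.
    high = pow(base, m - 1, mod)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for start in range(n - m + 1):
        # Compare the strings only when hashes agree, to rule out collisions.
        if p_hash == t_hash and text[start:start + m] == pattern:
            hits.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return hits


if __name__ == "__main__":
    print(rabin_karp("ACGTACGTAC", "ACG"))
```

Updating the hash takes constant time per position, so the scan is linear in the text length apart from the explicit comparisons made when hashes agree.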
Integrates the Burrows–Wheeler Aligner (BWA) on an Apache Spark framework running atop Hadoop. Version 0.2 (October 2016) supports the algorithms BWA-MEM, BWA-backtrack, and BWA-ALN. All of them work with single reads and paired-end reads.
For Illumina reads. High specificity, and sensitive for reads with indels, structural variants, or many SNPs. Slow, but speed increased dramatically by using BWA for first alignment pass.
For Illumina or ABI SOLiD reads, with SAM native output. Highly sensitive for reads with many errors, indels (full from 0 to 15, extended support otherwise). Uses spaced seeds (single hit) and a very fast SSE-SSE2-AVX2-AVX-512 banded alignment filter. For fixed-length reads only, authors recommend SHRiMP2 otherwise.
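Spaced seeds, as mentioned above, compare only selected positions within a window, so a mismatch at an ignored position does not prevent a seed hit. A minimal sketch of the idea follows. The mask, the function names, and the dictionary-based index are assumptions for illustration and do not reflect the seed patterns or data structures of any listed tool.

```python
from collections import defaultdict


def spaced_key(sequence, start, mask):
    """Extract the characters at positions where mask has a '1'."""
    return "".join(sequence[start + i] for i, bit in enumerate(mask) if bit == "1")


def spaced_seed_hits(reference, read, mask):
    """Return (read_offset, reference_position) pairs that share a spaced key."""
    span = len(mask)
    index = defaultdict(list)
    for pos in range(len(reference) - span + 1):
        index[spaced_key(reference, pos, mask)].append(pos)
    hits = []
    for offset in range(len(read) - span + 1):
        for pos in index.get(spaced_key(read, offset, mask), ()):
            hits.append((offset, pos))
    return hits


if __name__ == "__main__":
    # The read differs from the reference window ACGTAC at its third base.
    print(spaced_seed_hits("GGACGTACTT", "ACCTAC", "110111"))
    print(spaced_seed_hits("GGACGTACTT", "ACCTAC", "111111"))
```

With the mask `110111` the mismatching third base is ignored and the read still hits reference position 2, while the contiguous mask `111111` finds nothing.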
Superfast and accurate read aligners. Subread can be used to map both gDNA-seq and RNA-seq reads. Subjunc detects exon-exon junctions and maps RNA-seq reads. They employ a novel mapping paradigm named seed-and-vote.
FPGA-accelerated reference sequence alignment mapping tool from TimeLogic. Faster than Burrows–Wheeler transform-based algorithms like BWA and Bowtie. Supports up to 7 mismatches and/or indels with no performance penalty. Produces sensitive Smith–Waterman gapped alignments.
FPGA based sliding window short read aligner which exploits the embarrassingly parallel property of short read alignment. Performance scales linearly with number of transistors on a chip (i.e. performance guaranteed to double with each iteration of Moore's Law without modification to algorithm). Low power consumption is useful for datacentre equipment. Predictable runtime. Better price/performance than software sliding window aligners on current hardware, but not better than software BWT-based aligners currently. Can manage large numbers (>2) of mismatches. Will find all hit positions for all seeds. Single-FPGA experimental version, needs work to develop it into a multi-FPGA production version.
100% sensitivity for reads between 15 and 240 bp with practical mismatches. Very fast. Supports insertions and deletions. Works with Illumina and SOLiD instruments, not 454.
^Harris R S (2007). Improved pairwise alignment of genomic DNA (Thesis).
^Sandes, Edans F. de O.; de Melo, Alba Cristina M.A. (May 2013). "Retrieving Smith-Waterman Alignments with Optimizations for Megabase Biological Sequences Using GPU". IEEE Transactions on Parallel and Distributed Systems. 24 (5): 1009–1021. doi:10.1109/TPDS.2012.194.
^Sandes, Edans F. de O.; Miranda, G.; De Melo, A.C.M.A.; Martorell, X.; Ayguade, E. (May 2014). CUDAlign 3.0: Parallel Biological Sequence Comparison in Large GPU Clusters. Cluster, Cloud and Grid Computing (CCGrid), 2014 14th IEEE/ACM International Symposium on. p. 160. doi:10.1109/CCGrid.2014.18.
^Sandes, Edans F. de O.; Miranda, G.; De Melo, A.C.M.A.; Martorell, X.; Ayguade, E. (August 2014). Fine-grain Parallel Megabase Sequence Comparison with Multiple Heterogeneous GPUs. Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. pp. 383–384. doi:10.1145/2555243.2555280.
^Rucci, Enzo; Garcia, Carlos; Botella, Guillermo; Naiouf, Marcelo; De Giusti, Armando; Prieto-Matias, Manuel. Accelerating Smith-Waterman Alignment of Long DNA Sequences with OpenCL on FPGA. 5th International Work-Conference on Bioinformatics and Biomedical Engineering. pp. 500–511. doi:10.1007/978-3-319-56154-7_45.
^Rasmussen K, Stoye J, Myers EW (2006). "Efficient q-Gram Filters for Finding All epsilon-Matches over a Given Length". Journal of Computational Biology. 13 (2): 296–308. CiteSeerX 10.1.1.465.2084. doi:10.1089/cmb.2006.13.296. PMID 16597241.
^Malhis, Nawar; Jones, Steven J. M. (2010). "High Quality SNP Calling Using Illumina Data at Shallow Coverage". Bioinformatics. 26 (8): 1029–1035. doi:10.1093/bioinformatics/btq092. PMID 20190250.