Beyond bwa mem: what are the alternatives for aligning WES/WGS data?

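bwa mem performs well and is reliable and stable, so it has long been the first-choice aligner for WES/WGS data analysis. As the technology has developed, more and more alternatives have become available.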

1. Minimap2( https://github.com/lh3/minimap2 )

Heng Li, the author of bwa, wrote in his blog post "Minimap2 and the future of BWA" ( https://lh3.github.io/2018/04/02/minimap2-and-the-future-of-bwa ):

The story on short-read alignment is a little complex, though. I did plan to replace bwa-mem with minimap2 on short-read alignment, too. In the minimap2 paper, I showed that minimap2 is 3X as fast as bwa-mem and achieves comparable accuracy to bwa-mem on short variant calling (section 3.3). In the final round of the review, an reviewer still argued that minimap2 wouldn’t work well for short reads. I didn’t think so at the time given that Illumina Inc. has independently evaluated minimap2 and observed that minimap2 is highly competitive. Therefore, I didn’t follow the suggestion of that reviewer.

However, Andrew Carroll at DNAnexus has recently showed me that minimap2 was slower than bwa-mem on two NovaSeq runs at his hand. Part of the reason, I guess, is that the two NovaSeq runs have a little higher error rate, which triggers expensive heuristics in minimap2 more frequently. Furthermore, I also realize that bwa-mem will be better than minimap2 at Hi-C alignment because bwa-mem is more sensitive to short matches. In the end, I admit minimap2 is not ready to replace bwa-mem all around. I owe that reviewer an apology.

Generally, I still think minimap2 is a competitive short-read mapper and I will use it often in my research projects. However, given that the performance of minimap2 is not as consistent as bwa-mem for short reads of varying quality, bwa-mem is still better for production uses, at least before I find a way to improve minimap2.

More than three years have passed since that blog post, and minimap2 has gone through many releases in the meantime. The minimap2 GitHub page now says:

Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. Typical use cases include: (1) mapping PacBio or Oxford Nanopore genomic reads to the human genome; (2) finding overlaps between long reads with error rate up to ~15%; (3) splice-aware alignment of PacBio Iso-Seq or Nanopore cDNA or Direct RNA reads against a reference genome; (4) aligning Illumina single- or paired-end reads; (5) assembly-to-assembly alignment; (6) full-genome alignment between two closely related species with divergence below ~15%.

For ~10kb noisy reads sequences, minimap2 is tens of times faster than mainstream long-read mappers such as BLASR, BWA-MEM, NGMLR and GMAP. It is more accurate on simulated long reads and produces biologically meaningful alignment ready for downstream analyses. For >100bp Illumina short reads, minimap2 is three times as fast as BWA-MEM and Bowtie2, and as accurate on simulated data. Detailed evaluations are available from the minimap2 paper or the preprint.

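So for research-oriented applications that do not need high precision at any individual point mutation but do care about software speed (for example, calling ROH, runs of homozygosity, from NGS data), minimap2 is worth trying as the aligner. It may yet replace bwa mem across the board in the future.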
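For anyone who wants to try this, here is a minimal Python sketch that builds the command line for minimap2's short-read preset (`-ax sr`, as documented in the minimap2 README). The function name and the file names are hypothetical placeholders, and the sketch only assembles the argument list. It does not run the aligner.

```python
from typing import List, Optional


def minimap2_short_read_cmd(
    reference: str,
    reads1: str,
    reads2: Optional[str] = None,
    threads: int = 4,
) -> List[str]:
    """Build the argv for a minimap2 short-read alignment.

    -a      emit SAM instead of PAF
    -x sr   short-read preset
    -t N    number of threads
    All file paths are placeholders supplied by the caller.
    """
    cmd = ["minimap2", "-a", "-x", "sr", "-t", str(threads), reference, reads1]
    if reads2 is not None:
        # Paired-end data: the second FASTQ follows the first.
        cmd.append(reads2)
    return cmd


if __name__ == "__main__":
    # Print the command so it can be inspected before running it.
    print(" ".join(minimap2_short_read_cmd("ref.fa", "read1.fq", "read2.fq", threads=8)))
```

minimap2 writes SAM to standard output, so in a real pipeline the list would typically be passed to `subprocess.run` with `stdout` redirected to a file or piped into a sorter.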

2. Dragmap( https://github.com/Illumina/DRAGMAP )

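Open-source software from Illumina. It comes from the DRAGEN software family originally developed by Edico Genome, and the alignment component has now been open-sourced. You can of course also use the commercially licensed DRAGEN software, which has now been updated to v3.9 with further improvements in accuracy.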
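As a rough sketch of what using the open-source mapper looks like, the Python below assembles the two commands described in the DRAGMAP README: building a hash table from the reference, then aligning paired-end reads against it. The `dragen-os` options shown are taken from that README as I understand it and may differ between releases, so check the README of the version you install. The function names and paths are hypothetical.

```python
from typing import List


def dragmap_build_hash_table_cmd(reference_fasta: str, hash_table_dir: str) -> List[str]:
    """Build the argv that creates a DRAGMAP hash table for a reference FASTA."""
    return [
        "dragen-os",
        "--build-hash-table", "true",
        "--ht-reference", reference_fasta,
        "--output-directory", hash_table_dir,
    ]


def dragmap_align_cmd(hash_table_dir: str, reads1: str, reads2: str) -> List[str]:
    """Build the argv that aligns paired-end reads; SAM goes to standard output."""
    return ["dragen-os", "-r", hash_table_dir, "-1", reads1, "-2", reads2]


if __name__ == "__main__":
    # Placeholder paths, printed for inspection rather than executed.
    print(" ".join(dragmap_build_hash_table_cmd("ref.fa", "ref_ht/")))
    print(" ".join(dragmap_align_cmd("ref_ht/", "read1.fq.gz", "read2.fq.gz")))
```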

3. bwa-mem2( https://github.com/bwa-mem2/bwa-mem2 )

An instruction-set-optimized version of bwa mem. Recent releases have greatly reduced memory and storage usage.

Bwa-mem2 is the next version of the bwa-mem algorithm in bwa. It produces alignment identical to bwa and is ~1.3-3.1x faster depending on the use-case, dataset and the running machine.

The original bwa was developed by Heng Li (@lh3). Performance enhancement in bwa-mem2 was primarily done by Vasimuddin Md (@yuk12) and Sanchit Misra (@sanchit-misra) from Parallel Computing Lab, Intel. Bwa-mem2 is distributed under the MIT license.
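Because bwa-mem2 keeps the `mem` command-line interface of bwa, switching is mostly a matter of changing the executable and rebuilding the index with `bwa-mem2 index`. The Python sketch below illustrates that. The function names and the `use_mem2` switch are hypothetical, and the file names are placeholders.

```python
from typing import List


def bwa_index_cmd(reference: str, use_mem2: bool = False) -> List[str]:
    """Build the argv that indexes a reference; each tool builds its own index."""
    exe = "bwa-mem2" if use_mem2 else "bwa"
    return [exe, "index", reference]


def bwa_mem_cmd(
    reference: str,
    reads1: str,
    reads2: str,
    threads: int = 4,
    use_mem2: bool = False,
) -> List[str]:
    """Build the argv for paired-end alignment with bwa mem or bwa-mem2 mem."""
    exe = "bwa-mem2" if use_mem2 else "bwa"
    # -t sets the thread count; SAM is written to standard output.
    return [exe, "mem", "-t", str(threads), reference, reads1, reads2]


if __name__ == "__main__":
    # The two command lines differ only in the executable name.
    print(" ".join(bwa_mem_cmd("ref.fa", "read1.fq", "read2.fq")))
    print(" ".join(bwa_mem_cmd("ref.fa", "read1.fq", "read2.fq", use_mem2=True)))
```

Keeping the choice of aligner behind a single flag like this makes it easy to run both on the same sample and compare the outputs before committing a pipeline to the faster tool.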

4. Sentieon

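Sentieon is also commercially licensed software. Its aligner is likewise mainly an optimized version of bwa mem, and both its accuracy and performance are good.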