天天看點

雲上mNGS分析: 通過雲服務60秒完成病毒序列比對

新型冠狀病毒SARS-CoV-2 RNA在發病後數周内在上呼吸道或下呼吸道中檢測到,用于泛病原體檢測的基因組下一代測序(mNGS)為此次新冠病毒SARS-CoV-2 RNA 的早期發現和準确測序立下了汗馬功勞。

雲上mNGS分析: 通過雲服務60秒完成病毒序列比對

目前針對新型冠狀病毒核酸檢測,是指多重熒光RT-PCR試劑盒,主要針對新冠病毒ORF1ab、N、E基因設定三靶标做的熒光檢測。抗體檢測就是病毒進入身體,身體的免疫系統産生了針對新型冠狀病毒的特殊抗體lgM和lgG,能檢查到抗體就檢查出來是否被病毒感染和康複。而mNGS是通過測序儀對病人病竈組織的深度測序獲得的宏基因組資料,來定位和排查多種病原體。

盡管已經有衆多新型冠狀病毒RT-PCR試劑盒可選,由于病毒濃度和試劑盒品質,相關RT-PCR試劑盒等試劑出現假陰性較高的問題,導緻醫生和患者往往需要重複多次檢測和長時間等待檢測結果。

mNGS的技術優勢為通過一次檢測就可以排查所有已知的病原體, mNGS檢測能有效避免重複采樣給醫生和患者帶來的操作難度,也避免了PCR檢測手段下多次檢測篩查所需的大量樣本在臨床中難以實作的問題。基于mNGS核酸序列比對的分析方式,一旦病原體的基因組已知,通過更新資料庫,就可以實作高效準确檢測病原體的功能。

阿裡雲基因計算服務AGS提供了針對mNGS宏基因組測序資料的快速比對能力,對一組肺泡采樣測序後的宏基因組資料3.2Gbase(22M reads),60秒内可以完成和已知的病原體基因組序列庫包括新型冠狀病毒SARS-CoV-2,或者39種BetaCov RNA的參考序列的比對,并且支援自定義的數萬種的病毒庫的上傳和比對。對于疾控中心,醫院,實驗室隻需要一個對象存儲Bucket, 以及指令行

AGS

就可以完成整個的比對過程,并拿到高品質的比對reads的資料和初步品質報告,為多種病原體檢測,進一步的新冠病毒的蛋白質研究和,變異研究提供了快捷準确的資料支撐。

與社會共抗疫情,阿裡雲基因計算服務AGS面向基因測序廠商,疾控中心,醫院,學校,制藥企業等開放mNGS RNA比對計算能力,歡迎

申請使用 .

準備

  1. 下載下傳和安裝AGS指令行,請參見 AGS指令行幫助
  2. 下載下傳安裝ossutil指令行,請參見 ossutil指令行幫助
  3. 準備一個阿裡雲賬号,以及準備一個存放mNGS測序資料的對象存儲Bucket, e.g. oss://my-test-shenzhen
  4. 為服務AGS配置bucket通路權限。e.g. ags config oss my-test-shenzhen
  5. 上傳mNGS 測序資料到Bucket。
e.g.
ossutil cp  ICU6G_S2_L001_R1_001.fastq.gz oss://my-test-shenzhen/cov2-samples/
ossutil cp  ICU6G_S2_L001_R2_001.fastq.gz oss://my-test-shenzhen/cov2-samples/           
  1. 運作比對任務來對mNGS資料和已知RNA序列和序列資料庫做比對,重複執行5,6步驟對不同的樣本實作比對。
Usage:
ags remote run rna-mapping \ # <rna-mapping>: RNA 序列的比對任務
--region cn-shenzhen \ # <cn-shenzhen|cn-beijing|...>: 地域ID,目前支援深圳和北京。
--bucket my-test-shenzhen \ # <bucket_name> 對象存儲bucket的名稱
--fastq1 cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz \ # 雙端測序資料fq1相對路徑
--fastq2 cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz  \ # 雙端測序資料fq2的相對路徑
--output-bam bam/ICU6G_S2.bam  \ #産出比對結果bam的輸出路徑,報告也在同樣位置,以.txt結尾
--reference [sars-cov-2 | betacov-ncbi-39 | <path of RNA library reference in specified bucket >] # 參考序列預置了新型冠狀病毒sars-cov-2和目前已經知道的39種betacov的冠狀病毒,可以指定自定義的病毒序列庫           

e.g.

新型冠狀病毒比對

1. 送出比對任務比較ICU6G_S2_L001 的測序樣本和新型冠狀病毒的相似度

ags remote run rna-mapping \
--region cn-shenzhen \
--fastq1 cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz \
--fastq2 cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz  \
--bucket my-test-shenzhen \
--output-bam bam/ICU6G_S2.bam  \
--reference sars-cov-2


INFO[0002] {"JobName":"rna-mapping-gpu-2ms6w"}
INFO[0002] Job submit succeed           

2. 檢查比對任務和比對結果

e.g. 

在這個比對任務任務中,10M reads(1.4Gbase)和新型冠狀病毒序列

MN908947.3

在43秒完成比對,比對産生了3629個高品質重合的reads,并在新型冠狀病毒特征區間有超過120分的reads條數有404個。說明可以精确的從次樣本的測序資料中檢測出SARS-CoV-2 RNA的序列。

High Quality Mapped Reads is: 3629
Matched reads in orf1ab range is: 480
Matched reads in orf1ab range with alignment score (AS) is greater than 120: 404
feature sequence of ICU6G_S2_L001 is similar to SARS-CoV-2 with very high mappQ and AS reads: True           
ags remote get rna-mapping-gpu-2ms6w --show
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
|       JOB NAME        |  JOB NAMESPACE   |  STATUS   |          CREATE TIME          | DURATION |          FINISH TIME          | TOTAL READS | TOTAL BASES |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| rna-mapping-gpu-2ms6w | XXXXXXXXXXXX | Succeeded | 2020-03-04 16:40:30 +0800 CST | 43s      | 2020-03-04 16:41:13 +0800 CST |    10369818 |  1456539874 |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+


+---------------------------------+--------------------------------------------+
|           JOB DETAIL            |                                            |
+---------------------------------+--------------------------------------------+
| rna_matached_reads              |                                        480 |
| rna_is_sars_cov2                | True                                       |
| rna_mapping_oss_region          | cn-shenzhen                                |
| rna_mapping_fastq_second_name   | cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz |
| rna_mapping_no_unmapped         |                                            |
| rna_mapping_service             | s                                          |
| rna_matached_reads_alignment    |                                        404 |
| rna_high_quality_mapped         |                                       3629 |
| rna_mapping_fastq_first_name    | cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz |
| rna_mapping_mark_dup            |                                            |
| rna_mapping_reference_file_name | sars-cov-2                                 |
| rna_cov_detail_file             | bam/ICU6G_S2.bam.cov.txt                   |
| rna_mapping_bam_file_name       | bam/ICU6G_S2.bam                           |
| rna_mapping_bucket_name         | my-test-shenzhen                           |
+---------------------------------+--------------------------------------------+           

3. 下載下傳比對資料bam和報告

ossutil ls oss://my-test-shenzhen/bam/ICU6G_S2.bam
LastModifiedTime                   Size(B)  StorageClass   ETAG                                  ObjectName
2020-03-04 16:41:11 +0800 CST       356320      Standard   9596D012A30438A0073A2A0B38F5D578      oss://my-test-shenzhen/bam/ICU6G_S2.bam
2020-03-04 16:41:11 +0800 CST         2889      Standard   63175E7180D110BA9D3BAB34F4313C59      oss://my-test-shenzhen/bam/ICU6G_S2.bam.cov.txt
2020-03-04 16:41:11 +0800 CST          396      Standard   940D51FF7ECFF60B5E5A41D1F635180D      oss://my-test-shenzhen/bam/ICU6G_S2.bam.summary.json

ossutil cp oss://my-test-shenzhen/bam/HKU2_160660.summary.json .
ossutil cp -r oss://my-test-shenzhen/bam/ICU6G_S2.bam.cov.txt .
ossutil cp oss://my-test-shenzhen/bam/HKU2_160660.bam .           

Example for sars-cov-2 RNA detected

cat bam/ICU6G_S2.bam.cov.txt

Summary:
High Quality Mapped Reads is: 3629
Matched reads in orf1ab range is: 480
Matched reads in orf1ab range with alignment score (AS) is greater than 120: 404
/data/cov2-samples_ICU6G_S2_L001_R1_001.fastq.gz-output/ICU6G_S2.bam is similar to SARS-CoV-2 with very high mappQ and AS reads: True
        21571     21581     21591     21601     21611     21621     21631
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTT GTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGT                           CAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGA         CCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCA  agtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgcca         AGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
 TGTTTGTTTTTCTTGTTTT     CACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
 tgtttgtttttcttgtttt
   TTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
      gtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc           

進一步的分析比對資料

使用者可以通過samtools stats, plot-bamstat 等工具對比對bam産出,來實作對coverage, depth等的進一步分析相似度,使用産出的bam資料,可以進一步實作蛋白質組成,以及變異分析。

e.g. stats

雲上mNGS分析: 通過雲服務60秒完成病毒序列比對
Coverage Analysis
雲上mNGS分析: 通過雲服務60秒完成病毒序列比對

已知的39個Beta Coronavirus的病毒比對

ags remote run rna-mapping \
--region cn-shenzhen \
--fastq1 cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz \
--fastq2 cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz  \
--bucket my-test-shenzhen \
--output-bam bam/ICU6G_S2_virus.bam  \
--reference betacov-ncbi-39
INFO[0011] {"JobName":"rna-mapping-gpu-6mpcc"}
INFO[0011] Job submit succeed

ags remote get rna-mapping-gpu-6mpcc --show
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
|       JOB NAME        |  JOB NAMESPACE   |  STATUS   |          CREATE TIME          | DURATION |          FINISH TIME          | TOTAL READS | TOTAL BASES |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| rna-mapping-gpu-6mpcc | XXXXXXXXX | Succeeded | 2020-03-04 17:36:21 +0800 CST | 40s      | 2020-03-04 17:37:01 +0800 CST |    10369818 |  1456539874 |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
# 2014 mapped reads detected, but no mapped reads found in range
+---------------------------------+--------------------------------------------+
|           JOB DETAIL            |                                            |
+---------------------------------+--------------------------------------------+
| rna_mapping_reference_file_name | betacov-ncbi-39                            |
| rna_matached_reads_alignment    |                                          0 |
| rna_mapping_bam_file_name       | bam/ICU6G_S2_virus.bam                     |
| rna_mapping_fastq_first_name    | cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz |
| rna_mapping_oss_region          | cn-shenzhen                                |
| rna_cov_detail_file             | bam/ICU6G_S2_virus.bam.cov.txt             |
| rna_mapping_no_unmapped         |                                            |
| rna_matached_reads              |                                          0 |
| rna_mapping_mark_dup            |                                            |
| rna_mapping_service             | s                                          |
| rna_high_quality_mapped         |                                       2014 |
| rna_mapping_bucket_name         | my-test-shenzhen                           |
| rna_mapping_fastq_second_name   | cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz |
| rna_is_sars_cov2                | False                                      |
+---------------------------------+--------------------------------------------+           

使用自定義病毒庫的比對

1. 從NCBI GeneBank下載下傳reference序列合并成為一個多contig的參考序列

https://www.ncbi.nlm.nih.gov/nuccore

e.g. 搜尋核酸包含'betacov'的所有參考系列, 并下載下傳參考系列

雲上mNGS分析: 通過雲服務60秒完成病毒序列比對

2. 把下載下傳的序列檔案sequence.fa 改名為betacov-ncbi-test.fa

3. 上傳reference 到對象存儲bucket

ossutil cp betacov-ncbi-test.fa oss://my-test-shenzhen/ref/           

4. 送出比對任務,指定reference的路徑

ags remote run rna-mapping \
--region cn-shenzhen \
--fastq1 cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz \
--fastq2 cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz  \
--bucket my-test-shenzhen \
--output-bam bam/ICU6G_S2_virus.bam  \
--reference ref/betacov-ncbi-test.fa

INFO[0002] {"JobName":"rna-mapping-gpu-69mwb"}
INFO[0002] Job submit succeed
           

5. 檢視比對報告和擷取比對的比對資料

ags remote get rna-mapping-gpu-69mwb --show
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
|       JOB NAME        |  JOB NAMESPACE   |  STATUS   |          CREATE TIME          | DURATION |          FINISH TIME          | TOTAL READS | TOTAL BASES |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| rna-mapping-gpu-69mwb | 1365606736606053 | Succeeded | 2020-03-04 17:47:00 +0800 CST | 40s      | 2020-03-04 17:47:40 +0800 CST |    10369818 |  1456539874 |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+


+---------------------------------+--------------------------------------------+
|           JOB DETAIL            |                                            |
+---------------------------------+--------------------------------------------+
| rna_mapping_fastq_first_name    | cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz |
| rna_mapping_fastq_second_name   | cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz |
| rna_mapping_mark_dup            |                                            |
| rna_mapping_oss_region          | cn-shenzhen                                |
| rna_cov_detail_file             | bam/ICU6G_S2_virus.bam.cov.txt             |
| rna_is_sars_cov2                | False                                      |
| rna_mapping_bam_file_name       | bam/ICU6G_S2_virus.bam                     |
| rna_mapping_service             | s                                          |
| rna_matached_reads_alignment    |                                          0 |
| rna_high_quality_mapped         |                                       2014 |
| rna_mapping_bucket_name         | my-test-shenzhen                           |
| rna_mapping_no_unmapped         |                                            |
| rna_mapping_reference_file_name | ref/betacov-ncbi-test.fa                   |
| rna_matached_reads              |                                          0 |
+---------------------------------+--------------------------------------------+
+---------------------------------+------------------------------------------+           

6. 下載下傳比對資料做進一步分析

ossutil ls oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam
LastModifiedTime                   Size(B)  StorageClass   ETAG                                  ObjectName
2020-03-04 17:47:38 +0800 CST       753458      Standard   DF7B1A6CA5AF5DE6BF4FFDBB6DEF71C3      oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam
2020-03-04 17:47:38 +0800 CST         1474      Standard   9D7968A779A0DE7C1993CC2A8D0E5A56      oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam.cov.txt
2020-03-04 17:47:38 +0800 CST          397      Standard   81170E30BAAFEB947A2238E015171A51      oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam.summary.json
Object Number is: 3

ossutil cp oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam.summary.json .

cat bam/ICU6G_S2_virus.bam.summary.json
{
    "total_reads":10369818,
    "total_bases":1456539874,
    "pass_vendor_filter_reads":10369818,
    "mapped_reads":6736,
    "pair_reads":6680,
    "properly_paired_reads":6520,
    "mapq_40_to_inf_reads":2030,
    "mapq_30_to_40_reads":0,
    "mapq_20_to_30_reads":1,
    "mapq_10_to_20_reads":3,
    "mapq_0_to_10_reads":23,
    "mapq_0_reads":10367761,
    "GC":"46.499%",
    "total_alignment":2057,
    "supplementary_alignment":0
}%

ossutil cp oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam .
samtools view bam/ICU6G_S2_virus.bam