The expression values produced by StringTie look like this:
$ head SRR3823868
Gene ID Gene Name Reference Strand Start End Coverage FPKM TPM
gene-Aa1Ag00004 Aa1Ag00004 Chr1A - 47479 48231 0.721659 0.111406 0.140519
gene-Aa1Ag00001 Aa1Ag00001 Chr1A - 14477 27718 17.418158 2.688935 3.391617
gene-Aa1Ag00005 Aa1Ag00005 Chr1A + 61262 67021 0.574441 0.088680 0.111854
gene-Aa1Ag00006 Aa1Ag00006 Chr1A - 67992 68593 9.194076 1.419339 1.790246
Goal: combine the expression results of multiple samples into a single file.
1. Use awk to extract the Gene Name and FPKM columns from the result
$ awk -F '\t' '{print $2"\t"$8}' SRR3823868 >SRR3823868_FPKM.txt
$ head SRR3823868_FPKM.txt
Gene Name FPKM
Aa1Ag00004 0.111406
Aa1Ag00001 2.688935
Aa1Ag00005 0.088680
Aa1Ag00006 1.419339
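Before hard-coding $2 and $8, the column numbers can be checked by printing the header one field per line. A minimal sketch, assuming the StringTie table is tab-separated; demo_abund.tsv is a hypothetical file built from the header and first row shown above, so with real data point f at SRR3823868 instead.

```shell
#!/bin/sh
set -eu

# Hypothetical demo file: header and first data row from the head output above.
# With real data, set f=SRR3823868 instead.
f=demo_abund.tsv
printf 'Gene ID\tGene Name\tReference\tStrand\tStart\tEnd\tCoverage\tFPKM\tTPM\n' > "$f"
printf 'gene-Aa1Ag00004\tAa1Ag00004\tChr1A\t-\t47479\t48231\t0.721659\t0.111406\t0.140519\n' >> "$f"

# Print "<column number> <header name>" for every tab-separated field of line 1.
awk -F '\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i, $i; exit }' "$f"
```

The listing should include "2 Gene Name" and "8 FPKM", which is what the awk command above relies on.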
2. Use sed to rename the FPKM header: to avoid mixing up the expression values of different samples, prefix it with the SRR accession
$ sed -i 's#FPKM#SRR3823868FPKM#' SRR3823868_FPKM.txt
$ head SRR3823868_FPKM.txt
Gene Name SRR3823868FPKM
Aa1Ag00004 0.111406
Aa1Ag00001 2.688935
Aa1Ag00005 0.088680
Aa1Ag00006 1.419339
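Steps 1 and 2 can also be done in a single awk pass that renames only the header line, which avoids sed -i (a GNU extension; BSD/macOS sed expects an argument after -i). A minimal sketch; extract_fpkm is a hypothetical helper name, and the demo runs in a scratch directory on two rows copied from the head output above.

```shell
#!/bin/sh
set -eu

# extract_fpkm SAMPLE: read the StringTie table named SAMPLE and write
# SAMPLE_FPKM.txt with Gene Name ($2) and FPKM ($8), tab-separated.
# Line 1 becomes "Gene Name<TAB>SAMPLEFPKM"; data lines are copied as is.
extract_fpkm() {
    awk -F '\t' -v OFS='\t' -v s="$1" '
        NR == 1 { print $2, s "FPKM"; next }
        { print $2, $8 }
    ' "$1" > "${1}_FPKM.txt"
}

# Demo in a scratch directory so no real file is overwritten.
cd "$(mktemp -d)"
printf 'Gene ID\tGene Name\tReference\tStrand\tStart\tEnd\tCoverage\tFPKM\tTPM\n' > SRR3823868
printf 'gene-Aa1Ag00004\tAa1Ag00004\tChr1A\t-\t47479\t48231\t0.721659\t0.111406\t0.140519\n' >> SRR3823868
printf 'gene-Aa1Ag00001\tAa1Ag00001\tChr1A\t-\t14477\t27718\t17.418158\t2.688935\t3.391617\n' >> SRR3823868

extract_fpkm SRR3823868
cat SRR3823868_FPKM.txt
```

On the demo this prints the header "Gene Name SRR3823868FPKM" followed by "Aa1Ag00004 0.111406" and "Aa1Ag00001 2.688935", the same as running steps 1 and 2 separately.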
3. Apply the same processing to the other samples
$ head SRR6274689_FPKM.txt
Gene Name SRR6274689FPKM
Aa1Ag00006 1.176829
Aa1Ag00001 4.954225
Aa1Ag00002 0.556997
Aa1Ag00007 2.232162
$ head SRR3823655_FPKM.txt
Gene Name SRR3823655FPKM
Aa1Ag00004 0.000000
Aa1Ag00005 0.080253
Aa1Ag00002 0.837885
Aa1Ag00003 0.024968
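Rather than retyping the commands for each sample, steps 1 and 2 can be run in a loop over the run IDs. A minimal sketch, assuming GNU sed (for sed -i) and that each StringTie table sits in the current directory named by its run ID; process_samples is a hypothetical helper. The sed address 1 limits the rename to the header line. In the demo, only the Gene Name and FPKM values come from the output above; every other column is a '.' placeholder because the loop ignores it.

```shell
#!/bin/sh
set -eu

# process_samples ID...: steps 1 and 2 from above, once per run ID.
process_samples() {
    for s in "$@"; do
        # Step 1: keep Gene Name ($2) and FPKM ($8).
        awk -F '\t' '{print $2"\t"$8}' "$s" > "${s}_FPKM.txt"
        # Step 2: prefix the FPKM header with the run ID (line 1 only; GNU sed -i).
        sed -i "1s#FPKM#${s}FPKM#" "${s}_FPKM.txt"
    done
}

# Demo in a scratch directory: one data row per sample, '.' in unused columns.
cd "$(mktemp -d)"
for s in SRR3823868 SRR6274689 SRR3823655; do
    printf 'Gene ID\tGene Name\tReference\tStrand\tStart\tEnd\tCoverage\tFPKM\tTPM\n' > "$s"
done
printf '.\tAa1Ag00004\t.\t.\t.\t.\t.\t0.111406\t.\n' >> SRR3823868
printf '.\tAa1Ag00006\t.\t.\t.\t.\t.\t1.176829\t.\n' >> SRR6274689
printf '.\tAa1Ag00004\t.\t.\t.\t.\t.\t0.000000\t.\n' >> SRR3823655

process_samples SRR3823868 SRR6274689 SRR3823655
head SRR*_FPKM.txt
```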
4. Use join to merge the three files
$ join -e 'NA' -a 1 -a 2 SRR3823655_FPKM.txt SRR3823868_FPKM.txt >SRR3823655_SRR3823868_FPKM.txt
join: SRR3823655_FPKM.txt:4: is not sorted: Aa1Ag00002 0.837885
join: SRR3823868_FPKM.txt:6: is not sorted: Aa1Ag00002 0.698827
$ head SRR3823655_SRR3823868_FPKM.txt
Gene Name SRR3823655FPKM Name SRR3823868FPKM
Aa1Ag00004 0.000000 0.111406
Aa1Ag00001 2.688935
Aa1Ag00005 0.080253 0.088680
Aa1Ag00002 0.837885
Aa1Ag00003 0.024968
Aa1Ag00006 1.419339
Aa1Ag00002 0.698827
Aa1Ag00003 0.030977
Aa1Ag00007 0.063051 0.707737
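It threw an error. Sad...
The expression files need to be sorted with sort first, because join expects both inputs to be sorted on the join field.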
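A minimal sketch of the sorted version, assuming GNU coreutils join (--header and -o auto are GNU options; BSD/macOS join may not have them) and the tab-separated *_FPKM.txt files from steps 1 to 3. It also addresses two other things visible in the output above: with join's default blank separator the header "Gene Name" is split into two fields (hence the stray "Name" column), and -e 'NA' had no effect because GNU join only fills missing fields that are listed by -o. sort_keep_header and join_fpkm are hypothetical helper names; the demo files reuse the head rows shown above.

```shell
#!/bin/sh
set -eu
export LC_ALL=C          # sort and join must use the same collation order
TAB=$(printf '\t')

# sort_keep_header IN OUT: keep the header on line 1, sort the rest by gene name.
sort_keep_header() {
    { head -n 1 "$1"; tail -n +2 "$1" | sort -t "$TAB" -k1,1; } > "$2"
}

# join_fpkm A B OUT: full outer join on column 1 (GNU coreutils join).
#   -t TAB          split on tabs only, so "Gene Name" stays one field
#   --header        join the two header lines instead of order-checking them
#   -a 1 -a 2       also print genes found in only one file
#   -e NA -o auto   write NA for the value missing on the other side
join_fpkm() {
    join --header -t "$TAB" -a 1 -a 2 -e NA -o auto "$1" "$2" > "$3"
}

# Demo in a scratch directory, using the head rows shown above.
cd "$(mktemp -d)"
printf 'Gene Name\tSRR3823655FPKM\nAa1Ag00004\t0.000000\nAa1Ag00005\t0.080253\nAa1Ag00002\t0.837885\nAa1Ag00003\t0.024968\n' > SRR3823655_FPKM.txt
printf 'Gene Name\tSRR3823868FPKM\nAa1Ag00004\t0.111406\nAa1Ag00001\t2.688935\nAa1Ag00005\t0.088680\nAa1Ag00006\t1.419339\n' > SRR3823868_FPKM.txt
printf 'Gene Name\tSRR6274689FPKM\nAa1Ag00006\t1.176829\nAa1Ag00001\t4.954225\nAa1Ag00002\t0.556997\nAa1Ag00007\t2.232162\n' > SRR6274689_FPKM.txt

for s in SRR3823655 SRR3823868 SRR6274689; do
    sort_keep_header "${s}_FPKM.txt" "${s}_FPKM.sorted.txt"
done
# join takes two files at a time: merge the first pair, then add the third.
join_fpkm SRR3823655_FPKM.sorted.txt SRR3823868_FPKM.sorted.txt SRR3823655_SRR3823868_FPKM.txt
join_fpkm SRR3823655_SRR3823868_FPKM.txt SRR6274689_FPKM.sorted.txt all_FPKM.txt
cat all_FPKM.txt
```

The result has one row per gene seen in any of the files, with NA where a sample has no value; on the truncated demo files, for example, the Aa1Ag00003 row is "Aa1Ag00003 0.024968 NA NA".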
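I had assumed the expression output would only list gene IDs that are actually expressed, but it turns out every ID is listed, so all I really need to do is copy the files back to Windows and paste them together in Excel. Oh well, I wasted a bit of time, but I learned several commands, so it was worth it.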
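If every sample file really lists the same gene IDs, the Excel step can also be done with paste. One caveat visible in the head output above: rows come out in a different order in each sample (SRR3823868 starts with Aa1Ag00004, SRR6274689 with Aa1Ag00006), so pasting unsorted files side by side would misalign genes. This minimal sketch sorts each file by gene name and refuses to paste if the gene columns differ. paste_fpkm, sort_keep_header and the scratch file genes.ref are hypothetical names; the demo uses only FPKM values that appear above, for the genes present in both SRR3823655 and SRR3823868.

```shell
#!/bin/sh
set -eu
export LC_ALL=C
TAB=$(printf '\t')

# sort_keep_header IN OUT: header stays on line 1, body sorted by gene name.
sort_keep_header() {
    { head -n 1 "$1"; tail -n +2 "$1" | sort -t "$TAB" -k1,1; } > "$2"
}

# paste_fpkm OUT FILE...: side-by-side merge that is only valid when every
# file lists exactly the same genes; it stops if the sorted gene columns differ.
paste_fpkm() {
    out=$1
    shift
    sort_keep_header "$1" "$1.sorted"
    cut -f 1 "$1.sorted" > genes.ref        # reference gene column (cut splits on tab by default)
    sorted_files="$1.sorted"
    shift
    for f in "$@"; do
        sort_keep_header "$f" "$f.sorted"
        if ! cut -f 1 "$f.sorted" | cmp -s - genes.ref; then
            echo "gene IDs differ in $f; use join instead" >&2
            return 1
        fi
        sorted_files="$sorted_files $f.sorted"
    done
    # Start from the gene column and append column 2 of each sorted file.
    cp genes.ref "$out"
    for f in $sorted_files; do              # assumes file names without spaces
        cut -f 2 "$f" | paste "$out" - > "$out.tmp"
        mv "$out.tmp" "$out"
    done
}

# Demo in a scratch directory: same genes, different row order per file.
cd "$(mktemp -d)"
printf 'Gene Name\tSRR3823655FPKM\nAa1Ag00004\t0.000000\nAa1Ag00005\t0.080253\nAa1Ag00002\t0.837885\nAa1Ag00003\t0.024968\nAa1Ag00007\t0.063051\n' > SRR3823655_FPKM.txt
printf 'Gene Name\tSRR3823868FPKM\nAa1Ag00004\t0.111406\nAa1Ag00005\t0.088680\nAa1Ag00007\t0.707737\nAa1Ag00002\t0.698827\nAa1Ag00003\t0.030977\n' > SRR3823868_FPKM.txt

paste_fpkm merged_FPKM.txt SRR3823655_FPKM.txt SRR3823868_FPKM.txt
cat merged_FPKM.txt
```

The gene check is the main design choice here: paste only lines rows up by position, so comparing the sorted gene columns first is what makes the side-by-side merge safe; when the check fails, the join approach is the one to use.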