
<Transcriptome> Tidying the expression data produced by StringTie

StringTie produces expression data like the following:

$ head SRR3823868
Gene ID	Gene Name	Reference	Strand	Start	End	Coverage	FPKM	TPM
gene-Aa1Ag00004	Aa1Ag00004	Chr1A	-	47479	48231	0.721659	0.111406	0.140519
gene-Aa1Ag00001	Aa1Ag00001	Chr1A	-	14477	27718	17.418158	2.688935	3.391617
gene-Aa1Ag00005	Aa1Ag00005	Chr1A	+	61262	67021	0.574441	0.088680	0.111854
gene-Aa1Ag00006	Aa1Ag00006	Chr1A	-	67992	68593	9.194076	1.419339	1.790246
           

Goal: combine the expression results of multiple samples into a single file.

1. Use awk to extract the Gene Name and FPKM columns from the results

$ awk -F '\t' '{print $2"\t"$8}' SRR3823868 >SRR3823868_FPKM.txt
$ head SRR3823868_FPKM.txt
Gene Name	FPKM
Aa1Ag00004	0.111406
Aa1Ag00001	2.688935
Aa1Ag00005	0.088680
Aa1Ag00006	1.419339
           

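2. Use sed to rename the FPKM header. To avoid mixing up the expression values of different samples, prefix the column header with the SRR accession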

$ sed -i 's#FPKM#SRR3823868FPKM#' SRR3823868_FPKM.txt
$ head SRR3823868_FPKM.txt
Gene Name	SRR3823868FPKM
Aa1Ag00004	0.111406
Aa1Ag00001	2.688935
Aa1Ag00005	0.088680
Aa1Ag00006	1.419339
           

3. Process the other samples in the same way

$ head SRR6274689_FPKM.txt
Gene Name	SRR6274689FPKM
Aa1Ag00006	1.176829
Aa1Ag00001	4.954225
Aa1Ag00002	0.556997
Aa1Ag00007	2.232162
$ head SRR3823655_FPKM.txt
Gene Name	SRR3823655FPKM
Aa1Ag00004	0.000000
Aa1Ag00005	0.080253
Aa1Ag00002	0.837885
Aa1Ag00003	0.024968
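
Steps 1 to 3 can be folded into a single loop. The following is a minimal sketch, assuming each StringTie gene abundance table is a tab-separated file named after its SRR accession in the current directory. The helper name extract_fpkm is made up for this example. It writes the sample-specific header directly in awk instead of patching it afterwards with sed.

```shell
# extract_fpkm SAMPLE (hypothetical helper): read the StringTie table SAMPLE
# and write SAMPLE_FPKM.txt containing the Gene Name and FPKM columns,
# with the FPKM header renamed to "<SAMPLE>FPKM".
extract_fpkm() {
    sample=$1
    awk -F '\t' -v OFS='\t' -v s="$sample" '
        NR == 1 { print $2, s "FPKM"; next }   # header line: label the column with the sample
        { print $2, $8 }                       # data lines: Gene Name, FPKM
    ' "$sample" > "${sample}_FPKM.txt"
}

# Process every sample whose table is present in the current directory.
for sample in SRR3823868 SRR6274689 SRR3823655; do
    if [ -f "$sample" ]; then
        extract_fpkm "$sample"
    fi
done
```

Passing the sample name in with -v keeps the quoting simple and avoids editing the output file in place.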

           

4. Use join to merge the three files (join works on two files at a time)

$ join -e 'NA' -a 1 -a 2 SRR3823655_FPKM.txt SRR3823868_FPKM.txt >SRR3823655_SRR3823868_FPKM.txt
join: SRR3823655_FPKM.txt:4: is not sorted: Aa1Ag00002	0.837885
join: SRR3823868_FPKM.txt:6: is not sorted: Aa1Ag00002	0.698827
$ head SRR3823655_SRR3823868_FPKM.txt
Gene Name SRR3823655FPKM Name SRR3823868FPKM
Aa1Ag00004 0.000000 0.111406
Aa1Ag00001 2.688935
Aa1Ag00005 0.080253 0.088680
Aa1Ag00002 0.837885
Aa1Ag00003 0.024968
Aa1Ag00006 1.419339
Aa1Ag00002 0.698827
Aa1Ag00003 0.030977
Aa1Ag00007 0.063051 0.707737
           

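That threw an error, sadly...

join requires both inputs to be sorted on the join field, so the expression files need to be run through sort first.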
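
A possible fix is sketched below, assuming GNU coreutils (the --header and -o auto options of join are GNU extensions). Each file is sorted on the gene name with its header line left in place, and the tab is passed as the delimiter so that "Gene Name" stays a single field. Note that -e NA only takes effect together with -o, which is why no NA appeared in the output above. The function names sort_keep_header and merge_two are made up for this example.

```shell
TAB=$(printf '\t')

# sort_keep_header FILE: print FILE with its first line left in place and
# the remaining lines sorted on column 1 in the C locale.
sort_keep_header() {
    head -n 1 "$1"
    tail -n +2 "$1" | LC_ALL=C sort -t "$TAB" -k1,1
}

# merge_two FILE1 FILE2: full outer join on the gene name.
# Writes FILE1.sorted and FILE2.sorted next to the inputs and prints the
# merged table; genes missing from one file get NA in that column.
merge_two() {
    sort_keep_header "$1" > "$1.sorted"
    sort_keep_header "$2" > "$2.sorted"
    LC_ALL=C join --header -t "$TAB" -a 1 -a 2 -e NA -o auto \
        "$1.sorted" "$2.sorted"
}
```

For three samples, the output of one merge_two call can be saved to a file and passed to a second call together with the third sample.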

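I had assumed that the expression output only listed gene IDs with detectable expression, but it turns out that every gene ID is written out for every sample. So, once each file is sorted by gene name so that the rows line up, it is enough to transfer the files back to Windows and paste the columns together in Excel. Oh well, it cost a bit of time, but I picked up several new commands along the way, so it was worth it.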
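
For anyone who would rather stay on the command line, the same table can be built with awk alone. This is only a sketch, assuming the two-column tab-separated *_FPKM.txt files from step 3; the function name merge_fpkm is made up for this example. It does not need sorted input, because values are looked up by gene name rather than by row position, and the head outputs above show that the samples do not list genes in the same order.

```shell
# merge_fpkm FILE... (hypothetical helper): build one gene-by-sample table
# from any number of two-column "Gene Name<TAB>value" files.
# Genes appear in first-seen order; a gene absent from a file gets NA.
# If a gene is repeated within one file, the last value wins.
merge_fpkm() {
    awk -F '\t' -v OFS='\t' '
        FNR == 1 { n++; header[n] = $2; next }   # first line of each file: column label
        {
            if (!($1 in seen)) { seen[$1] = 1; genes[++g] = $1 }
            value[$1, n] = $2
        }
        END {
            line = "Gene Name"
            for (j = 1; j <= n; j++) line = line OFS header[j]
            print line
            for (i = 1; i <= g; i++) {
                line = genes[i]
                for (j = 1; j <= n; j++)
                    line = line OFS (((genes[i], j) in value) ? value[genes[i], j] : "NA")
                print line
            }
        }
    ' "$@"
}
```

A call such as merge_fpkm SRR*_FPKM.txt > all_FPKM.txt would then write one column per sample.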
