The expression data produced by StringTie looks like this:
$ head SRR3823868
Gene ID Gene Name Reference Strand Start End Coverage FPKM TPM
gene-Aa1Ag00004 Aa1Ag00004 Chr1A - 47479 48231 0.721659 0.111406 0.140519
gene-Aa1Ag00001 Aa1Ag00001 Chr1A - 14477 27718 17.418158 2.688935 3.391617
gene-Aa1Ag00005 Aa1Ag00005 Chr1A + 61262 67021 0.574441 0.088680 0.111854
gene-Aa1Ag00006 Aa1Ag00006 Chr1A - 67992 68593 9.194076 1.419339 1.790246
Goal: combine the expression results of multiple samples into a single file.
1. Use awk to extract the Gene Name and FPKM columns from the results
$ awk -F '\t' '{print $2"\t"$8}' SRR3823868 >SRR3823868_FPKM.txt
$ head SRR3823868_FPKM.txt
Gene Name FPKM
Aa1Ag00004 0.111406
Aa1Ag00001 2.688935
Aa1Ag00005 0.088680
Aa1Ag00006 1.419339
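2. Use sed to rename the FPKM header. To avoid mixing up the expression values of different samples, prefix the column header with the SRR accession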
$ sed -i 's#FPKM#SRR3823868FPKM#' SRR3823868_FPKM.txt
$ head SRR3823868_FPKM.txt
Gene Name SRR3823868FPKM
Aa1Ag00004 0.111406
Aa1Ag00001 2.688935
Aa1Ag00005 0.088680
Aa1Ag00006 1.419339
3. Process the other samples the same way
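Steps 1 to 3 can be folded into one loop. This is only a minimal sketch: the function name `extract_fpkm` is made up for illustration, and it assumes each StringTie table is tab-separated, named after its SRR accession, and sits in the current directory. It renames the header inside awk instead of running sed afterwards.

```shell
#!/bin/sh
# Hypothetical helper: pull "Gene Name" (column 2) and FPKM (column 8)
# out of one StringTie gene table and tag the FPKM header with the sample ID.
extract_fpkm() {
    sample=$1
    awk -F '\t' -v id="$sample" 'BEGIN { OFS = "\t" }
        NR == 1 { print $2, id "FPKM"; next }   # rewrite the header line
        { print $2, $8 }' "$sample" > "${sample}_FPKM.txt"
}

# Assumed sample list; files that are missing are simply skipped.
for s in SRR3823868 SRR6274689 SRR3823655; do
    if [ -f "$s" ]; then
        extract_fpkm "$s"
    fi
done
```

Doing the rename in awk avoids touching any other occurrence of "FPKM" in the file, but the two-step awk plus sed version gives the same result for tables like the ones shown here.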
$ head SRR6274689_FPKM.txt
Gene Name SRR6274689FPKM
Aa1Ag00006 1.176829
Aa1Ag00001 4.954225
Aa1Ag00002 0.556997
Aa1Ag00007 2.232162
$ head SRR3823655_FPKM.txt
Gene Name SRR3823655FPKM
Aa1Ag00004 0.000000
Aa1Ag00005 0.080253
Aa1Ag00002 0.837885
Aa1Ag00003 0.024968
4. Use join to merge the three files (starting with the first two)
$ join -e 'NA' -a 1 -a 2 SRR3823655_FPKM.txt SRR3823868_FPKM.txt >SRR3823655_SRR3823868_FPKM.txt
join: SRR3823655_FPKM.txt:4: is not sorted: Aa1Ag00002 0.837885
join: SRR3823868_FPKM.txt:6: is not sorted: Aa1Ag00002 0.698827
$ head SRR3823655_SRR3823868_FPKM.txt
Gene Name SRR3823655FPKM Name SRR3823868FPKM
Aa1Ag00004 0.000000 0.111406
Aa1Ag00001 2.688935
Aa1Ag00005 0.080253 0.088680
Aa1Ag00002 0.837885
Aa1Ag00003 0.024968
Aa1Ag00006 1.419339
Aa1Ag00002 0.698827
Aa1Ag00003 0.030977
Aa1Ag00007 0.063051 0.707737
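It threw an error, which was disheartening...
join expects both inputs to be sorted on the join field, so the expression files have to be sorted with sort first.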
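A sorted version of the join could look roughly like the sketch below. It rests on a few assumptions: the function name `join_fpkm` is hypothetical, both inputs are the two-column tab-separated files produced above, and each has exactly one header line. The header is set aside before sorting so it stays on top. The tab is passed as the delimiter with `-t` because join splits on blanks by default, which is why the `Gene Name` header came apart in the output above. `-o 0,1.2,2.2` is added so that `-e NA` actually fills the empty cells.

```shell
#!/bin/sh
# Hypothetical helper: outer-join two "<gene>\t<FPKM>" tables,
# each with a single header line, and print the result to stdout.
join_fpkm() {
    tab=$(printf '\t')
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    # Drop the header, then sort the rows on the gene name (field 1).
    tail -n +2 "$1" | LC_ALL=C sort -t "$tab" -k1,1 > "$a_sorted"
    tail -n +2 "$2" | LC_ALL=C sort -t "$tab" -k1,1 > "$b_sorted"
    # Rebuild the header: full header of file 1 plus the FPKM header of file 2.
    printf '%s\t%s\n' "$(head -n 1 "$1")" "$(head -n 1 "$2" | cut -f 2)"
    # -a 1 -a 2 keeps unpaired genes from both files; -e NA fills the gaps.
    LC_ALL=C join -t "$tab" -a 1 -a 2 -e NA -o 0,1.2,2.2 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}

if [ -f SRR3823655_FPKM.txt ] && [ -f SRR3823868_FPKM.txt ]; then
    join_fpkm SRR3823655_FPKM.txt SRR3823868_FPKM.txt > SRR3823655_SRR3823868_FPKM.txt
fi
```

A third sample can be added by running the same function on the merged file and the next `_FPKM.txt` file, although the `-o` list would then need one more field for the extra column.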
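I had assumed that the expression results only list gene IDs that are actually expressed, but it turns out every gene ID is written out. In that case it is enough to transfer the files back to Windows and paste the columns together in Excel, as long as the rows are in the same order in every file. Oh well, it cost a bit of time, but I picked up several commands along the way, so it was worth it.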
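The same side-by-side paste can also be done on the command line instead of in Excel. The sketch below is an assumption-laden illustration rather than a tested pipeline: `paste_fpkm` is a made-up name, each input is a two-column tab-separated file with one header line, and every file is expected to list exactly the same genes. It sorts each file, checks that the gene columns agree, and stops with an error if they do not.

```shell
#!/bin/sh
# Hypothetical helper: paste several "<gene>\t<FPKM>" tables side by side.
# Assumes every table has one header line and lists the same genes.
paste_fpkm() {
    tab=$(printf '\t')
    merged=$(mktemp)
    current=$(mktemp)
    next=$(mktemp)
    for f in "$@"; do
        # Keep the header on top and sort the remaining rows by gene name.
        { head -n 1 "$f"; tail -n +2 "$f" | LC_ALL=C sort -t "$tab" -k1,1; } > "$current"
        if [ ! -s "$merged" ]; then
            cp "$current" "$merged"              # first table: take both columns
        elif [ "$(cut -f 1 "$merged")" = "$(cut -f 1 "$current")" ]; then
            cut -f 2 "$current" | paste "$merged" - > "$next"
            mv "$next" "$merged"                 # append this sample's FPKM column
        else
            echo "gene lists differ: $f" >&2
            rm -f "$merged" "$current" "$next"
            return 1
        fi
    done
    cat "$merged"
    rm -f "$merged" "$current" "$next"
}
```

For example, `paste_fpkm SRR3823655_FPKM.txt SRR3823868_FPKM.txt SRR6274689_FPKM.txt > all_FPKM.txt` would write one table with a gene column followed by one FPKM column per sample (the output file name is arbitrary).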