miRNA Naming Conventions

(For a quick overview of the miRNA naming rules, skip to the Summary section of this post.)

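Research on miRNAs started early: the first miRNAs discovered were lin-4 and let-7 in the nematode C. elegans. As more and more miRNAs were found, a unified naming convention was proposed to make scholarly communication easier. The corresponding paper is: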

A uniform system for microRNA annotation

Ambros, Victor, et al. RNA, 2003, Vol. 9, No. 3, pp. 277-279.

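The conventions in this paper are mainly meant to give newly discovered miRNAs a uniform name. miRNAs that had already been discovered and named in the literature keep their original names, for example hsa-let-7. A typical mature miRNA name looks like this:

hsa-miR-1290

It can be read as three fields separated by "-". The first field is a short abbreviation (usually three letters) for the species the miRNA comes from, e.g. hsa for human and mmu for mouse. The second field is miR, which denotes a mature miRNA. The third field is a number that reflects the order in which the miRNA was discovered.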

Generally speaking, once you understand the figure below, most questions about miRNA naming are answered.

[Figure: miRNA naming conventions]
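  • For a miRNA precursor (pre-miRNA), simply replace miR with mir, e.g. hsa-mir-1290. When one precursor yields two mature miRNAs, they are distinguished by the suffixes -5p and -3p, according to the arm of the hairpin they come from, e.g. hsa-miR-12-5p and hsa-miR-12-3p.
  • Two highly homologous miRNAs are distinguished by lowercase letters a, b, and so on, e.g. hsa-miR-5a and hsa-miR-5b.
  • Identical miRNAs produced by different genes are distinguished by a numeric suffix, e.g. hsa-miR-1290-1 and hsa-miR-1290-2.

These are the basic rules for naming a miRNA.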
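
The basic rules above can be sketched as a small parser. This is a minimal Python sketch, not an official miRBase tool: the regular expression and the names `NAME_RE`, `MiRNAName`, and `parse_mirna_name` are assumptions made for illustration, and the pattern only covers the simple three-letter-species names described here. Legacy names such as hsa-let-7 are deliberately not matched.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the simple naming scheme described in this post.
NAME_RE = re.compile(
    r"^(?P<species>[a-z]{3})-"     # three-letter species prefix, e.g. hsa, mmu
    r"(?P<kind>miR|mir)-"          # miR = mature miRNA, mir = precursor
    r"(?P<number>\d+)"             # number assigned in order of discovery
    r"(?P<letter>[a-z]?)"          # a, b, ... for highly homologous miRNAs
    r"(?:-(?P<copy>\d+))?"         # -1, -2 for identical miRNAs from different genes
    r"(?:-(?P<arm>5p|3p))?$"       # hairpin arm of the mature product
)


class MiRNAName(NamedTuple):
    species: str
    mature: bool                   # True for miR, False for mir
    number: int
    letter: Optional[str]
    copy_index: Optional[int]
    arm: Optional[str]


def parse_mirna_name(name: str) -> Optional[MiRNAName]:
    """Split a name into its fields, or return None if it does not fit the pattern."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    return MiRNAName(
        species=m.group("species"),
        mature=m.group("kind") == "miR",
        number=int(m.group("number")),
        letter=m.group("letter") or None,
        copy_index=int(m.group("copy")) if m.group("copy") else None,
        arm=m.group("arm"),
    )
```

For instance, `parse_mirna_name("hsa-miR-12-5p")` gives species hsa, a mature product, number 12, and arm 5p, while `parse_mirna_name("hsa-let-7")` returns `None` because legacy names fall outside the pattern.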

miRNA naming in miRBase

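miRBase is an online miRNA database developed by researchers at the University of Manchester. It holds information on nearly 40,000 miRNAs from more than 200 species, making it the most comprehensive miRNA database. Its URL is: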

http://www.mirbase.org/index.shtml

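miRBase is the basic reference database for miRNA research. In it, a miRNA precursor is named with mir plus a number and has an accession beginning with MI; for example, hsa-mir-122 has the accession MI0000442. A mature miRNA is named with miR plus a number and has an accession beginning with MIMAT; for example, hsa-miR-122-5p has the accession MIMAT0000421.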

microRNA databases will be covered in detail in upcoming posts. This post focuses on miRNA naming rules, so the database itself is not described further here.

The following is excerpted from miRBase's own explanation of its nomenclature.

What’s in a name?

As I briefly mentioned in a previous post, miRBase 17 included two conceptual changes in the miRNA nomenclature scheme, which deserve further detail and clarification.

The name of a miRNA contains some human-readable information. If you stop reading this post halfway, you’ll likely think this is a good thing. Which of course it is, as long as we recognise the limitations. Hold on to the end and hopefully you’ll see that names can create some issues.

Take for example, hsa-mir-20b. The “hsa” tells us it is a human miRNA. The “20” tells us that was discovered early: it’s only the 20th family that was named. “20b” tells us that it is related to another miRNA that we can guess is probably called hsa-mir-20a. We can go further: the (lack of) capitalisation of “mir” tells us we’re talking about the miRNA precursor. Or maybe the genomic locus, or maybe the primary transcript, or maybe the extended hairpin that includes the precursor. So that’s already less useful.

hsa-mir-20b has two mature products, named hsa-miR-20b and hsa-miR-20b* (as of this moment; as you’ll see below, this will change). “miR” tells us we’re talking about a mature sequence. In this case miR-20b arises from the 5′ arm of the mir-20b hairpin, and miR-20b* arises from the 3′ arm. The “*” tells us that miR-20b* is considered a “minor” product. That means miR-20b* is found in the cell at lower concentration than miR-20b. It is often inferred that miR-20b* is non-functional, and you’ve probably noticed that miR* sequences in general magically disappear in most pictures of miRNA biogenesis, while the dominant arm is magically incorporated into the RISC complex.

But hang on a minute, a bunch of papers now tell us that miR* sequences can be functional (eg Yang et al. 2011), perhaps through binding different Argonaute proteins (a glut of papers in the past couple of years nicely reviewed by Czech and Hannon, 2011). And, of course, the miR* sequence from one hairpin might be expressed at orders of magnitude higher level than the dominant miR sequence from another hairpin. Perhaps the arm that makes the dominant product can change in different tissues, stages and species (G-J et al. 2011). Should we rename miR and miR* sequences every time someone produces an ever deeper sequencing dataset? To cap it all, the “*” character causes problems for database searches and the like.

We therefore intend to retire the miR/miR* nomenclature, in favour of the -5p/-3p nomenclature (the latter has been used in parallel for mature products of approximately equal expression, and will in future be applied to all sequences). We will make this transition in phases, as we can make companion data available to show the expression of mature products from each arm. In miRBase 17, all Drosophila melanogaster mature sequences are renamed as -5p/-3p, and many previously missing second mature products have been added. The available deep sequencing data makes clear which of the potential mature products is dominant. Other species will follow suit in due course.

The second change in miRBase 17 concerns the small number of pairs of miRNA sequences that are transcribed from the same locus in opposite directions — that is, sense/antisense pairs. For example, the dme-mir-307 locus has been shown to be transcribed in both directions, and both transcripts are processed to produce mature miRNAs. These miRNAs were previously named dme-mir-307 and dme-mir-307-as in miRBase. The -as is confusing, because it is similar to the suffixes used to denote families of related miRNAs. The classification of sense and antisense is arbitrary. To confuse matters further, -as and -s were used in early miRNA literature to refer to mature products produced from the 5′ and 3′ arms of a hairpin precursor. From miRBase 17 onwards, the -as nomenclature is retired. Sense and antisense miRNAs will be named independently and in the same way as all other sequences: If the sequences are similar then they get a, b suffixes (eg dme-mir-307a and dme-mir-307b), and if they are not deemed similar enough then they get different numbers (eg rno-mir-151 and rno-mir-3586).

The combined result of these changes is that the name of a miRNA contains less information than previously. This may seem like a retrograde step. However, the problem with encoding information in the name is that people are tempted to use it. MicroRNA names are often pragmatic compromises, and have been overloaded with relatively complex meaning, for example, regarding family relationships and expression levels. Names should be useful, but should never be used in place of the correct analysis, for example, of sequence relationships or expression. We therefore suggest that you’ll find your miRNA life easier if you bear in mind some simple concepts:

  1. Be explicit. If you are referring to the mature miR-20b sequence, you could rely on the capitalisation in miR-20b to say that for you. But it is much better to say “the mature miR-20b sequence”. Even better, show the sequence along with the name; names are not formally stable, but quoting the specific sequence you’ve used in your paper will ensure the entity is traceable forever.

  2. Never use the name to encode or derive complex meaning. If you are interested in sequence relationships, you should do some sequence analysis. If you care about expression levels of alternate mature miRNAs, look at expression data. If you derive all your information about miRNA sequence relationships from the name, you will miss a great deal. If you rely on the name to tell you about relative expression then all hope is lost.

Reference (the text above is excerpted from): http://www.mirbase.org/blog/category/nomenclature/

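Summary

miRNA names and accession numbers

1) Naming rules for mature miRNAs (using animal miRNAs as an example)

① miRNAs discovered before the naming rules were established keep their original names, e.g. hsa-let-7.

② A mature miRNA is abbreviated as miR, combined with the species abbreviation and an Arabic numeral assigned in order of discovery, e.g. hsa-miR-122.

③ Highly homologous miRNAs get a lowercase letter (a, b, c, …) after the number, e.g. hsa-miR-34a, hsa-miR-34b, and hsa-miR-34c.

④ miRNAs with identical mature sequences that are transcribed and processed from DNA sequences on different chromosomes get an additional Arabic numeral to tell them apart, e.g. hsa-miR-199a-1 and hsa-miR-199a-2.

⑤ A miRNA precursor is typically about 70-80 nt long, and each of its two arms may give rise to a mature miRNA.

The old practice was to leave the more highly expressed miRNA unmarked and to append “*” to the less highly expressed one, e.g. rno-miR-9*. Sometimes the starred miRNA was not listed at all. Starting with miRBase 17, the two products are instead named with “-5p” and “-3p”. For example, hsa-miR-26b-5p and hsa-miR-26b-3p are processed from the 5′ arm and the 3′ arm of the hsa-mir-26b precursor, respectively.

Older names sometimes also used “-s” and “-as”, but this scheme has now been retired.

For an example, see: http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=MI0000442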
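
The switch from miR/miR* to -5p/-3p can be illustrated with a short sketch. The helper `star_to_arm` below is hypothetical, not a miRBase function. It assumes you already know, from annotation or expression data rather than from the name, which hairpin arm the unstarred product comes from. The excerpt above gives one such case: miR-20b arises from the 5′ arm of mir-20b and miR-20b* from the 3′ arm.

```python
def star_to_arm(name: str, unstarred_arm: str) -> str:
    """Rewrite a legacy miR / miR* name in -5p / -3p form.

    unstarred_arm is the arm ("5p" or "3p") that the unstarred product
    comes from. It cannot be derived from the name and must be supplied.
    """
    if unstarred_arm not in ("5p", "3p"):
        raise ValueError("unstarred_arm must be '5p' or '3p'")
    other_arm = "3p" if unstarred_arm == "5p" else "5p"
    if name.endswith("*"):
        # The starred product comes from the opposite arm.
        return f"{name[:-1]}-{other_arm}"
    return f"{name}-{unstarred_arm}"


print(star_to_arm("hsa-miR-20b", "5p"))
print(star_to_arm("hsa-miR-20b*", "5p"))
```

Passing the arm in explicitly follows the excerpt's advice: the name alone should not be used to infer which product is dominant.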

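2) miRNA accession numbers and names (using animal miRNAs as an example)

miRBase records both precursor and mature miRNA sequences:

① miRNA precursors

Hairpin precursor transcripts are named with “mir”, and their accessions begin with “MI”. For example, the precursor of human miRNA 122 has the ID hsa-mir-122 and the accession MI0000442.

② Mature miRNAs

Mature miRNAs, about 20-23 nt long, are named with “miR”, and their accessions begin with “MIMAT”. For example, human miR-122 has two mature products: one has the ID hsa-miR-122-5p and the accession MIMAT0000421; the other has the ID hsa-miR-122-3p and the accession MIMAT0004590.

For an example, see: http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=MI0000442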
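
Because accessions are easy to mistype, a quick format check can help. The sketch below assumes that an accession is the prefix MI or MIMAT followed by exactly seven digits; that assumption is inferred from the examples in this post (MI0000442, MIMAT0000421), and `ACCESSION_RE` and `accession_kind` are illustrative names, not part of miRBase.

```python
import re
from typing import Optional

# Assumed format: prefix plus seven digits, inferred from the examples above.
ACCESSION_RE = re.compile(r"^(MIMAT|MI)(\d{7})$")


def accession_kind(accession: str) -> Optional[str]:
    """Return 'mature', 'precursor', or None if the format does not match."""
    m = ACCESSION_RE.match(accession)
    if m is None:
        return None
    return "mature" if m.group(1) == "MIMAT" else "precursor"


for acc in ("MI0000442", "MIMAT0000421", "MIMAT000421"):
    print(acc, accession_kind(acc))
```

A check like this only validates the shape of the string. Whether the accession exists, and which name it currently maps to, still has to be looked up in the database.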

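3) Differences in naming between kinds of organisms

① Animals:

miRNA precursor: species abbreviation + “-” + mir + “-” + number, e.g. hsa-mir-122;

Mature miRNA: species abbreviation + “-” + miR + “-” + number, e.g. hsa-miR-122-5p;

② Plants:

miRNA precursor: species abbreviation + “-” + MIR + number, e.g. ath-MIR156a. Note: MIR is all uppercase, and there is no “-” between it and the number;

Mature miRNA: species abbreviation + “-” + miR + number, e.g. ath-miR156a. Note: here miR is written in mixed case rather than all uppercase, and there is no “-” between it and the number;

③ Viruses:

miRNA precursor: viral species abbreviation + “-” + mir + “-” + identifier, e.g. bhv1-mir-B1;

Mature miRNA: viral species abbreviation + “-” + miR + “-” + identifier, e.g. bhv1-miR-B1.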
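
The three patterns differ only in capitalisation and in whether a hyphen precedes the number, which a short sketch makes concrete. The functions `precursor_name` and `mature_name` and the `kingdom` argument are hypothetical helpers written for this post. They reproduce the examples listed above and are not an exhaustive implementation of miRBase naming.

```python
from typing import Optional


def precursor_name(species: str, ident: str, kingdom: str = "animal") -> str:
    """Build a precursor name; plants use uppercase MIR with no hyphen."""
    if kingdom == "plant":
        return f"{species}-MIR{ident}"
    # Animals and viruses share the mir-<identifier> form in the examples above.
    return f"{species}-mir-{ident}"


def mature_name(species: str, ident: str, kingdom: str = "animal",
                arm: Optional[str] = None) -> str:
    """Build a mature name, optionally with a 5p/3p arm suffix."""
    if kingdom == "plant":
        base = f"{species}-miR{ident}"
    else:
        base = f"{species}-miR-{ident}"
    return f"{base}-{arm}" if arm else base


print(precursor_name("hsa", "122"))
print(mature_name("hsa", "122", arm="5p"))
print(precursor_name("ath", "156a", kingdom="plant"))
print(mature_name("bhv1", "B1", kingdom="virus"))
```

Keeping the kingdom as an explicit argument avoids guessing it from the species prefix, which the naming rules themselves do not encode.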

Sources:

Author: 生信修煉手冊
Link: https://www.jianshu.com/p/38ffb0953574
Source: 簡書 (Jianshu)

Author: gaowei2010
Link: http://meeting.dxy.cn/rbmiRNA2012/article/i18707.html
Source: 丁香園 (DXY)

Translation source:

Author: 初陽_l
Link: https://www.jianshu.com/p/5feb4740075a
Source: 簡書 (Jianshu)