BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (BERT)
(1) Masked language model (MLM) (like a cloze test: a token in the sentence is masked out and the model must predict it):
The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked word based only on its context.
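Below is a minimal Python sketch of this masking step, assuming the paper's 15% selection rate; the names mask_tokens, MASK_TOKEN, and IGNORE_LABEL are illustrative, and the paper's full recipe additionally keeps some selected tokens unchanged or swaps them for random tokens instead of always inserting [MASK].

```python
import random

MASK_TOKEN = "[MASK]"
IGNORE_LABEL = -100  # unmasked positions carry no loss (a common convention)

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Randomly select ~mask_prob of the tokens, replace them with [MASK],
    and return (masked_tokens, labels): labels hold the original vocabulary
    id only at masked positions, IGNORE_LABEL everywhere else."""
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [IGNORE_LABEL] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = vocab[tok]   # prediction target: original vocabulary id
            masked[i] = MASK_TOKEN   # input: token hidden from the model
    return masked, labels

# toy usage
vocab = {w: i for i, w in enumerate(["my", "dog", "is", "hairy", MASK_TOKEN])}
print(mask_tokens(["my", "dog", "is", "hairy"], vocab, seed=0))
```

The model only receives the masked sequence, so it must rely on the bidirectional context to recover the original ids at the masked positions.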
3.1 Input Representation
BERT's input sequence can be either a single sentence (e.g., for sequence labeling or text classification tasks) or a pair of sentences (e.g., for QA tasks). The input features include:
(1) WordPiece token embeddings, with a [CLS] token prepended to every sequence and a [SEP] token separating the two sentences of a pair;
(2) segment embeddings indicating whether each token belongs to sentence A or sentence B;
(3) learned position embeddings.
The final input representation is the element-wise sum of these three embeddings, as sketched below.
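A minimal PyTorch sketch of how these three embeddings could be summed; the class name BertInputEmbeddings, the toy token ids, and the default sizes (BERT-base-like hidden size 768, max length 512) are illustrative, and a full implementation would typically also apply layer normalization and dropout to the summed result.

```python
import torch
import torch.nn as nn

class BertInputEmbeddings(nn.Module):
    """Sum of token, segment, and position embeddings (illustrative sizes)."""
    def __init__(self, vocab_size=30522, hidden=768, max_len=512, n_segments=2):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)   # WordPiece token embeddings
        self.seg = nn.Embedding(n_segments, hidden)   # sentence A / sentence B
        self.pos = nn.Embedding(max_len, hidden)      # learned position embeddings

    def forward(self, token_ids, segment_ids):
        # token_ids, segment_ids: (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        positions = positions.unsqueeze(0).expand_as(token_ids)
        return self.tok(token_ids) + self.seg(segment_ids) + self.pos(positions)

# toy usage: "[CLS] sentence A [SEP] sentence B [SEP]" with made-up token ids
emb = BertInputEmbeddings()
token_ids = torch.tensor([[101, 2026, 3899, 102, 2003, 2606, 102]])
segment_ids = torch.tensor([[0, 0, 0, 0, 1, 1, 1]])
print(emb(token_ids, segment_ids).shape)  # torch.Size([1, 7, 768])
```

Because the segment and position information is added directly into every token's embedding, the Transformer encoder needs no other signal to tell which sentence a token comes from or where it sits in the sequence.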