Deep contextualized word representations (ELMo): Reading Notes

2023-08-04 15:40:52

ELMo (Embeddings from Language Models)

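The paper introduces a new type of deep contextualized word representation that models both the complex characteristics of word use (such as syntax and semantics) and how those uses vary across linguistic contexts (polysemy).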

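Unlike traditional word type embeddings, each token's representation is a function of the entire input sentence. The representations are computed on top of a two-layer biLM with character convolutions.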

Bidirectional language model

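Given a sequence of N tokens $(t_1, t_2, \ldots, t_N)$, a forward language model computes the probability of the sequence by modeling each token $t_k$ given the tokens that precede it:

$$p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_1, t_2, \ldots, t_{k-1})$$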

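A backward language model runs over the sequence in reverse, predicting each token $t_k$ from the tokens that follow it:

$$p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_{k+1}, t_{k+2}, \ldots, t_N)$$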

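A biLM combines the two models and jointly maximizes the log-likelihood of the forward and backward directions:

$$\sum_{k=1}^{N} \Big( \log p(t_k \mid t_1, \ldots, t_{k-1}; \Theta_x, \overrightarrow{\Theta}_{LSTM}, \Theta_s) + \log p(t_k \mid t_{k+1}, \ldots, t_N; \Theta_x, \overleftarrow{\Theta}_{LSTM}, \Theta_s) \Big)$$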
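As a rough illustration of the joint objective, the sketch below sums the forward and backward log-probabilities over one sequence. The function name `bilm_log_likelihood` and the probability lists are hypothetical; in an actual biLM these conditionals would come from the LSTM outputs followed by the softmax layer.

```python
import math

def bilm_log_likelihood(forward_probs, backward_probs):
    """Sum of forward and backward log-probabilities over one sequence.

    forward_probs[k]  stands for p(t_k | t_1, ..., t_{k-1})
    backward_probs[k] stands for p(t_k | t_{k+1}, ..., t_N)
    Both lists are hypothetical model outputs, one entry per token.
    """
    if len(forward_probs) != len(backward_probs):
        raise ValueError("both directions must score the same tokens")
    return sum(math.log(p_fwd) + math.log(p_bwd)
               for p_fwd, p_bwd in zip(forward_probs, backward_probs))

# Toy probabilities for a 3-token sentence (made-up numbers).
fwd = [0.5, 0.25, 0.5]
bwd = [0.25, 0.5, 0.5]
print(bilm_log_likelihood(fwd, bwd))
```

Training would maximize this quantity (or minimize its negative) summed over all sentences in the corpus.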

The LSTM parameters of the two directions, $\overrightarrow{\Theta}_{LSTM}$ and $\overleftarrow{\Theta}_{LSTM}$, are separate. $\Theta_x$ denotes the parameters of the input token representation and $\Theta_s$ denotes the softmax layer parameters (both shared between the two directions).

ELMo

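For each token $t_k$, an L-layer biLM computes a set of 2L+1 representations:

$$R_k = \{ x_k^{LM}, \overrightarrow{h}_{k,j}^{LM}, \overleftarrow{h}_{k,j}^{LM} \mid j = 1, \ldots, L \} = \{ h_{k,j}^{LM} \mid j = 0, \ldots, L \}$$

where $h_{k,0}^{LM}$ is the input token layer and $h_{k,j}^{LM} = [\overrightarrow{h}_{k,j}^{LM}; \overleftarrow{h}_{k,j}^{LM}]$ for each biLSTM layer.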
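To make the bookkeeping concrete, here is a minimal sketch of how the 2L+1 vectors for one token can be grouped into L+1 layers. The name `stack_layers` is hypothetical, and duplicating the token vector so that layer 0 has the same width as the concatenated LSTM layers is an assumption made here so that the layers can later be summed; the notes above do not specify it.

```python
def stack_layers(x_k, forward_states, backward_states):
    """Group the 2L+1 vectors of one token into L+1 layers.

    x_k             : token-layer vector (list of floats)
    forward_states  : L forward LSTM hidden vectors for this token
    backward_states : L backward LSTM hidden vectors for this token
    """
    if len(forward_states) != len(backward_states):
        raise ValueError("need the same number of layers in both directions")
    # Layer 0: token layer, duplicated to match the width of [fwd; bwd]
    # (an assumption of this sketch).
    layers = [list(x_k) + list(x_k)]
    # Layers 1..L: concatenate the forward and backward hidden states.
    for h_fwd, h_bwd in zip(forward_states, backward_states):
        layers.append(list(h_fwd) + list(h_bwd))
    return layers

# Toy example with L = 2 layers and 2-dimensional states (made-up numbers).
layers = stack_layers([1.0, 2.0],
                      [[0.1, 0.2], [0.3, 0.4]],
                      [[0.5, 0.6], [0.7, 0.8]])
print(len(layers))  # L + 1 layers
```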

For a specific downstream task, ELMo collapses all layers in $R_k$ into a single vector:

$$ELMo_k^{task} = E(R_k; \Theta^{task}) = \gamma^{task} \sum_{j=0}^{L} s_j^{task} h_{k,j}^{LM}$$

where $\gamma^{task}$ is a scalar that lets the task model scale the entire ELMo vector, and $s^{task}$ are softmax-normalized weights.
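The weighted combination can be sketched in a few lines of plain Python. The names `softmax` and `elmo_vector` are hypothetical helpers, and the raw weights and gamma stand in for parameters that would be learned together with the task model.

```python
import math

def softmax(weights):
    """Normalize raw scalar weights so they are positive and sum to 1."""
    m = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def elmo_vector(layers, raw_weights, gamma=1.0):
    """Collapse L+1 layer vectors of one token into a single vector.

    layers      : list of L+1 equal-length vectors (h_{k,0} ... h_{k,L})
    raw_weights : L+1 unnormalized scalars; softmax turns them into s_j
    gamma       : task-specific scale applied to the whole vector
    """
    if len(layers) != len(raw_weights):
        raise ValueError("one weight per layer is required")
    s = softmax(raw_weights)
    dim = len(layers[0])
    return [gamma * sum(s_j * layer[i] for s_j, layer in zip(s, layers))
            for i in range(dim)]

# Equal raw weights give a plain average of the layers, scaled by gamma.
layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(elmo_vector(layers, [0.0, 0.0, 0.0], gamma=2.0))
```

With equal raw weights the result is simply gamma times the mean of the layers; learned weights let each task emphasize the layers that help it most.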

To use ELMo in a supervised model for a downstream task, first freeze the weights of the biLM, then concatenate $ELMo_k^{task}$ with the token representation $x_k$ to obtain the ELMo-enhanced representation $[x_k; ELMo_k^{task}]$, which is fed into the NLP model.
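The concatenation step itself is simple. A minimal sketch, assuming both representations are available as plain lists of floats per token (the function name `enhance_sentence` is hypothetical):

```python
def enhance_sentence(token_vectors, elmo_vectors):
    """Build [x_k; ELMo_k] for every token k in a sentence.

    token_vectors : per-token context-independent vectors x_k
    elmo_vectors  : per-token ELMo vectors from the frozen biLM
    """
    if len(token_vectors) != len(elmo_vectors):
        raise ValueError("need one ELMo vector per token")
    # Concatenate along the feature dimension, token by token.
    return [list(x_k) + list(e_k)
            for x_k, e_k in zip(token_vectors, elmo_vectors)]

# Two tokens, 2-dim x_k and 3-dim ELMo vectors (made-up numbers).
enhanced = enhance_sentence([[1.0, 2.0], [3.0, 4.0]],
                            [[0.5, 0.5, 0.5], [0.1, 0.2, 0.3]])
print(len(enhanced[0]))  # width is dim(x_k) + dim(ELMo_k)
```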

