A Collection of Chinese and English Word Vector Resources

2023-02-25 07:08:04

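This post collects the word vector (word embedding) models that currently perform well in NLP, organized into a Chinese section and an English section.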

1. Chinese

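For Chinese, there is no need to search around for resources: GitHub hosts a powerful toolkit for Chinese NLP, and you can simply download the vectors from there.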

Download link: https://github.com/Embedding/Chinese-Word-Vectors
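Assuming the vector files there follow the word2vec text layout described in that repository's README (a first line giving the vocabulary size and dimension, then one word and its space-separated values per line), a minimal standard-library Python sketch of a parser looks like this. The function name `load_text_vectors` is hypothetical and the sample numbers are made up.

```python
import io
from typing import Dict, List, TextIO


def load_text_vectors(stream: TextIO) -> Dict[str, List[float]]:
    """Parse word vectors stored in the word2vec text format.

    Assumed layout: the first line is "<vocab_size> <dim>"; every following
    line is a word followed by <dim> space-separated floats.
    """
    _, dim = (int(x) for x in stream.readline().split())
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        # Split on plain spaces only, so unusual Unicode whitespace inside a
        # token is not mistaken for a field separator.
        parts = line.rstrip("\r\n ").split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or truncated lines
        # The last <dim> fields are the vector; everything before is the word.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors


# Tiny in-memory sample with made-up numbers, standing in for a real file.
sample = io.StringIO("2 3\n中国 0.1 0.2 0.3\n北京 0.4 0.5 0.6\n")
vecs = load_text_vectors(sample)
print(len(vecs), vecs["北京"])  # 2 [0.4, 0.5, 0.6]
```

For a real download, pass an open file instead, e.g. `open(path, encoding="utf-8")`, or `bz2.open(path, "rt", encoding="utf-8")` if the file is `.bz2`-compressed (`path` is a placeholder). Python lists of floats are memory-hungry, so for multi-gigabyte files a NumPy array or a library such as gensim is the more practical choice.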

In addition, the word vector model released by Tencent AI Lab also performs very well: https://ai.tencent.com/ailab/nlp/embedding.html

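2. English

For English word vectors, I strongly recommend the pre-trained vectors released by the open-source word2vec, GloVe, and fastText projects; these models are cited extensively in research papers.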

word2vec: https://code.google.com/archive/p/word2vec/
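The pre-trained vectors linked from the word2vec page (the GoogleNews vectors) are distributed in word2vec's binary format, which gensim can read with `KeyedVectors.load_word2vec_format(path, binary=True)`. To show what that format looks like, here is a minimal standard-library sketch of a reader. It assumes the layout written by the original C tool: a text header line `<vocab_size> <dim>`, then for each word its bytes, one space, `<dim>` float32 values, and a newline. Little-endian byte order is an assumption (the tool writes native floats, which is little-endian on x86), and the function name is hypothetical.

```python
import io
import struct
from typing import BinaryIO, Dict, Tuple


def load_word2vec_binary(stream: BinaryIO) -> Dict[str, Tuple[float, ...]]:
    """Read vectors in the binary layout written by the original word2vec tool.

    Assumed layout: a text header line "<vocab_size> <dim>", then, per word,
    its UTF-8 bytes, a single space, and <dim> little-endian float32 values,
    followed by a newline.
    """
    vocab_size, dim = (int(x) for x in stream.readline().split())
    fmt = f"<{dim}f"              # <dim> little-endian 32-bit floats
    size = struct.calcsize(fmt)   # 4 * dim bytes
    vectors: Dict[str, Tuple[float, ...]] = {}
    for _ in range(vocab_size):
        word = bytearray()
        while True:
            ch = stream.read(1)
            if ch == b"":
                raise EOFError("file ended in the middle of a word")
            if ch == b" ":
                break             # a space ends the word
            if ch != b"\n":       # skip the newline after the previous vector
                word += ch
        raw = stream.read(size)
        if len(raw) != size:
            raise EOFError("file ended in the middle of a vector")
        vectors[word.decode("utf-8", errors="replace")] = struct.unpack(fmt, raw)
    return vectors


# Build a two-word file in memory (made-up values) and read it back.
buf = io.BytesIO()
buf.write(b"2 3\n")
for w, vec in [("king", (1.0, 2.0, 3.0)), ("queen", (0.5, -1.0, 0.25))]:
    buf.write(w.encode("utf-8") + b" " + struct.pack("<3f", *vec) + b"\n")
buf.seek(0)
print(load_word2vec_binary(buf)["queen"])  # (0.5, -1.0, 0.25)
```

For a compressed `.bin.gz` download, `gzip.open(path, "rb")` yields a binary stream that can be passed in directly. Reading byte by byte is slow in pure Python, so this is meant to illustrate the format rather than replace gensim.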

GloVe: https://nlp.stanford.edu/projects/glove/
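GloVe's pre-trained `.txt` files (e.g. `glove.6B.300d.txt`) start directly with a word and its values, without the `<vocab_size> <dim>` header line that word2vec-style loaders expect. A common workaround is to prepend that header; the sketch below does so with two passes over a seekable input. The function name `glove_to_word2vec` is hypothetical and the sample values are made up.

```python
import io
from typing import Optional, TextIO, Tuple


def glove_to_word2vec(src: TextIO, dst: TextIO) -> Tuple[int, int]:
    """Copy header-less GloVe vectors to word2vec text format.

    Two passes over `src` (so it must be seekable): the first counts the
    non-blank lines and infers the dimension from the first one, the second
    writes the "<vocab_size> <dim>" header followed by the original lines.
    """
    count = 0
    dim: Optional[int] = None
    src.seek(0)
    for line in src:
        if not line.strip():
            continue  # ignore blank lines
        count += 1
        if dim is None:
            # Number of fields minus the word itself.
            dim = len(line.rstrip("\r\n ").split(" ")) - 1
    if dim is None:
        raise ValueError("no vectors found")
    dst.write(f"{count} {dim}\n")
    src.seek(0)
    for line in src:
        if line.strip():
            dst.write(line if line.endswith("\n") else line + "\n")
    return count, dim


# Made-up two-word sample in the GloVe layout (no header line).
glove = io.StringIO("the 0.1 0.2\nof 0.3 0.4")
out = io.StringIO()
print(glove_to_word2vec(glove, out))   # (2, 2)
print(out.getvalue().splitlines()[0])  # 2 2
```

With real files, open the GloVe `.txt` for reading and a new file for writing (both with `encoding="utf-8"`); the output can then be read by any loader that expects the word2vec text header.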

fastText:

(1) Language identification: https://fasttext.cc/docs/en/language-identification.html#content

(2) Word vectors for 157 languages, trained on Common Crawl and Wikipedia: https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md
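The 157-language vectors are offered as gzip-compressed `.vec` text files (named like `cc.en.300.vec.gz`), whose first line holds the vocabulary size and dimension. Because these files are large, it can help to decompress on the fly and stop after the first N words. The sketch below does that with the standard library; the cutoff only makes sense if frequent words come first in the file, which is an assumption here, and the function name, default limit, and demo data are hypothetical.

```python
import gzip
import os
import tempfile
from typing import Dict, List


def load_vec_gz(path: str, max_words: int = 200_000) -> Dict[str, List[float]]:
    """Load at most `max_words` vectors from a gzip-compressed .vec file.

    Assumes the usual .vec layout: a "<vocab_size> <dim>" header line, then
    one word and <dim> space-separated floats per line.
    """
    vectors: Dict[str, List[float]] = {}
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        _, dim = (int(x) for x in f.readline().split())
        for line in f:
            if len(vectors) >= max_words:
                break  # stop early instead of decompressing the whole file
            parts = line.rstrip("\r\n ").split(" ")
            if len(parts) != dim + 1:
                continue  # skip malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Demo on a tiny made-up file written to a temporary location.
fd, demo_path = tempfile.mkstemp(suffix=".vec.gz")
os.close(fd)
with gzip.open(demo_path, "wt", encoding="utf-8") as g:
    g.write("3 2\na 1 2\nb 3 4\nc 5 6\n")
demo = load_vec_gz(demo_path, max_words=2)
print(demo)  # {'a': [1.0, 2.0], 'b': [3.0, 4.0]}
os.remove(demo_path)
```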

(3) The commonly used English word vectors: https://fasttext.cc/docs/en/english-vectors.html
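Once any of these vectors are loaded into a word-to-vector mapping, the usual first check is to look at a word's nearest neighbours by cosine similarity. Below is a minimal brute-force sketch; the toy 2-d vectors are invented for illustration and are not real fastText values, and `most_similar` is a hypothetical helper name, not a specific library's API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query: str, vectors: Dict[str, Sequence[float]],
                 topn: int = 3) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours of `query` by cosine similarity."""
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Made-up 2-d vectors, purely for illustration (not real fastText values).
toy = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
print([w for w, _ in most_similar("cat", toy, topn=2)])  # ['dog', 'car']
```

This scans every word for each query, which is fine for a toy vocabulary but slow for a million-word `.vec` file in pure Python; in practice the vectors are usually stacked into a row-normalized NumPy matrix so that one matrix-vector product scores all words at once.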