32 common NLP tasks, with the evaluation dataset, evaluation metric, current SOTA result, and corresponding paper for each

Task	Description	Corpus/dataset	Metric	SOTA result	Paper
Chunking	Chunk analysis	Penn Treebank	F1	95.77	A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
Common sense reasoning	Commonsense reasoning	Event2Mind	cross-entropy	4.22	Event2Mind: Commonsense Inference on Events, Intents, and Reactions
Parsing	Syntactic parsing	Penn Treebank	F1	95.13	Constituency Parsing with a Self-Attentive Encoder
Coreference resolution	Coreference resolution	CoNLL 2012	average F1	73	Higher-order Coreference Resolution with Coarse-to-fine Inference
Dependency parsing	Dependency syntactic parsing	Penn Treebank	POS, UAS, LAS	97.3, 95.44, 93.76	Deep Biaffine Attention for Neural Dependency Parsing
Task-Oriented Dialogue/Intent Detection	Task-oriented dialogue / intent recognition	ATIS/Snips	accuracy	94.1, 97.0	Slot-Gated Modeling for Joint Slot Filling and Intent Prediction
Task-Oriented Dialogue/Slot Filling	Task-oriented dialogue / slot filling	ATIS/Snips	F1	95.2, 88.8	Slot-Gated Modeling for Joint Slot Filling and Intent Prediction
Task-Oriented Dialogue/Dialogue State Tracking	Task-oriented dialogue / state tracking	DSTC2	Area, Food, Price, Joint	90, 84, 92, 72	Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems
Domain adaptation	Domain adaptation	Multi-Domain Sentiment Dataset	average accuracy	79.15	Strong Baselines for Neural Semi-supervised Learning under Domain Shift
Entity Linking	Entity linking	AIDA CoNLL-YAGO	Micro-F1-strong, Macro-F1-strong	86.6, 89.4	End-to-End Neural Entity Linking
Information Extraction	Information extraction	ReVerb45K	Precision, Recall, F1	62.7, 84.4, 81.9	CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information
Grammatical Error Correction	Grammatical error correction	JFLEG	GLEU	61.5	Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation
Language modeling	Language model	Penn Treebank	Validation perplexity, Test perplexity	48.33, 47.69	Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Lexical Normalization	Lexical normalization	LexNorm2015	F1, Precision, Recall	86.39, 93.53, 80.26	MoNoise: Modeling Noise Using a Modular Normalization System
Machine translation	Machine translation	WMT 2014 EN-DE	BLEU	35.0	Understanding Back-Translation at Scale
Multimodal Emotion Recognition	Multimodal emotion recognition	IEMOCAP	Accuracy	76.5	Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling
Multimodal Metaphor Recognition	Multimodal metaphor recognition	verb-noun pairs, adjective-noun pairs	F1	0.75, 0.79	Black Holes and White Rabbits: Metaphor Identification with Visual Features
Multimodal Sentiment Analysis	Multimodal sentiment analysis	MOSI	Accuracy	80.3	Context-Dependent Sentiment Analysis in User-Generated Videos
Named entity recognition	Named entity recognition	CoNLL 2003	F1	93.09	Contextual String Embeddings for Sequence Labeling
Natural language inference	Natural language inference	SciTail	Accuracy	88.3	Improving Language Understanding by Generative Pre-Training
Part-of-speech tagging	Part-of-speech tagging	Penn Treebank	Accuracy	97.96	Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings
Question answering	Question answering	CliCR	F1	33.9	CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension
Word segmentation	Word segmentation	VLSP 2013	F1	97.90	A Fast and Accurate Vietnamese Word Segmenter
Word Sense Disambiguation	Word sense disambiguation	SemEval 2015	F1	67.1	Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison
Text classification	Text classification	AG News	Error rate	5.01	Universal Language Model Fine-tuning for Text Classification
Summarization	Summarization	Gigaword	ROUGE-1, ROUGE-2, ROUGE-L	37.04, 19.03, 34.46	Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization
Sentiment analysis	Sentiment analysis	IMDb	Accuracy	95.4	Universal Language Model Fine-tuning for Text Classification
Semantic role labeling	Semantic role labeling	OntoNotes	F1	85.5	Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling
Semantic parsing	Semantic parsing	LDC2014T12	F1 Newswire, F1 Full	0.71, 0.66	AMR Parsing with an Incremental Joint Model
Semantic textual similarity	Semantic textual similarity	SentEval	MRPC, SICK-R, SICK-E, STS	78.6/84.4, 0.888, 87.8, 78.9/78.6	Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
Relationship Extraction	Relation extraction	New York Times Corpus	P@10%, P@30%	73.6, 59.5	RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information
Relation Prediction	Relation prediction	WN18RR	H@10, H@1, MRR	59.02, 45.37, 49.83	Predicting Semantic Relations using Global Graph Properties
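
Many rows in the table report precision, recall, or F1. As a reminder of how the three relate, here is a minimal Python sketch. The function name `precision_recall_f1` and the counts in the example are illustrative assumptions, not taken from any of the listed papers, and individual benchmarks often apply their own matching and averaging rules (span-level matching, micro or macro averaging) on top of this basic definition.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) computed from raw counts.

    Hypothetical helper for illustration only; real benchmark scorers
    usually add task-specific matching and averaging rules.
    """
    predicted = true_positives + false_positives   # everything the system output
    actual = true_positives + false_negatives      # everything in the gold data
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts: 80 correct predictions, 20 spurious, 10 missed.
p, r, f1 = precision_recall_f1(80, 20, 10)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it always lies between the precision and recall values it is computed from, which is a quick sanity check when reading a reported result.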
