天天看點

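32 common NLP tasks, with their evaluation datasets, evaluation metrics, state-of-the-art (SOTA) results at the time of writing, and the corresponding papers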

| Task | Corpus/dataset | Metric | SOTA result | Paper |
|---|---|---|---|---|
| Chunking | Penn Treebank | F1 | 95.77 | A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks |
| Common sense reasoning | Event2Mind | Cross-entropy | 4.22 | Event2Mind: Commonsense Inference on Events, Intents, and Reactions |
| Parsing (constituency) | Penn Treebank | F1 | 95.13 | Constituency Parsing with a Self-Attentive Encoder |
| Coreference resolution | CoNLL 2012 | Average F1 | 73 | Higher-order Coreference Resolution with Coarse-to-fine Inference |
| Dependency parsing | Penn Treebank | POS / UAS / LAS | 97.3 / 95.44 / 93.76 | Deep Biaffine Attention for Neural Dependency Parsing |
| Task-oriented dialogue / intent detection | ATIS / Snips | Accuracy | 94.1 / 97.0 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
| Task-oriented dialogue / slot filling | ATIS / Snips | F1 | 95.2 / 88.8 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
| Task-oriented dialogue / dialogue state tracking | DSTC2 | Area / Food / Price / Joint | 90 / 84 / 92 / 72 | Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems |
| Domain adaptation | Multi-Domain Sentiment Dataset | Average accuracy | 79.15 | Strong Baselines for Neural Semi-supervised Learning under Domain Shift |
| Entity linking | AIDA CoNLL-YAGO | Micro-F1-strong / Macro-F1-strong | 86.6 / 89.4 | End-to-End Neural Entity Linking |
| Information extraction | ReVerb45K | Precision / Recall / F1 | 62.7 / 84.4 / 81.9 | CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information |
| Grammatical error correction | JFLEG | GLEU | 61.5 | Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation |
| Language modeling | Penn Treebank | Validation perplexity / Test perplexity | 48.33 / 47.69 | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model |
| Lexical normalization | LexNorm2015 | F1 / Precision / Recall | 86.39 / 93.53 / 80.26 | MoNoise: Modeling Noise Using a Modular Normalization System |
| Machine translation | WMT 2014 EN-DE | BLEU | 35.0 | Understanding Back-Translation at Scale |
| Multimodal emotion recognition | IEMOCAP | Accuracy | 76.5 | Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling |
| Multimodal metaphor recognition | Verb-noun pairs / adjective-noun pairs | F1 | 0.75 / 0.79 | Black Holes and White Rabbits: Metaphor Identification with Visual Features |
| Multimodal sentiment analysis | MOSI | Accuracy | 80.3 | Context-Dependent Sentiment Analysis in User-Generated Videos |
| Named entity recognition | CoNLL 2003 | F1 | 93.09 | Contextual String Embeddings for Sequence Labeling |
| Natural language inference | SciTail | Accuracy | 88.3 | Improving Language Understanding by Generative Pre-Training |
| Part-of-speech tagging | Penn Treebank | Accuracy | 97.96 | Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings |
| Question answering | CliCR | F1 | 33.9 | CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension |
| Word segmentation | VLSP 2013 | F1 | 97.90 | A Fast and Accurate Vietnamese Word Segmenter |
| Word sense disambiguation | SemEval 2015 | F1 | 67.1 | Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison |
| Text classification | AG News | Error rate | 5.01 | Universal Language Model Fine-tuning for Text Classification |
| Summarization | Gigaword | ROUGE-1 / ROUGE-2 / ROUGE-L | 37.04 / 19.03 / 34.46 | Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization |
| Sentiment analysis | IMDb | Accuracy | 95.4 | Universal Language Model Fine-tuning for Text Classification |
| Semantic role labeling | OntoNotes | F1 | 85.5 | Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling |
| Semantic parsing | LDC2014T12 | F1 (newswire) / F1 (full) | 0.71 / 0.66 | AMR Parsing with an Incremental Joint Model |
| Semantic textual similarity | SentEval | MRPC; SICK-R; SICK-E; STS | 78.6/84.4; 0.888; 87.8; 78.9/78.6 | Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning |
| Relation extraction | New York Times Corpus | P@10% / P@30% | 73.6 / 59.5 | RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information |
| Relation prediction | WN18RR | H@10 / H@1 / MRR | 59.02 / 45.37 / 49.83 | Predicting Semantic Relations using Global Graph Properties |
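
Most rows in the table report F1, the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). The entity-linking row also separates micro-averaged F1 (pool the counts across all groups, then compute F1 once) from macro-averaged F1 (compute F1 for each group, such as each class or each document, then take the unweighted mean). The following is a minimal Python sketch of these textbook definitions, assuming true-positive, false-positive and false-negative counts are already available. `Counts`, `micro_f1` and `macro_f1` are hypothetical helpers, not the scoring scripts used by the cited papers. The official evaluation script of each benchmark also fixes details such as what counts as a correct match.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical container for the confusion counts of one group (class or document)."""
    tp: int  # true positives: predicted and correct
    fp: int  # false positives: predicted but wrong
    fn: int  # false negatives: gold items that were missed


def precision_recall_f1(c: Counts) -> tuple[float, float, float]:
    """Precision, recall and their harmonic mean; empty denominators yield 0.0."""
    p = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    r = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def micro_f1(groups: list[Counts]) -> float:
    """Pool the counts of every group first, then compute a single F1."""
    pooled = Counts(
        tp=sum(g.tp for g in groups),
        fp=sum(g.fp for g in groups),
        fn=sum(g.fn for g in groups),
    )
    return precision_recall_f1(pooled)[2]


def macro_f1(groups: list[Counts]) -> float:
    """Compute F1 per group, then take the unweighted mean."""
    return sum(precision_recall_f1(g)[2] for g in groups) / len(groups)


groups = [Counts(tp=8, fp=2, fn=2), Counts(tp=1, fp=0, fn=3)]
print(round(micro_f1(groups), 4))  # 0.72
print(round(macro_f1(groups), 4))  # 0.6
```

In this toy example the first group has most of the items, so it dominates the micro average (0.72). The macro average (0.6) gives the weak second group (F1 = 0.4) the same weight as the strong first one (F1 = 0.8).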

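The WN18RR relation-prediction results are ranking metrics. For each test query the model scores the candidate entities, and the metric depends on the 1-based rank of the correct answer. Mean reciprocal rank (MRR) averages 1/rank, and Hits@k is the fraction of queries whose correct answer lands in the top k. The table reports both as percentages. The following is a minimal Python sketch, assuming the ranks have already been computed. Many knowledge-graph benchmarks first filter out other known-true candidates before ranking, and that step is not shown here. The function names are hypothetical.

```python
def mean_reciprocal_rank(ranks: list[int]) -> float:
    """Average of 1/rank, where rank is the 1-based position of the correct answer."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: list[int], k: int) -> float:
    """Fraction of queries whose correct answer is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct answer for four test queries.
ranks = [1, 3, 12, 2]
print(round(mean_reciprocal_rank(ranks), 4))  # 0.4792
print(hits_at_k(ranks, 1))   # 0.25
print(hits_at_k(ranks, 10))  # 0.75
```

MRR rewards getting the answer near the top without a hard cutoff, while Hits@k only asks whether the answer made the cut. This is why the two can rank systems differently.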
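
The language-modeling row reports perplexity, which is the exponential of the per-token cross-entropy. If the model assigns probability p_i to each of the N gold tokens, then PPL = exp(-(1/N) Σ log p_i), using natural logarithms. The following is a minimal Python sketch, assuming the per-token natural-log probabilities have already been obtained from some model. The helper name is hypothetical.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """exp of the average negative log-likelihood (cross-entropy in nats per token)."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# A model that spreads probability uniformly over 4 choices at every step
# has perplexity 4: it is as uncertain as a fair 4-way choice.
uniform = [math.log(0.25)] * 10
print(round(perplexity(uniform), 6))  # 4.0
```

Because perplexity is a monotonic function of cross-entropy, lowering one always lowers the other. For example, a perplexity of 48 corresponds to roughly ln 48 ≈ 3.87 nats per token.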