Task | Description | Corpus/Dataset | Evaluation metric | SOTA result | Papers |
--- | --- | --- | --- | --- | --- |
Chunking | chunking (shallow parsing) | Penn Treebank | F1 | 95.77 | A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks |
Common sense reasoning | commonsense reasoning | Event2Mind | cross-entropy | 4.22 | Event2Mind: Commonsense Inference on Events, Intents, and Reactions |
Parsing | constituency parsing | Penn Treebank | F1 | 95.13 | Constituency Parsing with a Self-Attentive Encoder |
Coreference resolution | coreference resolution | CoNLL 2012 | average F1 | 73 | Higher-order Coreference Resolution with Coarse-to-fine Inference |
Dependency parsing | dependency parsing | Penn Treebank | POS UAS LAS | 97.3 95.44 93.76 | Deep Biaffine Attention for Neural Dependency Parsing |
Task-Oriented Dialogue/Intent Detection | task-oriented dialogue / intent detection | ATIS/Snips | accuracy | 94.1 97.0 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
Task-Oriented Dialogue/Slot Filling | task-oriented dialogue / slot filling | ATIS/Snips | F1 | 95.2 88.8 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
Task-Oriented Dialogue/Dialogue State Tracking | task-oriented dialogue / dialogue state tracking | DSTC2 | Area Food Price Joint | 90 84 92 72 | Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems |
Domain adaptation | domain adaptation | Multi-Domain Sentiment Dataset | average accuracy | 79.15 | Strong Baselines for Neural Semi-supervised Learning under Domain Shift |
Entity Linking | entity linking | AIDA CoNLL-YAGO | Micro-F1-strong Macro-F1-strong | 86.6 89.4 | End-to-End Neural Entity Linking |
Information Extraction | information extraction | ReVerb45K | Precision Recall F1 | 62.7 84.4 81.9 | CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information |
Grammatical Error Correction | grammatical error correction | JFLEG | GLEU | 61.5 | Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation |
Language modeling | language modeling | Penn Treebank | Validation perplexity Test perplexity | 48.33 47.69 | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model |
Lexical Normalization | lexical normalization | LexNorm2015 | F1 Precision Recall | 86.39 93.53 80.26 | MoNoise: Modeling Noise Using a Modular Normalization System |
Machine translation | machine translation | WMT 2014 EN-DE | BLEU | 35.0 | Understanding Back-Translation at Scale |
Multimodal Emotion Recognition | multimodal emotion recognition | IEMOCAP | Accuracy | 76.5 | Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling |
Multimodal Metaphor Recognition | multimodal metaphor recognition | verb-noun pairs adjective-noun pairs | F1 | 0.75 0.79 | Black Holes and White Rabbits: Metaphor Identification with Visual Features |
Multimodal Sentiment Analysis | multimodal sentiment analysis | MOSI | Accuracy | 80.3 | Context-Dependent Sentiment Analysis in User-Generated Videos |
Named entity recognition | named entity recognition | CoNLL 2003 | F1 | 93.09 | Contextual String Embeddings for Sequence Labeling |
Natural language inference | natural language inference | SciTail | Accuracy | 88.3 | Improving Language Understanding by Generative Pre-Training |
Part-of-speech tagging | part-of-speech tagging | Penn Treebank | Accuracy | 97.96 | Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings |
Question answering | question answering | CliCR | F1 | 33.9 | CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension |
Word segmentation | word segmentation | VLSP 2013 | F1 | 97.90 | A Fast and Accurate Vietnamese Word Segmenter |
Word Sense Disambiguation | word sense disambiguation | SemEval 2015 | F1 | 67.1 | Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison |
Text classification | text classification | AG News | Error rate | 5.01 | Universal Language Model Fine-tuning for Text Classification |
Summarization | summarization | Gigaword | ROUGE-1 ROUGE-2 ROUGE-L | 37.04 19.03 34.46 | Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization |
Sentiment analysis | sentiment analysis | IMDb | Accuracy | 95.4 | Universal Language Model Fine-tuning for Text Classification |
Semantic role labeling | semantic role labeling | OntoNotes | F1 | 85.5 | Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling |
Semantic parsing | semantic parsing | LDC2014T12 | F1 Newswire F1 Full | 0.71 0.66 | AMR Parsing with an Incremental Joint Model |
Semantic textual similarity | semantic textual similarity | SentEval | MRPC SICK-R SICK-E STS | 78.6/84.4 0.888 87.8 78.9/78.6 | Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning |
Relationship Extraction | relation extraction | New York Times Corpus | P@10% P@30% | 73.6 59.5 | RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information |
Relation Prediction | relation prediction | WN18RR | H@10 H@1 MRR | 59.02 45.37 49.83 | Predicting Semantic Relations using Global Graph Properties |
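Many rows above report F1, the harmonic mean of precision and recall. As a quick sanity check on how the columns relate, here is a minimal sketch (plain Python, no dependencies; the function name is our own) that reproduces the Lexical Normalization row, where precision 93.53 and recall 80.26 yield the listed F1 of 86.39:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# MoNoise on LexNorm2015 (Lexical Normalization row above):
print(round(f1_score(93.53, 80.26), 2))  # → 86.39
```

Note that not every F1-style column decomposes this way: rows with several numbers in one cell (e.g. the Information Extraction row) report separate scores from the cited paper rather than a single precision/recall/F1 triple.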