squad_convert_example_to_features

2021-10-15 10:40:00

最近在看QA，對dataset不是很了解，是以看了一下pytorch中的squad_convert_example_to_features。

以下為pytorch源代碼：

其中example資料大緻呈現（不完整）：

關于encode_plus的解釋：

結果：

結果：此時doc_stride=0

{'overflowing_tokens': [7592, 1010, 2026, 2365, 2003, 3013, 2075, 1012], 'num_truncated_tokens': 6, 'input_ids': [101, 7592, 1010, 2026, 2365, 2003, 5870, 1012, 102, 7592, 1010, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

結果：此時doc_stirde=1

{'overflowing_tokens': [1010, 2026, 2365, 2003, 3013, 2075, 1012], 'num_truncated_tokens': 6, 'input_ids': [101, 7592, 1010, 2026, 2365, 2003, 5870, 1012, 102, 7592, 1010, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

結果：此時doc_stride=2

{'overflowing_tokens': [2026, 2365, 2003, 3013, 2075, 1012], 'num_truncated_tokens': 6, 'input_ids': [101, 7592, 1010, 2026, 2365, 2003, 5870, 1012, 102, 7592, 1010, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

pytorch中源代碼：

在生成dataset的時候，若樣本中的contetx長度過長，将會進行分段（question+context）,當作一組訓練資料。

squad_convert_example_to_features

繼續閱讀

HDU 4721 Food and Productivity

ZOJ 1041 Transmitters

CSU 1562 Fun House

CodeChef PALPROB Palindromeness

UVA 10344- 23 out of 5

ZOJ 1104 Leaps Tall Buildings

ZOJ 3700 Ever Dream

HDU 2821 Pusher

ZOJ 1199 Point of Intersection

UVA 1401 Remember the Word

ZOJ 2748 Free Kick

CSU 1567 Reverse Rot

JAVA 系列——>開發工具IntelliJ IDEA的安裝以及配置、快捷鍵IDEA 簡介

UVA 519 Puzzle (II)

磁盤結構及在Linux中的命名

詳解STM32單片機的堆棧