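Table of Contents
- 1. Introduction to the COCO dataset
- 2. COCO annotation format
  - 2.1 Object Instance (instance segmentation) file format
    - 2.1.1 Contents of info
    - 2.1.2 Contents of licenses
    - 2.1.3 Contents of images
    - 2.1.4 Contents of annotations
    - 2.1.5 Contents of categories
  - 2.2 Object Keypoint (keypoint detection) file format
    - 2.2.1 Contents of annotations
    - 2.2.2 Contents of categories
  - 2.3 Image Caption file format
    - 2.3.1 Contents of annotation
- References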
This article is mainly a walkthrough to get familiar with the COCO dataset.
1. Introduction to the COCO dataset
First, two links: the first is , the second is .
These two links are enough to get a good understanding of COCO.
The dataset is organised as follows:
#step1: download the dataset
2017 Train images [118K/18GB]
2017 Val images [5K/1GB]
2017 Test images [41K/6GB]
2017 Train/Val annotations [241MB]
#step2: arrange the folders according to the structure below
coco
├── annotations
│ ├── instances_train2014.json
│ ├── instances_train2017.json
│ ├── instances_val2014.json
│ ├── instances_val2017.json
│ | ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ | ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
│ | ...
| ...
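2. COCO annotation format
This part is mainly compiled from https://zhuanlan.zhihu.com/p/70878433. You can also read the original directly; I just found it somewhat disorganised, which makes it hard to get an overall grasp of the dataset.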
The COCO dataset was largely collected with Amazon Mechanical Turk. COCO currently provides three main annotation types:
- object instances
- object keypoints
- image captions
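The annotations are stored as JSON files. The train and val annotation files of COCO2017 are described below.
The original download is annotations_trainval2017.zip, which extracts to an annotations folder. There are three annotation types, each with a training file and a validation file, for 6 JSON files in total.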
2.1 Object Instance (instance segmentation) file format
Taking instances_val2017.json as an example (the validation file is smaller and opens faster), the overall format is:
{
"info": info,
"licenses": [license],
"images":[image],
"annotations":[annotation],
"categories":[category]
}
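- images is a list whose length equals the number of images in the training (or validation) set
- annotations is also a list whose length equals the number of bounding boxes (annotated objects) in the training (or validation) set
- categories is also a list whose length equals the number of categories; COCO2017 has 80 categories
The whole JSON file is one large dictionary.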
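Since the file is just one JSON dictionary, the standard library is enough to load it and check the sizes of the fields described above. This is a minimal sketch; the helper names `load_coco` and `summarize` are hypothetical, and the path in the usage note assumes the folder layout from #step2.

```python
import json
from pathlib import Path


def load_coco(path):
    """Read a COCO annotation JSON file into a plain Python dict."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def summarize(coco):
    """Report every top-level key: list fields give their length, others their type name."""
    return {
        key: len(value) if isinstance(value, list) else type(value).__name__
        for key, value in coco.items()
    }
```

For example, `summarize(load_coco("coco/annotations/instances_val2017.json"))` should report `info` as a dict and 80 entries under `categories`, with `images` and `annotations` matching the sizes of the validation set.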
2.1.1 Contents of info
"info": {
"description": "COCO 2017 Dataset",
"url": "http://cocodataset.org",
"version": "1.0",
"year": 2017,
"contributor": "COCO Consortium",
"date_created": "2017/09/01"
},
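info holds basic information such as the creation date, version and contributor. It is of little practical value and can be ignored.
2.1.2 Contents of licenses
There is not much here, so it is listed in full: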
"licenses": [
{
"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
"id": 1,
"name": "Attribution-NonCommercial-ShareAlike License"
},
{
"url": "http://creativecommons.org/licenses/by-nc/2.0/",
"id": 2,
"name": "Attribution-NonCommercial License"
},
{
"url": "http://creativecommons.org/licenses/by-nc-nd/2.0/",
"id": 3,
"name": "Attribution-NonCommercial-NoDerivs License"
},
{
"url": "http://creativecommons.org/licenses/by/2.0/",
"id": 4,
"name": "Attribution License"
},
{
"url": "http://creativecommons.org/licenses/by-sa/2.0/",
"id": 5,
"name": "Attribution-ShareAlike License"
},
{
"url": "http://creativecommons.org/licenses/by-nd/2.0/",
"id": 6,
"name": "Attribution-NoDerivs License"
},
{
"url": "http://flickr.com/commons/usage/",
"id": 7,
"name": "No known copyright restrictions"
},
{
"url": "http://www.usa.gov/copyright.shtml",
"id": 8,
"name": "United States Government Work"
}
],
There are 8 entries in total; they are also of little value and can be ignored.
2.1.3 Contents of images
There are many entries; here are a few:
"images": [
{
"license": 4,
"file_name": "000000397133.jpg",
"coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg",
"height": 427,
"width": 640,
"date_captured": "2013-11-14 17:02:52",
"flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg",
"id": 397133
},
{
"license": 1,
"file_name": "000000037777.jpg",
"coco_url": "http://images.cocodataset.org/val2017/000000037777.jpg",
"height": 230,
"width": 352,
"date_captured": "2013-11-14 20:55:31",
"flickr_url": "http://farm9.staticflickr.com/8429/7839199426_f6d48aa585_z.jpg",
"id": 37777
},
{
"license": 4,
"file_name": "000000252219.jpg",
"coco_url": "http://images.cocodataset.org/val2017/000000252219.jpg",
"height": 428,
"width": 640,
"date_captured": "2013-11-14 22:32:02",
"flickr_url": "http://farm4.staticflickr.com/3446/3232237447_13d84bd0a1_z.jpg",
"id": 252219
},
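images is a list in which each element is a dictionary describing one image. The fields are:
- license: not useful
- file_name: image file name
- coco_url: not useful
- height: image height
- width: image width
- date_captured: not useful
- flickr_url: not useful
- id: the image's ID, unique to each image
Of these, height, width, file_name and id are the four that really matter.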
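Because those four fields are what you actually use, a common first step is to index the images by id. The sketch below is an illustration, not part of the original: `index_images` is a hypothetical helper, and `image_dir` is assumed to be a folder such as `coco/val2017` from the layout in #step2.

```python
from pathlib import Path


def index_images(coco, image_dir):
    """Map each image id to its file path, height and width."""
    index = {}
    for img in coco["images"]:
        index[img["id"]] = {
            "path": Path(image_dir) / img["file_name"],  # file_name is relative to the split folder
            "height": img["height"],
            "width": img["width"],
        }
    return index
```

Annotations refer to images only through `image_id`, so this dictionary is what lets you go from an annotation back to the picture and its size.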
2.1.4 Contents of annotations
There are many entries; here are a few:
"annotations": [
{
"segmentation": [
[
510.66,
423.01,
511.72,
...
423.01,
510.45,
423.01
]
],
"area": 702.1057499999998,
"iscrowd": 0,
"image_id": 289343,
"bbox": [
473.07,
395.93,
38.65,
28.67
],
"category_id": 18,
"id": 1768
},
{
"segmentation": [
[
289.74,
443.39,
302.29,
...
444.27,
291.88,
443.74
]
],
"area": 27718.476299999995,
"iscrowd": 0,
"image_id": 61471,
"bbox": [
272.1,
200.23,
151.97,
279.77
],
"category_id": 18,
"id": 1773
},
......
{
"segmentation": {
"counts": [
272,
2,
4,
4,
...
16,
228,
8,
10250
],
"size": [
240,
320
]
},
"area": 18419,
"iscrowd": 1,
"image_id": 448263,
"bbox": [
1,
0,
276,
122
],
"category_id": 1,
"id": 900100448263
},
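annotations is the most important part of the JSON file. It is an array of annotation instances, and each annotation contains the following fields:
- segmentation: segmentation label
- area: area
- iscrowd: whether the annotation covers a group of objects
- image_id: corresponds to id in images
- bbox: bounding box
- category_id: category
- id: a serial number for the annotation
Overall, an annotation has the following format: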
annotation{
"segmentation": RLE or [polygon],
"area": float,
"iscrowd": 0 or 1,
"image_id": int,
"bbox": [x,y,width,height],
"category_id": int,
"id": int
}
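Note that a single object (iscrowd=0) may need several polygons, for example when it is partly occluded in the image. When iscrowd=1 (the annotation marks a group of objects, such as a crowd of people), segmentation is stored in RLE format. In other words, iscrowd=0 means segmentation is a polygon, and iscrowd=1 means it is RLE. Regardless of iscrowd, every object also has a rectangular bbox, which gives the x and y coordinates of the box's top-left corner followed by the box's width and height.
In polygon format, segmentation is a two-dimensional list (a list of polygons). Each inner list holds the object's boundary coordinates obtained from pixel-level segmentation, and as the examples above show, the numbers come in x, y pairs. The RLE format looks like this: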
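The rules in the note above can be checked in a few lines. This is a sketch with hypothetical helper names: it decides the segmentation format from the JSON type (a list means polygons, a dict with `counts` and `size` means RLE), converts the `[x, y, width, height]` bbox to corner coordinates, and pairs up a flat polygon list.

```python
def segmentation_kind(ann):
    """Return 'polygon' or 'rle' for a COCO instance annotation."""
    seg = ann["segmentation"]
    if isinstance(seg, list):
        return "polygon"  # list of flat [x1, y1, x2, y2, ...] lists; used when iscrowd == 0
    if isinstance(seg, dict) and "counts" in seg and "size" in seg:
        return "rle"  # {'counts': [...], 'size': [...]}; used when iscrowd == 1
    raise ValueError("unrecognised segmentation format")


def bbox_to_corners(bbox):
    """Convert [x, y, width, height] (top-left origin) to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def polygon_points(flat):
    """Pair a flat [x1, y1, x2, y2, ...] polygon into [(x1, y1), (x2, y2), ...]."""
    return list(zip(flat[0::2], flat[1::2]))
```

For the crowd annotation shown earlier, `bbox_to_corners([1, 0, 276, 122])` gives `[1, 0, 277, 122]`.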
segmentation :
{
'counts': [272, 2, 4, 4, 4, 4, 2, 9, 1, 2, 16, 43, 143, 24......],
'size': [240, 320]
}
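The RLE in the COCO annotation files is always uncompressed RLE (as opposed to compact RLE). The number of bytes an RLE takes grows with the number of pixels on the object boundary. The benefit of RLE is that computing the area of a region, or the union and intersection of two regions, is very efficient when done directly on the RLE. Together, the counts array and the size array above encode the segmentation mask of the image. size is the image's [height, width]. Every pixel either lies inside the annotated object region or in the background, so each pixel is a boolean: true (1) if it is inside the object region, false (0) if it is in the background. A 240x320 image has 76800 pixels, so we get 76800 bits, for example (a made-up example unrelated to the array above) 00000111100111110…; writing every bit out clearly wastes space, so instead we write down how many 0s or 1s occur in each run (run-length encoding), which gives 5, 4, 2, 5, 1, …. That is what the counts array holds. In COCO the runs always start with a run of 0s (of length 0 if the first pixel is foreground), and the pixels are scanned column by column.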
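The run-length idea can be sketched in pure Python. This is an illustration only: the function names are hypothetical, the mask is a plain nested list `mask[row][col]`, and it assumes the COCO conventions stated above (runs start with 0s, column-by-column order, `size` is `[height, width]`). In practice pycocotools does the same work in C, for example `mask.frPyObjects` to convert uncompressed RLE followed by `mask.decode`.

```python
def rle_decode(rle):
    """Decode uncompressed COCO RLE into a 2-D 0/1 mask indexed as mask[row][col]."""
    h, w = rle["size"]  # size is [height, width]
    flat, value = [], 0
    for run in rle["counts"]:
        flat.extend([value] * run)  # runs alternate 0s, 1s, 0s, ... starting with 0s
        value = 1 - value
    if len(flat) != h * w:
        raise ValueError("counts do not add up to height * width")
    # Column-major: pixel (r, c) is the (c * h + r)-th element of the flat sequence.
    return [[flat[c * h + r] for c in range(w)] for r in range(h)]


def rle_encode(mask):
    """Encode a 2-D 0/1 mask as uncompressed RLE (column-major, starting with a 0-run)."""
    h, w = len(mask), len(mask[0])
    flat = [mask[r][c] for c in range(w) for r in range(h)]
    counts, current, run = [], 0, 0
    for pixel in flat:
        if pixel == current:
            run += 1
        else:
            counts.append(run)  # appends 0 first if the mask starts with a 1
            current, run = pixel, 1
    counts.append(run)
    return {"counts": counts, "size": [h, w]}


def rle_area(rle):
    """Foreground area: the sum of the 1-runs, i.e. every second count."""
    return sum(rle["counts"][1::2])
```

Encoding the made-up bit string 00000111100111110 as a single 17-pixel column yields counts `[5, 4, 2, 5, 1]`, and `rle_area` shows why area is cheap to compute on RLE: it never expands the mask.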
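area is the area of the segmentation. iscrowd=0 means the annotation is a single object, and iscrowd=1 means it covers a group of objects (a crowd). image_id is the id stored in images. bbox is the object's bounding box. category_id is a number identifying the category; there are 80 categories. The id here is different from the id in images: it is only an identifier for each annotation.
2.1.5 Contents of categories
The full list: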
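Since image_id links an annotation to its image and category_id links it to a category, a typical next step is to group annotations per image with readable category names. The sketch below is illustrative and the helper name is hypothetical; pycocotools' `COCO` class builds a comparable per-image index (`imgToAnns`) internally.

```python
from collections import defaultdict


def annotations_by_image(coco):
    """Group annotations under their image id and attach the category name."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[ann["image_id"]].append({
            "id": ann["id"],  # annotation id, not the image id
            "category": names[ann["category_id"]],
            "bbox": ann["bbox"],
            "iscrowd": ann["iscrowd"],
        })
    return dict(grouped)
```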
"categories": [
{
"supercategory": "person",
"id": 1,
"name": "person"
},
{
"supercategory": "vehicle",
"id": 2,
"name": "bicycle"
},
{
"supercategory": "vehicle",
"id": 3,
"name": "car"
},
{
"supercategory": "vehicle",
"id": 4,
"name": "motorcycle"
},
{
"supercategory": "vehicle",
"id": 5,
"name": "airplane"
},
{
"supercategory": "vehicle",
"id": 6,
"name": "bus"
},
{
"supercategory": "vehicle",
"id": 7,
"name": "train"
},
{
"supercategory": "vehicle",
"id": 8,
"name": "truck"
},
{
"supercategory": "vehicle",
"id": 9,
"name": "boat"
},
{
"supercategory": "outdoor",
"id": 10,
"name": "traffic light"
},
{
"supercategory": "outdoor",
"id": 11,
"name": "fire hydrant"
},
{
"supercategory": "outdoor",
"id": 13,
"name": "stop sign"
},
{
"supercategory": "outdoor",
"id": 14,
"name": "parking meter"
},
{
"supercategory": "outdoor",
"id": 15,
"name": "bench"
},
{
"supercategory": "animal",
"id": 16,
"name": "bird"
},
{
"supercategory": "animal",
"id": 17,
"name": "cat"
},
{
"supercategory": "animal",
"id": 18,
"name": "dog"
},
{
"supercategory": "animal",
"id": 19,
"name": "horse"
},
{
"supercategory": "animal",
"id": 20,
"name": "sheep"
},
{
"supercategory": "animal",
"id": 21,
"name": "cow"
},
{
"supercategory": "animal",
"id": 22,
"name": "elephant"
},
{
"supercategory": "animal",
"id": 23,
"name": "bear"
},
{
"supercategory": "animal",
"id": 24,
"name": "zebra"
},
{
"supercategory": "animal",
"id": 25,
"name": "giraffe"
},
{
"supercategory": "accessory",
"id": 27,
"name": "backpack"
},
{
"supercategory": "accessory",
"id": 28,
"name": "umbrella"
},
{
"supercategory": "accessory",
"id": 31,
"name": "handbag"
},
{
"supercategory": "accessory",
"id": 32,
"name": "tie"
},
{
"supercategory": "accessory",
"id": 33,
"name": "suitcase"
},
{
"supercategory": "sports",
"id": 34,
"name": "frisbee"
},
{
"supercategory": "sports",
"id": 35,
"name": "skis"
},
{
"supercategory": "sports",
"id": 36,
"name": "snowboard"
},
{
"supercategory": "sports",
"id": 37,
"name": "sports ball"
},
{
"supercategory": "sports",
"id": 38,
"name": "kite"
},
{
"supercategory": "sports",
"id": 39,
"name": "baseball bat"
},
{
"supercategory": "sports",
"id": 40,
"name": "baseball glove"
},
{
"supercategory": "sports",
"id": 41,
"name": "skateboard"
},
{
"supercategory": "sports",
"id": 42,
"name": "surfboard"
},
{
"supercategory": "sports",
"id": 43,
"name": "tennis racket"
},
{
"supercategory": "kitchen",
"id": 44,
"name": "bottle"
},
{
"supercategory": "kitchen",
"id": 46,
"name": "wine glass"
},
{
"supercategory": "kitchen",
"id": 47,
"name": "cup"
},
{
"supercategory": "kitchen",
"id": 48,
"name": "fork"
},
{
"supercategory": "kitchen",
"id": 49,
"name": "knife"
},
{
"supercategory": "kitchen",
"id": 50,
"name": "spoon"
},
{
"supercategory": "kitchen",
"id": 51,
"name": "bowl"
},
{
"supercategory": "food",
"id": 52,
"name": "banana"
},
{
"supercategory": "food",
"id": 53,
"name": "apple"
},
{
"supercategory": "food",
"id": 54,
"name": "sandwich"
},
{
"supercategory": "food",
"id": 55,
"name": "orange"
},
{
"supercategory": "food",
"id": 56,
"name": "broccoli"
},
{
"supercategory": "food",
"id": 57,
"name": "carrot"
},
{
"supercategory": "food",
"id": 58,
"name": "hot dog"
},
{
"supercategory": "food",
"id": 59,
"name": "pizza"
},
{
"supercategory": "food",
"id": 60,
"name": "donut"
},
{
"supercategory": "food",
"id": 61,
"name": "cake"
},
{
"supercategory": "furniture",
"id": 62,
"name": "chair"
},
{
"supercategory": "furniture",
"id": 63,
"name": "couch"
},
{
"supercategory": "furniture",
"id": 64,
"name": "potted plant"
},
{
"supercategory": "furniture",
"id": 65,
"name": "bed"
},
{
"supercategory": "furniture",
"id": 67,
"name": "dining table"
},
{
"supercategory": "furniture",
"id": 70,
"name": "toilet"
},
{
"supercategory": "electronic",
"id": 72,
"name": "tv"
},
{
"supercategory": "electronic",
"id": 73,
"name": "laptop"
},
{
"supercategory": "electronic",
"id": 74,
"name": "mouse"
},
{
"supercategory": "electronic",
"id": 75,
"name": "remote"
},
{
"supercategory": "electronic",
"id": 76,
"name": "keyboard"
},
{
"supercategory": "electronic",
"id": 77,
"name": "cell phone"
},
{
"supercategory": "appliance",
"id": 78,
"name": "microwave"
},
{
"supercategory": "appliance",
"id": 79,
"name": "oven"
},
{
"supercategory": "appliance",
"id": 80,
"name": "toaster"
},
{
"supercategory": "appliance",
"id": 81,
"name": "sink"
},
{
"supercategory": "appliance",
"id": 82,
"name": "refrigerator"
},
{
"supercategory": "indoor",
"id": 84,
"name": "book"
},
{
"supercategory": "indoor",
"id": 85,
"name": "clock"
},
{
"supercategory": "indoor",
"id": 86,
"name": "vase"
},
{
"supercategory": "indoor",
"id": 87,
"name": "scissors"
},
{
"supercategory": "indoor",
"id": 88,
"name": "teddy bear"
},
{
"supercategory": "indoor",
"id": 89,
"name": "hair drier"
},
{
"supercategory": "indoor",
"id": 90,
"name": "toothbrush"
}
]
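The category ids range from 1 to 90, but some numbers are skipped, so there are only 80 categories.
2.2 Object Keypoint (keypoint detection) file format
The files person_keypoints_train2017.json and person_keypoints_val2017.json in COCO use this format. The overall structure is: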
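Because the ids have gaps, code that needs class indices 0 to 79 has to remap them. Many training pipelines expect contiguous indices; that requirement is an assumption here, not something stated in the original, and the helper below is a hypothetical sketch.

```python
def contiguous_label_maps(categories):
    """Map sparse COCO category ids (1..90 with gaps) to dense indices 0..N-1 and back."""
    ids = sorted(cat["id"] for cat in categories)
    id_to_index = {cat_id: i for i, cat_id in enumerate(ids)}
    index_to_id = {i: cat_id for cat_id, i in id_to_index.items()}
    return id_to_index, index_to_id
```

With the full COCO2017 list, id 1 (person) becomes index 0 and id 90 (toothbrush) becomes index 79; predictions must be mapped back through `index_to_id` before being written in COCO format.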
{
"info": info,
"licenses": [license],
"images": [image],
"annotations": [annotation],
"categories": [category]
}
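This is the same as instances_val2017.json. info, licenses and images are identical across the different JSON files and share the same definitions; what differs between the file types is annotations and categories.
- images is a list whose length equals the number of images in the training (or validation) set
- annotations is also a list whose length equals the number of bounding boxes in the training (or validation) set; here only bounding boxes of the person category are included
- categories is also a list whose length equals the number of categories, which is 1 here because person is the only class
The shared parts are not repeated here; only the differences are listed.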
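2.2.1 Contents of annotations
An annotation of this type contains all the fields of an object instance annotation, plus two extra fields. The new keypoints field is an array of length 3*k: for each keypoint, the first and second elements are its x and y coordinates and the third is a visibility flag v. v=0 means the keypoint is not labelled (in which case x=y=v=0), v=1 means it is labelled but not visible (occluded), and v=2 means it is labelled and visible. num_keypoints is the number of labelled keypoints on the object (v>0); small objects may have no keypoints labelled at all.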
annotation{
"segmentation": RLE or [polygon],
"num_keypoints": int,
"area": float,
"iscrowd": 0 or 1,
"keypoints": [x1,y1,v1,...],
"image_id": int,
"bbox": [x,y,width,height],
"category_id": int,
"id": int
}
An example:
{
"segmentation": [
[
492.38,
238.33,
491.91,
234.15,
494.47,
227.65,
495.17,
215.1,
497.02,
199.54,
503.53,
197.22,
503.3,
194.43,
503.3,
190.95,
506.08,
183.51,
511.89,
185.84,
514.21,
187,
514.21,
196.29,
521.88,
200.7,
526.76,
216.03,
520.25,
227.65,
519.56,
234.38,
519.09,
239.49,
519.09,
244.84,
519.56,
246.93,
518.16,
248.32,
516.3,
256.91,
510.03,
256.45,
513.28,
240.89
]
],
"num_keypoints": 13,
"area": 1394.7431,
"iscrowd": 0,
"keypoints": [
508,
192,
2,
510,
191,
2,
506,
191,
2,
512,
192,
2,
503,
192,
1,
515,
202,
2,
499,
202,
2,
524,
214,
2,
497,
215,
2,
516,
226,
2,
496,
224,
2,
511,
232,
2,
497,
230,
2,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
],
"image_id": 440475,
"bbox": [
491.91,
183.51,
34.85,
73.4
],
"category_id": 1,
"id": 183302
}
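There are 17 keypoints in total (51 numbers), of which 13 are labelled, matching num_keypoints.
2.2.2 Contents of categories
Compared with categories in object instance, each category here has two new fields: keypoints is an array of length k holding the name of each keypoint, and skeleton defines how the keypoints are connected (for example, a person's left wrist is connected to the left elbow, but not to the right wrist). Currently, COCO only labels keypoints for the person category. The definition is: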
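The flat keypoints array is easier to work with as (x, y, v) triplets, and counting the triplets with v > 0 reproduces num_keypoints. This is a minimal sketch with hypothetical helper names.

```python
def keypoint_triplets(flat):
    """Split a flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, v) tuples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labelled(triplets):
    """Count keypoints with v > 0 (v=1 occluded, v=2 visible); should equal num_keypoints."""
    return sum(1 for _, _, v in triplets if v > 0)
```

Applied to the example annotation above, this gives 17 triplets, 13 of them labelled (one of those, at (503, 192), is occluded with v=1).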
{
"supercategory": str,
"id": int,
"name": str,
"keypoints": [str],
"skeleton": [edge]
}
Concretely:
"categories": [
{
"supercategory": "person",
"id": 1,
"name": "person",
"keypoints": [
"nose",
"left_eye",
"right_eye",
"left_ear",
"right_ear",
"left_shoulder",
"right_shoulder",
"left_elbow",
"right_elbow",
"left_wrist",
"right_wrist",
"left_hip",
"right_hip",
"left_knee",
"right_knee",
"left_ankle",
"right_ankle"
],
"skeleton": [
[
16,
14
],
[
14,
12
],
[
17,
15
],
[
15,
13
],
[
12,
13
],
[
6,
12
],
[
7,
13
],
[
6,
7
],
[
6,
8
],
[
7,
9
],
[
8,
10
],
[
9,
11
],
[
2,
3
],
[
1,
2
],
[
1,
3
],
[
2,
4
],
[
3,
5
],
[
4,
6
],
[
5,
7
]
]
}
]
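The skeleton pairs above index into the keypoints list starting from 1 (they range from 1 to 17 for 17 names), so drawing code usually subtracts 1. A small hypothetical helper that turns the pairs into named edges:

```python
def skeleton_edges(category):
    """Turn 1-based skeleton index pairs into (keypoint_name, keypoint_name) edges."""
    names = category["keypoints"]
    return [(names[a - 1], names[b - 1]) for a, b in category["skeleton"]]
```

For instance, the first pair `[16, 14]` becomes `("left_ankle", "left_knee")`.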
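2.3 Image Caption file format
The files captions_train2017.json and captions_val2017.json use this format. From top to bottom, an Image Caption file is divided into the following sections; it looks like Object Instance, except that there is no categories field at the end: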
{
"info": info,
"licenses": [license],
"images": [image],
"annotations": [annotation]
}
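Here info, licenses and images are the same structures in the different JSON files, with shared definitions. What is not shared is annotations, which differs between the file types.
- annotations: there are more annotations than images, because each image can have several descriptions.
2.3.1 Contents of annotation
An annotation of this type stores a sentence describing an image. Each sentence describes the content of its image, and every image has at least 5 captions (some have more). The annotation is defined as: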
annotation{
"image_id": int,
"id": int,
"caption": str
}
A concrete excerpt:
{
"image_id": 546219,
"id": 396378,
"caption": "A large group is sitting together and eating at a restaurant."
},
{
"image_id": 546219,
"id": 397413,
"caption": "The people are gathered at the table for dinner."
},
{
"image_id": 146155,
"id": 397604,
"caption": "Two men standing near a bar drinking together"
},
{
"image_id": 546219,
"id": 399732,
"caption": "A large group of people pose for a photo at dinner."
},
{
"image_id": 546219,
"id": 400023,
"caption": "The diners are enjoying their various beverages with their meals.."
}
The image_id here corresponds to id in images.
References
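Since several captions share one image_id, collecting them per image is a one-pass grouping. This is a sketch with a hypothetical helper name, taking the dictionary loaded from a captions file.

```python
from collections import defaultdict


def captions_by_image(coco_captions):
    """Collect all caption strings for each image id."""
    grouped = defaultdict(list)
    for ann in coco_captions["annotations"]:
        grouped[ann["image_id"]].append(ann["caption"].strip())
    return dict(grouped)
```

On the excerpt above, image 546219 collects four captions and image 146155 one.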
- https://zhuanlan.zhihu.com/p/70878433
- https://blog.csdn.net/weixin_38293440/article/details/81196428