VOC Dataset Batch Processing: Extracting the Classes You Need

2023-04-24 16:35:10

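The VOC dataset contains 20 classes. Depending on the task and scenario, we may not need all of them, so we can extract only the classes we want from the full dataset.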

Extraction process:

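First, we read the class information in each XML annotation file to decide whether it contains any class we need. If it does, we remove the objects of unwanted classes and save the XML file. Next, we derive the image name from the XML file name and copy the matching image.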
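The decision step described above can be tried on a single annotation held in memory before running it over the whole dataset. This is a minimal sketch: the helper name `filter_objects` and the sample XML are illustrative assumptions, not part of the original script. Like the script, it drops every `<object>` whose `<name>` is not wanted and reports whether anything remains; the last line shows how the image name is derived from the XML file name.

```python
import os
import xml.etree.ElementTree as ET

def filter_objects(xml_text, wanted):
    """Hypothetical helper: keep only <object> elements whose <name> is in `wanted`.

    Returns the filtered XML as a string, or None if no wanted object remains.
    """
    root = ET.fromstring(xml_text)
    kept = 0
    # findall returns a plain list, so removing children from root inside the loop is safe
    for obj in root.findall("object"):
        if obj.findtext("name") in wanted:
            kept += 1
        else:
            root.remove(obj)
    if kept == 0:
        return None
    return ET.tostring(root, encoding="unicode")

# Hypothetical minimal VOC-style annotation, for illustration only
sample = """<annotation>
  <filename>000001.jpg</filename>
  <object><name>dog</name></object>
  <object><name>bicycle</name></object>
</annotation>"""

filtered = filter_objects(sample, {"aeroplane", "bicycle", "tvmonitor"})
print(filtered is not None)             # True: the bicycle object survives
print(filter_objects(sample, {"cat"}))  # None: nothing to keep, so the file is skipped

# Image name derived from the annotation file name (VOC images are .jpg)
image_name = os.path.splitext("000001.xml")[0] + ".jpg"
```

An annotation that yields `None` is simply not copied, which is how the script below ends up with only the images containing the requested classes.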

Remember to change every path used below to your own. The code is as follows:

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
import os
import xml.etree.ElementTree as ET
import shutil

# Change the paths below to match your own setup
ann_filepath = 'Annotations/'
img_filepath = 'JPEGImages/'
img_savepath = 'test/JPEGImages/'
ann_savepath = 'test/Annotations/'
# makedirs also creates the parent 'test/' folder; exist_ok avoids errors on re-runs
os.makedirs(img_savepath, exist_ok=True)
os.makedirs(ann_savepath, exist_ok=True)

# All 20 classes in the VOC dataset, for reference
# classes = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
#            'bus', 'car', 'cat', 'chair', 'cow', 'diningtable',
#            'dog', 'horse', 'motorbike', 'pottedplant',
#            'sheep', 'sofa', 'train', 'person', 'tvmonitor']

classes = ['aeroplane', 'bicycle', 'tvmonitor']    # the classes to extract

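def save_annotation(file):
    """Remove unwanted objects from one annotation; save it if any wanted object remains."""
    tree = ET.parse(os.path.join(ann_filepath, file))
    root = tree.getroot()
    result = root.findall("object")  # a list, so removing from root inside the loop is safe
    keep = False
    for obj in result:
        if obj.find("name").text not in classes:
            root.remove(obj)
        else:
            keep = True
    if keep:
        tree.write(os.path.join(ann_savepath, file))
        return True
    return False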

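def save_images(file):
    """Copy the image that matches this annotation and record its ID in a list file."""
    image_id = os.path.splitext(file)[0]
    name_img = os.path.join(img_filepath, image_id + ".jpg")  # VOC images are .jpg
    shutil.copy(name_img, img_savepath)
    # Name this text file as you like; it is used to build the train or test list.
    # 'a' appends, so delete the file before re-running to avoid duplicate IDs.
    with open('test/test.txt', 'a') as file_txt:
        file_txt.write(image_id + "\n")
    return True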

if __name__ == '__main__':
    for f in os.listdir(ann_filepath):
        if not f.endswith('.xml'):
            continue  # skip anything that is not an annotation file
        if save_annotation(f):
            save_images(f)
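After the script finishes, it can be worth confirming that the saved annotations contain only the requested classes. The sketch below is an assumed sanity check, not part of the original post: `count_classes` is a hypothetical helper that tallies `<object>` names across every `.xml` file in a directory using `collections.Counter`.

```python
import os
import xml.etree.ElementTree as ET
from collections import Counter

def count_classes(ann_dir):
    """Hypothetical helper: count object names across all .xml files in ann_dir."""
    counts = Counter()
    for fname in sorted(os.listdir(ann_dir)):
        if not fname.endswith(".xml"):
            continue  # skip non-annotation files
        root = ET.parse(os.path.join(ann_dir, fname)).getroot()
        for obj in root.findall("object"):
            counts[obj.findtext("name")] += 1
    return counts

if __name__ == "__main__":
    out_dir = "test/Annotations/"  # assumed: the output directory used by the script above
    if os.path.isdir(out_dir):
        for name, n in counts_items if False else count_classes(out_dir).most_common():
            print(name, n)
```

Any class name outside the `classes` list in this output would point to a problem in the filtering step.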

If you want to train YOLOv4 on your own dataset, see: https://blog.csdn.net/ldm_666/article/details/108196877

If you want to batch-test images and save the results, see: https://blog.csdn.net/ldm_666/article/details/109284190
