
Python automation: automatically extract pictures from Word, rescue children's textbooks, and ease education anxiety

Author: Artificial intelligence learns from people

In modern society, parents are paying more and more attention to their children's education. With the summer vacation approaching, my friend Xiao Wang faces an anxiety shared by many parents: how to prepare his child, who is about to enter fifth grade. Xiao Wang's child is bright, lively, and curious about new things, but Xiao Wang worries that problems with the electronic version of the textbook will spoil the child's learning experience.

It turned out that the electronic textbook Xiao Wang had found was a Word document containing a large number of pictures (each page of the document holds exactly one picture). The pictures were meant to support teaching, but when printed they showed obvious black frames around the edges, which looked unsightly. Xiao Wang didn't want his child to resist studying because of this, so he turned to me, a friend who can program, to see whether the problem could be solved by technical means.

I thought about it for a while and decided to write a Python program that extracts the images from the Word document and saves them as JPG files. Xiao Wang can then print the extracted pictures directly, giving the child clearer and better-looking study material.
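Why is this kind of extraction possible at all? A .docx file is a ZIP archive, and Word conventionally stores embedded pictures under the word/media/ folder inside it. The following is a minimal sketch of that idea using only the standard library. It is an alternative illustration, not the python-docx approach used in the core code, and the function name `extract_media` and the flat output layout are assumptions made for this example.

```python
import zipfile
from pathlib import Path


def extract_media(docx_path, out_dir):
    """Copy every file under word/media/ in a .docx archive into out_dir.

    Returns the list of written paths. The name and output layout are
    illustrative choices, not part of the article's script.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as archive:
        for member in archive.namelist():
            # Embedded pictures are conventionally stored under word/media/.
            if member.startswith("word/media/") and not member.endswith("/"):
                target = out_dir / Path(member).name
                # Bytes are copied unchanged, so the image format is preserved.
                target.write_bytes(archive.read(member))
                written.append(target)
    return written
```

This sketch copies whatever sits in word/media/ without checking which pictures the document actually uses, so going through the document's relationships, as python-docx does, is the more careful route.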


Core code

import os

import docx  # provided by the python-docx package (pip install python-docx)

# Path to the source Word document and the folder that receives the images.
word_path = 'E:\\code\\plan_work\\Demo.docx'
result_path = "./img_result"


def get_pictures(word_path, result_path):
    """
    Extract the images embedded in a Word document.
    :param word_path: path to the Word (.docx) file
    :param result_path: folder in which the extracted images are saved
    :return: None
    """
    try:
        doc = docx.Document(word_path)
        # _rels is a private python-docx attribute that maps relationship
        # ids to the parts (images, styles, ...) the document refers to.
        for rel in doc.part._rels.values():
            # Skip linked (external) targets: they have no embedded part to read.
            if rel.is_external or "image" not in rel.target_ref:
                continue
            os.makedirs(result_path, exist_ok=True)
            # e.g. "media/image1.png" -> "image1.png"
            img_name = os.path.basename(rel.target_ref)
            # Document file name without folder or extension, used as a prefix.
            new_name = os.path.splitext(os.path.basename(word_path))[0]
            img_name = f'{new_name}-{img_name}'
            # The bytes are written unchanged, so each image keeps the
            # format and extension it had inside the document.
            with open(os.path.join(result_path, img_name), "wb") as f:
                f.write(rel.target_part.blob)
    except Exception as exc:
        # Report the failure instead of silently ignoring it.
        print(f'Failed to extract images from {word_path}: {exc}')


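if __name__ == '__main__':
    # To process every Word document in a folder instead, uncomment the
    # lines below and change the path to your own folder.
    # os.chdir(r"D:\Demo")
    # for name in os.listdir(os.getcwd()):
    #     if name.lower().endswith('.docx'):
    #         get_pictures(name, os.getcwd())
    get_pictures(word_path, result_path)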
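
The commented-out lines in the main block hint at processing a whole folder of Word documents. A minimal sketch of the file-listing step might look like the following. The helper name `find_word_files` is hypothetical, and skipping files that start with "~$" is an assumption based on the lock files Word creates while a document is open.

```python
from pathlib import Path


def find_word_files(folder):
    """Return the .docx files directly inside folder, sorted by path.

    Files whose names start with "~$" are skipped: Word creates them as
    temporary lock files, and they are not valid documents.
    """
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file()
        and path.suffix.lower() == ".docx"
        and not path.name.startswith("~$")
    )
```

Each returned path could then be passed to `get_pictures(str(path), result_path)`, so that only real Word documents reach the extraction step.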