Converting Chinese punctuation to English punctuation in Python

2023-04-14 21:28:01

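Unicode defines a normalization process, and the standard specifies four forms: NFC, NFD, NFKC, and NFKD. NFKC converts most Chinese (full-width) punctuation marks to their English counterparts and also converts full-width characters to the corresponding half-width characters. Marks that have no compatibility mapping, such as 【】, are left unchanged, as the output below shows. For example: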

import unicodedata
t = u'中国，中文，标点符号！你好？１２３４５＠＃【】+=-（）'
t2 = unicodedata.normalize('NFKC', t)
'''
>>> print(t2)
中国,中文,标点符号!你好?12345@#【】+=-()
'''

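Author: 灵剑
Link: https://www.zhihu.com/question/37720196/answer/115870233
Source: Zhihu
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.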
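
As a quick check of which marks NFKC actually converts, the following minimal sketch compares each character before and after normalization. The helper name `nfkc_changes` and the sample string are illustrative assumptions, not part of the quoted answer.

```python
import unicodedata

def nfkc_changes(chars):
    """Return {char: normalized} for the characters that NFKC actually changes."""
    changed = {}
    for ch in chars:
        normalized = unicodedata.normalize('NFKC', ch)
        if normalized != ch:
            changed[ch] = normalized
    return changed

# Illustrative sample: full-width marks followed by CJK-specific marks
sample = '，！？（）：；。、【】《》“”‘’'
changed = nfkc_changes(sample)
unchanged = [ch for ch in sample if ch not in changed]

print('converted:', changed)
print('left as-is:', ''.join(unchanged))
```

The full-width forms (，！？（）：；) come back as ASCII, while marks such as 。、【】《》 and the curly quotes have no compatibility mapping and pass through untouched, which is why an explicit translation table is still useful for them.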

import unicodedata

# Normalize a whole file with NFKC and write the result to a new file
with open('F:/src.txt', 'r', encoding='utf-8') as f:
    res = unicodedata.normalize('NFKC', f.read())
with open('F:/dst.txt', 'w', encoding='utf-8') as ff:
    ff.write(res)

The function below processes either an input string or the path to a txt file:

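def punctuation_mend(string):
    """Convert Chinese punctuation in a string, or in the txt file at that path.

    A plain string is converted and returned. A file is rewritten in place
    and the function returns None.
    """
    import unicodedata
    import os

    # NFKC leaves some marks (。【】“”‘’) unchanged, so map them explicitly
    table = {ord(f):ord(t) for f,t in zip(
        u'，。！？【】（）％＃＠＆１２３４５６７８９０“”‘’',
        u',.!?[]()%#@&1234567890""\'\'')}
    if os.path.isfile(string):
        # The argument is an existing file path: convert its contents in place
        with open(string, 'r', encoding='utf-8') as f:
            res = unicodedata.normalize('NFKC', f.read())
            res = res.translate(table)
        with open(string, 'w', encoding='utf-8') as f:
            f.write(res)
    else:
        # Otherwise treat the argument as the text itself
        res = unicodedata.normalize('NFKC', string)
        res = res.translate(table)
        return res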

print(punctuation_mend('【】（）％＃＠＆“”'))
punctuation_mend('F:/z.txt')
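
One thing to keep in mind is that NFKC is not limited to punctuation: it also rewrites other compatibility characters, for example ① becomes 1 and ㎏ becomes kg. When only punctuation should change, a table-only variant may be preferable. The following is a minimal sketch under that assumption; the names `PUNCT_TABLE` and `convert_punctuation_only` and the choice of mapped characters are hypothetical and should be adapted to the text at hand.

```python
import unicodedata

# Hypothetical mapping: extend it with whatever punctuation the text needs.
PUNCT_TABLE = str.maketrans(
    '，。！？：；（）【】“”‘’',
    ',.!?:;()[]""\'\'',
)

def convert_punctuation_only(text):
    """Replace only the characters listed in PUNCT_TABLE; leave the rest untouched."""
    return text.translate(PUNCT_TABLE)

text = '重量：５㎏，编号①。'
# NFKC also folds the full-width digit, the ㎏ symbol, and the circled number
print(unicodedata.normalize('NFKC', text))
# The table-only version changes the punctuation and nothing else
print(convert_punctuation_only(text))
```

The trade-off is that every character to convert has to be listed by hand, whereas NFKC covers all full-width forms automatically.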
