Converting Chinese punctuation to English punctuation in Python

2023-04-14 21:28:01

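Unicode defines a normalization process. The Unicode standard specifies four forms: NFC, NFD, NFKC, and NFKD. NFKC maps most full-width Chinese punctuation marks to their ASCII counterparts, and it also converts other full-width characters to the corresponding half-width ones. For example: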

import unicodedata
t = u'中國，中文，标點符号！你好？１２３４５＠＃【】+=-（）'
t2 = unicodedata.normalize('NFKC', t)
'''
>>> print(t2)
中國,中文,标點符号!你好?12345@#【】+=-()
'''

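Author: 靈劍
Link: https://www.zhihu.com/question/37720196/answer/115870233
Source: Zhihu
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, please cite the source.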
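The output above shows that NFKC leaves 【】 untouched. A quick check, sketched here with an illustrative (not exhaustive) sample of marks chosen for this example, lists which characters NFKC does not change and therefore need an explicit mapping:

```python
import unicodedata

# Illustrative sample of common Chinese punctuation; not an exhaustive list.
samples = '，。！？【】（）、《》'

# A character whose NFKC form equals the character itself is not touched
# by normalization, so it has to be handled by a separate translation table.
unchanged = [ch for ch in samples if unicodedata.normalize('NFKC', ch) == ch]
print(''.join(unchanged))
```

Full-width forms such as ，！？（） are converted, while the ideographic full stop and the CJK brackets come back unchanged.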

import unicodedata

# Normalize the whole of src.txt and write the result to dst.txt.
with open('F:/src.txt', 'r', encoding='utf-8') as f:
    res = unicodedata.normalize('NFKC', f.read())
with open('F:/dst.txt', 'w', encoding='utf-8') as ff:
    ff.write(res)

The function below accepts either a string or the path to a txt file and processes it:

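import os
import unicodedata

def punctuation_mend(string):
    """Convert Chinese punctuation to English punctuation.

    If `string` is the path of an existing file, the file is rewritten in
    place and None is returned; otherwise the converted string is returned.
    """
    # NFKC leaves marks such as 。【】 and curly quotes unchanged, so map them explicitly.
    table = {ord(f): ord(t) for f, t in zip(
        u'，。！？【】（）％＃＠＆１２３４５６７８９０“”‘’',
        u',.!?[]()%#@&1234567890""\'\'')}
    if os.path.isfile(string):
        with open(string, 'r', encoding='utf-8') as f:
            res = unicodedata.normalize('NFKC', f.read())
            res = res.translate(table)
        # Overwrite the original file with the converted text.
        with open(string, 'w', encoding='utf-8') as f:
            f.write(res)
    else:
        res = unicodedata.normalize('NFKC', string)
        res = res.translate(table)
        return res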

print(punctuation_mend('【】（）％＃＠＆“”'))  # prints []()%#@&""
punctuation_mend('F:/z.txt')  # rewrites F:/z.txt in place and returns None
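Note that punctuation_mend overwrites the input file, and any string that happens to match an existing path is treated as a file. A minimal alternative sketch, assuming you would rather keep the two cases separate, is shown below. The names mend_text, mend_file, and _EXTRA are hypothetical and do not come from the original post:

```python
import unicodedata
from pathlib import Path

# Marks that NFKC leaves unchanged. str.maketrans builds the same kind of
# code-point table as a dict comprehension over zip() would.
_EXTRA = str.maketrans('。【】“”‘’', '.[]""\'\'')


def mend_text(text):
    """Return text with full-width and Chinese punctuation converted to ASCII."""
    return unicodedata.normalize('NFKC', text).translate(_EXTRA)


def mend_file(src, dst=None, encoding='utf-8'):
    """Convert the file at src; write to dst, or back to src when dst is None."""
    src = Path(src)
    dst = Path(dst) if dst is not None else src
    dst.write_text(mend_text(src.read_text(encoding=encoding)), encoding=encoding)
    return dst


print(mend_text('你好，世界！（测试）【OK】。'))
```

With this split, a caller decides explicitly whether a value is text or a path, and passing dst keeps the source file intact.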
