Solving garbled Chinese text when reading MySQL with Python
- Preface
- Character encoding types and how to check them
  - Character encoding types
  - How to check the encoding type
  - Encoding and decoding in Python
- Reading latin1-encoded data from MySQL
  - The problem
  - The goal
  - The solution
Preface
Projects often need to read data from MySQL databases. Python's `pymysql` library can handle the reading, but all kinds of problems still come up, so I have collected them here.
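Character encoding types and how to check them
Character encoding types
Common character encoding standards include the Chinese national standard GBK, the international standard ISO-8859, and the universal standard Unicode:
- GBK belongs to the GB family, which also includes GB2312 (its predecessor) and GB18030 (its extension);
- ISO-8859 includes latin1 (also called ISO-8859-1), latin2, and others;
- Unicode text is encoded with UTF-8, UTF-16, and others.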
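Here is a small illustration that uses only Python's built-in codecs. It encodes the same two characters, 中文, with several of the encodings listed above.

- The GB codecs produce identical bytes for these two common characters.
- UTF-8 needs three bytes per character.
- latin1 cannot represent them at all, because it only covers code points U+0000 to U+00FF.

```python
text = "中文"

# Encode the same text with several codecs and show the resulting bytes
for codec in ("gb2312", "gbk", "gb18030", "utf-8", "utf-16-le"):
    print(f"{codec:>9}: {text.encode(codec)!r}")

# latin1 only covers U+0000..U+00FF, so Chinese characters cannot be encoded
try:
    text.encode("latin1")
except UnicodeEncodeError as exc:
    print("   latin1: cannot encode:", exc.reason)
```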
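Python 3's default encoding is UTF-8. This is the codec that `str.encode()` and `bytes.decode()` use when none is given. To read data in any other encoding, you have to specify that encoding explicitly.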
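Here is a quick standard-library check of this. `sys.getdefaultencoding()` reports the codec that `str.encode()` and `bytes.decode()` fall back to.

`open()` is a separate case. Without an `encoding=` argument it uses the locale's preferred encoding, which is not necessarily UTF-8 (on a Chinese-locale Windows machine it is typically `cp936`). Passing the encoding explicitly is therefore the safer habit.

```python
import locale
import sys

# Codec used by str.encode() / bytes.decode() when none is given ('utf-8' in Python 3)
print(sys.getdefaultencoding())

data = "中文".encode()   # no codec given, so UTF-8 is used
print(data, data.decode())

# open() without encoding= uses this instead; it depends on the OS and locale
print(locale.getpreferredencoding(False))
```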
How to check the encoding type
In Python, the `chardet` package can detect the encoding of a piece of data.
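```python
import chardet  # third-party package: pip install chardet

# `test` must be an iterable of bytes, e.g. the lines of a file opened with open(path, 'rb')
for line in test:
    print(chardet.detect(line))
```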
**Note:** If you get the following error, `line` is a `str` rather than `bytes`. Encode it into bytes first with a suitable codec, such as UTF-8:
`TypeError: Expected object of type bytes or bytearray, got: <class 'str'>`
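The result gives a confidence value between 0 and 1 for the detected encoding. As a rule of thumb, a result with a confidence of 99% (0.99) or higher is fairly reliable.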
`chardet.detect()` only accepts `bytes`. Data in Python is either a `str` or a `bytes` object, and `chardet` can only inspect the `bytes` kind.
Encoding and decoding in Python
In Python, text is converted with `encode` and `decode`:
- `encode` turns a `str` into `bytes`;
- `decode` turns `bytes` into a `str`.
So, to check the encoding of `str` data, encode it to bytes first:
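```python
import chardet  # third-party package: pip install chardet

encoding = 'utf-8'  # the codec used to turn each str back into bytes
for line in test:  # here `test` is an iterable of str
    lines = line.encode(encoding)  # str -> bytes, which chardet.detect() requires
    # the result describes these bytes, so it reflects the codec chosen above
    print(chardet.detect(lines))
```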
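The `encode`/`decode` pair only round-trips when both sides use the same codec. Here is a minimal sketch with built-in codecs only. Decoding GBK bytes with GBK restores the text. Decoding the same bytes as UTF-8 fails, because this GBK byte sequence is not valid UTF-8.

```python
text = "中文"

raw = text.encode("gbk")       # str -> bytes
print(raw)                     # b'\xd6\xd0\xce\xc4'
print(raw.decode("gbk"))       # bytes -> str with the same codec: 中文

# Decoding with a different codec either fails or produces garbled text
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 cannot decode these bytes:", exc.reason)
```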
Reading latin1-encoded data from MySQL
This problem cost me several days. After searching through plenty of material online, I found that it, too, comes down to encoding and decoding.
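The problem
Some older MySQL databases created their tables with the default `latin1` character set (latin1 was MySQL's default before version 8.0). As a result, Chinese text stored in them comes back garbled when it is read with `SELECT`.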
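The garbling can be reproduced without a database. This sketch assumes that the client wrote GBK-encoded bytes into the latin1 column; that is an assumption, consistent with the GB18030-based fix below. Under that assumption:

- MySQL keeps those bytes as they are.
- Reading them back as latin1 turns each byte into one Latin-1 character.

```python
original = "中文"

# Assumption: the application stored GBK bytes in a latin1 column unchanged
stored_bytes = original.encode("gbk")

# Reading them back as latin1 maps every byte to one Latin-1 character
garbled = stored_bytes.decode("latin1")
print(garbled)                                   # ÖÐÎÄ

# Reversing the two steps recovers the original text
print(garbled.encode("latin1").decode("gbk"))    # 中文
```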
The goal
Read the Chinese characters correctly and output them.
The solution
- When connecting to the database from Python, set the connection character set with `charset='latin1'`:
```python
import pymysql

# Placeholder connection details; port must be an int (3306 is MySQL's default)
conn = pymysql.connect(host='host', port=3306, user='user_name',
                       password='password', database='database', charset='latin1')
```
- After reading the relevant table, transcode the Chinese text and write it to a txt file:
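```python
path = './test.txt'
# `title` holds the str values read through the latin1 connection
with open(path, 'w', encoding='utf-8') as f:
    for line in title:
        # Undo the latin1 decoding to get the stored bytes, then decode them as GB18030
        lines = line.encode('latin1').decode('gb18030', 'ignore')
        f.write(lines + '\n')
```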
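In practice, a cursor's `fetchall()` returns rows as tuples that can mix strings, numbers and `None`, and only the string columns need repairing. Below is a minimal sketch of a helper that applies the same `encode('latin1').decode('gb18030', 'ignore')` step to the `str` fields only. Note the following:

- `fix_row` and its parameter names are hypothetical, not part of PyMySQL.
- `wire_encoding` stands for the Python codec the driver used to decode the column. It is assumed to be `latin1` here, matching the code above.

```python
def fix_row(row, wire_encoding="latin1", real_encoding="gb18030", errors="ignore"):
    """Re-decode the str fields of a fetched row; leave other values untouched.

    Hypothetical helper: wire_encoding is the codec the driver used to turn the
    column's bytes into str (assumed latin1); real_encoding is the codec the
    bytes were originally written in.
    """
    fixed = []
    for value in row:
        if isinstance(value, str):
            # str -> original bytes -> correctly decoded str
            value = value.encode(wire_encoding).decode(real_encoding, errors)
        fixed.append(value)
    return tuple(fixed)


# Simulated row as a latin1 connection would return it: (id, garbled title, NULL)
row = (1, "中文".encode("gbk").decode("latin1"), None)
print(fix_row(row))   # (1, '中文', None)
```

With a real cursor this would be applied per row, e.g. `rows = [fix_row(r) for r in cursor.fetchall()]`.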
`decode` uses GB18030 here because GB18030 is an extension of GBK. It covers more Chinese characters and can represent every Unicode character.
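The difference in coverage is easy to check with Python's built-in codecs. This sketch uses 𠀀 (U+20000, a rare CJK Extension B character chosen only as an example):

- GBK has no mapping for it.
- GB18030 encodes it as a four-byte sequence.
- Characters that GBK does cover get the same bytes in both codecs.

```python
rare = "\U00020000"   # 𠀀, CJK Unified Ideographs Extension B

# GBK has no mapping for characters outside its repertoire
try:
    rare.encode("gbk")
except UnicodeEncodeError:
    print("gbk: cannot encode U+20000")

# GB18030 covers all of Unicode; supplementary characters take four bytes
encoded = rare.encode("gb18030")
print(len(encoded), encoded.decode("gb18030") == rare)    # 4 True

# Characters inside GBK are encoded identically by both codecs
print("中文".encode("gbk") == "中文".encode("gb18030"))    # True
```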
In addition, `decode` is given the `'ignore'` error handler. Without it, decoding fails with the error below, and only part of the data is displayed correctly:
`UnicodeDecodeError: 'gb18030' codec can't decode byte 0xba in position 49: incomplete multibyte sequence`
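This sketch compares `'ignore'` with the default `'strict'` and the alternative `'replace'` handler. It uses a made-up byte string that ends halfway through a character, to mimic the error above:

- `'strict'` raises.
- `'ignore'` silently drops the incomplete bytes.
- `'replace'` keeps a visible U+FFFD marker where data was lost, which can make such losses easier to spot.

```python
# "中" followed by only the first byte of "文": the last character is incomplete
truncated = "中文".encode("gb18030")[:3]

try:
    truncated.decode("gb18030")            # default errors='strict'
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

print("ignore:", truncated.decode("gb18030", "ignore"))     # 中
print("replace:", truncated.decode("gb18030", "replace"))   # 中 plus a U+FFFD character
```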
OK, that solves the problem of garbled Chinese text when reading from a MySQL database.