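On Unicode encoding: a brief explanation of the terms UCS, UTF, BMP, and BOM
Reposted from: 伐木丁丁鸟鸣嘤嘤 fmddlmyy.home4u.china.com
This is a piece of light reading written by a programmer for programmers. By "light" I mean that it lets you pick up, fairly painlessly, some concepts that used to be unclear and gain a bit of knowledge, rather like levelling up in an RPG. Two questions motivated me to put this article together: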
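Question 1:
With "Save As" in Windows Notepad you can convert between the GBK, Unicode, Unicode big endian, and UTF-8 encodings. These are all plain txt files, so how does Windows tell which encoding a file uses?
I noticed long ago that txt files saved as Unicode, Unicode big endian, and UTF-8 start with a few extra bytes: FF FE (Unicode), FE FF (Unicode big endian), and EF BB BF (UTF-8). But which standard are these markers based on?
Question 2:
I recently came across a file called ConvertUTF.c on the web that implements conversion among the three encodings UTF-32, UTF-16, and UTF-8. I was already familiar with Unicode (UCS2), GBK, and UTF-8, but this program left me a little confused: I could not recall how UTF-16 relates to UCS2.
After looking through the relevant material I finally sorted these questions out, and learned some details of Unicode along the way. I have written it up as an article for anyone who has had similar doubts. I have tried to keep the article easy to follow, but readers are expected to know what a byte is and what hexadecimal is.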
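0. big endian and little endian
Big endian and little endian are two different ways in which CPUs handle multi-byte numbers. For example, the Unicode code for the character "汉" is 6C49. When it is written to a file, should 6C come first or 49? If 6C is written first, that is big endian; if 49 is written first, that is little endian.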
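As a small illustration of the two byte orders, the following Python sketch serializes the 16-bit value 0x6C49 both ways. The helper name `hex_bytes` is made up for this example.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex (illustrative helper)."""
    return " ".join(f"{b:02X}" for b in data)

value = 0x6C49  # the code for "汉" used in the text

big = value.to_bytes(2, byteorder="big")        # high byte first
little = value.to_bytes(2, byteorder="little")  # low byte first

print("big endian:   ", hex_bytes(big))     # 6C 49
print("little endian:", hex_bytes(little))  # 49 6C
```

Reading the bytes back with `int.from_bytes(data, byteorder=...)` recovers 0x6C49 only if the same byte order is used on both sides.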
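The word "endian" comes from Gulliver's Travels. The civil war in Lilliput began over whether an egg should be cracked open at the big end (Big-Endian) or the little end (Little-Endian). The dispute led to six rebellions, in which one emperor lost his life and another his crown.
In Chinese, endian is usually translated as "byte order" (字节序), and big endian and little endian are called "big tail" (大尾) and "little tail" (小尾).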
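1. Character encodings and internal codes, with an introduction to Chinese character encodings
Characters must be encoded before a computer can process them. The default encoding a computer uses is its internal code (内码). Early computers used 7-bit ASCII. To handle Chinese characters, programmers designed GB2312 for Simplified Chinese and big5 for Traditional Chinese.
GB2312 (1980) contains 7445 characters in total: 6763 hanzi and 682 other symbols. In the hanzi area the internal code ranges over B0-F7 for the high byte and A1-FE for the low byte, which gives 72*94=6768 code positions. Five of them, D7FA-D7FE, are vacant.
GB2312 supports too few hanzi. The 1995 hanzi extension specification GBK 1.0 contains 21886 symbols, divided into a hanzi area and a graphic symbol area. The hanzi area contains 21003 characters.
From ASCII through GB2312 to GBK, these encodings are backward compatible: the same character always has the same code in each scheme, and the later standards support more characters. In these encodings English and Chinese can be handled uniformly. A Chinese code is recognized by the most significant bit of its high byte being nonzero. In programmers' terms, GB2312 and GBK are both double-byte character sets (DBCS).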
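GB18030 (2000) is the official national standard that replaces GBK 1.0. It contains 27484 hanzi and also covers the scripts of the major ethnic minorities, such as Tibetan, Mongolian, and Uyghur. In terms of hanzi repertoire, GB18030 adds the 6582 hanzi of CJK Extension A (Unicode 0x3400-0x4db5) to the 20902 hanzi of GB13000.1, for a total of 27484.
CJK stands for Chinese, Japanese, and Korean. To save code positions, Unicode encodes the characters of the three languages in a unified way. GB13000.1 is the Chinese edition of ISO/IEC 10646-1 and corresponds to Unicode 1.1.
GB18030 uses one-byte, two-byte, and four-byte codes. The one-byte and two-byte codes are fully compatible with GBK. The four-byte code positions are where the hanzi of CJK Extension A are encoded. For example, UCS 0x3400 is encoded in GB18030 as 8139EE39, and UCS 0x3401 as 8139EF30.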
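The mix of one-byte, two-byte, and four-byte codes can be observed with Python's built-in `gb18030` and `gbk` codecs. This is only a sketch; the helper name `gb18030_hex` is hypothetical, and the exact bytes printed depend on the codec tables shipped with your Python.

```python
def gb18030_hex(ch: str) -> str:
    """Return the GB18030 bytes of one character as uppercase hex."""
    return ch.encode("gb18030").hex().upper()

samples = [
    "A",        # ASCII: one byte
    "汉",       # GB2312/GBK hanzi: two bytes
    "\u3400",   # first code point of CJK Extension A: four bytes
]

for ch in samples:
    encoded = ch.encode("gb18030")
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), {gb18030_hex(ch)}")

# One-byte and two-byte codes agree with GBK.
print("汉".encode("gbk") == "汉".encode("gb18030"))
```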
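Microsoft provides a GB18030 upgrade pack, but it only supplies a new font that supports the 6582 hanzi of CJK Extension A, 新宋体-18030. It does not change the internal code. The internal code (default code page) of Chinese Windows is still GBK.
A few more details:
- The original GB2312 text defines zone-position codes (区位码). To get from a zone-position code to the internal code, add A0 to both the high byte and the low byte.
- For any character encoding, the order of the code units is specified by the encoding scheme and has nothing to do with endianness. For example, the code unit of GBK is the byte, and two bytes represent one hanzi. The order of these two bytes is fixed and is not affected by the CPU's byte order. The code unit of UTF-16 is the word (two bytes): the order of the words is specified by the encoding scheme, and only the arrangement of the bytes inside a word is affected by endianness. UTF-16 is introduced later.
- In GB2312 the most significant bit of both bytes is 1, but only 128*128=16384 code positions satisfy that condition. So in GBK and GB18030 the most significant bit of the low byte may not be 1. This does not affect the parsing of a DBCS stream: while reading, whenever a byte with its high bit set is encountered, that byte and the one after it can be taken together as one double-byte code, whatever the high bit of the low byte is.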
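The DBCS parsing rule in the last point can be sketched as follows. This is a minimal illustration of the rule exactly as stated (high bit set means "take two bytes"), not a validating GBK decoder; the function name `split_dbcs` is made up for the example, and four-byte GB18030 codes are deliberately out of scope.

```python
from typing import List

def split_dbcs(data: bytes) -> List[bytes]:
    """Split a DBCS byte stream into one-byte and two-byte code units."""
    units = []
    i = 0
    while i < len(data):
        # High bit set: this byte and the next form one double-byte code,
        # regardless of the high bit of the second byte.
        if data[i] & 0x80 and i + 1 < len(data):
            units.append(data[i:i + 2])
            i += 2
        else:
            # ASCII byte (or a dangling lead byte at the very end).
            units.append(data[i:i + 1])
            i += 1
    return units

stream = "汉a字".encode("gbk")
print(split_dbcs(stream))            # [b'\xba\xba', b'a', b'\xd7\xd6']

# A two-byte code whose low byte (0x40) has its high bit clear.
print(split_dbcs(b"\x81\x40A"))      # [b'\x81@', b'A']
```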
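2. Unicode, UCS, and UTF
As mentioned above, the encodings from ASCII through GB2312 and GBK to GB18030 are backward compatible. Unicode is compatible only with ASCII (more precisely, its first 256 codes match ISO-8859-1) and not with the GB codes. For example, the Unicode code for "汉" is 6C49, while its GB code is BABA.
Unicode is also a character encoding, but one designed by international organizations to accommodate the writing systems of all the world's languages. The formal name of the corresponding ISO standard is "Universal Multiple-Octet Coded Character Set", UCS for short. UCS can also be read loosely as an abbreviation of "Unicode Character Set".
According to Wikipedia (http://zh.wikipedia.org/wiki/), two organizations historically tried to design Unicode independently: the International Organization for Standardization (ISO) and an association of software manufacturers (unicode.org). ISO developed the ISO 10646 project, and the Unicode Consortium developed the Unicode project.
Around 1991 both sides realized that the world did not need two incompatible character sets. They began to merge their work and to cooperate on a single code table. Starting with Unicode 2.0, the Unicode project has used the same repertoire and code points as ISO 10646-1.
Both projects still exist and publish their standards independently. At the time of writing, the latest version from the Unicode Consortium is Unicode 4.1.0 (2005), and the latest ISO standard is ISO 10646-3:2003.
UCS only specifies how characters are coded; it does not specify how the codes are transmitted or stored. For example, the UCS code for "汉" is 6C49. I could transmit and store this code as four ASCII characters, or I could use UTF-8 and represent it with three consecutive bytes, E6 B1 89. What matters is that both communicating parties agree. UTF-8, UTF-7, and UTF-16 are all widely accepted schemes. A particular advantage of UTF-8 is that it is fully compatible with ASCII: every ASCII text is also valid UTF-8. (It is not byte-compatible with ISO-8859-1, whose characters 0x80-0xFF take two bytes in UTF-8.) UTF stands for "UCS Transformation Format".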
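To make the difference between a code and its transfer format concrete, here is a short Python sketch that writes the same code, 6C49, in three different ways. The variable names are illustrative only.

```python
ch = "汉"
code = ord(ch)  # the abstract UCS code, 0x6C49

# Option 1: spell the code out as four ASCII characters.
as_ascii_digits = f"{code:04X}".encode("ascii")

# Option 2: UTF-8, three bytes.
as_utf8 = ch.encode("utf-8")

# Option 3: UTF-16 with an explicit big-endian byte order, two bytes.
as_utf16_be = ch.encode("utf-16-be")

for label, data in [("ASCII digits", as_ascii_digits),
                    ("UTF-8", as_utf8),
                    ("UTF-16BE", as_utf16_be)]:
    print(f"{label:12} {data.hex().upper()}")
```

All three byte sequences carry the same code; the receiver only needs to know which convention the sender used.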
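The IETF's RFC2781 and RFC3629 describe the UTF-16 and UTF-8 encodings in the usual RFC style: clear, brisk, and still rigorous. I can never remember that IETF stands for Internet Engineering Task Force, but the RFCs the IETF maintains are the foundation of all specifications on the Internet.
2.1. Internal codes and code pages
The Windows kernel now supports the Unicode character set, so at kernel level it can support all the world's languages. But because a great many existing programs and documents use a language-specific encoding such as GBK, Windows cannot drop support for the existing encodings and switch entirely to Unicode.
Windows uses code pages to adapt to different countries and regions. A code page can be understood as the internal code mentioned earlier. The code page corresponding to GBK is CP936.
Microsoft has also defined a code page for GB18030: CP54936. However, GB18030 includes four-byte codes, while Windows code pages support only single-byte and double-byte encodings, so this code page cannot really be used as a system code page.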
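3. UCS-2, UCS-4, and BMP
UCS comes in two forms: UCS-2 and UCS-4. As the names suggest, UCS-2 encodes with two bytes and UCS-4 with four bytes (only 31 bits are actually used; the most significant bit must be 0). Let's play a little with the numbers:
UCS-2 has 2^16=65536 code positions, and UCS-4 has 2^31=2147483648 code positions.
UCS-4 is divided into 2^7=128 groups according to the highest byte (whose top bit is 0). Each group is divided into 256 planes according to the second-highest byte. Each plane is divided into 256 rows according to the third byte, and each row contains 256 cells. Cells in the same row differ only in the last byte.
Plane 0 of group 0 is called the Basic Multilingual Plane, or BMP. Put differently, the code positions of UCS-4 whose upper two bytes are 0 form the BMP.
Dropping the two leading zero bytes from the UCS-4 BMP gives UCS-2. Prefixing the two bytes of UCS-2 with two zero bytes gives the UCS-4 BMP. In the early versions of the standard no characters were assigned outside the BMP; since Unicode 3.1 (2001), characters have also been assigned in the supplementary planes.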
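The group, plane, row, and cell of a UCS-4 code are simply its four bytes, which a few shifts and masks can extract. The sketch below assumes nothing beyond the layout described above; `ucs4_position` and `in_bmp` are names invented for this example.

```python
from typing import Tuple

def ucs4_position(code: int) -> Tuple[int, int, int, int]:
    """Split a 31-bit UCS-4 code into (group, plane, row, cell)."""
    if not 0 <= code < 2 ** 31:
        raise ValueError("UCS-4 codes use 31 bits; the top bit must be 0")
    group = (code >> 24) & 0x7F   # highest byte, top bit always 0
    plane = (code >> 16) & 0xFF   # second-highest byte
    row = (code >> 8) & 0xFF      # third byte
    cell = code & 0xFF            # last byte
    return group, plane, row, cell

def in_bmp(code: int) -> bool:
    """True if the code lies in plane 0 of group 0."""
    group, plane, _, _ = ucs4_position(code)
    return group == 0 and plane == 0

print(ucs4_position(0x6C49))   # (0, 0, 108, 73), i.e. row 0x6C, cell 0x49
print(in_bmp(0x6C49))          # True
print(in_bmp(0x10000))         # False: first code of plane 1
```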
4. UTF encodings
UTF-8 encodes UCS in units of 8 bits. The mapping from UCS-2 to UTF-8 is as follows:
UCS-2 code (hex) | UTF-8 byte stream (binary) |
0000 - 007F | 0xxxxxxx |
0080 - 07FF | 110xxxxx 10xxxxxx |
0800 - FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
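For example, the Unicode code for "汉" is 6C49. 6C49 lies between 0800 and FFFF, so it uses the three-byte template: 1110xxxx 10xxxxxx 10xxxxxx. Written in binary and grouped to fit the template, 6C49 is 0110 110001 001001. Substituting these bits for the x's in the template in order gives 11100110 10110001 10001001, that is, E6 B1 89.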
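The table and the worked example translate directly into code. The following is a minimal sketch of the three templates for codes up to FFFF; the function name `ucs2_to_utf8` is hypothetical, and unlike a production encoder it does not reject the surrogate range D800-DFFF.

```python
def ucs2_to_utf8(code: int) -> bytes:
    """Encode one UCS-2 code (0000-FFFF) using the three UTF-8 templates."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("this sketch only covers UCS-2 codes")
    if code <= 0x7F:
        # 0xxxxxxx
        return bytes([code])
    if code <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code >> 6),
                      0x80 | (code & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (code >> 12),
                  0x80 | ((code >> 6) & 0x3F),
                  0x80 | (code & 0x3F)])

encoded = ucs2_to_utf8(0x6C49)
print(encoded.hex().upper())               # E6B189
print(encoded == "汉".encode("utf-8"))     # True
```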
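Readers can use Notepad to check whether our encoding is correct. Note that UltraEdit automatically converts a UTF-8 text file to UTF-16 when opening it, which can be confusing. You can turn this option off in the settings. A better tool is Hex Workshop.
UTF-16 encodes UCS in units of 16 bits. For UCS codes below 0x10000, the UTF-16 encoding is simply the 16-bit unsigned integer equal to the UCS code. For UCS codes of 0x10000 and above, an algorithm is defined (surrogate pairs, described in RFC2781). Since UCS-2, that is, the BMP of UCS-4, always lies below 0x10000, UTF-16 and UCS-2 can be regarded as the same for BMP characters. But UCS-2 is only a coding scheme, whereas UTF-16 is used for actual transmission, so byte order has to be taken into account.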
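For codes of 0x10000 and above, the algorithm in RFC2781 splits the value into two 16-bit units, a high and a low surrogate. The sketch below follows that algorithm; `utf16_units` is a name chosen for this example, and byte order is left out on purpose because it only applies when the units are written out as bytes.

```python
from typing import List

def utf16_units(code: int) -> List[int]:
    """Return the 16-bit UTF-16 code units for one UCS code."""
    if not 0 <= code <= 0x10FFFF:
        raise ValueError("UTF-16 can only represent codes up to 0x10FFFF")
    if code < 0x10000:
        # BMP: the unit equals the code itself.
        return [code]
    # Subtract 0x10000, leaving a 20-bit value, then split it 10 + 10.
    v = code - 0x10000
    high = 0xD800 | (v >> 10)     # high surrogate carries the top 10 bits
    low = 0xDC00 | (v & 0x3FF)    # low surrogate carries the bottom 10 bits
    return [high, low]

print([f"{u:04X}" for u in utf16_units(0x6C49)])    # ['6C49']
print([f"{u:04X}" for u in utf16_units(0x10000)])   # ['D800', 'DC00']
print([f"{u:04X}" for u in utf16_units(0x1F600)])   # ['D83D', 'DE00']
```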
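5. Byte order of UTF and the BOM
UTF-8 uses the byte as its code unit, so it has no byte-order problem. UTF-16 uses two bytes as its code unit, so before interpreting UTF-16 text you first have to know the byte order of each code unit. For example, the Unicode code for "奎" is 594E and that for "乙" is 4E59. If we receive the UTF-16 byte stream "594E", is it "奎" or "乙"?
The method the Unicode specification recommends for marking byte order is the BOM. This is not the BOM of "Bill Of Material" but the Byte Order Mark. The BOM is a rather clever little idea:
UCS contains a character called "ZERO WIDTH NO-BREAK SPACE" whose code is FEFF. FFFE, on the other hand, is not a character in UCS, so it should never appear in real transmission. The UCS specification suggests that we transmit the character "ZERO WIDTH NO-BREAK SPACE" before transmitting the byte stream itself.
If the receiver then gets FEFF, the byte stream is Big-Endian; if it gets FFFE, the byte stream is Little-Endian. For this reason the character "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.
UTF-8 needs no BOM to indicate byte order, but a BOM can be used to indicate the encoding. The UTF-8 encoding of "ZERO WIDTH NO-BREAK SPACE" is EF BB BF (readers can verify this with the encoding method described earlier). So if a receiver gets a byte stream beginning with EF BB BF, it knows the stream is UTF-8.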
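BOM sniffing along these lines can be sketched in Python with the BOM constants from the standard `codecs` module. The function names and the `fallback="gbk"` default are assumptions made for this example (the fallback stands in for a system default code page); this is not a description of what Notepad actually does internally, and it ignores UTF-32, whose little-endian BOM also starts with FF FE.

```python
import codecs
from typing import Optional, Tuple

def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding name, BOM length), or (None, 0) if there is no BOM."""
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8", 3
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le", 2
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be", 2
    return None, 0

def decode_with_bom(data: bytes, fallback: str = "gbk") -> str:
    """Decode text by its BOM, falling back to an assumed default encoding."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or fallback)

print(decode_with_bom(b"\xff\xfe\x49\x6c"))        # little endian -> 汉
print(decode_with_bom(b"\xfe\xff\x6c\x49"))        # big endian    -> 汉
print(decode_with_bom(b"\xef\xbb\xbf\xe6\xb1\x89"))  # UTF-8       -> 汉
print(decode_with_bom(b"\xba\xba"))                # no BOM, GBK   -> 汉
```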
This is how Windows marks the encoding of text files: with a BOM.
6. Further references
The main reference for this article is "Short overview of ISO-IEC 10646 and Unicode" (http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html).
I also found two other documents that look good, but since my original questions had already been answered I did not read them:
- "Understanding Unicode A general introduction to the Unicode Standard" (http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-Chapter04a)
- "Character set encoding basics Understanding character set encodings and legacy encodings" (http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-Chapter03)
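I have written packages that convert among UTF-8, UCS-2, and GBK, in versions with and without the Windows API. When I have time I will tidy them up and put them on my home page (http://fmddlmyy.home4u.china.com).
I only began writing this article after I had thought all the questions through, and I expected to finish quickly. Choosing the wording and checking the details took far longer than expected: I ended up writing from 1:30 in the afternoon until 9:00. I hope some readers benefit from it.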
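Appendix 1: more on zone-position codes, GB2312, internal codes, and code pages
Some friends still had questions about this sentence in the article:
"The original GB2312 text defines zone-position codes. To get from a zone-position code to the internal code, add A0 to both the high byte and the low byte."
Let me explain in more detail:
"The original GB2312 text" refers to the 1980 national standard 《中华人民共和国国家标准 信息交换用汉字编码字符集 基本集 GB 2312-80》. This standard encodes hanzi and Chinese symbols with two numbers. The first is called the "zone" (区) and the second the "position" (位), hence the name zone-position code. Zones 1-9 hold Chinese symbols, zones 16-55 hold level-1 hanzi, and zones 56-87 hold level-2 hanzi. Windows still ships a zone-position input method: typing 1601, for example, yields "啊". (This input method automatically recognizes both hexadecimal GB2312 codes and decimal zone-position codes, so typing B0A1 also yields "啊".)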
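Internal code refers to the character encoding used inside the operating system. The internal codes of early operating systems were language-specific. Today's Windows supports Unicode internally and uses code pages to adapt to individual languages, so the notion of "internal code" has become rather blurred. Microsoft generally describes the encoding specified by the default code page as the internal code.
The term internal code has no official definition, and code page is just Microsoft's name for the concept. As programmers we only need to know what they are; there is no need to dig too deeply into the terminology.
Windows has the notion of a default code page, that is, which encoding is used by default to interpret characters. Suppose Windows Notepad opens a text file whose content is the byte stream BA, BA, D7, D6. How should Windows interpret it?
As Unicode, as GBK, as BIG5, or as ISO8859-1? Interpreted as GBK, it yields the two characters "汉字". Interpreted in another encoding, the bytes may map to no character at all, or to the wrong characters. "Wrong" means not what the author of the text intended, and the result is garbled text.
The answer is that Windows interprets the byte stream in a text file according to the current default code page. The default code page can be set through Regional Options in the Control Panel. The ANSI item in Notepad's "Save As" dialog simply saves using the encoding of the default code page.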
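The effect of choosing the wrong encoding is easy to reproduce. The sketch below decodes the same four bytes with several of Python's built-in codecs; the helper name `interpretations` is hypothetical, and `errors="replace"` is used so that bytes with no mapping show up as U+FFFD instead of raising an exception.

```python
from typing import Dict, List

def interpretations(raw: bytes, encodings: List[str]) -> Dict[str, str]:
    """Decode the same bytes under several encodings for comparison."""
    return {enc: raw.decode(enc, errors="replace") for enc in encodings}

raw = bytes([0xBA, 0xBA, 0xD7, 0xD6])

results = interpretations(raw, ["gbk", "big5", "iso8859-1", "utf-16-le"])
for enc, text in results.items():
    print(f"{enc:10} {text!r}")
```

Only the GBK reading gives "汉字"; the other readings are still "valid" text in many cases, which is why garbled output usually looks like plausible but meaningless characters.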
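The internal code of Windows is Unicode, and technically it can support several code pages at once. As long as a file states which encoding it uses and the user has the corresponding code page installed, Windows can display it correctly. An HTML file, for example, can specify a charset.
Some HTML authors, especially English-speaking ones, assume that everyone in the world uses English and do not specify a charset. If such an author uses characters between 0x80 and 0xff and Chinese Windows interprets them according to its default GBK, garbled text appears. The fix is to add a charset declaration to the html file, for example: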
<meta http-equiv="Content-Type" content="text/html; charset=ISO8859-1">
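If the code page the original author used is compatible with ISO8859-1, the garbled text disappears.
Back to zone-position codes: the zone-position code of "啊" is 1601, which written in hexadecimal is 0x10, 0x01. This conflicts with the ASCII encoding that computers use everywhere. To stay compatible with the ASCII codes 00-7f, we add A0 to both the high and the low byte of the zone-position code, so the code for "啊" becomes B0A1. We also call the code with the two A0s added the GB2312 encoding, although the original GB2312 text does not mention this at all.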
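The conversion from a zone-position code to the two-byte GB2312 code is a pair of additions. Here is a minimal sketch; the function name `quwei_to_gb2312` is invented for this example, and it assumes the zone-position code is given as a four-digit decimal number such as 1601.

```python
def quwei_to_gb2312(quwei: int) -> bytes:
    """Convert a decimal zone-position code (e.g. 1601) to GB2312 bytes."""
    zone, position = divmod(quwei, 100)   # 1601 -> zone 16, position 1
    if not (1 <= zone <= 94 and 1 <= position <= 94):
        raise ValueError("zone and position must both be in 1..94")
    # Add A0 to each number so that both bytes fall outside 00-7F.
    return bytes([zone + 0xA0, position + 0xA0])

code = quwei_to_gb2312(1601)
print(code.hex().upper())       # B0A1
print(code.decode("gb2312"))    # 啊
```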