
A PHP Function for Converting UTF-8 to GB2312 Without iconv


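Converting encodings with the iconv() function is fairly simple, but many virtual hosting providers do not support this extension. After searching online for a long time, I only found a function that converts GB2312 to UTF-8; it cannot convert in the other direction.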
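Where a host does provide a conversion extension, it is reasonable to use it and fall back to a table-based function only when it is missing. A minimal sketch, assuming the host may or may not have iconv or mbstring loaded; the helper name utf8_to_gb2312_native is hypothetical:

```php
<?php
// Hypothetical helper: convert UTF-8 to GB2312 with a built-in extension if one is loaded.
// Returns null when neither iconv nor mbstring is available, so the caller can fall
// back to a table-based converter.
function utf8_to_gb2312_native(string $s): ?string
{
    if (function_exists('iconv')) {
        // //IGNORE asks iconv to skip characters with no GB2312 equivalent; some builds
        // still report failure in that case, and we then try mbstring instead.
        $out = @iconv('UTF-8', 'GB2312//IGNORE', $s);
        if ($out !== false) {
            return $out;
        }
    }
    if (function_exists('mb_convert_encoding')) {
        // mbstring's name for the GB2312 byte encoding is "EUC-CN".
        $out = mb_convert_encoding($s, 'EUC-CN', 'UTF-8');
        return $out === false ? null : $out;
    }
    return null;
}
```

A caller could then write `$gb = utf8_to_gb2312_native($text) ?? utf82gb($text);` so the slower table lookup only runs on hosts without either extension.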

The function I found is as follows:

```php
// Convert a GB2312 string to UTF-8 using the lookup table gb2312-utf8.table.
function gb2utf8($gbstr) {
    global $CODETABLE;
    if (trim($gbstr) == "") return $gbstr;
    if (empty($CODETABLE)) {
        // Table key: GB2312 code minus 0x8080; value: Unicode code point.
        $filename = dirname(__FILE__) . "/gb2312-utf8.table";
        $fp = fopen($filename, "r");
        while (($l = fgets($fp, 15)) !== false) {
            $CODETABLE[hexdec(substr($l, 0, 6))] = hexdec(substr($l, 7, 6));
        }
        fclose($fp);
    }
    $ret = "";
    $len = strlen($gbstr);
    // Walk by byte index: a string such as "0" is falsy in PHP, so it cannot be the loop condition.
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($gbstr[$i]);
        if ($byte > 127) {
            // Double-byte character: read both bytes as one number and subtract 0x8080.
            if ($i + 1 < $len) {
                $key = (($byte << 8) | ord($gbstr[$i + 1])) - 0x8080;
                if (isset($CODETABLE[$key])) {   // unmapped characters are dropped
                    $ret .= u2utf8($CODETABLE[$key]);
                }
            }
            $i++;                               // skip the second byte
        } else {
            $ret .= $gbstr[$i];                 // ASCII is identical in both encodings
        }
    }
    return $ret;
}

// Unicode to UTF-8: encode one code point as 1 to 4 bytes.
function u2utf8($c) {
    $str = "";
    if ($c < 0x80) {
        $str .= chr($c);
    } else if ($c < 0x800) {
        $str .= chr(0xC0 | $c >> 6);
        $str .= chr(0x80 | $c & 0x3F);
    } else if ($c < 0x10000) {
        $str .= chr(0xE0 | $c >> 12);
        $str .= chr(0x80 | $c >> 6 & 0x3F);
        $str .= chr(0x80 | $c & 0x3F);
    } else if ($c < 0x110000) {
        $str .= chr(0xF0 | $c >> 18);
        $str .= chr(0x80 | $c >> 12 & 0x3F);
        $str .= chr(0x80 | $c >> 6 & 0x3F);
        $str .= chr(0x80 | $c & 0x3F);
    }
    return $str;
}
```
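To see the table lookup and the 0x8080 offset at work without the table file, here is a self-contained sketch. GB2312 stores a character as its row and column numbers plus 0xA0 each, so reading the two bytes as one number and subtracting 0x8080 leaves row and column plus 0x20 (for "中", 0xD6D0 - 0x8080 = 0x5650). The two inline entries ("中" = D6 D0, "文" = CE C4) are written out by hand for illustration only; a real table has one entry per character, and the demo_* names are hypothetical:

```php
<?php
// Illustrative two-entry table: key = GB2312 code minus 0x8080, value = Unicode code point.
function demo_gb_table(): array
{
    return [
        0x5650 => 0x4E2D, // GB2312 D6 D0 "中"
        0x4E44 => 0x6587, // GB2312 CE C4 "文"
    ];
}

// Encode one BMP code point as UTF-8.
function demo_utf8(int $cp): string
{
    if ($cp < 0x80) return chr($cp);
    if ($cp < 0x800) return chr(0xC0 | $cp >> 6) . chr(0x80 | $cp & 0x3F);
    return chr(0xE0 | $cp >> 12) . chr(0x80 | $cp >> 6 & 0x3F) . chr(0x80 | $cp & 0x3F);
}

function demo_gb2utf8(string $gb): string
{
    $table = demo_gb_table();
    $out = "";
    $len = strlen($gb);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($gb[$i]);
        if ($b < 0x80) {            // ASCII passes through unchanged
            $out .= $gb[$i];
            continue;
        }
        if ($i + 1 >= $len) break;  // truncated double-byte character
        $key = (($b << 8) | ord($gb[$i + 1])) - 0x8080;
        $i++;
        if (isset($table[$key])) {
            $out .= demo_utf8($table[$key]);
        }
    }
    return $out;
}

echo bin2hex(demo_gb2utf8("A\xD6\xD0\xCE\xC4")), "\n"; // 41e4b8ade69687
```

The output shows "A" unchanged (41), "中" as E4 B8 AD and "文" as E6 96 87, the same three-byte patterns u2utf8() produces.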

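Because every GB2312 character is two bytes wide, converting to UTF-8 is relatively simple, but the reverse is a lot more troublesome. My first attempt looked like this: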

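```php
function utf82gb($utfstr)
{
    global $UC2GBTABLE;
    $okstr = "";
    if (trim($utfstr) == "") return $utfstr;
    if (empty($UC2GBTABLE)) {
        // Reverse table: Unicode code point => GB2312 code minus 0x8080.
        $filename = dirname(__FILE__) . "/gb2312-utf8.table";
        $fp = fopen($filename, "r");
        while (($l = fgets($fp, 15)) !== false) {
            $UC2GBTABLE[hexdec(substr($l, 7, 6))] = hexdec(substr($l, 0, 6));
        }
        fclose($fp);
    }
    $ulen = strlen($utfstr);
    for ($i = 0; $i < $ulen; $i++) {
        $b = ord($utfstr[$i]);
        if ($b < 0x80) {
            $okstr .= $utfstr[$i];                 // ASCII: copy unchanged
            continue;
        }
        if ($b >= 0xE0 && $b < 0xF0 && $ulen > $i + 2) {
            $u = utf82u_3(substr($utfstr, $i, 3)); // three-byte character (CJK)
            $i += 2;                               // skip its two continuation bytes
        } else if ($b >= 0xC0 && $b < 0xE0 && $ulen > $i + 1) {
            // two-byte character (e.g. Greek or Cyrillic letters)
            $u = (($b & 0x1F) << 6) | (ord($utfstr[$i + 1]) & 0x3F);
            $i += 1;
        } else {
            continue;                              // stray or unsupported byte: drop it
        }
        if (isset($UC2GBTABLE[$u])) {
            $gb = $UC2GBTABLE[$u] + 0x8080;        // back to the two stored GB2312 bytes
            $okstr .= chr($gb >> 8) . chr($gb & 0xFF);
        }
    }
    return $okstr;
}
```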

```php
// Decode a three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) to a code point.
function utf82u_3($c)
{
    $n = (ord($c[0]) & 0x0F) << 12;
    $n += (ord($c[1]) & 0x3F) << 6;
    $n += ord($c[2]) & 0x3F;
    return $n;
}
```
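The reverse direction is harder because a UTF-8 character can be one to four bytes long, and only the high bits of its first byte say how many. utf82u_3() decodes only three-byte sequences, which covers the CJK ideographs, while GB2312 also contains characters such as Greek and Cyrillic letters whose UTF-8 form is two bytes. A sketch of a general decoder that reads the length from the lead byte (the function name is hypothetical):

```php
<?php
// Hypothetical helper: decode the UTF-8 character starting at byte $i of $s.
// Returns [code point, byte length], or [null, 1] for a byte that cannot start a
// valid sequence (a stray continuation byte or a truncated character).
function demo_utf8_decode_at(string $s, int $i): array
{
    $b = ord($s[$i]);
    if ($b < 0x80) return [$b, 1];                                   // 0xxxxxxx
    if ($b >= 0xC0 && $b < 0xE0) { $len = 2; $cp = $b & 0x1F; }      // 110xxxxx
    elseif ($b >= 0xE0 && $b < 0xF0) { $len = 3; $cp = $b & 0x0F; }  // 1110xxxx
    elseif ($b >= 0xF0 && $b < 0xF8) { $len = 4; $cp = $b & 0x07; }  // 11110xxx
    else return [null, 1];                                           // 10xxxxxx or invalid
    if ($i + $len > strlen($s)) return [null, 1];
    for ($k = 1; $k < $len; $k++) {
        $c = ord($s[$i + $k]);
        if (($c & 0xC0) !== 0x80) return [null, 1];                  // must be 10xxxxxx
        $cp = ($cp << 6) | ($c & 0x3F);
    }
    return [$cp, $len];
}

// Walk a string and list its code points.
$s = "A\u{00D7}\u{4E2D}";   // "A", multiplication sign (2 bytes), "中" (3 bytes)
for ($i = 0; $i < strlen($s); ) {
    [$cp, $n] = demo_utf8_decode_at($s, $i);
    printf("U+%04X (%d byte%s)\n", $cp, $n, $n > 1 ? "s" : "");
    $i += $n;
}
// prints:
// U+0041 (1 byte)
// U+00D7 (2 bytes)
// U+4E2D (3 bytes)
```

Advancing by the returned length is what keeps the loop aligned on character boundaries; a converter would look each code point up in the reverse table instead of printing it.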

This approach converted most characters successfully, but something was always slightly off, so I rewrote it like this:

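```php
function utf82gb($utfstr)
{
    global $UC2GBTABLE;
    if (trim($utfstr) == "") return $utfstr;
    if (empty($UC2GBTABLE)) {
        // Reverse table: Unicode code point => GB2312 code minus 0x8080.
        $filename = dirname(__FILE__) . "/gb2312-utf8.table";
        $fp = fopen($filename, "r");
        while (($l = fgets($fp, 15)) !== false) {
            $UC2GBTABLE[hexdec(substr($l, 7, 6))] = hexdec(substr($l, 0, 6));
        }
        fclose($fp);
    }
    $okstr = "";
    // urlencode() writes every byte other than letters, digits and -_. as "%XX"
    // (spaces as "+"), so "中" becomes "%E4%B8%AD".
    $utfstr = urlencode($utfstr);
    $ulen = strlen($utfstr);
    for ($i = 0; $i < $ulen; $i++) {
        if ($utfstr[$i] == "%" && $ulen > $i + 2) {
            $hexnext = hexdec(substr($utfstr, $i + 1, 2));
            $u = null;
            if ($hexnext < 0x80) {
                $okstr .= chr($hexnext);         // encoded ASCII punctuation
                $i += 2;
            } else if ($hexnext >= 0xE0 && $hexnext < 0xF0 && $ulen >= $i + 9) {
                $u = url_utf2u(substr($utfstr, $i + 1, 8));   // e.g. "E4%B8%AD"
                $i += 8;
            } else if ($hexnext >= 0xC0 && $hexnext < 0xE0 && $ulen >= $i + 6) {
                // two-byte character such as "%CE%B1" (Greek alpha)
                $u = (($hexnext & 0x1F) << 6) | (hexdec(substr($utfstr, $i + 4, 2)) & 0x3F);
                $i += 5;
            } else {
                $i += 2;                         // stray or unsupported byte: drop it
            }
            if ($u !== null && isset($UC2GBTABLE[$u])) {
                $gb = $UC2GBTABLE[$u] + 0x8080;  // back to the two stored GB2312 bytes
                $okstr .= chr($gb >> 8) . chr($gb & 0xFF);
            }
        } else if ($utfstr[$i] == "+") {
            $okstr .= " ";                       // "+" stands for an encoded space
        } else {
            $okstr .= $utfstr[$i];
        }
    }
    return $okstr;
}
```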

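```php
// Convert a URL-encoded three-byte UTF-8 character such as "E4%B8%AD" to its Unicode code point.
function url_utf2u($c)
{
    $utfc = "";
    $cs = explode("%", $c);              // "E4%B8%AD" -> ["E4", "B8", "AD"]
    for ($i = 0; $i < count($cs); $i++) {
        $utfc .= chr(hexdec($cs[$i]));
    }
    $n = (ord($utfc[0]) & 0x0F) << 12;   // 1110xxxx
    $n += (ord($utfc[1]) & 0x3F) << 6;   // 10xxxxxx
    $n += ord($utfc[2]) & 0x3F;          // 10xxxxxx
    return $n;
}
```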
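The second version works because urlencode() makes the byte boundaries visible: letters, digits and -_. stay as they are, spaces become "+", and every other byte becomes "%XX". A quick sketch of what that looks like, plus a decoder for one "%XX%XX%XX" run that swaps url_utf2u()'s explode/hexdec loop for PHP's rawurldecode() (the function name is hypothetical):

```php
<?php
// What urlencode() produces for a mixed ASCII/Chinese string.
echo urlencode("a b\u{4E2D}"), "\n";   // a+b%E4%B8%AD

// Hypothetical helper: decode one URL-encoded three-byte UTF-8 run to a code point.
function demo_run_to_codepoint(string $run): int
{
    $bytes = rawurldecode($run);      // "%E4%B8%AD" -> the three raw bytes
    return ((ord($bytes[0]) & 0x0F) << 12)
         | ((ord($bytes[1]) & 0x3F) << 6)
         | (ord($bytes[2]) & 0x3F);
}

printf("U+%04X\n", demo_run_to_codepoint("%E4%B8%AD"));   // U+4E2D
```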

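Testing showed that it works perfectly, and it is even faster than the previous method, which puzzled me. A likely reason is that the first version, as originally written, never advanced $i past the two continuation bytes of a three-byte character, so each continuation byte went through another, meaningless table lookup or was copied through as a stray byte.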

If you need the gb2312-utf8.table file, add me on QQ at 2500875 (IT柏拉圖) or contact 1877000 (泡泡).
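If iconv is available on some other machine (a local development box, for example), the table can also be regenerated there. This is a sketch only: the article does not state the file's exact format, so the layout below is an assumption inferred from the readers above (fgets($fp, 15) and substr() at offsets 0 and 7 suggest two 6-character hex fields such as "0x5650 0x4E2D", GB2312 code minus 0x8080 first, Unicode code point second). Compare it against a copy of the original file before relying on it; the function name is hypothetical:

```php
<?php
// Hypothetical generator for gb2312-utf8.table; needs iconv where it runs.
// Assumed line format: "0xRRCC 0xUUUU\n" (14 bytes), RRCC = GB2312 code minus 0x8080.
function demo_build_table_lines(): array
{
    $lines = [];
    // GB2312 row and column numbers are stored as 0x21..0x7E plus 0x80 in each byte.
    for ($row = 0x21; $row <= 0x7E; $row++) {
        for ($col = 0x21; $col <= 0x7E; $col++) {
            $gb = chr($row | 0x80) . chr($col | 0x80);
            $u32 = @iconv('GB2312', 'UTF-32BE', $gb);
            if ($u32 === false || strlen($u32) !== 4) {
                continue;                         // unassigned code position
            }
            $cp = unpack('N', $u32)[1];           // big-endian 32-bit code point
            $lines[] = sprintf("0x%04X 0x%04X\n", ($row << 8) | $col, $cp);
        }
    }
    return $lines;
}

// Example use (path is an assumption):
// file_put_contents(__DIR__ . '/gb2312-utf8.table', implode('', demo_build_table_lines()));
```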

Reposted from: http://prato.bokele.com/?ArticleID=19533