PHP code for converting UTF-16 to UTF-8

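Today I ran into a customer whose CA certificate has all of its subject fields in Chinese, which was annoying enough, and on top of that they are stored as "Unicode", so the certificate was obviously produced by a Windows program. After some thought: what Windows calls Unicode is UTF-16, so converting UTF-16 to UTF-8 is all that is needed. Dumping the certificate with openssl shows the following: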

Subject: C=N-V\xFD, ST=mYl_w\x01, …….
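To see why this is UTF-16, it helps to pair up the bytes that openssl printed: printable bytes show up as ASCII characters and the rest as \xNN escapes. Here is a minimal sketch, assuming big-endian byte order; the helper name `utf16be_code_units` is made up for illustration:

```php
<?php
// Hypothetical helper: split a raw byte string into big-endian 16-bit code units.
function utf16be_code_units(string $bytes): array
{
    // 'n*' reads unsigned 16-bit big-endian integers. unpack() keys start at 1,
    // so array_values() re-indexes the result from 0.
    return array_values(unpack('n*', $bytes));
}

// "N" is 0x4E and "-" is 0x2D, so the first pair is U+4E2D;
// "V" is 0x56 and "\xFD" is 0xFD, so the second pair is U+56FD.
foreach (utf16be_code_units("N-V\xFD") as $unit) {
    printf("U+%04X\n", $unit);
}
```

U+4E2D and U+56FD are the two characters of 中国, so the value of C is a Chinese string stored two bytes per character.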

So how do we decode it? I lifted a decoding function from a third-party JSON library:

    /**
     * Convert one UTF-16 character (2 bytes, big-endian) to one UTF-8 character.
     *
     * Normally this should be handled by mb_convert_encoding, but this
     * provides a slower PHP-only method for installations
     * that lack the multibyte string extension.
     *
     * @param    string  $utf16  UTF-16 character
     * @return   string  UTF-8 character
     * @access   private
     */
    function utf162utf8($utf16)
    {
        // oh please oh please oh please oh please oh please
        if (function_exists('mb_convert_encoding')) {
            return mb_convert_encoding($utf16, 'UTF-8', 'UTF-16');
        }

        // Combine the two bytes into one 16-bit value, high byte first.
        // Square brackets replace the old $utf16{0} offset syntax,
        // which was deprecated in PHP 7.4 and removed in PHP 8.0.
        $bytes = (ord($utf16[0]) << 8) | ord($utf16[1]);

        switch (true) {
            case ((0x7F & $bytes) == $bytes):
                // this case should never be reached, because we are in ASCII range
                // see: http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                return chr(0x7F & $bytes);

            case (0x07FF & $bytes) == $bytes:
                // return a 2-byte UTF-8 character
                // see: http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                return chr(0xC0 | (($bytes >> 6) & 0x1F))
                    . chr(0x80 | ($bytes & 0x3F));

            case (0xFFFF & $bytes) == $bytes:
                // return a 3-byte UTF-8 character
                // see: http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                return chr(0xE0 | (($bytes >> 12) & 0x0F))
                    . chr(0x80 | (($bytes >> 6) & 0x3F))
                    . chr(0x80 | ($bytes & 0x3F));
        }

        // ignoring UTF-32 for now, sorry
        // (surrogate pairs, i.e. characters above U+FFFF, are not handled either)
        return '';
    }
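The 3-byte branch is the one that Chinese characters hit. As a sanity check, here is the same bit arithmetic worked through for the value 0x4E2D, the first byte pair in the certificate output; the function name `encode3` exists only for this illustration:

```php
<?php
// Illustrative only: the 3-byte UTF-8 layout is 1110xxxx 10xxxxxx 10xxxxxx.
function encode3(int $cp): string
{
    return chr(0xE0 | (($cp >> 12) & 0x0F))   // top 4 bits
         . chr(0x80 | (($cp >> 6) & 0x3F))    // middle 6 bits
         . chr(0x80 | ($cp & 0x3F));          // low 6 bits
}

// 0x4E2D = 0100 1110 0010 1101
//   top 4 bits    0100   -> 0xE0 | 0x04 = 0xE4
//   middle 6 bits 111000 -> 0x80 | 0x38 = 0xB8
//   low 6 bits    101101 -> 0x80 | 0x2D = 0xAD
echo bin2hex(encode3(0x4E2D)), "\n"; // e4b8ad
```

The bytes E4 B8 AD are the UTF-8 encoding of 中.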

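After that it is simple. Every character here is in the Basic Multilingual Plane, where a UTF-16 character is exactly two bytes (characters outside it take a four-byte surrogate pair, which this function does not handle), so the string can be processed two bytes at a time: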

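    $s = "N-V\xfd";  // raw bytes of the C field: 0x4E 0x2D 0x56 0xFD
    $step = 2;       // one UTF-16 character in the BMP is 2 bytes
    for ($i = 0, $len = strlen($s); $i < $len; $i += $step) {
        $t = substr($s, $i, $step);
        $ret = utf162utf8($t);
        echo $ret;
    }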
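The two-bytes-at-a-time loop is enough for this certificate. For input that may contain characters above U+FFFF, a more defensive sketch converts the whole string in one pass and also handles surrogate pairs. The function name `utf16be_to_utf8`, the big-endian assumption, and the choice to replace unpaired surrogates with U+FFFD are all assumptions for illustration; where the mbstring extension is available, `mb_convert_encoding($s, 'UTF-8', 'UTF-16BE')` does the same job.

```php
<?php
// Hypothetical helper: decode big-endian UTF-16 bytes to UTF-8 in plain PHP.
function utf16be_to_utf8(string $bytes): string
{
    if ($bytes === '') {
        return '';
    }
    if (strlen($bytes) % 2 !== 0) {
        throw new InvalidArgumentException('UTF-16 input must have an even byte length');
    }

    $units = array_values(unpack('n*', $bytes)); // 16-bit big-endian code units
    $count = count($units);
    $out = '';

    for ($i = 0; $i < $count; $i++) {
        $cp = $units[$i];

        if ($cp >= 0xD800 && $cp <= 0xDBFF
            && $i + 1 < $count
            && $units[$i + 1] >= 0xDC00 && $units[$i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: one code point above U+FFFF.
            $low = $units[$i + 1];
            $cp = 0x10000 + (($cp - 0xD800) << 10) + ($low - 0xDC00);
            $i++; // the low surrogate has been consumed
        } elseif ($cp >= 0xD800 && $cp <= 0xDFFF) {
            $cp = 0xFFFD; // unpaired surrogate: substitute the replacement character
        }

        if ($cp < 0x80) {
            $out .= chr($cp);
        } elseif ($cp < 0x800) {
            $out .= chr(0xC0 | ($cp >> 6))
                  . chr(0x80 | ($cp & 0x3F));
        } elseif ($cp < 0x10000) {
            $out .= chr(0xE0 | ($cp >> 12))
                  . chr(0x80 | (($cp >> 6) & 0x3F))
                  . chr(0x80 | ($cp & 0x3F));
        } else {
            $out .= chr(0xF0 | ($cp >> 18))
                  . chr(0x80 | (($cp >> 12) & 0x3F))
                  . chr(0x80 | (($cp >> 6) & 0x3F))
                  . chr(0x80 | ($cp & 0x3F));
        }
    }

    return $out;
}

echo utf16be_to_utf8("N-V\xFD"), "\n"; // 中国
```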

This prints the expected result, 中国 (China), which is the value of C, the country field.
