Tuesday, September 25, 2012

[PHP] PHP UTF-8 cheatsheet

Making sure PHP outputs UTF-8 takes quite a few checks, but someone has already put them all together, so I'm noting them down here:

1. Update your database tables to use UTF-8
-- Note: MySQL's utf8 stores at most 3 bytes per character;
-- utf8mb4 (MySQL 5.5.3+) is needed for 4-byte characters.
CREATE DATABASE db_name
 DEFAULT CHARACTER SET utf8
 DEFAULT COLLATE utf8_general_ci
 ;

ALTER DATABASE db_name
 DEFAULT CHARACTER SET utf8
 DEFAULT COLLATE utf8_general_ci
 ;

ALTER TABLE tbl_name
 DEFAULT CHARACTER SET utf8
 COLLATE utf8_general_ci
 ;

2. Install the mbstring extension for PHP
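
Before touching php.ini it helps to confirm the extension is really there. A minimal sketch follows; the helper name mbstring_available() is made up for this note and is not part of the original cheatsheet:

```php
<?php
// Hypothetical helper: reports whether the mbstring extension is loaded
// in the PHP build that runs this script.
function mbstring_available(): bool
{
    return extension_loaded('mbstring');
}

if (!mbstring_available()) {
    // Package names differ per platform, so none is suggested here.
    trigger_error(
        'mbstring is not loaded; the mb_* functions will be missing',
        E_USER_WARNING
    );
}
```

Running `php -m` on the command line lists loaded modules and gives the same answer for the CLI build.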

3. Configure mbstring
$ vim /path/to/php.ini
; Note: mbstring.internal_encoding, mbstring.http_input and mbstring.http_output
; are deprecated as of PHP 5.6; default_charset is used in their place.
mbstring.language             = Neutral ; Set default language to Neutral (UTF-8) (default)
mbstring.internal_encoding    = UTF-8   ; Set default internal encoding to UTF-8
mbstring.encoding_translation = On      ; HTTP input encoding translation is enabled
mbstring.http_input           = auto    ; Set HTTP input character set detection to auto
mbstring.http_output          = UTF-8   ; Set HTTP output encoding to UTF-8
mbstring.detect_order         = auto    ; Set default character encoding detection order to auto
mbstring.substitute_character = none    ; Do not print invalid characters
default_charset               = UTF-8   ; Default character set for auto content type header
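
If php.ini cannot be edited (shared hosting, for example), some of these settings can be applied at runtime instead. This is only a sketch of that idea, assuming mbstring is loaded; the $settings array is just an illustrative way to read the values back:

```php
<?php
// Runtime counterparts of a few of the php.ini lines above.
ini_set('default_charset', 'UTF-8');   // default_charset
mb_internal_encoding('UTF-8');         // mbstring.internal_encoding
mb_http_output('UTF-8');               // mbstring.http_output
mb_substitute_character('none');       // mbstring.substitute_character

// Read the values back to confirm what is actually in effect.
$settings = [
    'default_charset'      => ini_get('default_charset'),
    'internal_encoding'    => mb_internal_encoding(),
    'http_output'          => mb_http_output(),
    'substitute_character' => mb_substitute_character(),
];
```

Dumping $settings (for instance with var_dump) at the top of a page is a quick way to see whether the server configuration matches what is expected.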

4. Deal with non-multibyte-safe functions in PHP
$ vim /path/to/php.ini
mbstring.func_overload = 7 ; All non-multibyte-safe functions are overloaded with the mbstring alternatives (deprecated in PHP 7.2, removed in PHP 8.0)
Functions that get overloaded (or change them by hand in the code):
mail()          -> mb_send_mail()
strlen()        -> mb_strlen() 
strpos()        -> mb_strpos()
strrpos()       -> mb_strrpos()
substr()        -> mb_substr()
strtolower()    -> mb_strtolower()
strtoupper()    -> mb_strtoupper()
substr_count()  -> mb_substr_count()
ereg()          -> mb_ereg()
eregi()         -> mb_eregi()
ereg_replace()  -> mb_ereg_replace()
eregi_replace() -> mb_eregi_replace() 
split()         -> mb_split()
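
To see why the replacements matter, here is a small sketch comparing the byte-based functions with their mb_* counterparts. It assumes func_overload is off (the default); the sample string is written with hex escapes so the result does not depend on how this file is saved:

```php
<?php
mb_internal_encoding('UTF-8');

// "héllo" as UTF-8 bytes: the é takes two bytes (0xC3 0xA9).
$word = "h\xC3\xA9llo";

$byteLength = strlen($word);                    // counts bytes
$charLength = mb_strlen($word, 'UTF-8');        // counts characters
$bytePos    = strpos($word, 'l');               // byte offset of the first "l"
$charPos    = mb_strpos($word, 'l', 0, 'UTF-8'); // character offset of the first "l"
$second     = mb_substr($word, 1, 1, 'UTF-8');  // the whole é, not half of it
$upper      = mb_strtoupper($word, 'UTF-8');    // upper-cases é as well
```

Here $byteLength is 6 while $charLength is 5, and $bytePos is 3 while $charPos is 2, which is exactly the kind of off-by-one that cuts a multibyte character in half when substr() is used.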

5. Sort out HTML entities
Add a wrapper and call it instead of htmlentities():
/**
 * Encodes HTML safely for UTF-8. Use instead of htmlentities.
 *
 * @param string $var
 * @return string
 */
function html_encode($var)
{
    return htmlentities($var, ENT_QUOTES, 'UTF-8');
}
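
A quick usage sketch of the wrapper. The function is repeated so the snippet runs on its own; the sample inputs are made up for illustration:

```php
<?php
// Same wrapper as above, repeated so this snippet is self-contained.
function html_encode($var)
{
    return htmlentities($var, ENT_QUOTES, 'UTF-8');
}

// Markup characters and the UTF-8 é are both turned into entities.
$tagged = html_encode("<b>caf\xC3\xA9</b>");

// ENT_QUOTES also escapes single quotes, which matters inside attributes.
$quoted = html_encode("it's");
```

With these inputs $tagged is `&lt;b&gt;caf&eacute;&lt;/b&gt;` and $quoted is `it&#039;s`.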

6. Check content-type headers
Send the header before any output:
header('Content-Type: text/html; charset=UTF-8');
and declare it in the HTML head:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

7. Update email scripts
Save the script files as UTF-8, and encode non-ASCII header values such as the subject with mb_encode_mimeheader()
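
A rough sketch of what that can look like. The subject text and the commented-out address are placeholders, and the mail() call is left commented so nothing is actually sent:

```php
<?php
mb_internal_encoding('UTF-8');

// "Café menu" as UTF-8 bytes; placeholder subject for illustration.
$subject = "Caf\xC3\xA9 menu";

// Encode the subject as a MIME encoded-word (Base64, UTF-8).
$encodedSubject = mb_encode_mimeheader($subject, 'UTF-8', 'B', "\r\n");

// Tell the mail client that the body is UTF-8 as well.
$headers = implode("\r\n", [
    'MIME-Version: 1.0',
    'Content-Type: text/plain; charset=UTF-8',
    'Content-Transfer-Encoding: 8bit',
]);

// mail('user@example.com', $encodedSubject, $body, $headers);
```

mb_send_mail() from step 4 does the header encoding itself, so this manual route is mainly for code that has to keep calling mail() directly.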

8. Check binary files and strings
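
The cheatsheet gives no detail for this step here, so the following is only one possible reading: treat binary data as bytes, and validate text before handing it to the mb_* functions. The sample values are made up:

```php
<?php
// "café" as valid UTF-8 text.
$text = "caf\xC3\xA9";

// The 8-byte PNG file signature, standing in for arbitrary binary data.
$binary = "\x89PNG\r\n\x1A\n";

// Valid UTF-8 passes; the binary string starts with a stray 0x89 and fails.
$textIsUtf8   = mb_check_encoding($text, 'UTF-8');
$binaryIsUtf8 = mb_check_encoding($binary, 'UTF-8');

// The '8bit' encoding makes mb_strlen count bytes, which stays correct
// for binary data even where strlen() has been overloaded.
$binaryBytes = mb_strlen($binary, '8bit');
```

Here $textIsUtf8 is true, $binaryIsUtf8 is false, and $binaryBytes is 8.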


reference:
* PHP UTF-8 cheatsheet
