Making sure PHP's output really is UTF-8 takes quite a few checks, but someone has already compiled them, so I'm noting them down here:
1. Update your database tables to use UTF-8
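-- Note: MySQL's utf8 stores at most 3 bytes per character; use utf8mb4 if 4-byte characters (such as emoji) are needed.
CREATE DATABASE db_name
    DEFAULT CHARACTER SET utf8
    DEFAULT COLLATE utf8_general_ci
;
ALTER DATABASE db_name
    DEFAULT CHARACTER SET utf8
    DEFAULT COLLATE utf8_general_ci
;
-- This changes only the table default for new columns; existing columns are
-- converted with ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8.
ALTER TABLE tbl_name
    DEFAULT CHARACTER SET utf8
    COLLATE utf8_general_ci
;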
2. Install the mbstring extension for PHP
3. Configure mbstring
$ vim /path/to/php.ini
mbstring.language = Neutral ; Set default language to Neutral(UTF-8) (default)
mbstring.internal_encoding = UTF-8 ; Set default internal encoding to UTF-8
mbstring.encoding_translation = On ; HTTP input encoding translation is enabled
mbstring.http_input = auto ; Set HTTP input character set detection to auto
mbstring.http_output = UTF-8 ; Set HTTP output encoding to UTF-8
mbstring.detect_order = auto ; Set default character encoding detection order to auto
mbstring.substitute_character = none ; Do not print invalid characters
default_charset = UTF-8 ; Default character set for auto content type header
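To confirm that the settings above actually took effect, a script can read them back at runtime. This is only a minimal sketch, assuming the mbstring extension is loaded; utf8_settings_report() is a hypothetical helper name, not something PHP provides:

```php
<?php
// Hypothetical helper: collect the encoding-related settings PHP is actually using.
function utf8_settings_report()
{
    return array(
        'mbstring_loaded'   => extension_loaded('mbstring'),
        'default_charset'   => ini_get('default_charset'),
        'internal_encoding' => mb_internal_encoding(),
    );
}

// Force the internal encoding for this request in case php.ini was not updated.
mb_internal_encoding('UTF-8');

// Print each setting so it can be compared against php.ini.
foreach (utf8_settings_report() as $name => $value) {
    echo $name, ' = ', var_export($value, true), "\n";
}
```

Running this once from the web server and once from the command line is worthwhile, since the two often load different php.ini files.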
4. Deal with non-multibyte-safe functions in PHP
$ vim /path/to/php.ini
mbstring.func_overload = 7 ; All non-multibyte-safe functions are overloaded with the mbstring alternatives (deprecated in PHP 7.2, removed in PHP 8.0)
Or change the function calls by hand:
mail() -> mb_send_mail()
strlen() -> mb_strlen()
strpos() -> mb_strpos()
strrpos() -> mb_strrpos()
substr() -> mb_substr()
strtolower() -> mb_strtolower()
strtoupper() -> mb_strtoupper()
substr_count() -> mb_substr_count()
ereg() -> mb_ereg()
eregi() -> mb_eregi()
ereg_replace() -> mb_ereg_replace()
eregi_replace() -> mb_eregi_replace()
split() -> mb_split()
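The difference between the byte-oriented functions and their mbstring counterparts is easiest to see on a short string. A small sketch, assuming the file is saved as UTF-8 and mbstring is loaded; the sample text is made up for illustration:

```php
<?php
// Sample text: 7 characters, each taking 3 bytes in UTF-8.
$text = '日本語テキスト';

// strlen() counts bytes, mb_strlen() counts characters.
$byteLength = strlen($text);                    // 21
$charLength = mb_strlen($text, 'UTF-8');        // 7

// substr() can cut a character in half; mb_substr() works on whole characters.
$firstTwo = mb_substr($text, 0, 2, 'UTF-8');    // 日本

// Positions differ too: byte offset versus character offset.
$bytePos = strpos($text, '語');                 // 6
$charPos = mb_strpos($text, '語', 0, 'UTF-8');  // 2

echo "bytes: $byteLength, chars: $charLength, first two: $firstTwo\n";
echo "byte offset: $bytePos, char offset: $charPos\n";
```

Passing 'UTF-8' explicitly keeps the result independent of whatever mbstring.internal_encoding happens to be set to.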
5. Sort out HTML entities
Add a wrapper:
/**
 * Encodes HTML safely for UTF-8. Use instead of htmlentities.
 *
 * @param string $var
 * @return string
 */
function html_encode($var)
{
    return htmlentities($var, ENT_QUOTES, 'UTF-8');
}
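To see what naming the charset buys you, here is a short sketch that calls htmlentities() directly with the same arguments as the wrapper. It assumes the file is saved as UTF-8; the sample string is made up:

```php
<?php
// Sample input mixing a multibyte character, tags, and quotes.
$input = 'café <b>"quoted"</b>';

// With the charset named explicitly, the two-byte "é" becomes one entity
// instead of being treated as two separate single-byte characters.
$encoded = htmlentities($input, ENT_QUOTES, 'UTF-8');
echo $encoded, "\n";  // caf&eacute; &lt;b&gt;&quot;quoted&quot;&lt;/b&gt;

// html_entity_decode() with the same arguments reverses the encoding.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
```

ENT_QUOTES matters when the value ends up inside an HTML attribute, because it escapes both single and double quotes.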
6. Check content-type headers
Modify the output:
header('Content-Type: text/html; charset=UTF-8');
and
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
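Putting the header and the meta tag in one place makes it harder for them to drift apart. A rough sketch of that idea; render_utf8_page() is a hypothetical helper, and the page content is a placeholder:

```php
<?php
// Hypothetical helper: send the HTTP header and return a page whose meta tag matches it.
function render_utf8_page($title, $body)
{
    // header() only works before any output has been sent.
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=UTF-8');
    }

    // Escape the title; $body is assumed to be trusted HTML already.
    $title = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

    return "<html>\n<head>\n"
        . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n"
        . "<title>{$title}</title>\n</head>\n"
        . "<body>{$body}</body>\n</html>\n";
}

echo render_utf8_page('UTF-8 test: 日本語', '<p>こんにちは</p>');
```

When both are present, browsers give the HTTP header priority over the meta tag, so the header is the one to get right first.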
7. Update email scripts
Save the script files as UTF-8, and encode non-ASCII header values
(such as the subject) with mb_encode_mimeheader()
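As a rough illustration of the email step, the sketch below encodes a non-ASCII subject and builds headers that declare a UTF-8 body. It assumes mbstring is loaded and the file is saved as UTF-8; the subject, body, and address are placeholders, and the actual send is left commented out:

```php
<?php
// Make sure mbstring treats our strings as UTF-8.
mb_internal_encoding('UTF-8');

$subject = '日本語の件名';

// Encode the subject as a MIME encoded-word ("B" selects Base64).
$encodedSubject = mb_encode_mimeheader($subject, 'UTF-8', 'B');

// Declare the body encoding so mail clients decode it as UTF-8.
$headers = "MIME-Version: 1.0\r\n"
         . "Content-Type: text/plain; charset=UTF-8\r\n"
         . "Content-Transfer-Encoding: 8bit";

echo $encodedSubject, "\n";

// Placeholder recipient and body; uncomment to actually send.
// mail('user@example.com', $encodedSubject, '本文です', $headers);
```

mb_send_mail() is the alternative from the list in step 4; it encodes the subject and body itself according to mbstring.language.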
8. Check binary files and strings
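These notes give no details for this step. One common reading is that binary data must still be measured in bytes even when string functions are multibyte-aware, and that incoming strings should be checked for valid UTF-8. A sketch under that assumption, with mbstring loaded; byte_length() and is_valid_utf8() are hypothetical helper names:

```php
<?php
// Hypothetical helper: byte length that stays correct even if strlen() is overloaded.
function byte_length($data)
{
    // '8bit' makes mbstring treat every byte as one character.
    return mb_strlen($data, '8bit');
}

// Hypothetical helper: true when $value is well-formed UTF-8 text.
function is_valid_utf8($value)
{
    return mb_check_encoding($value, 'UTF-8');
}

$text   = '日本語';
$binary = "\x89PNG\r\n\x1a\n";  // the 8-byte signature at the start of a PNG file

echo byte_length($text), ' ', var_export(is_valid_utf8($text), true), "\n";      // 9 true
echo byte_length($binary), ' ', var_export(is_valid_utf8($binary), true), "\n";  // 8 false
```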
Reference: PHP UTF-8 cheatsheet