1. Update your database tables to use UTF-8
-- New database:
CREATE DATABASE db_name
    DEFAULT CHARACTER SET utf8
    DEFAULT COLLATE utf8_general_ci;

-- Existing database:
ALTER DATABASE db_name
    DEFAULT CHARACTER SET utf8
    DEFAULT COLLATE utf8_general_ci;

-- Existing table (changes the default for new columns only; use
-- ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci
-- to convert the columns that are already there):
ALTER TABLE tbl_name
    DEFAULT CHARACTER SET utf8
    COLLATE utf8_general_ci;
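The tables are only half of the picture: the connection from PHP should ask for the same character set. A minimal sketch, assuming the PDO MySQL driver is used; the helper name `utf8_mysql_dsn`, the host and the credentials are placeholders that do not come from the original cheatsheet.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that requests a utf8 connection.
function utf8_mysql_dsn($host, $dbName)
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8', $host, $dbName);
}

$dsn = utf8_mysql_dsn('localhost', 'db_name');

// Placeholder credentials, left commented out so the sketch runs without a server:
// $pdo = new PDO($dsn, 'user', 'password');
```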
2. Install the mbstring extension for PHP
3. Configure mbstring
$ vim /path/to/php.ini
mbstring.language = Neutral           ; Set default language to Neutral (UTF-8) (default)
mbstring.internal_encoding = UTF-8    ; Set default internal encoding to UTF-8
mbstring.encoding_translation = On    ; HTTP input encoding translation is enabled
mbstring.http_input = auto            ; Set HTTP input character set detection to auto
mbstring.http_output = UTF-8          ; Set HTTP output encoding to UTF-8
mbstring.detect_order = auto          ; Set default character encoding detection order to auto
mbstring.substitute_character = none  ; Do not print invalid characters
default_charset = UTF-8               ; Default character set for auto content type header
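If php.ini cannot be edited, some of these settings can also be applied at runtime. The following is a rough sketch, assuming the mbstring extension from step 2 is loaded; the `$settings` array is only there for illustration.

```php
<?php
// Fail early if the extension from step 2 is missing.
if (!extension_loaded('mbstring')) {
    die('mbstring is not installed');
}

// Runtime equivalents of two of the php.ini lines above.
mb_internal_encoding('UTF-8');    // mbstring.internal_encoding = UTF-8
mb_substitute_character('none');  // mbstring.substitute_character = none

// Read the values back to confirm what PHP is actually using.
$settings = array(
    'internal_encoding' => mb_internal_encoding(),
    'default_charset'   => ini_get('default_charset'),
);
```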
4. Deal with non-multibyte-safe functions in PHP
$ vim /path/to/php.ini
mbstring.func_overload = 7 ; All non-multibyte-safe functions are overloaded with their mbstring alternatives
The following functions are changed:
mail()         -> mb_send_mail()
strlen()       -> mb_strlen()
strpos()       -> mb_strpos()
strrpos()      -> mb_strrpos()
substr()       -> mb_substr()
strtolower()   -> mb_strtolower()
strtoupper()   -> mb_strtoupper()
substr_count() -> mb_substr_count()
ereg()         -> mb_ereg()
eregi()        -> mb_eregi()
ereg_replace() -> mb_ereg_replace()
eregi_replace() -> mb_eregi_replace()
split()        -> mb_split()
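Calling the mb_* functions directly is the safer route, since mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0. Here is a small sketch of the difference, assuming func_overload is off; the sample word is made up and written with byte escapes so the result does not depend on how this file is saved.

```php
<?php
$word = "h\xC3\xA9llo"; // "héllo" as UTF-8 bytes; the é takes two bytes

$byteLength = strlen($word);                 // counts bytes: 6
$charLength = mb_strlen($word, 'UTF-8');     // counts characters: 5

$upper    = mb_strtoupper($word, 'UTF-8');   // "HÉLLO"
$firstTwo = mb_substr($word, 0, 2, 'UTF-8'); // "hé"; substr() would cut the é in half
```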
5. Sort out HTML entities
Add a wrapper:
/**
 * Encodes HTML safely for UTF-8. Use instead of htmlentities.
 *
 * @param string $var
 * @return string
 */
function html_encode($var)
{
    return htmlentities($var, ENT_QUOTES, 'UTF-8');
}
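To see what the wrapper does with a multibyte character, here is a short usage sketch. The function is repeated so the snippet runs on its own, and the input string is an invented example.

```php
<?php
// Same wrapper as above, repeated so this snippet is self-contained.
function html_encode($var)
{
    return htmlentities($var, ENT_QUOTES, 'UTF-8');
}

$input   = "<b>caf\xC3\xA9</b> & 'tea'"; // contains "é" as UTF-8 bytes
$encoded = html_encode($input);

echo $encoded; // &lt;b&gt;caf&eacute;&lt;/b&gt; &amp; &#039;tea&#039;
```

Passing 'UTF-8' explicitly means the two bytes of "é" are treated as one character and become a single entity.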
6. Check content-type headers
Modify the output:
header('Content-type: text/html; charset=UTF-8');
and
<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
7. Update email scripts
Use UTF-8 encoding for the message text, and encode header content (such as the subject) with mb_encode_mimeheader().
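A minimal sketch of this step, assuming a plain-text message sent with mail(); the subject, the recipient address and the variable names are placeholders rather than anything from the original cheatsheet.

```php
<?php
mb_internal_encoding('UTF-8');

$subject = "Caf\xC3\xA9 r\xC3\xA9servation"; // "Café réservation" as UTF-8 bytes

// Header values may not carry raw non-ASCII bytes, so encode the subject as a
// MIME encoded-word ('B' selects Base64).
$encodedSubject = mb_encode_mimeheader($subject, 'UTF-8', 'B');

// Tell the mail client that the body itself is UTF-8.
$headers = "MIME-Version: 1.0\r\n"
         . "Content-Type: text/plain; charset=UTF-8\r\n"
         . "Content-Transfer-Encoding: 8bit";

// Placeholder recipient and body, left commented out so nothing is sent:
// mail('user@example.com', $encodedSubject, $body, $headers);
```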
8. Check binary files and strings
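One likely reason for this check is that code handling binary data usually wants byte counts, and with the overloading from step 4 strlen() and substr() count characters instead. The following is a sketch under that assumption, using the '8bit' encoding to force byte semantics; the sample data is made up.

```php
<?php
$data = "\xE6\x97\xA5\xE6\x9C\xAC"; // "日本": two characters, six bytes

$bytes = mb_strlen($data, '8bit');  // byte count, regardless of func_overload: 6
$chars = mb_strlen($data, 'UTF-8'); // character count: 2

// Byte-wise slicing for binary data: the first three bytes are exactly "日".
$firstThreeBytes = mb_substr($data, 0, 3, '8bit');
```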
reference:
* PHP UTF-8 cheatsheet