此文已由作者左裕初授权网易云社区发布。
欢迎访问网易云社区,了解更多网易技术产品运营经验
Unipack系统要保存一些越南文的包名,由于这是一个上古系统,处理起来一定要采取庖丁解牛的态度
数据库支不支持(数据库层)
浏览器支持不支持(前端,客户端系统)
应用层支持不支持
最重要的当然是我们的字段是采用什么编码的,可以采用如下方法进行查询
For Schemas:
SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name = "schemaname";
For Tables:
SELECT CCSA.character_set_name FROM information_schema.`TABLES` T, information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema = "schemaname" AND T.table_name = "tablename";
For Columns:
SELECT character_set_name FROM information_schema.`COLUMNS` WHERE table_schema = "schemaname" AND table_name = "tablename" AND column_name = "columnname";
查出来是:
mysql> SELECT character_set_name FROM information_schema.`COLUMNS` WHERE table_schema = "apkpack51" AND table_name = "apk_params" AND column_name = "param_value";
+--------------------+
| character_set_name |
+--------------------+
| utf8 |
+--------------------+
1 row in set (0.00 sec)
所以就是utf8咯。
下面我们看看MySQL的utf-8包不包含越南文的支持
mysql> show collation;
collation就相当于是指定了一套字符串的比对方法(大小之类的),对于建索引是很有意义的。
结果:
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 |
| utf8_icelandic_ci | utf8 | 193 | | Yes | 8 |
| utf8_latvian_ci | utf8 | 194 | | Yes | 8 |
| utf8_romanian_ci | utf8 | 195 | | Yes | 8 |
| utf8_slovenian_ci | utf8 | 196 | | Yes | 8 |
| utf8_polish_ci | utf8 | 197 | | Yes | 8 |
| utf8_estonian_ci | utf8 | 198 | | Yes | 8 |
| utf8_spanish_ci | utf8 | 199 | | Yes | 8 |
| utf8_swedish_ci | utf8 | 200 | | Yes | 8 |
| utf8_turkish_ci | utf8 | 201 | | Yes | 8 |
| utf8_czech_ci | utf8 | 202 | | Yes | 8 |
| utf8_danish_ci | utf8 | 203 | | Yes | 8 |
| utf8_lithuanian_ci | utf8 | 204 | | Yes | 8 |
| utf8_slovak_ci | utf8 | 205 | | Yes | 8 |
| utf8_spanish2_ci | utf8 | 206 | | Yes | 8 |
| utf8_roman_ci | utf8 | 207 | | Yes | 8 |
| utf8_persian_ci | utf8 | 208 | | Yes | 8 |
| utf8_esperanto_ci | utf8 | 209 | | Yes | 8 |
| utf8_hungarian_ci | utf8 | 210 | | Yes | 8 |
| utf8_sinhala_ci | utf8 | 211 | | Yes | 8 |
| utf8_german2_ci | utf8 | 212 | | Yes | 8 |
| utf8_croatian_ci | utf8 | 213 | | Yes | 8 |
| utf8_unicode_520_ci | utf8 | 214 | | Yes | 8 |
| utf8_vietnamese_ci | utf8 | 215 | | Yes | 8 |
| utf8_general_mysql500_ci | utf8 | 223 | | Yes | 1 |
所以毫无疑问是支持越南文的。
这是请求的返回头,可以看出是utf-8编码的,所以从这个角度来说,客户端也是没有问题的
HTTP/1.1 200 OK
Date: Tue, 08 Dec 2015 09:11:09 GMT
Server: Apache/2.4.12 (Ubuntu)
X-Powered-By: PHP/5.6.11-1ubuntu3.1
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Length: 11
Keep-Alive: timeout=5, max=91
Connection: Keep-Alive
Content-Type: application/text;charset=UTF-8
最麻烦的地方在于应用层,以一个restful接口为例,从数据库取出数据与数据库连接采取的编码有关,生成的json数据又和应用层采用的编码方法有关。确定这个根源的办法其实也很简单,采用控制变量法即可。最后我们发现在建立连接的时候有这么一句:
```php $this->conn = ADONewConnection('mysql');
this−>conn−>PConnect(db_host, dbuser,db_pass, $db_name);
$this->conn->Execute('SET NAMES GBK'); # 数据库使用GBK编码 ```
所以说,为什么越南文插入失败呢,就是因为数据库连接采用了GBK编码,而GBK编码无法啊编码越南文,所以也就无法以正确的姿态插入数据库。
utf-8(浏览器) -> gbk(应用层) -> utf-8(数据库)
为了不牵一发而动全身,我们还是采用了保守的方法,多取了一个utf-8的连接,这样在有需要的时候使用这个连接。
Web的界面乱码是常见的问题,一般来说,只要抓住这三个点,感觉总能找到问题的根源。
免费领取验证码、内容安全、短信发送、直播点播体验包及云服务器等套餐
更多网易技术、产品、运营经验分享请点击。