浅谈MySQL字符集

 

Preface

 

    MySQL use character set & collation to organize the different charater.It provides a flexible way in setting individual character set on a database,a table even on a single column of table.Each character set has a series of collations with one default collation.We can generally see the character set in MySQL as the combination of code page & character encoding.

    In the early version of MySQL(eg. version 5.5) ,latin1 is the default character set which does not support Chinese characters.There're some other commonly used character set such as GBK,UTF-8.MySQL 5.7 chooses UTF-8 as default character set nowadays in order to support more characters of different languages.

    Messy code is a general issue about character set.It always occur in our MySQL databases if we do not use character rules appropriately.Worse,it leads to data loss in some cases what is really a big trouble we should avoid.

 

Introduce

 

    MySQL provides a lot of parameters to specify character set in various dimentionality.We should know clearly about the principle and function of each parameter to avert underlying messy code issue.Let's see details about it.

 

Procedure

 

Check the character set supported by MySQL(version 5.7).

 1 (root@localhost mysql3306.sock)[(none)]>show character set;
 2 +----------+---------------------------------+---------------------+--------+
 3 | Charset  | Description                     | Default collation   | Maxlen |
 4 +----------+---------------------------------+---------------------+--------+
 5 | big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
 6 | dec8     | DEC West European               | dec8_swedish_ci     |      1 |
 7 | cp850    | DOS West European               | cp850_general_ci    |      1 |
 8 | hp8      | HP West European                | hp8_english_ci      |      1 |
 9 | koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |
10 | latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
11 | latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |
12 | swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |
13 | ascii    | US ASCII                        | ascii_general_ci    |      1 |
14 | ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |
15 | sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |
16 | hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |
17 | tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |
18 | euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |
19 | koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |
20 | gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
21 | greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |
22 | cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
23 | gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
24 | latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |
25 | armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
26 | utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
27 | ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
28 | cp866    | DOS Russian                     | cp866_general_ci    |      1 |
29 | keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |
30 | macce    | Mac Central European            | macce_general_ci    |      1 |
31 | macroman | Mac West European               | macroman_general_ci |      1 |
32 | cp852    | DOS Central European            | cp852_general_ci    |      1 |
33 | latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |
34 | utf8mb4  | UTF-8 Unicode                   | utf8mb4_general_ci  |      4 |
35 | cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |
36 | utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
37 | utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |
38 | cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |
39 | cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |
40 | utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
41 | binary   | Binary pseudo charset           | binary              |      1 |
42 | geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |
43 | cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |
44 | eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |
45 | gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |
46 +----------+---------------------------------+---------------------+--------+
47 41 rows in set (0.00 sec)
48 
49 //There're 41 results of the supported character set.
50 //Each character set has a default collation.
51 //Maxlen is the max bytes of corresponding character set(eg. utf8mb4 supports 4 bytes).

 

**Check character set parameters of current MySQL server.**

 1 (root@localhost mysql3306.sock)[(none)]>show variables like 'character%';
 2 +--------------------------+----------------------------------------------------------------+
 3 | Variable_name            | Value                                                          |
 4 +--------------------------+----------------------------------------------------------------+
 5 | character_set_client     | utf8                                                           |
 6 | character_set_connection | utf8                                                           |
 7 | character_set_database   | utf8                                                           |
 8 | character_set_filesystem | binary                                                         |
 9 | character_set_results    | utf8                                                           |
10 | character_set_server     | utf8                                                           |
11 | character_set_system     | utf8                                                           |
12 | character_sets_dir       | /usr/local/mysql-5.7.21-linux-glibc2.12-x86_64/share/charsets/ |
13 +--------------------------+----------------------------------------------------------------+
14 8 rows in set (0.00 sec)
15 
16 //character_set_client:It's used by client when connect to servers for requesting data.
17 //character_set_connection:It's used for those literals not have a character set introducer for conversion. 
18 //character_set_database:It's used by default database.The value of "character_set_server" will be inherited if it is not specified.
19 //character_set_filesystem:It's used to interpret string literals refer to file names.
20 //character_set_results:It's used to return query results to the client.
21 //character_set_server:It's the default character set of server.
22 //character_set_system:It's used by server for storing identifiers,the value is utf8 forever.
23 //character_sets_dir:It's the directory where contains the xml files of installed character set.

 

The relationship of  above character set parameters shows below.

图片 1

 

 ***Check collation parameters of current MySQL server.***

 1 (root@localhost mysql3306.sock)[(none)]>show variables like 'collation%';
 2 +----------------------+-----------------+
 3 | Variable_name        | Value           |
 4 +----------------------+-----------------+
 5 | collation_connection | utf8_general_ci |
 6 | collation_database   | utf8_general_ci |
 7 | collation_server     | utf8_general_ci |
 8 +----------------------+-----------------+
 9 3 rows in set (0.01 sec)
10 
11 //collation_connection:The collation of connection character set.
12 //collation_database:The collation of default database.It will inherite the value of "collation_server" if not specified.
13 //collation_server:The default collation of server.
14 //"ci" means Case Insensitive.

 

**Case of change character set from utf8 to latin1 with "set names ...;".**

 1 (root@localhost mysql3306.sock)[(none)]>s
 2 --------------
 3 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
 4 
 5 Connection id:        7
 6 Current database:    
 7 Current user:        root@localhost
 8 SSL:            Not in use
 9 Current pager:        stdout
10 Using outfile:        ''
11 Using delimiter:    ;
12 Server version:        5.7.21-log MySQL Community Server (GPL)
13 Protocol version:    10
14 Connection:        Localhost via UNIX socket
15 Server characterset:    utf8
16 Db     characterset:    utf8
17 Client characterset:    utf8
18 Conn.  characterset:    utf8
19 UNIX socket:        /tmp/mysql3306.sock
20 Uptime:            4 hours 33 min 11 sec
21 
22 Threads: 1  Questions: 52  Slow queries: 0  Opens: 110  Flush tables: 1  Open tables: 103  Queries per second avg: 0.003
23 --------------
24 
25 (root@localhost mysql3306.sock)[(none)]>set names latin1;
26 Query OK, 0 rows affected (0.00 sec)
27 
28 (root@localhost mysql3306.sock)[(none)]>s
29 --------------
30 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
31 
32 Connection id:        7
33 Current database:    
34 Current user:        root@localhost
35 SSL:            Not in use
36 Current pager:        stdout
37 Using outfile:        ''
38 Using delimiter:    ;
39 Server version:        5.7.21-log MySQL Community Server (GPL)
40 Protocol version:    10
41 Connection:        Localhost via UNIX socket
42 Server characterset:    utf8
43 Db     characterset:    utf8
44 Client characterset:    latin1
45 Conn.  characterset:    latin1
46 UNIX socket:        /tmp/mysql3306.sock
47 Uptime:            4 hours 33 min 18 sec
48 
49 Threads: 1  Questions: 56  Slow queries: 0  Opens: 110  Flush tables: 1  Open tables: 103  Queries per second avg: 0.003
50 --------------
51 
52 (root@localhost mysql3306.sock)[(none)]>select @@character_set_client;
53 +------------------------+
54 | @@character_set_client |
55 +------------------------+
56 | latin1                 |
57 +------------------------+
58 1 row in set (0.00 sec)
59 
60 (root@localhost mysql3306.sock)[(none)]>select @@character_set_connection;
61 +----------------------------+
62 | @@character_set_connection |
63 +----------------------------+
64 | latin1                     |
65 +----------------------------+
66 1 row in set (0.00 sec)
67 
68 (root@localhost mysql3306.sock)[(none)]>select @@character_set_results;
69 +-------------------------+
70 | @@character_set_results |
71 +-------------------------+
72 | latin1                  |
73 +-------------------------+
74 1 row in set (0.00 sec)
75 
76 (root@localhost mysql3306.sock)[(none)]>show variables like '%collation%';
77 +----------------------+-------------------+
78 | Variable_name        | Value             |
79 +----------------------+-------------------+
80 | collation_connection | latin1_swedish_ci |
81 | collation_database   | utf8_general_ci   |
82 | collation_server     | utf8_general_ci   |
83 +----------------------+-------------------+
84 3 rows in set (0.00 sec)
85 
86 //The influence of command "set names latin1" to character set is to change "character_set_client","character_set_connection","character_set_results" into latin1.
87 //The influence of command "set names latin1" to collation is to change "collation_connection" into latin1.

 

**Case of change character set from utf8 to latin1 with "set character set ...;".**

 1 (root@localhost mysql3306.sock)[(none)]>s
 2 --------------
 3 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
 4 
 5 Connection id:        7
 6 Current database:    
 7 Current user:        root@localhost
 8 SSL:            Not in use
 9 Current pager:        stdout
10 Using outfile:        ''
11 Using delimiter:    ;
12 Server version:        5.7.21-log MySQL Community Server (GPL)
13 Protocol version:    10
14 Connection:        Localhost via UNIX socket
15 Server characterset:    utf8
16 Db     characterset:    utf8
17 Client characterset:    latin1
18 Conn.  characterset:    utf8
19 UNIX socket:        /tmp/mysql3306.sock
20 Uptime:            4 hours 44 min 8 sec
21 
22 Threads: 1  Questions: 72  Slow queries: 0  Opens: 111  Flush tables: 1  Open tables: 104  Queries per second avg: 0.004
23 --------------
24 
25 (root@localhost mysql3306.sock)[(none)]>select @@character_set_client;
26 +------------------------+
27 | @@character_set_client |
28 +------------------------+
29 | latin1                 |
30 +------------------------+
31 1 row in set (0.00 sec)
32 
33 (root@localhost mysql3306.sock)[(none)]>select @@character_set_connection;
34 +----------------------------+
35 | @@character_set_connection |
36 +----------------------------+
37 | utf8                       |
38 +----------------------------+
39 1 row in set (0.00 sec)
40 
41 (root@localhost mysql3306.sock)[(none)]>select @@character_set_results;
42 +-------------------------+
43 | @@character_set_results |
44 +-------------------------+
45 | latin1                  |
46 +-------------------------+
47 1 row in set (0.00 sec)
48 
49 (root@localhost mysql3306.sock)[(none)]>show variables like '%collation%';
50 +----------------------+-----------------+
51 | Variable_name        | Value           |
52 +----------------------+-----------------+
53 | collation_connection | utf8_general_ci |
54 | collation_database   | utf8_general_ci |
55 | collation_server     | utf8_general_ci |
56 +----------------------+-----------------+
57 3 rows in set (0.00 sec)
58 
59 //The two variables about "connection" was not altered this time.They were still utf8 relevant.

 

**Case of change character set of server & database from utf8 to latin1.**

 1 (root@localhost mysql3306.sock)[(none)]>s
 2 --------------
 3 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
 4 
 5 Connection id:        7
 6 Current database:    
 7 Current user:        root@localhost
 8 SSL:            Not in use
 9 Current pager:        stdout
10 Using outfile:        ''
11 Using delimiter:    ;
12 Server version:        5.7.21-log MySQL Community Server (GPL)
13 Protocol version:    10
14 Connection:        Localhost via UNIX socket
15 Server characterset:    utf8
16 Db     characterset:    utf8
17 Client characterset:    utf8
18 Conn.  characterset:    utf8
19 UNIX socket:        /tmp/mysql3306.sock
20 Uptime:            4 hours 50 min 33 sec
21 
22 Threads: 1  Questions: 92  Slow queries: 0  Opens: 111  Flush tables: 1  Open tables: 104  Queries per second avg: 0.005
23 --------------
24 
25 (root@localhost mysql3306.sock)[(none)]>set character_set_server=latin1;
26 Query OK, 0 rows affected (0.00 sec)
27 
28 (root@localhost mysql3306.sock)[(none)]>set character_set_database=latin1;
29 Query OK, 0 rows affected, 1 warning (0.00 sec)
30 
31 (root@localhost mysql3306.sock)[(none)]>show warnings;
32 +---------+------+-------------------------------------------------------------------------------------------------+
33 | Level   | Code | Message                                                                                         |
34 +---------+------+-------------------------------------------------------------------------------------------------+
35 | Warning | 1681 | Updating 'character_set_database' is deprecated. It will be made read-only in a future release. |
36 +---------+------+-------------------------------------------------------------------------------------------------+
37 1 row in set (0.00 sec)
38 
39 (root@localhost mysql3306.sock)[(none)]>s
40 --------------
41 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
42 
43 Connection id:        7
44 Current database:    
45 Current user:        root@localhost
46 SSL:            Not in use
47 Current pager:        stdout
48 Using outfile:        ''
49 Using delimiter:    ;
50 Server version:        5.7.21-log MySQL Community Server (GPL)
51 Protocol version:    10
52 Connection:        Localhost via UNIX socket
53 Server characterset:    latin1
54 Db     characterset:    latin1
55 Client characterset:    utf8
56 Conn.  characterset:    utf8
57 UNIX socket:        /tmp/mysql3306.sock
58 Uptime:            4 hours 51 min 0 sec
59 
60 Threads: 1  Questions: 98  Slow queries: 0  Opens: 111  Flush tables: 1  Open tables: 104  Queries per second avg: 0.005
61 --------------
62 
63 //It shows that change "character_set_server" online is not supported in future release because of safety concern.
64 //Change character set of database may bring about risk of data loss if your client program using the supersetwhile database using subset.(eg. client->utf8mb4,database->utf8,will lost emoji data.) 

 

**Example of messy code.**

  1 (root@localhost mysql3306.sock)[zlm]>create table test_charset(
  2     -> s1 char(10) character set latin1 not null,
  3     -> s2 char(10) char set gbk,
  4     -> s3 varchar(10) charset utf8,
  5     -> s4 varchar(10)) character set=utf8mb4 engine=innodb;
  6 Query OK, 0 rows affected (0.01 sec)
  7 
  8 (root@localhost mysql3306.sock)[zlm]>show create table test_charset;
  9 +--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 10 | Table        | Create Table                                                                                                                                                                                                                                             |
 11 +--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 12 | test_charset | CREATE TABLE `test_charset` (
 13   `s1` char(10) CHARACTER SET latin1 NOT NULL,
 14   `s2` char(10) CHARACTER SET gbk DEFAULT NULL,
 15   `s3` varchar(10) CHARACTER SET utf8 DEFAULT NULL,
 16   `s4` varchar(10) DEFAULT NULL
 17 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
 18 +--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 19 1 row in set (0.00 sec)
 20 
 21 (root@localhost mysql3306.sock)[zlm]>insert into test_charset values('ASCII','国标','万国','表情');
 22 Query OK, 1 row affected (0.00 sec)
 23 
 24 (root@localhost mysql3306.sock)[zlm]>select * from test_charset;
 25 +-------+--------+--------+--------+
 26 | s1    | s2     | s3     | s4     |
 27 +-------+--------+--------+--------+
 28 | ASCII | 国标   | 万国   | 表情   |
 29 +-------+--------+--------+--------+
 30 1 row in set (0.00 sec)
 31 
 32 (root@localhost mysql3306.sock)[zlm]>s
 33 --------------
 34 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
 35 
 36 Connection id:        9
 37 Current database:    zlm
 38 Current user:        root@localhost
 39 SSL:            Not in use
 40 Current pager:        stdout
 41 Using outfile:        ''
 42 Using delimiter:    ;
 43 Server version:        5.7.21-log MySQL Community Server (GPL)
 44 Protocol version:    10
 45 Connection:        Localhost via UNIX socket
 46 Server characterset:    utf8
 47 Db     characterset:    utf8
 48 Client characterset:    utf8
 49 Conn.  characterset:    utf8
 50 UNIX socket:        /tmp/mysql3306.sock
 51 Uptime:            5 hours 26 min 54 sec
 52 
 53 Threads: 1  Questions: 123  Slow queries: 0  Opens: 118  Flush tables: 1  Open tables: 111  Queries per second avg: 0.006
 54 --------------
 55 
 56 (root@localhost mysql3306.sock)[zlm]>set names latin1;
 57 Query OK, 0 rows affected (0.00 sec)
 58 
 59 (root@localhost mysql3306.sock)[zlm]>s
 60 --------------
 61 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
 62 
 63 Connection id:        9
 64 Current database:    zlm
 65 Current user:        root@localhost
 66 SSL:            Not in use
 67 Current pager:        stdout
 68 Using outfile:        ''
 69 Using delimiter:    ;
 70 Server version:        5.7.21-log MySQL Community Server (GPL)
 71 Protocol version:    10
 72 Connection:        Localhost via UNIX socket
 73 Server characterset:    utf8
 74 Db     characterset:    utf8
 75 Client characterset:    latin1
 76 Conn.  characterset:    latin1
 77 UNIX socket:        /tmp/mysql3306.sock
 78 Uptime:            5 hours 18 min 0 sec
 79 
 80 Threads: 1  Questions: 114  Slow queries: 0  Opens: 118  Flush tables: 1  Open tables: 111  Queries per second avg: 0.008
 81 --------------
 82 
 83 (root@localhost mysql3306.sock)[zlm]>
 84     
 85 (root@localhost mysql3306.sock)[zlm]>select * from test_charset;
 86 +-------+------+------+------+
 87 | s1    | s2   | s3   | s4   |
 88 +-------+------+------+------+
 89 | ASCII | ??   | ??   | ??   |
 90 +-------+------+------+------+
 91 1 row in set (0.00 sec)
 92 
 93 (root@localhost mysql3306.sock)[zlm]>set names gbk;
 94 Query OK, 0 rows affected (0.00 sec)
 95 
 96 (root@localhost mysql3306.sock)[zlm]>s
 97 --------------
 98 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
 99 
100 Connection id:        9
101 Current database:    zlm
102 Current user:        root@localhost
103 SSL:            Not in use
104 Current pager:        stdout
105 Using outfile:        ''
106 Using delimiter:    ;
107 Server version:        5.7.21-log MySQL Community Server (GPL)
108 Protocol version:    10
109 Connection:        Localhost via UNIX socket
110 Server characterset:    utf8
111 Db     characterset:    utf8
112 Client characterset:    gbk
113 Conn.  characterset:    gbk
114 UNIX socket:        /tmp/mysql3306.sock
115 Uptime:            5 hours 29 min 22 sec
116 
117 Threads: 1  Questions: 129  Slow queries: 0  Opens: 118  Flush tables: 1  Open tables: 111  Queries per second avg: 0.006
118 --------------
119 
120 (root@localhost mysql3306.sock)[zlm]>select * from test_charset;
121 +-------+------+------+------+
122 | s1    | s2   | s3   | s4   |
123 +-------+------+------+------+
124 | ASCII | ¹螠   | β¹ | ±     |
125 +-------+------+------+------+
126 1 row in set (0.00 sec)
127 
128 (root@localhost mysql3306.sock)[zlm]>set names utf8mb4;
129 Query OK, 0 rows affected (0.00 sec)
130 
131 (root@localhost mysql3306.sock)[zlm]>s
132 --------------
133 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
134 
135 Connection id:        9
136 Current database:    zlm
137 Current user:        root@localhost
138 SSL:            Not in use
139 Current pager:        stdout
140 Using outfile:        ''
141 Using delimiter:    ;
142 Server version:        5.7.21-log MySQL Community Server (GPL)
143 Protocol version:    10
144 Connection:        Localhost via UNIX socket
145 Server characterset:    utf8
146 Db     characterset:    utf8
147 Client characterset:    utf8mb4
148 Conn.  characterset:    utf8mb4
149 UNIX socket:        /tmp/mysql3306.sock
150 Uptime:            5 hours 30 min 15 sec
151 
152 Threads: 1  Questions: 134  Slow queries: 0  Opens: 118  Flush tables: 1  Open tables: 111  Queries per second avg: 0.006
153 --------------
154 
155 (root@localhost mysql3306.sock)[zlm]>select * from test_charset;
156 +-------+--------+--------+--------+
157 | s1    | s2     | s3     | s4     |
158 +-------+--------+--------+--------+
159 | ASCII | 国标   | 万国   | 表情   |
160 +-------+--------+--------+--------+
161 1 row in set (0.00 sec)
162 
163 //MySQL support define character set on database,table even on a single column.
164 //Messy code will occur when "character_set_result" is subset of the value of character set which has been stored only if it turns back to the value equal or bigger than the stored value.

 

**Example of losting data.**

  1 (root@localhost mysql3306.sock)[(none)]>s
  2 --------------
  3 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
  4 
  5 Connection id:        4
  6 Current database:    
  7 Current user:        root@localhost
  8 SSL:            Not in use
  9 Current pager:        stdout
 10 Using outfile:        ''
 11 Using delimiter:    ;
 12 Server version:        5.7.21-log MySQL Community Server (GPL)
 13 Protocol version:    10
 14 Connection:        Localhost via UNIX socket
 15 Server characterset:    utf8
 16 Db     characterset:    utf8
 17 Client characterset:    utf8
 18 Conn.  characterset:    utf8
 19 UNIX socket:        /tmp/mysql3306.sock
 20 Uptime:            1 min 45 sec
 21 
 22 Threads: 2  Questions: 23  Slow queries: 0  Opens: 108  Flush tables: 1  Open tables: 101  Queries per second avg: 0.219
 23 --------------
 24 
 25 (root@localhost mysql3306.sock)[(none)]>set @@character_set_server=latin1;
 26 Query OK, 0 rows affected (0.00 sec)
 27 
 28 (root@localhost mysql3306.sock)[(none)]>set @@character_set_database=latin1;
 29 Query OK, 0 rows affected, 1 warning (0.01 sec)
 30 
 31 (root@localhost mysql3306.sock)[(none)]>set @@character_set_connection=latin1;
 32 Query OK, 0 rows affected (0.00 sec)
 33 
 34 (root@localhost mysql3306.sock)[(none)]>s
 35 --------------
 36 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
 37 
 38 Connection id:        4
 39 Current database:    
 40 Current user:        root@localhost
 41 SSL:            Not in use
 42 Current pager:        stdout
 43 Using outfile:        ''
 44 Using delimiter:    ;
 45 Server version:        5.7.21-log MySQL Community Server (GPL)
 46 Protocol version:    10
 47 Connection:        Localhost via UNIX socket
 48 Server characterset:    latin1
 49 Db     characterset:    latin1
 50 Client characterset:    utf8
 51 Conn.  characterset:    latin1
 52 UNIX socket:        /tmp/mysql3306.sock
 53 Uptime:            2 min 16 sec
 54 
 55 Threads: 2  Questions: 29  Slow queries: 0  Opens: 108  Flush tables: 1  Open tables: 101  Queries per second avg: 0.213
 56 --------------
 57 
 58 (root@localhost mysql3306.sock)[(none)]>insert into test_charset values('ASCII','国标','万国','表情');
 59 ERROR 1046 (3D000): No database selected
 60 (root@localhost mysql3306.sock)[(none)]>use zlm
 61 Reading table information for completion of table and column names
 62 You can turn off this feature to get a quicker startup with -A
 63 
 64 Database changed
 65 (root@localhost mysql3306.sock)[zlm]>insert into test_charset values('ASCII','国标','万国','表情');
 66 Query OK, 1 row affected, 3 warnings (0.01 sec)
 67 
 68 (root@localhost mysql3306.sock)[zlm]>show warnings;
 69 +---------+------+-----------------------------------------------------------+
 70 | Level   | Code | Message                                                   |
 71 +---------+------+-----------------------------------------------------------+
 72 | Warning | 1300 | Invalid utf8 character string: 'xE5x9BxBDxE6xA0x87' |
 73 | Warning | 1300 | Invalid utf8 character string: 'xE4xB8x87xE5x9BxBD' |
 74 | Warning | 1300 | Invalid utf8 character string: 'xE8xA1xA8xE6x83x85' |
 75 +---------+------+-----------------------------------------------------------+
 76 3 rows in set (0.00 sec)
 77 
 78 (root@localhost mysql3306.sock)[zlm]>select @@character_set_results;
 79 +-------------------------+
 80 | @@character_set_results |
 81 +-------------------------+
 82 | utf8                    |
 83 +-------------------------+
 84 1 row in set (0.00 sec)
 85 
 86 (root@localhost mysql3306.sock)[zlm]>select * from test_charset;
 87 +-------+--------+--------+--------+
 88 | s1    | s2     | s3     | s4     |
 89 +-------+--------+--------+--------+
 90 | ASCII | 国标   | 万国   | 表情   |
 91 | ASCII | ??     | ??     | ??     |
 92 +-------+--------+--------+--------+
 93 2 rows in set (0.00 sec)
 94 
 95 (root@localhost mysql3306.sock)[zlm]>set @@character_set_results=latin1;
 96 Query OK, 0 rows affected (0.00 sec)
 97 
 98 (root@localhost mysql3306.sock)[zlm]>select @@character_set_results;
 99 +-------------------------+
100 | @@character_set_results |
101 +-------------------------+
102 | latin1                  |
103 +-------------------------+
104 1 row in set (0.00 sec)
105 
106 (root@localhost mysql3306.sock)[zlm]>select * from test_charset;
107 +-------+------+------+------+
108 | s1    | s2   | s3   | s4   |
109 +-------+------+------+------+
110 | ASCII | ??   | ??   | ??   |
111 | ASCII | ??   | ??   | ??   |
112 +-------+------+------+------+
113 2 rows in set (0.00 sec)
114 
115 //The data of first row has been correctly resored in database.
116 //The data of second row has lost the Chinese character data.
117 //The value of "character_set_result" only influence the screen output.
118 //There's a data loss risk while value of character set of client is superset of the one of database.

 

Summary

  • MySQL character set is flexible and various,be more careful when modify data.
  • The parameter "default_character_set" only affect original mysql client not for the other client tools.
  • Make sure your character set of client is a subset but superset of the value of database when modifying data.Meanwhile it needs to be small than character set of connection to avoid data loss.
  • It's recommended to set character set of database to a big set such as utf8 even utf8mb4 to be compatible with most characters of various languages.

 

本文由金沙官网线上发布于数据库,转载请注明出处:浅谈MySQL字符集

您可能还会对下面的文章感兴趣: