Sunday, February 3, 2008

Simplified Chinese Character Set GB2312

GB2312 is the official character set standard for simplified Chinese, issued by the People's Republic of China in 1980. It was originally designed for representing Chinese on PCs and is now widely used on the web. If you see simplified Chinese characters on a web site, they are most likely stored in GB2312 encoding.

GB2312 includes symbols, English letters, letters from other languages, and Chinese characters. All items in GB2312 are arranged in a 94x94 matrix (rows by columns). Rows 16-55 contain the 3755 level 1 Chinese characters, ordered by Pinyin, and rows 56-87 contain the 3008 level 2 Chinese characters, ordered by radical and then by stroke count. The total number of Chinese characters in GB2312 is therefore 6763. In addition, GB2312 includes 682 non-Hanzi characters. For detailed information about GB2312 and a listing of all its Chinese characters, see GB2312 Character Set by Dr. Herong Yang.
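To make the matrix layout concrete, here is a minimal Python sketch. It assumes the usual EUC-CN byte form of GB2312, where each byte is the row or column number plus 0xA0; that offset is not stated above. The function names cell_to_char and hanzi_level are made up for illustration.

```python
from typing import Optional


def cell_to_char(row: int, col: int) -> str:
    """Return the character at a GB2312 (row, column) cell.

    Assumes EUC-CN bytes: first byte = 0xA0 + row, second byte = 0xA0 + col.
    Unassigned cells raise UnicodeDecodeError.
    """
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in 1..94")
    return bytes([0xA0 + row, 0xA0 + col]).decode("gb2312")


def hanzi_level(row: int) -> Optional[int]:
    """Classify a row by the ranges given in the text above."""
    if 16 <= row <= 55:
        return 1  # level 1, ordered by Pinyin
    if 56 <= row <= 87:
        return 2  # level 2, ordered by radical then stroke count
    return None  # symbols, letters, or unused rows


print(cell_to_char(16, 1))  # 啊, the first level 1 character
print(hanzi_level(16))      # 1
```

Not every cell in the matrix is assigned, so a real tool would probably want to catch UnicodeDecodeError around the decode call.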

It is interesting that GB2312 uses Pinyin to arrange its level 1 Chinese characters. The codes are assigned in sequence, following the alphabetical (ASCII) order of the Pinyin spellings. Many Chinese characters share the same Pinyin, for example the many characters pronounced shi4, so they sit next to each other in the table. This makes it easy to find a character's Pinyin from its code. In fact, many tools on the web are programmed this way.