GB 2312

GB2312 is the registered internet name for a key official character set of the People's Republic of China, used for simplified Chinese characters. GB2312 has been superseded by GBK and GB18030, but is still in use. GB abbreviates Guojia Biaozhun (国家标准), which means "national standard" in Chinese.


GB2312 (1980) includes 6,763 Chinese characters (on two levels: the first is arranged by reading, the second by radical then number of strokes), along with symbols and punctuation, Japanese kana, the Greek and Cyrillic alphabets, Zhuyin, and a double-byte set of Pinyin letters with tone marks.


GB/T 12345 is a closely related counterpart to GB2312 in which traditional character forms replace the simplified forms. GB-encoded fonts often come in pairs, one with the GB 2312 (jianti) character set and the other with the GB/T 12345 (fanti) character set.


GB2312 covers 99.75% of the characters used for Chinese input, but historical texts and many names remain outside its scope. Despite this limitation, GB2312 is still in widespread use. It has been superseded by GBK and GB18030, which include additional characters.

Codepoints

Characters in GB2312 are arranged in a 94 × 94 grid: 94 rows, each holding 94 cells. A code point is written as the two-digit row number followed by the two-digit cell number.


The rows can be divided and categorised as:

  • 01-09, comprising punctuation and other special characters.
  • 16-55, the first level of Chinese characters, arranged according to Pinyin.
  • 56-87, the second level of Chinese characters, arranged according to radical and strokes.

Rows 10-15 and 88-94 are unassigned.
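
The grouping above can be sketched as a lookup on the leading two-digit number (for example, 45 in code point 4566). The function name and the returned labels are illustrative choices, not part of the standard.

```python
def classify_region(number: int) -> str:
    """Return the GB2312 region for a leading two-digit number (1-94)."""
    if not 1 <= number <= 94:
        raise ValueError(f"expected a value in 1..94, got {number}")
    if number <= 9:
        return "punctuation and special characters"
    if 16 <= number <= 55:
        return "Chinese characters, first level (Pinyin order)"
    if 56 <= number <= 87:
        return "Chinese characters, second level (radical and stroke order)"
    # What remains is 10-15 and 88-94.
    return "unassigned"


print(classify_region(45))
print(classify_region(12))
```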


Technical Aspects

EUC is often used as the character encoding (i.e. for external storage) in programs that deal with GB2312, thus maintaining compatibility with ASCII. Two bytes are used to represent every character not found in ASCII. The first byte is in the range 0xA1-0xF7 and the second byte is in the range 0xA1-0xFE. Because both bytes have the high bit set and ASCII bytes do not, it is possible, as in UTF-8, to check whether a byte is part of a two-byte construct.
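
As a rough illustration of the byte ranges above, the following sketch splits an EUC-encoded GB2312 byte string into one-byte ASCII units and two-byte units. The helper names `is_gb2312_pair` and `split_units` are hypothetical, and the sketch assumes the input contains only ASCII and two-byte GB2312 sequences.

```python
from typing import List


def is_gb2312_pair(high: int, low: int) -> bool:
    """Check both bytes against the ranges 0xA1-0xF7 and 0xA1-0xFE."""
    return 0xA1 <= high <= 0xF7 and 0xA1 <= low <= 0xFE


def split_units(data: bytes) -> List[bytes]:
    """Split bytes into ASCII (1 byte) and GB2312 (2 byte) units."""
    units = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # High bit clear: a plain ASCII byte.
            units.append(data[i:i + 1])
            i += 1
        elif i + 1 < len(data) and is_gb2312_pair(byte, data[i + 1]):
            units.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"invalid byte sequence at offset {i}")
    return units


print(split_units(b"A\xcd\xe2B"))
```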


Compared to UTF-8, GB2312 is also more storage efficient for Chinese text, since each Chinese character takes two bytes in GB2312 but at least three bytes in UTF-8.
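
The size difference can be checked with a short sketch, assuming Python's built-in "gb2312" codec (which produces the EUC form described above). The helper name `encoded_sizes` is made up for this example; the sample string is the four-character phrase 国家标准.

```python
from typing import Tuple


def encoded_sizes(text: str) -> Tuple[int, int]:
    """Return (GB2312 byte count, UTF-8 byte count) for the given text."""
    return len(text.encode("gb2312")), len(text.encode("utf-8"))


gb_size, utf8_size = encoded_sizes("国家标准")
print(gb_size, utf8_size)  # 8 12
```

ASCII characters take one byte in both encodings, so the saving applies only to the non-ASCII part of a text.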


To map a code point to bytes, add 0xA0 to the first two decimal digits of the code point (the thousands and hundreds places) to form the high byte, and add 0xA0 to the last two decimal digits (the tens and ones places) to form the low byte.


For example, GB2312 code point 4566 (外, "foreign") has 45 as its first two digits and 66 as its last two. Converting 45 to hex gives 0x2D, and 0x2D + 0xA0 = 0xCD, which is the high byte. Converting 66 to hex gives 0x42, and 0x42 + 0xA0 = 0xE2, which is the low byte. Both values fall within the permitted ranges (0xA1-0xF7 for the high byte, 0xA1-0xFE for the low byte), so the full encoding is 0xCDE2.
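
The arithmetic above can be sketched in a few lines. The function names are hypothetical, and the final line relies on Python's built-in "gb2312" codec to confirm the result.

```python
def code_point_to_bytes(code_point: int) -> bytes:
    """Convert a four-digit GB2312 code point (e.g. 4566) to its two bytes."""
    first, last = divmod(code_point, 100)  # 4566 -> (45, 66)
    if not (1 <= first <= 94 and 1 <= last <= 94):
        raise ValueError(f"not a valid GB2312 code point: {code_point}")
    return bytes([first + 0xA0, last + 0xA0])


def bytes_to_code_point(pair: bytes) -> int:
    """Inverse mapping: subtract 0xA0 from each byte and rejoin the digits."""
    if len(pair) != 2:
        raise ValueError("expected exactly two bytes")
    return (pair[0] - 0xA0) * 100 + (pair[1] - 0xA0)


encoded = code_point_to_bytes(4566)
print(encoded.hex())             # cde2
print(encoded.decode("gb2312"))  # 外
```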


See also

  • Guobiao code
  • CJK
  • Chinese character encoding
  • GB18030
  • Big5

External links

  • Graphical View of GB2312 in ICU's Converter Explorer
  • Evolution of GBK and GB2312 into GB18030

The Wikipedia article included on this page is licensed under the GFDL.