|
Big-5 or Big5 is a character encoding method used in Taiwan, Hong Kong and Macau for Traditional Chinese characters. Its Mainland China equivalent is GB.
Organization
The original Big5 character set is sorted first by usage frequency, second by stroke count, lastly by Kangxi radical.
The original Big5 character set lacked many commonly used characters. To solve this problem, each vendor developed its own extension. The ETen extension became part of the current Big5 standard through popularity. The structure of Big5 does not conform to the ISO 2022 standard, but rather bears a certain similarity to the Shift JIS encoding. It is a double-byte character set (DBCS) with the following structure:
| First byte ("lead byte") | 0x81 to 0xfe |
| Second byte | 0x40 to 0x7e, 0xa1 to 0xfe |
Certain variants of the Big5 character set, for example the HKSCS, use an expanded range for the lead byte including values in the 0x81 to 0xA0 range (similar to Shift JIS).
If the second byte is not in the correct range, behaviour is undefined (i.e., varies from system to system). The numerical values of individual Big5 codes are frequently given as 4-digit hexadecimal numbers, which describe the two bytes that comprise the Big5 code as if the two bytes were a big-endian representation of a 16-bit number. For example, the Big5 code for a full-width space, which consists of the bytes 0xa1 0x40, is usually written as 0xa140 or just A140.
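The byte ranges and the 4-digit notation described above can be illustrated with a short sketch. The function names here are hypothetical and chosen only for illustration; the ranges are the ones quoted in the table above, and variants of Big5 may use different ones.

```python
def is_lead_byte(b: int) -> bool:
    """Lead byte range quoted in the text: 0x81 to 0xfe."""
    return 0x81 <= b <= 0xFE


def is_second_byte(b: int) -> bool:
    """Second byte ranges quoted in the text: 0x40 to 0x7e, or 0xa1 to 0xfe."""
    return 0x40 <= b <= 0x7E or 0xA1 <= b <= 0xFE


def big5_code(lead: int, second: int) -> int:
    """Combine two bytes into one 16-bit value, lead byte first (big endian)."""
    if not is_lead_byte(lead):
        raise ValueError(f"lead byte out of range: {lead:#04x}")
    if not is_second_byte(second):
        # The text calls this case undefined; this sketch simply rejects it.
        raise ValueError(f"second byte out of range: {second:#04x}")
    return (lead << 8) | second


def format_big5_code(code: int) -> str:
    """Render a code in the usual 4-digit hexadecimal form, e.g. A140."""
    return f"{code:04X}"


if __name__ == "__main__":
    # Full-width space: bytes 0xa1 0x40
    print(format_big5_code(big5_code(0xA1, 0x40)))  # prints A140
```

Rejecting an out-of-range second byte is one possible policy; real systems differ, which is exactly what "undefined" means here.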
Strictly speaking, the Big5 encoding contains only DBCS characters. However, in practice, the Big5 codes are always used together with an unspecified, system-dependent single-byte character set (ASCII, or an 8-bit character set such as code page 437), so that you will find a mix of DBCS characters and single-byte characters in Big5-encoded text. Bytes in the range 0x00 to 0x7f that are not part of a double-byte character are assumed to be single-byte characters. (For a more detailed description of this problem, please see the discussion on "The Matching SBCS" below.)
The meaning of non-ASCII single bytes outside the permitted values that are not part of a double-byte character varies from system to system. In old MSDOS-based systems, they are likely to be displayed as 8-bit characters; in modern systems, they are likely to either give unpredictable results or generate an error.
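As a rough illustration of how a mixed stream is split, the sketch below walks a byte string and labels each unit. The function name `scan_big5` and the three labels are hypothetical; treating stray non-ASCII bytes as a separate "undefined" category is an assumption of this sketch, since the text says their meaning varies from system to system.

```python
from typing import Iterator, Tuple


def _is_second_byte(b: int) -> bool:
    # Second byte ranges quoted in the text.
    return 0x40 <= b <= 0x7E or 0xA1 <= b <= 0xFE


def scan_big5(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (kind, raw_bytes) pairs for a Big5 + single-byte mixed stream."""
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:
            # Not consumed as part of a pair, so it is a single-byte character.
            yield ("single", data[i:i + 1])
            i += 1
        elif 0x81 <= b <= 0xFE and i + 1 < n and _is_second_byte(data[i + 1]):
            yield ("double", data[i:i + 2])
            i += 2
        else:
            # Stray non-ASCII byte: system-dependent, flagged here.
            yield ("undefined", data[i:i + 1])
            i += 1


if __name__ == "__main__":
    for kind, raw in scan_big5(b"A\xa1\x40B\x80"):
        print(kind, raw.hex())
```

Note that a second byte such as 0x40 is also a valid ASCII value (`@`), so a stream can only be split correctly by scanning from the start, never by inspecting a byte in isolation.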
A more detailed look at the organization
In the original Big5, the encoding is compartmentalized into different zones:
| 0x8140 to 0xa0fe | Reserved for user-defined characters 造字 |
| 0xa140 to 0xa3bf | "Graphical characters" 圖形碼 |
| 0xa3c0 to 0xa3fe | Reserved, not for user-defined characters |
| 0xa440 to 0xc67e | Frequently used characters 常用字 |
| 0xc6a1 to 0xc8fe | Reserved for user-defined characters |
| 0xc940 to 0xf9d5 | Less frequently used characters 次常用字 |
| 0xf9d6 to 0xfefe | Reserved for user-defined characters |
The "graphical characters" actually comprise punctuation marks, partial punctuation marks (e.g., half of a dash, half of an ellipsis; see below), dingbats, foreign characters, and other special characters (e.g., presentational "full width" forms, digits for Suzhou numerals, zhuyin fuhao, etc.).
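The zone table above maps naturally onto a small lookup. The sketch below is illustrative only: `ZONES` and `classify_zone` are hypothetical names, and the English labels are shortened from the table.

```python
from typing import Optional

# (first code, last code, label), taken from the zone table in the text.
ZONES = [
    (0x8140, 0xA0FE, "user-defined"),
    (0xA140, 0xA3BF, "graphical characters"),
    (0xA3C0, 0xA3FE, "reserved (not user-defined)"),
    (0xA440, 0xC67E, "frequently used characters"),
    (0xC6A1, 0xC8FE, "user-defined"),
    (0xC940, 0xF9D5, "less frequently used characters"),
    (0xF9D6, 0xFEFE, "user-defined"),
]


def classify_zone(code: int) -> Optional[str]:
    """Return the zone label for a 16-bit Big5 code, or None if it has none."""
    second = code & 0xFF
    # A numeric range check alone would accept values such as 0xA17F,
    # whose second byte is outside 0x40-0x7e / 0xa1-0xfe, so test that first.
    if not (0x40 <= second <= 0x7E or 0xA1 <= second <= 0xFE):
        return None
    for first, last, label in ZONES:
        if first <= code <= last:
            return label
    return None


if __name__ == "__main__":
    for code in (0xA140, 0xA440, 0xC940, 0xF9D6, 0xA17F):
        print(f"{code:04X}", classify_zone(code))
```

With the second-byte check in place, every valid byte pair with a lead byte in 0x81 to 0xfe falls into exactly one zone; the numeric gaps between zones (for example 0xa3ff to 0xa43f) contain only invalid second bytes.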
In most vendor extensions, extended characters are placed in the various zones reserved for user-defined characters, each of which is normally regarded as associated with the preceding zone. For example, additional "graphical characters" (e.g., punctuation marks) would be expected to be placed in the 0xa3c0–0xa3fe range, and additional logograms would be placed in either the 0xc6a1–0xc8fe or the 0xf9d6–0xfefe range. Sometimes, this is not possible due to the large number of extended characters to be added; for example, Cyrillic letters and Japanese kana have been placed in the zone associated with "frequently-used characters".
What a Big5 code actually encodes
Contrary to popular belief, an individual Big5 code does not always represent a complete semantic unit. The Big5 codes of logograms are always logograms, but codes in the "graphical characters" section are not always complete "graphical characters". What Big5 encodes are particular graphical representations of characters or parts of characters that happen to fit in the space taken by two monospaced ASCII characters. This is a property of double-byte character sets as normally used in CJK (Chinese, Japanese, and Korean) computing, and is not a unique problem of Big5.
(The above might need some explanation by putting it in historical perspective, as it is theoretically incorrect: Back when text mode personal computing was still the norm, characters were normally represented as single bytes and each character took one position on the screen. There was therefore a practical reason to insist that double-byte characters must take up two positions on the screen, namely that off-the-shelf, American-made software would then be usable without modification in a DBCS-based system. If a character could take an arbitrary number of screen positions, software written with the assumption that one byte of text takes one screen position would produce incorrect output. Of course, if a computer never had to deal with the text screen, the manufacturer would not enforce this artificial restriction; the Apple Macintosh is an example. Nevertheless, the encoding itself must be designed so that it works correctly on text-screen-based systems.)
To illustrate this point, consider the Big5 code 0xa14b (…). To English speakers this looks like an ellipsis and the Unicode standard identifies it as such; however, in Chinese, the ellipsis consists of six dots that fit in the space of two Chinese characters (……), so in fact there is no Big5 code for the Chinese ellipsis, and the Big5 code 0xa14b just represents half of a Chinese ellipsis.
It represents only half of an ellipsis because the whole ellipsis should take the space of two Chinese characters, and in many DBCS systems one DBCS character must take exactly the space of one Chinese character. Characters encoded in Big5 do not always represent things that can be readily used in plain text files; an example is the "citation mark" (0xa1ca, ﹋), which, when used, must be typeset under the title of a literary work. Another example is the Suzhou numerals, which are a form of scientific notation that requires the number to be laid out in a 2-D form consisting of at least two rows.
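The half-ellipsis case described above can be checked with a codec. The sketch below assumes Python's built-in `big5` codec, whose mapping table is one particular interpretation of Big5 and may differ from other implementations for some code points; the helper names are hypothetical.

```python
# Bytes of the Big5 code 0xa14b discussed in the text.
HALF_ELLIPSIS = bytes([0xA1, 0x4B])


def decode_big5(raw: bytes) -> str:
    """Decode bytes using Python's built-in 'big5' codec (an assumption)."""
    return raw.decode("big5")


def encode_big5(text: str) -> bytes:
    """Encode text using Python's built-in 'big5' codec (an assumption)."""
    return text.encode("big5")


if __name__ == "__main__":
    ch = decode_big5(HALF_ELLIPSIS)
    print(f"0xa14b decodes to U+{ord(ch):04X}")
    # A full Chinese ellipsis needs the code twice: two codes, four bytes.
    full = encode_big5(ch * 2)
    print(full.hex(), len(full), "bytes")
```

The point of the example is that a six-dot Chinese ellipsis has no single code of its own: it is always written as two consecutive 0xa14b codes.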
The Matching SBCS
In practice, Big5 cannot be used without a matching Single Byte Character Set (SBCS), mostly for compatibility reasons. However, as in the case of other CJK DBCS character sets, the SBCS to use has never been specified. Big5 has always been defined as a DBCS, though when used it must be paired with a suitable, unspecified SBCS and is therefore used as what some people call an MBCS; nevertheless, Big5 by itself, as defined, is strictly a DBCS.
Because the SBCS is unspecified, the SBCS used can theoretically vary from system to system. Nowadays, ASCII is the only SBCS one would realistically use. However, in old DOS-based systems, Code Page 437 (with its extra special symbols in the control code area, including position 127) was much more common. Yet, on a Macintosh system with the Chinese Language Kit, or on a Unix system running the cxterm terminal emulator, the SBCS paired with Big5 would not be Code Page 437.
Outside the valid range of Big5, the old DOS-based systems would routinely interpret things according to the SBCS that is paired with Big5 on that system. In such systems, characters 127 to 160, for example, were very likely not avoided because they would produce invalid Big5, but used because they would be valid characters in Code Page 437. The modern characterization of Big5 as an MBCS consisting of the DBCS of Big5 plus the SBCS of ASCII is therefore historically incorrect and potentially flawed, as the choice of the matching SBCS was, and theoretically still is, quite independent of the flavour of Big5 being used.
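To make the independence of the SBCS concrete, here is a minimal sketch of a decoder in which the single-byte character set is a parameter. Everything about it is an assumption made for illustration: the name `decode_with_sbcs`, the `lead_min` parameter (set to 0xA1 so that bytes 127 to 160 stay single-byte, as in the DOS scenario above), and the use of Python's built-in `big5`, `ascii` and `cp437` codecs as stand-ins for whatever a real system used.

```python
def _is_second_byte(b: int) -> bool:
    # Second byte ranges quoted earlier in the text.
    return 0x40 <= b <= 0x7E or 0xA1 <= b <= 0xFE


def decode_with_sbcs(data: bytes, sbcs: str = "ascii", lead_min: int = 0xA1) -> str:
    """Decode a mixed stream: byte pairs via 'big5', everything else via `sbcs`.

    `lead_min` is a hypothetical knob for the lowest lead byte this
    particular system treats as the start of a double-byte character.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if lead_min <= b <= 0xFE and i + 1 < n and _is_second_byte(data[i + 1]):
            # May raise UnicodeDecodeError for pairs the codec does not map.
            out.append(data[i:i + 2].decode("big5"))
            i += 2
        else:
            # Interpretation of this byte depends entirely on the paired SBCS.
            out.append(data[i:i + 1].decode(sbcs))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    sample = b"A\xa1\x40\x80"
    print(repr(decode_with_sbcs(sample, sbcs="cp437")))
    try:
        decode_with_sbcs(sample, sbcs="ascii")
    except UnicodeDecodeError as exc:
        print("ascii pairing rejects the stray byte:", exc.reason)
```

The same byte string is valid text under one pairing and an error under the other, which is the practical consequence of leaving the SBCS unspecified.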
History
The Big5 encoding was defined by the Institute for Information Industry of Taiwan in 1984. The name "Big5" may derive from the five companies (Acer, MiTAC, JiaJia, Zero One, FIC) that developed the standard [1], or it may refer to the intended development and support of five major software packages. [2] According to some accounts, Big5 was popularized by its adoption in several commercial software packages, especially the ET Chinese system which ran on MS-DOS.
The Republic of China government declared it their standard in the mid-1980s, since Big5 was already the de facto standard by that time.
Extensions
The original Big-5 includes only CJK logograms from 常用國字標準字體表 and 次常用國字標準字體表, but not characters used in people's names, place names, dialects, chemistry, or biology, nor Japanese kana. As a result, much Big-5-supporting software includes extensions to address these problems.
Vendor Extensions
ETEN extensions
In the ETEN (倚天) Chinese operating system, the following code points are added to make it compliant with the IBM5550 code page:
- A3C0-A3E0: 33 control characters.
- C6A1-C875: circle 1-10, bracket 1-10, Roman letters 1-9 (i-ix), CJK radical glyphs, Japanese hiragana, Japanese katakana, Cyrillic characters
- F9D6-F9FE: '碁', '銹', '恒', '裏', '墻', '粧', '嫺', and 34 extra symbols.
In some versions of Eten, there are extra graphical symbols and Simplified Chinese characters.
Microsoft code pages
Microsoft (微軟) created its own Big5 extension, Code page 950, for use with Microsoft Windows. It supports ETEN's extensions, but only the F9D6-F9FE code points. In Windows ME, the euro currency symbol is mapped to Big-5 code point A3E1, but not in later versions of the operating system.
After installing Microsoft's HKSCS patch on top of traditional Chinese Windows (or any version of Windows 2000 and above with proper language pack), applications using code page 950 automatically use a hidden code page 951 table. The table supports all code points in HKSCS-2001, except for the compatibility code points specified by the standard[3].
ChinaSea font
ChinaSea fonts (中國海字集) are Traditional Chinese fonts made by ChinaSea. The fonts are rarely sold separately, but are bundled with other products, such as the Chinese version of Microsoft Office 97. The fonts support Japanese kana, kokuji, and other characters missing in Big-5. As a result, the ChinaSea extensions have become more popular than the government-supported extensions. Some Hong Kong BBSes used encodings from the ChinaSea fonts before the introduction of HKSCS.
'Sakura' font
The 'Sakura' font (日和字集 Sakura Version) was developed in Hong Kong and is designed to be compatible with HKSCS. It adds support for kokuji and proprietary dingbats (including Doraemon) not found in HKSCS.
Unicode-at-on
Unicode-at-on (Unicode補完計畫), formerly BIG5 Extension, extends BIG-5 by altering code page tables, but uses the ChinaSea extensions starting with version 2. However, with the bankruptcy of ChinaSea, late development, and the increasing popularity of HKSCS and Unicode (the project is not compatible with HKSCS), the success of this extension is limited at best.
OPG
The web sites of the Oriental Daily News and Sun Daily, which belong to the Oriental Press Group Limited (東方報業集團有限公司) in Hong Kong, use a downloadable font whose Big-5 extension coding differs from HKSCS.
Official Extensions
Taiwan Ministry of Education font
The Taiwan Ministry of Education supplied its own font, the Taiwan Ministry of Education font (臺灣教育部造字檔), for internal use.
Taiwan Council of Agriculture font
Taiwan's Council of Agriculture, Executive Yuan, introduced a 133-character custom font, the Taiwan Council of Agriculture font (臺灣農委會常用中文外字集), that includes 84 characters from the 'fish' radical and 7 from the 'bird' radical.
Big5+
The Chinese Foundation for Digitization Technology (中文數位化技術推廣委員會) introduced Big5+ in 1997, which used over 20000 code points to incorporate all CJK logograms in Unicode 1.1. However, the extra code points exceeded the original Big-5 definition (Big5+ uses high byte values 81-FE and low byte values 40-7E and 80-FE), preventing it from being installed on Microsoft Windows.
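The difference between the two byte layouts can be worked out directly from the ranges quoted in this article. The sketch below is only arithmetic on those ranges; the constant and function names are hypothetical, and the totals are sizes of the byte-pair space, not counts of assigned characters.

```python
# Lead (high) byte range shared by both definitions, per the text: 81-FE.
LEAD_BYTES = set(range(0x81, 0xFF))

# Second (low) byte ranges: original Big5 vs. Big5+, per the text.
ORIGINAL_SECOND = set(range(0x40, 0x7F)) | set(range(0xA1, 0xFF))
BIG5PLUS_SECOND = set(range(0x40, 0x7F)) | set(range(0x80, 0xFF))


def extra_second_bytes() -> list:
    """Second-byte values that Big5+ allows but the original layout does not."""
    return sorted(BIG5PLUS_SECOND - ORIGINAL_SECOND)


def code_space(second_bytes: set) -> int:
    """Number of distinct (lead, second) byte pairs available."""
    return len(LEAD_BYTES) * len(second_bytes)


if __name__ == "__main__":
    extra = extra_second_bytes()
    print(f"extra second bytes: {extra[0]:#04x} to {extra[-1]:#04x} ({len(extra)} values)")
    print("original pair space:", code_space(ORIGINAL_SECOND))  # 126 * 157 = 19782
    print("Big5+ pair space:   ", code_space(BIG5PLUS_SECOND))  # 126 * 190 = 23940
```

The extra second bytes (0x80 to 0xa0) are exactly the values that software built around the original definition would not expect in second position, which is consistent with the incompatibility described above.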
Big-5E
To allow Windows users to use custom fonts, the Chinese Foundation for Digitization Technology introduced Big-5E, which included 3954 characters. The system is incompatible with Big5+ and removed the Japanese kana from the ETEN extension, so it became unpopular.
Big5-2003
The Chinese Foundation for Digitization Technology made a Big5 definition and put it into CNS 11643 in note form, making it part of the official standard in Taiwan.
Big5-2003 incorporates all Big-5 characters introduced in the 1984 ETEN extensions (code points A3C0-A3E0, C6A1-C7F2, and F9D6-F9FE) and the Euro symbol. Cyrillic characters were not included because the authority claimed CNS 11643 does not include such characters.
CDP
Academia Sinica created the CDP font (漢字構形資料庫) in the late 1990s; its latest release, version 2.5, includes 112,533 characters, somewhat fewer than the Mojikyo fonts.
HKSCS
Hong Kong also adopted Big5 for character encoding. However, Cantonese uses many archaic and some colloquial Chinese characters that were not available in the normal Big5 character set. To solve this problem, the Hong Kong Government created the Big5 extensions Government Chinese Character Set in 1995 and Hong Kong Supplementary Character Set in 1999. The Hong Kong extensions were commonly distributed as a patch. Microsoft still distributes them as a patch, but a full Unicode font is also available from the Hong Kong Government’s web site.
There are two encoding schemes of HKSCS: one encoding scheme is for the Big-5 coding standard and the other is for the ISO 10646 standard. Subsequent to the initial release, there are also HKSCS-2001 and HKSCS-2004. The HKSCS-2004 is aligned technically with the ISO/IEC 10646:2003 and its Amendment 1 published in April 2004 by the International Organization for Standardization (ISO). HKSCS includes all the characters from the common ETEN extension, plus some characters from Simplified Chinese, place names, people's names, and Cantonese phrases (including profanity).
See also
- Unicode
- Han unification
- Chinese input methods for computers
References
- Lunde, Ken (1999). CJKV Information Processing, First Edition. O'Reilly and Associates, Inc. ISBN 1-56592-224-7.
External links
- Big5 character code table
- Chinese character codes: an update by Christian Wittern
- CNS 11643 official web site has information about the Big5e character set (an extended version of Big5) in the "Chinese Information Code" section
- Big5 introduction Contains differences between extensions.
- Graphical View of Big5 in ICU's Converter Explorer
- 教育部標準字體 Download page of the Taiwan Ministry of Education fonts
- 文獻處理實驗室 Download pages of the CDP font
- Hong Kong Supplementary Character Set Info Downloadable HKSCS documents & font
- 香港參考宋體 Download page of Dynalab(華康科技有限公司)'s HKSCS font.
- Microsoft's Windows Codepage 950 (Traditional Chinese Big5)
- on.cc Download page of the OPG font
|