ISO 2022, more formally ISO/IEC 2022, is an ISO standard (equivalent to the ECMA standard ECMA-35) specifying a technique for including multiple character sets in a single character encoding. Unlike the ISO 8859 character encodings, which use 8 bits for every character, the ISO 2022 encodings are variable-width encodings, typically using either 8 or 16 bits per character. Several character encodings use ISO 2022 mechanisms; for example, ISO-2022-JP is a widely used character encoding for the Japanese language.
Introduction
Many languages or language families not based on the Latin alphabet, such as Greek, Russian, Arabic, or Hebrew, have historically been represented on computers with 8-bit extended ASCII encodings, including the ISO 8859 family of character sets. Written East Asian languages, specifically Chinese, Japanese, and Korean, use far more characters than an 8-bit byte can represent, and were first represented on computers with language-specific double-byte encodings. ISO 2022 was developed as a technique to represent characters from multiple character sets within a single character encoding. The ISO 2022 character encodings include escape sequences that indicate the character set of the characters that follow. The escape sequences are registered with ISO and are often three characters long, starting with the ASCII ESCAPE character (hexadecimal 1B, octal 33). These character encodings require data to be processed sequentially in a forward direction, since the correct interpretation of the data depends on the most recently encountered escape sequence. Although the ISO 2022 character sets are still in common use, particularly ISO-2022-JP, most modern e-mail applications are moving to the simpler Unicode character encodings such as UTF-8.
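As a rough illustration of the escape-sequence mechanism described above, the following Python sketch uses the standard library's `iso2022_jp` codec to encode a short mixed string and then decodes the same two data bytes with and without a preceding escape sequence. The sample text and variable names are only illustrative.

```python
ESC = b"\x1b"  # ASCII ESCAPE: hexadecimal 1B, octal 33

# "abc", HIRAGANA LETTER A (U+3042), "def"
text = "abc\u3042def"
encoded = text.encode("iso2022_jp")

# The same two bytes are read differently depending on the most
# recently encountered escape sequence.
two_bytes = b'$"'
as_ascii = two_bytes.decode("iso2022_jp")  # no escape: still in ASCII
as_kana = (ESC + b"$B" + two_bytes + ESC + b"(B").decode("iso2022_jp")

print(encoded)   # b'abc\x1b$B$"\x1b(Bdef'
print(as_ascii)  # $"
print(as_kana)   # the single character U+3042
```

The encoded bytes contain two three-byte escape sequences: one switching to a two-byte set before the Japanese character and one switching back to ASCII after it. Because the meaning of `$"` depends on which escape came last, a decoder cannot start reading from an arbitrary offset in the data.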
ISO 2022 Character Sets
Character encodings using the ISO 2022 mechanism include:
- ISO-2022-JP - widely used encoding for Japanese. Starts in ASCII and includes the following escape sequences:
  - ESC ( B to switch to ASCII (1 byte per character)
  - ESC ( J to switch to JIS X 0201-1976 (1 byte per character)
  - ESC $ @ to switch to JIS X 0208-1978 (2 bytes per character)
  - ESC $ B to switch to JIS X 0208-1983 (2 bytes per character)
- ISO-2022-JP-1 - Same as ISO-2022-JP with one additional escape sequence:
  - ESC $ ( D to switch to JIS X 0212-1990 (2 bytes per character)
- ISO-2022-JP-2 - Multilingual extension of ISO-2022-JP. Same as ISO-2022-JP-1 with the following additional escape sequences:
  - ESC $ A to switch to GB 2312-1980 (2 bytes per character)
  - ESC $ ( C to switch to KS X 1001-1992 (2 bytes per character)
  - ESC . A to switch to ISO 8859-1 (1 byte per character)
  - ESC . F to switch to ISO 8859-7 (1 byte per character)
- ISO-2022-JP-3 - Same as ISO-2022-JP with two additional escape sequences:
  - ESC $ ( O to switch to JIS X 0213-2000 Plane 1 (2 bytes per character)
  - ESC $ ( P to switch to JIS X 0213-2000 Plane 2 (2 bytes per character)
- ISO-2022-KR - Korean
  - ESC $ ) C to switch to KS X 1001-1992 (2 bytes per character)
- ISO-2022-CN - Chinese
  - ESC $ ) A to switch to GB 2312-1980 (2 bytes per character)
  - ESC $ ) G to switch to CNS 11643-1992 Plane 1 (2 bytes per character)
  - ESC $ * H to switch to CNS 11643-1992 Plane 2 (2 bytes per character)
- ISO-2022-CN-EXT - Same as ISO-2022-CN with six additional escape sequences:
  - ESC $ ) E to switch to ISO-IR-165 (2 bytes per character)
  - ESC $ + I to switch to CNS 11643-1992 Plane 3 (2 bytes per character)
  - ESC $ + J to switch to CNS 11643-1992 Plane 4 (2 bytes per character)
  - ESC $ + K to switch to CNS 11643-1992 Plane 5 (2 bytes per character)
  - ESC $ + L to switch to CNS 11643-1992 Plane 6 (2 bytes per character)
  - ESC $ + M to switch to CNS 11643-1992 Plane 7 (2 bytes per character)
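The four ISO-2022-JP escape sequences listed above can be turned into a small lookup table that drives a byte scanner. The sketch below is a minimal illustration under simplifying assumptions, not a conforming decoder: the function name `split_segments` and the table `ISO_2022_JP` are hypothetical, the scanner only splits the input into per-character byte strings without mapping them to characters, and it does not check for truncated input.

```python
ESC = 0x1B

# Escape sequence -> (character set name, bytes per character),
# taken from the ISO-2022-JP entries in the list above.
ISO_2022_JP = {
    b"\x1b(B": ("ASCII", 1),
    b"\x1b(J": ("JIS X 0201-1976", 1),
    b"\x1b$@": ("JIS X 0208-1978", 2),
    b"\x1b$B": ("JIS X 0208-1983", 2),
}

def split_segments(data, table=ISO_2022_JP):
    """Split data into (charset, [per-character byte strings]) runs."""
    charset, width = "ASCII", 1  # ISO-2022-JP starts in ASCII
    segments, current = [], []
    i = 0
    while i < len(data):
        if data[i] == ESC:
            for seq, (name, w) in table.items():
                if data.startswith(seq, i):
                    if current:
                        segments.append((charset, current))
                    charset, width, current = name, w, []
                    i += len(seq)
                    break
            else:
                raise ValueError(f"unknown escape sequence at offset {i}")
        else:
            # Consume one character of the currently selected width.
            current.append(data[i:i + width])
            i += width
    if current:
        segments.append((charset, current))
    return segments

sample = b'abc\x1b$B$"\x1b(Bdef'
for name, chars in split_segments(sample):
    print(name, chars)
```

Supporting one of the extended encodings in this sketch would only mean adding its escape sequences to the table; for example, ISO-2022-JP-1 would add an entry for `ESC $ ( D` with a width of 2.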
See also
- ISO 646
- CJK
- Mojibake
External links
- International Organization for Standardization (http://www.iso.org/)
- ECMA-35 (http://www.ecma-international.org/publications/standards/Ecma-035.htm)
- International Register of Coded Character Sets to be Used with Escape Sequences (http://www.itscj.ipsj.or.jp/ISO-IR/)
- History of Character Codes in North America, Europe, and East Asia (http://tronweb.super-nova.co.jp/characcodehist.html)
- CJK.INF: a document on encoding Chinese, Japanese, and Korean (CJK) languages, including a discussion of the various variants of ISO 2022 (ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf). Also available by HTTP (http://examples.oreilly.com/cjkvinfo/doc/cjk.inf).
- RFCs
  - RFC 1468: description of ISO-2022-JP
  - RFC 2237: description of ISO-2022-JP-1
  - RFC 1554: description of ISO-2022-JP-2
  - RFC 1922: description of ISO-2022-CN and ISO-2022-CN-EXT
  - RFC 1557: description of ISO-2022-KR