|
The term extended ASCII (or high ASCII) describes eight-bit or larger character encodings that include the standard seven-bit ASCII characters as well as others. The use of the term has sometimes been criticized because it can be misread as implying that the ASCII standard has been updated to include more than 128 characters, or that the term unambiguously identifies a single encoding; neither is true.
Motives for extending
Because the number of written symbols used in common natural languages far exceeds the limited range of the ASCII code, many extensions to it have been used to facilitate handling of those languages. Markets for computers and communication equipment outside English-speaking countries opened long before standards bodies had time to deliberate on the best way to accommodate them, so there are many incompatible proprietary extensions to ASCII.
Since ASCII is a seven-bit code and most computers manipulate data in eight-bit bytes, many extensions use the additional 128 codes made available by using all eight bits of each byte. This makes room for many languages that are not easily representable in ASCII, but it is still not enough to cover all the languages of the countries in which computers are sold, so even these eight-bit extensions had to have local variants.
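The split between the seven-bit ASCII range and the additional 128 codes can be illustrated with a minimal Python sketch. The helper name is_seven_bit is hypothetical, introduced only for this illustration; it checks whether the high-order bit of every byte is clear.

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte has its high-order (eighth) bit clear."""
    return all((byte & 0x80) == 0 for byte in data)


# Codes 0..127 fit in seven bits; codes 128..255 need the eighth bit.
ascii_codes = range(0x00, 0x80)
extended_codes = range(0x80, 0x100)

print(len(ascii_codes), len(extended_codes))
print(is_seven_bit(b"plain text"))   # only ASCII bytes
print(is_seven_bit(b"caf\xe9"))      # 0xE9 has the high bit set
```

A byte string that passes this check decodes identically under any ASCII-compatible extension; only bytes in the upper range depend on which extension is in use.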
Proprietary extensions
Various proprietary extensions appeared on non-EBCDIC mainframes and minicomputers, especially in universities. Commodore microcomputers added many graphic symbols to their non-standard ASCII (PETSCII, based on the original ASCII standard of 1963). IBM introduced eight-bit extended ASCII codes on the original IBM PC and later produced variations for different languages and cultures. IBM called such character sets code pages and assigned numbers both to those it invented itself and to many invented and used by other manufacturers. Accordingly, character sets are very often identified by their IBM code page number. In ASCII-compatible code pages, the lower 128 characters kept their standard US-ASCII values, and different pages (or sets of characters) could be made available in the upper 128 characters. DOS computers built for the North American market, for example, used code page 437, which included accented characters needed for French, German, and a few other European languages, as well as some graphical line-drawing characters. The larger character set made it possible to create documents in a combination of languages such as English and French (though French computers usually use code page 850), but not, for example, in English and Greek (which required code page 737).
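How the upper 128 positions change meaning from one code page to another can be sketched in Python, whose standard library ships codecs named cp437, cp850 and cp737. The function name decode_under_each and the sample bytes are chosen only for this illustration.

```python
CODE_PAGES = ("cp437", "cp850", "cp737")


def decode_under_each(raw: bytes) -> dict:
    """Decode the same bytes under each code page and collect the results."""
    return {name: raw.decode(name) for name in CODE_PAGES}


# Bytes below 0x80 keep their US-ASCII meaning in every page.
ascii_view = decode_under_each(b"Hello")

# Bytes at or above 0x80 depend on the page that is active.
upper_view = decode_under_each(b"\x80\x9b")

for name in CODE_PAGES:
    print(name, ascii_view[name], upper_view[name])
```

The same two upper-range bytes come out as Western European letters and symbols under code pages 437 and 850 and as Greek letters under code page 737, which is why a single eight-bit document could not mix English and Greek without switching pages.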
Digital Equipment Corporation developed a "Multinational Character Set", which had fewer characters but more letter and diacritic combinations, based on draft versions of ISO 8859. It was supported by the VT220 and later DEC computer terminals.
ISO 8859 and proprietary adaptations
Eventually, ISO released this standard as ISO 8859, describing its own set of eight-bit ASCII extensions. The most popular was ISO 8859-1, also called ISO Latin1, which contained characters sufficient for the most common Western European languages. Variations were standardized for other languages as well: ISO 8859-2 for Eastern European languages and ISO 8859-5 for Cyrillic languages, for example. One notable way in which ISO character sets differ from code pages is that the character positions 128 to 159, corresponding to ASCII control characters with the high-order bit set, are specifically unused and undefined in the ISO standards. Proprietary code pages had often used these positions for printable characters, a departure from the ISO standards that was almost universal. Microsoft later created code page 1252, a compatible superset of ISO 8859-1 with extra characters in the range that ISO leaves unused. Code page 1252 is the standard character encoding of western European language versions of Microsoft Windows, including English versions. ISO 8859-1 is the common character encoding used by the X Window System and by most Internet standards. The Apple Macintosh, under Mac OS X, currently uses Unicode as its default encoding. Under the earlier Mac OS, it used Mac OS Roman.
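The difference in positions 128 to 159 can be observed with Python's built-in latin-1 and cp1252 codecs. This is only a sketch; the sample bytes 0x93 and 0x94 are an arbitrary choice from that range.

```python
raw = b"\x93quoted\x94"

# ISO 8859-1 assigns no printable characters to 0x80..0x9F; Python's latin-1
# codec maps each such byte to the control code point with the same number.
as_latin1 = raw.decode("latin-1")

# Code page 1252 places printable characters (here, curly quotes) there.
as_cp1252 = raw.decode("cp1252")

print([hex(ord(ch)) for ch in as_latin1[:1] + as_latin1[-1:]])
print([hex(ord(ch)) for ch in as_cp1252[:1] + as_cp1252[-1:]])

# From 0xA0 upward the two encodings agree byte for byte.
shared = bytes(range(0xA0, 0x100))
print(shared.decode("latin-1") == shared.decode("cp1252"))
```

Text that stays outside the 0x80 to 0x9F range therefore reads the same under either encoding, which is why the two are so often confused.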
Character set confusion
Because these ASCII extensions have so many variants, it is necessary to identify which set is being used for a particular text for it to be interpreted correctly. However, because the most-used characters (those in ASCII, the seven-bit code points) are common to all sets, even most proprietary ones like the Macintosh's, failing to identify a character set correctly often has no adverse consequences if the user is typing in English. Further, because many Internet standards use ISO 8859-1, and because Microsoft Windows (using the code page 1252 superset of ISO 8859-1) is the dominant operating system for personal computers today, unannounced use of ISO 8859-1 is quite commonplace, and should generally be assumed absent evidence to the contrary. In many protocols, most importantly e-mail and HTTP, the character encoding of content has to be tagged with IANA-assigned character set identifiers.
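Reading a character set tag and falling back to ISO 8859-1 when none is given might look like the following Python sketch. The function decode_body and the constant DEFAULT_CHARSET are hypothetical names for this illustration; Message.get_content_charset comes from the standard email package, and the fallback simply mirrors the assumption described above rather than any rule a particular protocol mandates.

```python
from email.message import Message

# Assumption for this sketch: untagged content is treated as ISO 8859-1.
DEFAULT_CHARSET = "iso-8859-1"


def decode_body(content_type: str, body: bytes) -> str:
    """Decode body using the charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or DEFAULT_CHARSET
    return body.decode(charset)


tagged = decode_body("text/plain; charset=windows-1252", b"\x93caf\xe9\x94")
untagged = decode_body("text/plain", b"caf\xe9")
english = decode_body("text/plain; charset=cp437", b"plain English")

print(tagged)
print(untagged)
print(english)
```

The last call shows why mislabelled English text usually survives: every byte is below 0x80, so any ASCII-compatible set decodes it the same way.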
Unicode
A proposal called Unicode was made in 1991 to address many of these problems, and is now widely accepted. Unicode reserves 1,114,112 (= 2^20 + 2^16) code points, and currently assigns characters to more than 101,000 of those code points. The first 256 codes precisely match those of ISO 8859-1. The majority of the assigned code points are, at this time, used for Chinese, Japanese and Korean characters.
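Both the size of the code space and the match with ISO 8859-1 can be checked in a few lines of Python. This is only an illustrative sketch; sys.maxunicode is the highest code point the interpreter supports (0x10FFFF on current Python 3 releases).

```python
import sys

# 2^20 + 2^16 = 1,048,576 + 65,536 = 1,114,112, i.e. 17 blocks of 65,536.
TOTAL_CODE_POINTS = 2**20 + 2**16
print(TOTAL_CODE_POINTS)
print(TOTAL_CODE_POINTS == 17 * 2**16)
print(sys.maxunicode + 1 == TOTAL_CODE_POINTS)

# Decoding every possible byte as ISO 8859-1 yields the code point with the
# same number, so the first 256 Unicode code points coincide with that set.
latin1_text = bytes(range(256)).decode("iso-8859-1")
first_256_match = all(ord(ch) == i for i, ch in enumerate(latin1_text))
print(first_256_match)
```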
See also
- Input method editor (IME): a program or operating system component that allows computer users to enter complex characters and symbols (such as Chinese, Japanese, Korean and Tibetan characters) using a standard Western keyboard.
External links
- Quick Key Character Grid inserts any character with one click.
- Character Sets and Code Pages at the Push of a Button
- AllChars Utility for Windows
- Apple's page about internationalization support for Mac OS X
- Roman Czyborra's Unicode and extended ASCII information pages
|