Code pages

Code page is the traditional IBM term used for a specific character encoding table: a mapping in which a sequence of bits, usually a single octet representing integer values 0 through 255, is associated with a specific character. A few code pages use more than 8 bits per character and thus encode more than 256 characters. The term cmap (character map) is used in technical documentation on Macintosh platforms.
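
To make the idea of a mapping concrete, here is a minimal Python sketch using the standard-library codecs cp437, cp850 and cp1252. The choice of these three code pages, the sample character and the helper name byte_for are assumptions made for illustration only. It shows that one character can have different byte values under different code pages, and that one byte can stand for different characters.

```python
# One character, several code pages: the byte value depends on the table used.
SAMPLE = "\u00e9"  # e with acute accent


def byte_for(char: str, code_page: str) -> int:
    """Return the single byte that encodes `char` in an 8-bit code page."""
    encoded = char.encode(code_page)
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single byte in {code_page}")
    return encoded[0]


for cp in ("cp437", "cp850", "cp1252"):
    print(f"{cp}: {SAMPLE!r} -> 0x{byte_for(SAMPLE, cp):02X}")

# The reverse direction: one byte, read through two different tables.
raw = bytes([0x82])
print(repr(raw.decode("cp437")))   # the accented letter above
print(repr(raw.decode("cp1252")))  # U+201A, a low quotation mark
```

The point of the sketch is only that a byte has no meaning until a code page is chosen; nothing here is specific to Python.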


Although IBM created and maintained many code pages, the term came to be associated primarily with character maps used by the IBM PC and compatible platforms, especially prior to the advent of Unicode-capable programming languages and operating systems.


To this day, PC hardware typically supports a single 8-bit code page by default, chosen for a particular regional market, and provides mechanisms that let operating systems switch to other code pages. However, it is now commonplace for operating system vendors to provide their own character encoding and rendering systems that bypass the hardware code pages entirely. These alternative character encodings are sometimes called code pages as well.

Relationship to ASCII

The basis of many PC code pages is ASCII, a 7-bit code representing 128 characters and control codes. In the past, 8-bit implementations of ASCII often either set the top bit to zero or used it as a parity bit in network data transmissions. When this bit was instead made available for representing character data, another 128 characters and control codes could be represented. IBM used this extended range to encode characters used by various languages. No formal standard existed for these ‘extended character sets’; IBM merely referred to the variants as code pages, as it had always done for variants of EBCDIC encodings.
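
The relationship can be checked with a short Python sketch. It assumes the standard-library codecs cp437, cp850, cp1252 and latin-1 as a sample of code pages; the helper names are hypothetical. The lower 128 byte values decode identically in all of them, the upper 128 do not, and clearing the top bit recovers the 7-bit value.

```python
CODE_PAGES = ("cp437", "cp850", "cp1252", "latin-1")

low_half = bytes(range(0x00, 0x80))    # top bit clear: the ASCII range
high_half = bytes(range(0x80, 0x100))  # top bit set: the extended range


def decode_all(data: bytes) -> dict:
    """Decode the same bytes under every sample code page.

    errors="replace" is used because some tables leave a few positions
    unassigned.
    """
    return {cp: data.decode(cp, errors="replace") for cp in CODE_PAGES}


def strip_top_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit channel with parity would."""
    return bytes(b & 0x7F for b in data)


ascii_text = low_half.decode("ascii")
low = decode_all(low_half)
high = decode_all(high_half)

print(all(text == ascii_text for text in low.values()))  # lower halves agree?
print(len(set(high.values())))                           # distinct upper halves
print(strip_top_bit(b"\xc1\xd3\xc3\xc9\xc9"))            # 7-bit view of the bytes
```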


Partial List of IBM Code Pages

These code pages are most often used under MS-DOS-like operating systems; they include many box-drawing characters. Because the original IBM PC code page (number 437) was not designed for international use, several incompatible variants emerged.
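
As an illustration of the box-drawing characters, the following Python sketch builds a small frame from raw code page 437 byte values and decodes it with the standard-library cp437 codec. The function name box_bytes and the box dimensions are hypothetical; only the single-line drawing characters are used.

```python
# Byte values of the single-line box-drawing characters in code page 437.
TOP_LEFT, TOP_RIGHT = 0xDA, 0xBF
BOTTOM_LEFT, BOTTOM_RIGHT = 0xC0, 0xD9
HORIZONTAL, VERTICAL = 0xC4, 0xB3
SPACE = 0x20


def box_bytes(width: int, height: int) -> list:
    """Return the rows of a box as raw CP437 bytes, one bytes object per row."""
    top = bytes([TOP_LEFT] + [HORIZONTAL] * width + [TOP_RIGHT])
    middle = bytes([VERTICAL] + [SPACE] * width + [VERTICAL])
    bottom = bytes([BOTTOM_LEFT] + [HORIZONTAL] * width + [BOTTOM_RIGHT])
    return [top] + [middle] * height + [bottom]


# Decoding with cp437 turns the bytes into the corresponding Unicode
# box-drawing characters so that a modern terminal can show them.
for row in box_bytes(width=6, height=2):
    print(row.decode("cp437"))
```

Decoding the same bytes with a code page that assigns letters to these positions would show accented characters instead of a frame, which is one way the incompatibilities between variants become visible.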

Other code pages of note

In modern applications, operating systems and programming languages, the IBM code pages have been rendered obsolete by international standards, such as ISO 8859-1 and Unicode.


Microsoft code pages

Microsoft defined a number of proprietary code page extensions that were subtly (or grossly) incompatible with those of other vendors.

The most notable of these is the Windows-1252 code page, which contains a range of typographical punctuation characters, the euro sign, and a few other special characters, in character positions which were reserved for control characters in the ISO 8859-1 "Latin-1" code page.


Many Microsoft products produce characters in these ranges automatically, notably with ‘smartquotes’. This means that other software has to choose between

  • not interoperating with documents produced with Microsoft applications
  • mis-rendering the text in question
  • adding support for the Microsoft code pages, in effect making Microsoft’s implementation a de facto standard.
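
The choice can be sketched in Python with the standard-library codecs cp1252 and latin-1 (Python's name for ISO 8859-1). The sample string is an assumption chosen to contain curly quotes and the euro sign.

```python
text = "\u201cquoted\u201d \u20ac5"  # curly quotes and a euro sign

# A Microsoft application writes these characters into the 0x80-0x9F range.
cp1252_bytes = text.encode("cp1252")
print(cp1252_bytes)

# Mis-rendering: read the bytes as ISO 8859-1 and get C1 control codes.
misread = cp1252_bytes.decode("latin-1")
print([hex(ord(ch)) for ch in misread if 0x80 <= ord(ch) <= 0x9F])

# Adding support: decode with the Microsoft code page and recover the text.
print(cp1252_bytes.decode("cp1252") == text)

# ISO 8859-1 itself has no positions for these characters at all.
try:
    text.encode("latin-1")
except UnicodeEncodeError as exc:
    print("not representable in ISO 8859-1:", exc.reason)
```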

These code pages were sometimes viewed as part of Microsoft’s embrace, extend and extinguish strategy towards open standards. On the other hand, when standards bodies decided not to assign characters to the control code positions 80–9F (hexadecimal), a precious 12.5% of the available space appeared to be wasted. This, perhaps, was not in users’ best interests either. Fortunately, the ongoing transition to Unicode support now offers standards-based applications the possibility of full interoperability with the character repertoire of these documents without giving up standards compliance on output.
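
A minimal sketch of that transition in Python, assuming the input is known to be Windows-1252 and the desired output is UTF-8; the function name cp1252_to_utf8 is hypothetical. It also reproduces the 12.5% figure from the size of the 80–9F block.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Transcode Windows-1252 bytes to UTF-8 by way of Unicode code points."""
    return data.decode("cp1252").encode("utf-8")


legacy = b"\x93smart\x94 quotes"  # bytes as a Windows-1252 document stores them
converted = cp1252_to_utf8(legacy)
print(converted.decode("utf-8"))

# Share of an 8-bit table taken up by positions 0x80 through 0x9F.
reserved = 0x9F - 0x80 + 1
print(reserved, reserved / 256)
```

The output is standards-compliant UTF-8, yet it still carries every character of the original document.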


Private code pages

When, early in the history of personal computers, users didn’t find their character encoding requirements met, private or local codepages were created using Terminate and Stay Resident utilities or by re-programming BIOS EPROMs. In some cases, unofficial code page numbers were invented (e.g., cp895).


When more diverse character set support became available, most of those code pages fell into disuse, with some exceptions such as the Kamenický or KEYBCS2 encoding for the Czech and Slovak alphabets.


External links

  • IBM code pages (http://www-1.ibm.com/servers/eserver/iseries/software/globalization/codepages.html)
  • Microsoft Console code pages (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/console_code_pages.asp)
  • Character Sets And Code Pages At The Push Of A Button (http://www.i18nguy.com/unicode/codepages.html)

The Wikipedia article included on this page is licensed under the GFDL.