ISO 8859

ISO 8859, more formally ISO/IEC 8859, is a joint ISO and IEC standard for 8-bit character encodings for use by computers. The standard is divided into numbered, separately published parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc., each of which may be informally referred to as a standard in and of itself. There are currently 15 parts.


Introduction

While the bit patterns of the 95 printable ASCII characters (numbered 32 to 126) are sufficient to exchange information in modern English, most other languages that use the Roman alphabet need additional symbols not covered by ASCII, such as ß (German) and å (Swedish and other Nordic languages). ISO 8859 sought to remedy this problem by using the eighth bit of an 8-bit byte to allow positions for another 128 characters. (This bit was previously used for data transmission protocol information, or was left unused.) However, more characters were needed than could fit in a single 8-bit character encoding, so several mappings were developed, including at least 10 just to cover the Latin script.
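The effect of the eighth bit can be shown with a minimal Python sketch. It assumes Python's built-in `ascii` and `iso8859_1` codecs (these names belong to Python, not to the standard), and the helper names are hypothetical: ß and å cannot be encoded in 7-bit ASCII, but in ISO 8859-1 each becomes a single byte with the high bit set.

```python
def high_bit_set(byte: int) -> bool:
    """Return True if the eighth (most significant) bit of a byte is 1."""
    return (byte & 0x80) != 0

def encode_latin1(ch: str) -> int:
    """Encode one character with Python's ISO 8859-1 codec; return the byte value."""
    (byte,) = ch.encode("iso8859_1")  # exactly one byte per character
    return byte

def fits_in_ascii(ch: str) -> bool:
    """ASCII is a 7-bit code, so characters that need the eighth bit are rejected."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

for ch in ["A", "ß", "å"]:
    b = encode_latin1(ch)
    print(f"{ch!r}: ascii={fits_in_ascii(ch)}, byte=0x{b:02X}, high bit={high_bit_set(b)}")
```

Here "A" stays at 0x41, while ß and å land at 0xDF and 0xE5, both above 0x7F.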


Although ISO 8859-n and ISO-8859-n are often used interchangeably, the ISO 8859 standard is not the same as the well-known ISO-8859-n character encodings registered by the IANA for use on the Internet. Besides the extra hyphen in the IANA-registered names, the encodings differ in scope: each part of the ISO standard assigns at most 191 characters, to the byte ranges 32 to 126 and 160 to 255, whereas the corresponding IANA-registered character encoding merges these mappings with the C0 control set (control characters mapped to bytes 0 to 31), the DEL character (byte 127) and the C1 control set (control characters mapped to bytes 128 to 159), resulting in a full 8-bit character map with most, if not all, bytes assigned.
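The byte ranges can be made concrete with a small classifier. This is a sketch in Python: the function name `classify` and its labels are hypothetical, and Python's `latin-1` codec is used only as an example of an IANA-style map in which every one of the 256 bytes, controls included, decodes to something.

```python
def classify(byte: int) -> str:
    """Classify a byte value by the ranges described in the paragraph above."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte value")
    if byte <= 31:
        return "C0 control"         # present only in the IANA-registered charsets
    if byte <= 126:
        return "graphic (32-126)"   # ISO 8859 graphic range shared with ASCII
    if byte == 127:
        return "DEL"                # present only in the IANA-registered charsets
    if byte <= 159:
        return "C1 control"         # present only in the IANA-registered charsets
    return "graphic (160-255)"      # ISO 8859 upper graphic range

graphic = [b for b in range(256) if classify(b).startswith("graphic")]
print(len(graphic))  # 95 + 96 = 191 positions a part may assign

# An IANA-style codec also decodes control bytes such as TAB (0x09) and NEL (0x85).
print(repr(bytes([0x09, 0x85]).decode("latin-1")))
```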


Characters

The ISO 8859 standard is designed for reliable information exchange, not typography; it omits symbols needed for high-quality typography, such as optional ligatures, curly quotation marks and dashes. As a result, high-quality typesetting systems often use proprietary or idiosyncratic extensions on top of the ASCII and ISO 8859 standards, or use Unicode instead.


As a rule of thumb, if a character or symbol was not already part of a widely used data-processing character set and was not usually provided on typewriter keyboards for a national language, it did not get in. Hence the directional double quotation marks « and » used for some European languages were included, but not the directional double quotation marks “ and ” used for English and some other languages. French did not get its œ and Œ ligatures because they could be typed as 'oe'. Ÿ, needed for all-caps text, was left out as well. These characters were, however, included later in ISO 8859-15, which also introduced the new euro sign €. Likewise, Dutch did not get the 'ij' and 'IJ' letters, because Dutch speakers had become used to typing them as two letters. Romanian did not initially get its 'Ș/ș' and 'Ț/ț' (with comma below) letters, because the Unicode Consortium had unified them with 'Ş/ş' and 'Ţ/ţ' (with cedilla), treating the comma-below shapes as glyph variants of the cedilla shapes. The letters with an explicit comma below were later added to the Unicode standard and are also in ISO 8859-16.
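Which parts actually carry a given character can be checked against Python's standard-library codecs `iso8859_1` through `iso8859_16`. This is a minimal sketch: the codec names and tables are Python's (not part of the standard), and `parts_containing` is a hypothetical helper. It shows, for instance, that œ, Ÿ and € are absent from part 1 but present in part 15, and that Ș appears only in part 16.

```python
# Parts that have a Python codec; part 12 was abandoned, so there is no iso8859_12.
PARTS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16]

def parts_containing(ch: str) -> list:
    """Return the ISO 8859 part numbers whose Python codec can encode ch."""
    found = []
    for n in PARTS:
        try:
            ch.encode(f"iso8859_{n}")
            found.append(n)
        except UnicodeEncodeError:
            pass  # this part has no position for the character
    return found

for ch in ["œ", "Ÿ", "€", "Ș"]:
    print(f"U+{ord(ch):04X} {ch}: {parts_containing(ch)}")
```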


Most of the ISO 8859 encodings provide the diacritic marks required for various European languages. Others provide non-Roman alphabets: Greek, Cyrillic, Hebrew, Arabic and Thai. However, the standard makes no provision for the scripts of East Asian languages (CJK), as their ideographic writing systems require many thousands of code points. Although it uses Latin-based characters, Vietnamese does not fit into 96 positions either. The Japanese syllabic kana scripts, on the other hand, might fit, but like several other writing systems of the world, they are not encoded in ISO 8859.
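Because a single part covers only one group of scripts, software that still emits ISO 8859 typically has to find one part that covers the whole text, or fall back to a Unicode encoding. A minimal Python sketch, assuming Python's codec names; `pick_part` and `encode_legacy_or_utf8` are hypothetical helpers, and trying parts in ascending order is an arbitrary choice for illustration.

```python
PARTS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16]

def pick_part(text: str):
    """Return the first ISO 8859 part that can encode all of text, or None."""
    for n in PARTS:
        try:
            text.encode(f"iso8859_{n}")
            return n
        except UnicodeEncodeError:
            continue
    return None

def encode_legacy_or_utf8(text: str):
    """Prefer a single-byte part; fall back to UTF-8 when no part covers the text."""
    part = pick_part(text)
    codec = f"iso8859_{part}" if part is not None else "utf-8"
    return codec, text.encode(codec)

print(pick_part("Ελληνικά"))    # Greek fits in part 7
print(pick_part("Tiếng Việt"))  # Vietnamese needs more than the 96 upper positions
print(pick_part("カナ"))         # kana is not encoded in any part
```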


The Parts of ISO 8859

ISO 8859 is divided into the following parts:

ISO 8859-1 (Latin-1, Western European): Perhaps the most widely used part of ISO 8859, covering most Western European languages: Basque, Catalan, Danish, Dutch (partial[1]), English, Faroese, Finnish (partial[2]), French (partial[2]), German, Icelandic, Irish, Italian, Norwegian, Portuguese, Rhaeto-Romanic, Scottish, Spanish and Swedish, as well as Albanian (Eastern Europe) and the African languages Afrikaans and Swahili. The missing euro sign and capital Ÿ are in the revised version, ISO 8859-15. The corresponding IANA-registered character set ISO-8859-1 is the default encoding for legacy HTML documents and for documents transmitted via MIME messages, such as HTTP responses, when the document's media type is "text" (as in "text/html") and no charset is specified.
ISO 8859-2 (Latin-2, Central European): Supports those Central and Eastern European languages that use a Roman alphabet, including Polish, Croatian, Czech, Slovak, Slovenian and Hungarian. The missing euro sign can be found in ISO 8859-16.
ISO 8859-3 (Latin-3, South European): Turkish, Maltese and Esperanto; largely superseded by ISO 8859-9 for Turkish and by Unicode for Esperanto.
ISO 8859-4 (Latin-4, North European): Estonian, Latvian, Lithuanian, Greenlandic and Sami.
ISO 8859-5 (Cyrillic): Covers mostly Slavic languages that use a Cyrillic alphabet, including Belarusian, Bulgarian, Macedonian, Russian, Serbian and Ukrainian (partial[3]).
ISO 8859-6 (Arabic): Covers the most common Arabic language characters. Does not support other languages that use the Arabic script.
ISO 8859-7 (Greek): Covers the modern Greek language (monotonic orthography). It can also be used for Ancient Greek written without accents or in monotonic orthography, but it lacks the diacritics for polytonic orthography; these were introduced with Unicode.
ISO 8859-8 (Hebrew): Covers the modern Hebrew alphabet as used in Israel. In practice, two different encodings exist: logical order and visual order.
ISO 8859-9 (Latin-5, Turkish): Largely the same as ISO 8859-1, replacing the rarely used Icelandic letters with Turkish ones. It is also used for Kurdish.
ISO 8859-10 (Latin-6, Nordic): A rearrangement of Latin-4, considered more useful for Nordic languages; Baltic languages use Latin-4 more.
ISO 8859-11 (Thai): Contains most glyphs needed for the Thai language.
ISO 8859-12: Was supposed to be Latin-7 and cover Celtic, but this draft was rejected. Numbering continued with -13.
ISO 8859-13 (Latin-7, Baltic Rim): Added some glyphs for Baltic languages that were missing from Latin-4 and Latin-6.
ISO 8859-14 (Latin-8, Celtic): Mostly a rearrangement of the ISO 8859-12 draft. Covers Celtic languages such as Gaelic and Breton.
ISO 8859-15 (Latin-9): A revision of 8859-1 that removes some little-used symbols, replacing them with the euro sign and the letters Š, š, Ž, ž, Œ, œ and Ÿ, which completes the coverage of French, Finnish and Estonian.
ISO 8859-16 (Latin-10, South-Eastern European): Intended for Albanian, Croatian, Hungarian, Italian, Polish, Romanian and Slovenian, but also Finnish, French, German and Irish Gaelic (new orthography). The focus lies more on letters than on symbols. The generic currency sign is replaced with the euro sign.

[1] Only the letter IJ/ij is missing; it is usually written as the two letters I and J.
[2] The missing characters are in ISO 8859-15.
[3] The missing Ґ/ґ characters were reintroduced into Ukrainian in 1991.
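Note from the list that the Latin alphabet number and the part number diverge after Latin-4. A minimal Python sketch of that mapping, assuming Python's codec registry, which accepts `latin1` through `latin10` as aliases for the corresponding `iso8859_N` codecs (the `LATIN_TO_PART` table is written out by hand from the list above):

```python
import codecs

# Latin alphabet number -> ISO 8859 part number, transcribed from the list above.
LATIN_TO_PART = {1: 1, 2: 2, 3: 3, 4: 4, 5: 9, 6: 10, 7: 13, 8: 14, 9: 15, 10: 16}

for latin, part in LATIN_TO_PART.items():
    # Both names should resolve to the same registered codec in Python.
    via_alias = codecs.lookup(f"latin{latin}").name
    via_part = codecs.lookup(f"iso8859_{part}").name
    print(f"Latin-{latin:<2} -> ISO 8859-{part:<2} same codec: {via_alias == via_part}")

# Part 12 was never published, so Python has no codec for it.
try:
    codecs.lookup("iso8859_12")
except LookupError:
    print("ISO 8859-12: no codec")
```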


Each part of ISO 8859 is designed to support languages that often borrow from each other, so the characters needed by each language are usually accommodated by a single part. However, some characters and language combinations cannot be accommodated without transcription. Efforts were made to make conversions as smooth as possible. For example, German has all seven of its special characters (Ä, Ö, Ü, ä, ö, ü and ß) at the same positions in all Latin parts (1-4, 9-10 and 13-16), and at many positions the characters in different sets differ only in their diacritics. In particular, parts 1-4 were designed jointly and have the property that every encoded character appears either at one fixed position or not at all.
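The German example can be verified with a short Python sketch, assuming Python's `iso8859_N` codecs reflect the published tables; `positions` is a hypothetical helper. It encodes the seven characters with each Latin part and compares the bytes with Latin-1.

```python
GERMAN = "ÄÖÜäöüß"
LATIN_PARTS = [1, 2, 3, 4, 9, 10, 13, 14, 15, 16]

def positions(text: str, part: int) -> bytes:
    """Encode text with one ISO 8859 part (Python codec names assumed)."""
    return text.encode(f"iso8859_{part}")

reference = positions(GERMAN, 1)
print(reference.hex(" ").upper())  # C4 D6 DC E4 F6 FC DF
for part in LATIN_PARTS:
    same = positions(GERMAN, part) == reference
    print(f"ISO 8859-{part}: same positions as Latin-1? {same}")
```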


Table

Comparison of the various parts of ISO 8859
Binary Oct Dec Hex 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16
10100000 240 160 A0 Non-breaking space (NBSP)
10100001 241 161 A1 ¡ Ą Ħ Ą Ё     ¡ Ą ¡ Ą
10100010 242 162 A2 ¢ ˘ ˘ ĸ Ђ   ¢ ¢ Ē ¢ ¢ ą
10100011 243 163 A3 £ Ł £ Ŗ Ѓ   £ £ £ Ģ £ £ £ Ł
10100100 244 164 A4 ¤ ¤ ¤ ¤ Є ¤ ¤ ¤ Ī ¤ Ċ
10100101 245 165 A5 ¥ Ľ   Ĩ Ѕ   ¥ ¥ Ĩ ċ ¥
10100110 246 166 A6 ¦ Ś Ĥ Ļ І   ¦ ¦ ¦ Ķ ¦ Š Š
10100111 247 167 A7 § § § § Ї   § § § § § § § §
10101000 250 168 A8 ¨ ¨ ¨ ¨ Ј   ¨ ¨ ¨ Ļ Ø š š
10101001 251 169 A9 © Š İ Š Љ   © © © Đ © © © ©
10101010 252 170 AA ª Ş Ş Ē Њ   ͺ × ª Š Ŗ ª Ș
10101011 253 171 AB « Ť Ğ Ģ Ћ   « « « Ŧ « « «
10101100 254 172 AC ¬ Ź Ĵ Ŧ Ќ ، ¬ ¬ ¬ Ž ¬ ¬ Ź
10101101 255 173 AD ­ ­ ­ ­ ­ ­ ­ ­ ­ ­ ­ ­ ­ ­
10101110 256 174 AE ® Ž   Ž Ў     ® ® Ū ® ® ® ź
10101111 257 175 AF ¯ Ż Ż ¯ Џ   ¯ ¯ Ŋ Æ Ÿ ¯ Ż
10110000 260 176 B0 ° ° ° ° А   ° ° ° ° ° ° °
10110001 261 177 B1 ± ą ħ ą Б   ± ± ± ą ± ± ±
10110010 262 178 B2 ² ˛ ² ˛ В   ² ² ² ē ² Ġ ² Č
10110011 263 179 B3 ³ ł ³ ŗ Г   ³ ³ ³ ģ ³ ġ ³ ł
10110100 264 180 B4 ´ ´ ´ ´ Д   ΄ ´ ´ ī Ž Ž
10110101 265 181 B5 µ ľ µ ĩ Е   ΅ µ µ ĩ µ µ
10110110 266 182 B6 ś ĥ ļ Ж   Ά ķ
10110111 267 183 B7 · ˇ · ˇ З   · · · · · · ·
10111000 270 184 B8 ¸ ¸ ¸ ¸ И   Έ ¸ ¸ ļ ø ž ž
10111001 271 185 B9 ¹ š ı š Й   Ή ¹ ¹ đ ¹ ¹ č
10111010 272 186 BA º ş ş ē К   Ί ÷ º š ŗ º ș
10111011 273 187 BB » ť ğ ģ Л ؛ » » » ŧ » » »
10111100 274 188 BC ¼ ź ĵ ŧ М   Ό ¼ ¼ ž ¼ Œ Œ
10111101 275 189 BD ½ ˝ ½ Ŋ Н   ½ ½ ½ ½ œ œ
10111110 276 190 BE ¾ ž   ž О   Ύ ¾ ¾ ū ¾ Ÿ Ÿ
10111111 277 191 BF ¿ ż ż ŋ П ؟ Ώ   ¿ ŋ æ ¿ ż
11000000 300 192 C0 À Ŕ À Ā Р   ΐ   À Ā Ą À À À
11000001 301 193 C1 Á Á Á Á С ء Α   Á Á Į Á Á Á
11000010 302 194 C2 Â Â Â Â Т آ Β   Â Â Ā Â Â Â
11000011 303 195 C3 Ã Ă   Ã У أ Γ   Ã Ã Ć Ã Ã Ă
11000100 304 196 C4 Ä Ä Ä Ä Ф ؤ Δ   Ä Ä Ä Ä Ä Ä
11000101 305 197 C5 Å Ĺ Ċ Å Х إ Ε   Å Å Å Å Å Ć
11000110 306 198 C6 Æ Ć Ĉ Æ Ц ئ Ζ   Æ Æ Ę Æ Æ Æ
11000111 307 199 C7 Ç Ç Ç Į Ч ا Η   Ç Į Ē Ç Ç Ç
11001000 310 200 C8 È Č È Č Ш ب Θ   È Č Č È È È
11001001 311 201 C9 É É É É Щ ة Ι   É É É É É É
11001010 312 202 CA Ê Ę Ê Ę Ъ ت Κ   Ê Ę Ź Ê Ê Ê
11001011 313 203 CB Ë Ë Ë Ë Ы ث Λ   Ë Ë Ė Ë Ë Ë
11001100 314 204 CC Ì Ě Ì Ė Ь ج Μ   Ì Ė Ģ Ì Ì Ì
11001101 315 205 CD Í Í Í Í Э ح Ν   Í Í Ķ Í Í Í
11001110 316 206 CE Î Î Î Î Ю خ Ξ   Î Î Ī Î Î Î
11001111 317 207 CF Ï Ď Ï Ī Я د Ο   Ï Ï Ļ Ï Ï Ï
11010000 320 208 D0 Ð Đ   Đ а ذ Π   Ğ Ð Š Ŵ Ð Đ
11010001 321 209 D1 Ñ Ń Ñ Ņ б ر Ρ   Ñ Ņ Ń Ñ Ñ Ń
11010010 322 210 D2 Ò Ň Ò Ō в ز     Ò Ō Ņ Ò Ò Ò
11010011 323 211 D3 Ó Ó Ó Ķ г س Σ   Ó Ó Ó Ó Ó Ó
11010100 324 212 D4 Ô Ô Ô Ô д ش Τ   Ô Ô Ō Ô Ô Ô
11010101 325 213 D5 Õ Ő Ġ Õ е ص Υ   Õ Õ Õ Õ Õ Ő
11010110 326 214 D6 Ö Ö Ö Ö ж ض Φ   Ö Ö Ö Ö Ö Ö
11010111 327 215 D7 × × × × з ط Χ   × Ũ × × Ś
11011000 330 216 D8 Ø Ř Ĝ Ø и ظ Ψ   Ø Ø Ų Ø Ø Ű
11011001 331 217 D9 Ù Ů Ù Ų й ع Ω   Ù Ų Ł Ù Ù Ù
11011010 332 218 DA Ú Ú Ú Ú к غ Ϊ   Ú Ú Ś Ú Ú Ú
11011011 333 219 DB Û Ű Û Û л   Ϋ   Û Û   Ū Û Û Û
11011100 334 220 DC Ü Ü Ü Ü м   ά   Ü Ü   Ü Ü Ü Ü
11011101 335 221 DD Ý Ý Ŭ Ũ н   έ   İ Ý   Ż Ý Ý Ę
11011110 336 222 DE Þ Ţ Ŝ Ū о   ή   Ş Þ   Ž Ŷ Þ Ț
11011111 337 223 DF ß ß ß ß п   ί ß ß ฿ ß ß ß ß
11100000 340 224 E0 à ŕ à ā р ـ ΰ א à ā ą à à à
11100001 341 225 E1 á á á á с ف α ב á á į á á á
11100010 342 226 E2 â â â â т ق β ג â â ā â â â
11100011 343 227 E3 ã ă   ã у ك γ ד ã ã ć ã ã ă
11100100 344 228 E4 ä ä ä ä ф ل δ ה ä ä ä ä ä ä
11100101 345 229 E5 å ĺ ċ å х م ε ו å å å å å ć
11100110 346 230 E6 æ ć ĉ æ ц ن ζ ז æ æ ę æ æ æ
11100111 347 231 E7 ç ç ç į ч ه η ח ç į ē ç ç ç
11101000 350 232 E8 è č è č ш و θ ט è č č è è è
11101001 351 233 E9 é é é é щ ى ι י é é é é é é
11101010 352 234 EA ê ę ê ę ъ ي κ ך ê ę ź ê ê ê
11101011 353 235 EB ë ë ë ë ы ً λ כ ë ë ė ë ë ë
11101100 354 236 EC ì ě ì ė ь ٌ μ ל ì ė ģ ì ì ì
11101101 355 237 ED í í í í э ٍ ν ם í í ķ í í í
11101110 356 238 EE î î î î ю َ ξ מ î î ī î î î
11101111 357 239 EF ï ď ï ī я ُ ο ן ï ï ļ ï ï ï
11110000 360 240 F0 ð đ   đ № ِ π נ ğ ð š ŵ ð đ
11110001 361 241 F1 ñ ń ñ ņ ё ّ ρ ס ñ ņ ń ñ ñ ń
11110010 362 242 F2 ò ň ò ō ђ ْ ς ע ò ō ņ ò ò ò
11110011 363 243 F3 ó ó ó ķ ѓ   σ ף ó ó ó ó ó ó
11110100 364 244 F4 ô ô ô ô є   τ פ ô ô ō ô ô ô
11110101 365 245 F5 õ ő ġ õ ѕ   υ ץ õ õ õ õ õ ő
11110110 366 246 F6 ö ö ö ö і   φ צ ö ö ö ö ö ö
11110111 367 247 F7 ÷ ÷ ÷ ÷ ї   χ ק ÷ ũ ÷ ÷ ś
11111000 370 248 F8 ø ř ĝ ø ј   ψ ר ø ø ų ø ø ű
11111001 371 249 F9 ù ů ù ų љ   ω ש ù ų ł ù ù ù
11111010 372 250 FA ú ú ú ú њ   ϊ ת ú ú ś ú ú ú
11111011 373 251 FB û ű û û ћ   ϋ   û û ū û û û
11111100 374 252 FC ü ü ü ü ќ   ό   ü ü   ü ü ü ü
11111101 375 253 FD ý ý ŭ ũ §   ύ LRM ı ý   ż ý ý ę
11111110 376 254 FE þ ţ ŝ ū ў   ώ RLM ş þ   ž ŷ þ ț
11111111 377 255 FF ÿ ˙ ˙ ˙ џ       ÿ ĸ   ÿ ÿ ÿ

Position 0xA0 always holds the non-breaking space, and 0xAD usually holds the soft hyphen, which is displayed only at line breaks. Other empty cells are either unassigned or could not be displayed by the system used to produce the table.


Some entries reflect additions made in the ISO/IEC 8859-7:2003 and ISO/IEC 8859-8:1999 revisions. LRM stands for left-to-right mark (U+200E) and RLM for right-to-left mark (U+200F).
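Individual rows of the table can be regenerated programmatically. This is a sketch assuming Python's `iso8859_N` codecs; those tables follow a particular revision of each part, so a few cells may differ from the table above. The helpers `label` and `row` are hypothetical, and `repr` is used so that invisible characters such as the non-breaking space show up.

```python
PARTS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16]

def label(byte: int) -> str:
    """Binary, octal, decimal and hex columns, in the order the table uses."""
    return f"{byte:08b} {byte:o} {byte} {byte:X}"

def row(byte: int) -> dict:
    """Map each part number to the character at this byte, or '' if unassigned."""
    cells = {}
    for n in PARTS:
        try:
            cells[n] = bytes([byte]).decode(f"iso8859_{n}")
        except UnicodeDecodeError:
            cells[n] = ""  # the byte is unassigned in this part
    return cells

for b in (0xA0, 0xA4, 0xDF):
    cells = row(b)
    print(label(b), " ".join(repr(cells[n]) for n in PARTS))
```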


Relationship to Unicode and the UCS

Since 1991, the Unicode Consortium has been working with ISO to develop, in tandem, the Unicode Standard and ISO/IEC 10646, the Universal Character Set (UCS). This pair of standards was created to unify the ISO 8859 character repertoire, among others, by initially assigning each character a 16-bit code value, with some code values left unassigned. Over time, their models adapted to map characters to abstract numeric code points rather than fixed bit-width values, so that more code points and encoding methods could be supported.


Unicode and ISO/IEC 10646 currently assign about 100,000 characters to a code space of over a million code points, and they define several standard encodings capable of representing every available code point. The standard encodings of Unicode and the UCS use sequences of one to four 8-bit code values (UTF-8), sequences of one or two 16-bit code values (UTF-16), or one 32-bit code value (UTF-32 or UCS-4). There is also an older encoding, UCS-2, that uses one 16-bit code value and can therefore represent only one-seventeenth of the available code points. Of these encoding forms, only UTF-8's byte sequences are in a fixed order; the others are subject to platform-dependent byte ordering issues, which may be addressed via special codes (byte order marks) or indicated via out-of-band means.
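The code-unit counts can be checked with Python's built-in codecs. In this sketch the sample characters are illustrative; the `-le` codec variants are used so that no byte order mark is counted, while the plain `utf-16` codec shows the mark being prepended.

```python
SAMPLES = ["A", "é", "€", "\U0001D11E"]  # last one: MUSICAL SYMBOL G CLEF, outside plane 0

for ch in SAMPLES:
    utf8 = len(ch.encode("utf-8"))            # 1 to 4 bytes
    utf16 = len(ch.encode("utf-16-le")) // 2  # 1 or 2 16-bit code units
    utf32 = len(ch.encode("utf-32-le")) // 4  # always one 32-bit code unit
    print(f"U+{ord(ch):04X}: UTF-8 {utf8} byte(s), UTF-16 {utf16} unit(s), UTF-32 {utf32} unit")

# 17 planes of 65,536 code points; UCS-2 can reach only the first (plane 0).
print(0x110000 // 0x10000)  # 17

# Without an explicit byte order, Python's "utf-16" codec prepends a byte order mark.
print("A".encode("utf-16")[:2] in (b"\xff\xfe", b"\xfe\xff"))  # True
```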


Newer editions of ISO 8859 express characters in terms of their Unicode/UCS names and the U+nnnn notation, effectively causing each part of ISO 8859 to be a Unicode/UCS character encoding scheme that maps a very small subset of the UCS to single 8-bit bytes. The first 256 characters in Unicode and the UCS are identical to those in ISO-8859-1.
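The identity between ISO-8859-1 and the first 256 code points can be confirmed in Python, whose `latin-1` codec maps every byte value to the code point with the same number. This is a small sketch; the contrast with ISO 8859-15 uses Python's `iso8859_15` codec.

```python
all_bytes = bytes(range(256))

# Every ISO-8859-1 byte decodes to the code point with the same number.
latin1 = all_bytes.decode("latin-1")
print(all(ord(ch) == b for ch, b in zip(latin1, all_bytes)))  # True

# Other parts lack this property: ISO 8859-15 puts the euro sign at byte 0xA4.
print(f"U+{ord(bytes([0xA4]).decode('iso8859_15')):04X}")  # U+20AC
```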


ISO 8859 was favored throughout the 1990s, having the advantages of being well-established and more easily implemented in software: the equation of one byte to one character is simple and adequate for most single-language applications, and there are no combining characters or variant forms.
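The practical appeal of one byte per character shows up in byte-level operations such as truncation. In this minimal Python sketch (the sample word and variable names are illustrative), any byte prefix of ISO 8859-1 text is a valid string, whereas the same cut in UTF-8 can split a multi-byte character.

```python
text = "Grüße"

latin1 = text.encode("iso8859_1")
utf8 = text.encode("utf-8")
print(len(text), len(latin1), len(utf8))  # 5 5 7

# With one byte per character, any byte prefix is also a character prefix.
print(latin1[:3].decode("iso8859_1"))  # Grü

# In UTF-8 the same cut splits the two-byte sequence for "ü".
try:
    utf8[:3].decode("utf-8")
except UnicodeDecodeError as err:
    print("UTF-8 prefix is invalid:", err.reason)
```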


As the relative cost, in computing resources, of using more than one byte per character began to diminish, programming languages and operating systems added native support for Unicode alongside their existing code pages. As Unicode-enabled operating systems became more widespread, ISO 8859 and other legacy encodings became less popular. While remnants of ISO 8859 and single-byte character models remain entrenched in many operating systems, programming languages, data storage systems, networking applications, display hardware and end-user application software, most modern computing applications use Unicode internally and rely on conversion tables to map to and from the simpler encodings when necessary.
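Converting legacy data therefore amounts to decoding with the old part's table and re-encoding. A minimal Python sketch: `to_utf8` and `from_utf8` are hypothetical helper names, the codec names are Python's, and the `errors` parameter is the standard argument of `str.encode`.

```python
def to_utf8(data: bytes, legacy: str) -> bytes:
    """Decode legacy single-byte text via its conversion table, re-encode as UTF-8."""
    return data.decode(legacy).encode("utf-8")

def from_utf8(data: bytes, legacy: str, errors: str = "strict") -> bytes:
    """Map UTF-8 back to a legacy part; characters it lacks fail or are replaced."""
    return data.decode("utf-8").encode(legacy, errors=errors)

polish = "Łódź".encode("iso8859_2")
print(polish.hex(" ").upper())  # A3 F3 64 BC
utf8 = to_utf8(polish, "iso8859_2")
print(utf8.decode("utf-8"))     # Łódź

# ISO 8859-2 has no euro sign, so "replace" substitutes a question mark.
print(from_utf8("5 €".encode("utf-8"), "iso8859_2", errors="replace"))  # b'5 ?'
```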


Development status

The ISO/IEC 8859 standard was maintained by ISO/IEC Joint Technical Committee 1, Subcommittee 2, Working Group 3 (ISO/IEC JTC 1/SC 2/WG 3). In June 2004, WG 3 disbanded, and maintenance duties were transferred to SC 2. The standard is not currently being updated, as the Subcommittee's only remaining working group, WG 2, is concentrating on the development of ISO/IEC 10646.



The Wikipedia article included on this page is licensed under the GFDL.