FACTOID # 93: Saudi diplomats have 367 unpaid parking fines in Britain.
 
 Home   Encyclopedia   Statistics   Countries A-Z   Flags   Maps   Education   Forum   FAQ   About 
 
WHAT'S NEW
RECENT ARTICLES
More Recent Articles »
 

FACTS & STATISTICS    Simple view

  1. Select countries to view: (hold down Control key and click to select several)

     

     

    Compare:

     

     

  1. Select fact or statistic: (* = graphable)

     

     

     

  2. (OPTIONAL) Compare to statistic: (both need to be graphable)

     

     

     

  3. View result as:

     

       
(OR) SEARCH ALL encyclopedia, stats & forums:   

Encyclopedia > Binary Ordered Compression for Unicode
Unicode
Encodings
UCS
Mapping
Bi-directional text
BOM
Han unification
Unicode and HTML
Unicode and e-mail
Unicode typefaces

BOCU-1 is a MIME compatible Unicode compression scheme. BOCU stands for Binary Ordered Compression for Unicode. BOCU-1 combines the wide applicability of UTF-8 with the compactness of SCSU. This Unicode encoding is useful for compressing short strings, and it maintains code point order. Usually, the zip, bzip2, and other industry standard algorithms compact larger amounts of Unicode text more efficiently. Unicode is an industry standard designed to allow text and symbols from all of the writing systems of the world to be consistently represented and manipulated by computers. ... This page compares Unicode encodings. ... UTF-7 (7-bit Unicode Transformation Format) is a variable-length character encoding that was proposed for representing Unicode-encoded text using a stream of ASCII characters, for example for use in Internet e-mail messages. ... UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode created by Ken Thompson and Rob Pike. ... CESU-8 is a variant of UTF-8 that is described in Unicode Technical Report 26. ... In computing, UTF-16 is a variable-length (16 or 32 bits) character encoding. ... UTF-32 and UCS-4 are alternate names for a method of encoding Unicode characters, using the fixed amount of exactly 32 bits for each Unicode code point. ... UTF-EBCDIC is an encoding of Unicode that is meant to be EBCDIC friendly so that some older EBCDIC applications can handle some Unicode data. ... The Standard Compression Scheme for Unicode (SCSU) is a Unicode Technical Standard to reduce the number of bytes needed to represent text, especially if that text uses mostly characters from a small number of Unicode blocks. ... This article or section may be confusing for some readers, and should be edited to be clearer. ... GB18030 is the registered internet name for the official character set of the Peoples Republic of China. ... The international standard ISO/IEC 10646 defines the Universal Character Set (UCS) as a character encoding. ... Unicode’s Universal Character Set potentially supports over 1 million code points (1,114,112 = 220 + 216 or 17 × 216, hexadecimal 110000) code points. ... Some writing systems of the world, such as Arabic and Hebrew, are written in a form known as right-to-left (RTL), in which writing begins at the right-hand side of a page and concludes at the left-hand side. ... A Byte Order Mark (BOM) is the character at code point U+FEFF (zero-width no-break space), when that character is used to denote the endianness of a string of UCS/Unicode characters encoded in UTF-16 or UTF-32 and/or as a marker to indicate that text... Han unification is the process used by the authors of Unicode and the Universal Character Set to map multiple character sets of the CJK languages into a single set of unified characters. ... The relationship between Unicode and HTML tends to be a difficult topic for many computer professionals, document authors, and web users alike. ... Many e-mail clients are now able to use Unicode. ... Unicode typefaces (also known as UCS fonts and Unicode fonts) contains wide range of characters, letters, digits, glyphs, symbols, ideograms, logograms, etc, which are collectively mapped into Universal Character Set, also known as, UCS (which is an international standard ISO/IEC 10646), derived from many different languages, scripts from all... Multipurpose Internet Mail Extensions (MIME) is an Internet Standard that extends the format of e-mail to support: text in character sets other than US-ASCII; non-text attachments; multi-part message bodies; and header information in non-ASCII character sets. ... UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode created by Ken Thompson and Rob Pike. ... The Standard Compression Scheme for Unicode (SCSU) is a Unicode Technical Standard to reduce the number of bytes needed to represent text, especially if that text uses mostly characters from a small number of Unicode blocks. ... Unicode is an industry standard designed to allow text and symbols from all of the writing systems of the world to be consistently represented and manipulated by computers. ... Look up encoding in Wiktionary, the free dictionary. ... The ZIP file format is the most widely-used compressed file format in the IBM PC world. ... The correct title of this article is . ...


SCSU was created as a Unicode compression scheme with a byte/code point ratio similar to language-specific code pages. It has not been widely adopted although it fulfills the criteria for an IANA charset and is registered with IANA. SCSU is not suitable for MIME “text” media types. For example, SCSU cannot be used directly in emails and similar protocols. SCSU requires a complicated encoder design for good performance. Code page is the traditional IBM term used for a specific character encoding table: a mapping in which a sequence of bits, usually a single octet representing integer values 0 through 255, is associated with a specific character. ... The Internet Assigned Numbers Authority (IANA) is the entity that oversees global IP address allocation, DNS root zone management, and other Internet protocol assignments. ...


It is worth noting that SCSU has been adopted as an official Unicode Technical Standard. BOCU-1 has not been officially adopted by the Unicode consortium, but Unicode Technical Note #6 does describe this encoding in more detail.


See also

International Components for Unicode (ICU) is an open source project of mature C/C++ and Java libraries for Unicode support, software internationalization and software globalization. ...

External links


  Results from FactBites:
 
Binary-ordered compression for Unicode (2535 words)
Binary-Ordered Compression for Unicode (BOCU) is a new, general compression format for Unicode.
The binary sort order of BOCU is the same as Unicode code points: SCSU has essentially random binary order, which makes it unsuitable in many applications.
International Components for Unicode (ICU) is a C and C++ library that provides robust and full-featured Unicode support on a wide variety of platforms.
Binary Ordered Compression for Unicode - Wikipedia, the free encyclopedia (230 words)
BOCU stands for Binary Ordered Compression for Unicode.
This Unicode encoding is useful for compressing short strings, and it maintains code point order.
SCSU was created as a Unicode compression scheme with a byte/code point ratio similar to language-specific codepages.
  More results at FactBites »


 

COMMENTARY     


Share your thoughts, questions and commentary here
Your name
Your comments
Please enter the 5-letter protection code

Want to know more?
Search encyclopedia, statistics and forums:

 


Lesson Plans | Student Area | Student FAQ | Reviews | Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.