BOCU

BOCU-1 is a MIME-compatible Unicode compression scheme. BOCU stands for Binary Ordered Compression for Unicode. BOCU-1 combines the wide applicability of UTF-8 with the compactness of SCSU. This Unicode encoding is useful for compressing short strings, and it maintains code point order. For larger amounts of Unicode text, general-purpose algorithms such as zip and bzip2 usually compress more efficiently.
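
The difference between short strings and larger texts can be illustrated with a minimal sketch. It assumes Python's standard zlib and bz2 modules as stand-ins for "zip" and "bzip2", and the sample strings and the helper name sizes are made up for illustration. It does not implement BOCU-1; it only shows that general-purpose compressors carry a fixed overhead that dominates on very short input.

```python
import bz2
import zlib


def sizes(text: str) -> dict:
    """Return byte counts for plain UTF-8 and for two general-purpose compressors."""
    raw = text.encode("utf-8")
    return {
        "utf-8": len(raw),
        "zlib": len(zlib.compress(raw)),
        "bz2": len(bz2.compress(raw)),
    }


# Hypothetical sample data: one short string and one long, repetitive text.
short = "Привет"                     # six Cyrillic letters
long_text = "Привет, мир! " * 500    # same script, much more data

short_sizes = sizes(short)
long_sizes = sizes(long_text)

print("short:", short_sizes)
print("long: ", long_sizes)
```

On the short string both compressors produce more bytes than plain UTF-8, while on the long text they produce far fewer. A scheme aimed at short strings has to avoid that kind of fixed overhead.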


SCSU was created as a Unicode compression scheme with a byte/code point ratio similar to that of language-specific code pages. It has not been widely adopted, although it fulfills the criteria for an IANA charset and is registered with IANA. SCSU is not suitable for MIME "text" media types, so it cannot be used directly in email and similar protocols. SCSU also requires a complicated encoder design for good performance.
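
To make the byte/code point ratio concrete, the following sketch compares a language-specific code page with two Unicode encoding forms. The sample text, the choice of Windows-1251 (cp1251) as the code page, and the helper name bytes_per_code_point are assumptions for illustration. Python's standard library has no SCSU or BOCU-1 codec, so the sketch only shows the baseline such schemes aim for, not their actual output.

```python
# Hypothetical sample: 9 Cyrillic letters and one space, 10 code points in total.
text = "Привет мир"


def bytes_per_code_point(s: str, encoding: str) -> float:
    """Average number of encoded bytes per Unicode code point in s."""
    return len(s.encode(encoding)) / len(s)


ratios = {
    enc: bytes_per_code_point(text, enc)
    for enc in ("cp1251", "utf-8", "utf-16-le")
}

for enc, ratio in ratios.items():
    print(f"{enc:10s} {ratio:.1f} bytes per code point")
```

The code page needs one byte per code point for this text, while UTF-8 and UTF-16 need about two. That gap is what a Unicode compression scheme tries to close.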


SCSU has been adopted as an official Unicode Technical Standard. BOCU-1 has not been officially adopted by the Unicode Consortium, but Unicode Technical Note #6 describes the encoding in more detail.


External links

  • Unicode Technical Note #6: BOCU-1: MIME Compatible Unicode Compression
  • International Components for Unicode (ICU): a library that can convert between BOCU-1 and other Unicode encodings

  Results from FactBites:
 
Binary-ordered compression for Unicode (2535 words)
The binary sort order of BOCU is the same as that of Unicode code points; SCSU has essentially random binary order, which makes it unsuitable in many applications.
A variant of BOCU can be used in very restricted environments, such as bar-codes (47 allowable byte values) or domain names (37 allowable byte values).
The column heading "B:16" is the size of BOCU compared to UTF-16, and "B:8" is the size compared to UTF-8.
Binary-Ordered Compression for Unicode (1866 words)
BOCU is a general compression format for Unicode.
The principal form of BOCU, used in most cases, is BOCU-1.
ICU uses a variant of BOCU for a final (code point) level in collation.
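
What "binary order equals code point order" means can be sketched with encodings that Python ships. Because the standard library has no BOCU-1 codec, this example uses UTF-8 as a stand-in for an order-preserving encoding and UTF-16BE as a counterexample. The sample strings are arbitrary, and the sketch says nothing about how BOCU-1 itself achieves the property.

```python
# Hypothetical sample strings, one code point each:
# U+1F600, U+FF21, U+007A, U+00E9, U+4E2D
words = ["\U0001F600", "\uFF21", "z", "\u00E9", "\u4E2D"]

# Reference order: compare strings by their code point values.
by_code_point = sorted(words, key=lambda s: [ord(c) for c in s])

# Order obtained by comparing the encoded bytes only.
by_utf8 = sorted(words, key=lambda s: s.encode("utf-8"))
by_utf16 = sorted(words, key=lambda s: s.encode("utf-16-be"))

print("UTF-8 bytes keep code point order:   ", by_utf8 == by_code_point)
print("UTF-16BE bytes keep code point order:", by_utf16 == by_code_point)
```

UTF-16 fails here because U+1F600 is stored as a surrogate pair starting with D83D, which sorts before FF21 even though the code point is larger. An encoding with the binary-order property can be sorted or indexed as raw bytes without decoding.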
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.