Japanese language and computers

Adapting computers to the Japanese language raises a number of issues, some unique to Japanese and others common to any language with a very large number of characters. English needs so few characters that each one can be encoded in a single byte. Japanese uses far more characters than the 256 values a byte can hold, so it cannot be encoded with one byte per character; instead it is encoded using two or more bytes per character, in a so-called "double-byte" or "multi-byte" encoding. Some problems relate to transliteration and romanization, some to character encoding, and some to the input of Japanese text.
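The byte counts described above can be checked directly. The following is a minimal sketch using the codecs that ship with Python; the helper name byte_lengths and the sample characters are illustrative choices, not part of any standard.

```python
# Compare how many bytes one character occupies in several encodings.
ENCODINGS = ["shift_jis", "euc_jp", "utf-8"]

def byte_lengths(ch):
    """Return {encoding name: number of bytes used to encode ch}."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}

for ch in ["A", "あ", "漢"]:
    print(ch, byte_lengths(ch))
```

The English letter needs one byte in every encoding listed, while the hiragana and the kanji need two bytes in Shift_JIS and EUC-JP and three in UTF-8.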

Character encodings

There are several standard methods of encoding Japanese characters for use on a computer, including JIS, Shift_JIS (SJIS), EUC, and Unicode. While mapping the set of kana is a simple matter, kanji have proven more difficult. Despite efforts, none of the encoding schemes has become the de facto standard, and multiple encodings are still in use today. For example, most Japanese e-mails are in JIS encoding and web pages in Shift_JIS, yet mobile phones in Japan usually use some form of EUC. If a program fails to determine which encoding a text uses, the result is mojibake (文字化け, misconverted characters; literally "character change for the worse", from moji 文字, "character", and a shortened form of bakeru 化ける, "to change for the worse, to appear in disguise"), which leaves the text unreadable.
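Mojibake is easy to reproduce. The sketch below encodes a string as Shift_JIS and then decodes the same bytes as if they were EUC-JP; the variable names are illustrative, and errors="replace" is used only so that the wrong decoding yields visible replacement characters instead of raising an exception.

```python
original = "文字化け"

# Bytes as a Shift_JIS sender would produce them.
raw = original.encode("shift_jis")

# A receiver that wrongly assumes EUC-JP gets garbage.
garbled = raw.decode("euc_jp", errors="replace")

# A receiver that assumes the right encoding recovers the text.
restored = raw.decode("shift_jis")

print(repr(garbled))
print(restored)
```

The bytes themselves are undamaged; only the interpretation is wrong, which is why decoding again with the correct encoding restores the text.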


To understand how this state of affairs arose, it helps to know a little of the history of the encodings. The first encoding to become widely used was JIS X 0201, a single-byte encoding that covers only the standard 7-bit ASCII characters plus half-width katakana extensions. It was widely used in systems that had neither the processing power nor the storage to handle kanji (including DOS and old embedded equipment such as cash registers). The split began with the development of kanji encodings. Shift_JIS was designed to be completely backward compatible with JIS X 0201, and is therefore used in Windows (for backward compatibility with DOS) and in much embedded electronic equipment. However, Shift_JIS has the unfortunate property that it often breaks parsers not specifically designed to handle it, because the second byte of a two-byte character can fall in the ASCII range and be mistaken for an ASCII character such as the backslash; this causes mojibake on many forum-style websites. EUC, on the other hand, is not backward compatible with JIS X 0201, but is handled much better by parsers written for 7-bit ASCII, and so EUC encodings are used on UNIX, where much of the file-handling code was historically written only for English encodings. Further complications arise because the original Internet e-mail standards support only 7-bit transfer. The JIS encoding was therefore developed for sending and receiving e-mail.
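The trade-offs above show up at the byte level. This is a minimal sketch using Python's built-in shift_jis, euc_jp and iso2022_jp codecs (the last being Python's name for the 7-bit JIS encoding used in e-mail); the kanji 表 is chosen because its Shift_JIS form is a commonly cited problem case, and the variable names are illustrative.

```python
BACKSLASH = 0x5C  # ASCII code of "\"

kanji = "表"
sjis = kanji.encode("shift_jis")
euc = kanji.encode("euc_jp")
jis = kanji.encode("iso2022_jp")

# Shift_JIS: the second byte of this kanji equals an ASCII backslash,
# so a byte-oriented parser may treat it as an escape character.
print(sjis.hex(), BACKSLASH in sjis)

# EUC-JP: every byte of a kanji has the high bit set, so it can never
# be confused with a 7-bit ASCII character.
print(euc.hex(), all(b >= 0x80 for b in euc))

# ISO-2022-JP: only 7-bit bytes, with escape sequences to switch sets.
print(jis, all(b < 0x80 for b in jis))

# Half-width katakana: one byte in Shift_JIS (as in JIS X 0201),
# two bytes in EUC-JP.
halfwidth_a = "\uff71"  # half-width katakana "a"
print(halfwidth_a.encode("shift_jis").hex(), halfwidth_a.encode("euc_jp").hex())
```

A parser written for ASCII that scans for backslashes or quotes byte by byte will therefore misread some Shift_JIS text but not EUC-JP text.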


A character set standard such as JIS may not include every required character, so gaiji (外字, "external characters") are sometimes used to supplement it. Gaiji may come as external font packs in which normal characters have been replaced with new characters, or in which new characters have been added to unused character positions. However, gaiji are impractical on the Internet, since the font set must be transferred along with the text. As a result, such characters are written with similar or simpler characters in their place, or the text is written using a larger character set (such as Unicode) that includes the required character.
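The fallback described above can be sketched as follows. The substitution table SIMILAR and the helper names are hypothetical and exist only for this illustration; the example character 𠮷 (a variant of 吉) is outside the repertoire of Python's shift_jis codec but is available in Unicode. Using 〓 as a placeholder for characters with no listed substitute is likewise an assumption of this sketch.

```python
# Hypothetical table mapping unsupported characters to similar ones.
SIMILAR = {"𠮷": "吉"}

def fits(text, encoding="shift_jis"):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def substitute(text, encoding="shift_jis", placeholder="〓"):
    """Replace characters missing from the encoding with similar ones."""
    return "".join(
        ch if fits(ch, encoding) else SIMILAR.get(ch, placeholder)
        for ch in text
    )

name = "𠮷野家"
print(fits(name, "shift_jis"), fits(name, "utf-8"))
print(substitute(name))
```

Switching to a Unicode encoding avoids the substitution entirely, at the cost of requiring that every program handling the text supports it.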


Text input

Main article: Japanese input methods

Typing Japanese text on a computer is complicated because Japanese has far more characters than there are keys on most keyboards. On modern computers, the reading of the characters is usually entered first; an input method editor (IME), sometimes also known as a front-end processor, then shows a list of candidate kanji that match the reading phonetically and lets the user choose the correct characters. More advanced IMEs work not by word but by phrase, which increases the likelihood that the desired characters are the first option presented. The reading can be entered either via romanization (rōmaji nyūryoku) or by direct kana input (kana nyūryoku). Direct kana input is not commonly used, but is widely supported.
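The candidate lookup at the heart of an IME can be illustrated with a toy dictionary. This is only a sketch under strong simplifying assumptions: the dictionary CANDIDATES, its contents and ordering, and the function name are made up for this example, and a real IME uses a far larger dictionary with phrase-level analysis and ranking.

```python
# Toy dictionary: kana reading -> kanji spellings with that reading.
CANDIDATES = {
    "きしゃ": ["記者", "汽車", "貴社", "帰社"],
    "はし": ["橋", "箸", "端"],
}

def candidates(reading):
    """Return kanji candidates for a reading, then the kana itself."""
    return CANDIDATES.get(reading, []) + [reading]

print(candidates("きしゃ"))
print(candidates("ねこ"))  # unknown reading: only the kana is offered
```

Because many words share a reading, the user still has to pick from the list, which is why candidate ordering matters so much in practice.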


There are two main systems for the romanization of Japanese, Kunrei-shiki and Hepburn. "Keyboard romaji" (also known as wāpuro rōmaji, or "word processor romaji") generally allows a loose combination of both; IME implementations may even handle keys for letters unused in any romanization scheme, such as L, converting them to the most appropriate equivalent. With kana input, each key on the keyboard corresponds directly to one kana. The JIS keyboard layout is the national standard, but some people use alternatives such as the Oyayubi shift (thumb-shift) system.
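The "loose combination" of romanization systems can be sketched as a lookup table that accepts both spellings of the same kana. The table below is a small hypothetical subset chosen for illustration, and the greedy longest-match strategy is one simple way to do the conversion, not a description of any particular IME.

```python
# Partial romaji table accepting both Kunrei-shiki and Hepburn spellings.
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "sa": "さ", "su": "す", "se": "せ", "so": "そ",
    "si": "し", "shi": "し",   # Kunrei-shiki / Hepburn
    "ta": "た", "te": "て", "to": "と",
    "ti": "ち", "chi": "ち",   # Kunrei-shiki / Hepburn
    "tu": "つ", "tsu": "つ",   # Kunrei-shiki / Hepburn
    "hu": "ふ", "fu": "ふ",    # Kunrei-shiki / Hepburn
    "zi": "じ", "ji": "じ",    # Kunrei-shiki / Hepburn
}

def romaji_to_kana(text):
    """Convert romaji to hiragana by greedy longest match."""
    out = []
    i = 0
    while i < len(text):
        for length in (3, 2, 1):
            chunk = text[i:i + length]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            # No table entry: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

print(romaji_to_kana("sushi"), romaji_to_kana("susi"))
print(romaji_to_kana("tsuchi"), romaji_to_kana("tuti"))
```

Both spellings in each pair produce the same kana, so the user does not need to know which romanization system the IME expects.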


Direction of text

Japanese has two directions of writing, called yokogaki and tategaki. The yokogaki style runs horizontally, as in English, while the tategaki style arranges text in columns that are read downwards, with the columns ordered from right to left.
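The tategaki layout can be simulated in plain text by slicing a string into columns and printing them from right to left. This is a minimal sketch: the function name and the fixed column height are assumptions of the example, and it ignores details of real vertical typesetting such as rotated punctuation.

```python
def tategaki(text, height):
    """Lay out text in vertical columns, first column on the right."""
    # Cut the text into columns of `height` characters each.
    columns = [text[i:i + height] for i in range(0, len(text), height)]
    # Pad the last column with full-width spaces so rows line up.
    columns = [col.ljust(height, "\u3000") for col in columns]
    rows = []
    for r in range(height):
        # Reverse the column order so reading proceeds right to left.
        rows.append("".join(col[r] for col in reversed(columns)))
    return "\n".join(rows)

print(tategaki("あいうえおか", 3))
```

Reading the output down the rightmost column and then down the next column to its left recovers the original string.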


At present, handling of downward text is incomplete. For example, HTML has no support for tategaki, and Japanese users must use HTML tables to simulate it. However, CSS level 3 includes a writing-mode property that can render tategaki when given the value tb-rl (top to bottom, right to left); the finalized CSS Writing Modes specification names this value vertical-rl. Word processors and DTP software have more complete support for vertical text.


See also

  • Japanese writing system
  • Japanese language
  • CJK

External links

  • A complete introduction to Japanese character encodings
  • CJK.INF, a document providing information on CJK (that is, Chinese, Japanese, and Korean) character set standards and encoding systems
  • Japanese text encoding

Japanese text editors

  • JWPce, a free Japanese Word Processor for Windows distributed under the GNU General Public License.

The Wikipedia article included on this page is licensed under the GFDL.