The UTF-8-encoded Japanese Wikipedia article for mojibake, as displayed in ISO-8859-1 encoding.
Mojibake is the phenomenon of incorrect, unreadable characters (garbage characters) shown when computer software fails to render a text correctly according to its associated character encoding. It is a loanword from Japanese.
Etymology
The Japanese word 文字化け (mojibake) is composed of 文字 (moji), which means "letter" or "character", and 化け (bake), from the verb 化ける (bakeru), which means "to appear in disguise", "to take the form of" or "to change for the worse". Literally, it means "character changing".
Causes
Mojibake is often caused by forced display of writing systems or character encodings that are "foreign" to the user's computer system: if a computer does not have the software required to process a foreign language's characters, it will attempt to process them in its default encoding, usually resulting in gibberish. Messages transferred between different encodings of the same language can also suffer from mojibake. Japanese-language users, with several different encodings historically in use, would encounter this problem relatively often. For example, the intended word "文字化け", encoded in UTF-8, is incorrectly displayed as "æ–‡å—åŒ–ã‘" in some Windows software that is configured to expect text in ISO-8859-1 encoding.[1]
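The garbling in this example can be reproduced with a short Python sketch. It assumes that the Windows software in question treats the ISO-8859-1 label as Windows-1252 (a common behaviour, but an assumption here), and it uses Python's `cp1252` codec to stand in for that software. The variable names are illustrative only.

```python
# Reproduce the UTF-8 -> Windows-1252 mojibake described above.
text = "文字化け"
raw = text.encode("utf-8")  # 12 bytes: three per character

# Byte 0x81 has no assignment in Python's cp1252 codec, so
# errors="replace" substitutes U+FFFD instead of raising an error.
garbled = raw.decode("cp1252", errors="replace")
print(garbled)

# Latin-1 assigns a code point to every byte value, so a wrong
# Latin-1 decoding loses nothing and can be reversed exactly.
as_latin1 = raw.decode("latin-1")
repaired = as_latin1.encode("latin-1").decode("utf-8")
print(repaired == text)
```

The round trip at the end works only because no byte was dropped or replaced along the way. Once a substitution character has been inserted, as in `garbled`, the original bytes can no longer be recovered from the text alone.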
A web browser may not be able to distinguish a page encoded in EUC-JP from one encoded in Shift-JIS if the encoding is not declared explicitly, either in the HTTP headers sent along with the document or in the HTML document's meta tags, which substitute for missing HTTP headers when the server cannot be configured to send the proper ones. Heuristics can be applied to guess at the character set, but these are not always successful.
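The lookup order described above (HTTP header first, then meta tag, then a fallback) might be sketched as follows. This is a simplified illustration, not how any particular browser works: the function names, the 1024-byte scan window and the `utf-8` default are all assumptions made for the example, and real browsers follow considerably more detailed rules.

```python
import re
from email.message import Message

# Hypothetical pattern: finds charset=... inside a <meta> tag.
META_RE = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)

def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def charset_from_meta(body):
    """Return the charset declared in a meta tag near the start, or None."""
    match = META_RE.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None

def decode_page(body, content_type=None, default="utf-8"):
    """Decode page bytes: the HTTP header wins, then the meta tag, then the default."""
    charset = None
    if content_type:
        charset = charset_from_header(content_type)
    charset = charset or charset_from_meta(body) or default
    try:
        return body.decode(charset, errors="replace")
    except LookupError:  # encoding label unknown to Python
        return body.decode(default, errors="replace")

page = '<html><head><meta charset="euc-jp"></head>文字化け</html>'.encode("euc_jp")
print(decode_page(page))
```

When neither declaration is present, the sketch simply falls back to the default, which is exactly the situation in which a page may be shown as mojibake.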
In the mid-1990s, as this problem became common, several websites featured mojibake not as a problem to be tackled but simply for amusement. Words and even sentences were "deciphered", with meanings made up to deliver funny messages.
Mojibake can also occur within the same font set. It often occurs between Windows and Macintosh users: the font sets have the same name, but each system includes its own extra characters. Many people are unaware that these characters are system-specific and use them in websites, e-mails, blogs and so on as if they were common characters, and mojibake results.
Problems in other languages
In Chinese, this phenomenon is called luanma (Simplified Chinese: 乱码; Traditional Chinese: 亂碼; Pinyin: luànmǎ; literally "haphazard code").
Users of Central and Eastern European languages can also be affected. Because most computers were not connected to any network during the mid-to-late 1980s, different character encodings arose for every language with diacritical characters.
Handwritten krakozyabry corrected by a postal employee.
In Russian, mojibake is called krakozyabry (кракозя́бры). During the 1990s, several different encodings for the Cyrillic alphabet (Unix KOI8-R, Windows CP-1251, DOS 866, the standard ISO 8859-5, and several others) competed. Poorly configured servers and a lack of compatibility made garbled text a common and frustrating experience. Many e-mail servers stripped the eighth bit from each character, as permitted by earlier standards (which renders UTF-8 unreadable, as well as all of the above). For this reason many Cyrillic users resorted to Volapuk encoding. An even more frustrating problem emerged in the early 2000s, when the popular e-mail client Microsoft Outlook started to replace correctly entered Cyrillic characters with question marks when replying to or forwarding messages created in competing encodings.
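As an illustration of how the competing Cyrillic encodings clash, the sketch below encodes one word in KOI8-R and then reads the same bytes back under each of the four encodings named above. The codec names are those of Python's standard library; the variable names are made up for the example.

```python
# One byte sequence, four readings: only the matching codec
# recovers the original word.
word = "кракозябры"
raw = word.encode("koi8_r")

CODECS = ["koi8_r", "cp1251", "cp866", "iso8859_5"]
readings = {name: raw.decode(name, errors="replace") for name in CODECS}

for name, reading in readings.items():
    print(f"{name:10} {reading}")

# The damage is reversible while the bytes survive intact:
# re-encode with the wrong codec, then decode with the right one.
recovered = readings["cp1251"].encode("cp1251").decode("koi8_r")
print(recovered == word)
```

All four encodings are single-byte, so none of the decodings fails outright. The text simply comes out as different letters or symbols, which is why the error was so easy to make and so hard to notice on the sending side.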
In Bulgarian, mojibake is often called maymunitsa (маймуница), meaning "monkey's alphabet".
In Poland, every company selling early DOS computers created its own encoding and simply reprogrammed the EPROMs of the video cards (typically CGA, EGA or Hercules) with the corresponding character shapes. Additionally, users of then-popular home computers (such as the Amiga and Atari ST) invented their own encodings, incompatible with international standards (ISO 8859-2), vendor standards (IBM CP852, Windows CP1250) and locally agreed-upon PC/MS-DOS standards (Mazovia). The situation began to improve when, after pressure from academic and user groups, ISO 8859-2 succeeded as the "Internet standard", with limited support from the dominant vendors' software (today it has largely been replaced by Unicode). Because of the numerous problems caused by the variety of encodings, even today some users tend to refer to Polish diacritical characters as krzaki ("bushes").
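The incompatibility between the Polish encodings can be made concrete by encoding one sample phrase under three of the encodings mentioned above and comparing the bytes. This is only a sketch: the sample phrase is chosen for the example, and Mazovia is left out because Python's standard library has no codec for it.

```python
# The same Polish text yields different bytes under each encoding.
sample = "zażółć gęślą jaźń"

CODECS = ["iso8859_2", "cp852", "cp1250"]
encoded = {name: sample.encode(name) for name in CODECS}

for name, data in encoded.items():
    print(f"{name:10} {data.hex(' ')}")

# Plain ASCII letters are encoded identically in all three;
# only the letters with diacritics differ.
ascii_part = "za"
print(all(ascii_part.encode(name) == b"za" for name in CODECS))
```

Because the ASCII letters agree, text decoded with the wrong one of these encodings stays partly legible, with only the diacritical letters replaced by unrelated symbols.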
References
- Reprinted article from the Japan Times
- Coldfusion Developers Guide article
- Translators guide