|
In textual criticism and bibliography, collation is the reading of two (or more) texts side-by-side in order to note their differences.
In printing and photocopying, collation is the arrangement of pages in order when several copies of a document are bound after printing or copying.
Collation can also refer to the detailed bibliographical description of a book or the comparison of the physical makeup of two copies of a book.
In library and information science and computer science, collation is the assembly of written information into a standard order. In common usage, this is called alphabetisation, though collation is not limited to ordering letters of the alphabet. Collating lists of words or names into alphabetical order is the basis of most office filing systems, library catalogues, and books of reference.
Collation differs from classification in that classification is concerned with arranging information into logical categories, while collation is concerned with the partial ordering of those categories.
Collation differs from a sort algorithm in that whereas sort algorithms decide which pairs of elements to compare, collation defines a total order ≤ on pairs that the sort algorithm uses to determine when to swap the elements (usually a lexicographical order). In fact, sort algorithms are often implemented to take a collation as an input.
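The split between sort algorithm and collation can be sketched in Python, whose built-in `sorted` accepts the ordering as a parameter. The comparison function `compare_case_insensitive` is a hypothetical collation chosen only for illustration.

```python
from functools import cmp_to_key

def compare_case_insensitive(a: str, b: str) -> int:
    """Hypothetical collation: negative, zero or positive, ignoring letter case."""
    x, y = a.lower(), b.lower()
    return (x > y) - (x < y)

words = ["pear", "apple", "Banana"]

# The sort algorithm is identical in both calls; only the collation differs.
by_code_point = sorted(words)
by_collation = sorted(words, key=cmp_to_key(compare_case_insensitive))
```

The default ordering puts "Banana" first because uppercase letters have lower code points, while the supplied collation places it between "apple" and "pear".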
Collation systems
Numerical sorting
The simplest collation system is numerical sorting: ordering numbers by their magnitude. For example, 4 17 3 5 collates to 3 4 5 17. While this might appear to work only for numbers, computers can use this method for any textual information since computers internally use character sets which assign a numeric code point to each letter or glyph. For example, a computer using ASCII code (or any of its supersets such as Unicode) and numerical sorting would collate a b C d $ to $ C a b d.
Why the curious "ASCIIbetical order"? The numerical values that ASCII uses are $ = 36, a = 97, b = 98, C = 67, and d = 100. This style of collation is commonly used, often with the refinement of converting uppercase letters to lowercase before comparing ASCII values, since most people do not expect capitalised words to jump to the head of the list. This system fails to properly sort numbers written as text because a human-readable number stored in a computer text string is a sequence of numeric codes for numerals.
For example, 156.1 (a string) is represented by ASCII code as the five ordered numbers 49, 53, 54, 46, and 49; 35.29 corresponds to 51, 53, 46, 50, and 57; because 49 comes before 51, 156.1 comes before 35.29.
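The behaviour described in this section can be reproduced with a short Python sketch. Python compares strings by code point, which coincides with ASCII for the characters used here; the variable names are illustrative only.

```python
# Plain numerical sorting of numbers.
numbers = sorted([4, 17, 3, 5])

symbols = ["a", "b", "C", "d", "$"]

# The code points that drive the comparison.
code_points = {s: ord(s) for s in symbols}

# Sorting the strings directly compares those code points.
ascii_order = sorted(symbols)

# Refinement: fold to lowercase before comparing, so "C" no longer precedes "a".
folded_order = sorted(symbols, key=str.lower)

# Numbers stored as text are compared character by character, not by magnitude.
as_text = sorted(["35.29", "156.1"])
as_numbers = sorted(["35.29", "156.1"], key=float)
```

Here `as_text` places "156.1" first because "1" (49) precedes "3" (51), whereas `as_numbers` converts each string to a number before comparing.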
Alphabetical sorting
A more elaborate collation system is alphabetical sorting, which orders words or names based on the conventional order of letters in an alphabet or abjad (most of which have a single conventional order). Each nth letter is compared with the nth letter of other words in the list, starting at the first letter of each word and advancing to the second, third, fourth, and so on, until the order is established.
For example, the words foo · bar · bibble collate to bar · bibble · foo because (1) f comes after b so bar and bibble both precede foo and (2) a comes before i so bar precedes bibble. Numeric sorting on a computer and alphabetical sorting often produce the same ordering for English.
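The letter-by-letter comparison can be written out explicitly. The following is a minimal sketch for lowercase English words; the rule that a shorter word precedes a longer word sharing its prefix is an assumption added here, since the text does not state it.

```python
from functools import cmp_to_key

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def compare_words(a: str, b: str) -> int:
    """Compare two lowercase words letter by letter; return -1, 0 or 1."""
    for x, y in zip(a, b):
        if x != y:
            # The first differing position decides the order.
            return -1 if ALPHABET.index(x) < ALPHABET.index(y) else 1
    # Assumption: when one word is a prefix of the other, the shorter sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))

collated = sorted(["foo", "bar", "bibble"], key=cmp_to_key(compare_words))
```

Replacing `ALPHABET` with another conventional letter order is enough to collate a different alphabet whose letters are single characters.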
The difference between computer-style numerical sorting and true alphabetical sorting becomes obvious in languages using an extended Latin alphabet.
For example, the thirty-letter alphabet of Spanish treats ñ as a basic letter following n, and formerly treated ch and ll as basic letters following c and l, respectively. Ch and ll are still considered letters, but are now alphabetized as two-letter sequences (digraphs); the new alphabetization rule was issued by the Royal Spanish Academy in 1994. The letter rr, on the other hand, follows rqu as expected. A numeric sort may incorrectly order ñ after z, and it treats ch as c + h, which was incorrect under the pre-1994 rule.
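A sketch of both Spanish conventions follows. The letter lists, the function name `spanish_key`, and the sample words are illustrative assumptions; accented vowels and rr are left out for brevity, and characters outside the list are simply skipped.

```python
import re

# Letters in collation order; the traditional list treats "ch" and "ll" as single letters.
MODERN = list("abcdefghijklmnñopqrstuvwxyz")
TRADITIONAL = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
               "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]

def spanish_key(word: str, letters: list[str]) -> list[int]:
    """Map a lowercase word to the ranks of its letters in the given alphabet."""
    # Longer letters come first in the pattern so "ch" is matched before "c".
    pattern = "|".join(sorted(map(re.escape, letters), key=len, reverse=True))
    return [letters.index(token) for token in re.findall(pattern, word)]

words = ["caza", "caña", "cana", "chico", "cuna"]

by_code_point = sorted(words)
modern = sorted(words, key=lambda w: spanish_key(w, MODERN))
traditional = sorted(words, key=lambda w: spanish_key(w, TRADITIONAL))
```

Code-point order places "caña" after "caza"; both Spanish orders put it before, and only the traditional order moves "chico" behind "cuna".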
Similar differences between computer numeric sorting and alphabetic sorting occur in Danish and Norwegian (aa is ordered as å at the end of the alphabet), German (ß is ordered as s + s; ä, ö, ü are ordered as a + e, o + e, u + e in phone books, but as a, o, u elsewhere, and as separate letters after a, o, u in Austria), Icelandic (ð follows d), English (æ is ordered as a + e), and many other languages. Usually the spaces or hyphens between words are ignored.
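The German conventions can be approximated by expanding the special letters before comparing. In this sketch the phone-book table follows the expansions given in the text, while the second table assumes that the other common convention treats ä, ö, ü as their base letters; the names are invented examples.

```python
# Expansion tables applied before comparison (illustrative, not a full standard).
PHONE_BOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})
BASE_LETTER = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

def german_key(word: str, table: dict) -> str:
    """Lowercase the word and expand its special letters using the given table."""
    return word.lower().translate(table)

names = ["Mutter", "Müller", "Muffel"]

by_code_point = sorted(names)
phone_book = sorted(names, key=lambda n: german_key(n, PHONE_BOOK))
base_letter = sorted(names, key=lambda n: german_key(n, BASE_LETTER))
```

Code-point order pushes "Müller" to the end, the phone-book rule moves it to the front as "mueller", and the base-letter rule places it between the other two.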
See also Latin alphabet for a list of collating rules for Latin-based alphabets.
Languages that use a syllabary or abugida instead of an alphabet (for example, Cherokee) can use approximately the same system if there is a set ordering for the symbols.
Radical-and-stroke sorting
Another form of collation is radical-and-stroke sorting, used for non-alphabetic writing systems such as Chinese logographs and Japanese kanji, whose thousands of symbols defy ordering by convention. In this system, common components of characters (radicals) are identified. Characters are then grouped by their primary radical, then ordered by number of pen strokes within radicals. When there is no obvious radical or more than one radical, convention governs which is used for collation. For example, the Chinese character for "mother" (媽) is sorted as a thirteen-stroke character under the three-stroke primary radical (女).
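A radical-and-stroke sort key can be sketched with a small hand-made lookup table. The table `RADICAL_DATA` is hypothetical and covers only four characters; a real system would draw on a full character database. Ordering the radicals themselves by stroke count is an assumption made for this sketch.

```python
# Hypothetical table: character -> (primary radical, strokes in radical, total strokes).
RADICAL_DATA = {
    "女": ("女", 3, 3),
    "好": ("女", 3, 6),
    "媽": ("女", 3, 13),
    "林": ("木", 4, 8),
}

def radical_stroke_key(char: str) -> tuple:
    """Group by radical (fewest radical strokes first), then by total stroke count."""
    radical, radical_strokes, total_strokes = RADICAL_DATA[char]
    return (radical_strokes, radical, total_strokes)

ordered = sorted(["林", "媽", "好", "女"], key=radical_stroke_key)
```

All characters filed under 女 come out together in order of increasing stroke count, followed by the character filed under 木.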
The radical-and-stroke system is cumbersome compared to an alphabetical system in which there are only a few characters, all unambiguous. As a result, logographic languages often supplement radical-and-stroke ordering with alphabetic sorting of a phonetic conversion of the logographs. For example, the kanji word Tokyo (東京) can be sorted as if it were spelled out in the Japanese kana sequence "to-u-ki-yo-u" (とうきょう). Nevertheless, the radical-and-stroke system is the only practical method for constructing dictionaries that someone may use to look up a logograph whose pronunciation is unknown.
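Sorting by phonetic conversion amounts to keeping a reading alongside each word and sorting on the reading. In this sketch the `READINGS` table is a hand-written illustration, and comparing hiragana by code point is only a rough stand-in for the conventional kana order.

```python
# Illustrative table: written form -> kana reading used for sorting.
READINGS = {
    "東京": "とうきょう",
    "大阪": "おおさか",
    "京都": "きょうと",
}

places = ["東京", "大阪", "京都"]

# Sorting the written forms directly compares kanji code points.
by_code_point = sorted(places)

# Sorting on the reading orders the words by pronunciation instead.
by_reading = sorted(places, key=READINGS.get)
```

The written forms are what gets displayed; the readings act purely as sort keys.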
Multilingual ordering
When lists of names or words need to be ordered, but the context does not define a particular single language or alphabet, the Unicode Collation Algorithm provides a way to put them in sequence.
Complications
Compound words and special characters
A complication in alphabetical sorting can arise due to disagreements over how groups of words (separated compound words, names, titles, etc.) should be ordered. One rule is to remove spaces for purposes of ordering, another is to consider a space as a character that is ordered before numbers and letters (this method is consistent with ASCII-ordering), and a third is to order a space after numbers and letters. Given the strings "catch", "cattle", and "cat food" to alphabetize, the first rule produces "catch" "cat food" "cattle", the second "cat food" "catch" "cattle", and the third "catch" "cattle" "cat food". The first rule is used in most (but not all) dictionaries, the second in telephone directories (so that Wilson, Jim K appears with other people named Wilson, Jim and not after Wilson, Jimbo). The third rule is rarely used.
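The three space-handling rules can be expressed as three sort keys. This is a minimal sketch for lowercase ASCII strings; using "~" (code 126) as a stand-in for a space that sorts after letters is a trick chosen for this example.

```python
phrases = ["catch", "cattle", "cat food"]

# Rule 1: ignore spaces entirely.
rule1 = sorted(phrases, key=lambda s: s.replace(" ", ""))

# Rule 2: a space sorts before digits and letters (as in ASCII, where space is 32).
rule2 = sorted(phrases)

# Rule 3: a space sorts after digits and letters; "~" (126) stands in for it.
rule3 = sorted(phrases, key=lambda s: s.replace(" ", "~"))
```

Only the key changes between the three results, so an application can offer the rules as interchangeable options.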
A similar complication arises when special characters such as hyphens or apostrophes appear in words or names. Any of the same rules as above can be used in this case as well; however, the strict ASCII sorting no longer corresponds exactly to any of the rules.
Name/Surname ordering
The telephone directory example sheds light on another complication. In cultures where family names are written after given names, it is usually still desired to sort by family name first. In this case, names need to be reordered to be sorted properly. For example, Juan Hernandes and Brian O'Leary should be sorted as Hernandes, Juan and O'Leary, Brian even if they are not written this way. Capturing this rule in a computer collation algorithm is difficult, and simple attempts will necessarily fail. For example, unless the algorithm has at its disposal an extensive list of family names, there is no way to decide if "Gillian Lucille van der Waal" is "van der Waal, Gillian Lucille", "Waal, Gillian Lucille van der", or even "Lucille van der Waal, Gillian".
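A simple heuristic shows both the idea and its limits. In this sketch the last word is taken as the family name and any preceding words found in a small, hypothetical `PARTICLES` set are pulled into it; the set is far from complete, and the heuristic has no way of knowing whether "Lucille" belongs to the family name.

```python
# Hypothetical, incomplete set of surname particles.
PARTICLES = {"van", "der", "de", "von"}

def surname_first(full_name: str) -> str:
    """Rewrite 'Given Family' as 'Family, Given' using a last-word heuristic."""
    parts = full_name.split()
    if len(parts) < 2:
        return full_name
    start = len(parts) - 1
    # Extend the family name leftwards over particles, keeping at least one given name.
    while start > 1 and parts[start - 1].lower() in PARTICLES:
        start -= 1
    return " ".join(parts[start:]) + ", " + " ".join(parts[:start])

names = ["Brian O'Leary", "Juan Hernandes", "Gillian Lucille van der Waal"]
directory = sorted((surname_first(n) for n in names), key=str.lower)
```

The function commits to the reading "van der Waal, Gillian Lucille"; choosing among the alternatives listed above would require a list of known family names.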
In telephone directories in English-speaking countries, surnames beginning with Mc are sometimes sorted as if starting with Mac and placed between "Mabxxx" and "Madxxx". Under these rules, the telephone directory order of the following names would be: Maam, McAllan, Macbeth, MacCarthy, McDonald, Macy, Mboko.
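The Mc rule reduces to a sort key that rewrites the prefix before comparison. A minimal sketch, with `directory_key` as an illustrative name:

```python
def directory_key(surname: str) -> str:
    """Sort key that files 'Mc...' as if it were spelled 'Mac...'."""
    key = surname.lower()
    if key.startswith("mc"):
        key = "mac" + key[2:]
    return key

names = ["Mboko", "Macy", "McDonald", "MacCarthy", "Macbeth", "McAllan", "Maam"]
ordered = sorted(names, key=directory_key)
```

The names themselves are left unchanged; only the key used for comparison is rewritten.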
Abbreviations and common words
When abbreviations are used, it is sometimes desired to expand the abbreviations for sorting. In this case, "St. Paul" comes before "Shanghai". To capture this behavior in a collation algorithm, a list of abbreviations is needed. It may be more practical in some cases to store two sets of strings, one for sorting and one for display. A similar problem arises when letters are replaced by numbers or special symbols in an irregular manner, for example 1337 for leet or the movie Se7en. In this case, proper sorting necessitates keeping two sets of strings.
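Both approaches can be sketched briefly: expanding abbreviations from a list, and storing an explicit sort string next to each display string. The `ABBREVIATIONS` table and the sample records are illustrative assumptions.

```python
# Hypothetical abbreviation list; a real one would be much longer.
ABBREVIATIONS = {"St.": "Saint"}

def sort_form(display: str) -> str:
    """Expand known abbreviations word by word and lowercase the result."""
    words = [ABBREVIATIONS.get(word, word) for word in display.split()]
    return " ".join(words).lower()

places = ["Shanghai", "St. Paul"]
expanded_order = sorted(places, key=sort_form)

# Irregular spellings cannot be derived by rule, so each record carries
# a (display string, sort string) pair.
records = [("Shane", "shane"), ("Se7en", "seven"), ("Serpico", "serpico")]
display_order = [display for display, sort_string in
                 sorted(records, key=lambda record: record[1])]
```

Sorting on the stored sort string files "Se7en" as "seven", after "Serpico", whereas a code-point sort of the display strings would put it first.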
In certain contexts, very common words (such as articles) at the beginning of a sequence of words are not considered for ordering, or are moved to the end. So "The Shining" is considered "Shining" or "Shining, The" when alphabetizing and therefore is ordered before "Summer of Sam". This rule is fairly easy to capture in an algorithm, but many programs rely instead on simple lexicographic ordering. One fairly quaint exception to this rule is the flag of The Former Yugoslav Republic of Macedonia, which is flown at the United Nations between those of Thailand and Timor-Leste.
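Skipping a leading article takes only a few lines. This sketch handles English articles only; the `ARTICLES` tuple is an assumption and would differ for other languages.

```python
# Leading articles to skip, each with its trailing space.
ARTICLES = ("the ", "a ", "an ")

def title_key(title: str) -> str:
    """Sort key that drops one leading English article, if present."""
    lowered = title.lower()
    for article in ARTICLES:
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered

titles = ["Summer of Sam", "The Shining"]
lexicographic = sorted(titles)
article_aware = sorted(titles, key=title_key)
```

Plain lexicographic ordering files "The Shining" under T, while the article-aware key files it under S ahead of "Summer of Sam".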
Numerical sorting of strings
Sometimes, it is desired to order text with embedded numbers using proper numerical order. For example, "Figure 7b" goes before "Figure 11a". This can be extended to Roman numerals. This behavior is not particularly difficult to produce as long as only integers are to be sorted, although it can slow down sorting significantly.
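For embedded integers, one common approach is to split each string into alternating text and digit runs and compare the digit runs as numbers. The following is a minimal sketch; it does not attempt Roman numerals or decimals.

```python
import re

def natural_key(text: str) -> list:
    """Split text into alternating text and digit runs; digit runs become integers."""
    parts = re.split(r"(\d+)", text)
    # With a capturing group, odd positions always hold the digit runs.
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]

labels = ["Figure 11a", "Figure 7b", "Figure 7a"]
lexicographic = sorted(labels)
natural = sorted(labels, key=natural_key)
```

Because text and number parts always alternate in the same positions, two keys never compare an integer against a string.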
For example, Windows XP does this when sorting file names (much to the annoyance of some people who are used to a simple lexicographic ordering). Sorting decimals properly is a bit more difficult, because different locales use different symbols for a decimal point, and sometimes the same character used as a decimal point is also used as a separator, for example "Section 3.2.5". There is no universal answer for how to sort such strings; any rules are application dependent.
See also
- Unicode collation algorithm
- Lexicographical order
- Amarna letters
External links and references
- Unicode Collation Algorithm: Unicode Technical Standard #10
- Collation in Spanish (http://spanish.about.com/library/weekly/aa092099.htm#letters)
- Collation of the names of the member states of the United Nations
|