CRM114

CRM114 (full name: "The CRM114 Discriminator") is a program that classifies data statistically and is used especially for filtering email spam. Whereas other statistical Bayesian filters rely on the frequency of single words in email, CRM114 achieves a higher rate of spam recognition by generating hits on phrases up to five words in length. These phrases are used to form a hidden Markov model of the incoming texts. With this additional contextual recognition, it is one of the more accurate spam filters available.

The author claims recognition rates as high as 99.87%. These results were not reproduced in independent tests by Holden or at TREC 2005, although Holden writes that the discrepancy may stem from "some sort of installation or usage error on my part" and recommends trying CRM114. Those tests are also outdated: CRM114 has since gained several new training methods (including Double Sided Thick Threshold Training with Testing Refutation) that yield considerable accuracy improvements, and independent retests are greatly needed.

CRM114's classifier can also be switched to use Littlestone's Winnow algorithm, character-by-character correlation, a variant of k-nearest-neighbor (KNN) classification called Hyperspace, a bit-entropic classifier that uses entropy encoding to determine similarity, and other more experimental classifiers.
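The idea of scoring phrases of up to five words, rather than single words, can be illustrated with a small sketch. This is not CRM114's own code or feature scheme: it assumes plain contiguous word runs as features and a textbook naive Bayes score with add-one smoothing, and the names `phrase_features` and `PhraseBayes` are hypothetical.

```python
import math
from collections import Counter


def phrase_features(text, max_len=5):
    """Return every run of 1..max_len consecutive words in text."""
    words = text.lower().split()
    feats = []
    for start in range(len(words)):
        for n in range(1, max_len + 1):
            if start + n <= len(words):
                feats.append(" ".join(words[start:start + n]))
    return feats


class PhraseBayes:
    """Naive Bayes over phrase features (illustrative, not CRM114's model)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "good": Counter()}
        self.docs = {"spam": 0, "good": 0}

    def train(self, label, text):
        self.counts[label].update(phrase_features(text))
        self.docs[label] += 1

    def spam_log_odds(self, text):
        """Positive means 'more like spam', negative 'more like good mail'."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["good"])) + 1
        totals = {k: sum(c.values()) for k, c in self.counts.items()}
        n_docs = self.docs["spam"] + self.docs["good"]
        # Smoothed class prior, then one smoothed likelihood ratio per phrase.
        score = math.log((self.docs["spam"] + 1) / (n_docs + 2))
        score -= math.log((self.docs["good"] + 1) / (n_docs + 2))
        for feat in phrase_features(text):
            p_spam = (self.counts["spam"][feat] + 1) / (totals["spam"] + vocab)
            p_good = (self.counts["good"][feat] + 1) / (totals["good"] + vocab)
            score += math.log(p_spam / p_good)
        return score
```

Because "cheap pills" counts as a feature in its own right, a message containing that phrase is scored on the phrase as well as on the two separate words, which is the extra context the article describes.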


CRM114 is a good example of pattern recognition software in which machine learning is accomplished with a reasonably simple algorithm. The source code, written in C and licensed under the GPL, is available from the project's website.
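To give a sense of how simple such a learning algorithm can be, here is a minimal sketch of Littlestone's Winnow rule, one of the classifier options named earlier. It is the textbook multiplicative-update variant over a fixed number of binary features, not CRM114's implementation; the class name, the threshold of half the feature count, and the promotion factor of 2 are assumptions made for illustration.

```python
class Winnow:
    """Textbook Winnow over binary features (illustrative sketch only)."""

    def __init__(self, n_features, alpha=2.0):
        self.weights = [1.0] * n_features   # every feature starts neutral
        self.alpha = alpha                  # promotion/demotion factor
        self.threshold = n_features / 2     # assumed decision threshold

    def predict(self, active):
        """active is the set of feature indices present in the example."""
        return sum(self.weights[i] for i in active) >= self.threshold

    def update(self, active, label):
        """Adjust weights only when the current prediction is wrong."""
        if self.predict(active) == label:
            return
        # Missed positive: promote active features. False alarm: demote them.
        factor = self.alpha if label else 1.0 / self.alpha
        for i in active:
            self.weights[i] *= factor


def train(model, examples, passes=5):
    """Run several passes over (active_features, label) pairs."""
    for _ in range(passes):
        for active, label in examples:
            model.update(active, label)
    return model
```

A usage example would build `Winnow(8)`, call `train` with a handful of `(set_of_indices, bool)` pairs, and then call `predict` on new index sets. The weights change only on mistakes, which is what keeps the update rule so short.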


At a deeper level, CRM114 is also a string pattern-matching language, similar to grep or even Perl. Although it is Turing complete, it is highly tuned for matching text: even a simple recursive definition of the factorial takes almost ten lines and looks somewhat confusing to the uninitiated. Part of the reason is that the CRM114 language syntax is not positional but declensional. As a programming language, it can be used for many applications besides detecting spam. CRM114 uses the TRE approximate-match regex engine, so it is possible to write programs that do not depend on strings matching exactly in order to function correctly.
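What "not depending on absolutely identical strings" means in practice can be sketched with a small approximate substring search. This is not TRE and does not use its API: it is a stand-in based on the classic edit-distance dynamic program for literal patterns, and the function name `approx_search` is hypothetical.

```python
def approx_search(pattern, text, max_errors):
    """Find the substring of text closest to pattern by edit distance.

    Returns (errors, end) for the first best match, where end is the index
    just past the match, or None if no substring is within max_errors.
    """
    m = len(pattern)
    # col[i] = fewest edits turning pattern[:i] into some substring of text
    # that ends at the current position; before reading any text it is i.
    col = list(range(m + 1))
    best = (col[m], 0)
    for pos, ch in enumerate(text, start=1):
        new = [0] * (m + 1)   # new[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new[i] = min(col[i - 1] + cost,   # match or substitute
                         col[i] + 1,          # extra character in text
                         new[i - 1] + 1)      # pattern character missing
        col = new
        if col[m] < best[0]:
            best = (col[m], pos)
    return best if best[0] <= max_errors else None
```

With one error allowed, a pattern such as `viagra` still matches the obfuscated spelling `v1agra`, whereas an exact-match search would miss it.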


Trivia

The term CRM114 was first applied to the radio discriminator aboard a B-52 in Stanley Kubrick's Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb.


After Dr. Strangelove, the Kubrick rubric CRM114 appears in three subsequent movies. The registration/serial number of the spacecraft Discovery in 2001: A Space Odyssey is CRM 114, and in Eyes Wide Shut the mortuary is located on Level/Wing C, Room 114. In A Clockwork Orange, Kubrick uses the near-homophone "Serum 114," a drug injected into Alex to help his reformation.


Other films continue Kubrick's CRM114 tradition. An amplifier in Dr. Emmett Brown's laboratory in Back to the Future is labeled CRM-114, and the remake of Fun with Dick and Jane includes a financial transaction form numbered CRM-114. A low-budget short film made in 2006 was titled CRM-114; its plot revolved around a mysterious packet of blood known only by the code CRM-114.


CRM-114 has also cropped up in the TV show Heroes, as the code name assigned to a Kensei sword that the character Hiro is searching for in the episode "Parasite", which aired in the US on March 5, 2007.


The Wikipedia article included on this page is licensed under the GFDL.