CAPTCHA
Early CAPTCHAs such as these, generated by the EZ-Gimpy program, were used on Yahoo. However, technology was developed to read this type of CAPTCHA[1].
A modern CAPTCHA. Rather than attempting to create a distorted background and high levels of warping on the text, this CAPTCHA focuses on making segmentation difficult by adding an angled line.
Another way to make segmentation difficult. Crowded symbols can be easily read by humans but cannot be segmented by bots.

A CAPTCHA (IPA: /ˈkæptʃə/) is a type of challenge-response test used in computing to determine whether the user is human. "CAPTCHA" is a contrived acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart", trademarked by Carnegie Mellon University. A CAPTCHA involves one computer (a server) which asks a user to complete a test. While the computer is able to generate and grade the test, it is not able to solve the test on its own. Because computers are unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human. The term CAPTCHA was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper (all of Carnegie Mellon University), and John Langford (then of IBM). A common type of CAPTCHA requires that the user type the letters of a distorted image, sometimes with the addition of an obscured sequence of letters or digits that appears on the screen.


A CAPTCHA is sometimes described as a reverse Turing test, because it is administered by a machine and targeted to a human, in contrast to the standard Turing test that is typically administered by a human and targeted to a machine.


Currently, reCAPTCHA is recommended as the official CAPTCHA implementation by the original CAPTCHA creators.[2]

Characteristics

A CAPTCHA system is a means of automatically generating new challenges which:

  • Current computers are unable to accurately solve.
  • Most humans can solve.[3]
  • Do not rely on the attacker never having seen the given type of CAPTCHA before. For example, although a checkbox "check here if you are not a bot" might serve to distinguish between humans and computers, it is not a CAPTCHA because it relies on the fact that an attacker has not spent effort to break that specific form.
  • Require artificial intelligence techniques to solve.

In practice, the algorithm used to create the CAPTCHA does not need to be made public, though it may be covered by a patent. Although publication can help demonstrate that breaking it requires the solution to a difficult problem in the field of artificial intelligence, deliberate withholding of the algorithm can increase the integrity of a limited set of systems (see security through obscurity). The most important factor in deciding whether an algorithm should be made open or restricted is the size of the system. Although an algorithm which survives scrutiny by security experts may be assumed to be more conceptually secure than an unevaluated algorithm, an unevaluated algorithm specific to a very limited set of systems is always of less interest to those engaging in automated abuse. Breaking a CAPTCHA generally requires some effort specific to that particular CAPTCHA implementation, and an abuser may decide that the benefit granted by automated bypass is negated by the effort required to engage in abuse of that system in the first place.


Origin

The potential difficulty of differentiating humans from computers pretending to be humans was addressed at least as early as 1950, when Alan Turing described his now-famous Turing test. (His test was not automated.) The first discussion of automated tests which distinguish humans from computers for the purpose of controlling access to web services appears in a 1996 manuscript of Moni Naor from the Weizmann Institute of Science, entitled "Verification of a human in the loop, or Identification via the Turing Test".


Primitive CAPTCHAs seem to have been developed in 1997 at AltaVista by Andrei Broder and his colleagues to prevent bots from adding URLs to their search engine. In order to make the images resistant to OCR (Optical Character Recognition), the team simulated situations that scanner manuals claimed resulted in bad OCR. In 2000, von Ahn and Blum developed and publicized the notion of a CAPTCHA, which included any program that can distinguish humans from computers. They invented multiple examples of CAPTCHAs, including the first CAPTCHAs to be widely used (at Yahoo!).


Applications

CAPTCHAs are used to prevent automated software from performing actions which degrade the quality of service of a given system, whether due to abuse or resource expenditure. Although CAPTCHAs are most often deployed as a response to encroachment by commercial interests, the notion that they exist to stop only spammers is mistaken[citation needed]. CAPTCHAs can be deployed to protect systems vulnerable to e-mail spam, such as the webmail services of Gmail, Hotmail, and Yahoo!. CAPTCHAs have also found active use in stopping automated posting to blogs or forums, whether as a result of commercial promotion, or harassment and vandalism. CAPTCHAs also serve an important function in rate limiting, as automated usage of a service might be desirable until such usage is done in excess, and to the detriment of human users. In such a case, a CAPTCHA can enforce automated usage policies as set by the administrator when certain usage metrics exceed a given threshold. An example of a system in which vulnerabilities exist, which could easily be prevented using CAPTCHA, is presented in [4].
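The rate-limiting use described above can be illustrated with a minimal sketch. It assumes a simple sliding-window request count per client; the names `captcha_required`, `MAX_REQUESTS` and `WINDOW_SECONDS` and their values are hypothetical and are not taken from any particular system.

```python
import time
from collections import defaultdict, deque

# Hypothetical policy values; an administrator would tune these.
MAX_REQUESTS = 5        # requests allowed per window before a CAPTCHA is required
WINDOW_SECONDS = 60.0   # length of the sliding window in seconds

_recent = defaultdict(deque)  # client id -> timestamps of recent requests


def captcha_required(client_id, now=None):
    """Record one request and report whether this client must now solve a CAPTCHA."""
    now = time.monotonic() if now is None else now
    stamps = _recent[client_id]
    # Drop timestamps that have fallen out of the sliding window.
    while stamps and now - stamps[0] >= WINDOW_SECONDS:
        stamps.popleft()
    stamps.append(now)
    return len(stamps) > MAX_REQUESTS
```

Under this sketch, automated clients that stay below the threshold are never challenged, while a client exceeding it is asked to solve a CAPTCHA until its request rate falls again.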


Accessibility

See also: Web accessibility

Because CAPTCHAs rely on perception, users unable to perceive a CAPTCHA (for example, due to a disability or because it is difficult to read) will be unable to perform the task protected by a CAPTCHA. As such, sites implementing CAPTCHAs should provide an audio version of the CAPTCHA in addition to the visual method. The official CAPTCHA site [5] recommends providing an audio CAPTCHA for accessibility reasons.


Attempts at more accessible CAPTCHAs

Even an audio and visual CAPTCHA will require manual intervention for some users, such as those who are both deaf and blind. There have been various attempts at creating CAPTCHAs that are more accessible. Attempts include the use of JavaScript[6], mathematical questions ("what is 1+1"), or "common sense" questions ("what color is the sky"). These attempts violate one or both of the principles of CAPTCHAs: either they cannot be automatically generated or they can be easily cracked given the state of artificial intelligence. As such, the only security these CAPTCHAs provide is security through obscurity; an attacker is unlikely to have encountered the formulation of the CAPTCHA in question, and unlikely to find it worth the time spending resources to break the CAPTCHA of a small site.


Due to the lack of security provided by such text-based CAPTCHAs, most sites choose to use an audio and visual CAPTCHA as a way of balancing accessibility and security. Often, email support is used to manually provide access to users who are unable to solve a CAPTCHA.


Circumvention

There are a few approaches to defeating CAPTCHAs: using cheap human labor to recognize them, exploiting bugs in the implementation that allow the attacker to completely bypass the CAPTCHA, and finally improving character recognition software.


Legality

Circumventing CAPTCHAs may be illegal in the United States. In 2007, Ticketmaster sued software maker RMG Technologies over its product, which circumvented the ticket seller's CAPTCHAs, on the basis that it violated the anti-circumvention clause of the DMCA. In October 2007, an injunction was issued stating that Ticketmaster would "likely succeed" in making its case.[7]


Human solvers

CAPTCHA is vulnerable to a relay attack that uses humans to solve the puzzles. One approach involves relaying the puzzles to a sweatshop of human operators who can solve CAPTCHAs. In this scheme, a computer fills out a form and when it reaches a CAPTCHA, it gives the CAPTCHA to the human operator to solve. If the humans are dedicated employees who receive minimum wage this is not likely to be viable.[8] Another variation of this technique involves copying the CAPTCHA images and using them as CAPTCHAs for a high-traffic site owned by the attacker. With enough traffic, the attacker can get a solution to the CAPTCHA puzzle in time to relay it back to the target site.[9] In October 2007, a piece of malware appeared in the wild which enticed users to solve CAPTCHAs in order to see progressively further into a series of "striptease" images.[10][11]


Insecure implementation

As with any security system, design flaws in an implementation can prevent the theoretical security from being realized. Many CAPTCHA implementations, especially those which have not been designed and reviewed by experts in the field of security, are prone to common attacks.


Some CAPTCHA protection systems can be bypassed without using OCR simply by re-using the session ID of a known CAPTCHA image. A correctly designed CAPTCHA does not allow multiple solution attempts at one CAPTCHA. This prevents the reuse of a correct CAPTCHA solution or making a second guess after an incorrect OCR attempt.[12] Other CAPTCHA implementations use a hash (such as an MD5 hash) of the solution as a key passed to the client to validate the CAPTCHA. Often the CAPTCHA is of small enough size that this hash could be cracked.[13] Further, the hash could assist an OCR based attempt. A more secure scheme would use an HMAC. Finally, some implementations use only a small fixed pool of CAPTCHA images. Eventually, when enough CAPTCHA image solutions have been collected by an attacker over a period of time, the CAPTCHA can be broken by simply looking up solutions in a table, based on a hash of the challenge image.
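Two of the points above, using an HMAC rather than a bare hash and allowing only one attempt per challenge, can be sketched as follows. This is a minimal illustration under stated assumptions, not a reviewed design: the function names, the in-memory set of used nonces, and the choice of SHA-256 are all hypothetical, and a real deployment would also need challenge expiry and bounded storage.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)   # server-side secret, never sent to the client
_used_nonces = set()                   # nonces of challenges that were already answered


def _tag(nonce, text):
    """Keyed digest binding a nonce to a (case-insensitive) solution text."""
    message = f"{nonce}:{text.lower()}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_token(solution):
    """Return (nonce, tag) to send to the client alongside the challenge image."""
    nonce = secrets.token_hex(16)
    return nonce, _tag(nonce, solution)


def check_answer(nonce, tag, answer):
    """Validate an answer; each nonce is accepted for a single attempt only."""
    if nonce in _used_nonces:
        return False
    _used_nonces.add(nonce)  # burn the challenge whether or not the answer is right
    return hmac.compare_digest(_tag(nonce, answer), tag)
```

Because the tag depends on a secret key, a client cannot test guesses offline as it could against a plain MD5 of the solution, and because the nonce is burned on first use, neither a correct solution nor a second guess can be replayed.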


Computer character recognition

A number of research projects have attempted (often with success) to beat visual CAPTCHAs by creating programs that contain the following functionality:

  1. Extraction of the image from the web page.
  2. Removal of background clutter, for example with color filters and detection of thin lines.
  3. Segmentation, i.e. splitting the image into segments containing a single letter.
  4. Identifying the letter for each segment.

Steps 1, 2, and 4 are easy tasks for computers.[14] The only part where humans still outperform computers is segmentation. If the background clutter consists of shapes similar to letter shapes, and the letters are connected by this clutter, the segmentation becomes nearly impossible with current software. Hence, an effective CAPTCHA should focus on the segmentation.
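Why connecting clutter hinders step 3 can be seen in a toy sketch of one simple segmentation method, splitting the image wherever a column contains no ink. The method, the `segment_columns` name and the tiny text "images" are illustrative assumptions; the research projects cited here are not described as using this exact approach.

```python
def segment_columns(image):
    """Split a binary image (rows of '#' and '.') into column ranges containing ink.

    Returns a list of (start, end) column index pairs, with end exclusive.
    """
    width = len(image[0])
    # A column has ink if any row has a '#' in that column.
    has_ink = [any(row[x] == "#" for row in image) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new segment begins
        elif not ink and start is not None:
            segments.append((start, x))    # an empty column ends the segment
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


# Three separate glyphs with empty columns between them.
clean = [
    "##..#..###",
    "#...#..#.#",
    "##..#..###",
]
# The same glyphs with a horizontal line of clutter drawn through them.
cluttered = [
    "##..#..###",
    "##########",
    "##..#..###",
]

print(segment_columns(clean))      # [(0, 2), (4, 5), (7, 10)]
print(segment_columns(cluttered))  # [(0, 10)]
```

With the clutter line in place, no column is empty, so this simple method returns a single segment instead of three.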


Several research projects have broken real-world CAPTCHAs, including one of Yahoo's early CAPTCHAs called "EZ-Gimpy"[15] and the CAPTCHA used by popular sites such as PayPal and LiveJournal as well as open source software such as phpBB.[16] [17]


Image-recognition CAPTCHAs

Some researchers promote image recognition CAPTCHAs as a possible alternative for text based CAPTCHAs. To date, no major website has made use of an image based CAPTCHA. As such, the technology would be best described as in the stage of theoretical research. Image recognition CAPTCHAs face many potential problems which have not been fully studied:

  • It is difficult for a small site to acquire a large dictionary of images which an attacker does not have access to. Without a means of automatically acquiring new labelled images, an image based challenge does not meet the definition of a CAPTCHA.
  • Some current image recognition CAPTCHAs ask the user to make a binary choice (is this a cat or a dog?[18]). Even with 16 images per challenge, a bot guessing at random has a 1 in 65,536 (= 2^16) chance of answering every image correctly. To remain effective against a botnet, which can make a very large number of attempts, each challenge would force the user to solve a prohibitively large number of images.
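The arithmetic behind the binary-choice figure can be checked with a short sketch. The function names and the one-million-attempt figure are assumptions chosen for illustration, and the model assumes uniform, independent random guesses.

```python
from fractions import Fraction


def guess_probability(num_images, choices=2):
    """Chance that uniform random guessing labels every image correctly."""
    return Fraction(1, choices ** num_images)


def expected_passes(attempts, num_images, choices=2):
    """Expected number of challenges passed by blind guessing over many attempts."""
    return attempts * guess_probability(num_images, choices)


print(guess_probability(16))                  # 1/65536
print(float(expected_passes(1_000_000, 16)))  # 15.2587890625
```

Under these assumptions, a hypothetical botnet making one million blind attempts at a 16-image challenge would still expect about 15 successes.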


Collateral benefits

Some of the original inventors of the CAPTCHA system have implemented a means by which some of the effort and time spent by people who are responding to CAPTCHA challenges can be harnessed as a distributed work system. This works by including "solved" and "unrecognized" elements (images which were not successfully recognized via OCR) in each challenge. The respondent thus answers both elements and roughly half of his or her effort validates the challenge while the other half is captured as work.
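The pairing of a "solved" and an "unrecognized" element can be sketched as below. This is only an illustration of the idea as described here: the function names, the in-memory vote store, and the rule of accepting a transcription after a minimum number of matching votes are assumptions, not details of the actual system.

```python
from collections import Counter, defaultdict

# Hypothetical store: unknown-word id -> tally of transcriptions submitted so far.
_votes = defaultdict(Counter)


def grade_pair(control_answer, control_solution, unknown_id, unknown_answer):
    """Grade a two-element challenge.

    Only the already-solved (control) element decides pass or fail. If it is
    answered correctly, the reading of the unrecognized element is kept as a vote.
    """
    passed = control_answer.strip().lower() == control_solution.lower()
    if passed:
        _votes[unknown_id][unknown_answer.strip().lower()] += 1
    return passed


def consensus(unknown_id, min_votes=3):
    """Return the leading transcription once it has at least min_votes, else None."""
    if not _votes[unknown_id]:
        return None
    word, count = _votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None
```

In this sketch the control element does the validating, while the answers given for the unrecognized element accumulate as the captured work.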


This reCAPTCHA system is being used to aid in the conversion of printed works (scanned images) into digital text. The approach is similar to one of the techniques by which CAPTCHA systems can be circumvented, in that the respondents apply human intelligence to accomplish small amounts of work in a highly distributed way.


The reCAPTCHA maintainers estimate that existing CAPTCHA systems represent approximately 150,000 hours of labor per day that could be transparently tapped into via their revised system. This would be equivalent to nearly 19,000 people working 8 hours per day on correcting OCR.[19]


References

  1. ^ http://www.cs.sfu.ca/~mori/research/gimpy/
  2. ^ http://www.captcha.net/
  3. ^ http://www.captcha.net/
  4. ^ Amrinder Arora (2007). "Statistics Hacking - Exploiting Vulnerabilities in News Websites" (PDF). International Journal of Computer Science and Network Security 7: 342 - 347. 
  5. ^ http://www.captcha.net/
  6. ^ http://www.protectwebform.com/smartcaptcha
  7. ^ You must specify title = and url = when using {{cite web}}..
  8. ^ Hire People To Solve CAPTCHA Challenges. Petmail Design (2005-07-21). Retrieved on 2006-08-22.
  9. ^ Doctorow, Cory (2004-01-27). Solving and creating CAPTCHAs with free porn. Boing Boing. Retrieved on 2006-08-22.
  10. ^ Scams Use Striptease to Break Web Traps
  11. ^ Magazine: Striptease Used to Recruit Help in Cracking Sites
  12. ^ Breaking CAPTCHAs Without Using OCR. Howard Yeend (pureMango.co.uk) (2005). Retrieved on 2006-08-22.
  13. ^ Online services allow MD5 hashes to be cracked. Retrieved on 2007-01-04.
  14. ^ Kumar Chellapilla, Kevin Larson, Patrice Simard, Mary Czerwinski (2005). "Computers beat Humans at Single Character Recognition in Reading based Human Interaction Proofs (HIPs)" (PDF). Microsoft Research. Retrieved on 2006-08-02.
  15. ^ http://www.cs.sfu.ca/~mori/research/gimpy/
  16. ^ http://sam.zoy.org/pwntcha/
  17. ^ http://www.captcha.ru/en/breakings/
  18. ^ Asirra
  19. ^ [1]

See also

  • Turing test
  • Image spam
  • reCAPTCHA
  • One-way function

External links

Defeating CAPTCHAs:

  • Breaking a Visual CAPTCHA (Gimpy) By Greg Mori and Jitendra Malik
  • Breaking CAPTCHAs without using OCR (talks about a common but easy to fix bug in programming CAPTCHAs, allowing session re-use)
  • Defeating a simple CAPTCHA with Open Source software
  • Will Solve Captcha for Money? - Article on Slashdot about using low-paid data entry workers to defeat CAPTCHAs in bulk.
  • Captcha Exchange Server - Distributed project, defeating captchas from one-click hostings (description in Russian)
  • New virus attempts to defeat CAPTCHA with porn - "Virtual Stripper" takes off clothing for each CAPTCHA entered


 

The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.