Encyclopedia > Search engine spider

See WebCrawler for the specific search engine of that name.


A web crawler (also known as a web spider) is a program that browses the World Wide Web in a methodical, automated manner. A web crawler is one type of bot. Web crawlers not only keep a copy of every page they visit for later processing, for example by a search engine, but also index these pages so that searches can be narrowed down.


In general, a web crawler starts with a list of URLs to visit. As it visits these URLs, it identifies all the hyperlinks in each page and adds them to the list of URLs to visit. The process ends either when it is stopped manually or after a certain number of links have been followed.
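
The loop described above can be sketched in Python roughly as follows. This is a minimal illustration, not the implementation of any particular crawler: the names crawl, LinkExtractor, fetch and max_pages are assumptions made for the example, and the stopping rule is simplified to a cap on the number of pages visited. The fetch function is supplied by the caller, so the sketch can be tried without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs.

    fetch is a hypothetical caller-supplied function that maps a URL to
    its HTML text, or returns None if the page cannot be retrieved.
    Returns the list of visited URLs in visiting order.
    """
    frontier = deque(seeds)   # URLs still to visit
    seen = set(seeds)         # URLs already queued, to avoid repeats
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited
```

A real crawler would pass a function that performs an HTTP request; for experimentation, the get method of a dictionary mapping URLs to HTML strings is enough.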


Web crawlers typically take great care to spread their visits to a particular site over a period of time. Because a crawler accesses many more pages than a normal (human) user, repeated requests to the same site in quick succession can make the site appear slow to its other users.
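
One simple way to spread visits out is to remember, per host, the earliest time at which the next request is allowed. The sketch below assumes a fixed minimum delay per host; the class name PolitenessTimer, its methods and the 5-second default are illustrative choices, not values taken from any standard.

```python
import time
from urllib.parse import urlsplit


class PolitenessTimer:
    """Tracks when each host may next be contacted."""

    def __init__(self, min_delay=5.0, clock=time.monotonic):
        self.min_delay = min_delay   # assumed seconds between requests to one host
        self.clock = clock           # injectable so the logic can be tested
        self.next_allowed = {}       # host -> earliest time of the next request

    def wait_time(self, url):
        """Return how many seconds to wait before requesting this URL."""
        host = urlsplit(url).netloc
        now = self.clock()
        return max(0.0, self.next_allowed.get(host, now) - now)

    def record_visit(self, url):
        """Note that the URL's host has just been contacted."""
        host = urlsplit(url).netloc
        self.next_allowed[host] = self.clock() + self.min_delay
```

A crawl loop would call time.sleep(timer.wait_time(url)) before each request and timer.record_visit(url) after it; requests to different hosts do not delay one another.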


For similar reasons, web crawlers are supposed to obey the robots.txt protocol, with which web site owners can indicate which pages should not be spidered.
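
As an illustration, Python's standard library includes a robots.txt parser in urllib.robotparser. The sketch below checks URLs against a small, made-up robots.txt; the rules, the user agent name ExampleBot and the helper make_checker are assumptions for the example only.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that asks all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_checker(robots_text, user_agent="ExampleBot"):
    """Return a function that reports whether a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


allowed = make_checker(ROBOTS_TXT)
```

In practice the file would be downloaded from the site's /robots.txt path (RobotFileParser also offers set_url and read for this), and the crawler would skip any URL for which the check returns False.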


Because crawlers follow links and do not submit queries to databases, much content is never reached. This unreached content is known as the deep web.


See also: Google, PageRank, Data mining


External links

  • InternetAdSales.com: Robots, Spiders, Crawlers and HTTP_User_Agents (http://www.internetadsales.com/modules/wfsection/index.php?category=23) - Comprehensive listing of common web crawlers
  • Google Dance Tool (http://www.google-dance-tool.com/) - A tool to help webmasters determine when Google's webcrawler is crawling the web



 

The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.