Lucene
Maintainer: Apache Software Foundation
Latest release: 2.0.0 / 26 May 2006
OS: Cross-platform
Use: Search and index API.
License: Apache Software License
Website: lucene.apache.org

Lucene is a free, open-source information retrieval API originally implemented in Java by Doug Cutting. It is supported by the Apache Software Foundation and released under the Apache Software License. Lucene has been ported to other programming languages, including Perl, C#, C++, Python, Ruby and PHP.


While suitable for any application that requires full-text indexing and searching, Lucene is widely recognized for its use in implementing Internet search engines and local, single-site search. This has occasionally led to the misperception that Lucene is itself a search engine with built-in web crawling and HTML parsing. In fact, any application built on Lucene must provide such functionality itself.
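Lucene's index is an inverted index, which maps each term to the documents that contain it. The Java sketch below shows that idea in isolation: the application hands it plain text it has already obtained (no crawling or parsing happens here), and it answers boolean AND queries. `ToyInvertedIndex` and its lowercase, split-on-non-alphanumerics tokenizer are assumptions made for illustration; they are not Lucene's API.

```java
import java.util.*;

// Toy inverted index: maps each term to the sorted set of document IDs containing it.
// Hypothetical illustration only; this is NOT Lucene's API.
class ToyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    // Assumed tokenizer: lowercase the text, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t); // a leading delimiter yields an empty first token
        }
        return terms;
    }

    // Indexes a document's text and returns the ID assigned to it.
    int add(String text) {
        int id = nextId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Returns the IDs of documents containing every query term (boolean AND).
    SortedSet<Integer> searchAll(String query) {
        List<String> terms = tokenize(query);
        SortedSet<Integer> result = new TreeSet<>();
        if (terms.isEmpty()) return result;
        result.addAll(postings.getOrDefault(terms.get(0), Collections.emptySortedSet()));
        for (String term : terms.subList(1, terms.size())) {
            // Intersect with the postings of each further term.
            result.retainAll(postings.getOrDefault(term, Collections.emptySortedSet()));
        }
        return result;
    }
}
```

For example, after `add("Lucene is an information retrieval API")` and `add("Nutch is a search engine built on Lucene")`, the query `"search lucene"` matches only the second document. A real search library would also rank matches by relevance and keep the index on disk; this sketch does neither.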


At the core of Lucene's logical architecture is the notion of a document containing fields of text. This design makes Lucene's API independent of file format: text from PDF, HTML and Microsoft Word documents, among many others, can be indexed as long as its textual content can be extracted.
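The document-and-fields model, with format conversion left to the application, can be illustrated by a small Java sketch. The class `FieldedIndex`, its methods and the `"field:term"` key scheme are hypothetical and do not mirror Lucene's actual classes; `htmlToText` is a deliberately crude stand-in for whatever extractor an application would really use.

```java
import java.util.*;

// Toy model: a document is a map of field name -> plain text, indexed per field.
// Hypothetical names throughout; this is NOT Lucene's API.
class FieldedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // key: "field:term"
    private int nextId = 0;

    // Hypothetical stand-in for the application's extraction step: strips HTML tags crudely.
    // Real formats (PDF, Word, ...) would need a proper parser supplied by the application.
    static String htmlToText(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Indexes one document given as field name -> already-extracted plain text.
    int addDocument(Map<String, String> fields) {
        int id = nextId++;
        for (Map.Entry<String, String> f : fields.entrySet()) {
            for (String term : f.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(f.getKey() + ":" + term, k -> new TreeSet<>()).add(id);
                }
            }
        }
        return id;
    }

    // Returns the IDs of documents whose given field contains the term.
    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

A document with title `"Apache Lucene"` and body `htmlToText("<p>A <b>search</b> library</p>")` is then found by `search("title", "lucene")` and `search("body", "search")`, but not by `search("title", "search")`. Because the index only ever sees plain strings keyed by field name, supporting another file format means writing another extractor, not changing the index.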


Software using Lucene

  • Wikipedia uses Lucene for full-text search.
  • Beagle uses a port of Lucene to C# called Lucene.Net as its indexer.
  • Docco uses Lucene for desktop search.
  • DSpace uses Lucene.
  • CNET uses Lucene to search its product category listings.
  • LjFind uses Lucene to search over 110,000,000 LiveJournal posts.
  • Nutch is a complete search engine implementation that uses Lucene.
  • Red-Piranha is another Lucene-based search engine. It is ready to use, can be deployed as a GUI, command-line or Tomcat web application, and can "learn" what the user wants.
  • The Flock web browser uses CLucene, a C++ port of Lucene, for full-text search of browser history.
  • KnowledgeBase, a service-focused CRM platform, uses Lucene for search.
  • Zimbra groupware incorporates Lucene.
  • ANts P2P, an anonymous file-sharing program, uses Lucene for its search feature.
  • Solr is an open-source search server based on Lucene with XML/HTTP APIs, caching, replication, and a web admin interface.
  • LIRE (Lucene Image Retrieval) is a content-based image retrieval (CBIR) library built on Lucene.
  • MMBase has an extension that uses Lucene for indexing its data.
  • Alfresco[1], an open-source enterprise content management system, uses Lucene.
  • Strigi[2] uses CLucene, a C++ port of Lucene, to index and search the desktop.
  • SubEthaMail, a mailing list manager, uses Lucene to search its mailing list archives.

A more extensive list of software that uses Lucene is in the PoweredBy page of Lucene's wiki.


Ports

Lucene has been ported, or is being ported, to various programming languages other than Java. Ports mentioned in this article include Lucene.Net for C# and CLucene for C++.

See also

  • Hadoop, a collection of free Java software originally developed within the Nutch project and now maintained by Lucene, including a distributed file system similar to the Google File System and an implementation of MapReduce
  • Nutch, an effort to build an open-source search engine

External links

  • Lucene homepage
  • Lucene in Action
  • Article "Behind the Scenes of the SourceForge.net Search System" by Chris Conrad
  • Lucene Wikipedia indexer, an introductory article with Java code for searching Wikipedia data
  • Simple Lucene Examples


 
 

The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.