FACTOID # 38: Southern European women hugely outnumber their menfolk amongst the unemployed.
 
 Home   Encyclopedia   Statistics   Countries A-Z   Flags   Maps   Education   Forum   FAQ   About 
 
WHAT'S NEW
RECENT ARTICLES
More Recent Articles »
 

SEARCH ALL

FACTS & STATISTICS    Advanced view

Search encyclopedia, statistics and forums:

 

 

(* = Graphable)

 

 


Encyclopedia > Distributed hash table

Distributed hash tables (DHTs) are a class of decentralized distributed systems that provide a lookup service similar to a hash table: (name, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given name. Responsibility for maintaining the mapping from names to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows DHTs to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures. Distributed computing is a method of computer processing in which different parts of a program run simultaneously on two or more computers that are communicating with each other over a network. ... In computer science, a hash table is a data structure that speeds up searching for information by a particular aspect of that information, called a key. ... Network node (NN): A grouping of one or more network elements (at one or more sites) which provides network related functions, and is administered as a single entity. ... Scale in the computing field is used as a verb. ...


DHTs form an infrastructure that can be used to build more complex services, such as distributed file systems, peer-to-peer file sharing and content distribution systems, cooperative web caching, multicast, anycast, domain name services, and instant messaging. Applications that use DHTs include BitTorrent, eMule, YaCy, and the Coral Content Distribution Network. // For the Microsoft distributed file system (DFS), see Distributed File System (Microsoft). ... A peer-to-peer (or P2P) computer network is a network that relies on the computing power and bandwidth of the participants in the network rather than concentrating it in a relatively few servers. ... File sharing is the activity of making files available to other users for download over the Internet, but also over smaller networks. ... Content delivery describes the delivery of digital media content such as digital audio or digital video over a delivery medium such as broadcasting or the Internet. ... Web caching is the caching of web documents (e. ... Routing Schemes anycast broadcast multicast unicast Multicast is sometimes also used to refer to a multiplexed broadcast, although that is a very different thing and should not be confused. ... Routing Schemes anycast broadcast multicast unicast Anycast is a network addressing and routing scheme whereby data is routed to the nearest or best destination as viewed by the routing topology. ... It has been suggested that this article be split into multiple articles. ... Instant messaging (IM) is a form of real-time communication between two or more people based on typed text. ... This article is about the protocol. ... eMule is a peer-to-peer file sharing application for Microsoft Windows. ... YaCy (read ya see) is an open-source distributed search engine, built on principles of peer-to-peer (P2P) networks, and released under the GPL. Its core is a computer program written in Java distributed on several hundred of computers, as of September 2006, so-called YaCy-peers. ... Coral is an open source, peer-to-peer content distribution network designed to mirror web content. ...

Contents

History

DHT research was originally motivated, in part, by peer-to-peer systems such as Napster, Gnutella, and Freenet, which took advantage of resources distributed across the Internet to provide a single useful application. In particular, they took advantage of increased bandwidth and hard disk capacity to provide a file sharing service. A peer-to-peer (or P2P) computer network is a network that relies on the computing power and bandwidth of the participants in the network rather than concentrating it in a relatively few servers. ... Second version (revised 2001) of Napster logo: Cat wearing headphones. ... Gnutella (pronounced: with a silent g, or alternatively , following R. M. Stallmans pronunciation of GNU) is a file sharing network. ... For other uses, see Freenet (disambiguation) Freenet is a decentralized censorship-resistant peer-to-peer distributed data store aiming to provide electronic freedom of speech through strong anonymity. ... This article does not cite any references or sources. ... Typical hard drives of the mid-1990s. ...


These systems differed in how they found the data their peers contained. Napster had a central index server: each node, upon joining, would send a list of locally held files to the server, which would perform searches and refer the querier to the nodes that held the results. This central component left the system vulnerable to attacks and lawsuits. Gnutella and similar networks moved to a flooding query model—in essence, each search would result in a message being broadcast to every other machine in the network. While avoiding a single point of failure, this method was significantly less efficient than Napster. Finally, Freenet was also fully distributed, but employed a heuristic key based routing in which each file was associated with a key, and files with similar keys tended to cluster on a similar set of nodes. Queries were likely to be routed through the network to such a cluster without needing to visit many peers. However, Freenet did not guarantee that data would be found. Reliable system design is the design of systems with high levels of reliability and availability. ... Key based routing (KBR) is a lookup method used in conjunction with distributed hash tables (DHTs). ...


Distributed hash tables use a more structured key based routing in order to attain both the decentralization of Gnutella and Freenet, and the efficiency and guaranteed results of Napster. One drawback is that, like Freenet, DHTs only directly support exact-match search, rather than keyword search, although that functionality can be layered on top of a DHT.


The first four DHTs—CAN, Chord,[1] Pastry, and Tapestry—were introduced about the same time in 2001. Since then this area of research has been quite active. Outside academia, DHT technology has been adopted as a component of BitTorrent and in the Coral Content Distribution Network. The Content Addressable Network (CAN) is a distributed decentralized P2P infrastructure that provides hashtable functionality on Internet-like scale. ... Chord is one of the original distributed hash table protocols. ... Pastry is an overlay and routing network for the implementation of a distributed hash table similar to Chord. ... Tapestry is a distributed hash table which provides a decentralized object location, routing, and multicasting infrastructure for distributed applications. ... This article is about the protocol. ... Coral is an open source, peer-to-peer content distribution network designed to mirror web content. ...


Properties

DHTs characteristically emphasize the following properties:

  • Decentralisation: the nodes collectively form the system without any central coordination.
  • Scalability: the system should function efficiently even with thousands or millions of nodes.
  • Fault tolerance: the system should be reliable (in some sense) even with nodes continuously joining, leaving, and failing.

A key technique used to achieve these goals is that any one node needs to coordinate with only a few other nodes in the system – most commonly, Θ(logn) of the n participants (see below) – so that only a limited amount of work needs to be done for each change in membership. Decentralisation (American: decentralization) is any of various means of more widely distributing decision-making to bring it closer to the point of service or action. ... Scale in the computing field is used as a verb. ... In computer science, Fault-tolerance is the property of a computer system to continue operation at an acceptable quality, despite the unexpected occurrence of hardware or software failures. ...


Some DHT designs seek to be secure against malicious participants and to allow participants to remain anonymous, though this is less common than in many other peer-to-peer (especially file sharing) systems; see anonymous P2P. This page covers security in the sense of protection from hostile action. ... This article does not cite any references or sources. ... A peer-to-peer (or P2P) computer network is a network that relies on the computing power and bandwidth of the participants in the network rather than concentrating it in a relatively few servers. ... File sharing is the activity of making files available to other users for download over the Internet, but also over smaller networks. ... An anonymous P2P computer network is a particular type of peer-to-peer network in which the users and their nodes are pseudonymous by default. ...


Finally, DHTs must deal with more traditional distributed systems issues such as load balance, data integrity, and performance (in particular, ensuring that operations such as routing and data storage or retrieval complete quickly). In computer networking, load balancing is a technique (usually performed by load balancers) to spread work between many computers, processes, hard disks or other resources in order to get optimal resource utilization and decrease computing time. ... In computer science and telecommunications, the term data integrity has the following meanings: The condition in which data is identically maintained during any operation, such as transfer, storage, and retrieval. ...


Structure

The structure of a DHT can be decomposed into several main components.[2][3] The foundation is an abstract keyspace, such as the set of 160-bit strings. A keyspace partitioning scheme splits ownership of this keyspace among the participating nodes. An overlay network then connects the nodes, allowing them to find the owner of any given key in the keyspace. In computer programming and formal language theory, (and other branches of mathematics), a string is an ordered sequence of symbols. ...


Once these components are in place, a typical use of the DHT for storage and retrieval might proceed as follows. Suppose the keyspace is the set of 160-bit strings. To store a file with given filename and data in the DHT, the SHA1 hash of filename is found, producing a 160-bit key k, and a message put(k,data) is sent to any node participating in the DHT. The message is forwarded from node to node through the overlay network until it reaches the single node responsible for key k as specified by the keyspace partitioning, where the pair (k,data) is stored. Any other client can then retrieve the contents of the file by again hashing filename to produce k and asking any DHT node to find the data associated with k with a message get(k). The message will again be routed through the overlay to the node responsible for k, which will reply with the stored data. The SHA hash functions are five cryptographic hash functions designed by the National Security Agency (NSA) and published by the NIST as a U.S. Federal Information Processing Standard. ...


The keyspace partitioning and overlay network components are described below with the goal of capturing the principal ideas common to most DHTs; many designs differ in the details.


Keyspace partitioning

Most DHTs use some variant of consistent hashing to map keys to nodes. This technique employs a function δ(k1,k2) which defines an abstract notion of the distance from key k1 to key k2, which is unrelated to physical distance or network latency. Each node is assigned a single key called its identifier (ID). A node with ID i owns all the keys for which i is the closest ID, measured according to δ. Consistent hashing is a scheme that provides hash table functionality in a way that the addition or removal of one slot does not significantly change the mapping of keys to slots. ... Distance is a numerical description of how far apart objects are at any given moment in time. ... Latency is a time delay between the moment something is initiated, and the moment one of its effects begins. ...

Example. The Chord DHT treats keys as points on a circle, and δ(k1,k2) is the distance traveling clockwise around the circle from k1 to k2. Thus, the circular keyspace is split into contiguous segments whose endpoints are the node identifiers. If i1 and i2 are two adjacent IDs, then the node with ID i2 owns all the keys that fall between i1 and i2. Chord is one of the original distributed hash table protocols. ...

Consistent hashing has the essential property that removal or addition of one node changes only the set of keys owned by the nodes with adjacent IDs, and leaves all other nodes unaffected. Contrast this with a traditional hash table in which addition or removal of one bucket causes nearly the entire keyspace to be remapped. Since any change in ownership typically corresponds to bandwidth-intensive movement of objects stored in the DHT from one node to another, minimizing such reorganization is required to efficiently support high rates of churn (node arrival and failure). In computer science, a hash table is a data structure that speeds up searching for information by a particular aspect of that information, called a key. ... This article does not cite any references or sources. ...


Overlay network

Each node maintains a set of links to other nodes (its neighbors or routing table). Together these links form the overlay network. A node picks its neighbors according to a certain structure, called the network's topology. In telecommunication, the term data link has the following meanings: The means of connecting one location to another for the purpose of transmitting and receiving data. ... In computer networking a routing table is an electronic table (file) or database type object that is stored in a router or a networked computer. ... An overlay network is a computer network which is built on top of another network. ... Diagram of different network topologies. ...


All DHT topologies share some variant of the most essential property: for any key k, the node either owns k or has a link to a node that is closer to k in terms of the keyspace distance defined above. It is then easy to route a message to the owner of any key k using the following greedy algorithm: at each step, forward the message to the neighbor whose ID is closest to k. When there is no such neighbor, then we must have arrived at the closest node, which is the owner of k as defined above. This style of routing is sometimes called key based routing. The greedy algorithm determines the minimum number of US coins to give while making change. ... Key based routing (KBR) is a lookup method used in conjunction with distributed hash tables (DHTs). ...


Beyond basic routing correctness, two key constraints on the topology are to guarantee that the maximum number of hops in any route (route length) is low, so that requests complete quickly; and that the maximum number of neighbors of any node (maximum node degree) is low, so that maintenance overhead is not excessive. Of course, having shorter routes requires higher maximum degree. Some common choices for maximum degree and route length are as follows, where n is the number of nodes in the DHT, using Big O notation: In graph theory, the degree (or valency) of a vertex is the number of edges incident to the vertex. ... For other uses, see Big O. In computational complexity theory, big O notation is often used to describe how the size of the input data affects an algorithms usage of computational resources (usually running time or memory). ...

  • Degree O(1), route length O(logn)
  • Degree O(logn), route length O(logn / loglogn)
  • Degree O(logn), route length O(logn)
  • Degree O(n1 / 2), route length O(1)

The third choice is the most common, even though it is not quite optimal in terms of degree/route length tradeoff, because such topologies typically allow more flexibility in choice of neighbors. Many DHTs use that flexibility to pick neighbors which are close in terms of latency in the physical underlying network.


Maximum route length is closely related to diameter: the maximum number of hops in any shortest path between nodes. Clearly the network's route length is at least as large as its diameter, so DHTs are limited by the degree/diameter tradeoff[4] which is fundamental in graph theory. Route length can be greater than diameter since the greedy routing algorithm may not find shortest paths.[5] In the mathematical subfield of graph theory, the distance between two vertices in a graph is the number of edges in a shortest path connecting them. ... A pictorial representation of a graph In mathematics and computer science, graph theory is the study of graphs, mathematical structures used to model pairwise relations between objects from a certain collection. ...


Algorithms for Overlay Networks

Aside from routing, there exist many algorithms which exploit the structure of the overlay network for sending a message to all nodes, or a subset of nodes, in a DHT.[6] These algorithms are used by applications to do overlay multicast, range queries, or to collect statistics.


Examples

DHT protocols and implementations

The Content Addressable Network (CAN) is a distributed decentralized P2P infrastructure that provides hashtable functionality on Internet-like scale. ... Chord is one of the original distributed hash table protocols. ... Kademlia is a distributed hash table for decentralized peer to peer computer networks designed by Petar Maymounkov and David Mazières. ... Pastry is an overlay and routing network for the implementation of a distributed hash table similar to Chord. ... P-Grid is a self-organizing structured peer-to-peer system, which can accommodate arbitrary key distributions (and hence support lexicographic key ordering and range queries), still providing storage load-balancing and efficient search by using randomized routing. ... Tapestry is a distributed hash table which provides a decentralized object location, routing, and multicasting infrastructure for distributed applications. ...

Applications employing DHTs

This article is about the protocol. ... The following tables compare general and technical information for a number of applications supporting BitTorrent. ... The Circle is a peer-to-peer distributed file system written mainly in Python. ... Codeen is a proxy server system created at Princeton and deployed for general use on PlanetLab. ... Coral is an open source, peer-to-peer content distribution network designed to mirror web content. ... CSpace is a cryptographically secure, decentralised peer-to-peer-based communications system for file sharing and chat based on text, voice and video. ... Dijjer is a peer-to-peer web cache. ... eMule is a peer-to-peer file sharing application for Microsoft Windows. ... FAROO is an universal web search engine based on peer-to-peer technology. ... GNUnet is a framework for decentralized, peer-to-peer networking. ... I2P (originally short for Invisible Internet Project[1], though it is not commonly referred to by this name anymore) is a free and open source project building an anonymous and/or pseudonymous virtual private network. ... JXTA (Juxtapose) is Open Source peer-to-peer platform created by Sun Microsystems in 2001. ... NEOnet is a distributed hash table peer-to-peer network created by the makers of Morpheus, StreamCast Networks. ... Overnet is a decentralized peer-to-peer computer network, usually used for sharing large files (e. ... Released in January of 2004, Warez P2P is a proprietary P2P filesharing service that uses the Ares network, and offers a service similar to that of Kazaa. ... YaCy (read ya see) is an open-source distributed search engine, built on principles of peer-to-peer (P2P) networks, and released under the GPL. Its core is a computer program written in Java distributed on several hundred of computers, as of September 2006, so-called YaCy-peers. ... This article is about search engines. ...

See also

  • memcached: a high-performance, distributed memory object caching system
  • Prefix Hash Tree: Sophisticated querying over DHTs

memcached is a general-purpose distributed memory caching system that was originally developed by Danga Interactive for LiveJournal, but is now used by many other sites. ... A prefix hash tree (PHT) is a distributed data structure that enables more sophisticated queries over a distributed hash table (DHT). ...

References

  1. ^ Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica. Looking up data in P2P systems. In Communications of the ACM, February 2003.
  2. ^ Moni Naor and Udi Wieder. Novel Architectures for P2P Applications: the Continuous-Discrete Approach. Proc. SPAA, 2003.
  3. ^ Gurmeet Singh Manku. Dipsea: A Modular Distributed Hash Table. Ph. D. Thesis (Stanford University), August 2004.
  4. ^ http://maite71.upc.es/grup_de_grafs/table_g.html
  5. ^ Gurmeet Singh Manku, Moni Naor, and Udi Wieder. Know thy Neighbor's Neighbor: the Power of Lookahead in Randomized P2P Networks. Proc. STOC, 2004.
  6. ^ Ali Ghodsi. Distributed k-ary System: Algorithms for Distributed Hash Tables. KTH-Royal Institute of Technology, 2006.

Robert Tappan Morris (born 1965) is an associate professor at the Massachusetts Institute of Technology. ... Communications of the ACM (CACM) is the flagship monthly magazine of the Association for Computing Machinery. ...

External links

  • Distributed Hash Tables, Part 1 by Brandon Wiley.
  • Distributed Hash Tables links Carles Pairot's Page on DHT and P2P research
  • Tangosol Coherence includes a structure similar to a DHT, though all nodes have knowledge of the other participants

  Results from FactBites:
 
Distributed hash table - Wikipedia (60 words)
A distributed hash table (DHT), is a technology based on hash tables enabling identification and retrieving, in distributed systems like some P2P networks, of information.
The whole table is distributed on the network : each node has a part of it.
Distributed hash tables are being used in Freenet and Chord.
Distributed hash table - Wikipedia, the free encyclopedia (1519 words)
Distributed hash tables (DHTs) are a class of decentralized distributed systems that partition ownership of a set of keys among participating nodes, and can efficiently route messages to the unique owner of any given key.
DHT research was originally motivated, in part, by peer-to-peer systems such as Napster, Gnutella, and Freenet, which took advantage of resources distributed across the Internet to provide a single useful application.
Contrast this with a traditional hash table in which addition or removal of one bucket causes nearly the entire keyspace to be remapped.
  More results at FactBites »


 

COMMENTARY     


Share your thoughts, questions and commentary here
Your name
Your comments
Please enter the 5-letter protection code

Want to know more?
Search encyclopedia, statistics and forums:

 


Lesson Plans | Student Area | Student FAQ | Reviews | Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.