FACTOID # 53: If you thought Antarctica was inhospitable, think again - its land area is only ninety-eight percent ice. Reassuringly, the other 2% is categorised as "barren rock".
 
 Home   Encyclopedia   Statistics   Countries A-Z   Flags   Maps   Education   Forum   FAQ   About 
 
WHAT'S NEW
RELATED ARTICLES
People who viewed "Kademlia" also viewed:
RECENT ARTICLES
More Recent Articles »
 

SEARCH ALL

FACTS & STATISTICS    Advanced view

Search encyclopedia, statistics and forums:

 

 

(* = Graphable)

 

 


Encyclopedia > Kademlia

Kademlia is a distributed hash table for decentralized peer to peer computer networks designed by Petar Maymounkov and David Mazières. It specifies the structure of the network and how the exchange of information has to take place through node lookups. Kademlia nodes communicate among themselves using the User Datagram Protocol. Over an existing network, a new virtual or overlay network is created in which each node is identified by a number or node ID. The node ID serves not only as a nodes identification, but the Kademlia algorithm uses it to locate values (usually file hashes or keywords). In fact, the node ID provides a direct map to file hashes. Distributed hash tables (DHTs) are a class of decentralized distributed systems that provide a lookup service similar to a hash table: (name, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given name. ... KCeasy, an open-source program that connects to various peer-to-peer networks A peer-to-peer (or P2P) computer network relies primarily on the computing power and bandwidth of the participants in the network rather than concentrating it in a relatively low number of servers. ... Computer networks redirects here. ... Network node (NN): A grouping of one or more network elements (at one or more sites) which provides network related functions, and is administered as a single entity. ... The User Datagram Protocol (UDP) is one of the core protocols of the Internet protocol suite. ... Computer networks redirects here. ... An overlay network is a computer network which is built on top of another network. ... A hash function [1] is a reproducible method of turning some kind of data into a (relatively) small number that may serve as a digital fingerprint of the data. ...


When searching for some value, the algorithm explores the network in several steps. Each step approaches the key until the contacted node returns the value or no more closer nodes are found. Like many other DHTs, Kademlia contacts only O(logn) (see Big O notation) nodes during the search out of a total of n nodes in the system. Big O notation or Big Oh notation, and also Landau notation or asymptotic notation, is a mathematical notation used to describe the asymptotic behavior of functions. ...


Further advantages are found particularly in the decentralized structure, which clearly increases the resistance against a denial of service attack. Even if a whole set of nodes are flooded, this will have limited effect on network availability, which will recover itself by knitting the network around these "holes". A denial-of-service attack (also, DoS attack) is an attack on a computer system or network that causes a loss of service to users, typically the loss of network connectivity and services by consuming the bandwidth of the victim network or overloading the computational resources of the victim system. ...

Contents

System Details

First generation peer-to-peer networks, such as Napster, relied on a central database to co-ordinate look ups on the network. Second generation peer-to-peer networks, such as Gnutella, used flooding to locate files, searching every node on the network. Third generation peer-to-peer networks use Distributed hash tables to look up files in the network. Distributed hash tables store resource locations throughout the network. A major criterion for these protocols is locating specific nodes quickly.


The Kademlia algorithm is based on the calculation of the "distance" between two nodes. This distance is computed as the exclusive or of the two node IDs, taking the result as an integer number. Exclusive disjunction (usual symbol xor) is a logical operator that results in true if one of the operands (not both) is true. ... The integers are commonly denoted by the above symbol. ...


This "distance" does not have anything to do with geographical conditions, but designates the distance within the ID range. Thus it can and does happen that, for example, a node from Germany and one from Australia are "neighbours".


Each Kademlia search iteration comes one bit closer to the target. A basic Kademlia network with 2n nodes will only take n steps to find that node.


Routing Tables

Kademlia routing tables consist of a list for each bit of the node id. The list holds data to locate other nodes. The data in each list is the port, ip address, and node id of other nodes. Nodes that can go in the nth list must have a differing nth bit from the node's id. This means that it is very easy to fill the first list as 1/2 of the nodes in the network are valid. The next list can use only 1/4 of the nodes in the network, etc.


In the Kademlia literature, the lists are referred to as k-buckets. k is a system wide number, like 20. Ie, all nodes on the network will have lists of 20 nodes for a particular bit.


Since the possible nodes for each k-bucket decreases quickly, the lower bit k-buckets are highly specialized and will fully map all nodes in that section of the network. For instance if k is 20, then the last four k-buckets can never be full, even in a fully populated network.

An example network partition
An example network partition

Consider the simple network to the right. There are seven nodes; the small circles at the bottom. The node under consideration is node six (binary 110) in black. There are three k-buckets in this network. Nodes zero, one and two (binary 000, 001, and 010) are candidates for the first k-bucket. Node three (binary 011) is not participating in the network. In the second k-bucket, nodes four and five (binary 100 and 101) are placed. Finally, the third k-bucket can only contain node seven (binary 111). Each of the three k-buckets is enclosed in a gray circle. If the size of the k-bucket was two, then the first 2-bucket could only contain two of the three nodes. Image File history File links No higher resolution available. ... Image File history File links No higher resolution available. ...


When a k-bucket is full and a new node is discovered for that k-bucket, the least recently seen node in the k-bucket is PINGed. If the node is still alive, the new node is place in a secondary list; a replacement cache. The replacement cache is used if a node in the k-bucket stops responding. By keeping older nodes in the k-bucket, it is more likely that well connected nodes will be in the higher bit k-buckets.


Protocol Messages

Kademlia has four messages.

  • PING - used to verify that a nodes is still active.
  • STORE - notify node of resource location; the resource should map to this node.
  • FIND_NODE - asks for the best route to a given node.
  • FIND_VALUE - as FIND_NODE, but if the value is in it's STORE, a pointer to the resource is returned.

Each RPC message includes a random value from the initiator. This ensures that a response is from the actual node the message was sent to. Remote procedure call (RPC) is a protocol that allows a computer program running on one computer to cause a subroutine on another computer to be executed without the programmer explicitly coding the details for this interaction. ...


Locating Nodes

Node lookups can proceed asynchronously. The number of lookup is denoted by α and is typically three. There are generally redundant paths to each node due to the k-buckets. A node issues FIND_NODE request to any nodes in its k-bucket matching the desired id . These nodes will search their k-buckets and return the k closest nodes to the desired node id. The search continues until no values are returned that are closer than the best previous result; this would be one of the nodes queried. When this happens, the search expands to all k of the best results. If none of these returns a better node id, the lookup terminates with the best resulting node id.


The node information can be augmented with round trip times, or RTT. This information can be used to choose nodes in the search. Both the k-buckets and the FIND_NODE and FIND_VALUE replies must be augmented with RTT values. In telecommunications, the term round-trip delay time or round-trip time (RTT) has the following meanings: The elapsed time for transit of a signal over a closed circuit, or time elapsed for a message to a remote place and back again. ... In telecommunications, the term round-trip delay time or round-trip time (RTT) has the following meanings: The elapsed time for transit of a signal over a closed circuit, or time elapsed for a message to a remote place and back again. ...


Locating Resources

Information is located by mapping it to a node id. A hash is typically used for the map. The target node that will have information due to a previous STORE message. Locating a resource follows the same procedure as locating a node, except the search terminates when a node has a match to the resource hash. A hash function [1] is a reproducible method of turning some kind of data into a (relatively) small number that may serve as a digital fingerprint of the data. ...


The resource information is stored at several nodes to allow for nodes to come and go. Also, for popular files, the resource is stored farther and farther away from the best node. This allows popular searches to resolve quickly. The information also ages, so if a files loses it's popularity the information will eventually only be stored close to the best node. The node that is providing the file will periodically perform STORE messages. When the node providing the file goes off line, the file information will then disappear from even the best nodes.


Joining the Network

A node that would like to join the net must first go through a bootstrap process. In this phase, the node needs to know the IP address of another node (obtained from the user, or from a stored list) that is already participating in the Kademlia network. If the bootstrapping node has not yet participated in the network, it computes a random ID number that is not already assigned to any other node. It uses this ID until leaving the network. In computing, Bootstrapping refers to a process where a simple system activates another more complicated system that serves the same purpose. ... This article or section does not cite any references or sources. ... The word random is used to express lack of order, purpose, cause, or predictability in non-scientific parlance. ...


The joining node inserts the bootstrap node into its k-bucket. The new node then does a lookup of itself, using the standard node search procedure. The "self-lookup" will populate other nodes' k-buckets with the new node id. After this, the new node refreshes all k-buckets further than its nearest neighbor. A refresh is just a lookup of a random node id that falls in the k-bucket range. Nearby nodes will also transfer resource locations (using a STORE message) to the new node.


Initially nodes have one k-bucket. When the k-bucket becomes full, it can be split. The split occurs if the range of nodes in the k-bucket spans the nodes own id (values to the left and right in a binary tree). Kademlia relaxes even this rule for the nearestk-buckets. It may turn out that a highly unbalanced binary sub-tree exists near the node. If k is 20, and there are 21+ nodes with a prefix "xxx0011....." and the new node is "xxx000011001", the new node can contain multiple k-buckets for the other 21+ nodes. This is to guarantee that the network knows about all nodes in a region.


Accelerated lookups

Kademlia uses a XOR metric to define closeness. The XOR function returns zero if bits are the same and one if the bits are different. Two nodes values XORed together will result in the difference of the bits. The Kademlia protocol searches the network "bit by bit" so that eventually, the XOR becomes close to zero and the best node is found. Exclusive disjunction (usual symbol xor) is a logical operator that results in true if one of the operands (not both) is true. ...


The XOR metric allows Kademlia to extend routing tables beyond single bits. Groups of bits can be place in k-buckets. The group of bits are termed a prefix. For an m-bit prefix, there will be 2m-1 k-buckets. The missing k-bucket is a further extension of the routing tree that contains the node ID. An m-bit prefix reduces the maximum number of lookups from log2 n to log2b n. These are maximum values and the average value will be far less due to redundancy, increasing the chance of finding a node in a k-bucket that share more than just the prefix with the target.


Nodes can use mixtures of prefixes in their routing table, such as the Kad Network used by eMule. The Kademlia network could even be heterogeneous in routing table implementations. This would just complicate the analysis of lookups. The Kad network is a peer-to-peer (P2P) network which implements the Kademlia P2P overlay protocol. ... eMule is a peer-to-peer file sharing application for Microsoft Windows. ...


Use in file sharing networks

Kademlia has been used for file sharing. By making Kademlia keyword searches, one can find information in the file-sharing network so it can be downloaded. Since there is no central instance to store an index of existing files, this task is divided evenly among all clients: If a node wants to share a file, it processes the contents of the file, calculating from it a number (hash) that will identify this file within the file-sharing network. The hashes and the node IDs must be of the same length. It then searches for several nodes whose ID is close to the hash, and has his own IP address stored at those nodes. A searching client will use Kademlia to search the network for the node whose ID has the smallest distance to the file hash, then will retrieve the contacts list that is stored in that node. The contacts stored in the network constantly change as nodes connect and disconnect. Due to built-in redundancy, the contacts are replicated in several nodes. File sharing is the activity of making files available to other users for download over the Internet, but also over smaller networks. ... A hash function [1] is a reproducible method of turning some kind of data into a (relatively) small number that may serve as a digital fingerprint of the data. ... Redundancy in information theory is the number of bits used to transmit a message minus the number of bits of actual information in the message. ...


The file hash is usually obtained from a specially formed Internet link found elsewhere, or included within an indexing file obtained from other sources.


Filename searches are implemented using keywords. The filename is divided into its constituent words. Each of these keywords is hashed and stored in the network, together with the corresponding filename and file hash. A search involves choosing one of the keywords, contacting the node with an ID closest to that keyword hash, and retrieving the list of filenames that contain the keyword. Since every filename in the list has its hash attached, the chosen file can then be obtained in the normal way. In computer science, a keyword is an identifier which indicates a specific command. ...


Public clients using the Kademlia algorithm (these networks are incompatible with one another):

  • Overnet network: Overnet (integrated in eDonkey (no longer available) and MLDonkey)
  • Kad Network: eMule (starting from v0.40), MLDonkey (starting from 2.5-28) and aMule (starting from 2.1.0).
  • RevConnect - see also [1] (starting with version 0.403)
  • KadC: [2] C library to publish and retrieve information to/from the Overnet network
  • BitTorrent Azureus DHT: Azureus (2.3.0.0+): uses a modified Kademlia implementation for decentralized tracking and various other features like the "Comments and Ratings" plugin.[3]
  • BitTorrent Mainline DHT: BitTorrent client (4.1.0+), µTorrent (1.2+), BitSpirit (3.0+), BitComet (0.59+) and KTorrent: They all share a DHT based on an implementation of the Kademlia algorithm, for trackerless torrents.

Overnet is a decentralized peer-to-peer computer network, usually used for sharing large files (e. ... eDonkey2000 was a peer-to-peer file sharing application developed by MetaMachine, using the Multisource File Transfer Protocol. ... MLDonkey is an open source, free software multi-network peer-to-peer application. ... The Kad network is a peer-to-peer (P2P) network which implements the Kademlia P2P overlay protocol. ... eMule is a peer-to-peer file sharing application for Microsoft Windows. ... MLDonkey is an open source, free software multi-network peer-to-peer application. ... In computing, aMule is a peer-to-peer file sharing application that works with the eDonkey2000 network and the Kad Network, but offers more features than the standard eDonkey client, including support for Kademlia. ... Overnet is a decentralized peer-to-peer computer network, usually used for sharing large files (e. ... This article is about a BitTorrent client. ... A BitTorrent client is a client that utilizes the BitTorrent protocol for data transfer. ... µTorrent (also microTorrent or uTorrent) is a freeware proprietary BitTorrent client for Microsoft Windows written in C++, and localized for many different languages. ... BitComet is a BitTorrent client written in C++ for Microsoft Windows and available in 43 different languages. ... KTorrent is a BitTorrent client written in C++ for KDE using the Qt user interface toolkit. ...

See also

The Content Addressable Network (CAN) is a distributed decentralized P2P infrastructure that provides hashtable functionality on Internet-like scale. ... Chord is one of the original distributed hash table protocols. ... Tapestry is a distributed hash table which provides a decentralized object location, routing, and multicasting infrastructure for distributed applications. ... Pastry is an overlay and routing network for the implementation of a distributed hash table similar to Chord. ...

External links


  Results from FactBites:
 
FAQ eD2k-Kademlia - AMule Project FAQ (2667 words)
Since Kademlia is a decentralized network, it removes the bottleneck that was previously caused by the need for servers (though Lugdunum has done great work in reducing this bottleneck).
Both are based on the original Kademlia algorithm but have been applied in different ways and therefore are incompatible.
This latter one is solved in the Kademlia protocol.
  More results at FactBites »


 

COMMENTARY     


Share your thoughts, questions and commentary here
Your name
Your comments
Please enter the 5-letter protection code

Want to know more?
Search encyclopedia, statistics and forums:

 


Lesson Plans | Student Area | Student FAQ | Reviews | Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.