|
"The Chord project aims to build scalable, symmetric and robust distributed systems using peer-to-peer ideas. The basis for much of it is the Chord distributed hash table lookup primitive. Chord is completely decentralized and can find data using only O(log(N)) messages, where N is the number of nodes in the system. Chord's lookup mechanism is provably robust in the face of frequent node failures and re-joins." [1] This article or section should be merged with Distributed computing In computer science, a distributed system is an application that consists of components running on different computers concurrently. ...
A peer-to-peer (or P2P) computer network is a network that relies on the computing power and bandwidth of the participants in the network rather than concentrating it in a relatively few servers. ...
Distributed hash tables (DHTs) are a class of decentralized distributed systems that partition ownership of a set of keys among participating nodes, and can efficiently route messages to the unique owner of any given key. ...
Decentralisation (or decentralization) is any of various means of more widely distributing decision-making to bring it closer to the point of service or action. ...
Chord is being developed at MIT and the current Chord source code can be downloaded and used under the MIT License. Mapúa Institute of Technology (MIT, MapúaTech or simply Mapúa) is a private, non-sectarian, Filipino tertiary institute located in Intramuros, Manila. ...
Source code (commonly just source or code) is any series of statements written in some human-readable computer programming language. ...
The MIT License, also called the X License or the X11 License, originated at the Massachusetts Institute of Technology, is a license for the use of certain types of computer software. ...
Overview
Using the Chord lookup protocol, node keys are arranged in a circle. The circle cannot have more than 2m nodes. The ring can have ids/keys ranging from 0 to 2m ā 1. In the above diagram the green circles represent nodes. The black points represent keys. IDs and keys are assigned an m-bit identifier using what is known as consistent hashing. The SHA-1 algorithm is the base hashing function for consistent hashing. The consistent hashing is integral to the probability of the robustness and performance because both keys and IDs (IP addresses) are uniformly distributed and in the same identifier space. Consistent hashing is also necessary to let nodes join and leave the network without disrupting the network. The SHA (Secure Hash Algorithm) family is a set of related cryptographic hash functions designed by the National Security Agency (NSA) and published by the National Institute of Standards and Technology (NIST). ...
Consistent hashing is a scheme that provides hash table functionality in a way that the addition or removal of one bucket does not significantly change the mapping of keys to buckets. ...
An IP address (Internet Protocol address) is a unique address that devices use in order to identify and communicate with each other on a computer network utilizing the Internet Protocol standard (IP)âin simpler terms, a computer address. ...
Each node has a successor and a predecessor. The successor to a node or key is the next node in the identifier circle when you move clockwise. The predecessor of a node or key is the next node in the id circle when you move counter-clockwise. For example, the successor of node 2 is node 3, and the predecessor of node 1 is node 0. More than one successor and predecessor are kept in a list to maintain a high probability that the successor and predecessor pointers actually point to the correct nodes after possible failure or departure of the initial successor or predecessor. An example of the finger table of a node is shown above with its pointers to identifiers at 2X intervals around the chord ring stopping at M/2. An example of a query for a key request in the same original network as the previous image with the same finger table and pointers.
Potential Uses - Cooperative Mirroring: A load balancing mechanism by a local network hosting information available to computers outside of the local network. This scheme could allow developers to balance the load between many computers instead of a central server to ensure availability of their product.
- Time-shared storage: In a network, once a computer joins the network its available data is distributed throughout the network for retrieval when that computer disconnects from the network. As well as other computers' data is sent to the computer in question for offline retrieval when they are no longer connected to the network. Mainly for nodes without the ability to connect full time to the network.
- Distributed Indices: Retrieval of files over the network within a searchable database. eg. P2P file transfer clients.
- Large scale combinatorial searches: Keys being candidate solutions to a problem and each key mapping to the node, or computer, that is responsible for evaluating them as a solution or not. eg. Code Breaking
In computing, load balancing is a technique (usually performed by load balancers) to spread work between many computers, processes, hard disks or other resources in order to get optimal resource utilization and decrease computing time. ...
Proof sketches Chord must contact at most O(log N) nodes to find a successor in an N-node network, with high probability Define a node n that has a query for a key k. Suppose node p is the node that the key k maps to in Chord and n p. So node n forwards its query to the closest predecessor of k in its finger table call this node n's ith interval, somewhere between n and p, and call this node in interval i of n, f, and the new distance between f and p is 2i ā 1 worst case. So continuing this the distance at least halves each time and with in m steps, for m defined in the Overview, the query will have arrived at node p. And because the identifiers are random after 'log N' forwardings the probability will be and the expected number of identifiers in this interval is 1 with high probability so only O(log N) contacts must be made. If Chord keeps track of r = O(log N) predecessors/successors, then with high probability, if each node has probability of 1/4 of failing, find_successor (see below) and find_predecessor (see below) will return the correct nodes Simply with probablility , which is a high probability, the node will have the correct pointer.
Pseudocode Definitions for pseudocode: - finger[k]: first node that succeeds
 - successor: the next node from the node in question on the identifier ring
- predecessor: the previous node from the node in question on the identifier ring
The pseudocode to find the successor node of an id is given below: // ask node n to find the successor of id n.find_successor(id) if (id (n, successor)) return successor; else // forward the query around the circle return successor.find_successor(id); // search the local table for the highest predecessor of id n.closest_preceding_node(id) for i = m downto 1 if(finger[i] (n,id)) return finger[i]; return n; The pseudocode to stabilize the chord ring/circle after node joins and departures is as follows: // create a new Chord ring. n.create() predecessor = nil; successor = n; // join a Chord ring containing node n'. n.join(n') predecessor = nil; successor = n'.find_successor(n); // called periodically. verifies nās immediate // successor, and tells the successor about n. n.stabilize() x = successor.predecessor; if (x (n, successor)) successor = x; successor.notify(n); // n' thinks it might be our predecessor. n.notify(n') if (predecessor is nil or n' (predecessor, n)) predecessor = n'; // called periodically. refreshes finger table entries. //next stores the index of the finger to fix n.fix_fingers() next = next + 1; if(next > m) next = 1; finger[next] = find successor(n+2next ā 1); //called periodically. checks whether predecessor has failed. n.check_predecessor() if(predecessor has failed) predecessor = nil; Developers Hari Balakrishnan, Russ Cox, Frank Dabek, Jinyang Li, Frans Kaashoek, David Karger, Robert Morris, Athicha Muthitacharoen, Emil Sit, Ion Stoica, Jeremy Stribling Robert Tappan Morris (b. ...
See also The Content Addressable Network (CAN) is a distributed decentralized P2P infrastructure that provides hashtable functionality on Internet-like scale. ...
Tapestry is a distributed hash table which provides a decentralized object location, routing, and multicasting infrastructure for distributed applications. ...
Pastry is an overlay and routing network for the implementation of a distributed hash table similar to the Chord project. ...
External links |