In computer science, the set is a collection of certain values without any particular order. It corresponds with the mathematical concept of set, but with the restriction that it has to be finite. Disregarding sequence, it is the same as a list. A set can be seen as an associative array where the value of each key-value pair is ignored.
Sets can be implemented using various data structures. Ideal set data structures make it efficient to check if an object is in the set, as well as enabling other useful operations such as iterating through all the objects in the set, performing a union or intersection of two sets, or taking the complement of a set in some limited domain. Popular methods include arrays (in particular bit arrays), hash tables, and any sort of tree structure. A Bloom map implements a set probabilistically, using a very compact representation but risking a small chance of false positives on queries. Any associative array data structure can be used to implement a set by letting the set of keys be the elements of the set and ignoring the values.
However, very few of these data structures support set operations such as union or intersection efficiently. For these operations, more specialized set data structures exist.
A disjoint-set datastructure is a datastructure that keeps track of such a partitioning.
In a disjoint-set forest, each set is represented by a tree datastructure where each node holds a reference to its parent node.
Disjoint-set datastructures arise naturally in many applications, particularly where some kind of partitioning or equivalence relation is involved, and this section discusses some of them.
A well-designed datastructure allows a variety of critical operations to be performed on using as little resources, both execution time and memory space, as possible.
In the design of many types of programs, the choice of datastructures is a primary design consideration, as experience in building large systems has shown that the difficulty of implementation and the quality and performance of the final result depends heavily on choosing the best datastructure.
After the datastructures are chosen, the algorithms to be used often become relatively obvious.