# Cory's Wiki

For the [http://en.wikipedia.org/wiki/Disjoint-set_data_structure disjoint-set union problem], want to be able to keep track of which set each of a group of objects belongs to.

This should be done very quickly, in amortized constant time.

##### Operations

find(element x)

• * query which subset a particular element belongs to
• * return a constant identifier to represent the set
• * Can use a particular element as the identifier, as long as it is used consistently for all members

union(element a, element b)

• * join two subsets together into the union of their elements
• * is 'destructive' in that there is one fewer subset after the merge
##### Solutions

The naive solution is to use a tree structure to represent each subset. Each element has a parent reference which points to another node in its set. The tree root acts as the set identifier.

• find(q):
• if q.parent:
• return find(q.parent)
• else:
• return q
• union(a,b):
• A = find(a)
• B = find(b)
• if A is not B:
• A.parent = B

To perform better, two primary optimizations are used: union by size and path compression.

#### union by size

When combining the trees of two sets, maintain the root of the tree with higher depth. The depth of a node can be referred to as its “rank” (a.k.a. “union by rank”).

#### path compression

When performing find(x), update the parent of each element to point directly to the tree root.

• find(q):
• if q.parent:
• p = find(q.parent)
• q.parent = p
• return p
• else:
• return q

#### Performance

For n elements and m operations, the naive method can run in O(mn) in the worst-case scenario. After implementing union by rank, it becomes O(m log n).

Path compression combined with union by size allows for amortized performance $O(m \alpha (m,n))$ total or $O(\alpha(m,n))$ per operation, where $\alpha$ is the very slow-growing inverse-[http://en.wikipedia.org/wiki/Ackermann_function Ackermann] function on the size of the data.

##### Applications
• * maintain sets for each connected sub-tree while building the min/max spanning tree

Dynamic shortest path problem

• * find the shortest path between two nodes in a computer network in which links are appearing and disappearing
• * maintain dynamic connected component sets