Cory's Wiki

For the [http://en.wikipedia.org/wiki/Disjoint-set_data_structure disjoint-set union problem], want to be able to keep track of which set each of a group of objects belongs to.

This should be done very quickly, in amortized constant time.

Operations

find(element x)

  • * query which subset a particular element belongs to
  • * return a constant identifier to represent the set
  • * Can use a particular element as the identifier, as long as it is used consistently for all members

union(element a, element b)

  • * join two subsets together into the union of their elements
  • * is 'destructive' in that there is one fewer subset after the merge
Solutions

The naive solution is to use a tree structure to represent each subset. Each element has a parent reference which points to another node in its set. The tree root acts as the set identifier.

  • find(q):
  • if q.parent:
  • return find(q.parent)
  • else:
  • return q
  • union(a,b):
  • A = find(a)
  • B = find(b)
  • if A is not B:
  • A.parent = B

To perform better, two primary optimizations are used: union by size and path compression.

union by size

When combining the trees of two sets, maintain the root of the tree with higher depth. The depth of a node can be referred to as its “rank” (a.k.a. “union by rank”).

path compression

When performing find(x), update the parent of each element to point directly to the tree root.

  • find(q):
  • if q.parent:
  • p = find(q.parent)
  • q.parent = p
  • return p
  • else:
  • return q

Performance

For n elements and m operations, the naive method can run in O(mn) in the worst-case scenario. After implementing union by rank, it becomes O(m log n).

Path compression combined with union by size allows for amortized performance $O(m \alpha (m,n))$ total or $O(\alpha(m,n))$ per operation, where $\alpha$ is the very slow-growing inverse-[http://en.wikipedia.org/wiki/Ackermann_function Ackermann] function on the size of the data.

Applications

Kruskal's Algorithm

  • * maintain sets for each connected sub-tree while building the min/max spanning tree

Dynamic shortest path problem

  • * find the shortest path between two nodes in a computer network in which links are appearing and disappearing
  • * maintain dynamic connected component sets