[http://en.wikipedia.org/wiki/Binary_search_tree Binary Search Trees] are trees with two or less children per node and maintain the following properties for each node:
Left/right are arbitrary labels to distinguish between the two sub-trees.
A balanced tree with $n$ nodes will use a number of comparisons $\log_2 n - O(1)$ and is optimal for comparison-based algorithms. Binary search can handle more general queries than hash tables. (e.g. Can quickly find the next value greater than a query value)
The longest path from the root to a leaf is the height of the tree. So a balanced tree will have height $O(\log_2 (n))$. A balanced tree of height $h$ has at most $2^h - 1$ nodes. Updating strategies should maintain this balance.
Balanced:
#* $X_1 = A[0:m-1]$ and $X_2 = A[m+1:|X|]$
Search for value x begins at the root.
*# If x is less, recurse at the left child of the current node *# if x is greater, recurse at the right child of the current node *# if they are equal, return the current node
If all keys are equally likely to be searched, then $O(log(n))$ is the best search time we can do. Otherwise if the input is non-random, then other tree strategies are better: [http://en.wikipedia.org/wiki/Weight-balanced_tree weight-balanced trees] or rebalance gradually after each update via rotation (e.g. AVL Trees, Red Black Trees, Rank Balanced Trees)
Given a new value to insert, perform a binary search. Insert the new value in a new node wherever the search ends with a null child reference.
Constructing a tree with this insertion method is very sensitive to insertion order. It is terrible if done in sorted order because it will produce a deviant tree. It can work pretty well if the insertion order is a random permutation of the variables. This is because when all permutations are equally likely, more balanced trees are more likely to occur. It is a result of the fact that multiple permutations can produce balancing effects, while deviant shapes occur with fewer permutations.
Strategy for rebalancing the tree gradually after each insertion/deletion operation.
Maintains the binary search quality while reducing the over-all height of the tree.
Alternate representation is an implicit complete binary tree. An array A of n items which are interpreted as a tree. This is an implicit data structure
in that the logical structure is just an array of items, but the key information that makes it a tree is in the ordering within the array.
For each non-root node x, priority(parent(x)) \leq priority(x).
Create a node object for each tree node, with the following fields:
Text completion: once a prefix has been typed, can search through a tree of known previous searches to provide suggestions.
Machine learning nearest neighbor classification. Can use tree search to improve search for the nearest neighboring value.
Balanced binary search trees aren't ideal for an adversarial
case model: The “adversary” doesn't make random requests, but always requests the key stored at the bottom of the tree.
Should use strategies that take advantage of nonuniform input:
The motivation for a splay tree is to move accessed keys closer to the root on access. Function is identical to a regular binary search tree, with the addition of a splay(node) routine which is called at the end of every search. It is also called after a failed search.
Amortized analysis of splay(x):
The amortized time to splay key i will be $O\left(\log \frac{W}{w_i}\right)$
Splay trees are at least as good (asymptotically) as an unchanging binary search tree which is optimized for a particular access pattern.
In special cases, a double-rotation can make no change to the rank of the target node. This requires the lowest node in the original tree to have at least half of the total weight. It is fairly contrived, but due to the special potential function the potential will still increase after a double-rotation occurs.
also see [http://en.wikipedia.org/wiki/Finger_trees Finger Trees]
Applications in cache management: want to kick out an item in the cache in order to make room for a new one. If you knew the future, the correct answer would be the one that isn't accessed for the longest amount of time. An approximate heuristic for this is to kick out the least-recently used one instead. This heuristic is the worst decision process when elements are accessed in a round-robin order. See competetive ratio