Directed tree structured Markov networks can be easily transformed into an efficient **product-form distribution**, represented by a “cluster tree”. The **tree width** is the maximum number of variables within a single cluster minus one. If the tree is a path, then the **path width** is the tree width when decomposed into a chain graph.
$w^* \leq pw^* \leq \dots$

Given a directed tree graph G, produce a product-form distribution by taking the product, starting at the root of G. $p(1,\dots,7) = p(3)p(1|3)p(2|3)p(4|3)p(5|4)p(6|4)p(7|4)$

Alternately, the product-form distribution can be constructed by multiplying the joint distribution of each clique together, divided by their intersections. $p(1,\dots,7) = \frac{p(1,3)p(2,3)}{p(3)} * \frac{p(3,4)}{p(3)} * \frac{p(4,5)}{p(4)} * \frac{p(4,6)}{p(4)} * \frac{p(4,7}{p(4)}$

This method has the benefit that it applies to any graph of cliques arranged in a tree structure.

For a tree-decomposition with edge between $B_j$ and $B_i$, $sep(i,j) = \text{scope}(B_I) \cap \text{scope}(B_J)$

The **separator-width** is the maximum separator in the tree.

For a tree-decomposition with edge between $B_j$ and $B_i$, $elim(i,j) = \text{scope}(B_I) - sep(i,j) = \text{scope}(B_I) - \text{scope}(B_I) \cap \text{scope}(B_J)$

So we desire to decompose cyclical graphs into a tree structure. Chordal I-maps are decomposable. The fill-in algorithm can test for chordality and `triangulate`

or make the graph chordal.

Cutset conditioning can produce a tree-decomposition from a dependency graph.

This is a method for constructing a join-tree a.k.a “junction tree”.
Decompose a graphical model $\mathcal{M} = <X,D,F,\Pi>$, its scopes $S = S_1,\dots,S_r$, and its primal graph $G=(X,E)$, where X = {$X_1,\dots,X_n$} and F = {$f_1,\dots,\f_r$}. Output is a **join-tree decomposition**.

- Select a variable ordering $d = (X_1,\dots,X_n)$
- Triangulate the graph in reverse-order
- Create a join-tree of the induced triangulated graph
- Identify all maximal cliques in the chordal graph
- create a tree structure over the cliques

##* In reverse-variable order: ##*# add the clique to the tree by connecting it to another node with which it share the largest subset of variables

- Place each input function in a cluster containing all variables in its scope

Simpler version:

- Use the fill-in algorithm if necessary
- Identify all cliques of G
- Order the cliques as $C_1,\dots C_t$ by rank of the highest vertex in the clique
- Connect each $C_i$ to a predecessor $C_j$ where $(j < i)$ and $C_j$ shares the most vertices with $C_i$

Once a join tree is created, “Messages” can be passed between clusters by marginalizing over non-shared variables. Each cluster will originate with a funtion of its member variables, but will then receive “messages” in the form of functions over sub-sets of variables shared with adjacent clusters. After receiving all messages, a joint marginal can be computed from the product of all of these functions within the cluster.

Generalization of Bucket-Tree Elimination, **Cluster-Tree Elimination** or **CTE** is a message-passing algorithm for producing a tree with [http://en.wikipedia.org/wiki/Explicit_and_implicit_methods model explicit] clusters. So each cluster becomes an independent representation of the variables within its scope.

- Input: a tree decomposition
- while not converged:
- for every edge(u,v) in the tree:
- If u has received messages from all adjacent vertices other than v:
- compute message u→v = product of:
- all messages received by u from neighbors other than v
- sum of local conditional probabilities of u
- send u→v to v

Convergence is guaranteed, and occurs when no new messages are created. If processing is performed from leaves to root, convergence is guaranteed after two passes.

Time complexity: $O((r+N) * \text{degree} * k^{w+1})$ Space complexity: $O(N k^{\text{max separator size}})$

- * N = number of vertices in the tree-decomposition
- * w = the tree-width of the decomposition
- * r = the number of functions in the model
- * degree = the maximum degree in the tree-decomposition
- * k = maximum domain size of any variable

A tree-decomposition can be transformed to reduce separator-width by combining nodes separated by the greatest width. Cluster-tree elimination applied to the resulting graph will have time complexity $O(m * \text{degree} * e^{r})$ and space $O(n * e^{s})$, where $r$ is the maximal number of nodes in a cluster and $s$ is the separator-width of the resulting decomposition.