Inference with $p(x)$ is intractable, so choose a tractable proposal distribution $q(x) \in Q$. Formulate inference as an optimization problem: minimize the “distance” between $q$ and $p$.
Projection problem: given $p$, find the distribution in the family $Q$ that is closest to $p$.
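Objective: the KL divergence. Because KL is asymmetric, there are two projections: the I-projection $\arg\min_{q \in Q} D(q \| p)$ and the M-projection $\arg\min_{q \in Q} D(p \| q)$.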
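A minimal sketch of how the two projections differ, assuming $Q$ is the family of fully factorized distributions over two binary variables (the example joint, the grid search, and the helper names are illustrative choices, not from the notes). Onto this family the M-projection has a closed form, the product of $p$'s marginals; the I-projection has no closed form here, so the sketch brute-forces it on a grid.

```python
import numpy as np

def kl(a, b):
    """KL divergence D(a || b) between discrete distributions of the same shape."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    support = a > 0  # terms with a(x) = 0 contribute 0 by convention
    return float(np.sum(a[support] * np.log(a[support] / b[support])))

def product(a, b):
    """Factorized q(x1, x2) = q1(x1) q2(x2) with q1(x1=1) = a and q2(x2=1) = b."""
    return np.outer([1 - a, a], [1 - b, b])

# Strongly correlated joint over two binary variables (rows: x1, columns: x2).
p = np.array([[0.45, 0.05],
              [0.05, 0.45]])

# M-projection: argmin_q D(p || q) over factorized q is the product of marginals.
q_m = np.outer(p.sum(axis=1), p.sum(axis=0))

# I-projection: argmin_q D(q || p), found here by brute-force grid search.
grid = np.linspace(0.01, 0.99, 99)
kl_i, a_i, b_i = min((kl(product(a, b), p), a, b) for a in grid for b in grid)
q_i = product(a_i, b_i)

print("M-projection:\n", q_m)
print("I-projection:\n", q_i.round(3))
print("D(q_m || p) =", kl(q_m, p), " D(q_i || p) =", kl_i)
```

Because this $p$ puts most of its mass on two modes, the I-projection leans toward one of them, while the M-projection spreads its mass evenly over all four states.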
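The log-partition function can be expressed in terms of the free energy and the KL divergence: with $p(x) = \tilde p(x)/Z$, $\log Z = F[\tilde p, q] + D(q \| p)$, where $F[\tilde p, q] = \sum_x q(x) \log \tilde p(x) + H[q(x)]$ is the negative free energy (the energy functional). Since $D(q \| p) \ge 0$, $F$ is a lower bound on $\log Z$, and maximizing $F$ over $Q$ is the same as computing the I-projection.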
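A quick numerical check of this identity, assuming a small discrete state space where $Z$ can be computed by brute force (the random $\tilde p$ and $q$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized measure p~(x) over 6 states; Z is its normalizer.
p_tilde = rng.uniform(0.1, 2.0, size=6)
log_Z = np.log(p_tilde.sum())
p = p_tilde / p_tilde.sum()

# An arbitrary proposal q with full support.
q = rng.dirichlet(np.ones(6))

entropy = -np.sum(q * np.log(q))
energy_functional = np.sum(q * np.log(p_tilde)) + entropy  # F[p~, q]
kl_q_p = np.sum(q * np.log(q / p))                         # D(q || p)

print(np.isclose(log_Z, energy_functional + kl_q_p))  # True
print(energy_functional <= log_Z)                     # True: F lower-bounds log Z
```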
[http://en.wikipedia.org/wiki/Mean_field Mean field] assumes a proposal distribution whose factors take the form of a Gibbs distribution: an exponential function of the “means of the neighbors”. The goal is to optimize $\max_{q \in Q} H[q(x)] + \sum_x q(x) \log \tilde{p}(x)$.
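Update step (naive mean field, one factor at a time, holding the others fixed): $q_i(x_i) \leftarrow \frac{1}{Z_i} \exp\left\{ \sum_{\phi:\, X_i \in \mathrm{Scope}[\phi]} \mathbb{E}_{q}\left[\log \phi \mid x_i\right] \right\}$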
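Each update maximizes the objective with respect to one factor while the others are held fixed, so the objective never decreases. The iteration is guaranteed to converge to a stationary point, but not necessarily a local optimum.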
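A minimal sketch of naive mean field coordinate ascent, assuming a small Ising model $\tilde p(x) = \exp\big(\sum_{i<j} J_{ij} x_i x_j + \sum_i h_i x_i\big)$ with $x_i \in \{-1, +1\}$ (the model, the random couplings, and all function names are illustrative assumptions). For this model the update reduces to $m_i \leftarrow \tanh\big(h_i + \sum_j J_{ij} m_j\big)$, where $m_i = \mathbb{E}_q[x_i]$, so each factor is an exponential function of its neighbors' means. The sketch records the objective after every sweep and compares it with the exact $\log Z$ from enumeration.

```python
import itertools
import numpy as np

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) variable; clipped to avoid log(0)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def energy_functional(m, J, h):
    """H[q] + sum_x q(x) log p~(x) for a fully factorized q with means m_i = E_q[x_i].

    Under a product q, E_q[x_i x_j] = m_i m_j for i != j. J is symmetric with a
    zero diagonal, so 0.5 * m @ J @ m counts each edge once.
    """
    expected_log_p_tilde = 0.5 * m @ J @ m + h @ m
    return expected_log_p_tilde + binary_entropy((1 + m) / 2).sum()

def naive_mean_field(J, h, sweeps=50):
    """Coordinate ascent: update one q_i at a time, holding the others fixed."""
    m = np.zeros(len(h))
    history = [energy_functional(m, J, h)]
    for _ in range(sweeps):
        for i in range(len(h)):
            # q_i(x_i) ∝ exp(x_i * (h_i + sum_j J_ij m_j)); its mean is tanh of that field.
            m[i] = np.tanh(h[i] + J[i] @ m)
        history.append(energy_functional(m, J, h))
    return m, history

def exact_log_Z(J, h):
    """Brute-force log partition function over all 2^n spin configurations."""
    log_weights = []
    for config in itertools.product([-1, 1], repeat=len(h)):
        x = np.array(config)
        log_weights.append(0.5 * x @ J @ x + h @ x)
    return np.logaddexp.reduce(log_weights)

rng = np.random.default_rng(1)
n = 6
upper = np.triu(rng.normal(0.0, 0.5, size=(n, n)), k=1)
J = upper + upper.T  # symmetric couplings, zero diagonal
h = rng.normal(0.0, 0.3, size=n)

m, history = naive_mean_field(J, h)
print("mean field objective:", history[-1])
print("exact log Z:         ", exact_log_Z(J, h))
```

On a model this small the exact $\log Z$ is cheap to enumerate; the point is only to see that the objective climbs monotonically and stays below $\log Z$.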
Naive mean field: $q$ is a product of independent marginals, $q(x) = \prod_i q_i(x_i)$. Structured mean field: $q$ has some low-treewidth structure.
Choose $q$ to have the structure of a junction tree of $p$. Then $p$ itself lies in $Q$, so the optimization recovers exact inference.
Deriving the update: construct the Lagrangian (with a normalization constraint on each factor), find its stationary points, and do a bit of math.
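Spelled out for naive mean field, a sketch assuming $q(x) = \prod_i q_i(x_i)$ and updating one factor $q_i$ with the others held fixed:

$$
\begin{aligned}
\mathcal{L}(q_i, \lambda) &= \sum_{x_i} q_i(x_i)\, \mathbb{E}_{q_{-i}}\!\left[\log \tilde p(x_i, X_{-i})\right] - \sum_{x_i} q_i(x_i) \log q_i(x_i) + \lambda \Big( \sum_{x_i} q_i(x_i) - 1 \Big) + \text{const} \\
\frac{\partial \mathcal{L}}{\partial q_i(x_i)} &= \mathbb{E}_{q_{-i}}\!\left[\log \tilde p(x_i, X_{-i})\right] - \log q_i(x_i) - 1 + \lambda = 0 \\
\Rightarrow \quad q_i(x_i) &= \frac{1}{Z_i} \exp\Big\{ \mathbb{E}_{q_{-i}}\!\left[\log \tilde p(x_i, X_{-i})\right] \Big\}
\end{aligned}
$$

Here "const" collects the entropies of the other factors, and $\lambda$ is fixed by normalization, which yields $Z_i$. Because $\log \tilde p(x) = \sum_\phi \log \phi$, factors whose scope does not contain $X_i$ contribute only a constant that is absorbed into $Z_i$, leaving a sum over the factors that touch $X_i$.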
Check out Hölder's inequality: weight each bucket, subject to the constraint that all weights sum to one.
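A small numerical check of the weighted form of Hölder's inequality, $\sum_x \prod_r f_r(x) \le \prod_r \big(\sum_x f_r(x)^{1/w_r}\big)^{w_r}$ for nonnegative $f_r$ and weights $w_r > 0$ with $\sum_r w_r = 1$. It assumes each “bucket” is a nonnegative function $f_r$ of the variable being summed out; the helper `holder_bound` and the random data are hypothetical.

```python
import numpy as np

def holder_bound(fs, weights):
    """Hypothetical helper: Hölder upper bound on sum_x prod_r f_r(x),
    prod_r (sum_x f_r(x) ** (1 / w_r)) ** w_r, with w_r > 0 and sum_r w_r = 1."""
    assert np.isclose(sum(weights), 1.0) and all(w > 0 for w in weights)
    return np.prod([np.sum(f ** (1.0 / w)) ** w for f, w in zip(fs, weights)])

rng = np.random.default_rng(2)
# Three nonnegative "buckets", each a function of the same summed-out variable x (10 states).
fs = [rng.uniform(0.0, 1.0, size=10) for _ in range(3)]
exact = np.sum(np.prod(fs, axis=0))

for weights in ([1/3, 1/3, 1/3], [0.5, 0.3, 0.2], [0.8, 0.1, 0.1]):
    # Every valid weighting gives an upper bound; some are tighter than others.
    print(weights, exact, holder_bound(fs, weights))
```

With a single bucket and weight $1$ the bound is exact, and different valid weightings give different bounds, which is why the choice of weights matters.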