Huixuan's wlog

on computational photography and others

Variational Bayes approach to kernel estimation

leave a comment »

Reference:
[1]   James Miskin ,  David J.C. MacKay, Ensemble Learning for Blind Image Separation and Deconvolution, In Adv. in Independent Component Analysis, M. Girolani, Ed. Springer-Verlag.2000.
[2]   R. Fergus et al., Removing camera shake from a single photograph, SIGGRAPH2006.
[3]   O. Whyte, J. Sivic, A. Zisserman, and J. Ponce, Non-uniform deblurring for shaken images, CVPR2010.
[4]   C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[5]   F. Durand, A. Levin, Y. Weiss, and W. T. Freeman, Understanding and evaluating blind deconvolution algorithms, CVPR2009.
[6] A. Levin, Y. Weiss, F. Durand, W. T. Freeman. Efficient Marginal Likelihood Optimization in Blind Deconvolution, CVPR2011.

The standard formulation for blurring is:
J = I \circ K + \epsilon,
where J is the observed image, I is the latent sharp image, K is the kernel, and \epsilon is the noise.
In [1] and [2] the latent image I and kernel K both follows GMM distribution, and the noise \epsilon is Gaussian. Reference [3] uses a slightly different formulation of convolution, but it also ends up with a bilinear representation of blur with sparse prior on kernel and image parameters.
Although it’s straight forward to write out the conditional distribution as
P(I, K|J) = P(\epsilon)P(I)P(K).
As [5] shows, the marginal distribution over K i.e. P(K|J)typically gives a better estimation of K than MAP estimate. However, with sparse prior on both I and K, this distribution is intractable to estimate. Levin solves this problem with Laplacian approximation, while references [1]-[3] use variational Bayes approach[5].

 
The basic idea of this variational approach is that, since P(I, K|J) is so hard to marginalize, why don’t I assume it a simpler family of distribution Q(I, K)?
In this case, if the KL-divergence between the two distributions P and Q is small enough, then Q is a pretty good approximate to use. In practice, KL(Q||P) is also not easy to compute in a straightfoward manner, but since
ln P(J) = L(Q) + KL(Q||P)
where the lower bound
L(Q) = \iint Q(I, K) ln{\frac{P(I, K, J)}{Q(I, K)}} dI dK
is easier to compute.

Now all we need is to estimate the parameters for Q(I, K) and use the I, K value of peaking probability as the MAP_K estimate for latent image and kernel.

A common simplification to make is that Q is a factorized distribution over each element of I and K. In [1] and [2] it is also assumed that the noise variance \sigma follows a Gamma distribution:
Q(I, K) = \prod_{n=1}^{N} Q_I(I_n) \prod_{m=1}^{M} Q_K(K_m)\prod_{v=1}^{N}Q_{\epsilon}(\epsilon_v).
Reference [4] shows that plugging this factorization into the lower bound gives:
Q(I_m) = E_{Q(I, K, \epsilon/ I_m)} P(I, K, J, V).

In [1] – [3] the authors assume that the inverse variance of noise is unknown, and is automatically selected during the Variational Bayes process. However, this seem often causes problems[6].

Levin et.al.[6] simplifies the above formulation by seeing K as parameters. Instead of optimizing P(I, K), it only approximates the distribution P(I|J, K) and seeks the $K$ that achieves the minimal residual.

Advertisement

Written by hxtang

March 22, 2011 at 12:53 pm

Posted in paper reading, wlog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.