Variational Bayes approach to kernel estimation
References:
[1] J. Miskin and D. J. C. MacKay, Ensemble Learning for Blind Image Separation and Deconvolution, in Advances in Independent Component Analysis, M. Girolami, Ed. Springer-Verlag, 2000.
[2] R. Fergus et al., Removing camera shake from a single photograph, SIGGRAPH 2006.
[3] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce, Non-uniform deblurring for shaken images, CVPR 2010.
[4] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[5] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, Understanding and evaluating blind deconvolution algorithms, CVPR 2009.
[6] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, Efficient Marginal Likelihood Optimization in Blind Deconvolution, CVPR 2011.
The standard formulation for blurring is:
$$J = K \otimes I + N,$$
where $J$ is the observed image, $I$ is the latent sharp image, $K$ is the kernel, $\otimes$ denotes convolution, and $N$ is the noise.
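The blur model can be simulated in a few lines. The following is a minimal NumPy sketch, not the code of any cited paper: the function name `blur`, the box kernel, the image size, and the noise level are all illustrative assumptions, and the convolution keeps only the "valid" region where the kernel fits entirely inside the image.

```python
import numpy as np

def blur(I, K, noise_std=0.01, rng=None):
    """Return J = K (*) I + noise, using a 'valid' 2-D convolution.

    `blur` and its parameters are hypothetical names for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    kh, kw = K.shape
    # Flip the kernel so the sliding-window sum is a true convolution
    # rather than a correlation.
    K_flipped = K[::-1, ::-1]
    # windows[i, j] is the (kh, kw) patch of I whose top-left corner is (i, j).
    windows = np.lib.stride_tricks.sliding_window_view(I, (kh, kw))
    J = np.einsum("ijkl,kl->ij", windows, K_flipped)
    # Additive zero-mean Gaussian noise.
    return J + noise_std * rng.standard_normal(J.shape)

rng = np.random.default_rng(0)
I = rng.random((32, 32))        # stand-in for the latent sharp image
K = np.ones((1, 5)) / 5.0       # assumed horizontal box (motion-like) kernel
J = blur(I, K, noise_std=0.01, rng=rng)
J_clean = blur(I, K, noise_std=0.0, rng=rng)
```

Blind deconvolution is the inverse problem: given only `J`, recover both `I` and `K`.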
In [1] and [2] the latent image I and kernel K both follow Gaussian mixture model (GMM) distributions, and the noise is Gaussian. Reference [3] uses a slightly different formulation of convolution, but it also ends up with a bilinear representation of blur with sparse priors on the kernel and image parameters.
It is straightforward to write out the posterior distribution as
$$P(I, K \mid J) \propto P(J \mid I, K)\, P(I)\, P(K).$$
However, as [5] shows, the marginal distribution over K, i.e. $P(K \mid J) = \int P(I, K \mid J)\, dI$, typically gives a better estimate of K than the MAP estimate. With sparse priors on both I and K, this distribution is intractable to compute. Levin et al. solve this problem with a Laplace approximation, while references [1]-[3] use a variational Bayes approach [5].
The basic idea of the variational approach is that, since $P(I, K \mid J)$ is so hard to marginalize, we approximate it with a distribution from a simpler family:
$$Q(I, K) \approx P(I, K \mid J).$$
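In this case, if the KL divergence between the two distributions, $\mathrm{KL}\big(Q(I, K) \,\|\, P(I, K \mid J)\big)$, is small enough, then Q is a good approximation to use. In practice, this KL divergence is also not easy to compute in a straightforward manner, but
$$\ln P(J) = \mathcal{L}(Q) + \mathrm{KL}\big(Q(I, K) \,\|\, P(I, K \mid J)\big),$$
where the lower bound
$$\mathcal{L}(Q) = \int Q(I, K) \ln \frac{P(J, I, K)}{Q(I, K)} \, dI \, dK$$
is easier to compute. Since $\ln P(J)$ does not depend on Q, minimizing the KL divergence is equivalent to maximizing $\mathcal{L}(Q)$.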
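The decomposition of the log evidence into a lower bound plus a KL divergence can be checked numerically on a small discrete toy problem. In this sketch a single latent variable with four states stands in for the pair (I, K); the joint probabilities and the function names `lower_bound` and `kl` are made up for illustration.

```python
import numpy as np

# Hypothetical joint P(J = observed, z) over a latent z with 4 states.
# z plays the role of (I, K); the numbers are arbitrary.
joint = np.array([0.10, 0.05, 0.20, 0.05])
evidence = joint.sum()            # P(J)
posterior = joint / evidence      # P(z | J)

def lower_bound(q):
    """L(Q) = sum_z Q(z) * ln( P(J, z) / Q(z) )."""
    return np.sum(q * (np.log(joint) - np.log(q)))

def kl(q, p):
    """KL(Q || P) = sum_z Q(z) * ln( Q(z) / P(z) )."""
    return np.sum(q * (np.log(q) - np.log(p)))

q = np.array([0.25, 0.25, 0.25, 0.25])      # a crude approximation
gap = np.log(evidence) - lower_bound(q)     # equals KL(q || posterior)
```

The gap between `np.log(evidence)` and the bound is exactly the KL divergence, and it closes when `q` equals the true posterior.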
Now all we need is to estimate the parameters of $Q(I, K)$ and use the values of I and K at which it peaks as the estimates of the latent image and kernel.
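A common simplification is to assume that $Q$ is a factorized distribution over each element of I and K. In [1] and [2] it is also assumed that the inverse noise variance $\sigma^{-2}$ follows a Gamma distribution, giving:
$$Q(I, K, \sigma^{-2}) = Q(\sigma^{-2}) \prod_i Q(I_i) \prod_j Q(K_j).$$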
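Reference [4] shows that plugging a factorization $Q(\theta) = \prod_j Q_j(\theta_j)$ into the lower bound gives the optimal form of each factor, with the others held fixed:
$$\ln Q_j^{*}(\theta_j) = \mathbb{E}_{i \neq j}\big[\ln P(J, \theta)\big] + \text{const},$$
where $\theta$ collects all unknowns (the elements of I and K and the noise parameter) and the expectation is taken over all factors except $Q_j$.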
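To make the factorized updates concrete, here is a minimal sketch of mean-field coordinate ascent on a scalar bilinear model J = K·I + noise. It is a deliberate simplification and not the algorithm of [1]-[3]: the priors are plain Gaussians rather than sparse mixtures, the noise precision `beta` is fixed rather than given a Gamma distribution, and the function name and all parameter values are assumptions chosen for illustration. Each factor is updated by taking the expectation of the log joint over the other factor, and the lower bound (up to an additive constant) is recorded after every sweep.

```python
import numpy as np

def mean_field_bilinear(J, beta=25.0, a=1.0, b=1.0, mu_K=1.0, n_iter=50):
    """Coordinate ascent for Q(I) Q(K) on the scalar model J = K*I + noise.

    Assumed priors: I ~ N(0, 1/a), K ~ N(mu_K, 1/b), noise ~ N(0, 1/beta).
    Returns both Gaussian factors as (mean, precision) and the lower bound
    (up to an additive constant) after each sweep.
    """
    m_I, lam_I = 1.0, a      # mean and precision of Q(I)
    m_K, lam_K = mu_K, b     # mean and precision of Q(K)
    bounds = []
    for _ in range(n_iter):
        # Update Q(I): expectation of ln P(J, I, K) over Q(K).
        EK2 = m_K**2 + 1.0 / lam_K
        lam_I = a + beta * EK2
        m_I = beta * J * m_K / lam_I
        # Update Q(K): expectation of ln P(J, I, K) over Q(I).
        EI2 = m_I**2 + 1.0 / lam_I
        lam_K = b + beta * EI2
        m_K = (beta * J * m_I + b * mu_K) / lam_K
        EK2 = m_K**2 + 1.0 / lam_K
        # Lower bound = E_Q[ln P(J, I, K)] + entropy of Q, constants dropped.
        e_log_joint = (-0.5 * beta * (J**2 - 2 * J * m_K * m_I + EK2 * EI2)
                       - 0.5 * a * EI2
                       - 0.5 * b * (EK2 - 2 * mu_K * m_K + mu_K**2))
        entropy = -0.5 * np.log(lam_I) - 0.5 * np.log(lam_K)
        bounds.append(e_log_joint + entropy)
    return (m_I, lam_I), (m_K, lam_K), np.array(bounds)

(m_I, lam_I), (m_K, lam_K), bounds = mean_field_bilinear(J=2.0)
```

Because every update maximizes the bound with respect to one factor while the other is fixed, the recorded values in `bounds` never decrease, which is a useful sanity check when implementing such updates.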
In [1]-[3] the authors assume that the inverse variance of the noise is unknown and let it be selected automatically during the variational Bayes process. However, this often seems to cause problems [6].
Levin et al. [6] simplify the above formulation by treating K as a parameter. Instead of optimizing $Q(I, K)$, they only approximate the distribution $P(I \mid J, K)$ with $Q(I)$ and seek the $K$ that achieves the minimal residual.
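The alternation described above can be sketched on a toy problem in which the "kernel" is a single unknown scale K shared by many scalar observations J_n = K·I_n + noise. This is an illustration under strong assumptions and not the method of [6]: the prior on each I_n is Gaussian, so Q(I_n) coincides with the exact posterior for a fixed K, the noise precision `beta` is known, and the function name `estimate_kernel_scale` is hypothetical. The K update minimizes the expected squared residual under Q.

```python
import numpy as np

def estimate_kernel_scale(J, beta=1.0, a=1.0, K_init=0.5, n_iter=200):
    """Alternate between Q(I) for fixed K and a residual-minimizing K update.

    Assumed model: J_n = K * I_n + noise, I_n ~ N(0, 1/a), noise ~ N(0, 1/beta).
    """
    K = K_init
    for _ in range(n_iter):
        # For fixed K, Q(I_n) = N(m_n, 1/lam); exact here because the prior is Gaussian.
        lam = a + beta * K**2
        m = beta * K * J / lam
        # K minimizing sum_n E_Q[(J_n - K * I_n)^2], with E[I_n^2] = m_n^2 + 1/lam.
        K = np.sum(J * m) / np.sum(m**2 + 1.0 / lam)
    return K

rng = np.random.default_rng(0)
K_true, beta = 2.0, 1.0
I = rng.standard_normal(5000)                                # latent values
J = K_true * I + rng.standard_normal(5000) / np.sqrt(beta)   # observations
K_est = estimate_kernel_scale(J, beta=beta)
```

Note that the variance 1/lam of Q(I) enters the K update; dropping it would reduce the scheme to alternating point estimates of I and K. The sign of K is not identifiable in this toy model, so the estimate keeps the sign of `K_init`.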