A very incomplete and biased sampling of CVPR 2011 papers and thoughts
This is a set of interesting papers from CVPR 2011, along with my thoughts after seeing them. My selection of papers and comments is very biased because I am much more interested in producing nice pictures than in understanding their contents, and I wasn’t able to attend some of the posters/orals because of time overlaps, because I was presenting my own poster, or because I didn’t want to squeeze into that terrible crowd around the poster…
Object recognition
I don’t know object recognition very well and attended only a few oral/poster sessions. But I found the following two ideas very interesting.
One impressive trend in recognition is a re-examination of the premise behind learning-based methods: that the training and testing data are i.i.d. samples from the same distribution. Although the samples within a single dataset might be i.i.d., the real world doesn’t follow the same distribution. The obvious biases across current object recognition datasets are a very clear sign of this problem. The following keynote and paper look into it:
More Words and Bigger Pictures: Where could large-scale learning take us? by David Forsyth (Workshop keynote)
Unbiased Look at Dataset Bias (PDF) by Antonio Torralba, Alyosha Efros
Another interesting trend is to make recognition more human centered. Natural-language tags may be too complex and ambiguous to work with, so it might be more plausible to associate the real world with what a machine can do with it, as in the following paper.
From 3D Scene Geometry to Human Workspace (PDF) by Abhinav Gupta, Scott Satkin, Alyosha Efros, Martial Hebert
Image restoration/statistics
This is the area I’m most familiar with, and I looked at almost every oral/poster on this topic. I have found patch-based statistics very well studied in the deblurring/inpainting/super-resolution areas, but never in difficult restoration problems such as motion deblurring.
My favorite paper in this area is the following, since it shows that the “internal” self-similarity of image patches is actually a very strong prior, and much cheaper to use.
Internal Statistics of a Single Natural Image (PDF, abstract, results, supplementary material) by Maria Zontak, Michal Irani.
Another paper I found very interesting is
Blind Deconvolution Using A Normalized Sparsity Measure (PDF) by Dilip Krishnan (New York University), Rob Fergus
especially when relating it to Anat Levin’s work on resolving dimensionality asymmetry issues between kernel space and image space.
But this paper uses a specific form of regularization that doesn’t need the variance term.
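As a rough illustration of why a normalized sparsity measure is attractive, here is a toy 1D sketch of the ratio of the l1 norm to the l2 norm of a signal's gradients. This is not the authors' implementation: the paper works on high-frequency content of 2D images inside a full blind deconvolution pipeline, whereas this only compares a sharp step edge with a box-blurred copy. The helper names (`gradients`, `l1_over_l2`, `box_blur`) are made up for this example.

```python
import math


def gradients(signal):
    """Finite differences between neighbouring samples."""
    return [b - a for a, b in zip(signal, signal[1:])]


def l1_over_l2(values):
    """Normalized sparsity: l1 norm divided by l2 norm (0.0 for an all-zero input)."""
    l1 = sum(abs(v) for v in values)
    l2 = math.sqrt(sum(v * v for v in values))
    return l1 / l2 if l2 > 0 else 0.0


def box_blur(signal, radius=1):
    """Simple moving-average blur; windows are truncated at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


# Toy data (an assumption for illustration): a single sharp step edge.
sharp = [0.0] * 8 + [1.0] * 8
blurred = box_blur(sharp)

sharp_score = l1_over_l2(gradients(sharp))
blurred_score = l1_over_l2(gradients(blurred))
print(f"sharp: {sharp_score:.3f}  blurred: {blurred_score:.3f}")
```

In this toy case the sharp edge has a single nonzero gradient, so its ratio is 1, while the blur spreads the same total gradient over three samples and raises the ratio to about 1.73. The ratio is also unchanged when the gradients are multiplied by a constant, which a plain l1 penalty is not.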
Computational photography
This year also saw a few interesting computational camera/optics designs.
Have you ever considered that the glare caused by star filters can be used to restore saturated areas, that the shape and location of an object can be computed from its many images in a complex mirror system, that some efficient and cheap computation can be done by optics to cut the energy use of a micro robot, or how fluorescence can be captured in images? What is the appropriate post-capture processing pipeline to eliminate sensor/optics noise? Do you think it is possible to capture the propagation of light somehow? How can you reconstruct a scene when light doesn’t travel in a straight line?
Glare Encoding of High Dynamic Range Images (PDF, supplementary material, project, video) by Mushfiqur Rouf, Rafal Mantiuk, Wolfgang Heidrich, Matthew Trentacoste, Cheryl Lau.
Three-Dimensional Kaleidoscopic Imaging by Ilya Reshetouski, Alkhazur Manakov, Hans-Peter Seidel, Ivo Ihrke
Wide-angle Micro Sensors for Vision on a Tight Budget (PDF, project) by Sanjeev Koppal, Todd Zickler, Ioannis Gkioulekas.
Separating Reflective and Fluorescent Components of An Image by Cherry Zhang, Imari Sato
Estimating Motion and Size of Moving Non-Line-of-Sight Objects in Cluttered Environments by Rohit Pandharkar, Andreas Velten, Andrew Bardagjy, Ramesh Raskar, Moungi Bawendi, Ahmed Kirmani, Everett Lawson
Noise Suppression in Low-Light Images through Joint Denoising and Demosaicing (PDF) by Priyam Chatterjee, Neel Joshi, Sing Bing Kang, Yasuyuki Matsushita
The Light-Path Less Traveled (PDF) by Srikumar Ramalingam, Sofien Bouaziz, Peter Sturm, Philip Torr
Btw, I saw a poster using multiple cameras to see through occlusion for tracking. I’m not particularly amazed by the technique this particular poster uses, but it truly fascinates me that people outside computational photography care about what this area does.
3D reconstruction
This topic, like object recognition, was extremely hot at this conference, judging by the large number of posters on it. Many try to deal with illumination/deformation by factoring them out of the image, but my favorite paper says that complex illumination imposes more constraints on the shape of an object.
Shape Estimation in Natural Illumination (PDF, project) by Micah Johnson, Edward Adelson.
Another very interesting paper is about eliminating ambiguity in a scene with many duplicate patterns. While many current papers can handle very deformable objects, finding correspondences among them remains a very hard problem. Many current papers still assume constant illumination.
Structure from motion for scenes with large duplicate structures (PDF) by Richard Roberts, Sudipta Sinha, Richard Szeliski, Drew Steedly
Image quality measurements
I paid some attention to this topic not simply because I was presenting a paper on it, but because it astonished me that by using many quite straightforward features and a very simple regression algorithm, I got such nice performance on evaluating image quality. Other than my paper (which only targets low-level distortions and grayscale images), there are also papers working on color quality, or even on the content of the image (which caught a lot of attention at the conference), using the same approach of subjective data collection + feature design + simple regression/classification.
However, my major concern is, just as with object recognition, how much this learning-based approach is biased toward the dataset. For instance, is my design of image quality features biased toward blur and compression? Are the memorability features biased toward images of only a few objects and a simple layout? Further, are such biases necessary or undesired in real-world applications? Do they provide guidelines for how to restore details, how to tune amateur photos into professional ones, or even how to capture interesting/good photos?
Accelerated computation
One typical question I ask at posters is “how fast is your algorithm with a typical input size?” In the real world we are processing large bulks of data: a typical useful photo is about 1024×768 pixels. Unfortunately, many computer vision problems are so difficult that we are dealing with much less data, e.g. images of 300×300 pixels, and the algorithms still don’t run in real time.
Based on the MRF-optimization-related talks/posters I attended, I have the impression that people are trying to solve this problem by splitting it into many smaller modules.
This year many papers make the observation that depth maps are piecewise constant (or linear), and use this as a prior to accelerate processing. Although this works very well on current datasets, I don’t think this is the “right” prior to use, since this constancy is due to the lack of depth precision in the relevant reconstruction techniques (e.g. pixel disparity has a relatively low precision). It might be fine to recover the structure just roughly for visualization purposes, but there are applications that require high-precision depth recovery too, and in such applications a strong prior on piecewise constancy might hurt.
Of course, a popular way of dealing with this problem is to go parallel. Google and UW have made bundle adjustment efficient by parallelizing the numerical computation.
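To make the worry about piecewise constancy concrete, here is a toy 1D sketch. It assumes a Potts-style smoothness term, i.e. a penalty that simply counts how many neighbouring depth values differ. That choice and the function names are illustrative assumptions, not taken from any of the papers. The sketch compares a slanted surface with a coarsely quantized staircase version of it.

```python
def potts_penalty(depth):
    """Count neighbouring pairs whose depth values differ (a Potts-style smoothness cost)."""
    return sum(1 for a, b in zip(depth, depth[1:]) if a != b)


def quantize(depth, step):
    """Snap every depth value to the nearest multiple of `step`."""
    return [round(d / step) * step for d in depth]


# Toy data (an assumption for illustration): a slanted plane seen along one scanline.
slanted = list(range(11))          # true depths 0, 1, ..., 10
staircase = quantize(slanted, 5)   # piecewise-constant approximation

max_error = max(abs(t - s) for t, s in zip(slanted, staircase))
print("penalty for true slanted surface:", potts_penalty(slanted))
print("penalty for staircase:", potts_penalty(staircase))
print("largest depth error of staircase:", max_error)
```

Under this kind of prior the staircase is much cheaper than the true slanted surface, even though it is off by up to two depth units. That is harmless for a rough visualization but is exactly the sort of bias that could hurt when high-precision depth is needed.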
Multicore Bundle Adjustment (PDF) by Changchang Wu, Sameer Agarwal, Brian Curless, Steve Seitz
Another very interesting paper from GE states that computation should go INTEGER. When I saw the paper, I thought, wow, that is a really industrial way of thinking…
The Magic Sigma by Dirk Padfield
To read all of these papers might take ages for me. Thanks for the useful overview.
lishuda
June 28, 2011 at 3:31 am