Friday, 25 November 2011
Kinect Apps Challenge

Monday, 15 August 2011
All the summer schools
- Antonio Criminisi described their InnerEye system for retrieval of similar body-part scans, which is useful for diagnosis based on the medical history of similar cases. He also covered the basics of Random Forests as a teaser for his ICCV 2011 tutorial. The new part was using peculiar weak classifiers (like 2nd-order separation surfaces); Antonio argued they perform much better than conventional trees in some cases.
- Andrew Fitzgibbon gave a brilliant lecture about pose estimation for Kinect (MSR Cambridge is really proud of that algorithm [Shotton, 2011]; it is a topic for another post).
- Olga Barinova talked about modern methods of image analysis and her work over the past two years (graphical models for non-maxima suppression in object detection and urban scene parsing).
- when computing visual words, it is sometimes fruitful to use soft assignments to clusters, or more advanced methods like Locality-constrained linear coding [Wang et al., 2010] (a soft-assignment sketch is after this list);
- for instance-level recognition it is possible to use query expansion to overcome occlusions [Chum et al., 2007]: the idea is to use the best-matched images from the database as new queries;
- object detection is traditionally done with a sliding window; the problems here are varying aspect ratios, partial occlusions, multiple responses and background clutter for substantially non-convex objects;
- for object detection use bootstrapped sequential classification: at the next stage, take the false positive detections from the previous stage as additional negative examples ("hard negatives") and retrain the classifier (see the bootstrapping sketch after this list);
- multiple kernel learning [Gehler and Nowozin, 2009] is a hot tool that is used to find the ideal linear combination of SVM kernels: combining different features is fruitful, but learning the combination is not much better than just averaging (Lampert: “Never use MKL without comparison to simple baselines!”);
- movies are common datasets: they contain a lot of repeated objects/people/environments, and the privacy issues are easy to overcome. Movies like Groundhog Day and Run Lola Run are especially good since they contain repeated episodes. You can try to find the clocks in the Video Google Demo.
- the life-cycle of a research project is: 1) chase the high-hanging fruit (a theoretically sound model), 2) try to make stuff really work, 3) look for the things that confuse/annoy you and fix them;
- for Kinect pose estimation, the good top-down method based on tracking did not work, so they ended up classifying body parts discriminatively, with temporal smoothing applied at a late stage;
- “don't be obsessed with theoretical guarantees: they are either weak or trivial”;
- on the simplest optimization method: “How many people have invented [coordinate] alternation at some point of their life?”. Indeed, the method is guaranteed to converge, but problems arise when the valleys are not axis-aligned;
- gradient descent is not a panacea: in some cases it also takes tiny steps; the conjugate gradient method is better (and it uses only 1st-order derivatives);
- when possible, use second derivatives to determine the step size, although estimating them is hard in general;
- one almost never needs to take a matrix inverse explicitly; in MATLAB, to solve the system Hd = −g, use backslash: d = −H\g (see the Newton-step sketch after this list);
- the Friday-evening method is to try MATLAB's fminsearch, which implements the derivative-free Nelder-Mead method (a Python analogue is sketched after this list).
- (obvious) don't rely on the error on the training set, and (less obvious) don't even report it in your papers;
- for SVM kernels: in order to be legitimate, a kernel should be an inner product; it is often hard to prove this directly, but there are workarounds: a kernel can be derived from a conditionally positive-definite matrix; a sum, product or exponent of kernels is a kernel too, etc. (hence, importantly for multiple kernel learning, a linear combination of kernels is a kernel; a one-line check is after this list);
- since training (and running) non-linear SVMs is computationally hard, explicit feature maps are popular now: try to decompose the kernel back into a conventional dot product of modified features; typically the features would have to be infinite sums, so take the first few terms [Vedaldi and Zisserman, 2010] (see the feature-map sketch after this list);
- if the kernel can be expressed as a sum over vector components (e.g. the χ2 kernel $\sum_d x_d x'_d / (x_d + x'_d)$), it is easy to decompose; the radial basis function (RBF) kernel ($\exp(-\|x-x'\|^2 / 2\sigma^2)$) is the exponent of a sum, so it is hardly decomposable (stricter conditions are in the paper);
- when using the RBF kernel, you get another parameter σ to tune; the rule of thumb is to take σ² equal to the median distance between training vectors, so cross-validation becomes one-dimensional (see the last sketch after this list).
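A minimal sketch of the soft-assignment note above: instead of assigning each descriptor to its single nearest cluster, weight nearby visual words by a Gaussian of the distance. The kernel width and the toy data here are illustrative, not from the lectures.

```python
import numpy as np

def soft_assign(descriptors, centers, sigma=10.0):
    """Soft-assign descriptors to visual words with Gaussian weights.

    Returns an (n_descriptors, n_words) matrix whose rows sum to 1.
    """
    # squared Euclidean distance from every descriptor to every center
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    d2 -= d2.min(axis=1, keepdims=True)      # stabilize the exponent
    w = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian weighting
    return w / w.sum(axis=1, keepdims=True)  # normalize per descriptor

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 128))    # e.g. SIFT descriptors
vocabulary = rng.normal(size=(100, 128))     # cluster centers (visual words)
bow = soft_assign(descriptors, vocabulary).sum(axis=0)  # soft BoW histogram
```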
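The bootstrapping note, as a loop: train, scan the negative pool, add the false positives back in, retrain. The classifier and the synthetic "windows" (sklearn's LinearSVC on random features) are placeholders; in a real detector the negatives come from scanning images.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=1.0, size=(200, 50))        # object windows
X_neg_pool = rng.normal(loc=0.0, size=(5000, 50))  # background windows

# start from a random subset of the negatives
neg_idx = rng.choice(len(X_neg_pool), size=200, replace=False)
X_neg = X_neg_pool[neg_idx]

for _ in range(3):
    X = np.vstack([X_pos, X_neg])
    y = np.r_[np.ones(len(X_pos)), np.zeros(len(X_neg))]
    clf = LinearSVC(C=1.0).fit(X, y)

    # hard negatives: background windows the current detector fires on
    hard = X_neg_pool[clf.predict(X_neg_pool) == 1]
    if len(hard) == 0:
        break
    X_neg = np.vstack([X_neg, hard])  # grow the negative set and retrain
```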
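The "never invert the matrix" advice, translated from MATLAB's d = −H\g into Python: solve the Newton system Hd = −g with a linear solver. The Rosenbrock function is just a stock test problem, not from the lectures.

```python
import numpy as np

def rosenbrock_grad_hess(x):
    """Gradient and Hessian of f(x) = (1-x0)^2 + 100*(x1-x0^2)^2."""
    g = np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
        200 * (x[1] - x[0] ** 2),
    ])
    H = np.array([
        [2 - 400 * (x[1] - 3 * x[0] ** 2), -400 * x[0]],
        [-400 * x[0], 200.0],
    ])
    return g, H

x = np.array([-1.2, 1.0])
for _ in range(20):
    g, H = rosenbrock_grad_hess(x)
    d = np.linalg.solve(H, -g)  # the MATLAB d = -H\g, without inv(H)
    x = x + d                   # (a real implementation would add a line search)
print(x)                        # should be close to [1, 1]
```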
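And the Friday-evening recipe in Python terms: scipy's minimize with method="Nelder-Mead" is the direct analogue of fminsearch; being derivative-free, it only needs the objective.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2  # Rosenbrock
res = minimize(f, x0=np.array([-1.2, 1.0]), method="Nelder-Mead")
print(res.x)  # close to [1, 1], no derivatives required
```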
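The one-line check of the combination rule for kernels (textbook material, stating the feature maps explicitly): if $k_1(x,x') = \langle \phi_1(x), \phi_1(x') \rangle$ and $k_2(x,x') = \langle \phi_2(x), \phi_2(x') \rangle$, then

$$(k_1 + k_2)(x,x') = \big\langle [\phi_1(x); \phi_2(x)],\; [\phi_1(x'); \phi_2(x')] \big\rangle, \qquad (\alpha k_1)(x,x') = \big\langle \sqrt{\alpha}\,\phi_1(x),\; \sqrt{\alpha}\,\phi_1(x') \big\rangle \quad (\alpha \ge 0),$$

so any non-negative linear combination of kernels is again an inner product, i.e. a kernel.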
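On explicit feature maps: sklearn ships an implementation of the Vedaldi-Zisserman approximation for additive kernels (AdditiveChi2Sampler), so a linear SVM on the mapped features approximates a χ2-kernel SVM. The toy histograms and labels below are made up.

```python
import numpy as np
from sklearn.kernel_approximation import AdditiveChi2Sampler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.random((100, 32))            # e.g. bag-of-words histograms (non-negative)
y = (X[:, 0] > 0.5).astype(int)      # toy labels

# sample_steps controls how many terms of the expansion are kept
mapped = AdditiveChi2Sampler(sample_steps=2).fit_transform(X)
clf = LinearSVC().fit(mapped, y)     # fast linear SVM ~ chi2-kernel SVM
```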
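Finally, the σ rule of thumb in code. Conventions differ (median distance for σ vs. median squared distance for σ²), so treat the exact form below, the common "median heuristic", as an assumption.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))     # training vectors

sigma = np.median(pdist(X))        # median pairwise Euclidean distance
gamma = 1.0 / (2.0 * sigma ** 2)   # sklearn's RBF is exp(-gamma * ||x-x'||^2)
clf = SVC(kernel="rbf", gamma=gamma)  # one fewer parameter to cross-validate
```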
Wednesday, 30 March 2011
[CFP] GraphiCon-2011 and MSCVSS
Thursday, 16 September 2010
LidarK 2.0 released
The second major release of GML LidarK is now available. It reflects our three years of experience with 3D data processing. The description from the project page:
The LidarK library provides an open-source framework for processing multidimensional point data such as 3D LIDAR scans. It allows building a spatial index for performing fast search queries of different kinds. Although it is intended to be used for LIDAR scans, it can be helpful for a wide range of problems that require spatial data processing.
The API has been enriched with various features in this release and became more consistent and logical. New ways to access data (i.e. various iterators) have been implemented. One can now find the k nearest neighbours of any point, not just one that belongs to the index. Since the data structure is a container, we've decided to parametrize it with a template parameter. This decision is controversial: one no longer needs to cast tuples, but the code became clumsier and less portable, unfortunately.
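This is not LidarK's actual API, but the kind of query it supports can be sketched with scipy's k-d tree, just to make the point of a spatial index concrete: build the index once, then answer k-nearest-neighbour queries for arbitrary points quickly.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((100000, 3))   # a 3D point cloud, e.g. a LIDAR scan

tree = cKDTree(points)             # build the spatial index once

# k nearest neighbours of an arbitrary query point
# (it does not have to belong to the indexed cloud)
dist, idx = tree.query([0.5, 0.5, 0.5], k=5)
neighbours = points[idx]
```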
The C++/MATLAB code is licensed free of charge for academic use. You can download it here.
Saturday, 11 September 2010
ECCV 2010 highlights
- static equilibrium: it seems that the only property they check here is that the centroid projects into the support area (see the sketch after this list);
- enough support force: they estimate density (light for vegetation, medium for humans, heavy for buildings) and say that it is unlikely that a building is built on top of a tree;
- volume constraint: boxes cannot intersect;
- depth ordering: backprojecting the result onto the image plane should correspond to what we see in the image.
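A hedged sketch of what the static-equilibrium test in the first bullet might reduce to geometrically: project the centroid down and check that it lands inside the support polygon. matplotlib's Path.contains_point does the point-in-polygon test; the box and its support here are invented numbers.

```python
import numpy as np
from matplotlib.path import Path

# ground-plane support polygon of the object (x, y), invented numbers
support = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])

centroid = np.array([1.0, 0.4, 1.5])   # (x, y, z) centroid of the box

# static equilibrium (crudely): the vertical projection of the centroid
# must fall inside the support polygon
stable = Path(support).contains_point(centroid[:2])
print(stable)  # True
```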
Wednesday, 24 February 2010
Olga's CVPR paper
The paper is on multiple object retrieval. They extend the Hough transform with a graphical model, which makes the results more robust. As soon as the paper becomes publicly available, I'll add the link here.
Saturday, 23 January 2010
On image labelling
- In academia, the task of labelling usually rests on the broad shoulders of [PhD] students. The funny part is that the students are not always enrolled in the relevant project. At Graphics & Media Lab, students who have not attended enough seminars by the time of revision must label some data sets for the lab projects.
- One could also hire people to label her data. Since developers/researchers are relatively highly paid, it is more economical to hire other folks (sometimes students as well). UPDATE: hr0nix mentioned in the comments that there is the Mechanical Turk service, which helps requesters find contractors.
- A wittier way is to use applied psychology. For example, Google turned the labelling process into a game. During the gameplay, you and a randomly chosen partner tag images. The sooner you both tag an image with the same tag, the more points you get. A brilliant idea! Believe it or not, when I first saw it, I was carried away and could not stop playing until my friends dragged me out for a pizza!
- The most revolutionary approach was introduced by Desney Tan. Here is a popular explanation of what he has done. The idea is to capture labels straight from one's brain using EEG/fMRI/whatnot. So far they can only perform 2- or 3-class labelling, but (I hope) it is only the beginning.
Saturday, 26 December 2009
Science vs. Industry
- When you work as a researcher, you invent something new instead of just coding different stuff. The job is creative, not merely constructive.
- When you work as a programmer, your result is just code. You don't publish papers, and no one knows about your results. Moreover, they are often proprietary.
- If you work as an office worker or a manager, you may be paid a lot of money, but you are expected to work 24x7. It really sucks because you do really annoying paperwork or sit in meetings, but do not produce anything substantial. If you have a lot of money, you have no time or desire to spend it on anything interesting. That's the reason my roommate Andrey Korolev left Shell.
- When you work as a scientist, you don't have to learn all the boring technologies. You can't be a prominent Java programmer without knowing heaps of jXxx libraries and a number of frameworks, and without earning some useless certificates. Besides, your knowledge could end up narrow, restricted by your employer's needs.
- Scientists are mobile: they move from one university to another. Some universities do not renew contracts with professors, even very good ones. To me, it seems boring to live one's whole life in one place. (Though you can be sent on business trips while working at some big corporation.)
- If you work at a university, you might get some teaching experience, which is useful if you use it properly.
- The main advantage is a good salary. :) If you are not a complete geek, it does matter. Science is usually funded by grants, which makes it unstable.
- Since your employer needs profit, you do something practical. You can be sure that you don't investigate some abstract stuff that will never be applied.