Warung Bebas
Tampilkan postingan dengan label GML. Tampilkan semua postingan
Tampilkan postingan dengan label GML. Tampilkan semua postingan

Jumat, 25 November 2011

Kinect Apps Challenge

Graphics & Media Lab and Microsoft Research Cambridge have announced a contest for applications that use Kinect sensor. The authors of the five brightest apps will be funded to attend the 2012 Microsoft Research PhD summer school. I blogged about the last year's event in my previous post. Details of the contest are here.

I guess there's no need to explain what Kinect is. It is extremely successful combined colour and depth sensor by Microsoft. It is being distributed for killer price (≈$150) as an add-on for Xbox, although it is hard to imagine the range of possible applications. Controlling a computer only by gestures is considered as a primer of natural user interface. To name some applications beyond gaming, this kind of NUI is useful to help surgeons to keep their hands clean:

Kinect helps blind people to navigate through buildings:

For more ideas look at the winners of OpenNI challenge.

OpenNI is an open-source alternative to Kinect SDK. Its strong point is it is integrated with PCL, but beware that you cannot use it for the contest. Only Kinect for Windows SDK is allowed.

Kinect is probably the most successful commercial outcome from the Microsoft Research lab. MSR is unique because they do a lot of theoretical research there, and it is unclear if it is profitable for Microsoft to fund it. But the projects like Kinect reveal the doubts. I will post about MSR organization in comparison to other industrial labs in one of the later posts.

Senin, 15 Agustus 2011

All the summer schools

This summer I attended two conferences (3DIMPVT, EMMCVPR) and two summer schools. I know my latency is somewhat annoying, but it's better to review them now then never. :) This post is about the summer schools, and the following is going to be about the conferences.




PhD Summer School in Cambridge


Both schools were organized by Microsoft Research. The first one, PhD Summer School was in Cambridge, UK. The lectures covered some general issues for computer science PhD students (like using cloud computing for research and career perspectives) as well as some recent technical results by Microsoft Research. From the computer vision side, there were several talks:
  • Antonio Criminisi described their InnerEye system for retrieval of similar body part scans, which is useful for diagnosis based on similar cases' medical history. He also featured the basics of Random Forests as an advertisement to his ICCV 2011 tutorial. The new thing was using peculiar weak classifiers (like 2nd order separation surfaces). Antonio argued they perform much better then trees in some cases.
  • Andrew Fitzgibbon gave a brilliant lecture about pose estimation for Kinect (MSR Cambridge is really proud of that algorithm [Shotton, 2011], this is the topic for another post).
  • Olga Barinova talked about the modern methods of image analysis and her work for the past 2 years (graphical models for non-maxima suppression for object detection and urban scene parsing).
The other great talks were about .NET Gadgeteer, the system for modelling and even deployment of electronic gadgets (yes, hardware!), and F#, Microsoft's alternative to Scala, the language that combines object-oriented paradigm with functional. Sir Tony Hoare also gave a lecture, so I had a chance to ask him how he ended up in Moscow State University in the 60s. It turns out he studied statistics, and Andrey Kolmogorov was one of the leaders of the field that time, so that internship was a great opportunity for him. He said he had liked the time in Moscow. :) There were also magnificent lectures by Simon Peyton-Jones about giving talks and writing papers. Those advices are the must for everyone who does research, you can find the slides here. Slides for some of the lectures are available from the school page.


The school talks did not take all the time. Every night was occupied by some social event (go-karting, punting etc.) as well as unofficial after-parties in Cambridge pubs. Definitely it is the most fun school/conference I've attended so far. Karting was especially great, with the quality track, pit-stops, stats and prizes, so special thanks to Microsoft for including it to the program!




Microsoft Computer Vision School in Moscow


This year, Microsoft Research summer school in Russia was devoted to computer vision and organized in cooperation with our lab. The school started before its official opening with a homework assignment we authored (I was one of four student volunteers). The task was to develop an image classification method capable to distinguish two indoor and two outdoor classes. The results were rated according to the performance on the hidden test set. Artem Konev won the challenge with 95.5% accuracy and was awarded a prize consisted of an xBox and Kinect. Two years ago we used those data for the projects on Introduction to Computer Vision course, where nobody reached even 90%. It reflects not just the lore of participants, but also the progress of computer vision: all the top methods used PHOW descriptors and linear SVM with approximate decomposed χ2 kernel [Vedaldi and Zisserman, 2010], which were unavailable that time!


In fact, Andrew Zisserman was one of the speakers. Andrew is the most cited computer vision researcher and the only person whose Zisserman number is zero. :) His course was on Visual Search and Recognition, including instance-level and category-level recognition. The ideas that were relatively new:
  • when computing visual words, sometimes it is fruitful to use soft assignments to clusters, or more advanced methods like Locality-constrained linear coding [Wang et al., 2010];
  • for instance-level recognition it is possible to use query expansion to overcome occlusions [Chum et al., 2007]: the idea is to use the best matched images from the base as new queries;
  • object detection is traditionally done with sliding window, the problems here are: various aspect ratio, partial occlusions, multiple responses and background clutter for substantially non-convex objects;
  • for object detection use bootstrapped sequential classification: on the next stage take the false negative detections from the previous stage as negative examples and retrain the classifier;
  • multiple kernel learning [Gehler and Nowozin, 2009] is a hot tool that is used to find the ideal linear combination of SVM kernels: combining different features is fruitful, but learning the combination is not much better than just averaging (Lampert: “Never use MKL without comparison to simple baselines!”);
  • movies are common datasets, since there are a lot of repeated objects/people/environments, and the privacy issues are easy to overcome. The movies like Groundhog Day and Run Lola Run are especially good since they contain repeated episodes. You can try to find the clocks on Video Google Demo.
Zisserman talked about PASCAL challenge a lot. During a break he mentioned that he annotated some images himself since “it is fun”. One problem with the challenge is we don't know if the progress over years really reflects the increased quality of methods, or is just because of growth of the training set (though, it is easy to check).


Andrew Fitzgibbon gave two more great lectures, one about Kinect (with slightly different motivation than in Cambridge) and another about continuous optimization. He talked a lot about reconciling theory and practice:
  • the life-cycle of a research project is: 1) chase the high-hanging fruit (theoretically-sound model), 2) try to make stuff really work, 3) look for the things that confuse/annoy you and fix them;
  • for Kinect pose estimation, the good top-down method based on tracking did not work, so they ended up classifying body parts discriminatively, temporal smoothing is used on the late stage;
  • “don't be obsessed with theoretical guarantees: they are either weak or trivial”;
  • on the simplest optimization method: “How many people have invented [coordinate] alternation at some point of their life?”. Indeed, the method is guaranteed to converge, but the problems arise when the valleys are not axis-aligned;
  • gradient descent is not a panacea: in some cases it does small steps too, conjugate gradient method is better (it uses 1st order derivatives only);
  • when possible, use second derivatives to determine step size, but estimating them is hard in general;
  • one almost never needs to take the matrix inverse; in MATLAB, to solve the system Hd = −g, use backslash: d = −H\g;
  • the Friday evening method is to try MATLAB fminsearch (implementing the derivative-free Nelder-Mead method).
Dr. Fitzgibbon asked the audience what the first rule of machine learning is. I hardly helped over replying “Never talk about machine learning”, but he expected the different answer: “Always try the nearest neighbour first!”


Christoph Lampert gave lectures on kernel methods, and structured learning, and kernel methods for structured learning. Some notes on the kernel methods talk:
  • (obvious) don't rely on the error on a train set, and (less obvious) don't even report about it in your papers;
  • for SVM kernels, in order to be legitimate, a kernel should be an inner product; it is often hard to prove it directly, but there are workarounds: a kernel can be drawn from a conditionally positive-definite matrix; sum, product and exponent of a kernel(s) is a kernel too etc. (thus, important for multiple-kernel learning, linear combination of kernels is a kernel);
  • since training (and running) non-linear SVMs is computationally hard, explicit feature maps are popular now: try to decompose the kernel back to conventional dot product of modified features; typically the features should be transformed to infinite sums, so take first few terms [Vedaldi and Zisserman, 2010];
  • if the kernel can be expressed as a sum over vector components (e.g. χ2 kernel $\sum_d x_d x'_d / (x_d + x'_d)$), it is easy to decompose; radial basis function (RBF) kernel ($\exp (\|x-x'\|^2 / 2\sigma^2)$) is the exponent of a sum, so it is hardly decomposable (more strict conditions are in the paper);
  • when using RBF kernel, you have another parameter σ to tune; the rule of thumb is to take σ² equal to the median distance between training vectors (thus, cross-validation becomes one-dimensional).
Christoph also told a motivating story why one should always use cross-validation (so just forget the previous point :). Sebastian Nowozin was working on his [ICCV 2007] paper on action classification. He used the method by Dollár et al. [2005] as a baseline. The paper reported 80.6% accuracy on the KTH dataset. He outperformed the method by a couple of per cents and then decided to reproduce Dollár's results. Imagine his wonder when simple cross-validation (with same features and kernels) yielded 85.2%! So, Sebastian had to improve his method to beat the baseline.


I feel I should stop writing about the talks now since the post grows enormously long. Another Lampert's lecture and Carsten Rother's course on CRFs were close to my topic, so they deserve separate posts (I already reviewed basics of structured learning and max-product optimization in this blog). Andreas Müller blogged about the recent Ivan Laptev's action recognition talk on CVML, which was pretty similar to ours. The slides are available for all MSCVS talks, and videos will be shared in September.


There were also several practical sessions, but I personally consider them not that useful, because one hardly ever can feel the essence of a method in 1.5 hours changing the code according to some verbose instruction. It is more of an art to design such tutorials, and no one can really master it. :) Even if the task is well-designed, one may not succeed performing it due to technical reasons: during Carsten Rother's tutorial, Tanya and me spent half an hour to spot the bug caused by confusing input and index variable names (MATLAB is still dynamically typed). Ondrej Chum once mentioned how his tutorial was doomed since half of the students did not know how to work with sparse matrices. So, practical sessions are hard.


There was also a poster session, but I cannot remember a lot of bright works, unfortunately. Nataliya Shapovalova who won the best poster award, presented quite interesting work on action recognition, which I liked as well (and it is not the last name bias! :) My congratulations to Natasha!


The planned social events were not so exhaustive as in Cambridge, but self-organization worked out. The most prominent example was our overnight walk around Moscow, in which a substantial part of school participants took part. It included catching last subway train, drinking whiskey and gin, a game of guessing hallucinating names of each other, and moving a car from the tram rail to let the tram go in the morning. :) I also met some of OpenCV developers from Nizhny Novgorod there.




MSCVS is a one-time event, unfortunately. There are at least three annual computer vision summer schools in Europe: ICVSS (the most mature one, I attended it last year), CVML (held in France by INRIA) and VSSS (includes sport sessions besides the lectures, held in Zürich). If you are a PhD student in vision (especially in the beginning of your program), it is worth attending one of them each year to keep up with current trends in the vision community, especially if you don't go to the major conferences. The sets of topics (and even speakers!) have usually large intersection, so pick one of them. ICVSS has arguably the most competitive participant selection, but the application deadline and acceptance notification are in March, so one can apply to the other schools if rejected.

Rabu, 30 Maret 2011

[CFP] GraphiCon-2011 and MSCVSS

Our lab traditionally organizes GraphiCon, the major (and may be the only) conference in Russia specialized in computer graphics and computer vision. The conference is not very selective, still it has a decent community of professionals behind it. So, if you want to get a quality feedback on your preliminary work, or always wanted to visit Russia, consider submitting a paper by May, 16.

Special offer for young readers of my blog: If it is the place where you learned about the conference and decided to submit a paper, I'll buy you a beer during the conference. :) Consider it another social micro-event.

Another piece of news is primarily for undergrads and PhD students from Russia. Our lab and Microsoft Research organize another event this summer: Computer vision summer school. There are such lecturers as Cristoph Lampert and Andrew Zisserman, among others. Participation is free of charge, and accommodation is provided. Deadline is on April, 30. You should not miss it!

Kamis, 16 September 2010

LidarK 2.0 released

The second major release of GML LidarK is now available. It reflects our 3-year experience on 3D data processing. The description from the project page:

The LidarK library provides an open-source framework for processing multidimensional point data such as 3D LIDAR scans. It allows building a spatial index for performing fast search queries of different kinds. Although it is intended to be used for LIDAR scans, it can be helpful for a wide range of problems that require spatial data processing.

The API has been enriched with various features in this release. Indeed, it became more consistent and logical. New ways to access data (i.e. various iterators) are implemented. One can now find k nearest neighbours for any point, not just for one that belongs to the index. Since the data structure is a container, we've decided to parametrize it with template parameter. This decision is controversive: one does not need to cast tuples any more, but the code became clumsier and less portable, unfortunately.

The C++/MATLAB code is licensed free of charge for academic use. You can download it here.

Sabtu, 11 September 2010

ECCV 2010 highlights

Today is the last day of ECCV 2010, so the best papers are already announced. Although I was not there, a lot of papers are available via cvpapers, so I eventually run into some of them.

Best papers

The full list of the conference awards is available here. The best paper is therefore "Graph Cut based Inference with Co-occurrence Statistics" by Lubor Ladicky, Chris Russell, Pushmeet Kohli, and Philip Torr. The second best is "Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics" by Abhinav Gupta, Alyosha Efros, and Martial Hebert. Two thoughts before starting reviewing them. First, the papers come from the two institutions which are (arguably) considered now the leading ones in the vision community: Microsoft Research and Carnegie-Mellon Robotics Institute. Second, both papers are about semantic segmentation (although the latter couples it with implicit geometry reconstruction); Vidit Jain already noted the acceptance bias in favour of the recognition papers.

Okay, the papers now. Ladicky et al. addressed the problem of global terms in the energy minimization for semantic segmentation. Specifically, their global term deals only with occurrences of object classes and invariant to how many connected components (i.e. objects) or individual pixels represent the class. Therefore, one cow on an image gives the same contribution to the global term as two cows, as well as one accidental pixel of a cow. The global term penalizes big quantity of different categories in the single image (the MDL prior), which is helpful when we are given a large set of possible class labels, and also penalizes the co-occurrence of the classes that are unlikely to come along with each other, like cows and sheep. Statistics is collected from the train set and defines if the co-occurrence of the certain pair of classes should be encouraged or penalized. Although the idea of incorporating co-occurrence to the energy function is not new [Torralba et al, 2003; Rabinovich et al., 2007], the authors claim that their method is the first one which simultaneously satisfy the four conditions: global energy minimization (implicit global term rather than a multi-stage heuristic process), invariance to the structure of classes (see above), efficiency (not to make the model order of magnitude larger) and parsimony (MDL prior, see above).

How do the authors minimize the energy? They restrict the global term to the function of the set of classes represented in the image, which is monotonic w.r.t. argument set enclosing (more classes, more penalty). Then the authors introduce some fake nodes for αβ-swap or α-expansion procedure, so that the optimized energy remains submodular. It is really similar to how they applied graph-cut based techniques for minimizing energy with higher-order cliques [Kohli, Ladicky and Torr, 2009]. So when you face some non-local terms in the energy, you can try something similar.

What are the shortcomings of the method? It would be great to penalize objects in addition to classes. First, local interactions are taken into account, as well as global ones. But what about the medium level? Shape of an object, size, colour consistency etc. are the great cues. Second, on the global level only inter-class co-occurrences play a role, but what about the intra-class ones? It is impossible to have two suns in a photo, but it is likely to meet several pedestrians walking along the street. It is actually done by Desai et al. [2009] for object detection.

The second paper is by Gupta et al., who have remembered the romantic period of computer vision, when the scenes composed of perfect geometrical shapes were reconstructed successfully. They address the problem of 3D reconstruction by a single image, like in auto pop-up [Hoiem, Efros and Hebert, 2005]. They compare the result of auto pop-up with Potemkin villages: "there is nothing behind the pretty façade." (I believe this comparison is the contribution of the second author). Instead of surfaces, they fit boxes into the image, which allows them to put a wider range of constraints to the 3D structure, including:
  • static equilibrium: it seems that the only property they check here is that centroid is projected into the figure bearing;
  • enough support force: they estimate density (light -- vegetation, medium -- human, heavy -- buildings) and say that it is unlikely that building is build on the tree;
  • volume constraint: boxes cannot intersect;
  • depth ordering: backprojecting the result to the image plane should correspond to what we see on the image.
This is a great paper that exploits Newtonian mechanics as well as human intuition, however, there are still some heuristics (like the density of a human) which could probably be generalized out. It seems that this approach has a big potential, so it might became the seminal paper for the new direction. Composing recognition with geometry reconstruction is quite trendy now, and this method is ideologically simple but effective. There are a lot of examples how the algorithm works on the project page.

Funny papers

There are a couple of ECCV papers which have fancy titles. The first one is "Being John Malkovich" by Ira Kemelmacher-Shlizerman, Aditya Sankar, Eli Shechtman, and Steve Seitz from the University of Washington GRAIL. If you've seen the movie, you can guess what is the article about. Given the video of someone pulling faces, the algorithm transforms it to the video of John Malkovich making similar faces. "Ever wanted to be someone else? Now you can." In contrast to the movie, in the paper not necessarily John Malkovich plays himself: it could be George Bush, Cameron Diaz, John Clooney and even any person for whom you can find a sufficient video or photo database! You can see the video of the real-time puppetry on the project page, although obvious lags take place and the result is still far from being perfect.

Another fancy title is "Building Rome on a Cloudless Day". There are 11 (eleven) authors contributing to the paper, including Marc Pollefeys. This summer I spent one cloudless day in Rome, and, to be honest, it was not that pleasant. So, why is the paper called this way then? The paper refers to another one: "Building Rome in a Day" from ICCV 2009 by the guys from Washington again, which itself refers to the proverb "Rome was not built in a day." In this paper authors build a dense 3D model of some Rome sights using a set of Flickr photos tagged "Rome" or "Roma". Returning back to the monument of collective intelligence from ECCV2010, they did the same, but without cloud computing, that's why the day is cloudless now. S.P.Q.R.

I cannot avoid to mention here the following papers, although they are not from ECCV. Probably the most popular CVPR 2010 paper is "Food Recognition Using Statistics of Pairwise Local Features" by Shulin Yang, Mei Chen, Dean Pomerleau, Rahul Sukthankar. The first page of the paper contains the motivation picture with a hamburger, and it looks pretty funny. They insist that the stuff from McDonald's is very different from that from Burger King, and it is really important to recognize them to keep track of the calories. Well, the authors don't look overweight, so the method should work.

The last paper in this section is "Paper Gestalt" by the imaginary Carven von Bearnensquash, published in Secret Proceedings of Computer Vision and Pattern Recognition (CVPR), 2010. The authors (presumably from UCSD) make fun of the way we usually write computer vision papers assuming that some features might convince a reviewer to accept or reject the paper, like mathematical formulas that create an illusion of author qualification (although if they are irrelevant), ROC curves etc. It also derides the attempts to apply black-box machine-learning techniques without the appropriate analysis of the possible features. Now I am trying to subscribe to the Journal of Machine Learning Gossip.

Colleagues

There was only one paper from our lab at the conference: "Geometric Image Parsing in Man-Made Environments" by Olga Barinova and Elena Tretiak (in co-authorship with Victor Lempitsky and Pushmeet Kohli). The scheme similar to the image parsing framework [Tu et al., 2005] is utilized, i.e. top-down analysis is performed. They detect parallel lines (like edges of buildings and windows), their vanishing points and the zenith jointly, using a witty graphical model. The approach is claimed to be robust to the clutter in the edge map.

Indeed, this paper could not have been possible without me. =) It was me who convinced Lena to join the lab two years ago (actually, it was more like convincing her not to apply for the other lab). So, the lab will remember me at least as a decent selectioner/scout...

Rabu, 24 Februari 2010

Olga's CVPR paper

I'd like to congratulate my colleague Olga Barinova on the paper, which is accepted to CVPR international conference! To my knowledge, it is the first paper from our lab, which is accepted to a Rank 1 conference (though it was written during Olga's internship at Microsoft Research).

The paper is on multiple object retrieval. They extend the Hough transform with some graphical model, which makes the results more robust. As soon as the paper become publicly available, I'll add the link here.

UPD. (Apr 20, 2010) PDF

UPD (Jul 30, 2010) Video

Sabtu, 23 Januari 2010

On image labelling

Labelling data is a labourous side task that arises in most computer vision projects. Since the developers usually don't want to spend their time for such a dumb work, there exist a number of workarounds. Let me enumerate some I've heard of:
  1. At Academia, the task of labelling is usually being endured on [PhD] students' broad shoulders. The funny part is the students are not always enrolled in the relevant project. At Graphics & Media Lab, students who have not attended enough seminars by the time of revision, should label some data sets for the lab projects.
  2. One could also hire some people to label her data. Since the developers/researchers are relatively high-paid, it is economic to hire other folks (sometimes, they are students as well). UPDATE: hr0nix mentioned in the comment that there exists the Mechanical Turk service that helps requesters to find contractors.
  3. The more witty way is to use applied psychology. For example, Google transformed the labelling process to the game. During the gameplay, you and your randomly chosen partner tag images. Sooner you tag an image with the same tag, more points you get. The brilliant idea! Believe or not, when I first saw it, I was carried away and could not stop playing until my friends dragged me out for a pizza!
  4. The most revolutionary approach was introduced by Densey Tan. Here is a popular explanation of what he has done. The idea is to capture labels straight from one's brain using EEG/fMRI/whatnot. Now they can perform only 2 or 3 class labelling, but (I hope) it is only the beginning.
The last point reminds me my old thoughts about the future of machine learning (or at least ML applied to Vision). Nowadays we deal with ensembles of weak classifiers, such as decision trees, stamps etc. One can use guinea pigs as weak classifiers! I suppose their brain is developed enough to understand 3d structure of the scene in the way human brain does, while modern computer vision systems lack for this ability. The animals are to be learned, for example, by experiencing an electric shock in case of wrong answers. Now, it is not obligatory to train "experts", it is sufficient to analyse their brain activity. Isn't it a breakthrough? :)

Sabtu, 26 Desember 2009

Science vs. Industry

Yesterday I was called by an HR officer from NetCracker, my former employer. She told me they "have a lot of new activities", so now they "fulfil repeated recruiting" and suggested to return to NetCracker. It is really funny, because 1.5 years ago they told me that according to the corporate policy it was impossible to employ somebody who had left the company once. I politely rejected them, but the talk refreshed my memories about previous hesitations.

When I switched the company to Graphics & Media Lab, it was a hard decision. That time I firmly decided to prefer scientific career over industrial. I considered a lot of pros and cons to make the decision. Although they are often personal, I'd like to share my observations with you.

Advantages of research work:
  • When you work as a researcher, you invent something new instead of just coding different stuff. This job is creative, not just constructive.
  • When you work as a programmer, you result is just a code. You don't publish papers, no one knows about your results. Moreover, they are often proprietorial.
  • If you work as an office worker or a manager, you could be paid a lot of money, but you are supposed to work 24x7. It really sucks because you do really annoying paperwork or participate in meetings, but do not produce anything substantial. If you have a lot of money, you have no time or desire to spend them to anything interesting. That's the reason why my roommate Andrey Korolev left Shell.
  • When you work as a scientist, you should not learn all the boring technologies. You can't be a prominent Java programmer without knowing heaps of jXxx libraries, a number of frameworks and without having some useless certificates. Besides, your knowledge could be narrow, they are restricted by your employer's needs.
  • Scientists are mobile: they move from one university to another. Some universities do not prolong contracts with professors, even if they are very cool. For me, it seems boring to live all the life in one place. (Though you could be sent to business trips while you work at some big corporation)
  • If you work at university, you might have a teaching experience of some kind, which is useful for you, if you use it properly.
Advantages of industrial work:
  • The main advantage is a good salary. :) If you are not a complete geek, it does matter. Science is usually funded by grants, which is unstable.
  • Since your employer needs profit, you do something practical. You can be sure that you don't investigate some abstract stuff that will never be applied.
Trust or not, our lab combines advantages of both directions. We usually work on projects ordered by customers who pay for it. Of course, we select only those projects, in which we can get some scientific results. Actually, it is done automatically, because it is not profitable for a customer to pay us for coding. That's how applied science should work, I think.
 

ZOOM UNIK::UNIK DAN UNIK Copyright © 2012 Fast Loading -- Powered by Blogger