This web-page is obsolete. Code for computing features and classifying the Caltech101 dataset is available here.
Anna Bosch and
Andrew Zisserman
Warning: the kernel matrices that were previously available from
this site were found to contain errors that artificially improved
classification performance.
We are grateful to Nicolas Pinto and Peter Gehler for alerting us to these errors.
Overview
The objective of this work is to classify images by the object categories they contain. To this end we combine shape and appearance representations over a region of interest (ROI) to learn the object model.
Challenges
We need to be able to recognize objects in images despite within-class variation (see examples below) and imaging variations such as scale, viewpoint, lighting and background. Additionally, we need to learn which descriptors are best for classifying each specific object category.
Method
For datasets where the objects can appear at different image positions and where there is significant background clutter, extracting features over the whole image is not sufficient for classification. Instead, it is necessary to 'home in' on the object instance in order to learn its visual description. We propose a method for automatically learning a rectangular ROI in each of the training images. The intuition is that, among a subset of the training images for a particular class, there will be regions with high visual similarity (the object instances). Only a subset is involved because of the variability in the training images: one instance may be similar to just a few others, not to all the other training images. These 'corresponding' regions can be separated from the clutter by measuring their similarity using an image representation defined over an ROI rather than over the entire image.
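The idea of scoring candidate rectangles by the similarity of their ROI-level representation can be sketched as follows. This is only an illustrative toy, not the method's actual optimisation: it assumes each image has already been reduced to a grid of visual-word ids, uses a plain histogram with histogram intersection as the similarity, and searches exhaustively. The function names `roi_histogram`, `intersection` and `best_roi` are hypothetical.

```python
from collections import Counter
from itertools import product


def roi_histogram(words, top, left, bottom, right):
    """L1-normalised histogram of word ids in rows [top, bottom), cols [left, right)."""
    counts = Counter(w for row in words[top:bottom] for w in row[left:right])
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def intersection(h1, h2):
    """Histogram intersection similarity between two normalised histograms."""
    return sum(min(v, h2.get(w, 0.0)) for w, v in h1.items())


def best_roi(words, reference_hist, min_size=2):
    """Exhaustively find the rectangle most similar to a reference histogram.

    Assumption: the reference histogram would come from an ROI in another
    training image of the same class.
    """
    rows, cols = len(words), len(words[0])
    best_score, best_rect = -1.0, None
    for top, left in product(range(rows), range(cols)):
        for bottom in range(top + min_size, rows + 1):
            for right in range(left + min_size, cols + 1):
                hist = roi_histogram(words, top, left, bottom, right)
                score = intersection(hist, reference_hist)
                if score > best_score:
                    best_score, best_rect = score, (top, left, bottom, right)
    return best_score, best_rect


# Toy image: word 0 is clutter, word 1 marks the object.
toy_words = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
toy_score, toy_rect = best_roi(toy_words, {1: 1.0})
```

On the toy grid the search settles on the rectangle that covers only the object cells, which is the behaviour the ROI representation is meant to encourage. A real system would need a cheaper search than this exhaustive loop.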
Many descriptors have been proposed in the literature, and their classification performance depends on the specific task. For example, shape descriptors are informative when distinguishing between cars and airplanes, while appearance descriptors are better for distinguishing between horses and zebras. Thus no single descriptor can be optimal for all tasks. We investigate the problem of learning the optimal descriptors for a given classification task, and pose the selection of the best object descriptors in the kernel learning framework. The kernel that we propose is a weighted sum of base kernels, each corresponding to a base feature, where the optimal feature weights are learnt separately for each class to minimize the classification error rate on a validation set.
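A minimal sketch of the weighted kernel combination for a single class is given below, assuming two base kernels (shape and appearance) and a one-vs-rest labelling. The grid search over a single weight `alpha` and the nearest-neighbour decision rule are stand-ins chosen to keep the example self-contained; they are assumptions, not the classifier or optimiser used in this work, and the function names are hypothetical.

```python
def combine_kernels(kernels, weights):
    """Weighted sum of equally sized kernel matrices given as lists of rows."""
    rows, cols = len(kernels[0]), len(kernels[0][0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(cols)]
        for i in range(rows)
    ]


def validation_error(k_val_train, train_labels, val_labels):
    """Error rate when each validation item copies the label of its most
    similar training item (a simple stand-in for a kernel classifier)."""
    errors = 0
    for row, truth in zip(k_val_train, val_labels):
        nearest = max(range(len(row)), key=row.__getitem__)
        errors += train_labels[nearest] != truth
    return errors / len(val_labels)


def learn_weight(k_shape, k_app, train_labels, val_labels, steps=10):
    """Grid-search alpha in [0, 1] for K = alpha * K_app + (1 - alpha) * K_shape,
    keeping the value with the lowest validation error."""
    best_alpha, best_err = 0.0, float("inf")
    for s in range(steps + 1):
        alpha = s / steps
        k = combine_kernels([k_app, k_shape], [alpha, 1 - alpha])
        err = validation_error(k, train_labels, val_labels)
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err


# Made-up toy kernels (validation rows x training columns); +1 is the class
# of interest, -1 is "rest". Here shape separates the data, appearance does not.
toy_train_labels = [1, -1]
toy_val_labels = [1, -1]
toy_k_shape = [[0.9, 0.1], [0.2, 0.8]]
toy_k_app = [[0.4, 0.6], [0.7, 0.3]]
toy_alpha, toy_err = learn_weight(toy_k_shape, toy_k_app,
                                  toy_train_labels, toy_val_labels)
```

Repeating the search once per class, each with its own one-vs-rest labels, gives a separate weight for every category, so a class that is best described by shape can end up with a different `alpha` from one that is best described by appearance.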
Related categories are the ones most often confused by our system, and sometimes by humans too. We extend the system to automatically learn a category hierarchy. The highest levels classify amongst the most easily separable categories, grouping related ones into a single category. For example, the ketch and schooner categories will both be classified as a sail boat. Then, at a lower level, we learn the specific feature weights needed to distinguish amongst these more closely related categories.
Object ambiguities for Caltech 256.
Ambiguous annotations for Caltech 256.
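One simple way to obtain the grouping of related categories described above is to merge classes that a flat classifier frequently confuses with each other. The sketch below is an assumption about how such a grouping could be built, not the procedure used in this work: it takes a row-normalised confusion matrix, merges any pair whose mutual confusion reaches a threshold, and returns the resulting groups. The function name `group_confused`, the threshold and the toy confusion values are all hypothetical.

```python
def group_confused(confusion, names, threshold):
    """Merge classes whose average mutual confusion rate is >= threshold.

    confusion[i][j] is the fraction of class-i images predicted as class j.
    """
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            mutual = (confusion[i][j] + confusion[j][i]) / 2
            if mutual >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


# Made-up confusion rates for three categories.
toy_names = ["ketch", "schooner", "zebra"]
toy_confusion = [
    [0.6, 0.4, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]
toy_groups = group_confused(toy_confusion, toy_names, threshold=0.3)
```

Each multi-member group would act as a single category at the top level, with a second classifier and its own feature weights trained to separate the members within the group.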
Resources
Downloads from this site:
- PHOG code (page)
- PHOW code (zip file)
- code to compute the pyramid kernels (zip file)
- code for self-similarity descriptor
Datasets
Download the Caltech 101 and Caltech 256 datasets.
Relevant Publications
Bosch, A., Zisserman, A. and Munoz, X.
Representing shape with a spatial pyramid kernel
Proceedings of the International Conference on Image and Video Retrieval (2007)
Bibtex source | Abstract | Document: ps.gz PDF
Bosch, A., Zisserman, A. and Munoz, X.
Image Classification using Random Forests and Ferns
Proceedings of the 11th International Conference on Computer Vision, Rio de Janeiro, Brazil (2007)
Bibtex source | Abstract | Document: ps.gz PDF
Acknowledgements
This work is funded by the EU project CLASS.