Machine Learning
A Tutorial
Ajay Joshi, Anoop Cherian and Ravishankar Shivalingam
Dept. of Computer Science, UMN
Outline
Introduction
Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Constrained Clustering
Distance Metric Learning
Manifold Methods in Vision
Sparsity-based Learning
Active Learning
Machine Learning uses statistical reasoning to find approximate solutions to such difficult problems.
Unsupervised Learning
K-Means / Dirichlet Processes / Gaussian Processes
Semi-Supervised Learning
The latest trend in ML and the focus of this tutorial.
Supervised Learning
Uses labeled training data to learn a model of the data. The learned model is then used to make predictions on test data.
Traditional supervised learning techniques:
Generative Methods
o Naïve Bayes Classifier
o Artificial Neural Networks
o Principal Component Analysis followed by classification, etc.
Discriminative Methods
o Support Vector Machines
o Linear Discriminant Analysis, etc.
Use EM on the training data to find the topic distributions P(w|z) and P(z|d), as in pLSA. Train a discriminative classifier (an SVM) on P(z|d) and classify test images.
The number of topic categories might not be available (as in the scene classification case mentioned earlier) or might increase with more data.
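To make the pipeline concrete, here is a minimal sketch, assuming a document-word count matrix built from quantized visual words; the function names, iteration counts, and the SVM training call are illustrative, not the authors' implementation:

    import numpy as np
    from sklearn.svm import SVC

    def plsa(counts, n_topics, n_iters=100, seed=0):
        # EM for pLSA on an (n_docs x n_words) count matrix n(d, w).
        rng = np.random.default_rng(seed)
        n_docs, n_words = counts.shape
        p_w_z = rng.random((n_topics, n_words))          # P(w|z)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = rng.random((n_docs, n_topics))           # P(z|d)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
        for _ in range(n_iters):
            # E-step: responsibilities P(z|d,w) proportional to P(z|d) * P(w|z).
            joint = p_z_d[:, :, None] * p_w_z[None, :, :]
            joint /= joint.sum(axis=1, keepdims=True) + 1e-12
            # Weight responsibilities by the observed counts n(d, w).
            expected = counts[:, None, :] * joint
            # M-step: renormalize expected counts into distributions.
            p_w_z = expected.sum(axis=0)
            p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
            p_z_d = expected.sum(axis=2)
            p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
        return p_w_z, p_z_d

    # Train an SVM on the per-document topic proportions P(z|d):
    # p_w_z, p_z_d = plsa(train_counts, n_topics=20)
    # clf = SVC().fit(p_z_d, train_labels)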
Unsupervised Learning
The learner is provided only unlabeled data; no feedback is provided from the environment. The aim of the learner is to find patterns in data that would otherwise be observed as unstructured noise.
Commonly used UL techniques:
Dimensionality reduction (PCA, pLSA, ICA, etc.).
Clustering (K-Means, mixture models, etc.).
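For illustration, a minimal scikit-learn sketch of both families on a toy dataset; the dataset and all parameter values are illustrative assumptions:

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    X, _ = load_digits(return_X_y=True)   # 64-dimensional raster-scanned digits

    # Dimensionality reduction: project onto the top 10 principal components.
    X_low = PCA(n_components=10).fit_transform(X)

    # Clustering: group the reduced vectors into 10 clusters, no labels used.
    cluster_ids = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_low)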
What is SSL?
As the name suggests, SSL lies between supervised and unsupervised learning with respect to the amount of labeled and unlabeled data required for training. The goal is to reduce the amount of supervision required compared to supervised learning, while at the same time improving the results of unsupervised clustering to meet the expectations of the user.
Manifold assumption:
The data lie on a low-dimensional manifold in the ambient space.
o Helps against the curse of dimensionality.
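A minimal sketch of SSL under the manifold assumption, using scikit-learn's graph-based label spreading; the dataset, kernel, and parameter choices are illustrative assumptions:

    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.semi_supervised import LabelSpreading

    X, y = make_moons(n_samples=300, noise=0.1, random_state=0)
    y_partial = np.full_like(y, -1)       # -1 marks an unlabeled point
    y_partial[:10] = y[:10]               # reveal only a handful of labels

    # Labels propagate along a k-NN graph of the data (manifold assumption).
    model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)
    accuracy = (model.transduction_ == y).mean()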
Constrained Clustering
When we have any of the following:
Class labels for a subset of the data.
Domain knowledge about the clusters.
Information about the similarity between objects.
User preferences.
The data can be clustered by searching for partitionings that respect the constraints. The recent trend is toward similarity-based approaches.
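As one concrete example of constraint-respecting partitioning, here is a minimal COP-KMeans-style sketch (Wagstaff et al.), assuming must-link/cannot-link constraints given as index pairs; helper names and parameters are illustrative:

    import numpy as np

    def violates(i, c, labels, must_link, cannot_link):
        # Cluster c is infeasible for point i if a must-link partner is
        # already placed elsewhere, or a cannot-link partner is already in c.
        for a, b in must_link:
            j = b if a == i else (a if b == i else None)
            if j is not None and labels[j] != -1 and labels[j] != c:
                return True
        for a, b in cannot_link:
            j = b if a == i else (a if b == i else None)
            if j is not None and labels[j] == c:
                return True
        return False

    def cop_kmeans(X, k, must_link, cannot_link, n_iters=20, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
        for _ in range(n_iters):
            labels = np.full(len(X), -1)
            for i, x in enumerate(X):
                # Assign each point to the nearest feasible center.
                for c in np.argsort(((centers - x) ** 2).sum(axis=1)):
                    if not violates(i, c, labels, must_link, cannot_link):
                        labels[i] = c
                        break
                if labels[i] == -1:
                    raise ValueError("constraints cannot all be satisfied")
            for c in range(k):
                # Recompute each center as the mean of its assigned points.
                if np.any(labels == c):
                    centers[c] = X[labels == c].mean(axis=0)
        return labels, centers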
(Figure: two alternative partitionings, A and B, of the same data.)
Constrained Clustering
(Figure: a learned distance D(x, y), with the data shown in the transformed space.)
Generating constraints
Active feedback from the user, querying only the most informative instances.
Spatial and temporal constraints from video sequences.
For content-based image retrieval (CBIR), derived from annotations provided by users.
Curse of Dimensionality
In many applications, we simply vectorize an image or image patch by a raster scan: a 256 x 256 image becomes a 65,536-dimensional vector. Images are therefore typically very high-dimensional data.
Volume, and hence the number of points required to uniformly sample a space, increases exponentially with dimension. This affects the convergence of any learning algorithm.
In some applications, we know that there are only a few underlying variables, e.g., face pose and illumination: the data lie on some low-dimensional subspace/manifold in the high-dimensional space.
The curse of dimensionality can be mitigated under the manifold assumption.
Linear dimensionality reduction techniques like PCA have been widely used in the vision community.
The recent trend is towards non-linear techniques that recover the intrinsic parameterization (pose & illumination).
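A minimal sketch contrasting linear PCA with the non-linear embeddings illustrated below, on the standard Swiss-roll toy data; neighborhood sizes and other parameters are illustrative assumptions:

    from sklearn.datasets import make_swiss_roll
    from sklearn.decomposition import PCA
    from sklearn.manifold import LocallyLinearEmbedding, Isomap

    X, color = make_swiss_roll(n_samples=1500, random_state=0)

    # Linear projection: PCA cannot "unroll" the manifold.
    X_pca = PCA(n_components=2).fit_transform(X)

    # Non-linear embeddings recover the intrinsic 2-D parameterization.
    X_lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2).fit_transform(X)
    X_iso = Isomap(n_neighbors=12, n_components=2).fit_transform(X)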
LLE Embedding
Picture Courtesy: Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds (2003) by L.K. Saul & S.T. Roweis
ISOMAP Embedding
Picture Courtesy: A Global Geometric Framework for Nonlinear Dimensionality Reduction, by J. B. Tenenbaum, V. de Silva, J. C. Langford, in Science Magazine, 2000
LTSA Embedding
Picture Courtesy: Principal Manifolds and Nonlinear Dimension Reduction via Local Tangent Space Alignment (2002), by Z. Zhang & H. Zha
Here the model is y = Ba: each vector y is a vectorized image patch, B is a matrix whose columns are the basis vectors of the dictionary, and the vector a holds the weights on each basis element of the dictionary.
Model the labeled images using this dictionary to obtain the sparse weights a.
Train a classifier/regressor on the weights a.
Project the test data onto the same dictionary and perform classification/regression using the learned model.
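A minimal sketch of this sparse-coding pipeline using scikit-learn; the stand-in data, dictionary size, and penalty values are illustrative assumptions, not the authors' setup:

    import numpy as np
    from sklearn.decomposition import DictionaryLearning, SparseCoder
    from sklearn.svm import LinearSVC

    patches = np.random.rand(300, 64)         # stand-in for vectorized patches y
    labels = np.random.randint(0, 2, 300)     # stand-in for their class labels

    # Learn a dictionary B with an L1 sparsity penalty on the codes a.
    dico = DictionaryLearning(n_components=128, alpha=1.0, max_iter=100)
    dico.fit(patches)

    # Encode each patch as sparse weights a: min ||y - Ba||^2 + alpha * ||a||_1.
    coder = SparseCoder(dictionary=dico.components_,
                        transform_algorithm="lasso_lars")
    codes = coder.transform(patches)

    # Train a classifier on the sparse codes; encode test data the same way.
    clf = LinearSVC().fit(codes, labels)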
Some sample images of the floor in the lab setting, taken from the base camera of a helicopter at different heights.
A dictionary of 350 basis vectors is built using L1 minimization.
(Figure: original image alongside the reconstructed 3D image.)
Picture Courtesy: Sparse Representation for Computer Vision and Pattern Recognition (Wright et al., 2009)
Active Learning
A motivating example: Given an image or a part of it, classify it into a certain category! Challenges to be tackled:
Large variations in images.
What is important in a given image? Humans are often the judge: very subjective!
A lot of training is generally required for accurate classification. Varied scene conditions, such as lighting and weather, need further training.
Active Learning
Basic idea:
Traditional supervised learning algorithms passively accept training data.
Instead, an active learner queries for annotations on the most informative images in the unlabeled data.
Theoretical results show that large reductions in training-set sizes can be obtained with active learning!
But how do we find the images that are the most informative?
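One common answer is uncertainty sampling: query the images the current classifier is least confident about. A minimal sketch, with the pool and oracle names as illustrative assumptions:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def query_most_informative(clf, X_pool, n_queries=10):
        # Least-confidence criterion: a low maximum class probability
        # means the classifier is unsure, so the image is informative.
        confidence = clf.predict_proba(X_pool).max(axis=1)
        return np.argsort(confidence)[:n_queries]

    # One round of active learning (X_lab, y_lab, X_pool assumed given):
    # clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    # ask = query_most_informative(clf, X_pool)
    # ...send X_pool[ask] to the human annotator, add the labels, retrain.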
Success stories
Picture Courtesy: Machine Learning Techniques for Computer Vision (ECCV 2004), C. M. Bishop
Face Detection
Viola and Jones (2001)
Picture Courtesy: Machine Learning Techniques for Computer Vision (ECCV 2004), C. M. Bishop
Face Detection
(Figure: AdaBoost rounds - weak classifier 1 is trained, weights of misclassified points are increased, then weak classifier 2 is trained.)
Picture Courtesy: Machine Learning Techniques for Computer Vision (ECCV 2004), C. M. Bishop
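To make the reweighting idea in the figure concrete, here is a minimal AdaBoost sketch with decision stumps; it is illustrative only and omits the Haar features and attentional cascade of the actual Viola-Jones detector:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost(X, y, n_rounds=50):
        # y must be in {-1, +1}. Returns the weak learners and their votes.
        n = len(X)
        w = np.full(n, 1.0 / n)                  # start with uniform weights
        stumps, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            err = w[pred != y].sum()
            if err >= 0.5:                       # no better than chance: stop
                break
            alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
            # Increase weights of misclassified examples, decrease the rest.
            w *= np.exp(-alpha * y * pred)
            w /= w.sum()
            stumps.append(stump)
            alphas.append(alpha)
        return stumps, alphas

    def predict(stumps, alphas, X):
        # Final classifier: weighted vote of all weak classifiers.
        return np.sign(sum(a * s.predict(X) for s, a in zip(stumps, alphas)))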
AdaBoost in Vision
Other Uses of AdaBoost
Human/Pedestrian Detection & Tracking
Face Expression Recognition
Iris Recognition
Action/Gait Recognition
Vehicle Detection
License Plate Detection & Recognition
Traffic Sign Detection & Recognition
Thank you!