Practical # 9
Vision processing
Language processing
Forecasting things like stock market trends, weather
Pattern recognition
Games
Data mining
Expert systems
Robotics
Concept of learning
Learning is the process of converting experience into expertise or knowledge.
Learning can be broadly classified into four categories, as listed below, based on the
nature of the learning data and the interaction between the learner and the environment.
Supervised Learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning
Supervised Learning
Supervised learning is commonly used in real-world applications, such as face and speech
recognition, product or movie recommendations, and sales forecasting.
Supervised learning can be further classified into two types - Regression and Classification.
o Regression trains on and predicts a continuous-valued response, for example
predicting real estate prices.
o Classification predicts a discrete class label, for example positive or negative
sentiment, male or female, benign or malignant tumors, or secured or unsecured
loans.
In supervised learning, the learning data comes with descriptions, labels, targets, or desired
outputs, and the objective is to find a general rule that maps inputs to outputs. This kind of
learning data is called labeled data. The learned rule is then used to label new data with
unknown outputs.
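The idea of learning a rule from labeled data and then applying it to new data can be sketched with scikit-learn as follows. This is only an illustration: the choice of the bundled breast cancer dataset, the nearest-neighbors classifier, and the 75/25 split are assumptions made for the example, not something this practical prescribes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labeled data: each row of X describes one tumor, y is its label
# (0 = malignant, 1 = benign in this dataset).
X, y = load_breast_cancer(return_X_y=True)

# Hold out a quarter of the samples to stand in for "new data with unknown outputs".
X_train, X_new, y_train, y_new = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Learn a rule that maps inputs to labels from the labeled training samples.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

# Apply the learned rule to the held-out samples and compare with their true labels.
predicted = clf.predict(X_new)
accuracy = clf.score(X_new, y_new)
print("Fraction of held-out samples labeled correctly:", accuracy)
```

Because the held-out labels are known here, the accuracy gives a rough idea of how well the rule generalizes to data it has not seen.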
Example
o Supervised learning involves building a machine learning model that is based
on labeled samples. For example, if we build a system to estimate the price of a
plot of land or a house based on various features, such as size, location, and so on,
we first need to create a database and label it. We need to teach the algorithm
what features correspond to what prices. Based on this data, the algorithm will
learn how to calculate the price of real estate using the values of the input
features.
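A minimal sketch of the real estate example, assuming scikit-learn and NumPy are installed. The two features (size and distance to the city centre), the pricing rule used to generate labels, and all numbers are made up for illustration; a real system would use an actual labeled database.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical labeled database: size in square metres and distance to the centre in km.
size = rng.uniform(50, 250, 200)
distance = rng.uniform(1, 30, 200)
X = np.column_stack([size, distance])

# Made-up pricing rule plus noise, used only to generate the labels (prices).
price = 1500 * size - 4000 * distance + rng.normal(0, 10000, 200)

# Teach the algorithm which feature values correspond to which prices.
model = LinearRegression()
model.fit(X, price)

# Estimate the price of a property that is not in the database.
new_house = np.array([[120.0, 5.0]])
estimate = model.predict(new_house)[0]
print("Learned coefficients (size, distance):", model.coef_)
print("Estimated price for the new house:", estimate)
```

The learned coefficients should land close to the values used to generate the data, which is what "learning how to calculate the price from the input features" means in this simplified setting.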
Unsupervised Learning
Unsupervised learning is used to detect anomalies and outliers, such as fraud or defective
equipment, or to group customers with similar behaviors for a sales campaign. Unlike
supervised learning, it works without labeled data.
When learning data contains only some indications without any description or labels, it is up
to the coder or to the algorithm to find the structure of the underlying data, to discover hidden
patterns, or to determine how to describe the data. This kind of learning data is
called unlabeled data.
Suppose that we have a number of data points and want to divide them into several
groups. We may not know exactly what the grouping criteria should be. So, an
unsupervised learning algorithm tries to partition the given dataset into a certain number of
groups in an optimal way.
Unsupervised learning algorithms are useful tools for analyzing data and for
identifying patterns and trends. They are most commonly used for clustering similar inputs
into logical groups. Unsupervised learning algorithms include k-means, hierarchical
clustering, DBSCAN and so on.
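As a small sketch of clustering, the code below groups unlabeled two-dimensional points with k-means. The three synthetic blobs and the choice of three clusters are assumptions made for the example; in practice the number of groups is often not known in advance.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Unlabeled data: 50 points scattered around each of three hypothetical centres.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(0, 0.5, size=(50, 2)) for c in centers])

# Ask k-means to partition the points into three groups; no labels are supplied.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("Cluster assigned to the first point of each blob:", labels[[0, 50, 100]])
print("Cluster centres found:")
print(kmeans.cluster_centers_)
```

The cluster numbers themselves are arbitrary; what matters is that points generated around the same centre end up with the same number.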
Semi-supervised Learning
If some learning samples are labeled and others are not, the task is semi-supervised
learning. It typically combines a small amount of labeled data with a large amount of
unlabeled data during training.
Semi-supervised learning is applied in cases where acquiring a fully labeled dataset is
expensive but labeling a small subset is practical. For example, labeling certain remote
sensing images often requires skilled experts, and locating oil at a particular site requires
many field experiments, while acquiring unlabeled data is relatively easy.
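One way to try this with scikit-learn is label spreading, sketched below on the bundled iris dataset. Hiding four out of every five labels is an assumption made to simulate an expensive labeling process; scikit-learn marks unlabeled samples with -1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Simulate expensive labeling: keep the label of every fifth sample only.
y_partial = np.full_like(y, -1)          # -1 means "unlabeled"
labeled = np.arange(0, len(y), 5)
y_partial[labeled] = y[labeled]
unlabeled = y_partial == -1

# Train on labeled and unlabeled samples together.
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)

# Compare the labels inferred for the unlabeled samples with the hidden true labels.
inferred = model.transduction_[unlabeled]
accuracy = (inferred == y[unlabeled]).mean()
print("Labeled samples used:", len(labeled))
print("Accuracy on the samples whose labels were hidden:", accuracy)
```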
Reinforcement Learning
Here the learner receives feedback from its environment, so that the system adjusts to
dynamic conditions in order to achieve a certain objective. The system evaluates its
performance based on the feedback and reacts accordingly. The best-known instances
include self-driving cars and the Go-playing program AlphaGo.
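The feedback loop can be illustrated with a deliberately simplified setting, a multi-armed bandit with an epsilon-greedy strategy, written with NumPy only. The three actions, their hidden reward probabilities, and the exploration rate are hypothetical values chosen for the sketch; systems such as AlphaGo are far more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: three actions with hidden probabilities of giving a reward.
true_reward_prob = np.array([0.2, 0.5, 0.8])
n_actions = len(true_reward_prob)

value_estimate = np.zeros(n_actions)        # running average reward per action
times_chosen = np.zeros(n_actions, dtype=int)
epsilon = 0.1                               # fraction of steps spent exploring

for step in range(5000):
    # Explore a random action occasionally, otherwise exploit the best estimate so far.
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))
    else:
        action = int(np.argmax(value_estimate))

    # Feedback from the environment: reward 1.0 or 0.0.
    reward = float(rng.random() < true_reward_prob[action])

    # Adjust the estimate for the chosen action using the feedback.
    times_chosen[action] += 1
    value_estimate[action] += (reward - value_estimate[action]) / times_chosen[action]

best_action = int(np.argmax(value_estimate))
print("Estimated value of each action:", value_estimate)
print("Action the learner prefers:", best_action)
```

The learner is never told which action is best; it discovers this from the rewards it receives.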
Scikit-learn
o Simple and efficient tools for data mining and data analysis
o Accessible to everybody, and reusable in various contexts
o Built on NumPy, SciPy, and matplotlib
o Open source, commercially usable - BSD license
o Scikit-learn is a machine learning library for Python. It features several
regression, classification and clustering algorithms including SVMs, gradient
boosting, k-means, random forests and DBSCAN.
Scikit-learn features
o Classification
Identifying which category an object belongs to.
Applications: Spam detection, Image recognition.
Algorithms: SVM, nearest neighbors, Naïve Bayes
o Regression
Predicting a continuous-valued attribute associated with an object.
Applications: Drug response, Stock prices.
Algorithms: SVR, ridge regression, Lasso
o Clustering
Automatic grouping of similar objects into sets.
Applications: Customer segmentation, Grouping experiment outcomes
Algorithms: k-Means, spectral clustering, mean-shift
o Dimensionality reduction
Reducing the number of random variables to consider.
Applications: Visualization, Increased efficiency
Algorithms: PCA, feature selection, non-negative matrix factorization.
o Model selection
Comparing, validating and choosing parameters and models.
Goal: Improved accuracy via parameter tuning
Modules: grid search, cross validation, metrics.
o Preprocessing
Feature extraction and normalization.
Application: Transforming input data such as text for use with machine
learning algorithms.
Modules: preprocessing, feature extraction.
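Several of the features listed above can be combined in one short script. The sketch below chains preprocessing (StandardScaler), dimensionality reduction (PCA), and classification (SVC) in a pipeline, and uses grid search with cross validation for model selection. The digits dataset, the 20 principal components, and the candidate values for C are assumptions picked for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small images of handwritten digits, flattened to 64 features each.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing -> dimensionality reduction -> classification.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA(n_components=20)),
    ("classify", SVC()),
])

# Model selection: try several values of the SVM parameter C with 3-fold cross validation.
param_grid = {"classify__C": [0.1, 1, 10]}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)

# Metrics: accuracy of the best pipeline on data not used for training or tuning.
test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print("Accuracy on the held-out test set:", test_accuracy)
```

Putting the scaler and PCA inside the pipeline means they are refit on each cross-validation fold, so the held-out fold does not influence the preprocessing.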