Module 1 - Basics of ML
Machine Learning
[Figure: in machine learning, Data and the desired Output go into the Computer, which produces a Program.]
Magic?
No, more like gardening
• Seeds = Algorithms
• Nutrients = Data
• Gardener = You
• Plants = Programs
When Do We Use Machine Learning?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
Some more examples of tasks that are best solved by using a learning algorithm
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual credit card transactions
– Unusual patterns of sensor readings in a nuclear power
plant
• Prediction:
– Future stock prices or currency exchange rates
Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging software
• [Your favorite area]
ML in a Nutshell
• Tens of thousands of machine learning algorithms
• Hundreds of new ones every year
• Every machine learning algorithm has three components:
– Representation
– Evaluation
– Optimization
Representation
• Decision trees
• Sets of rules / Logic programs
• Instances
• Graphical models (Bayes/Markov nets)
• Neural networks
• Support vector machines
• Model ensembles, etc.
Various Function Representations
• Numerical functions
– Linear regression
– Neural networks
– Support vector machines
• Symbolic functions
– Decision trees
– Rules in propositional logic
– Rules in first-order predicate logic
• Instance-based functions
– Nearest-neighbor
– Case-based
• Probabilistic Graphical Models
– Naïve Bayes
– Bayesian networks
– Hidden-Markov Models (HMMs)
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence, etc.
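As a concrete illustration (not from the slides), a few of these evaluation metrics can be computed with scikit-learn; the labels and predictions below are made up:

```python
# Illustrative only: computing common evaluation metrics with scikit-learn.
from sklearn.metrics import accuracy_score, precision_score, recall_score, mean_squared_error

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # made-up ground-truth class labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # a classifier's made-up predictions

print("Accuracy: ", accuracy_score(y_true, y_pred))   # fraction of correct predictions
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)

# Squared error applies to continuous (regression-style) outputs:
y_true_reg = [2.0, 3.5, 1.0]
y_pred_reg = [2.2, 3.0, 1.1]
print("MSE:      ", mean_squared_error(y_true_reg, y_pred_reg))
```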
Optimization
• Combinatorial optimization
– E.g., greedy search
• Convex optimization
– E.g., gradient descent
• Constrained optimization
– E.g., linear programming
Various Search/Optimization Algorithms
• Gradient descent
– Perceptron
– Backpropagation
• Dynamic Programming
– HMM Learning
– PCFG Learning
• Divide and Conquer
– Decision tree induction
– Rule learning
• Evolutionary Computation
– Genetic Algorithms (GAs)
– Genetic Programming (GP)
– Neuro-evolution
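To make the gradient-descent entry above concrete, here is a minimal sketch (an illustration, not from the slides) that fits a one-variable linear model by repeatedly stepping against the gradient of the mean squared error:

```python
# Minimal gradient descent on mean squared error for y ≈ w*x + b (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 3.0 * x + 1.0 + rng.normal(0, 0.1, 100)  # synthetic data with true w=3, b=1

w, b, lr = 0.0, 0.0, 0.5
for _ in range(500):
    err = (w * x + b) - y           # residuals of the current model
    grad_w = 2 * np.mean(err * x)   # d(MSE)/dw
    grad_b = 2 * np.mean(err)       # d(MSE)/db
    w -= lr * grad_w                # step against the gradient
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # should be close to 3 and 1
```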
“Machine Learning: Field of study that gives
computers the ability to learn without being
explicitly programmed.” -Arthur Samuel (1959)
Samuel’s Checkers-Player
Defining the Learning Task
Improve on task T, with respect to performance metric P, based on
experience E
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
Autonomous Cars
Autonomous Car Technology
[Figure: Stanley, the autonomous car developed by Sebastian Thrun's team, with components such as path planning highlighted.]
Deep Belief Net on Face Images
[Figure: feature hierarchy learned by a deep belief network, from pixels to edges to object parts (combinations of edges) to object models.]
Learning of Object Parts
Training on Multiple Objects
• Trained on 4 classes (cars, faces, motorbikes, airplanes).
• Second layer: shared features and object-specific features.
Scene Labeling via Deep Learning
[Figure: input images shown alongside samples from feedforward inference (control) and samples from full posterior inference.]
Machine Learning in Automatic Speech Recognition
A Typical Speech Recognition System
Types of Learning
Based on the method and manner of learning, machine learning is divided into four main types:
• Supervised (inductive) learning
– Given: training data + desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Given: rewards from a sequence of actions
SUPERVISED LEARNING
• We train the machine using a "labelled" dataset, and based on that training, the machine predicts the output. Here, "labelled" means that each input is already mapped to its output.
• First, we train the machine with inputs and their corresponding outputs, and then we ask the machine to predict outputs for a test dataset.
• E.g., suppose we have an input dataset of cat and dog images. First, we train the machine to understand the images: the shape and size of the tail of a cat versus a dog, the shape of the eyes, colour, height (dogs are taller, cats are smaller), etc.
• After training is complete, we input a picture of a cat and ask the machine to identify the object and predict the output. Being well trained, the machine checks all the features of the object, such as height, shape, colour, eyes, ears, tail, etc., finds that it is a cat, and puts it in the cat category.
• The main goal of the supervised learning technique is to map the input variable (x) to the output variable (y). Some real-world applications of supervised learning are risk assessment, fraud detection, spam filtering, etc.
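A minimal sketch of this train-then-predict workflow, using scikit-learn and a made-up two-feature dataset (height and tail length standing in for the cat/dog image features):

```python
# Illustrative supervised learning: labelled training data -> train -> predict.
# The feature values [height_cm, tail_length_cm] and labels are made up.
from sklearn.tree import DecisionTreeClassifier

X_train = [[50, 30], [55, 35], [60, 40],   # dogs: taller
           [25, 25], [23, 22], [28, 27]]   # cats: smaller
y_train = ["dog", "dog", "dog", "cat", "cat", "cat"]  # desired outputs (labels)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)        # training: learn the mapping x -> y

print(model.predict([[26, 24]]))   # a test input; expected output: ['cat']
```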
Categories of Supervised Machine Learning
Supervised machine learning can be
classified into two types of problems, which
are given below:
• Classification
• Regression
Classification
• Classification algorithms are used to solve problems in which the output variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc. The classification algorithms predict the categories present in the dataset. Some real-world examples of classification are spam detection, email filtering, etc.
Some popular classification algorithms are
given below:
• Random Forest Algorithm
• Decision Tree Algorithm
• Logistic Regression Algorithm
• Support Vector Machine Algorithm
[Figure: a multi-class classifier separating more than two categories.]
Regression
• Regression algorithms are used when the output variable is a continuous value, such as a price or a temperature.
Some popular regression algorithms are given below:
• Simple Linear Regression Algorithm
• Multivariate Regression Algorithm
• Decision Tree Algorithm
• Lasso Regression
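A minimal simple-linear-regression sketch (illustrative only; the synthetic data roughly follows y = 2x):

```python
# Illustrative simple linear regression: the output variable is continuous.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # a single input feature
y = np.array([2.1, 3.9, 6.2, 8.1])           # continuous targets, roughly y = 2x

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)         # slope near 2, intercept near 0
print(model.predict([[5.0]]))                # continuous prediction near 10
```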
Advantages:
• Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.
• These algorithms are helpful for predicting outputs on the basis of prior experience.
Disadvantages:
• These algorithms are not able to solve
complex tasks.
• It may predict the wrong output if the test
data is different from the training data.
• It requires lots of computational time to
train the algorithm.
Applications of Supervised Learning
Image Segmentation:
• Supervised Learning algorithms are used in
image segmentation. In this process, image
classification is performed on different image
data with pre-defined labels.
Medical Diagnosis:
• Supervised algorithms are also used in the medical field for diagnosis. This is done by using medical images and past data labelled with disease conditions. With such a process, the machine can identify a disease for new patients.
• Fraud Detection - Supervised learning classification algorithms are used for identifying fraudulent transactions, fraudulent customers, etc. This is done by using historical data to identify patterns that can indicate possible fraud.
• Spam detection - In spam detection & filtering,
classification algorithms are used. These algorithms
classify an email as spam or not spam. The spam
emails are sent to the spam folder.
• Speech Recognition - Supervised learning algorithms are also used in speech recognition. The algorithm is trained with voice data and can then support applications such as voice-activated passwords, voice commands, etc.
Unsupervised Machine Learning
• There is no need for supervision.
• The machine is trained using an unlabelled dataset, and it predicts the output without any supervision.
• The models are trained with data that is neither classified nor labelled, and the model acts on that data without any supervision.
• The main aim of an unsupervised learning algorithm is to group or categorize the unsorted dataset according to similarities, patterns, and differences. Machines are instructed to find the hidden patterns in the input dataset.
• E.g., suppose a basket of fruit images is given as input to the machine learning model. The images are totally unknown to the model, and the task of the machine is to find patterns and categories among the objects.
• So the machine will discover its own patterns and differences, such as colour difference and shape difference, and predict the output when it is tested with the test dataset.
Unlabelled input data is fed to the machine learning model in order to train it. First, the model interprets the raw data to find hidden patterns, and then it applies suitable algorithms such as k-means clustering, etc.
Types of Unsupervised Learning Algorithm:
1) Clustering
• The clustering technique is used when we
want to find the inherent groups from the
data. It is a way to group the objects into a
cluster such that the objects with the most
similarities remain in one group and have
fewer or no similarities with the objects of
other groups. An example of the clustering
algorithm is grouping the customers by
their purchasing behaviour.
Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups. It is basically a grouping of objects on the basis of the similarity and dissimilarity between them.
Some of the popular unsupervised learning algorithms are given below:
• K-Means Clustering algorithm
• Mean-shift algorithm
• DBSCAN Algorithm
• Principal Component Analysis
• Independent Component Analysis
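As an illustration, a minimal k-means sketch on synthetic, unlabelled 2-D points (the two blobs and all parameters are made up):

```python
# Illustrative k-means clustering: grouping unlabelled points by similarity.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic blobs of points; no labels are given to the model.
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:5], kmeans.labels_[-5:])  # cluster ids discovered from the data
print(kmeans.cluster_centers_)                  # centers near (0, 0) and (5, 5)
```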
2) Association
• Association rule learning finds interesting relations among variables within a large dataset.
• It finds the dependency of one data item on another and maps those variables accordingly so that maximum profit can be generated. This technique is mainly applied in market basket analysis, web usage mining, continuous production, etc.
• Some popular association rule learning algorithms are Apriori, Eclat, and FP-growth.
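A tiny market-basket sketch in plain Python (the transactions are made up), computing the support and confidence of one candidate rule, the core quantities behind Apriori:

```python
# Illustrative market-basket analysis: support and confidence of one rule.
# The transactions below are made up for demonstration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Candidate rule: {bread} -> {milk}
antecedent, consequent = {"bread"}, {"milk"}
confidence = support(antecedent | consequent) / support(antecedent)
print("support:   ", support(antecedent | consequent))  # 2/4 = 0.5
print("confidence:", confidence)                        # (2/4) / (3/4) ≈ 0.67
```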
Applications of Reinforcement Learning
Video Games:
• RL algorithms are very popular in gaming applications and are used to achieve super-human performance. Popular game-playing systems built with RL include AlphaGo and AlphaGo Zero.
Resource Management:
• RL can automatically learn to schedule resources across waiting jobs in order to minimize average job slowdown.
Robotics:
• RL is widely used in robotics applications. Robots are used in industrial and manufacturing areas, and these robots are made more capable with reinforcement learning. Different industries have a vision of building intelligent robots using AI and machine learning technology.
Text Mining:
• Text mining, one of the great applications of NLP, is now being implemented with the help of reinforcement learning, for example by Salesforce.
Advantages
• It helps in solving complex real-world problems that are difficult to solve with conventional techniques.
• The learning model of RL is similar to how human beings learn, so the results can be very accurate.
• It helps in achieving long-term results.
Disadvantages
• RL algorithms are not preferred for simple problems.
• RL algorithms require huge amounts of data and computation.
• Too much reinforcement learning can lead to an overload of states, which can weaken the results.
Features
• Machine learning models are trained using data that can be represented as raw features (same as the data) or derived features (derived from the data).
• One of the most important aspects of building a machine learning model is identifying the features that will help create a great model, i.e., a model that performs well on unseen data.
• A model for predicting the risk of cardiac disease may have features such as the following:
– Age
– Gender
– Weight
– Whether the person smokes
– Whether the person is suffering from diabetic
disease, etc.
• A model for predicting whether a person is suitable for a job may have features such as educational qualification, number of years of experience, experience working in the field, etc.
• A model for predicting the size of a shirt for a person may have features such as age, gender, etc.
• Features are nothing but the independent variables in machine learning models.
• What is to be learned in any specific machine learning problem is a set of these features (independent variables), the coefficients of these features, and the parameters for coming up with appropriate functions or models.
Feature Selection
Feature selection is selecting a subset of the original features in order to reduce model complexity, enhance the computational efficiency of the model, and reduce the generalization error introduced by noise from irrelevant features.
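A minimal feature-selection sketch, assuming scikit-learn's SelectKBest on synthetic data where only a few of the original features carry signal:

```python
# Illustrative feature selection: keep the k most informative original features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 10 features, of which only 3 are informative about the label.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, n_redundant=0, random_state=0)

selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
X_reduced = selector.transform(X)          # original columns, just fewer of them
print(selector.get_support(indices=True))  # indices of the selected features
print(X_reduced.shape)                     # (200, 3)
```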
Feature extraction
• Feature extraction is extracting/deriving information from the original feature set to create a new feature subspace.
• The primary idea behind feature extraction
is to compress the data with the goal of
maintaining most of the relevant
information.
• Feature extraction refers to the process of transforming raw data into numerical features that can be processed while preserving the information in the original data set. It typically yields better results than applying machine learning directly to the raw data.
• The key difference between the feature selection and feature extraction techniques used for dimensionality reduction is that while the original features are maintained in the case of feature selection algorithms, the feature extraction algorithms transform the data onto a new feature space.
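By contrast, a feature-extraction sketch using PCA, which transforms the data onto a new, compressed feature space (illustrative; the Iris dataset and two components are arbitrary choices):

```python
# Illustrative feature extraction: PCA builds new features (linear combinations
# of the originals) instead of keeping a subset of the original columns.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)     # 4 original features

pca = PCA(n_components=2).fit(X)
X_new = pca.transform(X)              # 2 derived features in a new subspace
print(X_new.shape)                    # (150, 2)
print(pca.explained_variance_ratio_)  # fraction of variance retained per component
```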
[Figure: spectrogram of a signal computed using the short-time Fourier transform. The spectrogram shows the variation of frequency content over time.]
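A spectrogram like the one described can be computed with SciPy's STFT-based scipy.signal.spectrogram; here is a sketch on a synthetic chirp signal:

```python
# Illustrative spectrogram of a synthetic chirp via the short-time Fourier transform.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import chirp, spectrogram

fs = 8000                                    # sampling rate (Hz)
t = np.linspace(0, 2, 2 * fs, endpoint=False)
x = chirp(t, f0=100, t1=2, f1=2000)          # frequency sweeps from 100 Hz to 2 kHz

f, seg_t, Sxx = spectrogram(x, fs=fs)        # STFT-based power spectrogram
plt.pcolormesh(seg_t, f, 10 * np.log10(Sxx + 1e-12), shading="gouraud")
plt.xlabel("Time [s]")
plt.ylabel("Frequency [Hz]")
plt.show()                                   # frequency content rises over time
```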
Dataset
• Training Dataset: The sample of data used to train the model.
• Validation Dataset: The sample of data used to provide an unbiased evaluation of a model trained on the training dataset. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.
• Test Dataset: The sample of data used to provide an unbiased evaluation of a final model trained on the training dataset.
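One common way to create the three splits is to apply scikit-learn's train_test_split twice (a sketch; the 60/20/20 proportions are just an example):

```python
# Illustrative 60/20/20 train/validation/test split via two successive splits.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve off 20% of the data as the held-out test set.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Then split the remainder 75/25, i.e., 60%/20% of the original data.
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 90 30 30
```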
Framing a Learning Problem
Designing a Learning System
• Choose the training experience
• Choose exactly what is to be learned
– i.e. the target function
• Choose how to represent the target function
• Choose a learning algorithm to infer the
target function from the experience
[Figure: a learning system. The environment provides experience as training data to the learner; the learner produces knowledge for the performance element, which is evaluated on testing data.]
Training vs. Test Distribution
• We generally assume that the training and
test examples are independently drawn from
the same overall distribution of data
– We call this “i.i.d” which stands for
“independent and identically distributed”
https://www.youtube.com/watch?v=FJ6Z_-HCeg4
Probability Theory
• Probability theory is at the foundation of
many machine learning algorithms.
• Probability is all about the possibility of various outcomes.
• The set of all possible outcomes is called the sample space.
• The sample space for a coin flip is {heads, tails}. The
sample space for the temperature of water is all values
between the freezing and boiling point.
• Only one outcome in the sample space is possible at
a time, and the sample space must contain all
possible values.
• The sample space is often depicted as Ω (capital omega) and a specific outcome as ω (lowercase omega).
We represent the probability of an event ω as P(ω). The two basic axioms of probability are:
• the probability of any event has to be between 0 (impossible) and 1 (certain), and
• the sum of the probabilities of all events should be 1.
The second axiom follows from the fact that the sample space must contain all possible outcomes; therefore, we are certain (probability 1) that one of the possible outcomes will occur.
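Written symbolically, for a discrete sample space:

```latex
0 \le P(\omega) \le 1 \quad \text{for every } \omega \in \Omega,
\qquad \sum_{\omega \in \Omega} P(\omega) = 1
```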
• A random variable x is a variable which randomly takes on values from a sample space.
• We often indicate a specific value x can take with italics.
• For example, if x represents the outcome of a coin flip, we may represent a specific outcome as x = heads.
• Random variables can be either discrete, like the coin flip, or continuous (able to take on an uncountably infinite number of possible values).
• To describe the likelihood of each possible value of a random variable x, we specify a probability distribution.
• We write x ~ P(x) to indicate that x is a random variable which is drawn from a probability distribution P(x).
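As a small illustration, drawing samples x ~ P(x) for a biased coin with NumPy (the 0.7/0.3 probabilities are made up):

```python
# Illustrative sampling: x ~ P(x) for a discrete random variable (a biased coin).
import numpy as np

rng = np.random.default_rng(0)
outcomes = ["heads", "tails"]       # the sample space
probs = [0.7, 0.3]                  # a made-up distribution P(x)

samples = rng.choice(outcomes, size=1000, p=probs)
print((samples == "heads").mean())  # empirical frequency, close to 0.7
```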
● Consider the roll of a fair die and let A = 1 if the number is even (i.e. the outcome is in {2, 4, 6}).
P(x|y) = P(x ∩ y) / P(y)
• The conditional probability distribution is the likelihood of one event given that another event is known to have occurred. This forms the foundation of Bayes' theorem and Bayesian networks.
Two fair dice are rolled. What is the conditional probability that the first one lands on 6, given that the dice land on different numbers?
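A worked solution using the definition above: of the 30 equally likely ordered outcomes in which the dice differ, 5 have the first die equal to 6, so

```latex
P(\text{first} = 6 \mid \text{different})
  = \frac{P(\text{first} = 6 \cap \text{different})}{P(\text{different})}
  = \frac{5/36}{30/36} = \frac{1}{6}
```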
History of Machine Learning
• 1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
• 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
• 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM
History of Machine Learning
• 1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
• 1990s
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
History of Machine Learning
• 2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics, Security)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
• PRESENT
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning