Unit4 PPT
Introduction to Machine Learning
Introduction
➢ What is Machine Learning?
➢ Arthur Samuel (1959) - Machine Learning: Field of study that
gives computers the ability to learn without being explicitly
programmed.
Introduction
➢ Flavors of Machine Learning
➢ Supervised Learning
➢ Unsupervised Learning
➢ Semi Supervised Learning
➢ Reinforcement Learning
Introduction
➢ What is Supervised Learning?
➢ Learning from examples, i.e., learning by presentation of input/output pairs
Introduction
➢ What is Supervised Learning?
➢ Classification
➢ Regression
Introduction
➢ What is Supervised Learning?
➢ Classification
➢Output variable is discrete in nature
➢Predicting selling price of a house as
high/moderate/low
Introduction
➢ What is Supervised Learning?
➢ Regression
➢Output variable is continuous in nature
➢Predicting selling price of a house in exact amount
Introduction
➢ What is Unsupervised Learning?
➢ Learning by observation
➢ Example
Introduction
➢ What is Semi/Partially Supervised Learning?
➢ Learning from positive and unlabeled data
➢Example: Social Bookmarking System
➢ Learning from labeled and unlabeled data
➢Example: Movie Recommender System
Introduction
➢ What is Reinforcement Learning?
➢ Learning from interacting with the environment
➢ Example
train_test_split
Iris Species
KNN for Classification
• Evelyn Fix and Joseph Hodges developed this algorithm in 1951, which was
subsequently expanded by Thomas Cover.
'distance': weight points by the inverse of their distance. In this case,
closer neighbors of a query point will have a greater influence than
neighbors which are further away.
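The weighting option above can be tried directly; a minimal sketch with scikit-learn's KNeighborsClassifier on the Iris data (the comparison setup is illustrative, not from the slides):

```python
# Compare uniform vs. inverse-distance neighbor weighting on Iris.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y)

for weights in ("uniform", "distance"):
    knn = KNeighborsClassifier(n_neighbors=5, weights=weights)
    knn.fit(X_train, y_train)
    print(weights, knn.score(X_test, y_test))
```

On a well-separated dataset like Iris the two options score similarly; the difference matters more when neighbors lie at very unequal distances.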
‘stratify’ parameter in split
The stratify parameter makes a split so that the proportion of values in the
sample produced will be the same as the proportion of values in the array
passed to stratify.
If, for example, y contains 25% 0's and 75% 1's, then stratify=y will make
sure that your random split also has:
•25% of 0's
•75% of 1's
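A small sketch of the behaviour described above, assuming scikit-learn; the 25%/75% class mix is constructed by hand:

```python
# y has 25% zeros and 75% ones; stratify=y keeps those proportions
# in both the train and the test portion of the split.
import numpy as np
from sklearn.model_selection import train_test_split

y = np.array([0] * 25 + [1] * 75)
X = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

print((y_tr == 0).mean(), (y_te == 0).mean())  # both 0.25
```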
Precision and recall
• Consider a computer program for recognizing dogs
(the relevant element) in a digital photograph.
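Using illustrative counts for such an example (hypothetical numbers, not from the slides): suppose the program flags 8 regions as dogs, of which 5 really are dogs (TP = 5, FP = 3), while 7 actual dogs are missed (FN = 7). Then:

```python
# Precision: of everything flagged as "dog", how much really is a dog?
# Recall: of all actual dogs, how many did the program find?
tp, fp, fn = 5, 3, 7            # hypothetical counts

precision = tp / (tp + fp)      # 5/8  = 0.625
recall = tp / (tp + fn)         # 5/12 ~ 0.417
print(precision, recall)
```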
➢ D = Sqrt[(48 - 33)^2 + (142000 - 150000)^2] = 8000.01 → predicted
HPI = 264 (the HPI of the nearest neighbor)
Source: [1]
KNN for Regression [5]
➢ An important thing to notice in the given data is the difference in
the scale of Age and Loan.
Source: [1]
KNN for Regression [5]
What if we take weighted average?
Source: [1]
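Both points above — rescaling Age and Loan before computing distances, and taking an inverse-distance weighted average instead of a plain mean — can be sketched with scikit-learn; the data values here are illustrative stand-ins, not the table from [5]:

```python
# Min-max scaling puts Age and Loan on the same footing before the
# distance computation; weights='distance' replaces the plain average
# of neighbor targets with an inverse-distance weighted average.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler

X = np.array([[25, 40000], [35, 60000], [45, 80000],
              [33, 150000], [52, 18000], [48, 142000]])
y = np.array([135.0, 256.0, 231.0, 264.0, 142.0, 223.0])  # HPI

scaler = MinMaxScaler()
Xs = scaler.fit_transform(X)
query = scaler.transform([[42, 100000]])

preds = {}
for weights in ("uniform", "distance"):
    knn = KNeighborsRegressor(n_neighbors=3, weights=weights).fit(Xs, y)
    preds[weights] = knn.predict(query)[0]
    print(weights, preds[weights])
```

Without scaling, the Loan column (tens of thousands) would dominate the Euclidean distance and Age would barely matter.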
Unsupervised Learning
➢ K-means Algorithm [1]
In clustering, we do not have a target to predict; hence it is an
unsupervised learning problem.
K-means Clustering
K-means flowchart
K-means Algorithm
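A minimal K-means sketch, assuming scikit-learn and three synthetic, well-separated blobs (k = 3 chosen to match them):

```python
# K-means on three synthetic 2-D blobs: the algorithm alternates
# assigning points to the nearest centroid and recomputing centroids.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(30, 2))
               for loc in ([0, 0], [3, 3], [0, 3])])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # one centroid near each blob center
print(km.inertia_)           # within-cluster sum of squared distances
```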
Artificial Neural Networks
➢ What? [2]
• Computing Systems inspired by Biological
Neural Networks.
Biological Neural Networks [2]
➢ Nervous System
• Biological Neural Networks
• Biological Neurons
• What?
• Biological Neuron is an electrically excitable cell that
processes and transmits information through electrical
and chemical signals.
Biological Neural Networks [2]
➢ Nervous System
• Biological Neural Networks
• Biological Neurons
• 10 – 100 billion Neurons
• connection to 100 – 10000 other neurons
• 100 different types
• layered structure
Biological Neural Networks [2]
➢ Features
• Parallel processing systems
• Neurons are processing elements and each neuron
performs some simple calculations
• Neurons are networked
• Each connection conveys a signal from one node
(neuron) to another
• Connection strength decides the extent to which
a signal is amplified or diminished by a connection
Biological Neural Networks [2]
➢ Features (from our experience)
• Ability to learn from experience and accomplish
complex task without being programmed explicitly
• Driving
• Speaking using a particular language
• Translation
• Speaker Recognition
• Face Recognition, etc…
Biological Neural Networks [2]
Artificial Neuron Model [3]
➢ An artificial neuron is a mathematical function
regarded as a model of a biological neuron.
➢ Remember: 1. A biological neuron receives the amplified or
diminished inputs from multiple dendrites; 2. it is able to combine
these inputs; 3. it is able to process the input and produce an output.
➢ Simple Neuron
•Weight Function, Net Input Function &
Transfer Function
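The three stages named above can be sketched with NumPy (the weight, input, and bias values are illustrative; hardlim is used as the transfer function):

```python
# Simple neuron: weight function (elementwise w*p), net input
# function (sum plus bias), transfer function (here hardlim).
import numpy as np

def hardlim(n):
    return np.where(n >= 0, 1, 0)

p = np.array([2.0, -1.0, 0.5])   # input vector (illustrative)
w = np.array([0.4, 0.7, -0.2])   # weights (illustrative)
b = -0.05                        # bias

n = w @ p + b                    # net input: 0.8 - 0.7 - 0.1 - 0.05
a = hardlim(n)                   # output: 0, since n < 0
print(n, a)
```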
Neuron with Vector Input [3]
Activation Functions [3]
A Layer of Neurons [3]
Multiple Layers of Neurons [3]
ANN Architectures [2]
➢ Fully Connected Network (Asymmetric)
ANN Architectures [2]
➢ Fully Connected Network (Symmetric)
ANN Architectures [2]
➢ Layered Network
These are networks in which nodes are partitioned into subsets called layers,
with no connections leading from layer j to layer k if j > k.
ANN Architectures [2]
➢ Acyclic Network
Linear Separability [2]
➢ 1 – D Case
➢ 7/5 Students data – Weight Values & Obese/Not Obese
➢ (50, NO), (55, NO), (60, NO), (65, NO), (70, O), (75, O), (80, O) – Linearly
Separable
➢ (55, NO), (60, O), (65, NO), (70, O), (75, O) – Linearly Inseparable
Linear Separability
➢2 – D Case
➢ Learning a separating line
Linear Separability
➢ 3 – D Case
➢ Learning a separating plane
Perceptron Model [3]
➢ What is Perceptron?
➢ It is a machine which can learn (using examples)
to assign input vectors to different classes.
➢ What can it do?
• 2-class linear classification problem
• What?
• Process
hardlim(n) = 1, if n >= 0; 0 otherwise.
Perceptron Learning Rule [3]
➢ Learning Process
•Wnew = Wold + eP
•bnew = bold + e, where e = target - actual
Numerical [2]
➢ Assume 7 one dimensional input patterns {0.0, 0.17, 0.33, 0.50,
0.67, 0.83, 1.0}. Assume that first four patterns belong to class
0 (with desired output 0) and remaining patterns belong to class
1 (with desired output 1). Design a perceptron to classify these
patterns. Use perceptron learning rule. Assume learning rate =
0.1 and initial weight and bias to be (-0.36) and (-0.1)
respectively. Show computation for two epochs.
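The exercise can be checked with a short script (a sketch, not an official solution; the rule Wnew = Wold + eP is applied together with the given learning rate, i.e. w ← w + 0.1·e·p and b ← b + 0.1·e):

```python
# Perceptron learning rule on the 7 one-dimensional patterns:
# hardlim activation, learning rate 0.1, w0 = -0.36, b0 = -0.1.
patterns = [0.0, 0.17, 0.33, 0.50, 0.67, 0.83, 1.0]
targets  = [0, 0, 0, 0, 1, 1, 1]

w, b, lr = -0.36, -0.1, 0.1

def hardlim(n):
    return 1 if n >= 0 else 0

for epoch in range(2):
    for p, t in zip(patterns, targets):
        a = hardlim(w * p + b)
        e = t - a              # error = target - actual
        w += lr * e * p
        b += lr * e
    print(f"after epoch {epoch + 1}: w={w:.3f}, b={b:.3f}")
```

Two epochs are not enough for convergence here; the script simply shows the weight and bias trajectory the hand computation should reproduce.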
Some Issues
➢ Why use a bias?
➢Termination Criterion
➢ Learning Rate
➢ Non-numeric Inputs
➢ Epoch
Multiclass Discrimination
➢ Layer of Perceptron
➢ To distinguish among n classes, a layer of n
perceptrons can be used
• A presented sample is considered to belong to the ith class only if
the ith output is 1 and the remaining outputs are 0.
• If all outputs are zero, or if more than one output value equals one,
the network may be considered to have failed in the classification task.
Ex-OR Gate
➢ Layer of Perceptron
➢ AND Gate and OR Gate – Linearly Separable?
➢ Ex-OR – Linearly Separable?
➢How to learn a function like the Ex-OR gate, i.e., how to classify
non-linearly separable patterns?
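One classical answer, sketched here by hand rather than learned: a two-layer network of hardlim units computes Ex-OR as OR AND NAND (the weights and biases below are chosen by hand, not trained):

```python
# XOR is not linearly separable, so no single perceptron computes it;
# two layers of hardlim units do: XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)).
import numpy as np

def hardlim(n):
    return (n >= 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# hidden layer: one OR unit and one NAND unit
h = np.hstack([
    hardlim(X @ np.array([1, 1]) - 0.5).reshape(-1, 1),    # OR
    hardlim(X @ np.array([-1, -1]) + 1.5).reshape(-1, 1),  # NAND
])
# output layer: AND of the two hidden units
y = hardlim(h @ np.array([1, 1]) - 1.5)
print(y)  # [0 1 1 0]
```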
Feed Forward Neural Network [1]
(Figure: feed-forward network with weights 𝜃4, 𝜃5, 𝜃6)
Multilayer Networks – Typical
Transfer Functions [3]
Reinforcement Learning
➢ Q-Learning Algorithm:
➢The Q-Learning algorithm goes as follows:
1. Set the gamma parameter, and environment rewards in matrix R.
2. Initialize matrix Q to zero.
3. For each episode:
Select a random initial state.
Do While the goal state hasn't been reached.
- Select one among all possible actions for the current state.
- Using this possible action, consider going to the next state.
- Get maximum Q value for this next state based on all
possible actions.
- Compute: Q(state, action) = R(state, action) + Gamma *
Max[Q(next state, all actions)]
- Set the next state as the current state.
End Do
End For
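The steps above can be sketched in Python on the room example of [6] (6 states, goal state 5, Gamma = 0.8; the R matrix encodes -1 for missing edges, 0 for ordinary moves, and 100 for moves into the goal):

```python
# Q-Learning on the 6-room example: repeat episodes from random
# initial states until the Q values stop changing.
import numpy as np

R = np.array([
    [-1, -1, -1, -1,  0, -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1, -1],
    [-1,  0,  0, -1,  0, -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
])
gamma, goal = 0.8, 5
Q = np.zeros_like(R, dtype=float)
rng = np.random.default_rng(0)

for episode in range(500):
    state = rng.integers(6)                  # random initial state
    while state != goal:
        actions = np.flatnonzero(R[state] >= 0)  # possible actions
        action = rng.choice(actions)             # pick one at random
        # Q(state, action) = R(state, action) + Gamma * Max[Q(next, all)]
        Q[state, action] = R[state, action] + gamma * Q[action].max()
        state = action                           # move to next state

print(Q.round())   # converged Q values
```

Reading the greedy policy off the converged Q (always take the argmax action) traces the shortest path from any room to the goal.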
Reinforcement Learning [6]
➢ Q-Learning (Initial Setup)
Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
Reinforcement Learning [6]
➢ Q-Learning – Episode 1 (Gamma = 0.8, and the initial state as Room 1)
Q(1, 5) = R(1, 5) + 0.8 * Max[Q(5, 1), Q(5, 4), Q(5, 5)] = 100 + 0.8 * 0 = 100
Reinforcement Learning [6]
➢ Q-Learning – Episode 2 (Gamma = 0.8, and the initial state as Room 3)
Q(3, 1) = R(3, 1) + 0.8 * Max[Q(1, 3), Q(1, 5)] = 0 + 0.8 * Max(0, 100) = 80
Reinforcement Learning [6]
➢ Q-Learning – Episode 2 Continue (Gamma = 0.8, and the current
state is Room 1)
Q(1, 5) = R(1, 5) + 0.8 * Max[Q(5, 1), Q(5, 4), Q(5, 5)] = 100 + 0.8 * 0 = 100
Reinforcement Learning [6]
➢ Q-Learning
➢ If our agent learns through further episodes, the values in matrix Q
will finally converge, e.g.:
Reinforcement Learning [6]
➢ Q-Learning
References
1. J. Han and M. Kamber, Data Mining: Concepts and Techniques,
Morgan Kaufmann, 2006.
5. https://www.saedsayad.com/k_nearest_neighbors_reg.htm
6. http://mnemstudio.org/path-finding-q-learning-tutorial.htm
Disclaimer
➢ The content of this presentation is not original; it has been
prepared from various sources for teaching purposes.