Unit4 PPT

The document provides an introduction to machine learning, defining it as a field that enables computers to learn without explicit programming. It outlines various types of machine learning, including supervised, unsupervised, semi-supervised, and reinforcement learning, along with examples and algorithms like K-Nearest Neighbors (KNN) and Q-Learning. Additionally, it discusses concepts related to neural networks and their architectures, emphasizing their ability to learn from experience.

Introduction to Machine Learning

Introduction
➢ What is Machine Learning?
➢ Arthur Samuel (1959) - Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
Introduction
➢ Flavors of Machine Learning
➢ Supervised Learning
➢ Unsupervised Learning
➢ Semi-Supervised Learning
➢ Reinforcement Learning
Introduction
➢ What is Supervised Learning?
➢ Learning by examples, i.e., by presentation of input/output pairs
➢ Classification
➢ Regression
Introduction
➢ What is Supervised Learning?
➢ Classification
➢ Output variable is discrete in nature
➢ Example: predicting the selling price of a house as high/moderate/low
Introduction
➢ What is Supervised Learning?
➢ Regression
➢ Output variable is continuous in nature
➢ Example: predicting the selling price of a house as an exact amount
Introduction
➢ What is Unsupervised Learning?
➢ Learning by observation
➢ Example
Introduction
➢ What is Semi/Partially Supervised Learning?
➢ Learning from positive and unlabeled data
➢ Example: Social Bookmarking System
➢ Learning from labeled and unlabeled data
➢ Example: Movie Recommender System
Introduction
➢ What is Reinforcement Learning?
➢ Learning from interacting with the environment
➢ Example
train_test_split
Iris Species
KNN for Classification

• The K-Nearest Neighbors (KNN) algorithm is a supervised machine learning method employed to tackle classification and regression problems.
• Evelyn Fix and Joseph Hodges developed this algorithm in 1951, and it was subsequently expanded by Thomas Cover.
• It is widely used in real-life scenarios since it is non-parametric, meaning it does not make any underlying assumptions about the distribution of data.
KNN for Classification
'uniform' : uniform weights. All points in each neighborhood are weighted equally.

'distance' : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
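These two weighting schemes map directly onto scikit-learn's `weights` parameter. A minimal sketch (the toy data here is illustrative, not from the slides):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy 1-D data: class 0 clustered near 0, class 1 clustered near 10
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

# 'uniform': each of the k neighbors gets an equal vote
knn_uniform = KNeighborsClassifier(n_neighbors=3, weights='uniform').fit(X, y)

# 'distance': closer neighbors get a larger, inverse-distance vote
knn_distance = KNeighborsClassifier(n_neighbors=3, weights='distance').fit(X, y)

print(knn_uniform.predict([[1.5]]))    # [0]
print(knn_distance.predict([[11.5]]))  # [1]
```

With such well-separated clusters both schemes agree; the schemes differ mainly when a query point sits between classes and some neighbors are much closer than others.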
‘stratify’ parameter in split
The stratify parameter makes a split so that the proportion of values in the sample produced will be the same as the proportion of values in the array passed to stratify.

For example, in a binary classification problem, if y is the dependent variable (the target/label column within the dataframe) with the following values:
•0 - 25% of the data is zeros
•1 - 75% of the data is ones

Then stratify=y will make sure that your random split has:
•25% of 0's
•75% of 1's
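A quick sketch of this behaviour with scikit-learn (dummy features; the 25/75 target mirrors the example above):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 25% zeros and 75% ones, as in the example above
y = np.array([0] * 25 + [1] * 75)
X = np.arange(100).reshape(-1, 1)   # dummy feature column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Both splits preserve the 25/75 class proportions
print(np.mean(y_train == 1))  # 0.75
print(np.mean(y_test == 1))   # 0.75
```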
Precision and recall
• Consider a computer program for recognizing dogs (the relevant element) in a digital photograph.
• Upon processing a picture which contains ten cats and twelve dogs, the program identifies eight dogs.
• Of the eight elements identified as dogs, only five actually are dogs (true positives), while the other three are cats (false positives).
• Seven dogs were missed (false negatives), and seven cats were correctly excluded (true negatives).
• The program's precision is then 5/8 (true positives / selected elements) while its recall is 5/12 (true positives / relevant elements).
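The counts from the dog example above give the two metrics directly:

```python
# Counts from the dog-recognition example above
tp = 5   # identified as dogs and actually dogs
fp = 3   # identified as dogs but actually cats
fn = 7   # dogs the program missed
tn = 7   # cats correctly excluded

precision = tp / (tp + fp)   # true positives / selected elements = 5/8
recall = tp / (tp + fn)      # true positives / relevant elements = 5/12

print(precision)   # 0.625
```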
classification_report
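scikit-learn's classification_report prints precision, recall, and F1-score per class. A minimal sketch with made-up labels (not the Iris data):

```python
from sklearn.metrics import classification_report

# Toy ground truth and predictions (illustrative labels)
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Prints per-class precision, recall, f1-score, and support,
# plus overall accuracy and macro/weighted averages
print(classification_report(y_true, y_pred))
```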
Voting for Classification vs Taking Average for Regression
• In the classification problem, the class label is determined by performing majority voting: the class with the most occurrences among the neighbors becomes the predicted class for the target data point.
• In the regression problem, the predicted value is calculated by taking the average of the target values of the K nearest neighbors: the calculated average becomes the predicted output for the target data point.
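The two aggregation rules above can be sketched side by side (neighbor targets are illustrative):

```python
from collections import Counter
from statistics import mean

# Suppose the K=5 nearest neighbors of a query point have these targets
neighbor_classes = ['cat', 'dog', 'dog', 'cat', 'dog']  # classification case
neighbor_values = [264, 139, 139, 145, 150]             # regression case

# Classification: majority vote among the K neighbors
predicted_class = Counter(neighbor_classes).most_common(1)[0][0]
print(predicted_class)   # dog

# Regression: average of the K neighbors' target values
predicted_value = mean(neighbor_values)
print(predicted_value)   # 167.4
```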
KNN for Regression [5]
➢ We can now use the training set to predict the value of an unknown case (Age=48 and Loan=$142,000) using Euclidean distance.
➢ If K=1 then the nearest neighbour is the last case in the training set with HPI=264.
➢ D = Sqrt[(48-33)^2 + (142000-150000)^2] = 8000.01 >> HPI = 264
➢ By having K=3, the prediction for HPI is equal to the average of HPI for the top three neighbours.
➢ HPI = (264+139+139)/3 = 180.7

Source: [1]
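The arithmetic above can be checked directly; only the neighbour values visible on the slide (Age=33, Loan=150000, HPI=264, plus HPI=139 twice) are assumed here — the full training table is in source [1]:

```python
import math

# Unknown case from the slide
age_q, loan_q = 48, 142000

# Distance to the nearest case (Age=33, Loan=150000, HPI=264)
d = math.sqrt((age_q - 33) ** 2 + (loan_q - 150000) ** 2)
print(round(d, 2))   # 8000.01 -> with K=1 the prediction is HPI = 264

# K=3: average HPI of the top three neighbours per the slide
hpi_k3 = (264 + 139 + 139) / 3
print(round(hpi_k3, 1))   # 180.7
```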
KNN for Regression [5]
➢ An important thing to notice in the given data is the difference in the scale of Age and Loan.
➢ We should ideally normalize the data in such cases.

Source: [1]
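A minimal min-max normalization sketch (the raw values are illustrative, apart from the ages 33 and 48 seen above):

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] so that a wide-range feature like Loan
    cannot dominate the Euclidean distance."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [25, 33, 48, 60]
loans = [40000, 150000, 142000, 100000]

print(min_max_normalize(ages))    # all values now lie between 0 and 1
print(min_max_normalize(loans))
```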
KNN for Regression [5]

Source: [1]
KNN for Regression [5]
What if we take weighted average?

Source: [1]
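One answer to the question above: weight each neighbour's HPI by the inverse of its distance, so nearer cases count for more. The distances here are hypothetical:

```python
# (distance, HPI) pairs for three hypothetical nearest neighbours
neighbors = [(8000.0, 264), (12000.0, 139), (15000.0, 139)]

weights = [1.0 / d for d, _ in neighbors]
weighted_hpi = (sum(w * hpi for w, (_, hpi) in zip(weights, neighbors))
                / sum(weights))

# Pulled toward 264 relative to the plain K=3 average of 180.7,
# because the HPI=264 neighbour is the closest
print(round(weighted_hpi, 1))
```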
Unsupervised Learning
➢ K-means Algorithm [1]

In clustering, we do not have a target to predict. We look at the data, try to club similar observations, and form different groups. Hence it is an unsupervised learning problem.
K-means Clustering
K-means flowchart
K-means Algorithm
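The algorithm can be sketched with scikit-learn on toy data (illustrative, not from the slides) — note there is no target column, only observations to group:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D observations with two obvious groups
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# K-means groups similar observations into n_clusters clusters
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.labels_)            # cluster index assigned to each observation
print(km.cluster_centers_)   # centroids, roughly [1, 2] and [10, 2]
```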
Artificial Neural Networks
➢ What? [2]
• Computing Systems inspired by Biological
Neural Networks.

Biological Neural Networks [2]
➢ Nervous System
• Biological Neural Networks
• Biological Neurons
• What?
• Biological Neuron is an electrically excitable cell that
processes and transmits information through electrical
and chemical signals.

Biological Neural Networks [2]
➢ Nervous System
• Biological Neural Networks
• Biological Neurons
• 10 – 100 billion Neurons
• connection to 100 – 10000 other neurons
• 100 different types
• layered structure

Biological Neural Networks [2]
➢ Features
• Parallel processing systems
• Neurons are processing elements and each neuron
performs some simple calculations
• Neurons are networked
• Each connection conveys a signal from one node
(neuron) to another
• Connection strength decides the extent to which
a signal is amplified or diminished by a connection

Biological Neural Networks [2]
➢ Features (from our experience)
• Ability to learn from experience and accomplish complex tasks without being programmed explicitly
• Driving
• Speaking using a particular language
• Translation
• Speaker Recognition
• Face Recognition, etc.
Biological Neural Networks [2]
Artificial Neuron Model [3]
➢ An artificial neuron is a mathematical function regarded as a model of a biological neuron.
➢ Remember: 1. A biological neuron is able to receive the amplified or diminished inputs from multiple dendrites. 2. It is able to combine these inputs. 3. It is able to process input and produce output.
➢ Simple Neuron
• Weight Function, Net Input Function & Transfer Function
Neuron with Vector Input [3]

Activation Functions [3]
Activation Functions

• ReLU: f(net) = max(0, net)
• Leaky ReLU: f(net) = net if net >= 0, else 0.01 * net
• Parametric ReLU: f(net) = net if net >= 0, else αi * net

Note: Image is not Original
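A sketch of the rectifier-family activations listed above (the 0.01 negative slope is the conventional leaky-ReLU default; a learnable per-unit αi gives parametric ReLU):

```python
def relu(net):
    # f(net) = max(0, net)
    return max(0.0, net)

def leaky_relu(net, alpha=0.01):
    # Positive inputs pass unchanged; negatives are scaled by a small slope.
    return net if net >= 0 else alpha * net

print(relu(3.0), relu(-2.0))   # 3.0 0.0
print(leaky_relu(-2.0))        # -0.02
```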
A Layer of Neurons [3]

Multiple Layers of Neurons [3]

ANN Architectures [2]
➢ Fully Connected Network (Asymmetric)

ANN Architectures [2]
➢ Fully Connected Network (Symmetric)

ANN Architectures [2]
➢ Layered Network

These are networks in which nodes are partitioned into subsets called layers, with no connections that lead from layer j to layer k if j > k.
ANN Architectures [2]
➢ Acyclic Network

These are a subclass of layered networks with no intra-layer connections.
ANN Architectures [2]
➢ Feedforward Network

These are a subclass of acyclic networks in which a connection is allowed from a node in layer i only to nodes in layer i + 1.
Learning in ANN
➢ Types of Learning
• Supervised Learning
• Unsupervised Learning
Linear Separability [2]
➢ 1-D Case
➢ Students' data (7 or 5 students) – Weight Values & Obese/Not Obese
➢ (50, NO), (55, NO), (60, NO), (65, NO), (70, O), (75, O), (80, O) – Linearly Separable
➢ (55, NO), (60, O), (65, NO), (70, O), (75, O) – Linearly Inseparable

➢ Learning a separating point/line
Linear Separability
➢ 2-D Case
➢ Learning a separating line
Linear Separability
➢ 3-D Case
➢ Learning a separating plane

➢ Higher Dimensional Case
➢ Learning a separating hyperplane
Perceptron Model [3]
➢ What is Perceptron?
➢ It is a machine which can learn (using examples) to assign input vectors to different classes.
➢ What can it do?
• 2-class linear classification problem
• What?
• Process

hardlim(n) = 1, if n >= 0; 0 otherwise.
Perceptron Learning Rule [3]
➢ Learning Process

• Wnew = Wold + eP
• bnew = bold + e, where e = target - actual
Numerical [2]
➢ Assume 7 one-dimensional input patterns {0.0, 0.17, 0.33, 0.50, 0.67, 0.83, 1.0}. Assume that the first four patterns belong to class 0 (with desired output 0) and the remaining patterns belong to class 1 (with desired output 1). Design a perceptron to classify these patterns. Use the perceptron learning rule. Assume learning rate = 0.1 and initial weight and bias to be (-0.36) and (-0.1) respectively. Show computation for two epochs.
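The numerical above can be run directly; this sketch prints the weight and bias after each of the two requested epochs, which you can check against a hand computation:

```python
patterns = [0.0, 0.17, 0.33, 0.50, 0.67, 0.83, 1.0]
targets  = [0, 0, 0, 0, 1, 1, 1]    # first four class 0, rest class 1

w, b, lr = -0.36, -0.1, 0.1         # given initial weight, bias, learning rate

for epoch in range(2):
    for p, t in zip(patterns, targets):
        actual = 1 if w * p + b >= 0 else 0   # hardlim transfer function
        e = t - actual
        w += lr * e * p                       # perceptron learning rule
        b += lr * e
    print(f"epoch {epoch + 1}: w = {w:.4f}, b = {b:.4f}")
```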
Some Issues
➢ Why to use bias?
➢ Termination Criterion
➢ Learning Rate
➢ Non-numeric Inputs
➢ Epoch
Multiclass Discrimination
➢ Layer of Perceptrons
➢ To distinguish among n classes, a layer of n perceptrons can be used.

• A presented sample is considered to belong to the ith class only if the ith output is 1 and the remaining outputs are 0.
• If all outputs are zero, or if more than one output value equals one, the network may be considered to have failed in the classification task.
Ex-OR Gate
➢ Layer of Perceptrons
➢ AND Gate and OR Gate – Linearly Separable?
➢ Ex-OR – Linearly Separable?
➢ How to learn functionality (classifying non-linear patterns) like the Ex-OR Gate?
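The questions above can be probed empirically: a single perceptron (sketch below, with an assumed 20-epoch budget) learns AND perfectly but can never reach full accuracy on Ex-OR, because no single line separates its classes:

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Train one hardlim perceptron with the learning rule
    w_new = w_old + lr*e*p, b_new = b_old + lr*e."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), t in samples:
            actual = 1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0
            e = t - actual
            w[0] += lr * e * x1
            w[1] += lr * e * x2
            b += lr * e
    # Fraction of training samples classified correctly
    return sum((1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0) == t
               for (x1, x2), t in samples) / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_perceptron(AND))   # 1.0 (linearly separable)
print(train_perceptron(XOR))   # below 1.0 (not linearly separable)
```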
Feed Forward Neural Network [1]
Multilayer Networks – Typical Transfer Functions [3]
Reinforcement Learning
➢ Q-Learning Algorithm:
➢ The Q-Learning algorithm goes as follows:
1. Set the gamma parameter, and environment rewards in matrix R.
2. Initialize matrix Q to zero.
3. For each episode:
   Select a random initial state.
   Do While the goal state hasn't been reached:
   - Select one among all possible actions for the current state.
   - Using this possible action, consider going to the next state.
   - Get the maximum Q value for this next state based on all possible actions.
   - Compute: Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
   - Set the next state as the current state.
   End Do
   End For
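The steps above can be sketched for the six-room example used on the following slides (R matrix as in source [6]; -1 marks impossible actions, state 5 is the goal). The do-while structure means one action is taken even when an episode starts at the goal, which is what fills in row 5 of Q:

```python
import random

# Reward matrix for the six-room example from [6]; -1 = no connection
R = [
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
]
GOAL, GAMMA = 5, 0.8
n = len(R)
Q = [[0.0] * n for _ in range(n)]

random.seed(0)
for episode in range(1000):
    state = random.randrange(n)              # 1. random initial state
    while True:                              # 2. do-while until the goal
        actions = [a for a in range(n) if R[state][a] >= 0]
        action = random.choice(actions)      # 3. pick a possible action
        # 4. Q(state, action) = R(state, action) + Gamma * Max[Q(next, all)]
        Q[state][action] = R[state][action] + GAMMA * max(Q[action])
        state = action                       # 5. next state becomes current
        if state == GOAL:
            break

top = max(max(row) for row in Q)
print(round(top))                            # converges to 500
print([[round(q * 100 / top) for q in row] for row in Q])  # normalized (%)
```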
Reinforcement Learning [6]
➢ Q-Learning (Initial Setup)

Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
Reinforcement Learning [6]
➢ Q-Learning – Episode 1 (Gamma = 0.8, and the initial state as Room 1)

Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]

Q(1, 5) = R(1, 5) + 0.8 * Max[Q(5, 1), Q(5, 4), Q(5, 5)] = 100 + 0.8 * 0 = 100
Reinforcement Learning [6]
➢ Q-Learning – Episode 2 (Gamma = 0.8, and the initial state as Room 3)

Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]

Q(3, 1) = R(3, 1) + 0.8 * Max[Q(1, 3), Q(1, 5)] = 0 + 0.8 * Max(0, 100) = 80
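The two episode computations above can be replayed literally (only the entries touched so far matter; everything else in Q is still zero):

```python
GAMMA = 0.8
Q = {}   # all Q values start at zero

# Episode 1: from Room 1 take action 5; R(1, 5) = 100, row 5 of Q still zero
Q[(1, 5)] = 100 + GAMMA * max(0, 0, 0)     # Q(5,1), Q(5,4), Q(5,5)
print(Q[(1, 5)])   # 100.0

# Episode 2: from Room 3 take action 1; R(3, 1) = 0
Q[(3, 1)] = 0 + GAMMA * max(0, Q[(1, 5)])  # Q(1,3) = 0, Q(1,5) = 100
print(Q[(3, 1)])   # 80.0
```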
Reinforcement Learning [6]
➢ Q-Learning – Episode 2 Continue (Gamma = 0.8, and the current state is Room 1)

Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]

Q(1, 5) = R(1, 5) + 0.8 * Max[Q(5, 1), Q(5, 4), Q(5, 5)] = 100 + 0.8 * 0 = 100
Reinforcement Learning [6]
➢ Q-Learning
➢ If our agent learns more through further episodes, it will finally reach convergence values in matrix Q like:

➢ This matrix Q can then be normalized (i.e., converted to percentages) by dividing all non-zero entries by the highest number (500 in this case):
Reinforcement Learning [6]
➢ Q-Learning

References
1. J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2006.
2. Kishan Mehrotra, Chilukuri K. Mohan, and Sanjay Ranka, Elements of Artificial Neural Networks. MIT Press, 1997.
3. MATLAB Neural Network Toolbox Documentation.
4. Daniel T. Larose and Zdravko Markov, Data Mining the Web. Wiley.
5. https://www.saedsayad.com/k_nearest_neighbors_reg.htm
6. http://mnemstudio.org/path-finding-q-learning-tutorial.htm
Disclaimer
➢ Content of this presentation is not original; it has been prepared from various sources for teaching purposes.
