
BCSE 0105 - Machine Learning - Module 1 - Complete - NC

The document outlines a course on machine learning. It introduces machine learning concepts and applications and discusses topics that will be covered in the course including supervised and unsupervised learning techniques. It also lists learning outcomes and references materials for further reading.


B.Tech (CSE) - III Year VI Semester


Session 2022-23

BCSE0105: MACHINE LEARNING


MODULE 1
Dr. Nabanita Choudhury
Assistant Professor, Department of CEA, GLA University, Mathura
PREREQUISITE
➢ Basic concepts of probability and statistics.

COURSE OBJECTIVE
✓ To introduce students to the basic concepts and techniques of Machine Learning
✓ To develop skills of using recent machine learning software for solving practical
problems
✓ To gain experience of doing independent study and research

BCSE0105: MACHINE LEARNING - DR. NABANITA CHOUDHURY, ASSISTANT PROFESSOR, GLA UNIVERSITY, MATHURA 2
WHAT ARE WE GOING TO LEARN
MODULE 1
Introduction: Machine Learning basics, Hypothesis space and inductive bias,
training and test set, cross validation.
Introduction to Statistical Learning: Bayesian Method.
Machine Learning: Supervised (Regression, Classification) vs. Unsupervised
(Clustering) Learning.
Data Preprocessing: Imputation, Outlier management, One hot encoding,
Dimensionality Reduction-feature extraction, Principal Component Analysis (PCA),
Singular Value Decomposition.
Supervised Learning: Regression-Linear regression, Polynomial regression,
Classification- Logistic regression, k-nearest neighbor classifier.

WHAT ARE WE GOING TO LEARN
MODULE 2
Supervised Learning: Decision tree classifier, Naïve Bayes classifier, Support vector
machine classifier.
Unsupervised Learning: k-means clustering, Hierarchical clustering.
Underfitting vs Overfitting: Regularization and Bias/Variance.
Ensemble methods: Bagging, Boosting, Improving classification with Ada-Boost
algorithm.

TEXT BOOKS
▪Mitchell, Tom M., Machine Learning. Tata McGraw-Hill Education, 2013

▪ Alpaydin, E., Introduction to machine learning. MIT press, 2009

REFERENCE BOOKS
▪ Harrington, P., Machine Learning in Action. Shelter Island, NY: Manning Publications Co., 2012.

▪ Bishop, C. M., Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.

OUTCOME
After completion of the course, students will be able to:
✓ CO1: Apply the basic concepts of machine learning.
✓ CO2: Apply the concepts of regression and re-sampling methods.
✓ CO3: Design supervised and reinforcement learning based solutions.
✓ CO4: Apply the ensemble methods for improving classification.
✓ CO5: Identify the ways of feature extraction, reduction and selection.
✓ CO6: Design applications of machine learning algorithms.

WHAT IS MACHINE LEARNING?

Ref: https://www.youtube.com/watch?v=LzaWrmKL1Z4

Ref: https://medium.com/@suryasaikrishna97/introduction-to-machine-learning-5faa9b636578

MACHINE LEARNING - TIMELINE

Ref: https://medium.com/analytics-vidhya/fundamental-omachine-learning-ada28afa1bd3

WHAT DO THE PIONEERS SAY…

Ref: https://medium.com/@jetnew/a-summary-of-alan-m-turings-computing-machinery-and-intelligence-fd714d187c0b
Ref: https://www.facebook.com/NDLIndia/photos/a.745256605623867/1044605819022276/?type=3

In 1959, the term “Machine Learning” was coined by Arthur Samuel.

Arthur Samuel (1901-1990) was a pioneer of artificial intelligence research.

WHY IS MACHINE LEARNING SO IMPORTANT?
Due to the excessive production of data, we need a method that can be used to structure, analyze and draw useful insights from data.

Finding hidden patterns and extracting key insights from data is the most essential part of Machine Learning.

From detecting the genes linked to the deadly ALS disease to building self-driving cars, ML can be used to solve the most complex problems.

Machine Learning is used to forecast sales, predict downfalls in the stock market, identify risks and anomalies, etc.

APPLICATIONS

Ref: https://www.javatpoint.com/applications-of-machine-learning

▪ Virtual Personal Assistants
Ref: https://www.edureka.co/blog/machine-learning-applications/
▪ Product Recommendations
▪ Social Media (Facebook)
Ref: https://www.edureka.co/blog/machine-learning-applications/
▪ Traffic Predictions
▪ Fraud Detection
Ref: Suryanarayana, S. Venkata, G. N. Balaji, and G. Venkateswara Rao. "Machine Learning Approaches for Credit Card Fraud Detection." Int. J. Eng. Technol 7.2 (2018): 917-920.
▪ Online Video Streaming Recommendation
Ref: https://pub.towardsai.net/recommendation-system-in-depth-tutorial-with-python-for-netflix-using-collaborative-filtering-533ff8a0e444
▪ Stock Market Analysis
Ref: https://medium.com/vsinghbisen/how-sentiment-analysis-in-stock-market-used-for-right-prediction-5c1bfe64c233
▪ Medical Diagnosis
Ref: https://medium.com/ai-techsystems/application-of-machine-learning-89a227256f7d
▪ Self-Driving Cars
▪ Spam Mail Detection
▪ Google Translate

TRADITIONAL PROGRAMMING VS MACHINE LEARNING

https://www.avenga.com/magazine/machine-learning-programming/

MACHINE LEARNING, ARTIFICIAL INTELLIGENCE AND
DEEP LEARNING

https://www.edureka.co/blog/ai-vs-machine-learning-vs-deep-learning/

https://www.viatech.com/en/2018/05/history-of-artificial-intelligence/

SUPERVISED MACHINE LEARNING

Ref: https://medium.com/@jorgesleonel/supervised-learning-c16823b00c13

SUPERVISED MACHINE LEARNING (CONTD.)
▪ Supervised learning is the type of machine learning in which the model is
trained using well-labelled training data and, on the basis of that data,
predicts the output.

▪ Labelled data means that the input data is already tagged with the
correct output.

▪ In supervised learning, the training data provided to the model works as the
supervisor that teaches the model to predict the output correctly. It applies
the same concept as a student learning under the supervision of a teacher.

SUPERVISED MACHINE LEARNING (CONTD.)
▪ The aim of a supervised learning algorithm is to find a mapping function
that maps the input variable (x) to the output variable (y).

▪ In the real world, supervised learning can be used for Risk Assessment,
Image Classification, Fraud Detection, Spam Filtering, Customer Sentiment
Analysis, etc.

SUPERVISED MACHINE LEARNING (CONTD.)

▪ In supervised learning, the model is trained on a labelled dataset, from
which it learns about each type of data.

▪ Once the training process is completed, the model is tested on test data
(a held-out portion of the labelled dataset that was not used for training),
and it predicts the output for that data.

▪ A labelled dataset is one that has both input and output parameters.

Ref: https://www.javatpoint.com/supervised-machine-learning

SUPERVISED MACHINE LEARNING (CONTD.)
▪ Suppose we have a dataset of different types of shapes, which includes
squares, triangles, and hexagons. The model is first trained on each shape:

o If the given shape has four sides, and all the sides are equal, then it will
be labelled as a square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a
hexagon.

▪ Now, after training, we test our model using the test set, and the task of the
model is to identify the shape.

▪ The machine is already trained on all types of shapes, so when it encounters a
new shape, it classifies the shape and predicts the output.

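The shape example above can be sketched as a tiny supervised classifier. This is only an illustration under stated assumptions: the feature encoding (number of sides, whether all sides are equal), the toy training rows and the helper name `predict_shape` are hypothetical, and a 1-nearest-neighbour rule stands in for whichever model is actually trained.

```python
# Labelled training data: (number_of_sides, all_sides_equal) -> shape label.
# The feature encoding and the rows are hypothetical, chosen for illustration.
TRAINING_DATA = [
    ((4, 1), "square"),
    ((3, 1), "triangle"),
    ((3, 0), "triangle"),
    ((6, 1), "hexagon"),
]


def predict_shape(features):
    """Return the label of the closest training example (1-nearest neighbour)."""

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    _, nearest_label = min(
        TRAINING_DATA, key=lambda row: squared_distance(row[0], features)
    )
    return nearest_label


# "Testing" the trained model on shapes described by their features.
print(predict_shape((4, 1)))
print(predict_shape((6, 1)))
```

The labels play the role of the supervisor: the model never sees a rule such as "four equal sides means square", only examples tagged with the correct output.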
STEPS OF SUPERVISED LEARNING
1. Determine the type of training dataset

2. Collect/gather the labelled training data

3. Split the labelled dataset into a training dataset, a validation dataset, and a
test dataset

4. Determine the input features of the training dataset, which should carry
enough information for the model to predict the output accurately

STEPS OF SUPERVISED LEARNING (CONTD.)
5. Determine the suitable algorithm for the model, such as support vector
machine, decision tree, etc.

6. Execute the algorithm on the training dataset. Sometimes we also need a
validation set, held out from the training data, to tune the control
parameters of the model.

7. Evaluate the accuracy of the model on the test set. If the model predicts
the correct outputs for the test set, the model is accurate.

TRAINING AND TEST DATA
Training set: A subset of the dataset used to train the machine learning
model; the outputs for this subset are already known.

Test set: A subset of the dataset used to test the machine learning model;
the model predicts the output for the test set.

Typical split (training / test): 70% / 30% or 80% / 20%

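As a rough sketch of the 70% / 30% split described above, the snippet below shuffles a toy dataset and divides it into training and test lists. The helper name `train_test_split`, its parameters and the toy data are assumptions made for this example; libraries such as scikit-learn provide their own, more complete versions.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy labelled dataset of (input, output) pairs; purely illustrative.
dataset = [(x, 2 * x + 1) for x in range(10)]
train, test = train_test_split(dataset, test_fraction=0.3)
print(len(train), len(test))
```

Shuffling before splitting matters whenever the rows are stored in some order, because otherwise the test set would not be representative of the whole dataset.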
SUPERVISED MACHINE LEARNING (CONTD.)

Ref: https://www.jcchouinard.com/supervised-learning/

REAL LIFE APPLICATIONS OF SUPERVISED ML

▪ Face Detection
▪ Text Categorization
▪ Spam Categorization
▪ House Price Prediction
▪ Stock Price Prediction

SUPERVISED MACHINE LEARNING TYPES

Supervised Learning
▪ Classification
▪ Regression

REGRESSION
• Regression algorithms are used if there is a relationship between the input
variable and the output variable.

• Regression is used for the prediction of continuous variables, for example in
weather forecasting, market trends, etc.

• A regression problem is one in which the output variable is a real or
continuous value, such as “salary” or “weight”.

Ref: https://www.javatpoint.com/regression-analysis-in-machine-learning

CLASSIFICATION
• Classification algorithms are used when the output variable is categorical, which
means it takes one of a set of discrete classes. With two classes (binary
classification), examples are Yes-No, Male-Female, True-False, Disease-No
disease, etc.

• A classification model attempts to draw some conclusion from observed values.
Given one or more inputs, a classification model will try to predict the value of
one or more outcomes.

• For example, it can label emails as “spam” or “not spam”, or label
transactions as “fraudulent” or “authorized”.

CLASSIFICATION (CONTD.)
• Classification predicts categorical class labels: it constructs a model from the
training set and the values (class labels) of the classifying attribute, and then
uses that model to classify new data.

• Examples of classification algorithms include –

➢ Naïve Bayes
➢ K-nearest neighbors classification
➢ Random Forest
➢ Decision Trees
➢ Logistic Regression
➢ Support Vector Machines

UNSUPERVISED MACHINE LEARNING
• Unsupervised learning is a type of machine learning in which models are trained
using an unlabeled dataset and are allowed to act on that data without any
supervision.

• Unsupervised learning models find the hidden patterns and insights in the
given data on their own.

• It can be compared to the learning that takes place in the human brain while
learning new things.

• The goal of unsupervised learning is to find the underlying structure of a dataset,
group the data according to similarities, and represent the dataset in a
compressed format.

UNSUPERVISED MACHINE LEARNING (CONTD.)
• Suppose the unsupervised learning algorithm is given an input dataset containing
images of different types of cats and dogs.

• The algorithm is never trained on labelled examples from the dataset, which means
it has no prior idea about the features of the dataset.

• The task of the unsupervised learning algorithm is to identify the image features
on its own.

• The unsupervised learning algorithm performs this task by clustering the image
dataset into groups according to the similarities between images.

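The clustering idea above can be sketched with a small k-means routine. This is a minimal illustration, not a production algorithm: the function name `k_means_1d`, the deterministic initialisation and the toy values are assumptions, and a single number per item stands in for the features that would be extracted from each image.

```python
def k_means_1d(values, k=2, iterations=10):
    """Group numbers into k >= 2 clusters with plain k-means.

    Returns (centroids, labels), where labels[i] is the cluster of values[i].
    """
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Hypothetical one-number "feature" per item; no labels are provided.
features = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centroids, labels = k_means_1d(features, k=2)
print(labels)
```

No output labels are supplied anywhere: the two groups emerge only from the similarity between the values, which is the point of the cats-and-dogs example.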
UNSUPERVISED MACHINE LEARNING ADVANTAGES
• Unsupervised learning is helpful for finding useful insights from the data.

• Unsupervised learning is similar to how a human learns to think from their own
experiences, which makes it closer to real AI.

• Unsupervised learning works on unlabeled and uncategorized data, which makes
unsupervised learning all the more important.

• In the real world, we do not always have input data with the corresponding output,
so to solve such cases, we need unsupervised learning.

UNSUPERVISED MACHINE LEARNING (CONTD.)

Ref: https://www.g2.com/articles/supervised-vs-unsupervised-learning

UNSUPERVISED MACHINE LEARNING – APPLICATIONS
Customer Segmentation Analysis

UNSUPERVISED LEARNING ALGORITHMS
• K-means clustering
• Nearest-neighbor methods (the k-nearest neighbors classifier itself is a supervised method)
• Hierarchical clustering
• Anomaly detection
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular value decomposition

SUPERVISED VS UNSUPERVISED LEARNING
Supervised Learning | Unsupervised Learning
Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data.
Supervised learning model takes direct feedback to check whether it is predicting the correct output. | Unsupervised learning model does not take any feedback.
Supervised learning model predicts the output. | Unsupervised learning model finds the hidden patterns in data.
In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning can be categorized into Classification and Regression problems. | Unsupervised learning can be classified into Clustering and Association problems.
Supervised learning can be used for those cases where we know the input as well as the corresponding outputs. | Unsupervised learning can be used for those cases where we have only input data and no corresponding output data.
Supervised learning model generally produces more accurate results. | Unsupervised learning model may give less accurate results as compared to supervised learning.

Ref: https://www.javatpoint.com/difference-between-supervised-and-unsupervised-learning

LINEAR REGRESSION
▪ Linear regression is one of the easiest and most popular Machine Learning
algorithms. It is a statistical method that is used for predictive analysis.

▪ Linear regression makes predictions for continuous/real or numeric variables
such as sales, salary, age, product price, etc.

▪ The linear regression algorithm models a linear relationship between a
dependent variable (y) and one or more independent variables (x), hence the
name linear regression.

▪ Because linear regression models a linear relationship, it shows how the
value of the dependent variable changes with the value of the independent
variable.

LINEAR REGRESSION (CONTD.)
▪ The linear regression model provides a sloped straight line representing the
relationship between the variables.

https://www.javatpoint.com/linear-regression-in-machine-learning

LINEAR REGRESSION (CONTD.)

▪ Mathematically, we can represent linear regression as:

y = b0 + b1x

Here,
y = Dependent Variable (Target Variable)
x = Independent Variable (Predictor Variable)
b0 = Intercept of the line
b1 = Linear regression coefficient (slope of the line)

The values of the x and y variables form the training dataset used to fit the
Linear Regression model.

TERMINOLOGIES RELATED TO REGRESSION
✓ Dependent Variable: The main factor in regression analysis which we want
to predict or understand is called the dependent variable. It is also called the
target variable.

✓ Independent Variable: The factors which affect the dependent variable, or
which are used to predict the values of the dependent variable, are called
independent variables, also called predictors.

✓ Outliers: An outlier is an observation with either a very low or a very high
value in comparison to the other observed values. An outlier may distort the
result, so it should be identified and handled before fitting the model.

TYPES OF LINEAR REGRESSION
Linear regression can be further divided into two types of algorithm:

▪ Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called
Simple Linear Regression.

▪ Multiple Linear Regression:
If more than one independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm is
called Multiple Linear Regression.

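The two types above differ only in how many independent variables enter the equation. As a sketch of the usual notation (the symbols x_1, ..., x_k and the count k are assumptions introduced here, extending the b0, b1 naming used in these slides):

```latex
\text{Simple: } \hat{y} = b_0 + b_1 x
\qquad
\text{Multiple: } \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k
```

With k = 1 the multiple form reduces to the simple one.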
SIMPLE LINEAR REGRESSION - EXAMPLE

Salary based on Years of Experience (salary_data.csv)

Ref: https://towardsdatascience.com/machine-learning-simple-linear-regression-with-python-f04ecfdadc13

Regression is a technique for investigating the relationship between a
dependent variable (the outcome, here Salary (y)) and independent variables
(features, attributes, criteria or predictors, here Experience (x)).

[Figure: Salary (y-axis) plotted against Experience (x-axis)]

SIMPLE LINEAR REGRESSION (CONTD.)
A list of houses with size and price is given.

• We need to find the best fit line to predict the price. This best fit line is
known as the regression line and is represented by the linear equation
y = b1*x + b0

In this equation:
y – Dependent variable (Predicted Price of House)
b1 – Slope
x – Independent variable (Size of House (Predictor))
b0 – Intercept
Slope b1 and intercept b0 are the model coefficients (also called model
parameters or regression coefficients).

UNDERSTANDING THE BEST FIT LINE
[Figure: the line/model Y = m * X + c drawn through the data; the error is the gap between each actual value and the predicted value on the line]

Our main goal is to find the best fit line, which means that the error between
predicted values and actual values should be minimized.
The best fit line will have the least error.

UNDERSTANDING THE BEST FIT LINE (CONTD.)
How do we find the line of best fit?

• The best fit line will have the least error.

• Gradient Descent is an optimization algorithm that can be used to arrive at
the best fit line.

• Let's first understand the statistical way of computing the best fit line.

Statistical Way of Computing Best Fit Line
• A line can be represented by the formula:

y = mx + c (for two data points)

For n data points (xi, yi), the regression line is represented as:

ŷ = b0 + b1x ; (ŷ means predicted value)

where b0 and b1 are the regression coefficients, calculated as:

b1 = Σ [ (xi − x’)(yi − y’) ] / Σ [ (xi − x’)² ]
b0 = y’ − b1 * x’

Here xi and yi are the individual data points, and x’ and y’ are their mean values.

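The two formulas above translate almost line by line into code. The sketch below is one possible rendering; the function name `fit_line` and the toy points are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: sum of products of the deviations from the means.
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: sum of squared deviations of x from its mean.
    denominator = sum((x - x_mean) ** 2 for x in xs)
    b1 = numerator / denominator
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Toy points lying exactly on y = 1 + 2x, so the fit should recover b0 = 1, b1 = 2.
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```

The denominator is zero when all x values are identical, in which case no unique line exists; the sketch does not guard against that.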
LINEAR REGRESSION WITH EXAMPLE
Problem statement
How to Find the Regression Equation

Last year, five randomly selected students took a math aptitude test before
they began their statistics course. The Statistics Department has two questions.

1. What linear regression equation best predicts statistics performance, based
on math aptitude scores?
2. If a student scored 80 in the aptitude test, what grade would we expect from
him/her in statistics?

Student   xi    yi
1         95    85
2         85    95
3         80    70
4         70    65
5         60    70

The xi column shows scores on the aptitude test. Similarly, the yi column shows
statistics grades.

LINEAR REGRESSION WITH EXAMPLE (CONTD.)
Student   xi    yi    (xi - x’)   (yi - y’)
1         95    85    17          8
2         85    95    7           18
3         80    70    2           -7
4         70    65    -8          -12
5         60    70    -18         -7
Sum       390   385
Mean      78    77

• The last two columns show deviation scores: the difference between the
student's score and the average score on each test.
• The last two rows show sums and mean scores that we will use to conduct the
regression analysis.

For n data points (xi, yi), the regression line is represented as:

ŷ = b0 + b1x ; (ŷ means predicted value)

where b0 and b1 are the regression coefficients, calculated as:

b1 = Σ [ (xi - x’)(yi - y’) ] / Σ [ (xi - x’)² ]
b0 = y’ - b1 * x’

xi and yi are the individual data points; x’ and y’ are the mean values.

LINEAR REGRESSION WITH EXAMPLE (CONTD.)
Student   xi    yi    (xi - x’)   (yi - y’)   (xi - x’)(yi - y’)   (xi - x’)²
1         95    85    17          8           136                  289
2         85    95    7           18          126                  49
3         80    70    2           -7          -14                  4
4         70    65    -8          -12         96                   64
5         60    70    -18         -7          126                  324
Sum       390   385                           470                  730
Mean      78    77

b1 = Σ [ (xi - x’)(yi - y’) ] / Σ [ (xi - x’)² ]
b0 = y’ - b1 * x’

After putting the values from the table in the equations, we get:

b1 = 470/730 = 0.644

b0 = y’ - b1 * x’
⇒ b0 = 77 - (0.644)(78)
⇒ b0 = 26.768

Therefore, the regression equation is:
ŷ = 26.768 + 0.644x

The Answers

1. What linear regression equation best predicts statistics performance, based on math aptitude
scores?

Ans. ŷ = 26.768 + 0.644x

2. If a student scored 80 in the aptitude test, what grade would we expect from him/her in
statistics?
Ans. ŷ (when x = 80) = 26.768 + 0.644 × 80 = 78.288

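The worked example above can be checked numerically. The short script below recomputes the sums and coefficients from the five (xi, yi) pairs in the table; the variable names are chosen here for illustration.

```python
xs = [95, 85, 80, 70, 60]  # aptitude test scores (xi)
ys = [85, 95, 70, 65, 70]  # statistics grades (yi)

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

# Sum of products of deviations and sum of squared x deviations.
sum_products = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
sum_squares = sum((x - x_mean) ** 2 for x in xs)

b1 = sum_products / sum_squares
b0 = y_mean - b1 * x_mean
prediction_at_80 = b0 + b1 * 80

print(sum_products, sum_squares)
print(round(b1, 3), round(b0, 3), round(prediction_at_80, 3))
```

Because the script keeps b1 at full precision instead of rounding it to 0.644 first, the intercept comes out near 26.781 rather than 26.768; the prediction at x = 80 still rounds to 78.288.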
Practice Regression Exercise No. 1

The values of y and their corresponding values of x are shown in the table below

a) Find the least-squares regression line (best fit line).

b) Estimate the value of y when x = 10.

Practice Regression Exercise No. 2

A company plans to spend $200 on advertising in the year 2023 and wants to
predict its sales for that year.

What is the predicted value of sales when the advertising investment is $200?

ERROR CALCULATION (PERFORMANCE MEASUREMENT)

• Absolute Error and Mean Absolute Error

• Mean Squared Error

• Root Mean Squared Error

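The three error measures listed above have standard definitions, sketched below in plain Python. The function names and the toy values are assumptions made for this example.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of (actual - predicted)^2 over all points."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error, in the same units as the data."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Toy values: the absolute errors are 1, 0 and 2.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, mse, rmse)
```

Squaring penalises large errors more heavily than small ones, which is why the mean squared error here (5/3) exceeds the mean absolute error (1.0).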
COST FUNCTION
• The cost function is the calculation of the error between predicted values
and actual values.

• It tells you how wrong the model is in finding a relation between the input
and output.

• It describes how badly your model is behaving/predicting.

LINEAR REGRESSION USING GRADIENT DESCENT

▪ Linear regression is a linear approach to modelling the relationship between a
dependent variable and one or more independent variables.

▪ Let X be the independent variable and Y be the dependent variable. We will
define a linear relationship between these two variables as follows:

Y = mX + c

where m is the slope of the line and c is its intercept.

LINEAR REGRESSION USING GRADIENT
DESCENT (CONTD.)
▪ Loss Function

• The loss is the error in the predictions made with the current values of m and c.

• Our goal is to minimize this error to obtain the most accurate values of m and c.
• We will use the Mean Squared Error function to calculate the loss.

LINEAR REGRESSION USING GRADIENT
DESCENT (CONTD.)
▪ Steps –

1. Find the difference between the actual y and predicted y value (y = mx + c), for a given x.
2. Square this difference.
3. Find the mean of the squares for every value in X.

LINEAR REGRESSION USING GRADIENT
DESCENT (CONTD.)
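The Mean Squared Error is:

$E = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y}_i)^2$

Here $y_i$ is the actual value and $\bar{y}_i$ is the predicted value.

Let's substitute the value of $\bar{y}_i$:

$E = \frac{1}{n} \sum_{i=1}^{n} (y_i - (m x_i + c))^2$

So, we square the error and find the mean, hence the name Mean Squared Error.
Now that we have defined the loss function, let's try minimizing it and
finding m and c.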

THE GRADIENT DESCENT ALGORITHM
Gradient descent is an iterative optimization algorithm to find the minimum of a function.

Here that function is our Loss Function.

THE GRADIENT DESCENT ALGORITHM
1. Initially let m = 0 and c = 0. Let L be our learning rate.
This controls how much the values of m and c change with each step.
L could be a small value like 0.0001 for good accuracy.

2. Calculate the partial derivative of the loss function with respect to m, and plug
in the current values of x, y, m and c in it to obtain the derivative value $D_m$:

$D_m = \frac{1}{n} \sum_{i=1}^{n} 2 (y_i - (m x_i + c)) (-x_i) = \frac{-2}{n} \sum_{i=1}^{n} x_i (y_i - \bar{y}_i)$

LINEAR REGRESSION USING GRADIENT
DESCENT (CONTD.)
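$D_m$ is the value of the partial derivative with respect to m. Similarly, let's find the
partial derivative with respect to c, $D_c$:

$D_c = \frac{-2}{n} \sum_{i=1}^{n} (y_i - \bar{y}_i)$

3. Now we update the current values of m and c using the following equations:

$m = m - L \times D_m$
$c = c - L \times D_c$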

LINEAR REGRESSION USING GRADIENT
DESCENT (CONTD.)
4. We repeat this process until our loss function is a very small value or ideally 0
(which means 0 error). In practice the loss rarely reaches exactly 0, so we stop
after a fixed number of iterations or when the loss stops decreasing.

The values of m and c that we are left with now will be the optimum values.

Gradient descent is one of the simplest and most widely used algorithms in machine
learning.
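
The four steps above can be sketched as a short Python function. This is a minimal sketch, not a definitive implementation: the function name, the toy data, and the number of iterations are assumptions, and the learning rate is deliberately larger than the 0.0001 mentioned in the slides so that this tiny example converges in few iterations.

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = m*x + c by minimising the Mean Squared Error."""
    m, c = 0.0, 0.0                      # Step 1: start from m = 0, c = 0
    n = len(xs)
    for _ in range(epochs):              # Step 4: repeat for a fixed number of iterations
        preds = [m * x + c for x in xs]
        # Step 2: partial derivatives of the MSE with respect to m and c
        d_m = (-2.0 / n) * sum(x * (y - p) for x, y, p in zip(xs, ys, preds))
        d_c = (-2.0 / n) * sum(y - p for y, p in zip(ys, preds))
        # Step 3: move m and c against the gradient
        m -= learning_rate * d_m
        c -= learning_rate * d_c
    return m, c

# Hypothetical data generated from y = 2x + 1, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

m, c = gradient_descent(xs, ys)
print(f"m = {m:.4f}, c = {c:.4f}")
```

If the learning rate is too large the updates overshoot and the loss grows instead of shrinking; if it is too small, many more iterations are needed.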

MULTIPLE LINEAR REGRESSION
If there is only one input variable (x), then such linear regression is called
simple linear regression. And if there is more than one input variable, then
such linear regression is called multiple linear regression.

House Price
Prediction

POLYNOMIAL REGRESSION
▪ If our data points clearly do not follow a straight line, linear regression
will fit them poorly, and polynomial regression may be a better choice.

▪ Polynomial regression, like linear regression, uses the relationship between the
variables x and y to find the best way to draw a line (here, a curve) through
the data points.

POLYNOMIAL REGRESSION (CONTD.)
▪ Polynomial Regression is a regression algorithm that models the relationship
between a dependent variable (y) and an independent variable (x) as an nth-degree
polynomial. The Polynomial Regression equation is given below:

$y = b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_1^3 + \dots + b_n x_1^n$

▪ It is also called a special case of Multiple Linear Regression in ML, because
we add polynomial terms to the Multiple Linear Regression equation to convert
it into Polynomial Regression.

▪ It is a linear model with some modification in order to increase the accuracy.

POLYNOMIAL REGRESSION (CONTD.)
▪ The dataset used in Polynomial regression for training is of non-linear
nature.

▪ It makes use of a linear regression model to fit the complicated and non-
linear functions and datasets.

▪ Hence, in Polynomial regression, the original features are converted into
Polynomial features of required degree (2, 3, .., n) and then modeled using
a linear model.
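
The idea of converting a feature into polynomial features and then fitting an ordinary linear model can be sketched as follows. This is a minimal sketch in pure Python: the helper names, the toy data, and the choice of solving the least-squares normal equations by Gaussian elimination are all assumptions made for the illustration.

```python
def poly_features(x, degree):
    """Expand a scalar x into the polynomial features [1, x, x^2, ..., x^degree]."""
    return [x ** d for d in range(degree + 1)]

def solve_linear_system(a, b):
    """Solve a*w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):       # back substitution
        s = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - s) / aug[i][i]
    return w

def fit_polynomial(xs, ys, degree):
    """Linear least squares on the expanded features: (X^T X) w = X^T y."""
    rows = [poly_features(x, degree) for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical non-linear data generated from y = 1 + 2x + 3x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]

coeffs = fit_polynomial(xs, ys, degree=2)   # [b0, b1, b2]
print([round(b, 4) for b in coeffs])
```

The model is still linear in the coefficients b0, b1, b2; only the inputs have been transformed, which is why a linear fitting procedure is sufficient.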

NEED FOR POLYNOMIAL REGRESSION
▪ If we apply a linear model on a linear dataset, then it provides us a good
result as we have seen in Simple Linear Regression, but if we apply the
same model without any modification on a non-linear dataset, then it will
fit the data poorly. As a result, the loss function will increase, the error
rate will be high, and accuracy will decrease.

▪ So, for such cases, where data points are arranged in a non-linear fashion,
we need the Polynomial Regression model.

NEED FOR POLYNOMIAL REGRESSION (CONTD.)

REGRESSION EQUATIONS –
SIMPLE LINEAR VS MULTIPLE LINEAR VS POLYNOMIAL

LOGISTIC REGRESSION
▪ Logistic regression is one of the most popular Machine Learning algorithms,
which comes under the Supervised Learning technique. It is used for
predicting a categorical dependent variable using a given set of
independent variables.

▪ Because the dependent variable is categorical, the outcome must be a
categorical or discrete value. It can be either Yes or No, 0 or 1, True or
False, etc. However, instead of giving the exact value 0 or 1, the model
gives probabilistic values which lie between 0 and 1.

LOGISTIC REGRESSION VS LINEAR REGRESSION (CONTD.)

LOGISTIC REGRESSION (CONTD.)
▪ Logistic Regression is very similar to Linear Regression; the difference is in how
they are used. Linear Regression is used for solving regression problems,
whereas Logistic Regression is used for solving classification problems.

▪ In Logistic regression, instead of fitting a regression line, we fit an "S"
shaped logistic function, whose output is bounded by the two limiting values 0 and 1.

▪ The curve from the logistic function indicates the likelihood of something, such
as whether the cells are cancerous or not, or whether a mouse is obese or not based on
its weight.

LOGISTIC REGRESSION (CONTD.)
LOGISTIC REGRESSION (CONTD.)
▪ Logistic Regression is a significant machine learning algorithm because it
has the ability to provide probabilities and classify new data using
continuous and discrete datasets.

▪ Logistic Regression can be used to classify the observations using different
types of data and can easily determine the most effective variables used
for the classification.

LOGISTIC FUNCTION (SIGMOID FUNCTION)
▪ The sigmoid function is a mathematical function used to map the predicted
values to probabilities.

▪ It maps any real value into another value within a range of 0 and 1.

▪ The output of logistic regression must lie between 0 and 1 and cannot
go beyond these limits, so it forms an "S"-shaped curve. The S-shaped curve
is called the Sigmoid function or the logistic function.

▪ In logistic regression, we use the concept of a threshold value, which
decides whether a predicted probability is mapped to 0 or to 1. Values above the
threshold are assigned to class 1, and values below the threshold are assigned to class 0.
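
The sigmoid mapping and the threshold rule described above can be sketched as follows. This is a minimal sketch: the function names and the default threshold of 0.5 are assumptions for illustration (the slides do not fix a threshold value).

```python
import math

def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, this equivalent form avoids overflow in exp().
    e = math.exp(z)
    return e / (1.0 + e)

def classify(z, threshold=0.5):
    """Return class 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0

for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"z = {z:+.1f}  probability = {sigmoid(z):.4f}  class = {classify(z)}")
```

Here z stands for the linear combination of the inputs; raising the threshold makes the classifier more conservative about predicting class 1.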

SIGMOID FUNCTION (CONTD.)
Logistic regression uses the concept of predictive modeling as regression, which is
why it is called logistic regression; however, it is used to classify samples, so it
falls under the classification algorithms.
LOGISTIC REGRESSION EQUATION
▪ The Logistic regression equation can be obtained from the Linear Regression
equation.

▪ We know the equation of the straight line can be written as:

$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n$

▪ In Logistic Regression y can be between 0 and 1 only, so for this let's divide
y by (1 - y):

$\frac{y}{1 - y}$; 0 for y = 0, infinity for y = 1

LOGISTIC REGRESSION EQUATION (CONTD.)
▪ But we need a range between -[infinity] and +[infinity], so we take the logarithm of
this ratio, and it becomes:

$\log\left(\frac{y}{1 - y}\right) = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n$

The above equation is the final equation for Logistic Regression.
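
As a short worked step (not on the original slides), solving the final equation for y recovers the sigmoid function. The symbol z is introduced here only as shorthand for the right-hand side, and the logarithm is assumed to be the natural logarithm.

```latex
\begin{aligned}
z &= b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \\
\log\left(\frac{y}{1 - y}\right) &= z \\
\frac{y}{1 - y} &= e^{z} \\
y &= (1 - y)\, e^{z} \\
y \left(1 + e^{z}\right) &= e^{z} \\
y &= \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}
\end{aligned}
```

So the predicted probability y is the sigmoid of the linear combination z, which is why the fitted curve has the "S" shape.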

TYPES OF LOGISTIC REGRESSION
On the basis of the categories, Logistic Regression can be classified into three
types:

▪ Binomial: In binomial Logistic regression, there can be only two possible
types of the dependent variable, such as 0 or 1, Pass or Fail, etc.

▪ Multinomial: In multinomial Logistic regression, there can be 3 or more
possible unordered types of the dependent variable, such as "cat", "dog",
or "sheep".

▪ Ordinal: In ordinal Logistic regression, there can be 3 or more possible
ordered types of the dependent variable, such as "Low", "Medium", or "High".

K NEAREST NEIGHBOR

K-NEAREST NEIGHBOR(KNN) ALGORITHM
▪ K-Nearest Neighbor is one of the simplest Machine Learning algorithms
based on Supervised Learning technique.

▪ K-NN algorithm assumes similarity between the new case/data and the
available cases and puts the new case into the category that is most similar
to the available categories.

▪ K-NN algorithm stores all the available data and classifies a new data
point based on the similarity. This means that when new data appears, it
can be easily classified into a well-suited category by using the K-NN algorithm.

▪ K-NN algorithm can be used for Regression as well as for Classification but
mostly it is used for the Classification problems.
K-NEAREST NEIGHBOR(KNN) ALGORITHM (CONTD.)
▪ K-NN is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and, at the time of
classification, performs an action on the dataset.

▪ At the training phase, the KNN algorithm just stores the dataset, and when it gets
new data, it classifies that data into the category that is most similar to
the new data.

K-NEAREST NEIGHBOR(KNN) ALGORITHM (CONTD.)
▪ Suppose there are two categories, i.e., Category A and Category B, and we have a new data point
x1. In which of these categories will this data point lie?

▪ To solve this type of problem, we need a K-NN algorithm.

K-NEAREST NEIGHBOR(KNN) ALGORITHM (CONTD.)
• Step-1: Select the number K of neighbors.

• Step-2: Calculate the Euclidean distance from the new data point to each training data point.

• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.

• Step-4: Among these K neighbors, count the number of data points in each
category.

• Step-5: Assign the new data point to the category for which the number of
neighbors is maximum.

• Step-6: Our model is ready.


K-NEAREST NEIGHBOR(KNN) ALGORITHM (CONTD.)
• Firstly, we choose the number of neighbors; here we take k = 5.

• Next, we will calculate the Euclidean distance between the data points.

K-NEAREST NEIGHBOR(KNN) ALGORITHM (CONTD.)

K-NEAREST NEIGHBOR(KNN) ALGORITHM (CONTD.)
• By calculating the Euclidean distance we get the five nearest neighbors: three
in category A and two in category B.

• Since the majority (3 of the 5) of the nearest neighbors are from category A,
the new data point is assigned to category A.
HOW TO SELECT THE VALUE OF K IN THE K-NN ALGORITHM?
• There is no particular way to determine the best value for "K", so we need to try some
values to find the best out of them. A commonly used starting value for K is 5.

• A very low value for K, such as K=1 or K=2, can be noisy and make the model
sensitive to outliers.

• Larger values for K reduce the effect of noise, but a very large K can blur the
boundaries between classes and costs more time and resources.

ADVANTAGES OF KNN ALGORITHM:
• It is simple to implement.

• It can be robust to noisy training data, provided K is not too small.

• It can be more effective if the training data is large.

DISADVANTAGES OF KNN ALGORITHM:

• The value of K always needs to be determined, which may be complex at times.

• The computation cost is high because the distance between the new data point
and every training sample must be calculated.

NUMERICAL EXAMPLE 1
• Find to which class the new data point belongs for the following, using KNN:

Height (cm) Weight (kg) Class


167 51 Underweight
182 62 Normal
176 69 Normal
173 64 Normal
172 65 Normal
174 56 Underweight
169 58 Normal
173 57 Normal
170 55 Normal
170 57 ?

NUMERICAL EXAMPLE 1 (CONTD.)

NUMERICAL EXAMPLE 1 (CONTD.)
• Distance formula – Euclidean Distance

$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$

Height (cm) Weight (kg) Class
167 51 Underweight
182 62 Normal
176 69 Normal
173 64 Normal
172 65 Normal
174 56 Underweight
169 58 Normal
173 57 Normal
170 55 Normal
170 57 ?

NUMERICAL EXAMPLE 1 (CONTD.)
• Calculate the distance between the new point and all the existing data points, one by one.
Height (cm) Weight (kg) Class Distance
167 51 Underweight $\sqrt{(170 - 167)^2 + (57 - 51)^2} = 6.7$
182 62 Normal $\sqrt{(170 - 182)^2 + (57 - 62)^2} = 13$
176 69 Normal :
173 64 Normal :
172 65 Normal :
174 56 Underweight :
169 58 Normal :
173 57 Normal :
170 55 Normal :
170 57 ?

NUMERICAL EXAMPLE 1 (CONTD.)
• We get the following:
Height (cm) Weight (kg) Class Distance
167 51 Underweight 6.7
182 62 Normal 13
176 69 Normal 13.4
173 64 Normal 7.6
172 65 Normal 8.2
174 56 Underweight 4.1
169 58 Normal 1.4
173 57 Normal 3
170 55 Normal 2
170 57 ?

NUMERICAL EXAMPLE 1 (CONTD.)
• Arrange the data according to the ascending order of distance and rank them.

Height (cm) Weight (kg) Class Distance Rank
169 58 Normal 1.4 1
170 55 Normal 2 2
173 57 Normal 3 3
174 56 Underweight 4.1 4
167 51 Underweight 6.7 5
173 64 Normal 7.6 6
172 65 Normal 8.2 7
182 62 Normal 13 8
176 69 Normal 13.4 9
170 57 ?

NUMERICAL EXAMPLE 1 (CONTD.)
• Find the class for the new data, according to the value of k

Height (cm) Weight (kg) Class Distance Rank
169 58 Normal 1.4 1
170 55 Normal 2 2
173 57 Normal 3 3
174 56 Underweight 4.1 4
167 51 Underweight 6.7 5
173 64 Normal 7.6 6
172 65 Normal 8.2 7
182 62 Normal 13 8
176 69 Normal 13.4 9
170 57 ?

If k = 1, class = Normal (consider one nearest neighbor)
If k = 2, class = Normal (consider two nearest neighbors)
If k = 5, class = Normal (consider five nearest neighbors)
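
The numerical example above can be reproduced with a short script. This is a minimal sketch using the data from the table; the function and variable names are assumptions made for illustration, and ties between classes are not handled specially.

```python
import math
from collections import Counter

# Training data from the table: (height in cm, weight in kg, class)
training = [
    (167, 51, "Underweight"),
    (182, 62, "Normal"),
    (176, 69, "Normal"),
    (173, 64, "Normal"),
    (172, 65, "Normal"),
    (174, 56, "Underweight"),
    (169, 58, "Normal"),
    (173, 57, "Normal"),
    (170, 55, "Normal"),
]

def euclidean(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def knn_classify(point, data, k):
    """Rank the training points by distance and take a majority vote among the first k."""
    ranked = sorted(data, key=lambda row: euclidean(point, row[:2]))
    votes = Counter(label for _, _, label in ranked[:k])
    return votes.most_common(1)[0][0]

new_point = (170, 57)
for k in (1, 2, 5):
    print(f"k = {k}: class = {knn_classify(new_point, training, k)}")
```

All of the work happens at prediction time (sorting the stored points by distance), which is what makes K-NN a lazy learner.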

Data Preprocessing

Data Preprocessing …
Data preprocessing is an integral step in Machine Learning, as the
quality of the data and the useful information that can be derived from it
directly affect the ability of the model to learn.

The basic concepts used for data preprocessing are:


• Handling Null Values (Imputation)
• Standardization
• Handling Categorical Variables
• One-Hot Encoding
• Outlier Management

HANDLING MISSING (NULL) VALUES
▪ In real world data, there are some instances where a particular
element is absent because of various reasons, such as, corrupt data,
failure to load the information, or incomplete extraction.

▪ Making the right decision on how to handle missing data leads to more
robust data models.

▪ The missing values are often encoded as NaNs or blanks.

• Imputation (Handling Missing Values)
o Imputation is a technique used for replacing the missing data with some substitute
value to retain most of the data/information of the dataset. Some of the techniques use
mean, median and mode to substitute the value.
o Imputation is important because removing the data (using dropna()) from the dataset
every time is not feasible and can lead to a reduction in the size of the dataset to a
large extent, which not only raises concerns for biasing the dataset but also leads to
incorrect analysis.
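
Mean, median and mode imputation can be sketched with Python's standard `statistics` module. This is a minimal sketch: the `impute` helper, the use of `None` to mark a missing value, and the sample columns are assumptions made for illustration (a pandas DataFrame would typically mark missing values as NaN instead).

```python
from statistics import mean, median, mode

def impute(values, strategy="mean"):
    """Replace missing entries (None) with the mean, median or mode of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

# Hypothetical columns with missing entries, for illustration only.
ages = [25, None, 31, 40, None, 31]
colours = ["red", None, "blue", "red"]

print(impute(ages, "mean"))
print(impute(ages, "median"))
print(impute(colours, "mode"))   # the mode also works for categorical data
```

The mean and median only make sense for numerical columns; for a categorical column the mode (most frequent value) is the usual substitute.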

You're cleaning up a DataFrame with ~1000 observations. You notice that one
categorical column contains 512 missing values. What strategy should you
employ to deal with these missing values?

A. Drop all rows with missing values

B. Drop the column entirely

C. Replace all missing values with the column mean

D. Replace all missing values with randomly sampled values from this column

HANDLING MISSING (NULL) VALUES …
▪ Training a model with a dataset that has a lot of missing values can
drastically impact the machine learning model’s quality.

HANDLING MISSING (NULL) VALUES …
Values that are not stored for a variable in an observation are called missing
values (or missing data).

HANDLING MISSING (NULL) VALUES …

Different ways of handling the missing values

SOME ADVANCED DATA PREPROCESSING TECHNIQUES

Encoding Categorical Data

• Categorical data is information that is divided into groups. In other words,
categorical data is a data type that can be stored and identified based on the
names or labels given to its values.

• For example, a list of many people with their blood group:
A+, A-, B+, B-, AB+, AB-, O+, O-, in which each of the
blood types is a categorical value.

SOME ADVANCED DATA PREPROCESSING TECHNIQUES
Encoding Categorical Data
Nominal data: This type of categorical data consists of
names or labels that have no numerical value and no inherent order.

For example, in any organization, the names of the
different departments, like the research and development
department, the human resource department, the accounts and
billing department, etc.

SOME ADVANCED DATA PREPROCESSING TECHNIQUES

Encoding Categorical Data

Ordinal data: This type of categorical data
consists of values that follow an order or scale.

For example, in a list of patients, the level
of sugar present in the body of a person
can be divided into low, medium and high
classes.

SOME ADVANCED DATA PREPROCESSING TECHNIQUES
Encoding Categorical Data

SOME ADVANCED DATA PREPROCESSING TECHNIQUES

Encoding Categorical Data

There are two techniques to convert Categorical data to
Numerical data:

✓ Integer Encoding or Label Encoding or Ordinal Encoding

✓ One-Hot Encoding

SOME ADVANCED DATA PREPROCESSING TECHNIQUES

• Label/Ordinal/Integer Encoding

This type of encoding is used when the variables in
the data are ordinal. Ordinal encoding converts each
label into an integer value, and the encoded data
represents the sequence of labels.
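
A minimal sketch of ordinal encoding, assuming the order of the labels is supplied explicitly. The function name and the sample labels are illustrative assumptions.

```python
def ordinal_encode(values, order):
    """Map each label to its position in the given order (0, 1, 2, ...)."""
    mapping = {label: rank for rank, label in enumerate(order)}
    return [mapping[v] for v in values]

# Hypothetical sugar-level labels, ordered from lowest to highest.
levels = ["low", "medium", "high"]
readings = ["medium", "low", "high", "low"]

print(ordinal_encode(readings, levels))
```

The integers preserve the ordering low < medium < high, which is why this encoding suits ordinal data but can mislead a model when applied to nominal data.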

SOME ADVANCED DATA PREPROCESSING TECHNIQUES

One Hot Encoding

▪ One hot encoding is a representation of
categorical variables as binary vectors.

▪ Each value is represented as a binary vector that is
all zero values except the index of the value,
which is marked with a 1.

SOME ADVANCED DATA PREPROCESSING TECHNIQUES

One Hot Encoding

• It refers to splitting a column which contains
categorical data into many columns,
depending on the number of categories present in
that column.

• Each new column contains "1" for the rows that belong to
its category and "0" for all other rows.
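
A minimal sketch of one-hot encoding in pure Python. The function name is an assumption, and the categories are sorted alphabetically only to make the column order deterministic.

```python
def one_hot_encode(values):
    """Return the category list and one binary row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)   # all zeros ...
        row[index[v]] = 1             # ... except the position of this value's category
        rows.append(row)
    return categories, rows

# Hypothetical blood-group column, for illustration only.
blood_groups = ["A+", "O-", "B+", "A+"]
categories, encoded = one_hot_encode(blood_groups)

print(categories)
for value, row in zip(blood_groups, encoded):
    print(value, row)
```

Each row contains exactly one 1, so no artificial ordering is imposed on the categories.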

WHY FEATURE SELECTION?
▪ High-dimensional data often contain irrelevant or
redundant features, which can:
✓ Reduce the accuracy of machine learning algorithms
✓ Slow down the learning process
✓ Be a problem in storage and retrieval
✓ Make the model hard to interpret

FEATURE SELECTION
Thousands to millions of low-level features: select
the most relevant ones to build better, faster, and
easier-to-understand learning machines.

[Figure: data matrix X with m rows and n feature columns, reduced to n′ selected columns]

Feature Selection is a process that chooses an optimal subset
of features according to a certain criterion.
WHY FEATURE SELECTION?
Why we need Feature Selection:
• It enables the machine learning algorithm to train faster.
• It reduces the complexity of a model and makes it easier to interpret.
• It improves the accuracy of a model if the right subset is chosen.
• It reduces overfitting.
• It helps to visualize the data for model selection.
• It reduces dimensionality and removes noise.
FEATURE SELECTION VS
DIMENSIONALITY REDUCTION
▪ Feature Selection
• When classifying novel patterns, only a small number of
features need to be computed (i.e., faster classification).
• The new features are just a subset of the original features.

▪ Dimensionality Reduction
• When classifying novel patterns, all features need to be
computed.
• New features are combinations (linear for PCA/*LDA) of
the original features (difficult to interpret).

*LDA: Linear discriminant analysis is used to find a linear combination of
features that characterizes or separates two or more classes (or levels) of a
categorical variable.
WHY DIMENSIONALITY REDUCTION?

▪ Most machine learning techniques may not be effective
for high-dimensional data
▪ Curse of Dimensionality: The volume of the feature space grows
exponentially with the number of dimensions, so the amount of data
needed to cover it grows exponentially too.
▪ Query accuracy and efficiency degrade rapidly as
the dimension increases.
▪ The essential dimension may be small.
▪ For example, the number of genes responsible for a
certain type of disease may be small.
APPLICATIONS OF DIMENSIONALITY REDUCTION
▪ Customer relationship management
▪ Text mining
▪ Image retrieval
▪ Microarray data analysis
▪ Protein classification
▪ Face recognition
▪ Handwritten digit recognition
▪ Intrusion detection
APPLICATION OF DIMENSIONALITY REDUCTION …

Document Classification

Document sources: Internet (Web Pages, Emails) and Digital Libraries (ACM Portal, IEEE Xplore, PubMed)

Documents vs. Terms (C = category):

Documents T1 T2 … TN C
D1 12 0 … 6 Sports
D2 3 10 … 28 Travel
…
DM 0 11 … 16 Jobs

• Task: To classify unlabeled documents into categories
• Challenge: thousands of terms
• Solution: to apply dimensionality reduction
PROJECTION OF HIGH DIMENSION TO LOW
DIMENSION

Reduce data from 3D to 2D

DIMENSION REDUCTION
TECHNIQUES
The two popular and well-known dimension reduction techniques are-

PRINCIPAL COMPONENT ANALYSIS (PCA)
Principal components analysis (PCA) is a dimensionality reduction technique that
enables you to identify correlations and patterns in a data set so that it can be
transformed into a data set of significantly lower dimension while retaining most
of the important information.

STEP BY STEP COMPUTATION OF PCA

STEP 1: STANDARDIZATION OF THE DATA
Standardization is all about scaling your data in such a way that all the variables
and their values lie within a similar range.
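
One common way to standardize is the z-score, which subtracts the mean and divides by the standard deviation. The sketch below assumes that definition (the slide does not state a formula); the function name and the sample values are illustrative.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score scaling: the result has mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]

# Hypothetical variables measured on very different scales.
heights_cm = [167, 182, 176, 173]
salaries = [30000, 52000, 41000, 75000]

print([round(z, 3) for z in standardize(heights_cm)])
print([round(z, 3) for z in standardize(salaries)])
```

After scaling, both variables are on a comparable scale, so neither dominates the covariance computation merely because of its units.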

STEP 2: COMPUTING THE COVARIANCE MATRIX
A covariance matrix expresses the correlation between the different variables in the
data set. It is essential to identify heavily dependent variables because they contain
biased and redundant information which reduces the overall performance of the
model.

STEP 3: CALCULATING THE EIGENVECTORS AND
EIGENVALUES
Eigenvectors and eigenvalues are the mathematical constructs that must be computed
from the covariance matrix in order to determine the principal components of the
data set.

STEP 4: COMPUTING THE PRINCIPAL
COMPONENTS
Once we have computed the eigenvectors and eigenvalues, all we have
to do is sort them in descending order of eigenvalue, where the eigenvector with
the highest eigenvalue is the most significant and thus forms the first
principal component.

STEP 5: REDUCING THE DIMENSIONS OF THE DATA
SET
The last step in performing PCA is to re-arrange the original data with the final
principal components, which represent the maximum and the most significant
information of the data set.

NUMERICAL ON PRINCIPAL COMPONENT
ANALYSIS
Given the data in the following table, compute the Eigen vectors using Principal
Component Analysis (PCA) algorithm.

Feature Example 1 Example 2 Example 3 Example 4

X1 4 8 13 7

X2 11 4 5 14

NUMERICAL ON PRINCIPAL COMPONENT
ANALYSIS (CONTD.)
Step 1: Calculate Mean
Calculate the mean of X1 and X2 as shown below.

NUMERICAL ON PRINCIPAL COMPONENT
ANALYSIS (CONTD.)
Step 2: Calculation of the covariance matrix.
The covariances are calculated as follows:

NUMERICAL ON PRINCIPAL COMPONENT
ANALYSIS (CONTD.)
Step 2 contd.: Calculation of the covariance matrix.

NUMERICAL ON PRINCIPAL COMPONENT
ANALYSIS (CONTD.)
The covariance matrix is,

NUMERICAL ON PRINCIPAL COMPONENT
ANALYSIS (CONTD.)
Step 3: Eigenvalues of the covariance matrix
The characteristic equation of the covariance matrix is,

NUMERICAL ON PRINCIPAL COMPONENT
ANALYSIS (CONTD.)
Solving the characteristic equation we get,

Step 4: Computation of the eigenvectors
To find the first principal component, we only need to compute the eigenvector
corresponding to the largest eigenvalue. In the present example, the largest
eigenvalue is λ1, so we compute the eigenvector corresponding to λ1.
The eigenvector corresponding to λ = λ1 is a vector satisfying the following equation:

This is equivalent to the following two equations:

Using the theory of systems of linear equations, we note that these equations are
not independent and solutions are given by,

that is,

where t is any real number.

Taking t = 1, we get an eigenvector corresponding to λ1 as

To find a unit eigenvector, we compute the length of this eigenvector, which is given by,

Therefore, a unit eigenvector corresponding to λ1 is

By carrying out similar computations, the unit eigenvector e2 corresponding to
the eigenvalue λ = λ2 can be shown to be,
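The eigenvector computation in Step 4 can be sketched as follows. The covariance entries (14, −11, 23) assume the sample covariance with divisor n − 1, and `unit_eigenvector` is a hypothetical helper. An eigenvector is only determined up to sign, so the negated vectors are equally valid answers.

```python
import math

def unit_eigenvector(a, b, lam):
    """Unit eigenvector of the symmetric matrix [[a, b], [b, d]] for eigenvalue lam.

    The first row of (S - lam*I)U = 0 reads (a - lam)*u1 + b*u2 = 0,
    so (u1, u2) = (-b, a - lam) is one solution (assuming b != 0).
    """
    u1, u2 = -b, a - lam
    length = math.hypot(u1, u2)  # Euclidean length of the eigenvector
    return (u1 / length, u2 / length)

# Assumed covariance matrix entries: S = [[a, b], [b, d]].
a, b, d = 14.0, -11.0, 23.0
root = math.sqrt((a + d) ** 2 - 4 * (a * d - b * b))
lam1 = ((a + d) + root) / 2  # largest eigenvalue
lam2 = ((a + d) - root) / 2  # smallest eigenvalue

e1 = unit_eigenvector(a, b, lam1)
e2 = unit_eigenvector(a, b, lam2)
print([round(v, 4) for v in e1], [round(v, 4) for v in e2])
```

The two unit eigenvectors are perpendicular to each other, as expected for a symmetric matrix; `e1` gives the direction of the first principal component.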

BAYES THEOREM – PREREQUISITES
Before studying Bayes' theorem, we need to understand a few important
concepts. These are as follows:

1. Experiment
An experiment is defined as a planned operation carried out under
controlled conditions, such as tossing a coin, drawing a card or rolling a die.

2. Sample Space
The results that an experiment can produce are called its possible outcomes,
and the set of all possible outcomes of an experiment is known as the sample space.

For example, if we are rolling a die, the sample space will be:

S1 = {1, 2, 3, 4, 5, 6}

Similarly, if our experiment consists of tossing a coin and recording its outcome,
then the sample space will be:

S2 = {Head, Tail}

3. Event
An event is defined as a subset of the sample space of an experiment. In other
words, it is a set of outcomes.

4. Independent Events:
Two events are said to be independent when the occurrence of one event does not
affect the occurrence of the other. In simple words, the probability of either
event does not depend on whether the other has occurred.
Mathematically, two events A and B are said to be independent if:

P(A ∩ B) = P(AB) = P(A)*P(B)

5. Conditional Probability:
Conditional probability is defined as the probability of an event A, given that
another event B has already occurred (i.e. A conditional on B). This is represented
by P(A|B) and, provided P(B) > 0, we can define it as:
P(A|B) = P(A ∩ B) / P(B)
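The two definitions above can be checked on the die-rolling sample space. This is a small sketch; the events A and B are chosen only for illustration, and the helper `prob` assumes that all outcomes are equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """Probability of an event (a subset of the sample space), equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}     # the roll is even
B = {1, 2, 3, 4}  # the roll is at most 4

# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
p_a_given_b = prob(A & B) / prob(B)
# Independence check: P(A ∩ B) == P(A) * P(B)
independent = prob(A & B) == prob(A) * prob(B)
print(p_a_given_b, independent)  # 1/2 True
```

Here P(A|B) equals P(A), which is another way of saying that knowing B occurred does not change the probability of A.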
BAYES THEOREM – CONDITIONAL PROBABILITY
▪ The mathematician Thomas Bayes formulated this theorem to solve the
problem of finding the reverse (inverse) probability by using conditional probability.

▪ The theorem is stated as follows:

If E1, E2, E3, …, En are non-empty events which form a partition of the sample
space S,
that is, E1, E2, E3, …, En are pairwise disjoint and E1 ∪ E2 ∪ E3 ∪ … ∪ En = S,
and A is any event of non-zero probability, then for each i = 1, 2, 3, …, n,

P(Ei|A) = P(A|Ei) P(Ei) / [P(A|E1) P(E1) + P(A|E2) P(E2) + … + P(A|En) P(En)]

▪ Bayes’ theorem for two events A and B, with P(B) > 0, is given as:

P(A|B) = P(B|A) P(A) / P(B)

BAYES THEOREM – NUMERICAL PROBLEMS
Q1. It is observed that 50% of mails are spam. A software filters spam mail before
it reaches the inbox. Its accuracy for detecting a spam mail is 99%, and the
chance of tagging a non-spam mail as spam is 5%. If a certain mail is tagged
as spam, find the probability that it is not a spam mail.

Solution:

Let E1 = event of spam mail
E2 = event of non-spam mail
A = event that a mail is tagged as spam


Now,
P(E1) = 0.5 and P(E2) = 0.5
P(A|E1) = 0.99 and P(A|E2) = 0.05

Then,
P(E2|A) = P(A|E2) P(E2) / [P(A|E1) P(E1) + P(A|E2) P(E2)]
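The arithmetic for Q1 can be carried out with exact fractions. This is a sketch that simply substitutes the values stated above into the formula; the variable names are illustrative.

```python
from fractions import Fraction

p_spam = Fraction(1, 2)                    # P(E1)
p_not_spam = Fraction(1, 2)                # P(E2)
p_tag_given_spam = Fraction(99, 100)       # P(A|E1)
p_tag_given_not_spam = Fraction(5, 100)    # P(A|E2)

# Total probability that a mail is tagged as spam: P(A)
p_tag = p_tag_given_spam * p_spam + p_tag_given_not_spam * p_not_spam
# Bayes' theorem: P(E2|A)
p_not_spam_given_tag = p_tag_given_not_spam * p_not_spam / p_tag
print(p_not_spam_given_tag, float(p_not_spam_given_tag))
```

Substituting the values gives P(E2|A) = 0.025 / 0.52 = 5/104, or roughly a 4.8% chance that a mail tagged as spam is actually not spam.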

Q2. There are three urns containing white and black balls: the first urn has 3 white
and 2 black balls, the second urn has 2 white and 3 black balls, and the third urn has
4 white and 1 black ball. One urn is chosen at random (without bias) and one ball is
drawn at random from it; the ball is white. What is the probability that it came from
the third urn?

Solution:

Let E1 = event that the ball is chosen from the first urn
E2 = event that the ball is chosen from the second urn
E3 = event that the ball is chosen from the third urn
A = event that the chosen ball is white


Now,
P(E1) = P(E2) = P(E3) = 1/3

P(A|E1) = 3/5, P(A|E2) = 2/5, P(A|E3) = 4/5

Then,
P(E3|A) = P(A|E3) P(E3) / [P(A|E1) P(E1) + P(A|E2) P(E2) + P(A|E3) P(E3)]

= 4/9
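The same calculation generalises to any partition E1, …, En. The sketch below uses a hypothetical helper `posterior` that returns P(Ei|A) for every i, and checks it against the urn problem.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(Ei|A) for each i, given P(Ei) and P(A|Ei) over a partition."""
    # Denominator of Bayes' theorem: total probability of A.
    total = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / total for p, l in zip(priors, likelihoods)]

priors = [Fraction(1, 3)] * 3                                    # P(E1), P(E2), P(E3)
likelihoods = [Fraction(3, 5), Fraction(2, 5), Fraction(4, 5)]   # P(A|Ei)

post = posterior(priors, likelihoods)
print(post[2])  # 4/9
```

The posterior probabilities over the three urns sum to 1, since the white ball must have come from exactly one of them.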
BAYES THEOREM – NUMERICAL PROBLEMS – TRY YOURSELF!
Q3. A card is lost from a pack of 52 cards. From the remaining cards, two are drawn
at random and both are found to be clubs. Find the probability that the lost card is
also a club.

Q4. An insurance company has insured 4000 doctors, 8000 teachers and 12000
businessmen. The chances of a doctor, teacher and businessman dying before the age
of 58 are 0.01, 0.03 and 0.05, respectively. If one of the insured people dies before
58, find the probability that he is a doctor.

Q5. An unbiased die is rolled and, depending on the number shown, a bag is chosen:

Number on the die   Bag chosen
1                   Bag A
2 or 3              Bag B
4 or 5 or 6         Bag C

Bag A contains 3 white balls and 2 black balls, bag B contains 3 white balls and 4
black balls, and bag C contains 4 white balls and 5 black balls. The die is rolled and
a bag is chosen accordingly. If a white ball is drawn, find the probability that it
was drawn from bag B.

EVALUATING A CLASSIFICATION MODEL
For evaluating a Classification model, we have the
following ways:

1. Log Loss or Cross-Entropy Loss

2. Confusion Matrix

1. LOG LOSS OR CROSS-ENTROPY LOSS
• It is used for evaluating the performance of a classifier whose output is
a probability value between 0 and 1.

• For a good binary classification model, the value of log loss should be
close to 0.

• The value of log loss increases as the predicted probability deviates from the
actual label.

• A lower log loss indicates better predicted probabilities and, in general, a more
accurate model.
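The points above can be illustrated with the usual binary log loss formula, −(1/N) Σ [y·log(p) + (1 − y)·log(1 − p)], which the slides do not write out. The sketch below is a plain-Python version; the function name `log_loss`, the clipping constant `eps` and the sample predictions are assumptions for the illustration.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between labels (0/1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities away from 0 and 1 so that log() stays finite.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

y_true = [1, 0, 1, 0]
confident_right = log_loss(y_true, [0.9, 0.1, 0.8, 0.2])  # predictions near the labels
mostly_wrong = log_loss(y_true, [0.4, 0.6, 0.3, 0.7])     # predictions far from the labels
print(round(confident_right, 4), round(mostly_wrong, 4))
```

The first set of predictions sits close to the true labels and gives a loss near 0; the second set deviates from the labels and gives a noticeably larger loss.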

2. CONFUSION MATRIX
• A Confusion Matrix (Error Matrix) is used to measure the performance of a
classification model.

• The numbers of correct and incorrect predictions are summarized as counts
and broken down by class. This summary is the confusion matrix.

• It is represented as an N x N matrix, where N is the number of target
classes. The matrix compares the actual target values with those predicted
by the machine learning model. This gives us a holistic view of
how well our classification model is performing and what kinds of errors it is
making.
CONFUSION MATRIX (CONTD.)
• True Positive (TP) - The actual value was positive and the
model predicted a positive value.

• True Negative (TN) - The actual value was negative and the
model predicted a negative value.

• False Positive (FP) - The actual value was negative but the
model predicted a positive value. Also known as a Type 1 error.

• False Negative (FN) - The actual value was positive but the
model predicted a negative value. Also known as a Type 2 error.
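The four counts can be tallied directly from paired actual and predicted labels. This is a minimal sketch for the binary case; the function name `confusion_counts` and the sample labels are illustrative assumptions.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN and TN for a binary classification problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive and a != positive:
            fp += 1  # predicted positive, actually negative (Type 1 error)
        elif p != positive and a == positive:
            fn += 1  # predicted negative, actually positive (Type 2 error)
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```

The four counts always add up to the number of examples, which is a useful sanity check when building the matrix by hand.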
CONFUSION MATRIX (CONTD.)
For a binary classifier, the matrix has the following layout:

                      Actual Positive                       Actual Negative
Predicted Positive    True Positive (TP)                    False Positive (FP) - Type 1 error
Predicted Negative    False Negative (FN) - Type 2 error    True Negative (TN)

CONFUSION MATRIX (CONTD.)

Precision: Also called positive predictive value, precision is the fraction of
relevant instances among the retrieved (predicted) instances.
Or
• The proportion of predicted positive cases that were correctly identified, i.e.
what percent of your positive predictions were correct?
(This indicates whether our model is reliable or not.)

Precision = TP / (TP + FP) = (actual relevant instances) / (total retrieved (predicted) elements)
(It tells how many retrieved (predicted) items are relevant.)

Example
Consider a Machine Learning model for recognizing dogs (the relevant element) in a
digital photograph. The photograph contains ten cats and twelve dogs.

The model identifies eight elements as dogs. Of the eight elements identified as
dogs, only five actually are dogs (true positives, or relevant instances). What is
the precision?

Precision is the fraction of relevant instances among the retrieved instances.

Precision = TP / (TP + FP) = (relevant instances) / (retrieved elements) = 5/8

                      Actual Dog (AD)   Actual Cat (AC)
Predicted Dog (PD)    5                 3
Predicted Cat (PC)    7                 7
Total                 12                10

Note - Precision talks about the VALIDITY of the model.
QUIZ

When you type a particular query into the Google search engine, it returns 30 pages
in total, of which only 20 are relevant. What is the precision of the model?

1. 3/2

2. 2/3

3. 6/9

4. Options 2 and 3 are both correct

Answer - 4
CONFUSION MATRIX (CONTD.)

Recall: Recall (also known as sensitivity) is the fraction of relevant
instances that were retrieved.
Or
• It tells us how many of the actual positive cases we were able to
predict correctly with our model. (Also known as the True Positive Rate.)
Or
• What percent of the positive cases did you catch?
(The proportion of actual positive cases which are correctly identified.)

Recall (or Sensitivity) = TP / (TP + FN) = (actual relevant predicted instances) / (total relevant elements)
(It tells how many relevant items are retrieved.)

QUIZ
When you type a particular query into the Google search engine, it returns 30 pages
in total, of which only 20 are relevant, and it fails to return 40 additional
relevant pages. What is the recall of the model?

1. 2/3

2. 2/7

3. 1/3

4. None of the above

Answer - 3
F1 SCORE
• F1 Score is the harmonic mean of precision and recall. The range for F1
Score is [0, 1]. It reflects both how precise your classifier is (how many of its
positive predictions are correct) and how robust it is (how many of the actual
positive cases it does not miss).

• In short, it balances two questions in a single number: what percent of positive
predictions were correct, and what percent of positive cases were caught?

F1 Score = 2*(Recall * Precision) / (Recall + Precision)

• The greater the F1 Score, the better the performance of our model.

• F1, rather than global accuracy, should be used to compare classifier models,
particularly when the classes are imbalanced.
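As a worked check of the three formulas, the sketch below reuses the counts from the dog-recognition example (TP = 5, FP = 3, and FN = 7, since twelve dogs are present and five were found). The function name `precision_recall_f1` is hypothetical.

```python
from fractions import Fraction

def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) computed from confusion-matrix counts."""
    precision = Fraction(tp, tp + fp)
    recall = Fraction(tp, tp + fn)
    # Harmonic mean of precision and recall.
    f1 = 2 * (recall * precision) / (recall + precision)
    return precision, recall, f1

precision, recall, f1 = precision_recall_f1(tp=5, fp=3, fn=7)
print(precision, recall, f1)  # 5/8 5/12 1/2
```

Although precision is 5/8, the low recall of 5/12 pulls the F1 Score down to 1/2, which shows how the harmonic mean penalises a large gap between the two measures.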
