BCSE 0105 - Machine Learning - Module 1 - Complete - NC
COURSE OBJECTIVE
✓ To introduce students to the basic concepts and techniques of Machine Learning
✓ To develop skills of using recent machine learning software for solving practical
problems
✓ To gain experience of doing independent study and research
BCSE0105: MACHINE LEARNING - DR. NABANITA CHOUDHURY, ASSISTANT PROFESSOR, GLA UNIVERSITY, MATHURA 2
WHAT ARE WE GOING TO LEARN
MODULE 1
Introduction: Machine Learning basics, Hypothesis space and inductive bias,
training and test set, cross validation.
Introduction to Statistical Learning: Bayesian Method.
Machine Learning: Supervised (Regression, Classification) vs. Unsupervised
(Clustering) Learning.
Data Preprocessing: Imputation, Outlier management, One hot encoding,
Dimensionality Reduction-feature extraction, Principal Component Analysis (PCA),
Singular Value Decomposition.
Supervised Learning: Regression-Linear regression, Polynomial regression,
Classification- Logistic regression, k-nearest neighbor classifier.
REFERENCE BOOKS
▪ Harrington, P., Machine Learning in Action, Shelter Island, NY:
Manning Publications Co., 2012.
OUTCOME
After completion of the course, students will be able to:
✓ CO1: Apply the basic concepts of machine learning.
✓ CO2: Apply the concepts of regression and re-sampling methods.
✓ CO3: Design supervised and reinforcement learning based solutions.
✓ CO4: Apply the ensemble methods for improving classification.
✓ CO5: Identify the ways of feature extraction, reduction and selection.
✓ CO6: Design the applications of machine learning algorithm
WHAT IS MACHINE LEARNING?
Ref: https://www.youtube.com/watch?v=LzaWrmKL1Z4
WHAT IS MACHINE LEARNING?
Ref: https://medium.com/@suryasaikrishna97/introduction-to-machine-learning-5faa9b636578
MACHINE LEARNING -TIMELINE
Ref: https://medium.com/analytics-vidhya/fundamental-omachine-learning-ada28afa1bd3
WHAT DO THE PIONEERS SAY…
Ref: https://medium.com/@jetnew/a-summary-of-alan-m-turings-computing-machinery-and-intelligence-fd714d187c0b
Ref: https://www.facebook.com/NDLIndia/photos/a.745256605623867/1044605819022276/?type=3
In 1959, the term Machine Learning was coined by Arthur Samuel.
WHY IS MACHINE LEARNING SO IMPORTANT?
Due to the excessive production of data, we need a method that can be used to structure, analyze and draw useful insights from data.
Finding hidden patterns and extracting key insights from data is the most essential part of Machine Learning.
APPLICATIONS
Ref: https://www.javatpoint.com/applications-of-machine-learning
APPLICATIONS (CONTD.)
Ref: https://www.edureka.co/blog/machine-learning-applications/
APPLICATIONS (CONTD.)
Products Recommendations
Ref: https://www.edureka.co/blog/machine-learning-applications/
APPLICATIONS (CONTD.)
Traffic Predictions
APPLICATIONS (CONTD.)
Fraud detection
Ref: Suryanarayana, S. Venkata, G. N. Balaji, and G. Venkateswara Rao. "Machine Learning Approaches for Credit Card Fraud Detection." Int. J. Eng. Technol 7.2 (2018): 917-920.
APPLICATIONS (CONTD.)
Online Video Streaming Recommendation
https://pub.towardsai.net/recommendation-system-in-depth-tutorial-with-python-for-netflix-using-collaborative-filtering-533ff8a0e444
APPLICATIONS (CONTD.)
Stock Market Analysis
https://medium.com/vsinghbisen/how-sentiment-analysis-in-stock-market-used-for-right-prediction-5c1bfe64c233
APPLICATIONS (CONTD.)
Medical Diagnosis
https://medium.com/ai-techsystems/application-of-machine-learning-89a227256f7d
APPLICATIONS (CONTD.)
Self Driving Cars
APPLICATIONS (CONTD.)
Spam mail detection
Google Translate
TRADITIONAL PROGRAMMING VS MACHINE LEARNING
https://www.avenga.com/magazine/machine-learning-programming/
TRADITIONAL PROGRAMMING VS MACHINE LEARNING
(CONTD.)
https://www.avenga.com/magazine/machine-learning-programming/
MACHINE LEARNING, ARTIFICIAL INTELLIGENCE AND
DEEP LEARNING
https://www.edureka.co/blog/ai-vs-machine-learning-vs-deep-learning/
MACHINE LEARNING, ARTIFICIAL INTELLIGENCE AND
DEEP LEARNING (CONTD.)
https://www.viatech.com/en/2018/05/history-of-artificial-intelligence/
SUPERVISED MACHINE LEARNING
Ref: https://medium.com/@jorgesleonel/supervised-learning-c16823b00c13
SUPERVISED MACHINE LEARNING (CONTD.)
▪ Supervised learning is the type of machine learning in which the model is
trained using well-labelled training data, and on the basis of that data, the
model predicts the output.
▪ Labelled data means that the input data is already tagged with the
correct output.
▪ In supervised learning, the training data provided to the model works as the
supervisor that teaches the model to predict the output correctly. It applies
the same concept as a student learning under the supervision of a teacher.
SUPERVISED MACHINE LEARNING (CONTD.)
▪ The aim of a supervised learning algorithm is to find a mapping function
to map the input variable(x) with the output variable(y).
SUPERVISED MACHINE LEARNING (CONTD.)
▪ Once the training process is completed, the model is tested on the basis of
test data (a held-out subset of the labelled dataset that was not used for
training), and then it predicts the output.
▪ A labelled dataset is one that has both input and output parameters.
SUPERVISED MACHINE LEARNING (CONTD.)
Ref: https://www.javatpoint.com/supervised-machine-learning
SUPERVISED MACHINE LEARNING (CONTD.)
▪ Suppose we have a dataset of different types of shapes which includes
square, triangle, and hexagon.
o If the given shape has four sides, and all the sides are equal, then it will
be labelled as a square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides then it will be labelled as
hexagon.
▪ Now, after training, we test our model using the test set, and the task of the
model is to identify the shape.
▪ The machine is already trained on all types of shapes, and when it finds a
new shape, it classifies the shape and predicts the output.
STEPS OF SUPERVISED LEARNING
1. Determine the type of training dataset
2. Collect/gather the labelled training data
3. Split the dataset into training, test, and validation datasets
4. Determine the input features of the training dataset, which should carry
enough information for the model to predict the output accurately
STEPS OF SUPERVISED LEARNING (CONTD.)
5. Determine the suitable algorithm for the model, such as support vector
machine, decision tree, etc.
6. Execute the algorithm on the training dataset
7. Evaluate the accuracy of the model by providing the test set. If the model
predicts the correct outputs, our model is accurate.
TRAINING AND TEST DATA
Training Set: A subset of the dataset used to train the machine learning model;
its outputs are already known.
Test Set: The remaining subset of the dataset, held back from training and used
to evaluate the trained model.
Typical splits (training / test): 70% / 30% or 80% / 20%
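A 70/30 split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name train_test_split, the seed, and the toy dataset are hypothetical and do not come from the slides.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled dataset: (input, known output) pairs.
dataset = [(x, 2 * x + 1) for x in range(10)]
train_set, test_set = train_test_split(dataset, test_fraction=0.3)
print(len(train_set), len(test_set))  # 7 3
```

Shuffling before splitting avoids a test set made only of the last rows, which matters when the data is stored in some sorted order.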
SUPERVISED MACHINE LEARNING (CONTD.)
Ref: https://www.jcchouinard.com/supervised-learning/
REAL LIFE APPLICATIONS OF SUPERVISED ML
Face Detection
Text Categorization
Spam Categorization
House Price Prediction
Stock Price Prediction
SUPERVISED MACHINE LEARNING TYPES
Supervised Learning has two types: Classification and Regression.
REGRESSION
• Regression algorithms are used when the output variable is continuous and
there is a relationship between the input variable and the output variable.
REGRESSION (CONTD.)
Ref: https://www.javatpoint.com/regression-analysis-in-machine-learning
CLASSIFICATION
• Classification algorithms are used when the output variable is categorical, which
means it takes one of two or more classes, such as Yes-No, Male-Female, True-False,
Disease-No disease, etc.
• For example, filtering emails as “spam” or “not spam”, or labelling transactions as
“fraudulent” or “authorized”.
CLASSIFICATION (CONTD.)
• Classification predicts categorical class labels: it constructs a model from the
training set and the values (class labels) of the classifying attribute, and then
uses that model to classify new data.
UNSUPERVISED MACHINE LEARNING
• Unsupervised learning is a type of machine learning in which models are trained
using an unlabeled dataset and are allowed to act on that data without any
supervision.
• The models in unsupervised learning find the hidden patterns and insights
in the given data by themselves.
• It can be compared to the learning which takes place in the human brain while
learning new things.
• The algorithm is never given labels for the dataset, which means it has no prior
idea about the categories in the dataset.
• For an image dataset, the task of the unsupervised learning algorithm is to
identify the image features on its own.
• The unsupervised learning algorithm performs this task by clustering the image
dataset into groups according to similarities between images.
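Grouping by similarity can be sketched with k-means, one of the clustering algorithms named later in this module. This is a minimal illustration only: the 2-D points stand in for image features, and the starting centroids and the function name kmeans are assumptions made for the example.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid         # an empty cluster keeps its centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabeled data: two visibly separate groups of points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(clusters)
```

No labels are passed in at any point; the two groups emerge only from the distances between the points.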
UNSUPERVISED MACHINE LEARNING ADVANTAGES
• Unsupervised learning is helpful for finding useful insights from the data.
• In the real world, we do not always have input data with the corresponding output,
so we need unsupervised learning to handle such cases.
UNSUPERVISED MACHINE LEARNING (CONTD.)
Ref: https://www.g2.com/articles/supervised-vs-unsupervised-learning
UNSUPERVISED MACHINE LEARNING – APPLICATIONS
Customer Segmentation Analysis
UNSUPERVISED LEARNING ALGORITHMS
• K-means clustering
• Hierarchical clustering
• Anomaly detection
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular value decomposition
SUPERVISED VS UNSUPERVISED LEARNING
Supervised Learning | Unsupervised Learning
Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data.
Supervised learning model takes direct feedback to check if it is predicting correct output or not. | Unsupervised learning model does not take any feedback.
SUPERVISED VS UNSUPERVISED LEARNING (CONTD.)
Supervised Learning | Unsupervised Learning
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning can be categorized into Classification and Regression problems. | Unsupervised learning can be categorized into Clustering and Association problems.
Supervised learning can be used for those cases where we know the inputs as well as the corresponding outputs. | Unsupervised learning can be used for those cases where we have only input data and no corresponding output data.
Supervised learning model generally produces a more accurate result. | Unsupervised learning model may give a less accurate result as compared to supervised learning.
Ref: https://www.javatpoint.com/difference-between-supervised-and-unsupervised-learning
LINEAR REGRESSION
▪ Linear regression is one of the easiest and most popular Machine Learning
algorithms. It is a statistical method that is used for predictive analysis.
▪ Linear regression models a linear relationship: it finds how the value of
the dependent variable changes according to the value of the independent
variable.
LINEAR REGRESSION (CONTD.)
▪ The linear regression model provides a sloped straight line representing
the relationship between the variables.
https://www.javatpoint.com/linear-regression-in-machine-learning
y = a0 + a1x (also written as y = b0 + b1x)
Here,
y is the dependent (target) variable, x is the independent (predictor) variable,
a0 (or b0) is the intercept of the line, and a1 (or b1) is its slope.
The values of the x and y variables are the training data for the Linear Regression
model representation.
TERMINOLOGIES RELATED TO REGRESSION
✓ Dependent Variable: The main factor in Regression analysis which we want
to predict or understand is called the dependent variable. It is also called
target variable.
TYPES OF LINEAR REGRESSION
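Linear regression can be further divided into two types of algorithms:
▪ Simple Linear Regression: only one input variable is used
▪ Multiple Linear Regression: more than one input variable is used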
SIMPLE LINEAR REGRESSION - EXAMPLE
Investigating the relationship between the dependent variable (outcome: Salary (y))
and the independent variable (feature, attribute, criterion or predictor: Experience (x)).
[Figure: Salary (y-axis) plotted against Experience (x-axis)]
SIMPLE LINEAR REGRESSION (CONTD.)
• A list of houses with size and price is given.
• We need to find the best fit line to predict the price. This best fit line is known as
the regression line and is represented by the linear equation y = b1*x + b0
In this equation:
y – Dependent Variable (Predicted Price of House)
b1 – Slope
x – Independent variable (Size of House (Predictor))
b0 – Intercept
Slope b1 and Intercept b0 are the model coefficients (also called model parameters
or regression coefficients).
UNDERSTANDING THE BEST FIT LINE
[Figure: the line (model) Y = m * X + c, with the error shown as the gap between each actual value and the predicted value on the line]
UNDERSTANDING THE BEST FIT LINE (CONTD.)
How do we find the line of best fit?
Statistical Way of Computing Best Fit Line
• A line can be represented by the formula: y = b0 + b1 * x
  where b1 = Σ(xi - x’)(yi - y’) / Σ(xi - x’)² and b0 = y’ - b1 * x’
  xi and yi are individual data points; x’ and y’ are the mean values.
• The last two rows of the table show sums and mean scores (Sum: 390 and 385;
  Mean: x’ = 78, y’ = 77) that we will use to conduct the regression analysis.
• The last two columns show deviation scores: the difference between the
  student's score and the average score on each test.
LINEAR REGRESSION WITH EXAMPLE (CONTD.)
Student   xi    yi    (xi-x’)   (yi-y’)   (xi-x’)²
1         95    85      17         8        289
2         85    95       7        18         49
3         80    70       2        -7          4
4         70    65      -8       -12         64
5         60    70     -18        -7        324
Sum      390   385                          730
Mean      78    77

After putting the values from the table in the equations, we get,
b1 = 470/730 = 0.644   (470 is the sum of the products (xi-x’)(yi-y’))
b0 = y’ - b1 * x’
b0 = 77 - (0.644)(78)
b0 = 26.768
The Answers
1. What linear regression equation best predicts statistics performance, based on math aptitude
scores?
Ans. ŷ = 26.768 + 0.644x
2. If a student scored 80 in the aptitude test, what grade would we expect from him/her in
statistics?
Ans. ŷ (when x=80) = 26.768 + 0.644(80) = 78.288
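The hand calculation above can be checked with a short Python sketch. The function name fit_line and the variable names are illustrative choices, not part of the slides; the scores are the five student rows from the table.

```python
def fit_line(xs, ys):
    """Return (b0, b1) of the least-squares line y = b0 + b1 * x."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

aptitude_scores = [95, 85, 80, 70, 60]     # xi
statistics_scores = [85, 95, 70, 65, 70]   # yi
b0, b1 = fit_line(aptitude_scores, statistics_scores)
print(round(b1, 3), round(b0, 3), round(b0 + b1 * 80, 3))  # 0.644 26.781 78.288
```

The intercept prints as 26.781 rather than 26.768 because the code keeps the unrounded slope 470/730, while the hand calculation rounds b1 to 0.644 before computing b0. The prediction for x = 80 agrees to three decimals either way.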
Practice Regression Exercise No. 1
The values of y and their corresponding values of x are shown in the table below
Practice Regression Exercise No. 2
ERROR CALCULATION (PERFORMANCE MEASUREMENT)
COST FUNCTION
• Cost function is the calculation of error between
predicted values and actual values.
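As one concrete instance, the mean squared error can be computed as sketched below. The choice of mean squared error as the cost, and the function name, are assumptions made here for illustration; the data reuses the student scores and the fitted line ŷ = 26.768 + 0.644x from the worked example.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

aptitude = [95, 85, 80, 70, 60]
actual = [85, 95, 70, 65, 70]
predicted = [26.768 + 0.644 * x for x in aptitude]   # line from the worked example
cost = mean_squared_error(actual, predicted)
print(cost)
```

A cost of 0 would mean every prediction matches its actual value exactly; larger values mean a worse fit.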
LINEAR REGRESSION USING GRADIENT DESCENT
LINEAR REGRESSION USING GRADIENT
DESCENT (CONTD.)
▪ Linear regression is a linear approach to
modelling the relationship between a
dependent variable and one or more
independent variables.
LINEAR REGRESSION USING GRADIENT
DESCENT (CONTD.)
▪ Loss Function
Here yᵢ is the actual value and ȳᵢ is the predicted value.
So, we square the error and find the mean, hence the name Mean Squared
Error.
Now that we have defined the loss function, let's try minimizing it and
finding m and c.
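Written out, the Mean Squared Error described here takes the following form. This is a reconstruction from the surrounding definitions, assuming the model ȳᵢ = m xᵢ + c used throughout this part and n data points:

```latex
E = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}_i\right)^2
  = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - (m x_i + c)\right)^2
```

Minimizing E over m and c is what the gradient descent steps are for.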
THE GRADIENT DESCENT ALGORITHM
Gradient descent is an iterative optimization algorithm to find the minimum of a function.
THE GRADIENT DESCENT ALGORITHM
1. Initially let m = 0 and c = 0. Let L be our learning rate.
This controls how much the values of m and c change with each step.
L could be a small value like 0.0001 for good accuracy.
2. Calculate the partial derivative of the loss function with respect to m, and plug
in the current values of x, y, m and c in it to obtain the derivative value Dₘ.
LINEAR REGRESSION USING GRADIENT
DESCENT (CONTD.)
Dₘ is the value of the partial derivative with respect to m. Similarly let’s find the
partial derivative with respect to c, Dc :
3. Now we update the current value of m and c using the following equation:
LINEAR REGRESSION USING GRADIENT
DESCENT (CONTD.)
4. We repeat this process until our loss function is a very small value or ideally 0
(which means 0 error or 100% accuracy).
The value of m and c that we are left with now will be the optimum values.
Gradient descent is one of the simplest and most widely used algorithms in machine
learning.
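The four steps can be sketched in Python as follows. The sketch assumes the Mean Squared Error loss, whose partial derivatives are Dₘ = (-2/n) Σ xᵢ(yᵢ - ȳᵢ) and Dc = (-2/n) Σ (yᵢ - ȳᵢ), and the update rule m = m - L × Dₘ, c = c - L × Dc. The function name, the toy data, and the fixed number of iterations (used instead of a loss threshold) are illustrative assumptions.

```python
def gradient_descent(xs, ys, learning_rate=0.0001, epochs=1000):
    """Fit y = m * x + c by gradient descent on the mean squared error."""
    m, c = 0.0, 0.0                          # step 1: start from m = 0, c = 0
    n = len(xs)
    for _ in range(epochs):                  # step 4: repeat the update
        preds = [m * x + c for x in xs]
        # Step 2: partial derivatives of the MSE with respect to m and c.
        d_m = (-2 / n) * sum(x * (y - p) for x, y, p in zip(xs, ys, preds))
        d_c = (-2 / n) * sum(y - p for y, p in zip(ys, preds))
        # Step 3: move m and c against the gradient.
        m -= learning_rate * d_m
        c -= learning_rate * d_c
    return m, c

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
m, c = gradient_descent(xs, ys, learning_rate=0.01, epochs=5000)
print(round(m, 3), round(c, 3))
```

A larger learning rate than 0.0001 is used here only so that this tiny example converges in a few thousand iterations. If the rate is too large for the data, the updates overshoot and the loss grows instead of shrinking.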
MULTIPLE LINEAR REGRESSION
If there is only one input variable (x), then such linear regression is called
simple linear regression. And if there is more than one input variable, then
such linear regression is called multiple linear regression.
Example: House Price Prediction
POLYNOMIAL REGRESSION
▪ If our data points clearly do not fit a linear regression (a straight line
through all data points), it might be ideal for polynomial regression.
POLYNOMIAL REGRESSION (CONTD.)
▪ Polynomial Regression is a regression algorithm that models the relationship
between a dependent variable (y) and an independent variable (x) as an nth degree
polynomial. The Polynomial Regression equation is given below:
y = b0 + b1x + b2x^2 + b3x^3 + … + bnx^n
POLYNOMIAL REGRESSION (CONTD.)
▪ The dataset used in Polynomial regression for training is of non-linear
nature.
▪ It makes use of a linear regression model to fit the complicated and non-
linear functions and datasets.
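The idea that a linear model can fit a non-linear curve can be sketched as follows: each x is expanded into the features 1, x, x², ..., and ordinary least squares is then solved for the coefficients. The helper names, the use of the normal equations, and the toy data are assumptions made for this illustration.

```python
def solve(a, b):
    """Solve the linear system a * w = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_polynomial(xs, ys, degree):
    """Return [b0, b1, ..., b_degree] of the least-squares polynomial."""
    # Feature expansion: each x becomes [1, x, x^2, ..., x^degree].
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    # Normal equations (X^T X) w = X^T y: the model is still linear in w.
    size = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    return solve(xtx, xty)

# Hypothetical non-linear data generated from y = 1 + 2x + 3x^2.
xs = [-2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
print([round(b, 6) for b in coeffs])
```

The curve is non-linear in x, but the unknown coefficients still enter linearly, which is why the linear regression machinery applies.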
NEED FOR POLYNOMIAL REGRESSION
▪ If we apply a linear model on a linear dataset, it gives a good result, as we
have seen in Simple Linear Regression. But if we apply the same model without
any modification on a non-linear dataset, it fits poorly: the loss function
increases, the error rate is high, and accuracy decreases.
▪ So, for such cases, where data points are arranged in a non-linear fashion,
we need the Polynomial Regression model.
REGRESSION EQUATIONS –
SIMPLE LINEAR VS MULTIPLE LINEAR VS POLYNOMIAL
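For comparison, the three equations can be written side by side. This is a reconstruction in the b0, b1, ... notation used elsewhere in this module, not a copy of the original slide:

```latex
\begin{aligned}
\text{Simple linear:}   \quad & y = b_0 + b_1 x \\
\text{Multiple linear:} \quad & y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \\
\text{Polynomial:}      \quad & y = b_0 + b_1 x + b_2 x^2 + \dots + b_n x^n
\end{aligned}
```

The multiple linear form uses several different input variables, while the polynomial form uses powers of a single input variable.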
LOGISTIC REGRESSION
▪ Logistic regression is one of the most popular Machine Learning algorithms,
which comes under the Supervised Learning technique. It is used for
predicting the categorical dependent variable using a given set of
independent variables.
LOGISTIC REGRESSION (CONTD.)
▪ Logistic Regression is very similar to Linear Regression except in how
they are used. Linear Regression is used for solving regression problems,
whereas Logistic Regression is used for solving classification problems.
▪ The curve from the logistic function indicates the likelihood of something such
as whether the cells are cancerous or not, a mouse is obese or not based on
its weight, etc.
LOGISTIC REGRESSION (CONTD.)
▪ Logistic Regression is a significant machine learning algorithm because it
has the ability to provide probabilities and classify new data using
continuous and discrete datasets.
LOGISTIC FUNCTION (SIGMOID FUNCTION)
▪ The sigmoid function is a mathematical function used to map the predicted
values to probabilities.
▪ It maps any real value into another value between 0 and 1.
▪ The output of logistic regression must stay between 0 and 1 and cannot
go beyond this limit, so it forms an "S"-shaped curve. This S-shaped curve
is called the sigmoid function or the logistic function.
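A minimal Python sketch of the sigmoid function, 1 / (1 + e^(-z)), shows how real values are squeezed into the range 0 to 1. The function name and the sample inputs are illustrative choices.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real value z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-6, -1, 0, 1, 6):
    print(z, round(sigmoid(z), 4))
```

Large negative inputs give values near 0, large positive inputs give values near 1, and z = 0 gives exactly 0.5, which produces the S shape. This simple form can overflow for very large negative z (below roughly -709), so it is a sketch rather than a numerically hardened version.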
Logistic regression uses the concept of predictive modeling as regression;
therefore, it is called logistic regression. Because it is used to classify
samples, it falls under the classification algorithms.
▪ In Logistic Regression y can be between 0 and 1 only, so we start from the
straight-line equation y = b0 + b1x1 + b2x2 + … + bnxn and divide y by (1 − y):
y / (1 − y), which is 0 for y = 0 and infinity for y = 1
LOGISTIC REGRESSION EQUATION (CONTD.)
▪ But we need a range from -[infinity] to +[infinity], so we take the logarithm of
the ratio, and it becomes:
log[y / (1 − y)] = b0 + b1x1 + b2x2 + b3x3 + … + bnxn
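Solving this equation for y leads back to the sigmoid function. In the short derivation below, z is shorthand introduced here for the right-hand side and is not notation from the slides:

```latex
\begin{aligned}
\log\frac{y}{1-y} &= z, \qquad z = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \\
\frac{y}{1-y} &= e^{z} \\
y &= \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}
\end{aligned}
```

So the linear combination of the inputs predicts the log-odds, and the sigmoid converts it into a value between 0 and 1.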
TYPES OF LOGISTIC REGRESSION
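On the basis of the categories, Logistic Regression can be classified into three
types:
▪ Binomial: the dependent variable has only two possible categories, such as 0 or 1, Pass or Fail.
▪ Multinomial: the dependent variable has three or more unordered categories, such as "cat", "dog", or "sheep".
▪ Ordinal: the dependent variable has three or more ordered categories, such as "low", "medium", or "high".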
K NEAREST NEIGHBOR
K-NEAREST NEIGHBOR(KNN) ALGORITHM
▪ K-Nearest Neighbor is one of the simplest Machine Learning algorithms
based on Supervised Learning technique.
▪ K-NN algorithm assumes similarity between the new case/data and the
available cases and puts the new case into the category that is most similar
to it among the available categories.
▪ K-NN algorithm stores all the available data and classifies a new data
point based on similarity. This means that when new data appears, it
can be easily classified into a well-suited category by using the K-NN algorithm.
▪ K-NN algorithm can be used for Regression as well as for Classification but
mostly it is used for the Classification problems.
K-NEAREST NEIGHBOR(KNN) ALGORITHM (CONTD.)
▪ K-NN is also called a lazy learner algorithm because it does not learn from
the training set immediately; instead it stores the dataset and, at the time of
classification, performs an action on the dataset.
▪ KNN algorithm at the training phase just stores the dataset, and when it gets
new data, it classifies that data into the category that is most similar to
the new data.
K-NEAREST NEIGHBOR(KNN) ALGORITHM (CONTD.)
▪ Suppose there are two categories, i.e., Category A and Category B, and we have a new data point
x1. In which of these categories will this data point lie?
K-NEAREST NEIGHBOR(KNN) ALGORITHM (CONTD.)
• Step-1: Select the number K of the neighbors
• Step-2: Calculate the Euclidean distance from the new data point to the available data points
• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance
• Step-4: Among these K neighbors, count the number of the data points in each
category
• Step-5: Assign the new data point to the category for which the number of
neighbors is maximum
• Next, we will calculate the Euclidean distance between the data points.
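The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the function name `knn_classify` and the toy points are hypothetical, and a tied vote is resolved in favor of the category counted first.

```python
import math
from collections import Counter

def knn_classify(points, labels, query, k):
    """Classify `query` by majority vote among its k nearest points."""
    # Step-2: Euclidean distance from the query to every stored point
    distances = [(math.dist(p, query), label) for p, label in zip(points, labels)]
    # Step-3: keep the k nearest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step-4: count the neighbors in each category
    votes = Counter(label for _, label in nearest)
    # Step-5: assign the category with the most neighbors
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two categories in a 2-D feature space
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]

# Step-1: choose K = 3
print(knn_classify(points, labels, query=(2.0, 2.0), k=3))  # prints A
```

Note that there is no training step: the "model" is just the stored list of points, which is why K-NN is called a lazy learner.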
• With K = 5, calculating the Euclidean distances gives three nearest
neighbors in Category A and two nearest neighbors in Category B.
• Since the majority (3 of the 5) nearest neighbors are from Category A, the new data
point is assigned to Category A.
HOW TO SELECT THE VALUE OF K IN THE K-NN ALGORITHM?
• There is no fixed rule for determining the best value of "K", so we need to try several
values and keep the one that performs best. A commonly used starting value is K = 5.
• A very low value of K, such as K=1 or K=2, can be noisy and makes the model sensitive to
outliers.
• Larger values of K are less sensitive to noise, but they may blur the boundaries between
categories and can cost more time and resources.
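One common way to "try some values" is leave-one-out validation: classify each labeled point using all the others, and keep the K with the highest accuracy. The sketch below is only an illustration under assumptions; the helper names `predict` and `leave_one_out_accuracy` and the eight toy points are hypothetical.

```python
import math
from collections import Counter

def predict(train, query, k):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Fraction of points classified correctly when each one is held out in turn."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]          # every point except the held-out one
        hits += predict(rest, point, k) == label
    return hits / len(data)

# Hypothetical labeled data: (features, category)
data = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"), ((2.5, 2.5), "A"),
        ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"), ((3.5, 3.0), "B")]

scores = {k: leave_one_out_accuracy(data, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, "best K:", best_k)
```

Odd values of K are used here because they avoid tied votes when there are only two categories.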
ADVANTAGES OF KNN ALGORITHM:
• It is simple to implement
DISADVANTAGES OF KNN ALGORITHM:
• The computation cost is high because the distance from the new data point to
every training sample must be calculated.
NUMERICAL EXAMPLE 1
• Find to which class the new data point belongs for the following, using KNN:
Height (cm) Weight (kg) Class
167 51 Underweight
182 62 Normal
176 69 Normal
173 64 Normal
172 65 Normal
174 56 Underweight
169 58 Normal
173 57 Normal
170 55 Normal
170 57 ?
• Distance formula – Euclidean Distance
$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
• Calculate the distance between the new point and all the existing data points, one by one.
Height (cm) Weight (kg) Class Distance
167 51 Underweight $\sqrt{(170 - 167)^2 + (57 - 51)^2} = 6.7$
182 62 Normal $\sqrt{(170 - 182)^2 + (57 - 62)^2} = 13$
176 69 Normal :
173 64 Normal :
172 65 Normal :
174 56 Underweight :
169 58 Normal :
173 57 Normal :
170 55 Normal :
170 57 ?
• We get the following:
Height (cm) Weight (kg) Class Distance
167 51 Underweight 6.7
182 62 Normal 13
176 69 Normal 13.4
173 64 Normal 7.6
172 65 Normal 8.2
174 56 Underweight 4.1
169 58 Normal 1.4
173 57 Normal 3
170 55 Normal 2
170 57 ?
• Arrange the data according to the ascending order of distance and rank them.
• Find the class for the new data, according to the value of k
Height (cm) Weight (kg) Class Distance Rank
169 58 Normal 1.4 1
170 55 Normal 2 2
173 57 Normal 3 3
174 56 Underweight 4.1 4
167 51 Underweight 6.7 5
173 64 Normal 7.6 6
172 65 Normal 8.2 7
182 62 Normal 13 8
176 69 Normal 13.4 9
170 57 ?
If k = 1, class = Normal (consider one nearest neighbor)
If k = 2, class = Normal (consider two nearest neighbors)
If k = 5, class = Normal (consider five nearest neighbors)
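The whole numerical example can be checked with a short script. It uses the height/weight table from the example as given; the variable names and the helper `vote` are illustrative choices, not part of the slides.

```python
import math
from collections import Counter

# Training data from the numerical example: (height cm, weight kg, class)
data = [
    (167, 51, "Underweight"), (182, 62, "Normal"), (176, 69, "Normal"),
    (173, 64, "Normal"), (172, 65, "Normal"), (174, 56, "Underweight"),
    (169, 58, "Normal"), (173, 57, "Normal"), (170, 55, "Normal"),
]
new_point = (170, 57)

# Distance from the new point to every row, sorted ascending (rank 1 = nearest)
ranked = sorted((math.dist((h, w), new_point), h, w, cls) for h, w, cls in data)

for rank, (d, h, w, cls) in enumerate(ranked, start=1):
    print(f"{rank}  {h} {w} {cls:<12} {d:.1f}")

def vote(k):
    """Majority class among the k top-ranked rows."""
    return Counter(row[3] for row in ranked[:k]).most_common(1)[0][0]

for k in (1, 2, 5):
    print(f"k = {k}: class = {vote(k)}")
```

For k = 5 the vote is three Normal against two Underweight, so the result is still Normal.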
Data Preprocessing
Data preprocessing is an integral step in Machine Learning, as the
quality of the data and the useful information that can be derived from it
directly affect the ability of the model to learn.
HANDLING MISSING (NULL) VALUES
▪ In real-world data, a particular element is sometimes absent
for various reasons, such as corrupt data,
failure to load the information, or incomplete extraction.
• Imputation (Handling Missing Values)
o Imputation is a technique for replacing missing data with a substitute
value so as to retain most of the data/information in the dataset. Common techniques substitute
the mean, median or mode of the column.
o Imputation is important because removing the affected data (using dropna()) from the dataset
every time is not feasible: it can shrink the dataset to a
large extent, which not only raises concerns about biasing the dataset but can also lead to
incorrect analysis.
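Mean, median and mode imputation can be sketched with Python's standard `statistics` module. The helper `impute` and the two small columns are hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import mean, median, mode

def impute(values, strategy):
    """Return a copy of `values` with None replaced by the chosen statistic."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

# Hypothetical columns; None marks a missing value
ages = [25, None, 31, 40, None, 28]
cities = ["Agra", "Mathura", None, "Mathura", "Delhi"]

print(impute(ages, "mean"))     # numeric column: fill with the mean
print(impute(ages, "median"))   # the median is less sensitive to outliers
print(impute(cities, "mode"))   # categorical column: fill with the most frequent value
```

With pandas, the same idea is usually written with `fillna`, for example `df["age"].fillna(df["age"].mean())`, where `df` and `"age"` are placeholder names.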
You're cleaning up a DataFrame with ~1000 observations. You notice that one
categorical column contains 512 missing values. What strategy should you
employ to deal with these missing values?
D. Replace all missing values with randomly sampled values from this column
▪ Training a model with a dataset that has a lot of missing values can
drastically impact the machine learning model's quality.
▪ Values that are not stored for a variable in an observation are called
missing values (missing data).
SOME ADVANCED DATA PREPROCESSING TECHNIQUES
Encoding Categorical Data
Nominal data: This type of categorical data consists of
named categories without any numerical value.
✓ One-Hot Encoding
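One-hot encoding replaces a nominal column with one 0/1 column per category, so that no artificial order is implied between the categories. The sketch below is an illustration only; the function `one_hot_encode` and the color values are hypothetical.

```python
def one_hot_encode(values):
    """Map each value to a 0/1 row with a single 1 in its category's column."""
    categories = sorted(set(values))   # one column per distinct category
    rows = [[1 if value == cat else 0 for cat in categories] for value in values]
    return categories, rows

# Hypothetical nominal column
colors = ["Red", "Green", "Blue", "Green"]

categories, rows = one_hot_encode(colors)
print(categories)                      # column order
for value, row in zip(colors, rows):
    print(value, row)
```

In practice, libraries such as pandas (`get_dummies`) and scikit-learn (`OneHotEncoder`) provide this transformation.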
WHY FEATURE SELECTION?
▪ High-dimensional data often contain irrelevant or
redundant features, which can:
✓ Reduce the accuracy of machine learning algorithms
✓ Slow down the learning process
✓ Cause problems in storage and retrieval
✓ Make the model hard to interpret
FEATURE SELECTION
Given thousands to millions of low-level features, select
the most relevant ones to build better, faster, and
easier-to-understand learning machines.
[Figure: an m × n data matrix X reduced to n' selected features]
▪ Dimensionality Reduction
• When classifying novel patterns, all features need to be
computed.
• New features are combinations (linear for PCA/LDA) of
the original features (difficult to interpret).
Document Classification
Documents (e.g., web pages, emails) are the rows, terms T1 … TN are the features, and C is the class.
      T1   T2   ….……  TN   C
D1    12   0    ….……  6    Sports
D2    3    10   ….……  28   Travel
…     …    …    ….……  …    …
DM    0    11   ….……  16   Jobs
DIMENSION REDUCTION TECHNIQUES
The two popular and well-known dimension reduction techniques are-
PRINCIPAL COMPONENT ANALYSIS (PCA)
Principal component analysis (PCA) is a dimensionality reduction technique that
enables you to identify correlations and patterns in a data set so that it can be
transformed into a data set of significantly lower dimension while retaining most of
the important information.
STEP BY STEP COMPUTATION OF PCA
STEP 1: STANDARDIZATION OF THE DATA
Standardization is all about scaling your data in such a way that all the variables
and their values lie within a similar range.
STEP 2: COMPUTING THE COVARIANCE MATRIX
A covariance matrix expresses how the different variables in the
data set vary together. It is essential to identify heavily dependent variables because they contain
biased and redundant information, which reduces the overall performance of the
model.
STEP 3: CALCULATING THE EIGENVECTORS AND EIGENVALUES
Eigenvectors and eigenvalues are the mathematical constructs that must be computed
from the covariance matrix in order to determine the principal components of the
data set.
STEP 4: COMPUTING THE PRINCIPAL COMPONENTS
Once we have computed the Eigenvectors and eigenvalues, all we have
to do is order them in the descending order, where the eigenvector with
the highest eigenvalue is the most significant and thus forms the first
principal component.
STEP 5: REDUCING THE DIMENSIONS OF THE DATA SET
The last step in performing PCA is to re-express the original data in terms of the final
principal components, which represent the maximum and the most significant
information of the data set.
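The five steps can be written compactly as follows. The notation is an assumption introduced here for illustration and is not fixed by the slides: x_{ij} is the value of feature j in example i, N is the number of examples, μ_j and σ_j are the mean and standard deviation of feature j, Z is the standardized data matrix, and the covariance uses the sample divisor N − 1.

```latex
\begin{align*}
\text{Step 1:}\quad & z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j} \\
\text{Step 2:}\quad & S = \frac{1}{N-1}\, Z^{\top} Z \\
\text{Step 3:}\quad & S\, v_k = \lambda_k\, v_k \\
\text{Step 4:}\quad & \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n, \qquad W = [\, v_1 \;\; v_2 \;\; \cdots \;\; v_{n'} \,] \\
\text{Step 5:}\quad & Y = Z\, W
\end{align*}
```

Here W keeps only the n' eigenvectors with the largest eigenvalues, so Y has n' columns instead of the original n.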
NUMERICAL ON PRINCIPAL COMPONENT ANALYSIS
Given the data in the following table, compute the eigenvectors using the Principal
Component Analysis (PCA) algorithm.
X1 4 8 13 7
X2 11 4 5 14
Step 1: Calculate Mean
Calculate the mean of X1 and X2 as shown below.
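Working directly from the four values of each feature in the table, the means come out as follows (a reconstruction from the data, since the slide's own figure is not available):

```latex
\begin{align*}
\bar{X}_1 &= \tfrac{1}{4}(4 + 8 + 13 + 7) = \tfrac{32}{4} = 8 \\
\bar{X}_2 &= \tfrac{1}{4}(11 + 4 + 5 + 14) = \tfrac{34}{4} = 8.5
\end{align*}
```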
Step 2: Calculation of the covariance matrix.
The covariances are calculated as follows:
The covariance matrix is,
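The entries can be reconstructed from the data table. The working below assumes the sample covariance (divisor N − 1 = 3) and the feature means 8 and 8.5 computed from the table; the symbol S for the matrix is also an assumption.

```latex
\begin{align*}
\operatorname{Cov}(X_1, X_1) &= \tfrac{1}{3}\left[(4-8)^2 + (8-8)^2 + (13-8)^2 + (7-8)^2\right] = \tfrac{42}{3} = 14 \\
\operatorname{Cov}(X_1, X_2) &= \tfrac{1}{3}\left[(-4)(2.5) + (0)(-4.5) + (5)(-3.5) + (-1)(5.5)\right] = \tfrac{-33}{3} = -11 \\
\operatorname{Cov}(X_2, X_2) &= \tfrac{1}{3}\left[(2.5)^2 + (-4.5)^2 + (-3.5)^2 + (5.5)^2\right] = \tfrac{69}{3} = 23 \\
S &= \begin{bmatrix} 14 & -11 \\ -11 & 23 \end{bmatrix}
\end{align*}
```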
Step 3: Eigenvalues of the covariance matrix
The characteristic equation of the covariance matrix is,
Solving the characteristic equation we get,
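For reference, the computation can be reconstructed from the data table. It assumes the sample covariance matrix with entries 14, −11 and 23 (divisor N − 1), written here as S.

```latex
\begin{align*}
\det(S - \lambda I) &= (14 - \lambda)(23 - \lambda) - (-11)^2 = \lambda^2 - 37\lambda + 201 = 0 \\
\lambda &= \frac{37 \pm \sqrt{37^2 - 4 \cdot 201}}{2} = \frac{37 \pm \sqrt{565}}{2} \\
\lambda_1 &\approx 30.3849, \qquad \lambda_2 \approx 6.6151
\end{align*}
```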
Step 4: Computation of the eigenvectors
To find the first principal component, we only need to compute the eigenvector
corresponding to the largest eigenvalue. In the present example, the largest
eigenvalue is λ1 and so we compute the eigenvector corresponding to λ1.
The eigenvector corresponding to λ = λ1 is a vector satisfying the following equation:
This is equivalent to the following two equations:
Using the theory of systems of linear equations, we note that these equations are
not independent and solutions are given by,
that is,
To find a unit eigenvector, we compute the length of X1 which is given by,
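The derivation of the first eigenvector can be reconstructed as below. It assumes the sample covariance matrix with entries 14, −11 and 23 and λ1 ≈ 30.3849; the unnormalized eigenvector is written as U_1 with components u_1, u_2 (symbols chosen here to avoid confusion with the feature X1), and t is an arbitrary non-zero scalar.

```latex
\begin{align*}
(S - \lambda_1 I)\, U_1 = 0 \;\Longleftrightarrow\;&
\begin{cases}
(14 - \lambda_1)\, u_1 - 11\, u_2 = 0 \\
-11\, u_1 + (23 - \lambda_1)\, u_2 = 0
\end{cases} \\
\frac{u_1}{11} = \frac{u_2}{14 - \lambda_1} = t
\;\Longrightarrow\;&
U_1 = \begin{bmatrix} 11 \\ 14 - \lambda_1 \end{bmatrix}
\approx \begin{bmatrix} 11 \\ -16.3849 \end{bmatrix} \quad (t = 1) \\
\lVert U_1 \rVert = \sqrt{11^2 + (14 - \lambda_1)^2} \approx 19.7348
\;\Longrightarrow\;&
e_1 = \frac{U_1}{\lVert U_1 \rVert} \approx \begin{bmatrix} 0.5574 \\ -0.8303 \end{bmatrix}
\end{align*}
```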
The unit eigenvector e2 corresponding to λ = λ2 is obtained by carrying out similar computations.
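The whole numerical example can be checked with a short standard-library script. This is a sketch under stated assumptions: it follows the numerical example in mean-centering the data without scaling, it uses the sample covariance (divisor n − 1), and it relies on the closed-form eigenvalues of a 2 × 2 symmetric matrix, so it does not generalize beyond two features. All variable names are illustrative.

```python
import math

X1 = [4, 8, 13, 7]
X2 = [11, 4, 5, 14]
n = len(X1)

# Step 1: means
m1, m2 = sum(X1) / n, sum(X2) / n

# Step 2: sample covariance matrix [[a, b], [b, c]] (divisor n - 1 is an assumption)
a = sum((x - m1) ** 2 for x in X1) / (n - 1)
c = sum((y - m2) ** 2 for y in X2) / (n - 1)
b = sum((x - m1) * (y - m2) for x, y in zip(X1, X2)) / (n - 1)

# Step 3: eigenvalues from lambda^2 - (a + c) * lambda + (a * c - b^2) = 0
disc = math.sqrt((a + c) ** 2 - 4 * (a * c - b * b))
lam1, lam2 = (a + c + disc) / 2, (a + c - disc) / 2

def unit_eigenvector(lam):
    """Unit vector along (-b, a - lam), which solves (S - lam*I) u = 0 when b != 0."""
    u = (-b, a - lam)
    length = math.hypot(*u)
    return (u[0] / length, u[1] / length)

# Step 4: eigenvectors ordered by decreasing eigenvalue
e1, e2 = unit_eigenvector(lam1), unit_eigenvector(lam2)

# Step 5: project the centered data onto the first principal component
scores = [e1[0] * (x - m1) + e1[1] * (y - m2) for x, y in zip(X1, X2)]

print("covariance:", [[a, b], [b, c]])
print("eigenvalues:", round(lam1, 4), round(lam2, 4))
print("e1:", [round(v, 4) for v in e1], "e2:", [round(v, 4) for v in e2])
print("first principal component scores:", [round(s, 4) for s in scores])
```

The sign of an eigenvector is arbitrary, so a library routine may return the negatives of `e1` or `e2`; both describe the same principal direction.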
BAYES THEOREM –PREREQUISITES
While studying the Bayes theorem, we need to understand a few important
concepts. These are as follows:
1. Experiment
An experiment is a planned operation carried out under
controlled conditions, such as tossing a coin, drawing a card, or rolling a die.
2. Sample Space
The results we can get from an experiment are called possible outcomes,
and the set of all possible outcomes of an experiment is known as the sample space.
For example, if we are rolling a die, the sample space will be:
S1 = {1, 2, 3, 4, 5, 6}
Similarly, if our experiment is tossing a coin and recording its outcome,
then the sample space will be:
S2 = {Head, Tail}
3. Event
An event is a subset of the sample space, i.e., a set of outcomes of an experiment.
4. Independent Events:
Two events are said to be independent when the occurrence of one event does not
affect the occurrence of the other. In simple words, the
probability of each event does not depend on the other.
Mathematically, two events A and B are said to be independent if:
P(A ∩ B) = P(A) · P(B)
5. Conditional Probability:
Conditional probability is defined as the probability of an event A, given that
another event B has already occurred (i.e., A conditional on B). This is represented
by P(A|B), and for P(B) > 0 we can define it as:
P(A|B) = P(A ∩ B) / P(B)
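The definition can be checked by enumerating a small sample space. The sketch below uses the die-rolling sample space from the prerequisites; the events A and B and the helper `prob` are hypothetical choices made for illustration, and all outcomes are assumed equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}      # rolling a die once

def prob(event):
    """Probability of an event (a subset of the sample space), equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}      # hypothetical event: the roll is even
B = {4, 5, 6}      # hypothetical event: the roll is greater than 3

p_a_given_b = prob(A & B) / prob(B)                 # P(A|B) = P(A ∩ B) / P(B)
independent = prob(A & B) == prob(A) * prob(B)      # test for independence

print(p_a_given_b)    # prints 2/3
print(independent)    # prints False
```

Knowing that B occurred raises the probability of A from 1/2 to 2/3, which is why these two events are not independent.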
BAYES THEOREM –CONDITIONAL PROBABILITY
▪ The mathematician Thomas Bayes gave this theorem to solve the
problem of finding reverse probability by using conditional probability.
If E1, E2, E3, …, En are non-empty events which form a partition of the sample
space S,
that is, E1, E2, E3, …, En are pairwise disjoint and E1 ∪ E2 ∪ E3 ∪ … ∪ En = S,
and A is any event of non-zero probability that occurs with some Ei (i = 1, 2, 3,
…, n), then
$P(E_i|A) = \frac{P(A|E_i)\,P(E_i)}{\sum_{k=1}^{n} P(A|E_k)\,P(E_k)}$
▪ Bayes' theorem for two events A and B, with P(B) > 0, is given as:
$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$
BAYES THEOREM –NUMERICAL PROBLEMS
Q1. It is observed that 50% of mails are spam. A software filters spam
mail before it reaches the inbox. Its accuracy for detecting a spam mail is 99%, and
the chance of tagging a non-spam mail as spam is 5%. If a certain mail is tagged
as spam, find the probability that it is not a spam mail.
Solution:
Let E1 = the mail is spam, E2 = the mail is not spam, and A = the mail is tagged as spam.
Now,
P(E1) = 0.5 and P(E2) = 0.5
P(A|E1) = 0.99 and P(A|E2) = 0.05
Then,
$P(E_2|A) = \frac{P(A|E_2)\,P(E_2)}{P(A|E_1)\,P(E_1) + P(A|E_2)\,P(E_2)} = \frac{0.05 \times 0.5}{0.99 \times 0.5 + 0.05 \times 0.5} = \frac{0.025}{0.52} \approx 0.048$
Q2. There are three urns containing white and black balls: the first urn has 3 white and 2
black balls, the second urn has 2 white and 3 black balls, and the third urn has 4 white and 1
black ball. One urn is chosen at random (without bias), and one ball drawn at random from it
turns out to be white. What is the probability that it came from the third urn?
Solution:
Let E1, E2 and E3 = the first, second and third urn is chosen, respectively, and A = a white ball is drawn.
Now,
P(E1) = P(E2) = P(E3) = 1/3
P(A|E1) = 3/5, P(A|E2) = 2/5 and P(A|E3) = 4/5
Then,
$P(E_3|A) = \frac{P(A|E_3)\,P(E_3)}{P(A|E_1)\,P(E_1) + P(A|E_2)\,P(E_2) + P(A|E_3)\,P(E_3)} = \frac{4/15}{9/15} = \frac{4}{9}$
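Both worked problems follow the same pattern, which can be sketched with exact fractions. The function `posterior` is a hypothetical helper; the event definitions are read off the problem statements (Q1: E1 = spam, E2 = not spam, A = tagged as spam; Q2: E1 to E3 = which urn was chosen, A = a white ball is drawn).

```python
from fractions import Fraction as F

def posterior(priors, likelihoods, i):
    """P(E_i | A) from priors P(E_k) and likelihoods P(A | E_k) over a partition."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))   # P(A), total probability
    return priors[i] * likelihoods[i] / evidence

# Q1: probability that a mail tagged as spam is actually not spam (index 1 = E2)
q1 = posterior([F(1, 2), F(1, 2)], [F(99, 100), F(5, 100)], i=1)

# Q2: probability that the white ball came from the third urn (index 2 = E3)
q2 = posterior([F(1, 3)] * 3, [F(3, 5), F(2, 5), F(4, 5)], i=2)

print(q1, float(q1))   # 5/104, about 0.048
print(q2)              # 4/9
```

The denominator is the total probability of A, so the posteriors over all E_i always sum to 1.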
BAYES THEOREM –NUMERICAL PROBLEMS –TRY YOURSELF!
Q3. A card is lost from a pack of 52 cards. From the remaining cards, two are drawn
at random and both are found to be clubs. Find the probability that the lost card is also a
club.
Q4. An insurance company has insured 4000 doctors, 8000 teachers and 12000
businessmen. The chances of a doctor, teacher and businessman dying before the age
of 58 are 0.01, 0.03 and 0.05, respectively. If one of the insured people dies before
58, find the probability that he is a doctor.
Q5. An unbiased die is rolled and, for each number on the die, a bag is chosen:
bag A contains 3 white balls and 2 black balls, bag B contains 3 white balls and 4
black balls, and bag C contains 4 white balls and 5 black balls. The die is rolled and a bag
is chosen; if a white ball is drawn, find the probability that it was drawn from bag B.
EVALUATING A CLASSIFICATION MODEL
For evaluating a Classification model, we have the
following ways:
1. Log Loss or Cross-Entropy Loss
2. Confusion Matrix
1. LOG LOSS OR CROSS-ENTROPY LOSS
• It is used for evaluating the performance of a classifier whose output is
a probability value between 0 and 1.
• For a good binary Classification model, the value of log loss should be
near 0.
• The value of log loss increases as the predicted probability deviates from the
actual label.
• A lower log loss indicates better predictions from the model.
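For binary labels, log loss is commonly defined as the mean of −[y·log(p) + (1 − y)·log(1 − p)] over all examples, where y is the actual label (0 or 1) and p is the predicted probability of class 1. The sketch below assumes that standard definition, which is not spelled out in the slides; the labels and probabilities are hypothetical.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy: -(1/N) * sum(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)        # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

y_true = [1, 0, 1, 0]                        # hypothetical actual labels
close_to_labels = [0.9, 0.1, 0.8, 0.2]       # probabilities near the actual labels
far_from_labels = [0.2, 0.9, 0.3, 0.8]       # probabilities far from the actual labels

print(round(log_loss(y_true, close_to_labels), 4))
print(round(log_loss(y_true, far_from_labels), 4))
```

The second set of predictions gives a much larger loss, which illustrates that the loss grows as the predicted probabilities move away from the actual labels.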
2. CONFUSION MATRIX
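• Confusion Matrix (Error Matrix) is used to measure the performance of the
classification model.
• The numbers of correct and incorrect predictions are summarized as counts
and broken down by class. This summary is the confusion
matrix.
• True Positive (TP) - The actual value was positive and the
model predicted a positive value
• True Negative (TN) - The actual value was negative and the
model predicted a negative value
• False Positive (FP) - The actual value was negative but the
model predicted a positive value
• False Negative (FN) - The actual value was positive but the
model predicted a negative value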
CONFUSION MATRIX (CONTD.)
For a binary classifier, the matrix looks like the table below:
                    Actual Positive   Actual Negative
Predicted Positive  TP                FP
Predicted Negative  FN                TN
Example
Consider a Machine Learning model for recognizing dogs (the relevant element) in a
digital photograph that contains ten cats and twelve dogs.
The model identifies eight elements as dogs. Of the eight
elements identified as dogs, only five actually are dogs (true positives, or relevant
instances). What is the Precision?
Precision
Precision is the fraction of relevant instances among the retrieved instances.
$\text{Precision} = \frac{TP\ (\text{relevant instances})}{TP + FP\ (\text{retrieved elements})} = \frac{5}{8}$
Confusion matrix (AD = actual dog, AC = actual cat, PD = predicted dog, PC = predicted cat):
        AD   AC
PD      5    3     (total retrieved (predicted) elements = 8)
PC      7    7
Total   12   10
Note: Precision talks about the VALIDITY of the model.
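The counts in the dog example can be turned into metrics directly. In the sketch below the four counts are read from the example's table (5 and 3 in the predicted-dog row, 7 and 7 in the predicted-cat row); the function names are illustrative, and the recall and accuracy values are derived here rather than stated in the slides.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that the model found."""
    return tp / (tp + fn)

# Counts from the dog example: rows = predicted, columns = actual
tp, fp = 5, 3    # predicted dog: 5 actual dogs, 3 actual cats
fn, tn = 7, 7    # predicted cat: 7 actual dogs, 7 actual cats

print(precision(tp, fp))                   # 5/8 = 0.625
print(recall(tp, fn))                      # 5/12
print((tp + tn) / (tp + fp + fn + tn))     # accuracy = 12/22
```

Precision looks along the predicted-dog row, while recall looks down the actual-dog column, so the two can differ widely for the same model.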
QUIZ
1. 3/2
2. 2/3
3. 6/9
Answer- 4
QUIZ
When you type a particular query into the Google search engine, it returns 30 pages
in total, of which only 20 are relevant, and it fails to return
40 additional relevant pages. What is the
Recall of the model?
(Retrieved: 30 pages, of which 20 are relevant and 10 are not; relevant but not retrieved: 40.)
1. 2/3
2. 2/7
3. 1/3
4. None of the Above
Answer- 3
Recall
Recall is the fraction of relevant instances that were retrieved: Recall = TP / (TP + FN).
F1 SCORE
• F1 Score is the Harmonic Mean of precision and recall. The range for F1
Score is [0, 1]. It tells you how precise your classifier is (how many of the instances it
flags are classified correctly), as well as how robust it is (how few relevant instances it misses).
• The greater the F1 Score, the better the performance of our model.
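As a formula, the harmonic mean of precision P and recall R is shown below. The worked value reuses the dog example: precision 5/8 is given there, while recall 5/12 is derived here from the table's 5 true positives and 7 missed dogs, so treat the number as an illustration rather than a result stated in the slides.

```latex
\begin{align*}
F_1 &= \frac{2 \cdot P \cdot R}{P + R} = \frac{2\,TP}{2\,TP + FP + FN} \\
F_1 &= \frac{2 \cdot \tfrac{5}{8} \cdot \tfrac{5}{12}}{\tfrac{5}{8} + \tfrac{5}{12}}
     = \frac{25/48}{25/24} = 0.5
\end{align*}
```

Because it is a harmonic mean, F1 is pulled toward the smaller of the two values, so a model cannot score well by maximizing only precision or only recall.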