Unit 3 - ML - CH-1

Uploaded by

Priyanka Patil
Unit 3

Classification – Logistic Regression & Neural Network

Dr. ABHINANDAN P. SHIRAHATTI


Associate Professor,
Department of Computer Science Engineering,
KIT’s College of Engineering (Autonomous),
Kolhapur
Maharashtra – 416234

KIT | Department of Basic Sciences and Humanities | Course: INTRODUCTION TO PYTHON PROGRAMMING | Course Code : UHSES0111
Content
Logistic regression – definition, hypothesis representation, decision boundary, cost function, gradient descent for logistic regression. Multiclass classification. Regularization – overfitting and underfitting, cost function, regularized linear regression, regularized logistic regression.

Neural networks – neuron representation and model, hypothesis for neuron, cost function, solution of a problem using single neuron, gradient descent for a neuron. Multiclass classification with neural network. Learning in neural networks – feedforward neural network, backpropagation algorithm. Loss function – support vector machines (SVMs), softmax regression.

Logistic Regression
• Logistic regression is a supervised machine learning algorithm used for classification tasks, where the goal is to predict the probability that an instance belongs to a given class. It is a statistical method that models the relationship between one or more independent variables and a categorical outcome.
• Logistic regression is used for binary classification. It applies the sigmoid function to a weighted combination of the independent variables and produces a probability value between 0 and 1.
• For example, with two classes, Class 0 and Class 1: if the value of the logistic function for an input is greater than 0.5 (the threshold value), the input is assigned to Class 1; otherwise it is assigned to Class 0. It is referred to as regression because it is an extension of linear regression, but it is mainly used for classification problems.

Key Points
• Logistic regression predicts the output of a categorical dependent variable. The outcome must therefore be a categorical or discrete value.

• The outcome can be Yes or No, 0 or 1, True or False, etc. Instead of returning exactly 0 or 1, the model returns a probability that lies between 0 and 1.

• In logistic regression, instead of fitting a regression line, we fit an “S”-shaped logistic function whose output approaches the two limiting values, 0 and 1.

Logistic Function – Sigmoid Function
• The sigmoid function is a mathematical function used to map the predicted values to probabilities.

• It maps any real value to a value between 0 and 1. The output of logistic regression must stay within this range, so the function forms an “S”-shaped curve.

• This S-shaped curve is called the sigmoid function or the logistic function.

• Logistic regression uses a threshold value to turn the probability into a class label: inputs whose probability is above the threshold are assigned to class 1, and inputs whose probability is below it are assigned to class 0.
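
The mapping and thresholding described above can be sketched in a few lines of Python. The function names `sigmoid` and `classify` are chosen here for illustration, and the default threshold of 0.5 follows the example given earlier in the deck.

```python
import math

def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def classify(z, threshold=0.5):
    """Return class 1 if the probability is greater than the threshold, else class 0."""
    return 1 if sigmoid(z) > threshold else 0

for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 3), classify(z))
```

Large negative inputs give probabilities near 0, large positive inputs give probabilities near 1, and an input of 0 sits exactly on the 0.5 threshold.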
Types of Logistic Regression
On the basis of the categories, logistic regression can be classified into three types:

• Binomial: In binomial logistic regression, the dependent variable has only two possible values, such as 0 or 1, Pass or Fail, etc.

• Multinomial: In multinomial logistic regression, the dependent variable has three or more possible unordered values, such as “cat”, “dog”, or “sheep”.

• Ordinal: In ordinal logistic regression, the dependent variable has three or more possible ordered values, such as “Low”, “Medium”, or “High”.

Terminologies involved in Logistic Regression
Independent variables: The input characteristics or predictor factors used to predict the dependent variable.
Dependent variable: The target variable in a logistic regression model, which we are trying to predict.
Logistic function: The formula used to represent how the independent and dependent variables relate to one another. The logistic function transforms the input variables into a probability value between 0 and 1, which represents the likelihood of the dependent variable being 1.
Odds: The ratio of the probability that something occurs to the probability that it does not occur. Odds differ from probability, which is the ratio of something occurring to everything that could possibly occur.
Log-odds: The log-odds, also known as the logit, is the natural logarithm of the odds. In logistic regression, the log-odds of the dependent variable are modeled as a linear combination of the independent variables and the intercept.
Coefficient: The logistic regression model’s estimated parameters, which show how the independent and dependent variables relate to one another.
Intercept: A constant term in the logistic regression model, which represents the log-odds when all independent variables are equal to zero.
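
To make the difference between probability, odds, and log-odds concrete, here is a small Python sketch. The helper names and the example probability of 0.8 are illustrative assumptions, not values from the course material.

```python
import math

def odds(p):
    """Odds = probability of occurring / probability of not occurring."""
    return p / (1.0 - p)

def log_odds(p):
    """Logit: the natural logarithm of the odds."""
    return math.log(odds(p))

def probability_from_log_odds(z):
    """Invert the logit; this is the logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8                                   # hypothetical probability of success
print("odds:", odds(p))                   # about 4 to 1
print("log-odds:", log_odds(p))
print("back to probability:", probability_from_log_odds(log_odds(p)))
```

A probability of 0.5 corresponds to odds of 1 and log-odds of 0, which is why a log-odds of zero marks the point where both classes are equally likely.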

Cost function for Logistic Regression
Logistic regression can be used where the probability of each of two classes is required, such as whether it will rain today or not (0 or 1, true or false, etc.).
Logistic regression is based on the concept of maximum likelihood estimation: the parameters are chosen so that the observed data are as probable as possible under the model.
In logistic regression, we pass the weighted sum of inputs through an activation function that maps values to the range between 0 and 1. This activation function is known as the sigmoid function, and the curve obtained is called the sigmoid curve or S-curve.

Cost function for Logistic Regression
The cost function for logistic regression is designed to measure the error between predicted probabilities and actual class labels in classification problems. Since logistic regression deals with probabilities and binary classification, its cost function differs from the Mean Squared Error used in linear regression.
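
A minimal sketch of the cost usually paired with logistic regression, the log loss (binary cross-entropy), is shown below. The function name `log_loss`, the clipping constant `eps`, and the example labels and probabilities are assumptions made for illustration.

```python
import math

def log_loss(y_true, y_prob, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0, 1]                        # hypothetical class labels
good = [0.9, 0.1, 0.8]                    # confident and correct
bad = [0.1, 0.9, 0.2]                     # confident and wrong
print("good predictions:", log_loss(labels, good))
print("bad predictions: ", log_loss(labels, bad))
```

Confident wrong predictions are penalized far more heavily than confident correct ones are rewarded, which is the behaviour a probability-based classifier needs.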

Mathematical intuition
To understand logistic regression, let's go over the odds of success.
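
A standard way to make this concrete runs from odds to log-odds to the sigmoid. The symbols are assumptions chosen here for illustration: p is the probability of success, x is a single input, w_0 is the intercept, and w_1 is the coefficient.

```latex
\begin{align*}
\text{odds} &= \frac{p}{1 - p} \\
\ln\!\left(\frac{p}{1 - p}\right) &= w_0 + w_1 x \\
\frac{p}{1 - p} &= e^{w_0 + w_1 x} \\
p &= \frac{e^{w_0 + w_1 x}}{1 + e^{w_0 + w_1 x}} = \frac{1}{1 + e^{-(w_0 + w_1 x)}}
\end{align*}
```

Modeling the log-odds as a linear function of the input is therefore equivalent to passing that linear function through the sigmoid.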

Problem on Logistic Regression
1. You are tasked with predicting whether students will pass or fail a course based on the number of hours they studied. The outcome variable is binary: 1 (Pass) / 0 (Fail).
Given a dataset where each student has studied for a certain number of hours, we want to build a logistic regression model to predict whether a student will pass or fail.
Data set:
Study Hours (x)    Pass/Fail (y)
1                  0
2                  1
3                  1
4                  1
5                  ?

Formulation of the Logistic Regression Problem:
1. Hypothesis Function:
Logistic regression is used to model the probability that the dependent variable y
(Pass/Fail) is 1, given the number of study hours x. The hypothesis function is given
by the sigmoid function:
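
Assuming a single input x with intercept w_0 and coefficient w_1 (symbols chosen here for illustration), the hypothesis can be written as:

```latex
h_w(x) = P(y = 1 \mid x) = \sigma(w_0 + w_1 x) = \frac{1}{1 + e^{-(w_0 + w_1 x)}}
```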

2. Cost Function:
For logistic regression, the cost function is based on maximum likelihood
estimation and is defined as:
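
In its commonly used form, with m training examples (x^{(i)}, y^{(i)}) and a hypothesis h_w assumed to be the sigmoid of w_0 + w_1 x, the cost is the average negative log-likelihood:

```latex
J(w) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \ln h_w\!\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \ln\!\left(1 - h_w\!\left(x^{(i)}\right)\right) \right]
```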

3. Gradient Descent
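
A sketch of the usual batch update, assuming a learning rate α, m training examples, a sigmoid hypothesis h_w, and the convention x_0 = 1 for the intercept (all notation assumed here):

```latex
w_j := w_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_w\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}, \qquad j = 0, 1
```

Both parameters are updated simultaneously, and the step is repeated until the cost stops decreasing.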
4. Decision Boundary
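
With a threshold of 0.5 and assumed parameters w_0 and w_1 for a sigmoid hypothesis h_w, the boundary is the input value at which the predicted probability equals 0.5:

```latex
h_w(x) = 0.5 \iff w_0 + w_1 x = 0 \iff x = -\frac{w_0}{w_1}
```

When w_1 > 0, inputs with x > -w_0 / w_1 are predicted as Pass (1) and the rest as Fail (0).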
Solution
Here, the independent variable X is represented in matrix form:
X^T = [1 2 3 4]
The dependent variable Y is represented in matrix form:
Y^T = [0 1 1 1]
Adding a first column of ones to X for the bias term, the data can be written in matrix form as follows:

$$X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}, \qquad Y = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}$$

Logistic Regression Matrix method
The regression is given below:

• Step by step computation is given below:

Logistic regression method
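
One way to work the study-hours problem numerically is plain batch gradient descent, sketched below in pure Python. The starting weights of zero, the learning rate `alpha=0.1`, and the 5000 iterations are assumptions chosen for illustration, not values from the slides. Because this tiny dataset is perfectly separable, the weights keep growing with more iterations, so the exact numbers depend on where training stops; the predicted class does not.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

hours = [1.0, 2.0, 3.0, 4.0]      # study hours from the problem statement
passed = [0, 1, 1, 1]             # 1 = Pass, 0 = Fail

def fit(xs, ys, alpha=0.1, iterations=5000):
    """Batch gradient descent for one-feature logistic regression."""
    w0, w1 = 0.0, 0.0             # assumed starting point
    m = len(xs)
    for _ in range(iterations):
        errors = [sigmoid(w0 + w1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                               # intercept gradient
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m    # slope gradient
        w0 -= alpha * grad0
        w1 -= alpha * grad1
    return w0, w1

def predict(w0, w1, x, threshold=0.5):
    """Return 1 (Pass) if the predicted probability exceeds the threshold."""
    return 1 if sigmoid(w0 + w1 * x) > threshold else 0

w0, w1 = fit(hours, passed)
p5 = sigmoid(w0 + w1 * 5.0)
print("w0 =", round(w0, 3), "w1 =", round(w1, 3))
print("decision boundary at x =", round(-w0 / w1, 3))
print("P(pass | 5 hours) =", round(p5, 3), "-> class", predict(w0, w1, 5.0))
```

The fitted boundary falls between 1 and 2 study hours, so a student who studies for 5 hours is predicted to pass.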

Multiclass classification
• Multiclass classification is the task of assigning an instance to one of three or more classes, as opposed to binary classification, which involves only two classes.
• In multiclass classification, we deal with datasets where the target variable can belong to more than two categories, and the goal is to develop models that can accurately predict the class for unseen instances.

Binary classification vs. Multi-class classification

Binary Classification
Only two classes are present in the dataset.
It requires only one classifier model.
The confusion matrix is easy to derive and understand.
Example: checking whether an email is spam or not; predicting gender based on height and weight.

Multi-class Classification
Multiple class labels are present in the dataset.
The number of classifier models depends on the classification technique applied:
One vs. All: for N classes, N binary classifier models
One vs. One: for N classes, N × (N − 1) / 2 binary classifier models
The confusion matrix is easy to derive but harder to interpret.
Example: checking whether a fruit is an apple, a banana, or an orange.
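
The classifier counts for the two strategies can be checked with a short Python sketch. The helper names are illustrative, and the class list reuses the fruit example above.

```python
from itertools import combinations

def one_vs_all_tasks(classes):
    """One binary task per class: that class against all the others."""
    return [(c, [o for o in classes if o != c]) for c in classes]

def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))

fruits = ["apple", "banana", "orange"]
print(len(one_vs_all_tasks(fruits)), "one-vs-all classifiers")
print(len(one_vs_one_tasks(fruits)), "one-vs-one classifiers")
for task in one_vs_one_tasks(fruits):
    print(task)
```

For three classes both strategies need three classifiers; the counts diverge as N grows, since N × (N − 1) / 2 grows quadratically.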

One vs. All (One-vs-Rest)
In one-vs-all classification, for a dataset with N classes, we have to generate N binary classifier models.
The number of binary classifiers generated must equal the number of class labels present in the dataset.

Multiclass classification
Consider three classes, for example, type 1 for Green, type 2 for Blue, and type 3 for Red.
Because one-vs-all needs as many classifiers as there are class labels in the dataset, we create three classifiers here, one for each class.
Classifier 1:- [Green] vs [Red, Blue]
Classifier 2:- [Blue] vs [Green, Red]
Classifier 3:- [Red] vs [Blue, Green]
To train these three classifiers, we need to create three training datasets. Consider the following primary dataset.

Primary Data set

You can see that there are three class labels Green, Blue, and Red present in the
dataset. Now we have to create a training dataset for each class.

Create each training dataset by putting +1 in the class column for the feature values that belong to that particular class, and -1 in the class column for all remaining feature values.

Preparation for training data set
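
The +1 / -1 relabelling for one-vs-all can be sketched as follows. The feature values and the order of the labels are hypothetical stand-ins, since only the class names Green, Blue, and Red are given.

```python
# Hypothetical primary dataset: one feature value and one class label per row.
features = [2.1, 3.4, 5.0, 6.2, 7.7, 8.9]
labels = ["Green", "Blue", "Red", "Green", "Red", "Blue"]

def one_vs_rest_targets(labels, positive_class):
    """+1 for rows of the chosen class, -1 for every other row."""
    return [1 if label == positive_class else -1 for label in labels]

# One training dataset per class, each pairing the same features with new targets.
training_sets = {
    cls: list(zip(features, one_vs_rest_targets(labels, cls)))
    for cls in ("Green", "Blue", "Red")
}

for cls, rows in training_sets.items():
    print(cls, rows)
```

Every training set contains all the rows; only the class column changes from one classifier to the next.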

Now, after creating a training dataset for each classifier, we provide it to our classifier model and train the model by applying a learning algorithm.
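
As a sketch of this training step, the block below fits one logistic regression per class with gradient descent and predicts the class whose classifier reports the highest probability. The two-feature dataset, the learning rate, and the iteration count are hypothetical, and the targets are encoded as 1 / 0 rather than +1 / -1 because the log-loss gradient used here expects 0/1 labels.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    """Probability that x belongs to the positive class of one classifier."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def train_binary(points, targets, alpha=0.1, iterations=10000):
    """Batch gradient descent for logistic regression; w[0] is the bias."""
    w = [0.0] * (len(points[0]) + 1)
    m = len(points)
    for _ in range(iterations):
        grad = [0.0] * len(w)
        for x, y in zip(points, targets):
            err = score(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        w = [wj - alpha * gj / m for wj, gj in zip(w, grad)]
    return w

def train_one_vs_rest(points, labels):
    """Train one binary classifier per class label."""
    models = {}
    for cls in sorted(set(labels)):
        targets = [1 if label == cls else 0 for label in labels]
        models[cls] = train_binary(points, targets)
    return models

def predict(models, x):
    """Pick the class whose classifier gives the highest probability."""
    return max(models, key=lambda cls: score(models[cls], x))

# Hypothetical, well-separated data: three points per class.
points = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (0, 5), (1, 5), (0, 6)]
labels = ["Green"] * 3 + ["Blue"] * 3 + ["Red"] * 3

models = train_one_vs_rest(points, labels)
print(predict(models, (0.5, 0.5)), predict(models, (5.5, 0.5)), predict(models, (0.5, 5.5)))
```

Taking the highest probability resolves cases where more than one classifier, or none of them, claims a point.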

Over fitting, Under fitting & Regularization

Overfitting, underfitting, and regularization are three common concepts in machine learning that are related to the training of models.

Under fitting in Machine Learning

• A machine learning model is said to underfit when it is too simple to capture the complexities of the data. Underfitting reflects the model's inability to learn the training data effectively, which results in poor performance on both the training and the testing data. In simple terms, an underfit model’s predictions are inaccurate, especially when applied to new, unseen examples. It mainly happens when we use a very simple model with overly simplified assumptions. To address underfitting, we need to use more complex models, with enhanced feature representation and less regularization.

• Note: An underfitting model has high bias and low variance.
Reasons for Under fitting
• The model is too simple, so it may not be capable of representing the complexities in the data.
• The input features used to train the model are not adequate representations of the underlying factors influencing the target variable.
• The training dataset is not large enough.
• Features are not scaled.
Techniques to Reduce Under fitting
• Increase model complexity.
• Increase the number of features by performing feature engineering.
• Remove noise from the data.
• Increase the number of epochs or the duration of training to get better results.

Over fitting in Machine Learning

A machine learning model is said to be overfitted when it fits the training data closely but does not make accurate predictions on testing data. When a model is trained too closely on its training set, it starts learning from the noise and inaccurate entries in the data, and its results on test data show high variance. The model then fails to categorize new data correctly because it has absorbed too many details and too much noise. Overfitting is more likely with non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models.

Note: An overfitting model has low bias and high variance.
Reasons for Overfitting:
• High variance and low bias.
• The model is too complex.
• The training dataset is too small.
Techniques to Reduce Overfitting
• Improving the quality of the training data reduces overfitting by focusing the model on meaningful patterns and mitigating the risk of fitting noise or irrelevant features.
• Increasing the amount of training data can improve the model’s ability to generalize to unseen data and reduce the likelihood of overfitting.
• Reduce model complexity.
• Early stopping during the training phase (monitor the validation loss over the training period and stop training as soon as it begins to increase).
• Ridge regularization and Lasso regularization.
• Use dropout for neural networks to tackle overfitting.
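
The early-stopping rule in the list above can be sketched as a small function. The `patience` parameter (how many non-improving epochs to tolerate before stopping) and the loss values are hypothetical and only illustrate the idea.

```python
def early_stopping(val_losses, patience=2):
    """Return (best_epoch, stopped_at) for a sequence of per-epoch validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs; stopped_at is None if that never happens.
    """
    best_loss = float("inf")
    best_epoch = None
    stopped_at = None
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                stopped_at = epoch
                break
    return best_epoch, stopped_at

# Hypothetical validation losses: improvement stalls after epoch 2.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.50]
print(early_stopping(losses))
```

In practice the model weights saved at the best epoch are the ones kept, and a patience above 1 guards against stopping on a single noisy epoch.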
Bias and Variance in Machine Learning
Bias: Bias refers to the error due to overly simplistic assumptions in the learning algorithm. These assumptions make the model easier to comprehend and learn but might not capture the underlying complexities of the data. It is the error due to the model’s inability to represent the true relationship between input and output accurately. When a model performs poorly on both the training and the testing data, it has high bias because the model is too simple, which indicates underfitting.
Variance: Variance, on the other hand, is the error due to the model’s sensitivity to fluctuations in the training data. It is the variability of the model’s predictions across different training sets. High variance occurs when a model learns the training data’s noise and random fluctuations rather than the underlying pattern. As a result, the model performs well on the training data but poorly on the testing data, which indicates overfitting.

Lasso (L1) (Least Absolute Shrinkage and Selection Operator)
The Lasso (Least Absolute Shrinkage and Selection Operator) is a regression
technique used to perform both variable selection and regularization in order to
enhance the prediction accuracy and interpretability of a model.

Lambda (λ) is the regularization parameter that controls the degree of regularization.

• If λ = 0, Lasso reduces to ordinary least squares regression.
• As λ increases, some of the coefficients w_i shrink to zero, effectively selecting a simpler model.

The L1 penalty term encourages sparsity in the coefficients, meaning it can set some coefficients to exactly zero, eliminating irrelevant features.
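
One standard way to see why an L1 penalty produces exact zeros is the soft-thresholding operator, the per-coefficient update used by coordinate-descent and proximal-gradient Lasso solvers. The sketch below applies it to hypothetical coefficients with an assumed threshold of 0.5; it illustrates the mechanism and is not a full Lasso fit.

```python
def soft_threshold(w, lam):
    """Shrink w toward zero by lam; values within [-lam, lam] become exactly 0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

weights = [2.5, -0.3, 0.8, -1.7, 0.05]     # hypothetical coefficients
lam = 0.5                                  # assumed regularization strength

shrunk = [soft_threshold(w, lam) for w in weights]
print(shrunk)
print("features kept:", [i for i, w in enumerate(shrunk) if w != 0.0])
```

Small coefficients are set to exactly zero while larger ones are only reduced, which is how Lasso performs feature selection and shrinkage at the same time.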

Ridge (L2) Regularization
Ridge regularization, also known as L2 regularization, adds a penalty proportional to the sum of the squared weights associated with the feature variables.
This encourages all coefficients to shrink by an amount proportional to their values and reduces model complexity by pulling large weights toward zero.
Ridge regularization can be more effective than Lasso when there are many collinear variables, because it prevents individual coefficients from becoming too large and overwhelming the others.
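
The shrinking effect can be seen in the simplest possible case: one feature, no intercept, and the cost Σ(y − w·x)² + λ·w², whose minimizer is w = Σxy / (Σx² + λ). The data and the λ values below are hypothetical and chosen so the arithmetic is easy to follow.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form ridge solution for a single feature without an intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]      # hypothetical inputs
ys = [2.0, 4.0, 6.0, 8.0]      # exactly y = 2x

for lam in (0.0, 10.0, 30.0):
    print("lambda =", lam, "-> w =", ridge_weight(xs, ys, lam))
```

With λ = 0 the weight is the ordinary least squares value of 2; larger λ pulls it toward zero, but unlike Lasso it never reaches exactly zero for any finite λ.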

Elastic Net (L1 +L2) regularization
Elastic Net is a regularization technique that combines both Lasso (L1 regularization) and
Ridge (L2 regularization). It is useful when there are multiple correlated features, and neither
Lasso nor Ridge alone gives optimal results.
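
A minimal sketch of the combined penalty is given below, assuming the simple two-parameter form λ₁·Σ|w_i| + λ₂·Σw_i². Libraries often use a different parameterization with an overall strength and a mixing ratio, so the names `lambda1` and `lambda2` and the example weights are assumptions made here.

```python
def l1_penalty(weights):
    """Sum of absolute values of the weights (Lasso term)."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Sum of squared weights (Ridge term)."""
    return sum(w * w for w in weights)

def elastic_net_penalty(weights, lambda1, lambda2):
    """Weighted sum of the L1 and L2 penalties."""
    return lambda1 * l1_penalty(weights) + lambda2 * l2_penalty(weights)

weights = [3.0, -4.0, 0.0]                 # hypothetical coefficients
print("L1:", l1_penalty(weights))
print("L2:", l2_penalty(weights))
print("Elastic Net:", elastic_net_penalty(weights, lambda1=0.5, lambda2=0.1))
```

Setting `lambda2` to zero recovers the Lasso penalty, and setting `lambda1` to zero recovers the Ridge penalty.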

Assignment questions
1. Explain the difference between logistic regression and linear regression. Why is logistic regression used for
classification problems?
2. Describe the significance of the sigmoid function in logistic regression. What properties make it suitable for
classification?
3. Given a logistic regression hypothesis function, explain how this function maps predicted outputs to a probability.
4. Illustrate, with an example, how logistic regression handles binary classification problems.
5. Define the concept of a decision boundary in logistic regression. How is it determined from the hypothesis function ?
6. Provide an example where the decision boundary is a straight line. How would this change for a nonlinear decision
boundary?
7. Why can't we use the Mean Squared Error (MSE) cost function in logistic regression?
8. Explain the One-vs-All (OvA) strategy for extending logistic regression to multiclass classification problems. Provide
an example scenario where this approach would be used.
9. How does the Softmax Regression (also known as Multinomial Logistic Regression) work for multiclass classification?
Provide the hypothesis function and cost function for softmax regression.
10. How can we mitigate overfitting in logistic regression? List and explain at least two techniques.
11. Explain how the regularized cost function for logistic regression is formulated. Write the regularized cost function and explain how it differs from the unregularized version.
12. Implement logistic regression from scratch (without using libraries like Scikit-learn) using gradient descent. Apply it to a binary classification problem.
