Introduction To Logistic Regression: Implement Linear Equation

Logistic regression is a machine learning classification algorithm that can be used to predict categorical dependent variables. It works by implementing a linear equation to predict probabilities, then uses a sigmoid function to map predictions to values between 0 and 1. A decision boundary threshold is used to classify observations into discrete categories based on their predicted probabilities. Performance is evaluated using metrics like accuracy, precision, recall and F1 score as calculated from a confusion matrix.

Uploaded by

sudeepvmenon

“rough notes - just for reference”

1. Introduction to Logistic Regression

When data scientists come across a new classification problem, the first algorithm that comes to
mind is often Logistic Regression. It is a supervised learning classification algorithm which is
used to assign observations to a discrete set of classes. Hence, its output is discrete in nature.
Logistic Regression is also called Logit Regression. It is one of the simplest and most versatile
classification algorithms.

2. Logistic Regression intuition

In statistics, the Logistic Regression model is a widely used model whose primary use is
classification. Given a set of observations, the Logistic Regression algorithm helps us to
classify these observations into two or more discrete classes. So, the target variable is
discrete in nature.

The Logistic Regression algorithm works as follows -

Implement linear equation

The Logistic Regression algorithm works by implementing a linear equation with independent or
explanatory variables to predict a response value. For example, consider the number of hours
studied and the probability of passing the exam. Here, the number of hours studied is the
explanatory variable and it is denoted by x1. Passing the exam is the response or target
variable, and the value the linear equation predicts for it is denoted by z.

If we have one explanatory variable (x1) and one response value (z), then the linear equation
is

z = β0 + β1x1

Here, the coefficients β0 and β1 are the parameters of the model.

If there are multiple explanatory variables, then the above equation can be extended to

z = β0 + β1x1 + β2x2 + ... + βnxn

Here, the coefficients β0, β1, β2, ..., βn are the parameters of the model.

So, the predicted response value is given by the above equations and is denoted by z.
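
As a minimal sketch of the linear step, the snippet below computes z for one observation. The function name `linear_score` and the coefficient values are assumptions made for illustration; they are not fitted values from these notes.

```python
def linear_score(beta, x):
    """Return z = beta[0] + beta[1]*x[0] + ... + beta[n]*x[n-1]."""
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs one intercept plus one coefficient per feature")
    intercept, coefficients = beta[0], beta[1:]
    return intercept + sum(b * xi for b, xi in zip(coefficients, x))


# Hypothetical parameters for the hours-studied example (assumed, not fitted).
beta = [-4.0, 1.5]              # beta0 (intercept), beta1 (hours studied)
z = linear_score(beta, [3.0])   # a student who studied 3 hours
print(z)                        # -4.0 + 1.5 * 3.0
```

Note that z can be any real number, which is why it still has to be converted before it can be read as a probability.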

Sigmoid Function

The predicted response value, denoted by z, is then converted into a probability value that lies
between 0 and 1. We use the sigmoid function for this: it maps any real value to a probability
value between 0 and 1.

The sigmoid function has an S-shaped curve, which is also called the sigmoid curve.

A Sigmoid function is a special case of the Logistic function. It is given by the following
mathematical formula.

σ(z) = 1 / (1 + e^(-z))

[Figure: graph of the sigmoid function]
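
The sigmoid mapping can be sketched in a few lines of Python. This is only an illustration using the standard formula 1 / (1 + e^(-z)); the branch for negative z is an implementation choice assumed here to avoid overflow in `math.exp`, not something stated in the notes.

```python
import math


def sigmoid(z):
    """Map a real value z to a value between 0 and 1 using 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the algebraically equal form e^z / (1 + e^z),
    # so that math.exp never receives a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)


for z in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(z, round(sigmoid(z), 4))
```

Large negative values of z give outputs close to 0, large positive values give outputs close to 1, and z = 0 gives exactly 0.5, which is what produces the S shape.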

Decision boundary

The sigmoid function returns a probability value between 0 and 1. This probability value is then
mapped to a discrete class, which is either “0” or “1”. In order to map the probability value to
a discrete class (pass/fail, yes/no, true/false), we select a threshold value. This threshold
value is called the Decision boundary. Probability values at or above the threshold are mapped
to class 1, and values below it are mapped to class 0.

Mathematically, it can be expressed as follows:-

p ≥ 0.5 => class = 1

p < 0.5 => class = 0

Generally, the decision boundary is set to 0.5. So, if the probability value is 0.8 (> 0.5), we
will map this observation to class 1. Similarly, if the probability value is 0.2 (< 0.5), we will
map this observation to class 0.

[Figure: decision boundary]
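
The threshold rule above can be sketched as a small function. The name `classify` and the `threshold` parameter are assumptions for illustration; the default of 0.5 follows the notes.

```python
def classify(p, threshold=0.5):
    """Return class 1 if p is at or above the threshold, otherwise class 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return 1 if p >= threshold else 0


print(classify(0.8))  # 0.8 >= 0.5, so class 1
print(classify(0.2))  # 0.2 <  0.5, so class 0
```

Keeping the threshold as a parameter makes it possible to try other decision boundaries than 0.5 without changing the rule itself.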

Making predictions

Now that we know about the sigmoid function and the decision boundary, we can use them to write
a prediction function. A prediction function in logistic regression returns the probability of
the observation being positive, Yes or True. We call this class 1, and the probability is
denoted by P(class = 1). The closer the probability is to one, the more confident we are that
the observation is in class 1; the closer it is to zero, the more confident we are that it is
in class 0.
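
Putting the pieces together, a prediction function might look like the sketch below. The names `predict_proba` and `predict` and the coefficient values are hypothetical and chosen only for the hours-studied example; in practice the coefficients would come from fitting the model to data, which these notes do not cover.

```python
import math


def predict_proba(beta, x):
    """Return P(class = 1) for one observation x, given parameters beta."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))  # linear equation
    # Plain sigmoid; adequate for the moderate z values used in this sketch.
    return 1.0 / (1.0 + math.exp(-z))


def predict(beta, x, threshold=0.5):
    """Return (predicted class, probability) using the decision boundary."""
    p = predict_proba(beta, x)
    return (1 if p >= threshold else 0), p


beta = [-4.0, 1.5]  # hypothetical beta0 and beta1, not fitted values
for hours in (1.0, 3.0, 5.0):
    label, p = predict(beta, [hours])
    print(f"hours={hours}: P(class = 1)={p:.3f}, class={label}")
```

With these assumed coefficients, more hours studied gives a larger z and therefore a probability closer to one.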

3. Assumptions of Logistic Regression

The Logistic Regression model requires several key assumptions. These are as follows:-

1. The Logistic Regression model requires the dependent variable to be binary, multinomial or
ordinal in nature.

2. It requires the observations to be independent of each other. So, the observations should
not come from repeated measurements.

3. The Logistic Regression algorithm requires little or no multicollinearity among the
independent variables. It means that the independent variables should not be too highly
correlated with each other.

4. The Logistic Regression model assumes a linear relationship between the independent
variables and the log odds of the outcome.

5. The success of the Logistic Regression model depends on the sample size. Typically, it
requires a large sample size to achieve high accuracy.
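
One rough way to look for multicollinearity (assumption 3) is to compute the pairwise correlation between independent variables. The sketch below uses the Pearson correlation coefficient; the data values are made up for illustration, and how high a correlation counts as "too high" is a judgment call that the notes do not specify.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)  # assumes neither sequence is constant


# Made-up feature values for five students (illustrative only).
hours_studied = [1.0, 2.0, 3.0, 4.0, 5.0]
pages_read = [12.0, 19.0, 33.0, 41.0, 48.0]
hours_slept = [7.0, 6.0, 8.0, 5.0, 7.0]

print(round(pearson(hours_studied, pages_read), 3))   # strongly correlated pair
print(round(pearson(hours_studied, hours_slept), 3))  # weakly correlated pair
```

A pair of features with a correlation close to +1 or -1 carries almost the same information, so one of them is a candidate for removal before fitting the model.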

4. Types of Logistic Regression

The Logistic Regression model can be classified into three groups based on the target variable
categories. These three groups are described below:-

• Binary Logistic Regression - In Binary Logistic Regression, the target variable has two
possible categories. Common examples of categories are yes or no, good or bad, true or false,
spam or no spam and pass or fail.

• Multinomial Logistic Regression - In Multinomial Logistic Regression, the target variable
has three or more categories which are not in any particular order. So, there are three or
more nominal categories. An example is the type of fruit - apple, mango, orange and banana.

• Ordinal Logistic Regression - In Ordinal Logistic Regression, the target variable has three
or more ordinal categories. So, there is an intrinsic order involved with the categories. For
example, student performance can be categorized as poor, average, good and excellent.

5. Evaluating the model: confusion matrix, accuracy, precision and recall

Describing the Performance of a Logistic model

A confusion matrix is a table that is often used to describe the performance of a classification
model (or “classifier”) on a set of test data for which the true values are known. Let us look at
some of the important terms of the confusion matrix.

Confusion matrix: whether employees will leave a company or not (n = 165)

                 Predicted: no    Predicted: yes    Total
Actual: no       TN = 50          FP = 10           60
Actual: yes      FN = 5           TP = 100          105
Total            55               110               165

The Confusion Matrix tells us the following:

• There are two possible predicted classes: “yes” and “no”. If we were predicting that
employees would leave an organisation, for example, “yes” would mean they will, and
“no” would mean they won’t leave the organisation.

• The classifier made a total of 165 predictions (e.g., 165 employees were being studied).
• Out of those 165 cases, the classifier predicted “yes” 110 times, and “no” 55 times.
• In reality, 105 employees in the sample leave the organisation, and 60 do not.

Basic terms related to Confusion matrix:

• True positives (TP): We predicted yes (employees will leave the organisation), and they
actually leave, i.e. 100
• True negatives (TN): We predicted no (employees will not leave the organisation), and
they don’t leave, i.e. 50
• False positives (FP): We predicted yes, they will leave, but they don’t leave (also known
as a “Type I error”), i.e. 10
• False negatives (FN): We predicted no, they will not leave, but they actually leave (also
known as a “Type II error”), i.e. 5
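
To make the four terms concrete, the sketch below counts them from a list of actual and predicted labels. The function name `confusion_counts` and the six example labels are assumptions for illustration; they are not the 165 employee records described above.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Count TP, TN, FP and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted yes, actually yes
        elif p == positive:
            fp += 1   # predicted yes, actually no (Type I error)
        elif a == positive:
            fn += 1   # predicted no, actually yes (Type II error)
        else:
            tn += 1   # predicted no, actually no
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Made-up labels for six employees: did they leave?
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]
print(confusion_counts(actual, predicted))
```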

Evaluating a Classification Model

• Accuracy = (TP + TN)/(TP + TN + FP + FN). Overall, how often is the classifier correct?
i.e. (100 + 50)/165 ≈ 0.91

Measures of Accuracy

Sensitivity and specificity are statistical measures of the performance of a binary classification
test:

• Sensitivity/Recall = TP/(TP + FN). When it’s actually yes, how often does it predict yes?
i.e. 100/(100 + 5) ≈ 0.95
• Precision = TP/(TP + FP). When it predicts yes, how often is it correct?
i.e. 100/(100 + 10) ≈ 0.91
• Specificity = TN/(TN + FP). When it’s actually no, how often does it predict no?
i.e. 50/(50 + 10) ≈ 0.83
• F1 Score = 2*((precision*recall)/(precision + recall)). It is also called the F Score or the
F Measure. It is the harmonic mean of precision and recall, so it conveys the balance between
the two. With the values above, F1 ≈ 0.93
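
The formulas above can be checked with a short script. This is a sketch using the counts from the employee example (TP = 100, TN = 50, FP = 10, FN = 5); the function name `classification_metrics` is an assumption for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute common metrics from confusion matrix counts.

    Assumes every denominator is non-zero (no empty class or prediction).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)          # sensitivity
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f1 = 2 * (precision * recall) / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
    }


metrics = classification_metrics(tp=100, tn=50, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```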
