The document provides an overview of Logistic Regression, including its introduction, use cases in various fields, model description, and diagnostics such as Deviance and the ROC Curve. It also discusses additional regression models and classification methods, particularly Decision Trees, emphasizing their interpretability and efficiency. The chapter concludes with self-assessment and terminal questions for further learning.

COURSE TITLE: BIG DATA ANALYTICS

COURSE CODE: 21CS3275A

TOPIC: ADVANCED ANALYTICAL THEORY AND METHODS:


LOGISTIC REGRESSION AND CLASSIFICATION

CO-2 Session 3
CONTENTS

 Logistic Regression Introduction

 Logistic Regression Use cases

 Logistic regression example

 Logistic Regression Model Description

 Diagnostics Deviance and the Pseudo-R2

 Receiver Operating Characteristic (ROC) Curve

 Reasons to Choose and Cautions

CONTENTS

 Additional Regression Models

 Classification

LOGISTIC REGRESSION INTRODUCTION

 In linear regression modeling, the outcome variable is continuous – e.g., income ~ age and
education

 In logistic regression, the outcome variable is categorical, and this chapter focuses on two-
valued outcomes like true/false, pass/fail, or yes/no

LOGISTIC REGRESSION USE CASES

 Medical: Probability of a patient’s successful response to a specific medical treatment – inputs could include age, weight, etc.

 Finance: Probability an applicant defaults on a loan

 Marketing: Probability a wireless customer switches carriers (churns)

 Engineering: Probability a mechanical part malfunctions or fails

LOGISTIC REGRESSION

 Used for binary classification (0 / 1)

 It is supervised learning

 Predicts discrete values (yes/no)

 Threshold = 0.5

 Data needs to be linearly separable

 3 main steps:

1. Calculate logistic function 2. Learn coefficients 3. Make predictions

LOGISTIC REGRESSION

SAMPLE DATA

STEP 1: THE LOGISTIC (SIGMOID) FUNCTION

f(x) = 1 / (1 + e^(-x))
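The sigmoid formula above can be sketched directly in code. This is a minimal Python illustration (the slides use R, so this is an assumption of language, not of method):

```python
import math

def sigmoid(x: float) -> float:
    """Logistic (sigmoid) function: maps any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Large negative inputs map near 0, large positive inputs near 1,
# and 0 maps to exactly 0.5 (the default classification threshold).
print(sigmoid(0))    # 0.5
print(sigmoid(6))    # close to 1
print(sigmoid(-6))   # close to 0
```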

LOGISTIC REGRESSION

 Logistic regression is one of the most popular machine learning algorithms and belongs to the supervised learning family. It is used to predict a categorical dependent variable from a given set of independent variables.

 Logistic regression predicts the output of a categorical dependent variable, so the outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or False, etc. Instead of returning exactly 0 or 1, however, it returns a probability that lies between 0 and 1.

LOGISTIC REGRESSION

 Logistic Regression is similar to Linear Regression except in how it is used: Linear Regression solves regression problems, whereas Logistic Regression solves classification problems.

 In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic function whose output is bounded between 0 and 1.

 The curve from the logistic function indicates the likelihood of something such as whether
the cells are cancerous or not, a mouse is obese or not based on its weight, etc.

LOGISTIC REGRESSION

 Logistic Regression is a significant machine learning algorithm because it can provide probabilities and classify new data using both continuous and discrete inputs.

 Logistic Regression can classify observations using different types of data and can help identify the variables most effective for the classification. The figure below shows the logistic function:

[Figure: the logistic (sigmoid) function curve]
LOGISTIC REGRESSION MODEL DESCRIPTION

 Logistic regression is based on the logistic function

f(y) = e^y / (1 + e^y),   for -∞ < y < ∞

• As y → ∞, f(y) → 1; and as y → -∞, f(y) → 0

LOGISTIC REGRESSION MODEL DESCRIPTION

 With the range of f(y) being (0, 1), the logistic function models the probability of an outcome occurring

y = β0 + β1x1 + β2x2 + ... + βp-1xp-1

p(x1, x2, ..., xp-1) = e^y / (1 + e^y),   for -∞ < y < ∞

ln(p / (1 - p)) = y = β0 + β1x1 + β2x2 + ... + βp-1xp-1

• In contrast to linear regression, the values of y are not directly observed; only the values of f(y), in terms of success or failure, are observed

• ln(p / (1 - p)) is called the log-odds ratio, or logit of p. Maximum Likelihood Estimation (MLE) is used to estimate the model parameters; MLE is beyond the scope of this book.
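The log-odds relation above says the logit is exactly the inverse of the logistic function. A minimal Python check of that identity (illustrative sketch, not from the book):

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1): ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def inv_logit(y: float) -> float:
    """Logistic function e^y / (1 + e^y), the inverse of logit."""
    return math.exp(y) / (1.0 + math.exp(y))

# Round-tripping a probability through logit and back recovers it.
p = 0.8
print(inv_logit(logit(p)))  # 0.8 (up to floating-point error)
```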

LOGISTIC REGRESSION MODEL DESCRIPTION:
CUSTOMER CHURN EXAMPLE

DIAGNOSTICS MODEL DESCRIPTION: CUSTOMER
CHURN EXAMPLE

> head(churn_input)  # Churned = 1 if customer churned

> sum(churn_input$Churned)  # 1743 of 8000 customers churned

# Use the Generalized Linear Model function glm()

> Churn_logistic1 <- glm(Churned ~ Age + Married + Cust_years + Churned_contacts,
    data=churn_input, family=binomial(link="logit"))

> summary(Churn_logistic1) # Age + Churned_contacts best

DIAGNOSTICS MODEL DESCRIPTION:
CUSTOMER CHURN EXAMPLE

> Churn_logistic3 <- glm(Churned ~ Age + Churned_contacts,
    data=churn_input, family=binomial(link="logit"))

> summary(Churn_logistic3)  # Age + Churned_contacts

y = 3.50 - 0.16*Age + 0.38*Churned_contacts
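Plugging the fitted coefficients shown above into the logistic function gives a churn probability for any customer. A Python sketch (the slide's model is in R; the coefficients are the rounded values printed above):

```python
import math

def churn_probability(age: float, churned_contacts: float) -> float:
    """Churn probability from the fitted model on this slide:
    y = 3.50 - 0.16*Age + 0.38*Churned_contacts, p = 1 / (1 + e^-y)."""
    y = 3.50 - 0.16 * age + 0.38 * churned_contacts
    return 1.0 / (1.0 + math.exp(-y))

# A young customer with several churned contacts scores far higher
# than an older customer with none.
print(churn_probability(20, 5))  # high probability of churn
print(churn_probability(60, 0))  # low probability of churn
```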

DIAGNOSTICS DEVIANCE AND THE PSEUDO-R2
 In logistic regression, deviance = -2 log L

• where L is the maximized value of the likelihood function used to obtain the parameter
estimates

 Two deviance values are provided

• Null deviance = deviance based on only the y-intercept term

• Residual deviance = deviance based on all parameters

 Pseudo-R2 measures how well the fitted model explains the data

• Value near 1 indicates a good fit over the null model

DIAGNOSTICS DEVIANCE AND THE PSEUDO-R2

y = β0 + β1*Age + β2*Churned_contacts

A metric analogous to R² in linear regression can be computed as shown:

Pseudo-R² = 1 - (residual deviance / null deviance) = (null deviance - residual deviance) / null deviance
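The pseudo-R² formula is a one-liner in code. A Python sketch with hypothetical deviance values (illustrative only, not the churn model's actual output):

```python
def pseudo_r2(null_deviance: float, residual_deviance: float) -> float:
    """Pseudo-R^2 = 1 - residual deviance / null deviance.
    A value near 1 indicates a good fit relative to the null model."""
    return 1.0 - residual_deviance / null_deviance

# Hypothetical deviances for illustration:
print(pseudo_r2(8000.0, 6000.0))  # 0.25
```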

DIAGNOSTICS RECEIVER OPERATING
CHARACTERISTIC (ROC) CURVE
 Logistic regression is often used to classify

• In the Churn example, a customer can be classified as Churn if the model predicts high
probability of churning
• Although 0.5 is often used as the probability threshold, other values can be used based
on desired error tradeoff
 For two classes, C and nC, we have

• True Positive: predict C, when actually C

DIAGNOSTICS RECEIVER OPERATING
CHARACTERISTIC (ROC) CURVE
• True Negative: predict nC, when actually nC

• False Positive: predict C, when actually nC

• False Negative: predict nC, when actually C

False Positive Rate (FPR) = (# of false positives) / (# of negatives)

True Positive Rate (TPR) = (# of true positives) / (# of positives)
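The two rates above can be computed directly from actual and predicted labels. A minimal Python sketch for the two-class (C vs nC) setting, using toy labels for illustration:

```python
def tpr_fpr(actual, predicted, positive="C"):
    """Return (TPR, FPR) for a two-class problem.
    TPR = true positives / positives; FPR = false positives / negatives."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    positives = sum(1 for a in actual if a == positive)
    negatives = len(actual) - positives
    return tp / positives, fp / negatives

# Toy example: 3 actual churners (C), 3 non-churners (nC)
actual    = ["C", "C", "nC", "nC", "C", "nC"]
predicted = ["C", "nC", "C", "nC", "C", "nC"]
print(tpr_fpr(actual, predicted))  # (2/3, 1/3)
```

Sweeping the probability threshold from 0 to 1 and plotting (FPR, TPR) at each threshold traces out the ROC curve.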

REASONS TO CHOOSE AND CAUTIONS

 Linear regression – outcome variable continuous

 Logistic regression – outcome variable categorical

 Both models assume a linear additive function of the input variables

• If this is not true, the models perform poorly

• In linear regression, the further assumption of normally distributed error terms is


important for many statistical inferences

 Although a set of input variables may be a good predictor of an output variable,


“correlation does not imply causation”

ADDITIONAL REGRESSION MODELS

 Multicollinearity is the condition when several input variables are highly correlated

• This can lead to inappropriately large coefficients

 To mitigate this problem

• Ridge regression applies a penalty proportional to the sum of the squared coefficients

• Lasso regression applies a penalty proportional to the sum of the absolute values of the coefficients

 Multinomial logistic regression – used when the categorical outcome variable has more than two states
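The shrinkage effect of the ridge penalty is easiest to see in the one-variable case, where the estimate has a closed form. A pure-Python sketch with made-up data (illustrative only; real use would rely on a library such as glmnet in R or scikit-learn in Python):

```python
def ridge_1d(xs, ys, lam):
    """One-variable ridge estimate (no intercept):
    beta = sum(x*y) / (sum(x^2) + lam). lam = 0 gives ordinary least squares;
    larger lam shrinks the coefficient toward 0."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

# Made-up data, roughly y = 2x:
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]
print(ridge_1d(xs, ys, lam=0.0))   # ~2.03 (ordinary least squares)
print(ridge_1d(xs, ys, lam=30.0))  # shrunk toward 0
```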

CLASSIFICATION

 Classification is widely used for prediction

 Most classification methods are supervised

 This chapter focuses on two fundamental classification methods

• Decision trees

• Naïve Bayes

DECISION TREES

 Decision Trees (DTs) are a non-parametric supervised learning method used for
classification and regression.

 Decision trees learn simple if-then-else decision rules from the data. The deeper the tree, the more complex the decision rules and the closer the fit to the training data (with a growing risk of overfitting).

 Tree structure specifies sequence of decisions

• Given input X={x1, x2,…, xn}, predict output Y

• Input attributes/features can be categorical or continuous

DECISION TREES

 Node = tests a particular input variable

• Root and internal nodes test input variables; leaf nodes return class labels

• Depth of a node = minimum number of steps to reach it from the root


 Branch (connects two nodes) = specifies decision

 Two varieties of decision trees

• Classification trees: categorical output, often binary

• Regression trees: numeric output

DECISION TREES

• Example of a decision tree

• Predicts whether customers will buy a product

SIMPLE DECISION TREES

SIMPLE DECISION TREES

• Example: will bank client subscribe to term deposit?

PRUNING

• Pruning is the process of reducing the size of the tree by turning some branch nodes into leaf nodes and removing the leaf nodes under the original branch. Pruning is useful because a classification tree may fit the training data well but do a poor job of classifying new values; a simpler tree often avoids overfitting.

PRUNING EXAMPLE

WHY DECISION TREES?

 Decision trees are great for several reasons:

 They are easily interpretable and follow a pattern similar to human thinking. In other words, you can explain a decision tree as a set of questions/business rules.

 Prediction is fast: it is just a sequence of comparison operations until a leaf node is reached.

 They can be adapted to deal with missing data without imputing values.
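The "set of questions/business rules" view above means a small tree is literally just nested comparisons. A Python sketch for the term-deposit example, with hypothetical split variables and thresholds (not taken from the slide's actual tree):

```python
def will_subscribe(age: int, balance: float, has_loan: bool) -> str:
    """Hypothetical decision tree for the term-deposit example:
    each internal node is one comparison, each return is a leaf label."""
    if has_loan:                 # root node: test has_loan
        return "no"
    if balance >= 1500:          # internal node: test balance
        return "yes"
    return "yes" if age >= 60 else "no"  # internal node: test age

print(will_subscribe(age=35, balance=2000, has_loan=False))  # yes
```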

DECISION TREE : THE GENERAL ALGORITHM

 Construct a tree T from training set S

 Requires a measure of attribute information

• Simplistic method (data from previous Fig.)

• Purity = probability of corresponding class

• E.g., P(no)=1789/2000=89.45%, P(yes)=10.55%

• Entropy methods

• Entropy measures the impurity of an attribute

• Information gain measures purity of an attribute

DECISION TREE : THE GENERAL ALGORITHM

 Entropy method of attribute information

H_X = the entropy of X = -Σ P(x) log₂ P(x)

 Information gain of an attribute = base entropy – conditional entropy

InfoGain_A = H_S - H_(S|A)
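The entropy and information-gain formulas above can be computed in a few lines. A Python sketch, where the conditional entropy is the size-weighted entropy of the subsets produced by splitting on an attribute:

```python
import math
from collections import Counter

def entropy(labels):
    """H(X) = -sum over classes of P(x) * log2(P(x))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(labels, split_groups):
    """InfoGain = base entropy minus size-weighted entropy after the split."""
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in split_groups)
    return entropy(labels) - conditional

labels = ["yes"] * 5 + ["no"] * 5          # maximally impure: H = 1 bit
perfect_split = [["yes"] * 5, ["no"] * 5]  # pure subsets: H = 0 each
print(entropy(labels))                     # 1.0
print(info_gain(labels, perfect_split))    # 1.0 (all impurity removed)
```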

DECISION TREE : THE GENERAL ALGORITHM

 Construct a tree T from training set S

• Choose root node = most informative attribute A

• Partition S according to A’s values

• Construct subtrees T1, T2, … for the subsets of S recursively until one of the following occurs

• All leaf nodes satisfy the minimum purity threshold

• The tree cannot be split further while satisfying the minimum purity threshold

• Another stopping criterion is satisfied – e.g., maximum depth

SUMMARY

This chapter explained Logistic Regression and its use cases, the diagnostics
(deviance, pseudo-R², and the ROC curve), and classification with decision trees.

SELF-ASSESSMENT QUESTIONS

1. The logistic function is also called the _______

(a) Cost Function


(b) Corresponding Function
(c) Linear Function
(d) Sigmoid Function

2. Logistic Regression is an example of __________

(a) Supervised Learning : Regression


(b) Classification
(c) Unsupervised Learning
(d) Semi – Supervised Learning
TERMINAL QUESTIONS

1. What is Logistic Regression?


2. Explain the use cases of Logistic Regression.
3. Explain the diagnostics: deviance and the pseudo-R².
4. Describe the diagnostic Receiver Operating Characteristic (ROC) curve.
5. What is Classification?
6. What is Pruning?
7. Describe Decision Trees.
8. What is the general algorithm for building a Decision Tree?
REFERENCES FOR FURTHER LEARNING OF THE SESSION

Reference Books:
1. Data science and big data analytics: discovering, analyzing, visualizing and presenting data – EMC
Education Services
2. Tom White, “Hadoop The Definitive Guide”, O’Reilly Publications, Fourth Edition,2015
3. Seema Acharya, Subhashini Chellappan, “Big Data and Analytics”, Wiley Publications, First
Edition,2015
Sites and Web links:
4. https://www.geeksforgeeks.org
5. Big Data Analytics – www.javatpoint.com
6. https://www.analyticsvidhya.com/
THANK YOU

Team – Big Data Analytics
