The document provides an overview of Logistic Regression, including its introduction, use cases in various fields, model description, and diagnostics such as Deviance and the ROC Curve. It also discusses additional regression models and classification methods, particularly Decision Trees, emphasizing their interpretability and efficiency. The chapter concludes with self-assessment and terminal questions for further learning.

COURSE TITLE: BIG DATA ANALYTICS

COURSE CODE: 21CS3275A

TOPIC: ADVANCED ANALYTICAL THEORY AND METHODS:


LOGISTIC REGRESSION AND CLASSIFICATION

CO-2 Session 3
CONTENTS

 Logistic Regression Introduction

 Logistic Regression Use cases

 Logistic regression example

 Logistic Regression Model Description

 Diagnostics Deviance and the Pseudo-R2

 Receiver Operating Characteristic (ROC) Curve

 Reasons to Choose and Cautions

CONTENTS

 Additional Regression Models

 Classification

LOGISTIC REGRESSION INTRODUCTION

 In linear regression modeling, the outcome variable is continuous – e.g., income ~ age and
education

 In logistic regression, the outcome variable is categorical, and this chapter focuses on two-
valued outcomes like true/false, pass/fail, or yes/no

LOGISTIC REGRESSION USE CASES

 Medical: Probability of a patient’s successful response to a specific medical treatment – inputs could include age, weight, etc.

 Finance: Probability an applicant defaults on a loan

 Marketing: Probability a wireless customer switches carriers (churns)

 Engineering: Probability a mechanical part malfunctions or fails

LOGISTIC REGRESSION

 Used for binary classification (0 / 1)

 It is supervised learning

 Predicts discrete values (yes/no)

 Threshold = 0.5

 Data needs to be linearly separable

 3 main steps:

1. Calculate logistic function 2. Learn coefficients 3. Make predictions

LOGISTIC REGRESSION

SAMPLE DATA

STEP 1: THE LOGISTIC (SIGMOID) FUNCTION

f(x) = 1 / (1 + e^(-x))
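The sigmoid formula above can be sketched directly in code. This is a minimal Python illustration (the slides use R, so this is an assumption of language, not of method):

```python
import math

def sigmoid(x: float) -> float:
    """Logistic (sigmoid) function: maps any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Large negative inputs map near 0, large positive inputs near 1,
# and 0 maps to exactly 0.5 (the default classification threshold).
print(sigmoid(0))    # 0.5
print(sigmoid(6))    # close to 1
print(sigmoid(-6))   # close to 0
```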

LOGISTIC REGRESSION

 Logistic regression is one of the most popular machine learning algorithms and belongs to the supervised learning family. It is used to predict a categorical dependent variable from a given set of independent variables.

 Logistic regression predicts the output of a categorical dependent variable, so the outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or False, etc. Instead of returning exactly 0 or 1, however, it returns a probability that lies between 0 and 1.

LOGISTIC REGRESSION

 Logistic Regression is similar to Linear Regression except in how it is used: Linear Regression solves regression problems, whereas Logistic Regression solves classification problems.

 In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic function whose output is bounded between 0 and 1.

 The curve from the logistic function indicates the likelihood of something such as whether
the cells are cancerous or not, a mouse is obese or not based on its weight, etc.

LOGISTIC REGRESSION

 Logistic Regression is a significant machine learning algorithm because it can provide probabilities and classify new data using both continuous and discrete inputs.

 Logistic Regression can classify observations using different types of data and can help identify the variables most effective for the classification. The figure below shows the logistic function:

[Figure: the logistic (sigmoid) function curve]
LOGISTIC REGRESSION MODEL DESCRIPTION

 Logistic regression is based on the logistic function

f(y) = e^y / (1 + e^y),   for -∞ < y < ∞

• As y → ∞, f(y) → 1; and as y → -∞, f(y) → 0

LOGISTIC REGRESSION MODEL DESCRIPTION

 With the range of f(y) being (0, 1), the logistic function models the probability of an outcome occurring

y = β0 + β1x1 + β2x2 + ... + βp-1xp-1

p(x1, x2, ..., xp-1) = e^y / (1 + e^y),   for -∞ < y < ∞

ln(p / (1 - p)) = y = β0 + β1x1 + β2x2 + ... + βp-1xp-1

• In contrast to linear regression, the values of y are not directly observed; only the values of f(y), in terms of success or failure, are observed

• ln(p / (1 - p)) is called the log-odds ratio, or logit of p. Maximum Likelihood Estimation (MLE) is used to estimate the model parameters; MLE is beyond the scope of this book.
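The log-odds relation above says the logit is exactly the inverse of the logistic function. A minimal Python check of that identity (illustrative sketch, not from the book):

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1): ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def inv_logit(y: float) -> float:
    """Logistic function e^y / (1 + e^y), the inverse of logit."""
    return math.exp(y) / (1.0 + math.exp(y))

# Round-tripping a probability through logit and back recovers it.
p = 0.8
print(inv_logit(logit(p)))  # 0.8 (up to floating-point error)
```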

LOGISTIC REGRESSION MODEL DESCRIPTION:
CUSTOMER CHURN EXAMPLE

DIAGNOSTICS MODEL DESCRIPTION: CUSTOMER
CHURN EXAMPLE

> head(churn_input)  # Churned = 1 if customer churned

> sum(churn_input$Churned)  # 1743 of 8000 customers churned

# Use the Generalized Linear Model function glm()

> Churn_logistic1 <- glm(Churned ~ Age + Married + Cust_years + Churned_contacts,
    data=churn_input, family=binomial(link="logit"))

> summary(Churn_logistic1) # Age + Churned_contacts best

DIAGNOSTICS MODEL DESCRIPTION:
CUSTOMER CHURN EXAMPLE

> Churn_logistic3 <- glm(Churned ~ Age + Churned_contacts,
    data=churn_input, family=binomial(link="logit"))

> summary(Churn_logistic3)  # Age + Churned_contacts

y = 3.50 - 0.16*Age + 0.38*Churned_contacts
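Plugging the fitted coefficients shown above into the logistic function gives a churn probability for any customer. A Python sketch (the slide's model is in R; the coefficients are the rounded values printed above):

```python
import math

def churn_probability(age: float, churned_contacts: float) -> float:
    """Churn probability from the fitted model on this slide:
    y = 3.50 - 0.16*Age + 0.38*Churned_contacts, p = 1 / (1 + e^-y)."""
    y = 3.50 - 0.16 * age + 0.38 * churned_contacts
    return 1.0 / (1.0 + math.exp(-y))

# A young customer with several churned contacts scores far higher
# than an older customer with none.
print(churn_probability(20, 5))  # high probability of churn
print(churn_probability(60, 0))  # low probability of churn
```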

DIAGNOSTICS DEVIANCE AND THE PSEUDO-R2
 In logistic regression, deviance = -2 log L

• where L is the maximized value of the likelihood function used to obtain the parameter
estimates

 Two deviance values are provided

• Null deviance = deviance based on only the y-intercept term

• Residual deviance = deviance based on all parameters

 Pseudo-R2 measures how well the fitted model explains the data

• Value near 1 indicates a good fit over the null model

DIAGNOSTICS DEVIANCE AND THE PSEUDO-R2

y = β0 + β1*Age + β2*Churned_contacts

A metric analogous to R² in linear regression can be computed as shown:

Pseudo-R² = 1 - (residual deviance / null deviance) = (null deviance - residual deviance) / null deviance
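The pseudo-R² formula is a one-liner in code. A Python sketch with hypothetical deviance values (illustrative only, not the churn model's actual output):

```python
def pseudo_r2(null_deviance: float, residual_deviance: float) -> float:
    """Pseudo-R^2 = 1 - residual deviance / null deviance.
    A value near 1 indicates a good fit relative to the null model."""
    return 1.0 - residual_deviance / null_deviance

# Hypothetical deviances for illustration:
print(pseudo_r2(8000.0, 6000.0))  # 0.25
```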

DIAGNOSTICS RECEIVER OPERATING
CHARACTERISTIC (ROC) CURVE
 Logistic regression is often used to classify

• In the Churn example, a customer can be classified as Churn if the model predicts high
probability of churning
• Although 0.5 is often used as the probability threshold, other values can be used based
on desired error tradeoff
 For two classes, C and nC, we have

• True Positive: predict C, when actually C

DIAGNOSTICS RECEIVER OPERATING
CHARACTERISTIC (ROC) CURVE
• True Negative: predict nC, when actually nC

• False Positive: predict C, when actually nC

• False Negative: predict nC, when actually C

False Positive Rate (FPR) = (# of false positives) / (# of negatives)

True Positive Rate (TPR) = (# of true positives) / (# of positives)
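The two rates above can be computed directly from actual and predicted labels. A minimal Python sketch for the two-class (C vs nC) setting, using toy labels for illustration:

```python
def tpr_fpr(actual, predicted, positive="C"):
    """Return (TPR, FPR) for a two-class problem.
    TPR = true positives / positives; FPR = false positives / negatives."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    positives = sum(1 for a in actual if a == positive)
    negatives = len(actual) - positives
    return tp / positives, fp / negatives

# Toy example: 3 actual churners (C), 3 non-churners (nC)
actual    = ["C", "C", "nC", "nC", "C", "nC"]
predicted = ["C", "nC", "C", "nC", "C", "nC"]
print(tpr_fpr(actual, predicted))  # (2/3, 1/3)
```

Sweeping the probability threshold from 0 to 1 and plotting (FPR, TPR) at each threshold traces out the ROC curve.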

REASONS TO CHOOSE AND CAUTIONS

 Linear regression – outcome variable continuous

 Logistic regression – outcome variable categorical

 Both models assume a linear additive function of the input variables

• If this is not true, the models perform poorly

• In linear regression, the further assumption of normally distributed error terms is


important for many statistical inferences

 Although a set of input variables may be a good predictor of an output variable,


“correlation does not imply causation”

ADDITIONAL REGRESSION MODELS

 Multicollinearity is the condition when several input variables are highly correlated

• This can lead to inappropriately large coefficients

 To mitigate this problem

• Ridge regression applies a penalty proportional to the sum of the squared coefficients

• Lasso regression applies a penalty proportional to the sum of the absolute values of the coefficients

 Multinomial logistic regression – used when the categorical outcome variable has more than two states
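The shrinkage effect of the ridge penalty is easiest to see in the one-variable case, where the estimate has a closed form. A pure-Python sketch with made-up data (illustrative only; real use would rely on a library such as glmnet in R or scikit-learn in Python):

```python
def ridge_1d(xs, ys, lam):
    """One-variable ridge estimate (no intercept):
    beta = sum(x*y) / (sum(x^2) + lam). lam = 0 gives ordinary least squares;
    larger lam shrinks the coefficient toward 0."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

# Made-up data, roughly y = 2x:
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]
print(ridge_1d(xs, ys, lam=0.0))   # ~2.03 (ordinary least squares)
print(ridge_1d(xs, ys, lam=30.0))  # shrunk toward 0
```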

CLASSIFICATION

 Classification is widely used for prediction

 Most classification methods are supervised

 This chapter focuses on two fundamental classification methods

• Decision trees

• Naïve Bayes

DECISION TREES

 Decision Trees (DTs) are a non-parametric supervised learning method used for
classification and regression.

 Decision trees learn simple if-then-else decision rules from the data. The deeper the tree, the more complex the decision rules and the closer the fit to the training data (with a growing risk of overfitting).

 Tree structure specifies sequence of decisions

• Given input X={x1, x2,…, xn}, predict output Y

• Input attributes/features can be categorical or continuous

DECISION TREES

 Node = tests a particular input variable

• Root and internal nodes test input variables; leaf nodes return class labels

• Depth of a node = minimum number of steps to reach it from the root


 Branch (connects two nodes) = specifies decision

 Two varieties of decision trees

• Classification trees: categorical output, often binary

• Regression trees: numeric output

DECISION TREES

• Example of a decision tree

• Predicts whether customers will buy a product

SIMPLE DECISION TREES

SIMPLE DECISION TREES

• Example: will bank client subscribe to term deposit?

PRUNING

• Pruning is the process of reducing the size of the tree by turning some branch nodes into leaf nodes and removing the leaf nodes under the original branch. Pruning is useful because a classification tree may fit the training data well but do a poor job of classifying new values; a simpler tree often avoids overfitting.

PRUNING EXAMPLE

WHY DECISION TREES?

 Decision trees are great for several reasons:

 They are easily interpretable and follow a pattern similar to human thinking. In other words, you can explain a decision tree as a set of questions/business rules.

 Prediction is fast: it is just a sequence of comparison operations until a leaf node is reached.

 They can be adapted to deal with missing data without imputing values.
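The "set of questions/business rules" view above means a small tree is literally just nested comparisons. A Python sketch for the term-deposit example, with hypothetical split variables and thresholds (not taken from the slide's actual tree):

```python
def will_subscribe(age: int, balance: float, has_loan: bool) -> str:
    """Hypothetical decision tree for the term-deposit example:
    each internal node is one comparison, each return is a leaf label."""
    if has_loan:                 # root node: test has_loan
        return "no"
    if balance >= 1500:          # internal node: test balance
        return "yes"
    return "yes" if age >= 60 else "no"  # internal node: test age

print(will_subscribe(age=35, balance=2000, has_loan=False))  # yes
```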

DECISION TREE : THE GENERAL ALGORITHM

 Construct a tree T from training set S

 Requires a measure of attribute information

• Simplistic method (data from previous Fig.)

• Purity = probability of corresponding class

• E.g., P(no)=1789/2000=89.45%, P(yes)=10.55%

• Entropy methods

• Entropy measures the impurity of an attribute

• Information gain measures purity of an attribute

DECISION TREE : THE GENERAL ALGORITHM

 Entropy method of attribute information

H_X = the entropy of X = -Σ P(x) log₂ P(x)

 Information gain of an attribute = base entropy – conditional entropy

InfoGain_A = H_S - H_(S|A)
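The entropy and information-gain formulas above can be computed in a few lines. A Python sketch, where the conditional entropy is the size-weighted entropy of the subsets produced by splitting on an attribute:

```python
import math
from collections import Counter

def entropy(labels):
    """H(X) = -sum over classes of P(x) * log2(P(x))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(labels, split_groups):
    """InfoGain = base entropy minus size-weighted entropy after the split."""
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in split_groups)
    return entropy(labels) - conditional

labels = ["yes"] * 5 + ["no"] * 5          # maximally impure: H = 1 bit
perfect_split = [["yes"] * 5, ["no"] * 5]  # pure subsets: H = 0 each
print(entropy(labels))                     # 1.0
print(info_gain(labels, perfect_split))    # 1.0 (all impurity removed)
```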

DECISION TREE : THE GENERAL ALGORITHM

 Construct a tree T from training set S

• Choose root node = most informative attribute A

• Partition S according to A’s values

• Construct subtrees T1, T2, … for the subsets of S recursively until one of the following occurs

• All leaf nodes satisfy the minimum purity threshold

• The tree cannot be split further while satisfying the minimum purity threshold

• Another stopping criterion is satisfied – e.g., maximum depth

SUMMARY

This chapter explained Logistic Regression and its use cases, the diagnostics
(deviance, pseudo-R², and the ROC curve), and classification with decision trees.

SELF-ASSESSMENT QUESTIONS

1. The logistic function is also called the _______

(a) Cost Function


(b) Corresponding Function
(c) Linear Function
(d) Sigmoid Function

2. Logistic Regression is an example of __________

(a) Supervised Learning : Regression


(b) Classification
(c) Unsupervised Learning
(d) Semi – Supervised Learning
TERMINAL QUESTIONS

1. What is Logistic Regression?


2. Explain the use cases of Logistic Regression.
3. Explain the diagnostics: deviance and the pseudo-R².
4. Describe the diagnostic Receiver Operating Characteristic (ROC) curve.
5. What is Classification?
6. What is Pruning?
7. Describe Decision Trees.
8. What is the general algorithm for building a Decision Tree?
REFERENCES FOR FURTHER LEARNING OF THE SESSION

Reference Books:
1. Data science and big data analytics: discovering, analyzing, visualizing and presenting data – EMC
Education Services
2. Tom White, “Hadoop The Definitive Guide”, O’Reilly Publications, Fourth Edition,2015
3. Seema Acharya, Subhashini Chellappan, “Big Data and Analytics”, Wiley Publications, First
Edition,2015
Sites and Web links:
4. https://www.geeksforgeeks.org
5. Big Data Analytics – www.javatpoint.com
6. https://www.analyticsvidhya.com/
THANK YOU

Team – Big Data Analytics
