0% found this document useful (0 votes)
30 views

Machine Learning - Session4

- Classification involves using machine learning algorithms to predict categorical class labels. It requires labeled training data with known classes. - Logistic regression can be used for classification problems where the dependent variable is binary or categorical. It predicts the probability of an observation belonging to a class. - Unlike linear regression, which predicts continuous output values, logistic regression's output is a probability value between 0 and 1, making it suitable for problems with binary dependent variables.

Uploaded by

Deepam Mohindra
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
30 views

Machine Learning - Session4

- Classification involves using machine learning algorithms to predict categorical class labels. It requires labeled training data with known classes. - Logistic regression can be used for classification problems where the dependent variable is binary or categorical. It predicts the probability of an observation belonging to a class. - Unlike linear regression, which predicts continuous output values, logistic regression's output is a probability value between 0 and 1, making it suitable for problems with binary dependent variables.

Uploaded by

Deepam Mohindra
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 56

Supervised Learning : Classification

Unsupervised Learning : Clustering

AIBM-Session-4
What is Classification?
What is needed for classification?
• Model data with:
• Features that can be quantified
• Labels that are known

• Method to measure similarity


Logistic Regression
• Introduction to Logistic Regression
• Linear Regression vs. Logistic Regression
• Using Logistic Regression for Classification
• Error Measurement
Predicting Customer Churn
Customer Churn occurs when subscribers or customers stop doing
business with a company or service. A business typically treats a
customer as churned once a specific amount of time has passed since
the customers last interaction with the business or service.
Linear Regression for Classification
Logistic Regression
• In both the tables, Age and Gender are independent variable.
• In Table 1, we are trying to predict the revenue which is a continuous dependent variable.
• In Table 2, we are trying to predict the probability of subscription based on gender and age which is a
binary dependent variable.

Age Gender Revenue Age Gender Subscription Predict the


65 MALE 69806 Predict the 65 MALE 1 probability of
23 MALE 25256 revenue amount 23 MALE 1
53 FEMALE 14091 53 FEMALE 0 subscription
using linear
45 MALE 17176 45 MALE 0 using Logistic
regression.
49 FEMALE 45134 49 FEMALE 1 regression.
38 FEMALE 38106 38 FEMALE 0
32 FEMALE 30865 32 FEMALE 1
22 FEMALE 31838 22 FEMALE 1
39 MALE 37286 39 MALE 0
18 FEMALE 36391 18 FEMALE 0

www.proschoolonline.com
Continuous Vs. Categorical Variable
• General linear regression model
• y= 𝛽0 + 𝛽1 x 1 + 𝛽2x2 + 𝜀
• Independent variable(x’s):
• Continuous: Age, income, height -> Uses Numerical value
• Categorical: gender, city, ethnicity -> Uses dummies for example: For Male use “0” and for female “1”

• Dependent Variable (y):


• Continuous: consumption, time spent -> Uses Numerical value
• Categorical: Yes/No -> Uses dummies

www.proschoolonline.com
Example
• Netflix conducted a marketing activity on its 500 customers out of which some customers subscribed the
channel whereas some did not. Now, Netflix wants to analyse the success of their marketing campaign.
They have taken a sample of 20 customers and want to analyse the results. Age Subscription
62 1
• Subscribe: Indicates a customer has subscribed to a magazine. 18 0
• Age(Continuous variable): Examine how age influences the likelihood of subscription 40 0
51 1
37 1
47 1
32 0
49 1
55 1
52 1
52 1
33 1
41 0
44 0
51 1
52 1
36 0
35 0
30 0
39 0

www.proschoolonline.com
A linear Model?
• For the above model we can also use the linear model. Only problem we may face is that the dependent
variable is binary instead of continuous.
• If we want to use the linear model for this problem , then we need to change the variable “No” to “0” and
variable “Yes” to “1” and whenever customer changing from 0 to 1, it increases the likelihood of
subscription.

• So, then we can run a simple linear model


𝑆𝑢bscrib𝑒 = 𝛽0 + 𝛽1 ∗ a𝑔𝑒 + 𝜀

www.proschoolonline.com
Result of Linear Model

• We solved this model using Linear Regression function using Data Analysis tool

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept -0.87 0.37 -2.37 0.03 -1.63 -0.10
Age 0.03 0.01 3.99 0.00 0.02 0.05

• The estimated model is Subscribe = −𝟎. 𝟖𝟔 + 𝟎. 𝟎𝟑 ∗ age

www.proschoolonline.com
Interpretation of Result
• If our dependent variable is binary, then we want to see what makes it change from 0 to 1.
• This can be interpreted as what increases the likelihood of subscription, or P(subscription = 1), which we
can also simply denote as p.
• The result can be interpreted as:
𝑝 subscribe= 1 = 𝑝 = -0.866 + 0.03 * age
• Every additional year of age increases the probability of subscription by 3%.

www.proschoolonline.com
Problems with the linear Approach
• The Probabilities are bounded between (0 ≤ 𝑝 ≤ 1)
• The range of age in our data is between 18 ≤ a𝑔𝑒 ≤ 62 so, the youngest customer is 18 year old and the
oldest customer is 62 year old.
• It only makes sense to develop a forecasts for observations similar to the ones we have in our data
• Lets assume that the probability of a 40 year old person subscribe is:
P = −0.866 + 0.03 ∗ 40 = 0.334
• What about people with 26 and 57 years of age?

If we plug in 26 we find that the probability that this customer buys is estimated P = −0.866 +0.03 ∗ 26 = −0.005
to be -0.005 and this cannot be correct since a probability cannot have a
negative value.
If we plug in 57 we end up with the number of 1.01 which is greater than 1 on P = −0.866 +0.03 ∗ 57 = 1.01
came an invalid value for probability this becomes more clear.

www.proschoolonline.com
Linear Model
• If we plot the observation, the probabilities should go from 0 to 1 but considering the Netflix example,
lets say If the customers are young, below 27 years of age the estimated probabilities are observed to be
negative.
• Meanwhile if the customer has more than 57 years of age the estimated probabilities are greater than 1.
• The below model is not working, how could we fix this one opportunity to artificially cap the linear model
and say whenever the estimator probability below 0 make it 0 and whenever the estimated probability is
Subscription
above 1 make it 1.
1.4

1.2

0.8

0.6

0.4

0.2

0
The intercept is 0 10 20 30 40 50 60 70
-0.08 -0.2

-0.4

www.proschoolonline.com
Linear Model
• The one shown with those breaks in the function but this is too engineered way to custom to be a
standard approach
• Could we do something better and let's think what should we do to fix this again note that probabilities
should be between 0 and 1

The intercept
is -0.08

www.proschoolonline.com
Fixing the Prior Approach
• We need to somehow constrain p such that 0 ≤ 𝑝 ≤ 1
• We know p = f(age), but the linear function didn’t work.
• What must f( ) satisfy to always produce reasonable forecasts?
• f( ) must satisfy two things:
It must always be positive (since p ≥ 0)
It must be less than 1 (since p ≤ 1)

www.proschoolonline.com
Two Steps!
• Need to develop a new function that will satisfy these two criteria
• It must always be positive (since p ≥ 0)
• What functions could give you a positive numbers
The absolute value of a number
The squared version of number
• The alternative to this is an exponential form
• 𝑝 = exp 𝛽0 + 𝛽1 ∗ a𝑔𝑒 =
• For example if 𝛽0 + 𝛽1 ∗ a𝑔𝑒 is -2, then exp(-2) = 0.136 (Use excel function “exp” to find exponential
value.
• It must be less than 1 (since p ≤ 1)

• For example if exp 𝛽0 + 𝛽1 ∗ a𝑔𝑒 is 1.2 , to make it less than one , we can do : 1.2/(1.2+1) = 1.2/2.2

www.proschoolonline.com
The linear thinking is not completely gone
• The previous expression (by doing some algebra) can be rewritten as:
p
• ln = β0 + β1 ∗ age
1−p
• P being the result of the prior expression is equal to a linear function of age that looks just like the linear
simple regression models.
• Even though the probability of the customer subscribing (p) is not linear function of age, we can perform
a simple transformation on it such that it is now a linear function of age.
• The above equation is used in Logistic Regression.

www.proschoolonline.com
99% accuracy
Error Measurement
Unsupervised Learning
Adjust to new mean of the clusters

You might also like