Machine Learning - Session4
AIBM-Session-4
What is Classification?
What is needed for classification?
• Model data with:
• Features that can be quantified
• Labels that are known
www.proschoolonline.com
Continuous Vs. Categorical Variable
• General linear regression model
• y = β0 + β1x1 + β2x2 + ε
• Independent variables (x's):
• Continuous: age, income, height -> use the numerical value directly
• Categorical: gender, city, ethnicity -> use dummy variables, for example “0” for male and “1” for female
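As a small illustration of the dummy coding described above, the following Python sketch turns a categorical gender column into 0/1 values. The sample records and the names `GENDER_DUMMY` and `encode_gender` are made up for illustration; only the mapping itself (male to 0, female to 1) comes from the slide.

```python
# Dummy coding from the slide: male -> 0, female -> 1.
GENDER_DUMMY = {"male": 0, "female": 1}


def encode_gender(values):
    """Return the 0/1 dummy for each gender label (case-insensitive)."""
    return [GENDER_DUMMY[v.strip().lower()] for v in values]


# Hypothetical sample records, not taken from the slides.
genders = ["Male", "Female", "Female", "Male"]
print(encode_gender(genders))  # [0, 1, 1, 0]
```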
Example
• Netflix ran a marketing campaign targeting 500 of its customers; some of them subscribed to the channel and some did not. Netflix now wants to analyse the success of the campaign, so it has taken a sample of 20 customers to analyse the results.
• Subscription: indicates whether the customer has subscribed (1) or not (0).
• Age (continuous variable): used to examine how age influences the likelihood of subscription.

Age   Subscription
62    1
18    0
40    0
51    1
37    1
47    1
32    0
49    1
55    1
52    1
52    1
33    1
41    0
44    0
51    1
52    1
36    0
35    0
30    0
39    0
A linear Model?
• We could also use a linear model for this problem. The only difficulty is that the dependent variable is binary instead of continuous.
• To use the linear model here, we code the response “No” as 0 and “Yes” as 1. The model then describes what moves a customer from 0 to 1, that is, what increases the likelihood of subscription.
Result of Linear Model
• We fitted this model with the Linear Regression function of the Data Analysis tool.
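The slides fit the model with a spreadsheet tool. As a rough equivalent, the sketch below runs the same simple least-squares fit in plain Python on the 20 sampled customers. The function name `fit_simple_ols` is illustrative, not something from the slides.

```python
# Sample of 20 customers from the example (age, subscription flag).
ages = [62, 18, 40, 51, 37, 47, 32, 49, 55, 52,
        52, 33, 41, 44, 51, 52, 36, 35, 30, 39]
subscribed = [1, 0, 0, 1, 1, 1, 0, 1, 1, 1,
              1, 1, 0, 0, 1, 1, 0, 0, 0, 0]


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_ols(ages, subscribed)
print(f"p = {intercept:.3f} + {slope:.4f} * age")  # p = -0.866 + 0.0331 * age
```

The slope of about 0.0331 is what rounds to the 0.03 quoted for this model.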
Interpretation of Result
• If our dependent variable is binary, then we want to see what makes it change from 0 to 1.
• This can be interpreted as what increases the likelihood of subscription, or P(subscription = 1), which we
can also simply denote as p.
• The result can be interpreted as:
P(subscription = 1) = p = −0.866 + 0.0331 ∗ age (the slope is about 0.03 after rounding)
• Every additional year of age increases the estimated probability of subscription by about 3 percentage points.
Problems with the linear Approach
• Probabilities are bounded: 0 ≤ p ≤ 1.
• The ages in our data range over 18 ≤ age ≤ 62, so the youngest customer is 18 years old and the oldest is 62.
• It only makes sense to develop forecasts for observations similar to the ones we have in our data.
• Let us estimate the probability that a 40-year-old subscribes, using the unrounded slope 0.0331:
P = −0.866 + 0.0331 ∗ 40 = 0.458
• What about people aged 26 and 57? Both ages lie within the range of our data.
If we plug in 26: P = −0.866 + 0.0331 ∗ 26 = −0.005. This cannot be correct, since a probability cannot be negative.
If we plug in 57: P = −0.866 + 0.0331 ∗ 57 = 1.02. This is greater than 1, which is also an invalid value for a probability.
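The plug-in calculations can be checked with a few lines of Python. This is only a sketch: the slope is taken as 0.0331, the value a least-squares fit on the 20 sampled customers gives before it is rounded to 0.03, and the function name `linear_probability` is illustrative.

```python
INTERCEPT = -0.866
SLOPE = 0.0331  # assumption: unrounded least-squares slope for the sample


def linear_probability(age):
    """Estimated P(subscription = 1) under the linear model."""
    return INTERCEPT + SLOPE * age


for age in (26, 40, 57):
    p = linear_probability(age)
    status = "valid" if 0.0 <= p <= 1.0 else "outside [0, 1]"
    print(f"age {age}: p = {p:.3f} ({status})")
```

Ages 26 and 57 both lie inside the observed range of 18 to 62, yet the linear model returns values that cannot be probabilities.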
Linear Model
• If we plot the observations, the probabilities should run from 0 to 1. In the Netflix example, however, the estimated probabilities are negative for young customers, below 27 years of age.
• Meanwhile, for customers aged 57 or older, the estimated probabilities are greater than 1.
• The model below is not working, so how could we fix it? One option is to cap the linear model artificially: whenever the estimated probability is below 0, set it to 0, and whenever the estimated probability is above 1, set it to 1.
[Figure: fitted linear model, Subscription (vertical axis, −0.4 to 1.4) against age (horizontal axis, 0 to 70); annotation: “The intercept is -0.08”]
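The capping idea can be sketched in Python as follows. The coefficients are assumptions: the intercept −0.866 from the fitted model and the unrounded slope 0.0331. The function name `capped_probability` is illustrative.

```python
def capped_probability(age, intercept=-0.866, slope=0.0331):
    """Linear estimate forced into [0, 1]: values below 0 become 0, above 1 become 1."""
    raw = intercept + slope * age
    return min(1.0, max(0.0, raw))


for age in (20, 26, 40, 57, 62):
    print(f"age {age}: capped p = {capped_probability(age):.3f}")
```

The output is flat at 0 for the youngest customers and flat at 1 for the oldest, with a straight line in between, which is what produces the breaks in the function.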
Linear Model
• One fix is the capped function shown here, with breaks at 0 and 1, but this is too engineered and too customised to be a standard approach.
• Could we do something better? Let us think about what the fix needs to achieve, noting again that probabilities must lie between 0 and 1.
[Figure: capped linear model; annotation: “The intercept is -0.08”]
Fixing the Prior Approach
• We need to somehow constrain p such that 0 ≤ 𝑝 ≤ 1
• We know p = f(age), but the linear function didn’t work.
• What must f( ) satisfy to always produce reasonable forecasts?
• f( ) must satisfy two things:
It must always be positive (since p ≥ 0)
It must be less than 1 (since p ≤ 1)
Two Steps!
• We need to develop a new function that satisfies these two criteria.
• Step 1: it must always be positive (since p ≥ 0)
• Which functions always give positive numbers?
The absolute value of a number
The square of a number
• An alternative to these is the exponential form:
• p = exp(β0 + β1 ∗ age)
• For example, if β0 + β1 ∗ age is −2, then exp(−2) = 0.135 (use the Excel function “EXP” to find the exponential value).
• Step 2: it must be less than 1 (since p ≤ 1)
• Dividing a positive number by itself plus 1 always gives a value below 1. For example, if exp(β0 + β1 ∗ age) is 1.2, then 1.2/(1.2 + 1) = 1.2/2.2 ≈ 0.545.
• Combining both steps: p = exp(β0 + β1 ∗ age) / (1 + exp(β0 + β1 ∗ age))
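The two steps can be tried out in Python. This is a minimal sketch; the function names `positive_step`, `below_one_step` and `probability` are illustrative, and `z` stands for the linear part β0 + β1 ∗ age.

```python
import math


def positive_step(z):
    """Step 1: the exponential is positive for every real z."""
    return math.exp(z)


def below_one_step(x):
    """Step 2: a positive x divided by (x + 1) is always below 1."""
    return x / (1.0 + x)


def probability(z):
    """Both steps combined: exp(z) / (1 + exp(z))."""
    return below_one_step(positive_step(z))


print(round(positive_step(-2), 3))    # 0.135
print(round(below_one_step(1.2), 3))  # 0.545
print(round(probability(0.0), 3))     # 0.5
```

For very large `z`, `math.exp` overflows, so a production implementation would use a numerically safer form; the sketch keeps the slide's two steps visible instead.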
The linear thinking is not completely gone
• The previous expression (by doing some algebra) can be rewritten as:
• ln(p / (1 − p)) = β0 + β1 ∗ age
• The left-hand side, the natural log of the odds p/(1 − p), is equal to a linear function of age that looks just like a simple linear regression model.
• Even though the probability of the customer subscribing (p) is not a linear function of age, we can perform a simple transformation on it so that the result is a linear function of age.
• The above equation is used in Logistic Regression.
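To make the log-odds equation concrete, the sketch below fits it to the 20 sampled customers in plain Python. The slides do not say how the coefficients are estimated; maximum likelihood via Newton's method is assumed here as one common choice, and the names `sigmoid` and `fit_logistic` are illustrative.

```python
import math

ages = [62, 18, 40, 51, 37, 47, 32, 49, 55, 52,
        52, 33, 41, 44, 51, 52, 36, 35, 30, 39]
subscribed = [1, 0, 0, 1, 1, 1, 0, 1, 1, 1,
              1, 1, 0, 0, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    """exp(z) / (1 + exp(z)), written in the equivalent form 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iterations=25):
    """Fit ln(p / (1 - p)) = b0 + b1 * x by Newton's method (an assumed choice)."""
    mean_x = sum(x) / len(x)
    xc = [xi - mean_x for xi in x]  # centre age to keep the 2x2 system well conditioned
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in xc]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, xc))
        # Negative Hessian, a symmetric 2x2 matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, xc))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, xc))
        det = h00 * h11 - h01 * h01
        # Newton update, solving the 2x2 system with Cramer's rule.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0 - b1 * mean_x, b1  # undo the centring


b0, b1 = fit_logistic(ages, subscribed)
for age in (26, 40, 57):
    p = sigmoid(b0 + b1 * age)
    log_odds = math.log(p / (1.0 - p))
    print(f"age {age}: p = {p:.3f}, log-odds = {log_odds:.3f}")
```

Every estimated probability stays strictly between 0 and 1, while the printed log-odds equal b0 + b1 ∗ age, so the linear structure survives on the transformed scale.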
99% accuracy
Error Measurement
Unsupervised Learning
Adjust to new mean of the clusters