05 Logistic Regression
Logistic Regression is a classification algorithm, used when the response variable is categorical. The idea of Logistic
Regression is to model the relationship between the features and the probability of a particular outcome.
E.g. when we have to predict whether a student passes or fails an exam, given the number of hours spent
studying as a feature, the response variable takes two values: pass and fail.
If the predicted probability is more than 50%, the observation is assigned to that class; otherwise it is
assigned to the other class. Therefore, we can say that logistic regression acts as a binary classifier, as the
short sketch below illustrates.
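As a minimal sketch (the probabilities here are made-up values, not taken from the notebook), thresholding at 0.5 turns predicted probabilities into class labels:

import numpy as np

p = np.array([0.2, 0.7, 0.5, 0.9])  # hypothetical predicted probabilities
labels = (p >= 0.5).astype(int)     # 1 if p >= 0.5, else 0
print(labels)                       # [0 1 1 1]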
But if we used the linear regression equation (i) to calculate the probability, we would get values less than 0 as well as greater than 1,
which makes no sense for a probability. So we need an equation whose output always lies between 0 and
1, as we desire when calculating a probability.
Sigmoid function
We use the sigmoid function as the underlying function in Logistic Regression. Mathematically, it is

$\sigma(z) = \dfrac{1}{1 + e^{-z}}$

and its graph is the characteristic S-shaped curve rising from 0 to 1. Two properties make it a good fit here:
1) The sigmoid function's range is bounded between 0 and 1, so its output can be interpreted directly as the probability in the logistic model.
2) Its derivative is easier to compute than those of many alternatives, $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, which is useful during the gradient descent calculation (see the sketch below).
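A minimal NumPy sketch of the sigmoid and its derivative (illustrative only):

import numpy as np

def sigmoid(z):
    # squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-5.0, 0.0, 5.0])
print(sigmoid(z))             # approx [0.0067 0.5    0.9933]
print(sigmoid_derivative(z))  # approx [0.0066 0.25   0.0066]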
Writing $p(x)$ for the probability that the output equals 1 and setting the sigmoid's argument to a linear combination of the inputs, $p(x) = \sigma(\beta_0 + \beta_1 x)$, inverting gives

$\log\!\left(\dfrac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x$

where the left-hand side is called the logit or log-odds function, and $p(x)/(1-p(x))$ is called the odds.
The odds are the ratio of the probability of success to the probability of failure. Therefore, in Logistic
Regression, a linear combination of the inputs is mapped to the log-odds of the output being equal to 1.
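For example, if $p(x) = 0.8$, the odds are $0.8/0.2 = 4$ (success is four times as likely as failure), and the log-odds are $\log 4 \approx 1.39$.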
The cost function for the whole training set (the binary cross-entropy, or log loss) is given as:

$J(\theta) = -\dfrac{1}{m} \sum_{i=1}^{m} \left[\, y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\!\big(1 - h_\theta(x^{(i)})\big) \right]$

where $h_\theta(x^{(i)})$ is the predicted probability for the $i$-th example and $m$ is the number of training examples.
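A minimal NumPy implementation of this cost (the function and variable names are illustrative, not from the notebook):

import numpy as np

def log_loss(y_true, y_prob, eps=1e-12):
    # binary cross-entropy averaged over the training set;
    # clipping guards against log(0)
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([0, 1, 1, 0])          # hypothetical labels
y_prob = np.array([0.1, 0.8, 0.6, 0.3])  # hypothetical predicted probabilities
print(log_loss(y_true, y_prob))          # approx 0.299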
# standard imports for data handling and plotting
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")  # suppress library warnings in the notebook output

# load the Social Network Ads dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
In [3]:
dataset.head()
Out[3]:
[Table: first five rows of the dataset. It contains the User ID, Gender and Purchased columns plus two numeric feature columns (Age and EstimatedSalary in the standard version of this dataset).]
# keep the two numeric features; drop the target and the identifier/categorical columns
X = dataset.drop(['Purchased', 'User ID', 'Gender'], axis=1)
y = dataset['Purchased']
In [5]:
X.shape, y.shape
Out[5]:
Splitting the dataset into the Training set and Test set
In [6]:
# reconstructed cell (the original body is not preserved): a standard
# train/test split, assuming 75/25 (the test-set output below has 100 rows)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
Feature Scaling
In [7]:
# reconstructed cell: standardise the features so both are on a comparable scale
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
In [8]:
# reconstructed cell: fit the classifier on the training set
# (the default constructor matches the Out[8] repr below)
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
Out[8]:
LogisticRegression()
In [9]:
# predict class labels for the test set
y_pred = classifier.predict(X_test)
In [10]:
# reconstructed cell: compare the predictions with the actual labels
# (the column names and their order are assumptions)
pd.DataFrame({'Actual': y_test.values, 'Predicted': y_pred})
Out[10]:
    Actual  Predicted
0        0          0
1        0          0
2        0          0
3        0          0
4        0          0
..     ...        ...
95       1          0
96       0          0
97       1          0
98       1          1
99       1          1

[100 rows x 2 columns]
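The row-by-row comparison can be summarised with a single accuracy figure; a short sketch (not part of the original excerpt):

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))  # fraction of test rows where prediction matches actual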
[Output figures: scatter-plot visualisations of the classification results. Matplotlib warned repeatedly that a single RGB/RGBA sequence was passed via the 'c' argument, and that a 2-D array with a single row should be used to give all points the same colour.]
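Those warnings come from scatter calls that pass a single RGB tuple as 'c'. A minimal sketch of a decision-boundary plot for the test set that avoids the issue (the colours, title and axis labels are illustrative assumptions):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

X_set, y_set = X_test, y_test.values  # scaled features and true labels
x1, x2 = np.meshgrid(
    np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 0.01),
    np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, 0.01))
# colour the plane by the class predicted at each grid point
plt.contourf(x1, x2,
             classifier.predict(np.c_[x1.ravel(), x2.ravel()]).reshape(x1.shape),
             alpha=0.4, cmap=ListedColormap(('salmon', 'lightgreen')))
for label, colour in ((0, 'red'), (1, 'green')):
    # named colours avoid the RGB-tuple warning quoted above
    plt.scatter(X_set[y_set == label, 0], X_set[y_set == label, 1],
                color=colour, label=label, edgecolors='black')
plt.title('Logistic Regression (Test set)')
plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')
plt.legend()
plt.show()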