Logistic Regression

Logistic regression is a classification algorithm that can be used for binary and multiclass classification problems. It is similar to linear regression but uses a logistic function to map output values between 0 and 1 instead of any real number. The logistic function, also known as the sigmoid function, ensures the output of the logistic regression model is a probability between 0 and 1. Logistic regression defines a decision boundary where probabilities greater than 0.5 are classified as true and less than 0.5 as false.
Logistic regression, despite its name, is a classification model rather than a regression model. It is a simple and efficient method for binary and linear classification problems, is easy to implement, and achieves very good performance with linearly separable classes. It is an extensively employed algorithm for classification in industry.
What is a sigmoid function?
The logistic function used in logistic regression is a type of sigmoid, a class of functions that share the same characteristic S shape.
A sigmoid is a mathematical function that takes any real number and maps it to a probability between 0 and 1.
The formula of the sigmoid function is:

sigmoid(x) = 1 / (1 + e^(-x))

The sigmoid function forms an S-shaped graph: as x approaches infinity, the probability approaches 1, and as x approaches negative infinity, the probability approaches 0. The model sets a threshold that decides what range of probabilities is mapped to which binary outcome.
Suppose we have two possible outcomes, true and false, and have set the threshold as 0.5. A
probability less than 0.5 would be mapped to the outcome false, and a probability greater than or
equal to 0.5 would be mapped to the outcome true.
Example
Suppose a logistic regression model is fit on some training data to obtain the coefficient vector β, and x represents the input features. The predicted probability is then:

p = sigmoid(βᵀx) = 1 / (1 + e^(-βᵀx))
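As a minimal sketch of that computation (the β values and the input below are hypothetical, chosen only to illustrate the mechanics):

```python
import numpy as np

def sigmoid(x):
    # Map any real number to the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical fitted coefficients beta and one input example x
# (the values are made up purely for illustration)
beta = np.array([0.5, -1.2, 2.0])
x = np.array([1.0, 0.3, 0.8])   # x[0] = 1 acts as the intercept term

p = sigmoid(beta @ x)           # predicted probability of the positive class
label = p >= 0.5                # apply the 0.5 decision threshold
```

Here βᵀx = 1.74, so the probability lands above 0.5 and the example is classified as true.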
Significance of the sigmoid function in logistic regression
The sigmoid function is what lets logistic regression, a technique built on regression machinery, solve classification problems. Although it is a classification model, it is termed "regression" because the fundamental techniques are similar to linear regression.
Binary classification problems, such as whether a tumour is malignant or not or whether an email is spam or not, and multiclass classification problems, such as classifying a random fruit as an apple, mango or orange, can all be solved using logistic regression.
In linear regression problems we can simply predict a non-discrete outcome (y ∈ ℝ) for an example by fitting the best line to the data, given a hypothesis h that maps x to y.
In logistic regression, by contrast, the outcome always belongs to a set Y consisting of discrete values (say 0 or 1 only).
A few concepts you should know before diving deeper
Hypothesis:
A hypothesis is an instance or model that maps inputs to outputs and can be evaluated and used to make predictions.
For example, for univariate linear regression the hypothesis can be given as hθ(x) = θ0 + θ1·x, where θ0 and θ1 are the parameters.
If we generalize this hypothesis to n features, we get:

hθ(x) = θ0 + θ1·x1 + θ2·x2 + ... + θn·xn

and on vectorizing the above equation (with x0 = 1) we get:

hθ(x) = θᵀx

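The vectorized form is just a single dot product. A minimal sketch with hypothetical parameter values:

```python
import numpy as np

# Hypothetical parameters theta0..theta2 and one example x,
# where x[0] = 1 stands in for the intercept term
theta = np.array([2.0, 0.5, -1.0])
x = np.array([1.0, 4.0, 3.0])

h = theta @ x    # h_theta(x) = theta^T x, the vectorized hypothesis
```

This computes 2.0·1 + 0.5·4 + (-1.0)·3 = 1.0 in one step instead of summing terms by hand.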
We then define a cost function, say J(θ), that measures how much our model's predictions deviate from the training data. In our attempt to minimize this cost function J(θ), we use algorithms such as gradient descent, or more complex algorithms such as conjugate gradient, BFGS and L-BFGS.
Gradient descent
Gradient descent is an iterative algorithm used to minimize the cost function J(θ).
A simple gradient descent algorithm for univariate linear regression can be given as:

repeat until convergence {
    θj := θj − α · (partial derivative of J(θ) with respect to θj), for j = 0, 1
    (update θ0 and θ1 simultaneously)
}
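The update rule above can be sketched in code. This is a minimal illustration for univariate linear regression with a mean-squared-error cost; the learning rate, iteration count, and toy data are assumptions chosen so the loop converges:

```python
import numpy as np

def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    # Fit h(x) = theta0 + theta1 * x by minimizing
    # J(theta) = (1 / 2m) * sum((h(x_i) - y_i)^2)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = (theta0 + theta1 * xs) - ys
        grad0 = errors.mean()           # dJ/dtheta0
        grad1 = (errors * xs).mean()    # dJ/dtheta1
        # Simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2.0 * xs + 1.0                     # data generated with theta0=1, theta1=2
theta0, theta1 = gradient_descent(xs, ys)
```

Note that both gradients are computed from the same current θ before either parameter is changed, which is what "update simultaneously" means.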

What if we use linear regression for classification problems?

Let's say we have a binary classification problem, where the outcomes belong to the set {0, 1}.
On using linear regression, we may get hθ(x) > 1 or hθ(x) < 0, which seems absurd if we are solving a binary classification problem.
So we have to restrict our hypothesis to a function that maps x to y such that y lies between 0 and 1. We achieve this restriction using the sigmoid function.
Let's look at what the sigmoid function is and what its nature is.
Sigmoid function
A sigmoid function can be mathematically written as:

g(z) = 1 / (1 + e^(-z))

If we plot this function on a Cartesian plane, we get the following graph:

Figure: Sigmoid curve

The sigmoid curve is asymptotic to 1 and 0. It restricts the outcome value to lie between 0 and 1 no matter what the value of z is.
We see that even though z ranges from −∞ to ∞, the outcome remains in the range (0, 1), with the value 0.5 when z = 0.
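These properties are easy to confirm numerically; a quick sketch:

```python
import math

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))    # exactly 0.5 at z = 0
print(sigmoid(10.0))   # very close to 1 for large positive z
print(sigmoid(-10.0))  # very close to 0 for large negative z
```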
Let's see how we can use this function to adapt linear regression, which is an algorithm used for prediction, so that it can be used for classification problems, and how we can define the decision boundary.
We want our hypothesis to satisfy the condition 0 ≤ hθ(x) ≤ 1. As discussed above, the vectorized general form of the linear regression hypothesis is hθ(x) = θᵀx, and if we apply the sigmoid function to it:

hθ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(-z))

So our hypothesis becomes:

hθ(x) = 1 / (1 + e^(-θᵀx))

We prefer the sigmoid over other functions of similar shape because it has a simple derivative, g′(z) = g(z)(1 − g(z)), and, paired with the log-loss cost, it yields a convex cost function that gradient descent can minimize reliably.
Defining a decision boundary
Now that we have applied the sigmoid function to the linear regression hypothesis and obtained a hypothesis for logistic regression, let's see how the classification is done.
We can predict:

y = 1 if hθ(x) ≥ 0.5
y = 0 if hθ(x) < 0.5
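The full pipeline, hypothesis plus 0.5 threshold, can be sketched as follows; the θ values and the two input rows are hypothetical, picked so that one example falls on each side of the boundary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X, threshold=0.5):
    # h_theta(x) = g(theta^T x) for each row of X, then apply the threshold
    probs = sigmoid(X @ theta)
    return (probs >= threshold).astype(int)

# Hypothetical parameters; the first column of X is the intercept term x0 = 1
theta = np.array([-3.0, 1.0, 1.0])
X = np.array([[1.0, 1.0, 1.0],    # theta^T x = -1 -> probability below 0.5
              [1.0, 2.0, 2.0]])   # theta^T x = +1 -> probability above 0.5

labels = predict(theta, X)
```

The first row gets label 0 and the second gets label 1, matching the rule y = 1 when hθ(x) ≥ 0.5.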
Advantages and disadvantages of logistic regression

Advantages:
- Logistic regression is easy to implement and interpret, and very efficient to train.
- It makes no assumptions about the distributions of classes in feature space.
- It extends easily to multiple classes (multinomial regression) and gives a natural probabilistic view of class predictions.
- It provides not only a measure of how appropriate a predictor is (coefficient size), but also its direction of association (positive or negative).
- It is very fast at classifying unknown records.

Disadvantages:
- If the number of observations is less than the number of features, logistic regression should not be used; otherwise, it may lead to overfitting.
- It constructs linear boundaries.
- The major limitation of logistic regression is the assumption of linearity between the dependent variable and the independent variables.
- It can only be used to predict discrete outcomes, so the dependent variable is bound to a discrete set.
- Non-linear problems can't be solved with logistic regression because it has a linear decision surface, and linearly separable data is rarely found in real-world scenarios.