13 Logistic Regression Main
Overview of
Logistic Regression
● Extends the idea of linear regression to situations where the outcome variable is categorical.
● Widely used, particularly where a structured model is useful to explain (= profiling) or to predict.
● We focus on binary classification, i.e. Y = 0 or Y = 1.
What is
Logistic Regression
Logistic regression is one of the most popular machine learning algorithms and belongs to
the supervised learning family. It is used to predict a categorical
dependent variable from a given set of independent variables.
Logistic regression predicts the output of a categorical dependent variable, so the outcome must be a categorical or discrete value. It can be
Yes or No, 0 or 1, True or False, etc. Instead of returning exactly 0 or 1, however, the model returns a probability that lies between 0 and 1.
In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic function whose output approaches two limiting values, 0 and 1.
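To make the idea of fitting an "S"-shaped curve concrete, here is a minimal sketch in plain Python. It assumes a single predictor and fits the curve by gradient descent on the log loss. The function names, the learning rate, the number of epochs, and the toy data (hours studied versus pass/fail) are all hypothetical choices made for illustration, not something taken from the slides.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p = sigmoid(b0 + b1 * x) by gradient descent on the average log loss.

    lr and epochs are illustrative defaults, not tuned values.
    """
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors (predicted probability minus actual label).
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Gradients of the average log loss with respect to b0 and b1.
        g0 = sum(errors) / n
        g1 = sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

# Hypothetical toy data: hours studied (x) and whether the student passed (y).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
probs = [sigmoid(b0 + b1 * x) for x in hours]
for x, p in zip(hours, probs):
    print(f"hours={x:.1f}  P(pass)={p:.3f}")
```

Every predicted value is a probability between 0 and 1 rather than a hard 0 or 1, and the probabilities rise along an "S"-shaped curve as the predictor increases. In practice a library implementation would normally be used instead of a hand-written loop.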
Assumption in
Logistic Regression
● The dependent variable must be categorical in nature.
● The independent variables should not exhibit multicollinearity, i.e. they should not be highly correlated with one another.
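One simple way to screen for multicollinearity is to compute the pairwise correlation between predictors. The sketch below is only one possible check, not a method prescribed by the slides. The `pearson` helper and the toy columns are hypothetical, and it only detects pairwise linear relationships.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical predictors: the same height in two units, plus an unrelated age.
height_cm = [150, 160, 170, 180, 190]
height_in = [59.1, 63.0, 66.9, 70.9, 74.8]
age = [41, 23, 35, 52, 29]

r_heights = pearson(height_cm, height_in)  # near 1: redundant predictors
r_age = pearson(height_cm, age)            # near 0: no strong linear relation

print(f"height_cm vs height_in: {r_heights:.3f}")
print(f"height_cm vs age:       {r_age:.3f}")
```

A correlation close to +1 or -1 between two predictors suggests that one of them is redundant and could be dropped before fitting the model.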
Sigmoid
Function
● The sigmoid function is a mathematical function used to map the predicted values to probabilities.
● The output of logistic regression must lie between 0 and 1 and cannot go beyond these limits, so the curve takes an "S" shape.
● Logistic regression uses a threshold value to turn the predicted probability into a class, 0 or 1: values above the threshold are assigned to class 1, and values below it to class 0.
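● The logit can be mapped back to a probability, which in turn can be mapped to a class.
Given the probability of an event, the odds of the event are:
Odds = p / (1 - p), where p = probability of the event
Or, given the odds of an event, the probability of the event can be
computed by:
p = Odds / (1 + Odds)
The logit is the logarithm of the odds:
logit = log(Odds) = log(p / (1 - p))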
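The relationships between probability, odds, logit, and class can be sketched in a few lines of Python. The function names are hypothetical. The default threshold of 0.5, and the choice to assign a value exactly at the threshold to class 1, are assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Map a logit (any real number) back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def odds_from_prob(p):
    """Odds = p / (1 - p), valid for 0 <= p < 1."""
    return p / (1.0 - p)

def prob_from_odds(odds):
    """p = Odds / (1 + Odds)."""
    return odds / (1.0 + odds)

def logit(p):
    """logit = log(Odds), valid for 0 < p < 1."""
    return math.log(odds_from_prob(p))

def classify(p, threshold=0.5):
    """Map a probability to class 1 or 0 using an assumed threshold."""
    return 1 if p >= threshold else 0

p = 0.8
odds = odds_from_prob(p)
z = logit(p)
print(f"p={p}  odds={odds:.3f}  logit={z:.3f}")
print(f"back to p from odds:  {prob_from_odds(odds):.3f}")
print(f"back to p from logit: {sigmoid(z):.3f}")
print(f"class at threshold 0.5: {classify(p)}")
```

The sigmoid is the inverse of the logit, which is why a logit can always be mapped back to a probability and then, through the threshold, to a class.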
Based on the categories of the target variable, logistic regression is classified into three types:
● Binomial: In binomial logistic regression, the dependent variable has only two possible categories.
● Multinomial: In multinomial logistic regression, the dependent variable has three or more possible unordered categories.
● Ordinal: In ordinal logistic regression, the dependent variable has three or more possible ordered categories, such as "First",
"Second", or "Third".
Performance metrics for classification on ML
models
A model should always be evaluated, and there are many metrics available for doing so.
The confusion matrix is a useful tool in machine learning from which recall, precision, and F1-score can be computed.
Let's start with an example of a confusion matrix for binary classification; the same idea extends to more than two classes.
Here we have two classes and four possible outcomes: true positive, true negative, false positive, and false negative.
True positive: cases where we predict positive and the actual result is also positive.
True negative: cases where we predict negative and the actual result is also negative.
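The four cells of a binary confusion matrix can be counted directly from the actual and predicted labels. The sketch below is illustrative: the function name and the toy labels are hypothetical. It also counts false positives (predicted positive, actually negative) and false negatives (predicted negative, actually positive).

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP, and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1   # predicted negative, actually negative
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        else:
            fn += 1   # predicted negative, actually positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Hypothetical labels for eight cases.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```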
Recall: the number of true positives divided by the sum of true positives and false negatives.
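As a small worked example, recall can be computed from the two counts it depends on. The function name and the numbers are hypothetical. Returning 0.0 when there are no actual positives is an assumed convention; it avoids a division by zero.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN): the share of actual positives that were found."""
    if tp + fn == 0:
        return 0.0  # assumed convention when there are no actual positives
    return tp / (tp + fn)

# Hypothetical counts: 3 positives found, 1 positive missed.
print(recall(tp=3, fn=1))  # 0.75
```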
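F-score: The F-score, also called the F1-score, is a measure of a model’s accuracy on a dataset that combines precision and recall into a single number (their harmonic mean). It is used to evaluate binary classification models.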
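The F1-score can be sketched from the same confusion-matrix counts. The slides do not spell out the formulas, so the standard definitions are assumed here: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 * precision * recall / (precision + recall). Function names and counts are hypothetical.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP): the share of predicted positives that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Recall = TP / (TP + FN): the share of actual positives that were found."""
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    if p + r == 0:
        return 0.0  # assumed convention when both precision and recall are 0
    return 2 * p * r / (p + r)

# Hypothetical counts: 6 true positives, 2 false positives, 4 false negatives.
tp, fp, fn = 6, 2, 4
print(f"precision={precision(tp, fp):.3f}")
print(f"recall={recall(tp, fn):.3f}")
print(f"F1={f1_score(tp, fp, fn):.3f}")
```

Because it is a harmonic mean, the F1-score stays low unless both precision and recall are reasonably high.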