Logistic Regression

Logistic regression is a machine learning classification algorithm that predicts the probability of categorical dependent variables based on independent variables. It uses a logistic function to map predictions between 0 and 1, like an S-curve. Logistic regression is similar to linear regression but is used for classification instead of regression. It estimates probabilities instead of providing exact values.

Uploaded by

Bhakti Betageri
Logistic Regression

Anita Shirol
Introduction

• Logistic regression is a supervised machine learning algorithm used mainly for classification tasks, where the goal is to predict the probability that an instance belongs to a given class.
• Although it is a classification algorithm, it is called logistic regression because it takes the output of the linear regression function as its input.
• It uses a sigmoid function to estimate the probability of the given class.
• The difference between linear regression and logistic regression is that linear regression outputs a continuous value that can be anything, while logistic regression predicts the probability that an instance belongs to a given class.
Cont.
• Logistic regression predicts the output of a categorical dependent variable. Therefore, the outcome must be a categorical or discrete value.
• The outcome can be Yes or No, 0 or 1, True or False, etc., but instead of giving the exact value 0 or 1, the model gives probabilistic values that lie between 0 and 1.
• Logistic regression is very similar to linear regression except in how it is used. Linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
• In logistic regression, instead of fitting a regression line, we fit an “S”-shaped logistic function whose output is bounded by two limiting values, 0 and 1.
• The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
• Logistic regression is a significant machine learning algorithm because it can provide probabilities and classify new data using both continuous and discrete data.
• Logistic regression can classify observations using different types of data, and its coefficients help identify the most effective variables for the classification.
Logistic Function (Sigmoid Function)
• The sigmoid function is a mathematical function used to map the predicted values to probabilities.
• It maps any real value into another value in the range between 0 and 1.
• The output of logistic regression must lie between 0 and 1 and cannot go beyond these limits, so it forms an “S”-shaped curve.
• The S-shaped curve is called the sigmoid function or the logistic function.
• In logistic regression, we use the concept of a threshold value to decide between the classes 0 and 1: values above the threshold are assigned to class 1, and values below the threshold are assigned to class 0.
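The sigmoid mapping and the threshold rule described above can be sketched in a few lines. This is a minimal illustration, not code from the slides: the helper names `sigmoid` and `classify` and the default threshold of 0.5 are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    """Map any real value (or array of values) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def classify(z, threshold=0.5):
    """Return 1 where the sigmoid probability reaches the threshold, else 0.

    The 0.5 default is an assumed, commonly used cut-off.
    """
    return (sigmoid(z) >= threshold).astype(int)

# A few raw scores, from strongly negative to strongly positive
z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(z)    # probabilities rising along the S-curve
labels = classify(z)  # hard 0/1 decisions after thresholding

print("probabilities:", probs.round(3))
print("labels:", labels)
```

Raising or lowering `threshold` trades false positives against false negatives, which is why the threshold is usually treated as a tunable choice rather than a fixed part of the model.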
Types
• Binomial: In binomial logistic regression, the dependent variable can take only two possible values, such as 0 or 1, Pass or Fail, etc.
• Multinomial: In multinomial logistic regression, the dependent variable can take three or more possible unordered categories, such as “cat”, “dog”, or “sheep”.
• Ordinal: In ordinal logistic regression, the dependent variable can take three or more possible ordered categories, such as “Low”, “Medium”, or “High”.
Difference between Linear Regression and Logistic Regression
Terminologies
• Independent variables: The input characteristics or predictor factors used to predict the dependent variable.
• Dependent variable: The target variable in a logistic regression model, which we are trying to predict.
• Logistic function: The formula used to represent how the independent and dependent variables relate to one another. The logistic function transforms a linear combination of the input variables into a probability value between 0 and 1, which represents the likelihood that the dependent variable is 1.
• Odds: The ratio of something occurring to something not occurring. It differs from probability, which is the ratio of something occurring to everything that could possibly occur.
• Log-odds: The log-odds, also known as the logit function, is the natural logarithm of the odds. In logistic regression, the log-odds of the dependent variable are modeled as a linear combination of the independent variables and the intercept.
• Coefficient: The estimated parameters of the logistic regression model, which show how the independent and dependent variables relate to one another.
• Intercept: A constant term in the logistic regression model, which represents the log-odds when all independent variables are equal to zero.
• Maximum likelihood estimation: The method used to estimate the coefficients of the logistic regression model, which maximizes the likelihood of observing the data given the model.
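To make the relationship between probability, odds, and log-odds concrete, here is a small sketch. The function names are hypothetical and chosen only for this illustration; the arithmetic follows directly from the definitions above.

```python
import math

def odds(p):
    """Odds: probability of occurring divided by probability of not occurring."""
    return p / (1.0 - p)

def log_odds(p):
    """Log-odds (logit): natural logarithm of the odds."""
    return math.log(odds(p))

def probability_from_log_odds(z):
    """Invert the logit with the logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8
print("odds:", odds(p))          # about 4, i.e. 4 to 1 in favour
print("log-odds:", log_odds(p))  # positive, because p > 0.5
print("back to p:", probability_from_log_odds(log_odds(p)))
```

A probability of 0.5 corresponds to odds of 1 and log-odds of 0, which is why a model whose linear part is zero predicts a 50% chance for each class.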
How does Logistic Regression work?

• The logistic regression model transforms the continuous output of the linear regression function into a categorical output using a sigmoid function, which maps any real-valued combination of the independent variables into a value between 0 and 1. This function is known as the logistic function.
• Let the independent input features be:
Sigmoid
Equation
• Odds of occurrence
Likelihood function for Logistic Regression
Gradient of the log-likelihood function
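The formulas that belonged to the slides above did not survive in this text. The block below gives the standard textbook formulation for binary logistic regression, as an assumption about what those slides showed. The notation (n observations, m features, weights w, bias b, linear score z) is chosen for illustration and is not taken from the original.

```latex
% Assumed notation: n observations, m features, weight vector w, bias b.

% Independent input features and binary targets
X = \begin{bmatrix}
      x_{11} & \cdots & x_{1m} \\
      \vdots & \ddots & \vdots \\
      x_{n1} & \cdots & x_{nm}
    \end{bmatrix},
\qquad y_i \in \{0, 1\}

% Linear score for one observation x
z = w \cdot x + b

% Sigmoid and predicted probability of class 1
\sigma(z) = \frac{1}{1 + e^{-z}},
\qquad p(x) = P(y = 1 \mid x) = \sigma(w \cdot x + b)

% Odds of occurrence and log-odds
\frac{p(x)}{1 - p(x)} = e^{z},
\qquad \log \frac{p(x)}{1 - p(x)} = w \cdot x + b

% Likelihood and log-likelihood
L(w, b) = \prod_{i=1}^{n} p(x_i)^{y_i} \, \bigl(1 - p(x_i)\bigr)^{1 - y_i}

\log L(w, b) = \sum_{i=1}^{n} \Bigl[ y_i \log p(x_i) + (1 - y_i) \log \bigl(1 - p(x_i)\bigr) \Bigr]

% Gradient of the log-likelihood with respect to weight w_j
\frac{\partial \log L}{\partial w_j} = \sum_{i=1}^{n} \bigl( y_i - p(x_i) \bigr) \, x_{ij}
```

The gradient has a simple reading: each weight moves in proportion to the prediction error, y_i minus p(x_i), scaled by the corresponding feature value.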
Assumptions for Logistic Regression

• The assumptions for logistic regression are as follows:
• Independent observations: Each observation is independent of the others. In addition, the input variables should not be highly correlated with one another (little or no multicollinearity).
• Binary dependent variable: Binary logistic regression assumes that the dependent variable is binary or dichotomous, meaning it can take only two values. For more than two categories, the softmax function is used instead.
• Linear relationship between independent variables and log-odds: The relationship between the independent variables and the log-odds of the dependent variable should be linear.
• No outliers: There should be no extreme outliers in the dataset.
• Large sample size: The sample size should be sufficiently large.
Binomial

Code
# import the necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# load the breast cancer dataset
X, y = load_breast_cancer(return_X_y=True)

# split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=23
)

# fit the logistic regression model; max_iter is raised because the
# default of 100 iterations may not converge on these unscaled features
clf = LogisticRegression(max_iter=10000, random_state=0)
clf.fit(X_train, y_train)

# predict class labels for the test set
y_pred = clf.predict(X_test)

# fraction of test samples classified correctly
acc = accuracy_score(y_test, y_pred)
print("Logistic Regression model accuracy (in %):", acc * 100)
Multi class
from sklearn.model_selection import train_test_split
from sklearn import datasets, linear_model, metrics

# load the digits dataset (10 classes, one per digit)
digits = datasets.load_digits()

# define the feature matrix (X) and response vector (y)
X = digits.data
y = digits.target

# split X and y into training and testing sets (60% / 40%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1
)

# create the logistic regression object; max_iter is raised because the
# default of 100 iterations may not converge on this data
reg = linear_model.LogisticRegression(max_iter=10000)

# train the model using the training set
reg.fit(X_train, y_train)

# make predictions on the testing set
y_pred = reg.predict(X_test)

# compare the predicted labels with the true labels
print("Logistic Regression model accuracy (in %):",
      metrics.accuracy_score(y_test, y_pred) * 100)
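The multi-class example returns hard class labels, but the model also exposes a probability for every class. The sketch below, which is an addition and not part of the original slides, uses scikit-learn's `predict_proba` to show that each row of probabilities sums to 1 and that the predicted label is the class with the highest probability. The variable names and the `max_iter=10000` setting are assumptions made for the example.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same data and split as the multi-class example
X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1
)

# max_iter raised (assumed value) so the solver has room to converge
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# One row per test sample, one column per class (digits 0 to 9)
proba = model.predict_proba(X_test)

# The predicted label is the class whose probability is largest
pred_from_proba = model.classes_[np.argmax(proba, axis=1)]

print("shape of probability matrix:", proba.shape)
print("row sums (first 3):", proba[:3].sum(axis=1))
print("matches predict():", np.array_equal(pred_from_proba, model.predict(X_test)))
```

Inspecting the full probability row is useful when a prediction is uncertain, for example when two digits receive similar probabilities.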
Cost function
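The content of this slide did not survive in the text. Assuming it refers to the usual log loss (binary cross-entropy), which is the negative average log-likelihood, the sketch below computes that cost and minimizes it with plain gradient descent on synthetic data. All names, the learning rate, the iteration count, and the toy dataset are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss_cost(w, b, X, y, eps=1e-12):
    """Average binary cross-entropy of the model (w, b) on data (X, y)."""
    # Clip probabilities so that log(0) never occurs
    p = np.clip(sigmoid(X @ w + b), eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient_descent(X, y, lr=0.1, n_iter=1000):
    """Minimize the log loss with batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    history = []
    for _ in range(n_iter):
        error = sigmoid(X @ w + b) - y    # prediction error per sample
        w -= lr * (X.T @ error) / len(y)  # gradient step for the weights
        b -= lr * error.mean()            # gradient step for the intercept
        history.append(log_loss_cost(w, b, X, y))
    return w, b, history

# Synthetic one-feature dataset: larger x makes class 1 more likely
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)

w, b, history = gradient_descent(X, y)
print("cost after first step:", history[0])
print("cost after last step:", history[-1])
print("learned weight and intercept:", w, b)
```

Minimizing this cost is equivalent to maximizing the likelihood, so gradient descent on the log loss is one way to carry out maximum likelihood estimation; library solvers typically use faster optimizers.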
