Dav Exp4 66
THEORY: Logistic regression is a supervised machine learning algorithm used for classification tasks, where the goal is to predict the probability that an instance belongs to a given class. Logistic regression is a statistical method that analyzes the relationship between one or more independent variables and a categorical outcome. This section covers the fundamentals of logistic regression, its types, and its implementation.
Logistic regression is used for binary classification, where the sigmoid function takes the independent variables as input and produces a probability value between 0 and 1.
For example, given two classes, Class 0 and Class 1: if the value of the logistic function for an input is greater than 0.5 (the threshold value), the input belongs to Class 1; otherwise, it belongs to Class 0. It is referred to as regression because it is an extension of linear regression, but it is mainly used for classification problems.
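As a minimal sketch of this decision rule (the weights, bias, and input below are made-up numbers for illustration, not values learned from any dataset), the probability and the resulting class label can be computed as follows:
import numpy as np
def sigmoid(z):
    # logistic (sigmoid) function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))
# hypothetical weights w, bias b and one input x (illustrative values only)
w, b = np.array([0.8, -0.4]), 0.1
x = np.array([1.5, 2.0])
p = sigmoid(np.dot(w, x) + b)   # predicted probability of Class 1
label = 1 if p > 0.5 else 0     # 0.5 is the threshold value
print(p, label)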
Key Points:
Logistic regression predicts the output of a categorical dependent variable. Therefore, the outcome must
be a categorical or discrete value.
It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
In logistic regression, instead of fitting a regression line, we fit an “S”-shaped logistic function, which predicts two maximum values (0 or 1).
The sigmoid function is a mathematical function used to map the predicted values to probabilities.
It maps any real value into another value within the range 0 to 1. The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms an “S”-shaped curve (a small numeric sketch follows these key points).
This S-shaped curve is called the sigmoid function or the logistic function.
In logistic regression, we use the concept of a threshold value, which defines the boundary between predicting 0 and predicting 1: values above the threshold tend toward 1, and values below the threshold tend toward 0.
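A quick numeric check, again only a sketch, shows how the sigmoid squashes values from anywhere on the real line into the range 0 to 1, which is what produces the “S”-shaped curve and makes a threshold such as 0.5 usable:
import numpy as np
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
# sample inputs spread across the real line
for z in [-10, -2, 0, 2, 10]:
    print(z, round(float(sigmoid(z)), 4))
# the output stays close to 0 for large negative z, is exactly 0.5 at z = 0,
# and approaches 1 for large positive z; it never leaves the range (0, 1)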
On the basis of the categories of the dependent variable, Logistic Regression can be classified into three types (a short sketch of the first two follows this list):
Binomial: In binomial Logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as “cat”, “dog”, or “sheep”.
Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as “Low”, “Medium”, or “High”.
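As a small illustrative sketch of the first two types using scikit-learn's LogisticRegression (the tiny arrays below are invented purely for illustration, and ordinal logistic regression typically requires a separate library):
import numpy as np
from sklearn.linear_model import LogisticRegression
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])   # one toy feature
# binomial: exactly two possible classes (0 / 1)
y_binary = np.array([0, 0, 0, 1, 1, 1])
print(LogisticRegression().fit(X, y_binary).predict([[3.5]]))
# multinomial: three or more unordered classes; handled by the same estimator
y_multi = np.array([0, 0, 1, 1, 2, 2])
clf = LogisticRegression().fit(X, y_multi)
print(clf.predict([[2.5]]), clf.predict_proba([[2.5]]))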
CODE:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
print(X_train)
print(y_train)
print(X_test)
print(y_test)
# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print(X_train)
print(X_test)
# Training the Logistic Regression model on the Training set
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results (predicted labels shown next to the true labels)
y_pred = classifier.predict(X_test)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

# Making the Confusion Matrix and computing the accuracy
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))
# Visualising the Training set results (features shown on their original scale)
X_set, y_set = sc.inverse_transform(X_train), y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 250))
# colour each grid point by the class the classifier predicts for it
# (a coarser step on the salary axis keeps the grid size manageable)
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color = ('red', 'green')[i], label = j)
plt.title('Logistic Regression (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
# Visualising the Test set results
X_set, y_set = sc.inverse_transform(X_test), y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 250))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color = ('red', 'green')[i], label = j)
plt.title('Logistic Regression (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()