Logistic Regression


LOGISTIC REGRESSION.

Logistic regression predicts a categorical dependent variable from a given set of independent variables.
Logistic regression is a supervised machine learning algorithm mainly used for classification tasks,
where the goal is to predict the probability that an instance belongs to a given class.
The outcome must therefore be a categorical or discrete value, such as Yes or No, 0 or 1, True or
False. Instead of giving the exact value 0 or 1, however, the model gives a probability that lies
between 0 and 1.
Logistic regression is named for the function at the core of the method, the logistic function, also
called the sigmoid function. It is an S-shaped curve that maps any real-valued number to a value
between 0 and 1, without ever exactly reaching those limits.
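As a small illustration of the sigmoid function described above, the following sketch evaluates it at a few points. The helper name `sigmoid` and the sample inputs are chosen here for illustration only and are not part of the original example code.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# A few illustrative inputs (assumed values, not taken from the dataset).
z_values = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
probabilities = sigmoid(z_values)

for z, p in zip(z_values, probabilities):
    print(f"z = {z:+.1f} -> sigmoid(z) = {p:.4f}")

# sigmoid(0) is exactly 0.5, the midpoint of the S-shaped curve.
print(sigmoid(0.0))  # 0.5
```

Large negative inputs give values close to 0 and large positive inputs give values close to 1, which is why the output can be read as a probability.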

EXAMPLE CODE.
This code uses the diabetes dataset attached to this file.
1. Importing Libraries.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
- Imports necessary libraries for data manipulation, visualization, logistic regression, and model
evaluation.

2. Loading and Exploring Data.


data = pd.read_csv('diabetes.csv')
data.head()
- Loads the diabetes dataset from a CSV file into a Pandas DataFrame and displays the first few
rows.

3. Transposing Data for Better Visualization.


data.head().transpose()
- Transposes the first five rows so that each column is shown as a row, which is easier to read when
the dataset has many columns.

4. Descriptive Statistics of the Data.


data.describe()
- Provides descriptive statistics of the numerical columns in the dataset, including count, mean, std
(standard deviation), min, 25th percentile (Q1), median (50th percentile or Q2), 75th percentile (Q3),
and max.

5. Feature and Target Separation.


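X = data.drop("Outcome", axis=1)
y = data["Outcome"]
- Separates the features (`X`) from the target variable (`y`). The target is selected as a
one-dimensional Series, which is the shape scikit-learn expects for `y`.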

6. Train-Test Split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=7)
- Splits the data into training and testing sets using `train_test_split`. 30% of the data is reserved for
testing, and `random_state` ensures reproducibility.
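The effect of `random_state` can be checked with a minimal sketch like the one below. It uses a small made-up array rather than the diabetes data, and the names `X_demo` and `y_demo` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Tiny made-up dataset: 10 samples, 2 features (for illustration only).
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

# Two splits with the same random_state...
split_a = train_test_split(X_demo, y_demo, test_size=0.30, random_state=7)
split_b = train_test_split(X_demo, y_demo, test_size=0.30, random_state=7)

X_train_a, X_test_a, y_train_a, y_test_a = split_a
print(X_train_a.shape, X_test_a.shape)

# ...produce identical train and test sets.
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same_split)
```

Without a fixed `random_state`, the rows are shuffled differently on each run, so the reported accuracy can change from run to run.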

7. Logistic Regression Model Training.


model = LogisticRegression()
model.fit(X_train, y_train)
- Initializes a logistic regression model and trains it on the training data.

8. Model Prediction and Evaluation.
y_predict = model.predict(X_test)
model_score = model.score(X_test, y_test)
- Predicts the target variable on the test set and calculates the accuracy score of the model on the test
set.
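Since logistic regression produces probabilities rather than hard labels, it can help to see how the two relate. The sketch below is an illustration on synthetic data generated with `make_classification` (an assumption made here so the block runs without `diabetes.csv`); the variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stand-in for the real dataset).
X_syn, y_syn = make_classification(n_samples=200, n_features=4, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.30, random_state=7
)

clf = LogisticRegression()
clf.fit(X_tr, y_tr)

# predict_proba returns one column per class; each row sums to 1.
proba = clf.predict_proba(X_te)
labels = clf.predict(X_te)

# For a binary problem, predict() corresponds to thresholding the
# class-1 probability at 0.5.
manual_labels = (proba[:, 1] > 0.5).astype(int)

# score() is the fraction of test samples predicted correctly (accuracy).
accuracy = clf.score(X_te, y_te)
print(proba[:5])
print(labels[:5], manual_labels[:5])
print(accuracy)
```

In the example above, `model.predict_proba(X_test)` could be used in the same way to inspect the predicted probabilities behind `y_predict`.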

9. Printing Model Evaluation Metrics.


print(model_score)
print(metrics.confusion_matrix(y_test, y_predict))
- Prints the accuracy score and the confusion matrix. The confusion matrix is a table that describes
the performance of a classification model, showing the number of true positives, true negatives, false
positives, and false negatives.
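To make the layout of the confusion matrix concrete, here is a small worked sketch with hand-written labels (made up for illustration, not taken from the diabetes data). For labels 0 and 1, scikit-learn puts the true classes in rows and the predicted classes in columns.

```python
from sklearn import metrics

# Hand-written true and predicted labels (illustrative values only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = metrics.confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]
#  [1 3]]

# Layout for binary labels 0/1: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = cm.ravel()

# Accuracy is the share of correct predictions: (TP + TN) / total.
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.75
```

Here 3 positives and 3 negatives are classified correctly, with 1 false positive and 1 false negative, giving an accuracy of 6 out of 8.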
