Question 7 - Jupyter Notebook

The document outlines the creation of a machine learning model using Sklearn's Diabetes dataset, detailing the first five steps of the data science life cycle. It includes importing libraries, loading and exploring the dataset, preprocessing the data, building and training a logistic regression model, and evaluating its performance. Key metrics such as accuracy, classification report, confusion matrix, and ROC curve are also generated to assess the model's effectiveness.

Uploaded by paulbogale


question 7 - Jupyter Notebook http://localhost:8888/notebooks/OneDrive/Desktop/Folders/MU/Sem-7...

Prince Lunia, Enroll: 92000103170, class: 7TC1-C

Create a full ML model for Sklearn’s Diabetes dataset. Load the dataset from sklearn itself.
a. Perform the first 5 data science life cycle steps for this model.
b. Write down information surmised from each code snippet in the markdown cell below each code cell.
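
One point worth checking against the task statement: scikit-learn's bundled `load_diabetes` is a regression dataset with a continuous target, whereas the outputs later in this notebook (768 rows and a 0/1 `Outcome` column) appear to match the Pima Indians Diabetes CSV instead. The sketch below only inspects the scikit-learn version so the two can be compared; the names `bunch`, `X_sk` and `y_sk` are illustrative and do not come from the notebook.

```python
from sklearn.datasets import load_diabetes

# Load scikit-learn's bundled diabetes data as pandas objects
bunch = load_diabetes(as_frame=True)
X_sk, y_sk = bunch.data, bunch.target

print(X_sk.shape)          # rows x feature columns
print(list(X_sk.columns))  # feature names
print(y_sk.describe())     # continuous target, not a 0/1 label
```

If the 0/1 `Outcome` target is what the exercise needs, a classifier such as logistic regression fits the CSV data shown below; the scikit-learn dataset would call for a regression model instead.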

Step 1: Import Libraries

In [1]: import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

Step 2: Load and Explore the Dataset

In [2]: #Data Collection

In [3]:

Out[3]:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction
0            6      148             72             35        0  33.6                     0.627
1            1       85             66             29        0  26.6                     0.351
2            8      183             64              0        0  23.3                     0.672
3            1       89             66             23       94  28.1                     0.167
4            0      137             40             35      168  43.1                     2.288
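
The code for cells In [2] to In [5] did not survive in this printout, so the following is only a sketch of how such a load-and-explore step could look. It rebuilds the five rows displayed above from an in-memory CSV string; the real file name is unknown, `df_preview` and `csv_text` are hypothetical names, and the `Age` and `Outcome` columns are omitted because they are cut off in the displayed output.

```python
import io
import pandas as pd

# The five rows shown in Out[3]; Age and Outcome are cut off there, so they are omitted here
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction
6,148,72,35,0,33.6,0.627
1,85,66,29,0,26.6,0.351
8,183,64,0,0,23.3,0.672
1,89,66,23,94,28.1,0.167
0,137,40,35,168,43.1,2.288
"""

# In the notebook this would be a path to the real CSV; StringIO is a stand-in (assumption)
df_preview = pd.read_csv(io.StringIO(csv_text))

print(df_preview.head())  # first rows of the frame
df_preview.info()         # column dtypes and non-null counts
print(df_preview.shape)   # (rows, columns)
```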

1 of 4 10/23/2023, 1:21 AM

In [4]:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Pregnancies               768 non-null    int64
 1   Glucose                   768 non-null    int64
 2   BloodPressure             768 non-null    int64
 3   SkinThickness             768 non-null    int64
 4   Insulin                   768 non-null    int64
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64
 8   Outcome                   768 non-null    int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB

In [5]:

Step 3: Preprocess the Data

In [6]: # Split the data into features (X) and target (y)
X = df.drop(columns='Outcome')
y = df['Outcome']

# Split the data into training and testing sets
# (the random_state value is cut off in the source printout and is left elided)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=...)

# Scale the features using StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
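
The preprocessing cell above can be sketched on synthetic data to show two things: an 80/20 split of 768 rows yields 614 training and 154 test rows (matching the support of 154 reported later), and the scaler learns its mean and standard deviation from the training rows only. Everything here is an assumption for illustration: the random data, the `_demo` names, and in particular `random_state=0`, since the notebook's own seed is not legible.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
X_demo = rng.normal(loc=50, scale=10, size=(768, 8))  # synthetic stand-in: 768 rows, 8 features
y_demo = rng.integers(0, 2, size=768)                 # synthetic 0/1 labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0  # random_state=0 is an assumption
)
print(X_tr.shape, X_te.shape)

scaler_demo = StandardScaler()
X_tr_s = scaler_demo.fit_transform(X_tr)  # learn mean/std from the training rows only
X_te_s = scaler_demo.transform(X_te)      # reuse those statistics on the test rows

print(X_tr_s.mean(axis=0).round(3))  # zero by construction
print(X_te_s.mean(axis=0).round(3))  # close to zero, but not exactly
```

Fitting the scaler on the training split alone keeps information about the test rows out of the model, which is why the test set is passed through `transform` rather than `fit_transform`.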

Step 4: Build and Train the Model

In [7]: # Create a Logistic Regression model

model = LogisticRegression()

# Train the model on the training data

model.fit(X_train, y_train)
Out[7]: LogisticRegression()
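
To see what `fit` leaves behind on the estimator, the sketch below trains the same kind of model on synthetic data and prints its learned attributes. The data from `make_classification` and the names `X_syn`, `y_syn` and `clf` are illustrative assumptions, not the notebook's own variables.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the scaled training data (8 features, binary target)
X_syn, y_syn = make_classification(n_samples=200, n_features=8, random_state=0)

clf = LogisticRegression()
clf.fit(X_syn, y_syn)

print(clf.classes_)     # class labels seen during fit
print(clf.coef_.shape)  # one weight per feature in the binary case
print(clf.intercept_)   # bias term
print(clf.predict_proba(X_syn[:3]))  # per-class probabilities; each row sums to 1
```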

Step 5: Evaluate the Model


In [8]: # Make predictions on the test data

y_pred = model.predict(X_test)

In [9]: # Calculate accuracy

In [10]:

0.7532467532467533

In [11]: # Generate a classification report

In [12]:

              precision    recall  f1-score   support

           0       0.81      0.80      0.81        99
           1       0.65      0.67      0.66        55

    accuracy                           0.75       154
   macro avg       0.73      0.74      0.73       154
weighted avg       0.76      0.75      0.75       154
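
The figures in this report can be cross-checked by hand against the confusion matrix printed in this notebook, [[79 20] [18 37]]. The sketch below is plain Python arithmetic, with `prf` as a hypothetical helper name; it reads the matrix as rows = true class and columns = predicted class, which is scikit-learn's convention.

```python
# Counts taken from the confusion matrix reported in this notebook: [[79, 20], [18, 37]]
tn, fp, fn, tp = 79, 20, 18, 37

def prf(true_pos, false_pos, false_neg):
    """Return (precision, recall, f1) for one class from its raw counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Class 1 treats label 1 as "positive"; for class 0 the roles of the counts swap
p1, r1, f1_1 = prf(tp, fp, fn)
p0, r0, f1_0 = prf(tn, fn, fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"class 0: precision={p0:.2f} recall={r0:.2f} f1={f1_0:.2f}")
print(f"class 1: precision={p1:.2f} recall={r1:.2f} f1={f1_1:.2f}")
print(f"accuracy={accuracy:.4f}")
```

Recall for class 1 is the weakest figure in the report: 18 of the 55 positive cases in the test set are predicted as negative.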

In [13]: # Generate a confusion matrix

In [14]:

[[79 20]
[18 37]]
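
The code for this cell and for the accuracy cell is missing from the printout. As a hedged sketch of what such calls might look like, the example below rebuilds label arrays that reproduce the reported counts and passes them to `confusion_matrix` and `accuracy_score`. The arrays `y_true` and `y_hat` are reconstructions for illustration; the row order of the real test set is unknown.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Label arrays that reproduce the reported counts (the ordering is arbitrary)
y_true = np.array([0] * 99 + [1] * 55)
y_hat = np.array([0] * 79 + [1] * 20 + [0] * 18 + [1] * 37)

cm = confusion_matrix(y_true, y_hat)  # rows = true class, columns = predicted class
tn, fp, fn, tp = cm.ravel()           # flattening order for the binary case

print(cm)
print(tn, fp, fn, tp)
print(accuracy_score(y_true, y_hat))
```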


In [15]: # Calculate ROC AUC

y_pred_prob = model.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, y_pred_prob)

# Plot ROC curve

fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
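
The cell above computes `roc_auc` but never displays it, and the plot has no reference line. One possible variant, sketched on a tiny hand-made example rather than the notebook's data, puts the AUC in the legend and adds the chance diagonal. The `_toy` names and values are assumptions for illustration, and the `Agg` backend line is only there so the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; not needed in Jupyter
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

# Tiny hand-made example (not the notebook's data)
y_toy = [0, 0, 1, 1]
scores_toy = [0.1, 0.4, 0.35, 0.8]

fpr_toy, tpr_toy, _ = roc_curve(y_toy, scores_toy)
auc_toy = roc_auc_score(y_toy, scores_toy)

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(fpr_toy, tpr_toy, label=f"ROC (AUC = {auc_toy:.2f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="Chance")  # a random classifier follows this line
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.set_title("ROC Curve")
ax.legend()
plt.close(fig)  # in a notebook, plt.show() would display the figure instead
```

In this toy example three of the four positive/negative score pairs are ranked correctly, so the AUC is 0.75.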

