A Study on Predictive Analysis of Diabetes in Pima
Indian Women Using Python
CONTENTS
S No.  Topic
1      Introduction
2      Introduction – Python
3      Literature review
4      Methodology
5      Conclusion
6      Learning outcomes
7      Bibliography
INTRODUCTION
About The Data Set
This dataset comes from the National Institute of Diabetes and Digestive and Kidney Diseases. Its
purpose is to predict, from diagnostic measurements, whether a patient has diabetes.
The data includes only female patients who are at least 21 years old and of Pima Indian heritage.
The dataset contains eight independent variables (Pregnancies, Glucose, BloodPressure,
SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, and Age) and one dependent variable
called Outcome. The Outcome variable shows whether the patient has diabetes
(1 for Yes, 0 for No).
INTRODUCTION – PYTHON
1. PYTHON
Python is a widely used, general-purpose, high-level programming language. It
was created by Guido van Rossum, first released in 1991, and is now developed
under the stewardship of the Python Software Foundation. It was designed with an
emphasis on code readability, and its syntax allows programmers to express
concepts in fewer lines of code than many other languages. Python lets you work
quickly and integrate systems effectively.
There are two major Python versions: Python 2 and Python 3. They are not fully
compatible with each other. Python 2 reached its end of life in January 2020, and
Python 3 is the actively maintained version.
Language Features
Interpreted
There are no separate compilation and linking steps as in C and C++; the
program is run directly from the source code.
Internally, Python compiles the source code into an intermediate form called
bytecode, which the Python virtual machine then executes.
There is no need to link and load libraries by hand.
Platform Independent
Python programs can be developed and executed on multiple operating
system platforms.
Python can be used on Linux, Windows, macOS, Solaris and many more.
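The bytecode step described under "Interpreted" can be observed with the standard-library dis module. The sketch below is only an illustration: the function add is a made-up example, and the exact instructions printed differ between Python versions.

```python
import dis


def add(a, b):
    """Toy function (hypothetical), used only to inspect its bytecode."""
    return a + b


# CPython compiles the function body to a code object when the function is defined.
code_obj = add.__code__
print(type(code_obj.co_code))  # the raw bytecode is stored as a bytes object

# Print a human-readable listing of the bytecode instructions.
dis.dis(add)
```

The listing shows that the source is not executed line by line as text; the interpreter runs the compiled instructions.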
2. ANACONDA
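Anaconda is a distribution of Python (and R) aimed at data science and scientific
computing. It bundles the conda package and environment manager together with
many commonly used libraries, including NumPy, Pandas, Matplotlib and
Scikit-Learn, as well as Jupyter Notebook. It should not be confused with the
Anaconda installer used by Fedora and Red Hat Enterprise Linux, which is an
unrelated program.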
What is NumPy?
NumPy stands for Numerical Python. NumPy is a Python library used for working
with arrays. It also has functions for linear algebra, Fourier transforms, and
matrices.
NumPy was created in 2005 by Travis Oliphant. It is an open-source project and
you can use it freely.
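As a rough illustration of the array, linear algebra and Fourier transform features mentioned above, here is a minimal sketch. All numbers are made up for the example and are not taken from the project data.

```python
import numpy as np

# A small 2x2 matrix and a vector (made-up values).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Element-wise arithmetic works on the whole array at once.
doubled = A * 2

# Linear algebra: solve A @ x = b for x.
x = np.linalg.solve(A, b)

# Fourier transform of a short signal.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)

print(doubled)
print(x)
print(spectrum)
```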
Pandas
Pandas is an open-source Python library made mainly for working with relational
or labelled data easily and intuitively. It provides data structures (chiefly
Series and DataFrame) and operations for manipulating numerical tables and time
series. The library is built on top of NumPy, which keeps common operations fast.
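A minimal sketch of working with labelled data in Pandas follows. The column names match the diabetes dataset, but the rows in df_demo are hypothetical and serve only to show filtering and grouping.

```python
import pandas as pd

# Hypothetical rows; only the column names follow the diabetes dataset.
df_demo = pd.DataFrame({
    "Glucose": [85, 183, 89, 137],
    "BMI": [26.6, 23.3, 28.1, 43.1],
    "Outcome": [0, 1, 0, 1],
})

# Filter rows with a boolean condition on a labelled column.
high_glucose = df_demo[df_demo["Glucose"] > 100]

# Group by the label column and average each measurement.
means = df_demo.groupby("Outcome").mean()

print(high_glucose)
print(means)
```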
Matplotlib
Matplotlib graphs your data on Figures (e.g., windows, Jupyter widgets, etc.), each
of which can contain one or more Axes, an area where points can be specified in
terms of x-y coordinates (or theta-r in a polar plot, x-y-z in a 3D plot, etc.). The
simplest way of creating a Figure with an Axes is using pyplot.subplots. Axes.plot
then draws data on the Axes.
Depending on your backend, you may have to call plt.show() to get the Figure to
display. The Matplotlib documentation covers creating, viewing and saving Figures
in more detail.
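The pyplot.subplots and Axes.plot calls described above can be sketched as follows. The data points are made up, and the Agg backend is an assumption chosen so that the script also runs on a machine without a display; in a Jupyter notebook that line can be dropped.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

# Create a Figure containing a single Axes.
fig, ax = plt.subplots()

# Draw some made-up data on the Axes and label both axes.
ax.plot([1, 2, 3, 4], [1, 4, 2, 3])
ax.set_xlabel("x")
ax.set_ylabel("y")

# Render the Figure to an in-memory PNG; with an interactive backend,
# plt.show() would display it instead.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```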
Seaborn
Seaborn is a Python data visualization library built on top of Matplotlib. It
provides a high-level interface for drawing statistical graphics and works directly
with Pandas DataFrames. The seaborn documentation offers introductory notes, an
installation guide, an example gallery, tutorials and an API reference.
Scikit-Learn
Scikit-Learn is a popular Python library used for machine learning. It provides tools to
build models that can learn from data and make predictions. With Scikit-Learn, you can do tasks
like:
Classifying data (e.g., predicting if a person has diabetes or not).
Clustering data (e.g., grouping similar items together).
Regression (e.g., predicting a number like house prices).
It also helps with preprocessing data (cleaning or preparing it) and evaluating how well your
model performs.
Plotly
Plotly is a Python library for creating interactive and beautiful visualizations. It allows you to
make:
Line charts, bar charts, and scatter plots.
3D graphs and maps.
Dashboards for presenting data interactively.
What makes Plotly special is that the graphs are interactive, so users can zoom in, hover over
points, or explore data more easily.
Cufflinks
Cufflinks is a library that works with Plotly and Pandas (a library for working with data). It helps
you quickly create interactive Plotly charts directly from your data in Pandas.
For example:
If you have a DataFrame (a table of data), you can make a chart in just one line of code.
It makes plotting faster and easier when you're working with a lot of data.
In short, Cufflinks acts as a bridge between Pandas and Plotly to simplify creating interactive
visualizations.
LITERATURE REVIEW
The analysis of medical data for predicting diseases like diabetes has gained significant
importance in recent years. The dataset used in this study originates from the National
Institute of Diabetes and Digestive and Kidney Diseases. It contains medical diagnostic
information about female patients of Pima Indian heritage, making it a specialized dataset
for diabetes prediction.
The dataset includes various predictor variables such as glucose levels, insulin, BMI, and
skin thickness, which have been identified as key factors in diagnosing diabetes. Studies
have shown that machine learning techniques can effectively analyze such data to predict
health outcomes.
METHODOLOGY
This project uses a structured approach to analyze the Pima Indian Diabetes dataset and predict diabetes
outcomes. The methodology is divided into the following steps:
1. Data Collection
The dataset is sourced from the National Institute of Diabetes and Digestive and Kidney Diseases.
It contains key medical measurements, including glucose levels, insulin, BMI, and skin thickness,
along with the outcome variable (diabetic or not).
2. Data Preprocessing
o The data is cleaned to handle any missing or invalid values.
o Features are standardized or normalized to ensure they are on a similar scale, which helps
in improving model accuracy.
o The dataset is split into training and testing sets to evaluate the model's performance.
4. Model Building
o Scikit-Learn is used to build a machine learning model for predicting diabetes.
o Algorithms such as Logistic Regression, Decision Trees, or Random Forests are applied to
the dataset.
o The model is trained using the training data.
5. Model Evaluation
o The performance of the model is evaluated using metrics like accuracy, precision, recall,
and F1 score.
o The testing dataset is used to validate the model and ensure it generalizes well to unseen
data.
6. Visualization of Results
o The final results, including model performance and important features, are visualized using
Plotly and Cufflinks.
o Interactive charts help communicate findings effectively.
7. Conclusion
Insights gained from the analysis are summarized, highlighting the significant factors contributing
to diabetes prediction and the effectiveness of the model.
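The preprocessing, model building and evaluation steps above can be sketched with Scikit-Learn as follows. This is a minimal sketch under stated assumptions, not the notebook's own code: the data is a synthetic stand-in generated with make_classification (eight numeric features and a binary label), and Logistic Regression is just one of the algorithms named in the list.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 8 numeric features, binary label.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=0)

# Preprocessing: hold out 30% of the rows as a test set, keeping class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Preprocessing + model building: the scaler is fitted on the training data only,
# then a logistic regression classifier is trained on the scaled features.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Model evaluation on unseen data.
y_pred = model.predict(X_test)
scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(scores)
```

Putting the scaler inside the pipeline keeps information from the test set out of the preprocessing step, which matters when the test score is meant to reflect performance on unseen data.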
# Name: Anurag Kandari
# Roll no.: 08
# PGDM-A
# Data handling
import pandas as pd
import numpy as np
# Modelling and evaluation
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Plotting
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset (file name as used in the notebook)
df = pd.read_csv('Diabitiese.csv')
df.head()
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI
1            1       85             66             29        0  26.6
2            8      183             64              0        0  23.3
3            1       89             66             23       94  28.1
df.tail()
767 1 93 70 31 0 30.4
DiabetesPedigreeFunction Age Outcome
764 0.340 27 0
765 0.245 30 0
766 0.349 47 1
767 0.315 23 0
768 0.252 55 1
df.describe()
df.shape
(769, 9)
df.isnull().sum()
Pregnancies 0
Glucose 0
BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64
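isnull().sum() reports no missing values, yet the rows printed earlier contain zeros for SkinThickness and Insulin. In this dataset a zero in such columns is commonly read as "not recorded". The sketch below shows one way to surface and fill those gaps. It rests on assumptions: the rows in demo are made up, the list zero_as_missing is an illustrative choice, and median imputation is one simple option rather than the method used in this notebook.

```python
import pandas as pd

# Made-up rows; only the column names follow the dataset.
demo = pd.DataFrame({
    "Glucose": [85, 183, 0, 137],
    "SkinThickness": [29, 0, 23, 35],
    "Insulin": [0, 0, 94, 168],
    "BMI": [26.6, 23.3, 28.1, 0.0],
})

# Assumption: a zero in these columns means the value was not recorded.
zero_as_missing = ["Glucose", "SkinThickness", "Insulin", "BMI"]

cleaned = demo.copy()
for col in zero_as_missing:
    # mask() replaces entries where the condition holds with NaN.
    cleaned[col] = cleaned[col].mask(cleaned[col] == 0)

# The gaps are now visible to isnull().
print(cleaned.isnull().sum())

# One simple option: fill each gap with the median of its column.
imputed = cleaned.fillna(cleaned.median())
print(imputed)
```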
sns.relplot(x='Pregnancies',y='BloodPressure',data=df)
<seaborn.axisgrid.FacetGrid at 0x14628ebf3b0>
sns.relplot(x='DiabetesPedigreeFunction',y='BloodPressure',data=df)
<seaborn.axisgrid.FacetGrid at 0x14628f7e390>
df.fillna(0, inplace=True)
print(df)
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI
1              1       85             66             29        0  26.6
2              8      183             64              0        0  23.3
3              1       89             66             23       94  28.1
766            1      126             60              0        0  30.1
767            1       93             70             31        0  30.4
train
test
0 0
1 0
2 0
3 94
4 168
...
764 0
765 112
766 0
767 0
768 1
Name: Insulin, Length: 769, dtype: int64
X_train,X_test,y_train,y_test=train_test_split(train,test,test_size=0.3,random_state=2)
X_train,X_test,y_train,y_test
168 0
Name: Insulin, Length: 538, dtype: int64,
231 370
710 387
440 0
662 231
377 75
...
495 0
236 192
130 168
225 32
388 285
Name: Insulin, Length: 231, dtype: int64)
regression = LinearRegression()
regression
LinearRegression()
regression.fit(X_train,y_train)
LinearRegression()
predict = regression.predict(X_test)
predict
64.62105153, 83.60371758, 64.51272396, 65.97834107,
117.10204634, 83.86634143, 99.48385586, 44.17396731,
62.73851429, 50.25848615, 91.93072254, 75.96224368,
74.03553615, 57.04553333, 102.28617605, 88.42887139,
101.07539728, 71.3663263 , 74.5837682 , 96.76927678,
79.18088452, 45.57362872, 97.76901533, 67.89224764,
70.37998818, 71.85458975, 100.03208792, 65.97834107,
61.75277564, 115.86086987, 84.26926166, 95.34603154,
91.04792385, 97.27333855, 76.44392056, 67.13956193,
58.14019902, 43.97669969, 113.58379739, 84.96100712,
72.3290806 , 61.0526452 , 99.06873415, 87.36939153,
83.52376198, 140.14258111, 62.1690963 , 90.98196815,
102.8266226 , 57.37411288, 90.98975366, 84.67659786,
48.55143113, 98.10478091, 92.9522465 , 92.05484843,
95.87187871, 99.08273405, 54.63774839, 99.89299048,
82.99132824, 80.41464765, 99.30178708, 75.97062866,
54.02475958, 104.30122416, 133.55220208, 60.19821847,
90.37736433, 67.5994534 , 78.9836169 , 102.746667 ,
63.26556041, 106.57769718, 38.34907495, 98.44795983,
65.6721464 , 89.64646402, 75.91867286, 97.64488944,
101.73914294, 71.4740544 , 127.22384744, 130.00498169,
89.85091768, 103.9290738 , 71.69310743, 65.34356685,
86.01210199, 108.74584262, 28.86983358, 81.26068939,
87.71975649, 55.64527244, 79.75688908, 82.03516051,
65.12451382, 75.59009331, 92.27390147, 67.40218578,
63.92032161, 78.78754822, 94.99686553, 64.05163354,
60.76883541, 108.81119885, 74.75984989, 90.114141 ,
96.63856433, 85.12129049, 97.29512395, 69.7004442 ,
113.60678174, 89.84373164, 72.48097898, 89.38384017,
76.90381204, 75.69961983, 100.7474172 , 75.02965978,
95.19293421, 99.54981156, 57.30815718, 87.6615863 ,
83.62550299, 79.53184896, 67.90564807, 117.48198221,
107.8254602 , 84.3921886 , 84.23849179, 89.29609906,
105.46184555, 94.12005393, 94.07768205, 49.07787777,
82.0933307 , 58.6878316 , 41.28390601, 93.90219984,
90.00401501, 87.52248887, 80.21857897])
regression.score(X_test,y_test)
0.054992667713272936
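For a LinearRegression model, score returns the coefficient of determination, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the target from its mean. A value of about 0.055 therefore means the fitted model explains roughly 5% of the variance of the target on the test split. The sketch below reproduces the calculation by hand on made-up data (X and y here are random numbers, not the project data) and checks it against Scikit-Learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Made-up data: y depends only weakly on x, plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 0.5 * X[:, 0] + rng.normal(scale=2.0, size=200)

reg = LinearRegression().fit(X, y)
y_hat = reg.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# All three values agree.
print(r2_manual, reg.score(X, y), r2_score(y, y_hat))
```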
pip install cufflinks
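import cufflinks as cf
# init_notebook_mode lives in plotly.offline and must be imported before use
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)
cf.go_offline()
import chart_studio.plotly as py
import plotly.graph_objs as go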
df.plot()
<Axes: >
df.iplot()
<Axes: xlabel='BloodPressure'>
df.iplot(kind='bar')
df.count().iplot(kind='bar')
df.sum().iplot(kind='bar')
df.iplot(kind='box')
df['Pregnancies'].iplot(kind='hist',bins=25)
df.iplot(kind='hist')
df.iplot(kind='scatter',x='Pregnancies',y='BloodPressure',mode='markers',size=20)
df.iplot(kind='bubble',x='Pregnancies',y='BloodPressure',size='Outcome')
df.scatter_matrix()
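x = df.iloc[:, :-1]  # all columns except the last one (Outcome)
# Column index 6 is DiabetesPedigreeFunction; df.iloc[:, -1] would select Outcome
y = df.iloc[:, 6]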
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI
1              1       85             66             29        0  26.6
2              8      183             64              0        0  23.3
3              1       89             66             23       94  28.1
764            2      122             70             27        0  36.8
767            1       93             70             31        0  30.4
DiabetesPedigreeFunction Age
0 0.627 50
1 0.351 31
2 0.672 32
3 0.167 21
4 2.288 33
.. ... ...
764 0.340 27
765 0.245 30
766 0.349 47
767 0.315 23
768 0.252 55
0 0.627
1 0.351
2 0.672
3 0.167
4 2.288
...
764 0.340
765 0.245
766 0.349
767 0.315
768 0.252
Name: DiabetesPedigreeFunction, Length: 769, dtype: float64
x.head()
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI
1            1       85             66             29        0  26.6
2            8      183             64              0        0  23.3
3            1       89             66             23       94  28.1
DiabetesPedigreeFunction Age
0 0.627 50
1 0.351 31
2 0.672 32
3 0.167 21
4 2.288 33
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI
1              1       85             66             29        0  26.6
2              8      183             64              0        0  23.3
3              1       89             66             23       94  28.1
767            1       93             70             31        0  30.4
DiabetesPedigreeFunction Age
0 0.627 50
1 0.351 31
2 0.672 32
3 0.167 21
4 2.288 33
.. ... ...
764 0.340 27
765 0.245 30
766 0.349 47
767 0.315 23
768 0.252 55
count 769.000000
mean 0.471590
std 0.331208
min 0.078000
25% 0.244000
50% 0.371000
75% 0.626000
max 2.420000
Name: DiabetesPedigreeFunction, dtype: float64
data = {
df = pd.DataFrame(data)
X = df[["Pregnancies", "Insulin"]]
y = df["Insulin"]
model = LogisticRegression()
model.fit(X_train, y_train)
LogisticRegression()
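y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))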
Accuracy: 1.0
Confusion Matrix:
[[1 0]
[0 2]]
Classification Report:
precision recall f1-score support
accuracy 1.00 3
macro avg 1.00 1.00 1.00 3
weighted avg 1.00 1.00 1.00 3
print("\nIntercept:", model.intercept_)
print("Coefficients:", model.coef_)
Intercept: [-1.83267869]
Coefficients: [[0.69083358 0.79529279]]
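A logistic regression model turns the intercept and coefficients into a probability through the sigmoid function: p = 1 / (1 + exp(-(intercept + w1*x1 + w2*x2))). The sketch below applies this to the values printed above. The input rows are hypothetical, and it is an assumption that the two coefficients follow the column order of X; the helper predict_proba is written here for illustration and is not the Scikit-Learn method of the same name.

```python
import numpy as np

# Values printed by the notebook above.
intercept = -1.83267869
coefficients = np.array([0.69083358, 0.79529279])


def predict_proba(x):
    """Probability of class 1 for one row holding two feature values."""
    z = intercept + np.dot(coefficients, x)
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical inputs, chosen only to show the calculation.
p_zero = predict_proba(np.array([0.0, 0.0]))
p_other = predict_proba(np.array([2.0, 1.0]))
print(p_zero, p_other)
```

Because both coefficients are positive, increasing either feature raises the predicted probability of class 1; a row is assigned to class 1 when the probability exceeds 0.5.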
CONCLUSION
In this project, we analyzed the Pima Indian Diabetes dataset with the aim of predicting
diabetes in female patients from medical diagnostic measurements. Python and its
libraries, including Scikit-Learn, Plotly, and Cufflinks, supported a systematic
approach to data analysis and modeling.
Data preprocessing and exploratory analysis focused on measurements such as glucose
level, BMI, and insulin, which are widely regarded as key factors in diagnosing
diabetes. The models built here are a first step: the linear regression reached an R²
of about 0.055 on its test split, and the logistic regression was evaluated on only
three test samples, so its accuracy of 1.0 does not yet demonstrate a reliable
prediction system.
The visualizations created using Plotly and Cufflinks helped in exploring the
relationships between variables, making the data easier to interpret and understand.
This project highlights the potential of machine learning and data visualization in
healthcare, emphasizing how data-driven approaches can aid in early diagnosis and
decision-making. Future work can explore more advanced algorithms or larger
datasets to improve prediction accuracy and applicability in real-world
scenarios.
LEARNING OUTCOMES
By completing this project, the following key learning outcomes were
achieved:
using interactive charts and graphs.
o Enhanced skills in communicating technical results in a
clear and visually appealing way.
BIBLIOGRAPHY
DATASET
https://www.kaggle.com/datasets/whenamancodes/predict-diabities
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical
Learning: Data Mining, Inference, and Prediction. Springer.