
Report on Case Study

“Diabetes Prediction”

by

Aishwarya Iyappan - 4201
Siddhi Ambekar - 4202
Anupam Kumari - 4203
Anjali Bansode - 4207

DEPARTMENT OF COMPUTER ENGINEERING


BHARATI VIDYAPEETH COLLEGE OF ENGINEERING,
NAVI MUMBAI
2023-24
Experiment No: 10

Title: Life Cycle of Diabetes Prediction Using SVM (Support Vector Machine).

Tools:
1. NumPy: For numerical computations and array manipulations.
2. Pandas: For data manipulation and analysis, including reading and
loading datasets into DataFrames.
3. Matplotlib: For creating static, interactive, and animated visualizations in
Python.
4. Seaborn: For statistical data visualization based on Matplotlib, providing
a high-level interface for drawing attractive and informative statistical
graphics.
5. Scikit-Learn: For machine learning tasks, including the Support Vector
Classifier (SVC), which is imported from sklearn.svm.

PROBLEM STATEMENT:
The problem at hand is predicting whether a person has diabetes, based on
information about the patient such as blood pressure, body mass index (BMI),
age, etc. By leveraging machine learning techniques, specifically a Support
Vector Machine (SVM), the aim is to let users predict diabetes through a
prediction engine. The project pursues this aim through research on statistical
models in machine learning and by understanding how the algorithms work. This
case study walks through the various stages of the data science workflow.
LIFE CYCLE:

I. Data Collection

II. Data Exploration

III. Data Preparation

IV. Training and Evaluating the Machine Learning Model

V. Interpreting the ML Model

VI. Saving the Model

VII. Making Predictions with the Model


Methodology

1. Data Collection: The dataset used for this model is the Pima Indians
Diabetes dataset which consists of several medical predictor variables
and one target variable, Outcome. Predictor variables include the
number of pregnancies the patient has had, their Body Mass Index,
insulin level, glucose level, diabetes pedigree function, blood pressure,
skin thickness and age.
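The loading step can be sketched with pandas, assuming the data is the commonly distributed `diabetes.csv` with one column per variable named below (the file name and the exact column headers are assumptions; adjust them to your copy). The helper name `load_pima` is hypothetical, and the demo feeds two illustrative rows in the file's format through `io.StringIO` so the sketch runs without the file.

```python
import io

import pandas as pd

# Column headers assumed to match the commonly distributed diabetes.csv.
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"


def load_pima(source="diabetes.csv"):
    """Read the dataset and split it into predictors X and target y."""
    df = pd.read_csv(source)  # accepts a path or any file-like object
    missing = set(FEATURES + [TARGET]) - set(df.columns)
    if missing:
        raise ValueError(f"dataset is missing columns: {sorted(missing)}")
    return df[FEATURES], df[TARGET]


if __name__ == "__main__":
    # Two illustrative rows in the file's format stand in for the real file.
    sample = io.StringIO(
        "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,"
        "BMI,DiabetesPedigreeFunction,Age,Outcome\n"
        "6,148,72,35,0,33.6,0.627,50,1\n"
        "1,85,66,29,0,26.6,0.351,31,0\n"
    )
    X, y = load_pima(sample)
    print(X.shape, y.tolist())  # (2, 8) [1, 0]
```

Checking the headers up front makes a mismatched file fail with a clear message instead of a later `KeyError`.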

2. Data Cleaning: Clean the data by handling missing values and outliers and by
ensuring consistency. This step is crucial for accurate predictions.
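One common way to handle missing values in this dataset is sketched below; it is an assumption, not necessarily the authors' exact procedure. A recorded 0 in `Glucose`, `BloodPressure`, `SkinThickness`, `Insulin` or `BMI` is not a plausible measurement, so it is treated as missing and filled with the column median. The helper name `clean` and the median strategy are illustrative choices, and outlier handling is left out.

```python
import pandas as pd

# Columns where a recorded 0 is not a plausible measurement (assumed "missing").
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def clean(df, medians=None):
    """Replace implausible zeros with NaN and fill each gap with a column median.

    When cleaning a test split, pass the `medians` returned for the training
    split so that no test-set statistics leak into preprocessing.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    cols = [c for c in ZERO_AS_MISSING if c in out.columns]
    for c in cols:
        out[c] = out[c].mask(out[c] == 0)  # mask() sets matching values to NaN
    if medians is None:
        medians = out[cols].median()  # NaNs are skipped when computing medians
    for c in cols:
        out[c] = out[c].fillna(medians[c])
    return out, medians


if __name__ == "__main__":
    raw = pd.DataFrame({"Glucose": [148, 0, 85], "BMI": [33.6, 26.6, 0.0], "Age": [50, 31, 0]})
    cleaned, medians = clean(raw)
    print(cleaned["Glucose"].tolist())  # [148.0, 116.5, 85.0]
```

`Age` is deliberately not in the list, so its values pass through unchanged.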

3. Exploratory Data Analysis (EDA):


i. Understanding the Dataset: Analyze the dataset’s structure, distributions,
and relationships between features. EDA helps you gain insights into the
data.
ii. Visualization: Create visualizations such as histograms, scatter plots, and
correlation matrices to explore feature relationships.
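A minimal EDA sketch with pandas, Matplotlib and Seaborn, assuming the data is already in a DataFrame with an `Outcome` column. The helper names `summarize` and `plot_eda` and the output file names are hypothetical; the Glucose-versus-BMI scatter plot is just one example pairing.

```python
import matplotlib
matplotlib.use("Agg")  # write plots to files; no display required
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def summarize(df, target="Outcome"):
    """Class balance of the target and each feature's correlation with it."""
    balance = df[target].value_counts(normalize=True).sort_index()
    corr = (df.corr(numeric_only=True)[target]
            .drop(target)
            .sort_values(ascending=False))
    return balance, corr


def plot_eda(df, prefix="eda", target="Outcome"):
    """Save histograms, a correlation heatmap and one scatter plot as PNG files."""
    df.hist(bins=20, figsize=(12, 9))  # one histogram per numeric column
    plt.tight_layout()
    plt.savefig(f"{prefix}_histograms.png")
    plt.close("all")

    plt.figure(figsize=(9, 7))
    sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f", cmap="coolwarm")
    plt.tight_layout()
    plt.savefig(f"{prefix}_correlation.png")
    plt.close("all")

    if {"Glucose", "BMI"} <= set(df.columns):
        plt.figure(figsize=(7, 5))
        sns.scatterplot(data=df, x="Glucose", y="BMI", hue=target)
        plt.savefig(f"{prefix}_glucose_bmi.png")
        plt.close("all")
```

Checking the class balance early shows whether accuracy alone will be a misleading metric later on.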
4. Feature Engineering:

i. Feature Extraction: Extract meaningful features from the existing ones.
For example, you might create a new feature like “BMI category” based on
BMI values.
ii. Feature Scaling: Normalize or standardize numerical features.
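The two feature-engineering steps can be sketched as follows. The BMI bands use the WHO adult cut-offs; the column name `BMI_Category` and the helper names are hypothetical. Scaling matters for an SVM because its RBF kernel works with distances between points, so an unscaled feature with a large range (such as `Insulin`) would dominate; the scaler is therefore fitted on the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# WHO adult BMI cut-offs: <18.5, 18.5-24.9, 25-29.9, >=30.
BMI_BINS = [0, 18.5, 25, 30, np.inf]
BMI_LABELS = ["Underweight", "Normal", "Overweight", "Obese"]


def add_bmi_category(df):
    """Add a categorical BMI_Category column derived from BMI."""
    out = df.copy()
    # right=False makes each bin include its lower edge: [18.5, 25) is "Normal".
    out["BMI_Category"] = pd.cut(out["BMI"], bins=BMI_BINS,
                                 labels=BMI_LABELS, right=False)
    return out


def one_hot_bmi(df):
    """SVMs need numeric inputs, so the category becomes indicator columns."""
    return pd.get_dummies(df, columns=["BMI_Category"], dtype=float)


def scale_features(X_train, X_test):
    """Standardize to zero mean and unit variance using training statistics only."""
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)  # reuse the training mean/std
    return X_train_scaled, X_test_scaled, scaler
```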

5. Model Selection:
Choose appropriate machine learning models for binary classification
(diabetes vs. non-diabetes). Some common models include:
i. Logistic Regression: A simple yet effective linear model.
ii. Random Forest: An ensemble of decision trees.
iii. Support Vector Machine (SVM): Handles non-linear decision boundaries
through kernel functions such as RBF.
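The three candidates can be compared on equal terms with stratified k-fold cross-validation, sketched here with scikit-learn. The hyperparameters (for example `C=1.0`, `n_estimators=200`) are illustrative defaults rather than tuned values, and the demo uses `make_classification` as a synthetic stand-in shaped like the Pima data (768 rows, 8 features, roughly 35% positives).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def candidate_models(random_state=42):
    """The three candidate classifiers; scale-sensitive ones get a scaler."""
    return {
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=random_state),
        "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")),
    }


def compare_models(X, y, folds=5, random_state=42):
    """Mean ROC AUC of each candidate over stratified k-fold cross-validation."""
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=random_state)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in candidate_models(random_state).items()
    }


if __name__ == "__main__":
    # Synthetic stand-in for the Pima features, not real patient data.
    X, y = make_classification(n_samples=768, n_features=8, weights=[0.65], random_state=0)
    for name, auc in compare_models(X, y).items():
        print(f"{name}: mean ROC AUC = {auc:.3f}")
```

Scaling sits inside each pipeline so it is refit on every training fold; the random forest is left unscaled because tree splits do not depend on feature scale.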

6. Model Training and Evaluation:

i. Data Splitting: Divide your dataset into training and testing subsets.
ii. Model Training: Train each selected model on the training data.
iii. Model Evaluation: Assess model performance using metrics such as
accuracy, sensitivity, specificity, precision, F1 score, and the Receiver
Operating Characteristic (ROC) curve.
iv. Cross-Validation: Use k-fold cross-validation to estimate how well the
model generalizes to unseen data.
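Steps i to iv can be sketched for the SVM as a single function. The 80/20 stratified split, the RBF kernel and the 5 folds are assumptions, not values taken from this report, and the labels are assumed to be 0/1 as in the `Outcome` column. Sensitivity and specificity are read off the confusion matrix, and the ROC AUC uses the SVM's `decision_function` scores, so no probability calibration is needed at this stage.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, roc_auc_score, roc_curve)
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train_and_evaluate(X, y, test_size=0.2, folds=5, random_state=42):
    # i. Data splitting: stratify keeps the diabetic/non-diabetic ratio in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state)

    # ii. Model training: scaler and SVM are fitted together on the training split.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    model.fit(X_train, y_train)

    # iii. Model evaluation on the held-out test split.
    y_pred = model.predict(X_test)
    scores = model.decision_function(X_test)  # signed distance from the boundary
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "sensitivity": tp / (tp + fn),  # share of diabetic patients detected
        "specificity": tn / (tn + fp),  # share of non-diabetic patients correctly identified
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, scores),
    }
    fpr, tpr, _ = roc_curve(y_test, scores)  # points for plotting the ROC curve

    # iv. k-fold cross-validation on the training split estimates generalization.
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=random_state)
    cv_acc = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
    metrics["cv_accuracy_mean"] = cv_acc.mean()
    metrics["cv_accuracy_std"] = cv_acc.std()
    return model, metrics, (fpr, tpr)


if __name__ == "__main__":
    # Synthetic stand-in for the Pima features, not real patient data.
    X, y = make_classification(n_samples=768, n_features=8, weights=[0.65], random_state=0)
    _, metrics, _ = train_and_evaluate(X, y)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```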
7. Model Deployment:

i. Once you have a well-performing model, save it (e.g., using Python’s
pickle).
ii. Deploy the model in a production environment, such as an API, so that
users can interact with it.
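Saving and reloading with Python's `pickle` module can be sketched as below; the file name `diabetes_svm.pkl` and the helper names are hypothetical. Pickling the whole scikit-learn pipeline keeps the scaler and the SVM together, so the reloaded object accepts raw feature values. The API deployment in step ii is outside the scope of this sketch.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def save_model(model, path):
    """Serialize a fitted model (here a scaler + SVM pipeline) to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model saved by save_model. Only load files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Synthetic training data, used only so the round trip can be demonstrated.
    X, y = make_classification(n_samples=200, n_features=8, random_state=0)
    model = make_pipeline(StandardScaler(), SVC()).fit(X, y)
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "diabetes_svm.pkl"
        save_model(model, path)
        restored = load_model(path)
        print((restored.predict(X) == model.predict(X)).all())  # True
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code; a scikit-learn model should also be reloaded with the same library version it was saved with.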

8. Prediction of Diabetes:

i. Utilize the trained machine learning models to predict the probability of
individuals having diabetes based on their input features (e.g., glucose level,
BMI, etc.).
ii. Implement a user-friendly interface where users can input their data and
receive predictions.
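A prediction helper for a single patient can be sketched as below, assuming the model is a scikit-learn pipeline trained on the eight Pima features. Because this step asks for a probability, the SVC is built with `probability=True`, which makes scikit-learn fit an internal Platt-scaling calibration so that `predict_proba` is available. The helper names `build_model` and `predict_patient` are hypothetical, and the demo trains on synthetic data only so the code runs; its outputs carry no medical meaning.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]


def build_model():
    # probability=True enables predict_proba via Platt scaling (slower to fit).
    return make_pipeline(StandardScaler(),
                         SVC(kernel="rbf", probability=True, random_state=42))


def predict_patient(model, patient, threshold=0.5):
    """Return (label, probability of diabetes) for one patient record (a dict)."""
    missing = [f for f in FEATURES if f not in patient]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    row = pd.DataFrame([patient], columns=FEATURES)
    proba = float(model.predict_proba(row)[0, 1])  # column 1 = class 1 (diabetic)
    # Threshold the probability directly: SVC.predict can disagree with
    # predict_proba, so deriving the label here keeps the two consistent.
    return int(proba >= threshold), proba


if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=8, random_state=0)
    X = pd.DataFrame(X, columns=FEATURES)  # synthetic stand-in, not real patients
    model = build_model().fit(X, y)
    label, proba = predict_patient(model, X.iloc[0].to_dict())
    print(label, round(proba, 3))
```

A user-facing form would collect the eight values, build the dictionary and pass it to `predict_patient`.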
Flow Diagram
Result:
Conclusion

After analyzing these patient records, we developed a machine learning model
(specifically a support vector machine, which performed the best) that can
effectively predict whether individuals in the dataset have diabetes. Alongside
this, we gained valuable insights from the data through analysis and
visualization, which aided the prediction of diabetes using machine learning
techniques. The project demonstrates diabetes prediction for patients and shows
how supervised learning techniques can derive actionable insights from data.
Additionally, the project underscores the iterative nature of data science
projects, emphasizing the need for continuous evaluation and refinement to meet
evolving business objectives and improve decision-making processes.
