Diabetics Data Set


Ex no:

Date:

UNIVARIATE ANALYSIS USING THE DIABETES DATASET


Aim:
To write a program that performs univariate analysis on the diabetes dataset.

Procedure:
Step 1: Load the Data:
• Load the diabetes dataset into a suitable data structure.
• If using Python, you can use a library such as pandas to create a DataFrame.
Step 2: Explore the Data:
• Examine the first few rows of the dataset to understand its structure.
• Check for any missing values in the dataset.
Step 3: Understand the Features:
• Review the columns and understand the meaning of each feature in the dataset.
Step 4: Descriptive Statistics:
• Compute and display summary statistics for the dataset. This includes the mean, median, standard deviation, minimum, and maximum values.
Step 5: Univariate Analysis:
• Conduct univariate analysis for each feature.
• This involves creating visualizations such as histograms and box plots to understand the distribution of each variable.
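Steps 2 and 4 can be tried without the CSV file. The sketch below is only an illustration: the DataFrame `sample` holds made-up rows (not values from the real dataset), and only the column names follow the program. It counts missing values per column and prints the summary statistics, where the median appears as the 50% row of `describe()`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows, invented for illustration only.
sample = pd.DataFrame({
    "Glucose": [148.0, 85.0, np.nan, 89.0, 137.0],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
    "Outcome": [1, 0, 1, 0, 1],
})

# Step 2: inspect the first rows and count missing values per column.
print(sample.head())
missing_counts = sample.isnull().sum()
print("Missing values per column:\n", missing_counts)

# Step 4: summary statistics (count, mean, std, min, quartiles, max).
# Missing entries are skipped, so 'count' can differ between columns.
summary = sample.describe()
print(summary)

# The median is the 50% row of describe(); median() returns it directly.
glucose_median = sample["Glucose"].median()
print("Median glucose:", glucose_median)
```

With the real file, the same calls would be applied to `df` instead of `sample`.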
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt

# Column names, needed only if the CSV file has no header row:
# column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI",
#                 "DiabetesPedigreeFunction", "Age", "Outcome"]

# Raw string so the backslashes in the Windows path are not read as escape sequences.
# For a file without a header row, add: header=None, names=column_names
df = pd.read_csv(r'D:\ARCHANA\FODS\FODS LAB\diabetes.csv')
print(df.head())

# Example: Univariate analysis for the 'Glucose' variable
glucose_summary = df["Glucose"].describe()
print("Summary Statistics for Glucose:\n", glucose_summary)

# Example: Frequency analysis for the 'Glucose' variable
plt.hist(df["Glucose"], bins=20, edgecolor='black')
plt.title("Histogram of Glucose Levels")
plt.xlabel("Glucose Level")
plt.ylabel("Frequency")
plt.show()
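The procedure asks for histograms and box plots of each feature, while the program above plots only Glucose. One possible extension is sketched below. It is not part of the original program: the DataFrame `demo` is filled with randomly generated stand-in values, and the choice of three columns is an assumption made to keep the figure small.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in values (illustration only, not real data).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Glucose": rng.normal(120, 30, 200),
    "BMI": rng.normal(32, 7, 200),
    "Age": rng.integers(21, 80, 200),
})

features = list(demo.columns)

# One row per feature: histogram on the left, box plot on the right.
fig, axes = plt.subplots(len(features), 2, figsize=(8, 3 * len(features)))
for row, name in enumerate(features):
    values = demo[name].to_numpy()
    axes[row, 0].hist(values, bins=20, edgecolor="black")
    axes[row, 0].set_title(f"Histogram of {name}")
    axes[row, 1].boxplot(values)
    axes[row, 1].set_title(f"Box plot of {name}")
fig.tight_layout()
# Call plt.show() afterwards to display the figure in an interactive session.
```

To run this on the real data, replace `demo` with `df` and list the columns of interest in `features`.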
OUTPUT:

RESULT:
Thus the program to perform univariate analysis on the diabetes dataset was executed successfully.
Ex no:
Date:

BIVARIATE ANALYSIS USING THE DIABETES DATASET
Aim:
To write a program that performs bivariate analysis on the diabetes dataset.

Procedure:
Step 1: Load the Data:
• Load the diabetes dataset into a suitable data structure.
• If using Python, you can use a library such as pandas to create a DataFrame.
Step 2: Explore the Data:
• Examine the first few rows of the dataset to understand its structure.
• Check for any missing values in the dataset.
Step 3: Understand the Features:
• Review the columns and understand the meaning of each feature in the dataset.
Step 4: Descriptive Statistics:
• Compute and display summary statistics for the dataset. This includes the mean, median, standard deviation, minimum, and maximum values.
Step 5: Bivariate Analysis:
• Conduct bivariate analysis for pairs of features.
• This involves creating visualizations such as pair plots and a correlation heatmap to understand the relationship between two variables at a time.
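Bivariate analysis relates two variables at a time, and this can also be done numerically before any plotting. The sketch below is a minimal illustration on made-up rows (the DataFrame `demo` is hypothetical and only borrows column names from the dataset). It compares the mean of each feature across the two Outcome groups and computes the Pearson correlation between two numeric columns.

```python
import pandas as pd

# Invented rows for illustration; not taken from the real dataset.
demo = pd.DataFrame({
    "Glucose": [90, 100, 110, 150, 160, 170],
    "BMI": [24.0, 26.0, 28.0, 32.0, 34.0, 36.0],
    "Outcome": [0, 0, 0, 1, 1, 1],
})

# Feature vs. class label: mean of each feature within each Outcome group.
group_means = demo.groupby("Outcome")[["Glucose", "BMI"]].mean()
print(group_means)

# Feature vs. feature: Pearson correlation between two numeric columns.
glucose_bmi_corr = demo["Glucose"].corr(demo["BMI"])
print("Pearson r (Glucose, BMI):", round(glucose_bmi_corr, 3))
```

A large gap between the group means suggests that a feature separates the two outcomes, which is what the coloured pair plot shows visually.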

PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the Pima Indians Diabetes dataset
# url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
# column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI",
#                 "DiabetesPedigreeFunction", "Age", "Outcome"]

# Raw string so the backslashes in the Windows path are not read as escape sequences.
df = pd.read_csv(r'D:\ARCHANA\FODS\FODS LAB\diabetes.csv')

# Display the first few rows of the dataset
print(df.head())

# Bivariate analysis - pair plot, coloured by Outcome, with KDE curves on the diagonal
sns.pairplot(df, hue="Outcome", diag_kind='kde')
plt.show()

# Bivariate analysis - correlation matrix (Pearson, all columns are numeric)
correlation_matrix = df.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Correlation Matrix")
plt.show()
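The heatmap can be hard to read when the question is simply which features move together with Outcome. As a hedged follow-up, not part of the original program, the Outcome column of the correlation matrix can be sorted by absolute value. The rows in `demo` below are invented for illustration.

```python
import pandas as pd

# Invented rows for illustration; not taken from the real dataset.
demo = pd.DataFrame({
    "Glucose": [90, 100, 110, 150, 160, 170],
    "BloodPressure": [70, 82, 64, 76, 68, 80],
    "Outcome": [0, 0, 0, 1, 1, 1],
})

correlation_matrix = demo.corr()

# Keep the Outcome column, drop its self-correlation,
# and rank the remaining features by absolute correlation.
outcome_corr = (
    correlation_matrix["Outcome"]
    .drop("Outcome")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(outcome_corr)
```

With the real data, `demo.corr()` would be replaced by the `correlation_matrix` already computed from `df`.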

OUTPUT:

RESULT:
Thus the program to perform bivariate analysis on the diabetes dataset was executed successfully.
