Diabetes Data Set
Procedure:
Step 1: Load the Data:
Load the diabetes dataset into a suitable data structure.
If using Python, you can use libraries like Pandas to create a DataFrame.
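The original UCI file `pima-indians-diabetes.data` has no header row, which is why the programs below keep `header=None, names=column_names` as a commented-out option. The following is a minimal sketch of both loading paths, assuming pandas is installed. The helper `load_diabetes` is hypothetical, and the two data rows are made-up placeholder values so that the example runs without the real file.

```python
import io

import pandas as pd

# Column names from the lab program; the UCI file itself has no header row.
COLUMN_NAMES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
                "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]


def load_diabetes(source, has_header=True):
    """Read the diabetes data from a path or file-like object into a DataFrame.

    Pass has_header=False for a file without a header row, such as the
    original UCI pima-indians-diabetes.data.
    """
    if has_header:
        return pd.read_csv(source)
    return pd.read_csv(source, header=None, names=COLUMN_NAMES)


# Two made-up rows in the headerless layout (illustrative values, not real records).
HEADERLESS_TEXT = "2,120,70,30,80,28.5,0.400,35,0\n5,160,80,0,0,35.2,0.900,52,1\n"

df = load_diabetes(io.StringIO(HEADERLESS_TEXT), has_header=False)
print(df.shape)
print(df.head())
```

For the lab's own diabetes.csv, which already has a header row, the call would be `load_diabetes(r'D:\ARCHANA\FODS\FODS LAB\diabetes.csv')`. The `r` prefix stops Python from reading the backslashes in the Windows path as escape sequences.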
Step 2: Explore the Data:
Examine the first few rows of the dataset to understand its structure.
Check for any missing values in the dataset.
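`df.isnull().sum()` only counts empty cells. In this data set, a recorded value of 0 in columns such as Glucose, BloodPressure or BMI is physiologically implausible and is widely treated as a missing measurement. The sketch below reports both counts. The column list `ZERO_AS_MISSING`, the helper `missing_report` and the sample rows are assumptions for illustration, not part of the original program.

```python
import pandas as pd

# Columns where 0 is implausible; treating these zeros as missing is a
# common convention for this data set (an assumption, not a rule).
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def missing_report(df):
    """Return, per column, the number of NaN cells and of suspicious zeros."""
    report = pd.DataFrame({"nan_count": df.isnull().sum()})
    zero_counts = (df[ZERO_AS_MISSING] == 0).sum()
    # Columns outside ZERO_AS_MISSING get a zero_count of 0.
    report["zero_count"] = zero_counts.reindex(report.index, fill_value=0)
    return report


# Small made-up frame for illustration only.
sample = pd.DataFrame({
    "Pregnancies": [0, 3, 1],
    "Glucose": [110, 0, 95],
    "BloodPressure": [70, 64, None],
    "SkinThickness": [0, 25, 0],
    "Insulin": [0, 90, 0],
    "BMI": [30.1, 27.4, 0.0],
    "DiabetesPedigreeFunction": [0.3, 0.5, 0.2],
    "Age": [25, 40, 33],
    "Outcome": [0, 1, 0],
})
print(sample.head())
print(missing_report(sample))
```

Pregnancies is deliberately left out of the list, because 0 pregnancies is a valid value.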
Step 3: Understand the Features:
Review the columns and understand the meaning of each feature in the dataset.
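One way to record the meaning of each column next to its data type is a small lookup table. The descriptions below are paraphrased from the UCI Pima Indians Diabetes attribute list. The names `FEATURE_DESCRIPTIONS` and `feature_table` are hypothetical helpers, and the sample frame is made up.

```python
import pandas as pd

# Attribute meanings paraphrased from the UCI Pima Indians Diabetes description.
FEATURE_DESCRIPTIONS = {
    "Pregnancies": "Number of times pregnant",
    "Glucose": "Plasma glucose concentration 2 hours into an oral glucose tolerance test",
    "BloodPressure": "Diastolic blood pressure (mm Hg)",
    "SkinThickness": "Triceps skin fold thickness (mm)",
    "Insulin": "2-hour serum insulin (mu U/ml)",
    "BMI": "Body mass index (weight in kg / (height in m)^2)",
    "DiabetesPedigreeFunction": "Diabetes pedigree function",
    "Age": "Age (years)",
    "Outcome": "Class variable (0 or 1); 1 means tested positive for diabetes",
}


def feature_table(df):
    """Pair every column's dtype with its description."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "description": [FEATURE_DESCRIPTIONS.get(c, "(no description)") for c in df.columns],
    }, index=df.columns)


# Made-up sample for illustration only.
sample = pd.DataFrame({"Glucose": [100, 120], "BMI": [28.1, 31.0], "Outcome": [0, 1]})
table = feature_table(sample)
print(table)
```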
Step 4: Descriptive Statistics:
Compute and display summary statistics for the dataset. This includes mean, median,
standard deviation, minimum, and maximum values.
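`df.describe()` already reports the mean, standard deviation, minimum and maximum, and gives the median as its `50%` row. To get exactly the five statistics listed above, `DataFrame.agg` can take a list of function names. A minimal sketch on made-up values; the helper name `summary_statistics` is hypothetical.

```python
import pandas as pd


def summary_statistics(df):
    """Mean, median, standard deviation, minimum and maximum of each column."""
    # agg with a list of names returns one row per statistic.
    return df.agg(["mean", "median", "std", "min", "max"])


# Made-up values for illustration only.
sample = pd.DataFrame({"Glucose": [90, 100, 110, 140], "Age": [21, 30, 45, 60]})
stats = summary_statistics(sample)
print(stats)
```

Note that pandas' `std` is the sample standard deviation (`ddof=1`), so it divides by n - 1 rather than n.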
Step 5: Univariate Analysis:
Conduct univariate analysis for each feature.
This involves creating visualizations such as histograms and box plots to understand
the distribution of each variable.
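A box plot summarizes one variable by its quartiles and flags points beyond 1.5 × IQR from the box, which is matplotlib's default whisker rule (`whis=1.5`). The sketch below computes those numbers directly so they can be checked against the plot. The helper `box_plot_stats` and the sample readings are hypothetical.

```python
import pandas as pd


def box_plot_stats(series, whis=1.5):
    """Quartiles, whisker fences and outliers for one numeric column.

    Uses the whis x IQR rule; whis=1.5 matches matplotlib's default boxplot.
    """
    s = series.dropna()
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    outliers = s[(s < low_fence) | (s > high_fence)]
    return {"q1": q1, "median": median, "q3": q3,
            "low_fence": low_fence, "high_fence": high_fence,
            "outliers": outliers.tolist()}


# Made-up glucose readings, including a 0 that falls outside the lower fence.
glucose = pd.Series([0, 85, 90, 100, 105, 110, 120, 130, 140])
glucose_box = box_plot_stats(glucose)
print(glucose_box)
```

The plot itself can be drawn with `df.boxplot(column="Glucose")` or `plt.boxplot(df["Glucose"])` followed by `plt.show()`.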
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
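# column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI",
#                 "DiabetesPedigreeFunction", "Age", "Outcome"]
# For a file without a header row, add: header=None, names=column_names
# Raw string (r'...') so the backslashes in the Windows path are not read as escape sequences.
df = pd.read_csv(r'D:\ARCHANA\FODS\FODS LAB\diabetes.csv')
print(df.head())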
# Example: Univariate analysis for the 'Glucose' variable
glucose_summary = df["Glucose"].describe()
print("Summary Statistics for Glucose:\n", glucose_summary)
# Example: Frequency analysis for the 'Glucose' variable
plt.hist(df["Glucose"], bins=20, edgecolor='black')
plt.title("Histogram of Glucose Levels")
plt.xlabel("Glucose Level")
plt.ylabel("Frequency")
plt.show()
OUTPUT:
RESULT:
Thus, the above program performing univariate analysis on the diabetes data set was executed successfully.
Ex no:
Date:
BIVARIATE ANALYSIS USING DIABETES DATA SET
Aim:
To write a program to perform bivariate analysis on the diabetes data set.
Procedure:
Step 1: Load the Data:
Load the diabetes dataset into a suitable data structure.
If using Python, you can use libraries like Pandas to create a DataFrame.
Step 2: Explore the Data:
Examine the first few rows of the dataset to understand its structure.
Check for any missing values in the dataset.
Step 3: Understand the Features:
Review the columns and understand the meaning of each feature in the dataset.
Step 4: Descriptive Statistics:
Compute and display summary statistics for the dataset. This includes mean, median,
standard deviation, minimum, and maximum values.
Step 5: Bivariate Analysis:
Conduct bivariate analysis on pairs of features, including each feature paired with the Outcome variable.
This involves creating visualizations such as scatter plots, pair plots, and correlation heatmaps to understand
the relationship between two variables.
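The correlation matrix in the program uses the Pearson coefficient, the default method of `df.corr()`. For a continuous feature against the 0/1 Outcome column, Pearson r on that coding is the point-biserial correlation. A from-scratch sketch of the formula, checked against pandas; the helper `pearson_r` and the sample pairs are hypothetical.

```python
import math

import pandas as pd


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations divided by the root of the product of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up pairs: glucose reading versus outcome label.
glucose = [85, 95, 100, 140, 160, 180]
outcome = [0, 0, 0, 1, 1, 1]
r = pearson_r(glucose, outcome)
print(round(r, 4))
# pandas computes the same value (Pearson is the default for Series.corr).
print(round(pd.Series(glucose).corr(pd.Series(outcome)), 4))
```

Values near +1 or -1 indicate a strong linear relationship; values near 0 indicate little linear association.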
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Pima Indians Diabetes dataset
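# url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
# column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI",
#                 "DiabetesPedigreeFunction", "Age", "Outcome"]
# The UCI file has no header row, so it would be read with: pd.read_csv(url, header=None, names=column_names)
# Raw string (r'...') so the backslashes in the Windows path are not read as escape sequences.
df = pd.read_csv(r'D:\ARCHANA\FODS\FODS LAB\diabetes.csv')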
# Display the first few rows of the dataset
print(df.head())
# Bivariate analysis - pair plot
sns.pairplot(df, hue="Outcome", diag_kind='kde')
plt.show()
# Bivariate analysis - correlation matrix
correlation_matrix = df.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Correlation Matrix")
plt.show()
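The heatmap is easier to act on when the Outcome column of the correlation matrix is pulled out and ranked, and when each feature's mean is compared across the two outcome classes. A minimal sketch; the helpers `outcome_correlations` and `class_means` and the sample rows are hypothetical.

```python
import pandas as pd


def outcome_correlations(df, target="Outcome"):
    """Correlation of every other column with the target, strongest first."""
    corr = df.corr()[target].drop(target)
    # Rank by absolute value so strong negative correlations also come first.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


def class_means(df, target="Outcome"):
    """Mean of each feature within each target class."""
    return df.groupby(target).mean()


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Glucose": [85, 95, 100, 140, 160, 180],
    "Age": [45, 22, 30, 28, 50, 35],
    "Outcome": [0, 0, 0, 1, 1, 1],
})
print(outcome_correlations(sample))
print(class_means(sample))
```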
OUTPUT:
RESULT:
Thus, the above program performing bivariate analysis on the diabetes data set was executed successfully.