Eda Lab Manual Without Output

Uploaded by

Pradi

An Autonomous Institution

Approved by AICTE, Affiliated to Anna University, Chennai.

ISO 9001:2015 Certified Institution, Accredited by NBA (BME, CSE, ECE, EEE, IT & MECH), Accredited by NAAC.
#42, Avadi-Vel Tech Road, Avadi, Chennai- 600062, Tamil Nadu, India.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

191ITV63/ EXPLORATORY DATA ANALYSIS LABORATORY

NAME :

REGISTER NO :

VM NO : VM -

BRANCH : AI&DS

YEAR : IV

SEMESTER : VII

Vision

• To promote centre of excellence through effectual Teaching and Learning, imparting the
contemporary knowledge centric education through innovative research in multidisciplinary fields.

Mission

• To impart quality technical skills through practicing, knowledge updating in recent technology
and produce professionals with multidisciplinary and leadership skills.
• To promote innovative thinking for design and development of software products of varying
complexity with intelligence to fulfil the global standards and demands.
• To inculcate professional ethics among the graduates and to adapt the changing technologies
through lifelong learning.

CERTIFICATE

Name: …………………….……………… Year: ……………… Semester: ......................... Branch: B.TECH – ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

University Register No: ………………….. College Roll No: .................................

Certified that this is the bonafide record of work done by the above student in the
191ITV63 – EXPLORATORY DATA ANALYSIS LABORATORY during the academic year 2024-2025.

Signature of the Course In-charge Signature of Head of the Department

Submitted for the University Practical Examination held on ………………... at VEL TECH MULTI
TECH Dr.RANGARAJAN Dr.SAKUNTHALA ENGINEERING COLLEGE, No.42, AVADI – VEL
TECH ROAD, AVADI, CHENNAI-600062.

Signature of Examiners

Internal Examiner: …………… External Examiner: ………………

Date: ………………
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

POs Programme Outcomes (POs)

PO1 Engineering Knowledge: Apply knowledge of mathematics, science, engineering
fundamentals and an Engineering Specialization to the solution of complex
engineering problems.

PO2 Problem Analysis: Identify, formulate, review research literature and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.

PO3 Design / Development of solutions: Design solutions for complex engineering problems and
design system components or processes that meet specified needs with appropriate consideration
for public health and safety, cultural, societal, and environmental considerations.

PO4 Conduct Investigations of Complex Problems: Use research-based knowledge and research
methods including design of experiments, analysis and interpretation of data, and synthesis of
the information to provide valid conclusions.

PO5 Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities
with an understanding of the limitations.

PO6 The Engineer and Society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to
the professional engineering practice.

PO7 Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and need
for sustainable development.

PO8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.

PO9 Individual and team work: Function effectively as an individual, and as a member or leader
in diverse teams, and in multidisciplinary settings.

PO10 Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive
clear instructions.

PO11 Project Management and Finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.

PO12 Life-long learning: Recognize the need for, and have the preparation and ability to engage
in independent and life-long learning in the broadest context of technological change.
PROGRAMME EDUCATIONAL OBJECTIVES (PEOs)

PEOs Programme Educational Objectives (PEOs)

PEO1 Train the graduates with the potential of strong knowledge in the respective field and to create
innovative multidisciplinary solutions for challenges in the society.

PEO2 Groom the engineers to understand, analyse different nature of data and use Machine Learning
techniques to develop software systems with varying complexity for data intensive
applications.

PEO3 To practice professionalism among the graduates and reflect good leadership skills with ethical
standards and continued professional development through lifelong learning.

PROGRAMME SPECIFIC OUTCOMES (PSOs)

PSOs Programme Specific Outcomes (PSOs)

PSO1 To impart theoretical knowledge in the respective field along with recent industrial tools and
techniques to solve societal problems.

PSO2 Apply the core competency obtained in the field of Machine Learning for analysis, design
and development of computing systems for multi-disciplinary problems.

PSO3 Acquire knowledge in the field of intelligence, deep learning and develop software solutions
for security and analytics of large volume of data.
COURSE OBJECTIVES

The student should be made to:

• Understand the overview of exploratory data analysis.
• Acquire knowledge on implementation of data visualization using Matplotlib.
• Explore and perform univariate data exploration and analysis.
• Apply bivariate data exploration and analysis.
• Learn the usage of data exploration and visualization techniques for multivariate and
time series data.

COURSE OUTCOMES

At the end of the course, the student should be able to:

• Understand the fundamentals of exploratory data analysis.
• Implement the data visualization using Matplotlib.
• Acquire knowledge to perform univariate data exploration and analysis.
• Explore different methods to apply bivariate data exploration and analysis.
• Acquire knowledge regarding utilization of different data exploration and visualization
techniques for multivariate and time series data.

Mapping of COs with POs and PSOs

[Table: rows CO1, CO2, CO3, CO4, CO5 and CO; columns PO1 to PO12 and PSO1 to PSO3. The mapping values are not legible in this copy.]
LIST OF EXPERIMENTS

S.NO   DATE   EXPERIMENT   CO   PAGE NO   FACULTY SIGN

1   Install the data analysis and visualization tool: Python.   CO1

2   Perform exploratory data analysis (EDA) with datasets like email dataset. Export all your emails as a dataset, import them inside a pandas data frame, visualize them and get different insights from the data.   CO2

3   Working with Numpy arrays, Pandas data frames, Basic plots using Matplotlib.   CO3

4   Explore various variable and row filters in R for cleaning data. Apply various plot features in R on sample data sets and visualize.   CO4

5   Perform Time Series Analysis and apply the various visualization techniques.   CO5

6   Build cartographic visualization for multiple datasets involving various countries of the world; states and districts in India etc.   CO5

7   Perform EDA on Wine Quality Data Set.   CO6

8   Use a case study on a data set and apply the various EDA and visualization techniques and present an analysis report.   CO6
EX.NO: 1
Install the data Analysis and Visualization tool: Python
DATE:

AIM:

To install Python, the data analysis and visualization tool.

How to Install Python on Windows

There are several ways to install Python on a Windows machine. Below are the options we’ll explore in
this tutorial:

• Install Python directly from the Microsoft Store: This quick and easy option will get you up and running with
Python in no time. It is especially useful for beginners who want to use Python on their machine for learning
purposes.

• Install Python directly from the Python website: This method gives you more control over the installation
process and allows you to customize your installation.

• Install Python using an Anaconda distribution: Anaconda is a popular Python distribution that comes with a
large number of pre-installed packages and tools, making it a good option for scientific computing and data
science.

No matter which method you choose, you'll be able to start using Python on your Windows machine in just a few
steps. Python may also already be installed on your machine. Here’s how you can check if your
Windows machine has Python installed.

Checking if Python is Already Installed on Your Windows Machine

Python can be accessed via the terminal or the Start Menu.

To check if Python is installed on your Windows machine using the terminal, follow these steps:

1. Open a command line tool such as Windows Terminal (the default on Windows 11) or Command Prompt (the
default on Windows 10).

2. In the command line, type `python`. If Python is installed, you should see a message like “Python 3.x.x”
followed by the Python prompt, which looks like this “>>>”. Note that “3.x.x” represents the version number of
Python.

3. If Python is not installed on your machine, you will be automatically taken to the Microsoft Store installation of
Python. Note that the page you are taken to may not be the latest version of Python.

To check if Python is installed on your Windows machine using the Start Menu, follow these steps:

1. Press the Windows key or click on the Start button to open the Start Menu. Type "python".

2. If Python is installed, it should show up as the best match. Press "Enter" or click on the version of Python you
want to open. You should see a message like “Python 3.x.x” followed by the Python prompt, which looks like
this “>>>”. Note that “3.x.x” represents the version number of Python.

3. If Python is not installed on your machine, you will only see results for web searches for "python", or a
suggestion to search the Microsoft Store for "python".
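
The same check can be sketched from within Python itself once an interpreter is available. The snippet below is a minimal illustration, assuming the standard-library `shutil.which` lookup is an acceptable stand-in for typing the command in a terminal. The helper name `find_python_commands` and the list of candidate command names are hypothetical choices, not part of the steps above.

```python
import shutil
import sys


def find_python_commands(candidates=("python", "python3", "py")):
    """Map each candidate command name to its location on PATH, or None if absent."""
    return {name: shutil.which(name) for name in candidates}


# Version of the interpreter that is running this script
print("Running Python", sys.version.split()[0])

# Commands that would start Python from a terminal on this machine
for name, path in find_python_commands().items():
    print(f"{name}: {path if path else 'not found on PATH'}")
```

On Windows, a path inside a `WindowsApps` folder may be only the Microsoft Store shortcut mentioned in step 3 rather than a real installation.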
How to Install Python on Windows Using Anaconda

There are different distributions of Python that come pre-installed with relevant packages and tools.
One of the most popular distributions is the Anaconda Python distribution, which comes
preinstalled with a host of data science packages and tools for scientific computing. To download
Python using an Anaconda distribution, follow these steps:

1. Go to the Anaconda download page.

2. Scroll down to the “Anaconda Installers” section, where you will find different versions of the
Anaconda Installer. Click on the Windows installer for the latest version of Python (at the time of
writing, it is the "64-Bit Graphical Installer" for Python 3.9).

3. Download the installer file to your local machine.

4. Once the download is complete, double-click the file to begin the installation process.

5. Work through the installer screens by clicking Continue and accepting the license agreement.

6. In the “Advanced Installation Options” screen, you have the option to “Add Anaconda3 to my
PATH environment variable”. This is recommended only if Anaconda is your only Python
installation (rather than one of multiple versions) and you want to use the conda tool from the terminal
(rather than from an IDE).

7. The installer will extract the files and start the installation process. This may take a few minutes.
When the installation is complete, you will be prompted to optionally install DataSpell, a data
science IDE developed by JetBrains.

8. Once the installation is successfully completed, you will see a “Thanks for installing Anaconda”
screen. Press “Finish.”

9. Finally, follow the instructions in the section "Checking if Python is
Already Installed on Your Windows Machine" to check that Python has been installed correctly.


How to Install Python Packages

Python is modular, with a large ecosystem of packages that provide functionality for specific data
science tasks. For example, the pandas package provides functionality for data manipulation,
scikit-learn provides machine learning functionality, and PyTorch provides deep learning
functionality. There are two package management tools for installing Python packages: pip3 and
conda. These tools allow you to install and upgrade Python packages.

Installing packages with pip3

Use pip3 if you installed Python from the Python website or the Microsoft Store. To install
packages with pip3, follow these steps:

1. Before you attempt to install Python packages, make sure Python is installed on your machine. To
install Python, follow the instructions in one of the previous sections (such as downloading and
installing from the website or using the Microsoft Store).

2. To install a package using pip3, open a Terminal on macOS or Command Prompt on Windows
and type the following command:

pip3 install {package_name}

The {package_name} here refers to a package you want to install.
For example, to install the numpy package, you would type:

pip3 install numpy

3. If the package has dependencies (i.e., it requires other packages for it to function), pip3 will
automatically install them as well.

4. Once the installation is complete, you can import the package into your Python code. For
example, if you installed the numpy package, you can import it with `import numpy as np`.
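
A slightly fuller sketch of importing and using the package, assuming numpy has been installed as described in step 2 (the array values are illustrative only):

```python
import numpy as np

# Build a small array and compute a few summary values
values = np.array([2, 4, 6, 8])

print("Array:", values)
print("Mean:", values.mean())
print("Doubled:", values * 2)
```

If the import fails with `ModuleNotFoundError`, the package was most likely installed for a different Python installation than the one running the script.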

Installing packages with conda

Use conda if you installed Python from Anaconda. Anaconda comes with many Python packages for
data science pre-installed, so you don't need to install common packages yourself.
To install packages with conda, follow these steps:

1. Before you attempt to install Python packages, make sure Python is installed on your machine. To
install Python, follow the instructions in one of the previous sections (such as downloading and
installing from Anaconda).

2. To install a package using conda, open a Terminal on macOS or Command Prompt on Windows
and type the following command:

conda install {package_name}

For example, to install the pytorch package, you would type:

conda install pytorch

3. If you want to update a package to the latest compatible version, you can use the conda update
command.

conda update {package_name}

For example, to update the pytorch package to the latest version, you would type:

conda update pytorch

4. If you want to uninstall a package, you can use the conda remove command.

conda remove {package_name}

For example, to uninstall the pytorch package, you would type:

conda remove pytorch

5. To list all the packages that are installed, use conda's list command.

conda list
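
Whichever tool was used, the result can also be checked from Python. The following is a minimal sketch using the standard-library `importlib.util.find_spec`; the helper name `check_packages` and the list of package names are assumptions chosen to match the libraries used in later experiments, not something the manual prescribes.

```python
import importlib.util


def check_packages(names):
    """Return a dict telling, for each top-level package name, whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Hypothetical list of packages needed for the later experiments
required = ["numpy", "pandas", "matplotlib", "seaborn"]

for name, available in check_packages(required).items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

Any package reported as missing can then be installed with `pip3 install` or `conda install` as described above.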

Result:

Thus, the installation process was executed successfully.


EX.NO: 2
Perform exploratory data analysis (EDA) with datasets like email dataset. Export all your emails as a dataset, import them inside a pandas data frame, visualize them and get different insights from the data.
DATE:

AIM:

To perform exploratory data analysis (EDA) on an email dataset: export all your emails as a
dataset, import them into a pandas data frame, visualize them, and derive insights from
the data.

Algorithm
Step 1. Start
Step 2. Export email dataset from email client and save it as a csv file
Step 3. Load the email dataset using pandas and numpy
Step 4. Perform EDA
Step 5. Stop
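
Step 2 depends on the email client. As a minimal sketch, assuming the client can export an mbox file (the path arguments, the helper name `mbox_to_csv`, and the choice of header fields are all hypothetical), the standard-library `mailbox` and `csv` modules can turn the export into a CSV file that pandas can read:

```python
import csv
import mailbox


def mbox_to_csv(mbox_path, csv_path):
    """Write one CSV row per message with a few header fields; return the message count."""
    fields = ["from", "to", "subject", "date"]
    box = mailbox.mbox(mbox_path)
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(fields)
        for message in box:
            # Header lookup is case-insensitive; missing headers become empty strings
            writer.writerow([str(message.get(name, "")) for name in fields])
            count += 1
    box.close()
    return count
```

For example, `mbox_to_csv("emails.mbox", "emails.csv")` followed by `pd.read_csv("emails.csv")` would give a data frame with one row per message. The program below instead uses a ready-made spam dataset.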
Program

# Libraries for data handling
import pandas as pd
import numpy as np

# Libraries for visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
import plotly.express as px
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
import plotly.graph_objs as go
from wordcloud import WordCloud

# Load the dataset, then inspect its size and first rows
df = pd.read_csv('../input/spam-email/spam.csv')
print(df.shape)
df.head(3)

# Visualize where values are missing
msno.matrix(df).set_title('Distribution of missing values', fontsize=20)

# Pie chart of the number of messages in each category
category_ct = df['Category'].value_counts()
fig = px.pie(values=category_ct.values,
             names=category_ct.index,
             color_discrete_sequence=px.colors.sequential.OrRd,
             title='Pie Graph: spam or not')
fig.update_traces(hoverinfo='label+percent', textinfo='label+value+percent', textfont_size=15,
                  marker=dict(line=dict(color='#000000', width=2)))
fig.show()
Output:

Result:
The experiment has been implemented successfully.
EX.NO: 3
Working with Numpy arrays, Pandas data frames, Basic plots using Matplotlib.
DATE:

AIM:

To work with NumPy arrays and Pandas data frames, and to draw basic plots using Matplotlib.

Algorithm
Step 1. Start
Step 2. Install required libraries (if not already installed).
Step 3. Create and manipulate NumPy arrays.
Step 4. Create and manipulate Pandas DataFrames.
Step 5. Perform basic statistical operations on NumPy arrays and Pandas DataFrames.
Step 6. Plot basic visualizations using Matplotlib.
Step 7. Display outputs and plots.
Step 8. Stop
Program
# Step 1: Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Step 2: Create and manipulate NumPy arrays
# Create a 2D array of shape (3, 3)
numpy_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("NumPy Array:\n", numpy_array)

# Perform basic operations on the array
array_sum = np.sum(numpy_array)
array_mean = np.mean(numpy_array)
print("\nSum of elements in NumPy Array:", array_sum)
print("Mean of elements in NumPy Array:", array_mean)

# Step 3: Create and manipulate Pandas DataFrame
# Create a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000]
}
df = pd.DataFrame(data)
print("\nPandas DataFrame:\n", df)

# Step 4: Perform basic statistical operations on the DataFrame
df_summary = df.describe()
print("\nSummary Statistics of DataFrame:\n", df_summary)

# Step 5: Plot basic visualizations using Matplotlib
# Line plot of Salary vs Age
plt.figure(figsize=(8, 6))
plt.plot(df['Age'], df['Salary'], marker='o', color='b', label='Salary vs Age')
plt.title('Salary vs Age')
plt.xlabel('Age')
plt.ylabel('Salary')
plt.grid(True)
plt.legend()
plt.show()

# Step 6: Create a bar plot of Names vs Salary
plt.figure(figsize=(8, 6))
plt.bar(df['Name'], df['Salary'], color='green')
plt.title('Salary by Employee Name')
plt.xlabel('Name')
plt.ylabel('Salary')
plt.show()
OUTPUT

Result:
Thus the program has been executed successfully
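
The program above only sums and averages the array and summarizes the data frame. As a minimal sketch of a few further manipulation operations on the same sample data (the variable names and the derived column `Salary_in_thousands` are illustrative, not part of the manual's program):

```python
import numpy as np
import pandas as pd

numpy_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Indexing and slicing
second_row = numpy_array[1]          # one row
last_column = numpy_array[:, -1]     # one column

# Boolean masking keeps only the elements that satisfy a condition
even_values = numpy_array[numpy_array % 2 == 0]

# Aggregating along an axis: axis=0 works down each column
column_means = numpy_array.mean(axis=0)

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
})

# Derive a new column, filter rows, and sort
df['Salary_in_thousands'] = df['Salary'] / 1000
older_than_30 = df[df['Age'] > 30]
sorted_by_salary = df.sort_values('Salary', ascending=False)

print(even_values)
print(column_means)
print(older_than_30)
print(sorted_by_salary)
```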
EX.NO: 4
Explore various variable and row filters in R for cleaning data. Apply various plot features in R on sample data sets and visualize.
DATE:

AIM:

To explore various variable and row filters in R for cleaning data, and to apply various plot features in R on
sample data sets and visualize them.

Algorithm
Step 1. Start
Step 2. Clean data
Step 3. Filter data
Step 4. Select variables
Step 5. Drop variables
Step 6. Rename variables
Step 7. Filtering row variables
Step 8. Plotting features in R
Step 9. Visualization techniques implementation
Step 10. Stop
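
The experiment itself is to be carried out in R. For comparison with the Python experiments in this manual, Steps 2 to 7 can be sketched with pandas as follows. This is an analogue under stated assumptions, not the R program: the sample data, column names, and thresholds are all hypothetical.

```python
import pandas as pd

# Hypothetical sample data with one missing value, used only for illustration
raw = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "height_cm": [150.0, 162.5, None, 171.0, 180.2],
    "weight_kg": [50.0, 61.0, 58.5, 70.2, 82.0],
    "notes": ["a", "b", "c", "d", "e"],
})

# Clean data: remove rows where height is missing
cleaned = raw.dropna(subset=["height_cm"])

# Select variables: keep only the named columns
selected = cleaned[["id", "height_cm", "weight_kg"]]

# Drop variables: remove a column by name
dropped = cleaned.drop(columns=["notes"])

# Rename variables
renamed = dropped.rename(columns={"height_cm": "height", "weight_kg": "weight"})

# Filter rows: one condition, then two conditions combined with &
tall = renamed[renamed["height"] > 160]
tall_and_light = renamed[(renamed["height"] > 160) & (renamed["weight"] < 75)]

print(tall_and_light)
```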

Program

1. Working with Numpy


RESULT:

Thus, the program is executed successfully


EX.NO: 5
Perform Time Series Analysis and apply the various visualization techniques
DATE

AIM:

To perform time series analysis and apply the various visualization techniques.

Algorithm
Step 1. Start
Step 2.
Step 3.
Step 4.
Step 5.
Step 6.
Step 7.
Step 8.
Step 9.
Step 10.

Program
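
A minimal sketch of one way to approach this experiment, assuming a synthetic daily series. The data, the series name `value`, the function names, and the 30-day window are illustrative choices and are not taken from the manual.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def make_daily_series(days=365, seed=0):
    """Build a synthetic daily series: linear trend + weekly cycle + random noise."""
    rng = np.random.default_rng(seed)
    index = pd.date_range("2024-01-01", periods=days, freq="D")
    trend = np.linspace(10, 20, days)
    weekly = 2 * np.sin(2 * np.pi * np.arange(days) / 7)
    noise = rng.normal(0, 1, days)
    return pd.Series(trend + weekly + noise, index=index, name="value")


def summarize(series, window=30):
    """Return the rolling mean (smooths the weekly cycle) and the monthly mean."""
    rolling_mean = series.rolling(window=window).mean()
    monthly_mean = series.resample("MS").mean()
    return rolling_mean, monthly_mean


def plot_series(series, rolling_mean, monthly_mean):
    """Draw a line plot with the rolling mean and a bar chart of monthly means."""
    fig, axes = plt.subplots(2, 1, figsize=(10, 8))
    series.plot(ax=axes[0], alpha=0.5, label="Daily value")
    rolling_mean.plot(ax=axes[0], label="Rolling mean")
    axes[0].set_title("Daily values with rolling mean")
    axes[0].legend()
    monthly_mean.plot(kind="bar", ax=axes[1])
    axes[1].set_title("Monthly mean")
    fig.tight_layout()
    return fig


series = make_daily_series()
rolling_mean, monthly_mean = summarize(series)
```

Calling `plot_series(series, rolling_mean, monthly_mean)` followed by `plt.show()` displays the two charts. With real data, `make_daily_series` would be replaced by reading a file with a parsed date column set as the index.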
OUTPUT

RESULT

The Time Series Analysis and the various visualization techniques are implemented successfully
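EX.NO: 6
Build cartographic visualization for multiple datasets involving various countries of the world; states and districts in India etc.
DATE:

Aim:
To build cartographic visualizations for multiple datasets involving various countries of the world, and states and districts in India.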
Algorithm
Step 1. Start
Step 2: Install required libraries: geopandas, matplotlib, and plotly.
Step 3: Load world shapefile data using GeoPandas.
Step 4: Load or create the dataset (e.g., population, GDP) for different countries or regions.
Step 5: Merge the shapefile data with your dataset using GeoPandas for geographical plotting.
Step 6: Plot the data on a world map using Matplotlib.
Step 7: Load shapefiles for Indian states and districts.
Step 8: Plot specific regions like India's states or districts on a map.
Step 9: (Optional) Use interactive Plotly for creating interactive maps.
Step 10: Display the cartographic visualizations.
Step 11. Stop
Program
# Step 1: Import necessary libraries
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px

# Step 2: Load world shapefile data using GeoPandas
# Note: gpd.datasets was removed in GeoPandas 1.0. With newer versions, download the
# Natural Earth countries shapefile and pass its path to gpd.read_file() instead.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# Step 3: Load or create the dataset
# For this example, let's create some dummy data for population
data = {
    'iso_a3': ['USA', 'CHN', 'IND', 'RUS', 'BRA'],
    'population': [331002651, 1439323776, 1380004385, 145934462, 212559417]
}
df_data = pd.DataFrame(data)

# Step 4: Merge shapefile data with our dataset using GeoPandas
# A left merge keeps every country; those without data get a missing population
world = world.merge(df_data, on='iso_a3', how='left')

# Step 5: Plot the world map and display population data using Matplotlib
fig, ax = plt.subplots(1, 1, figsize=(15, 10))
world.boundary.plot(ax=ax)
world.plot(column='population', cmap='OrRd', legend=True, ax=ax, missing_kwds={
    "color": "lightgrey",
    "label": "No Data"
})
plt.title("World Population Distribution")
plt.show()

# Step 6: Load shapefile for Indian states (replace the placeholder path with your file)
india_states = gpd.read_file('path_to_india_states_shapefile.shp')

# Step 7: Plot India's states on a map
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
india_states.boundary.plot(ax=ax, color='black')
plt.title("India States Boundary Map")
plt.show()

# Step 8: Load shapefile for Indian districts (Optional)
india_districts = gpd.read_file('path_to_india_districts_shapefile.shp')

# Plot Indian districts on a map
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
india_districts.boundary.plot(ax=ax, color='black')
plt.title("India Districts Boundary Map")
plt.show()

# Step 9: Optional - Interactive map of world population using Plotly
fig = px.choropleth(world,
                    locations="iso_a3",
                    color="population",
                    hover_name="name",
                    color_continuous_scale=px.colors.sequential.Plasma)
fig.update_layout(title="World Population Interactive Map", title_x=0.5)
fig.show()

Result:

The program has been executed successfully
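
The merge step decides which countries end up coloured and which fall back to "No Data". A minimal sketch of how to check the join keys before plotting, using plain pandas frames as stand-ins for the shapefile attributes (the country list, the code `XXX`, and the variable names are hypothetical):

```python
import pandas as pd

# Stand-in for the attribute table of the world GeoDataFrame (geometry omitted)
world_attrs = pd.DataFrame({
    "name": ["United States of America", "China", "India", "France"],
    "iso_a3": ["USA", "CHN", "IND", "FRA"],
})

# Dataset to join; "XXX" is a deliberately invalid code
df_data = pd.DataFrame({
    "iso_a3": ["USA", "CHN", "IND", "XXX"],
    "population": [331002651, 1439323776, 1380004385, 1],
})

# indicator=True adds a _merge column showing where each row came from
merged = world_attrs.merge(df_data, on="iso_a3", how="left", indicator=True)

# Countries on the map that received no data (they would be drawn as "No Data")
unmatched_countries = merged.loc[merged["_merge"] == "left_only", "name"].tolist()

# Codes in the dataset that match no country on the map (silently dropped by a left merge)
unmatched_codes = sorted(set(df_data["iso_a3"]) - set(world_attrs["iso_a3"]))

print("Countries without data:", unmatched_countries)
print("Codes not found on the map:", unmatched_codes)
```

The same check applies when merging state or district data onto the India shapefiles, where name spellings often differ between sources.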


EX.NO: 7
Perform EDA on Wine Quality Data Set.
DATE

Aim:
Perform EDA on Wine Quality Data Set
Algorithm
Step 1. Start
Step 2: Import necessary libraries.
Step 3: Load the Wine Quality dataset.
Step 4: Perform basic dataset exploration (e.g., checking missing values, data types, summary statistics).
Step 5: Visualize distributions of key features.
Step 6: Analyze correlations between features.
Step 7: Visualize relationships between key variables and the target variable (wine quality).
Step 8: Conclude with insights derived from the analysis.
Step 9: Stop
Program
# Step 1: Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Step 2: Load the Wine Quality dataset


# Download from: https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
file_path = 'winequality-red.csv'
wine_data = pd.read_csv(file_path, sep=';')

# Step 3: Basic dataset exploration


print("First 5 rows of the dataset:")
print(wine_data.head())

print("\nSummary statistics:")
print(wine_data.describe())

# Checking for missing values


print("\nMissing values in the dataset:")
print(wine_data.isnull().sum())

# Step 4: Visualize distributions of key features


# Distribution of Wine Quality
plt.figure(figsize=(8, 6))
sns.countplot(x='quality', data=wine_data)
plt.title('Distribution of Wine Quality')
plt.xlabel('Quality')
plt.ylabel('Count')
plt.show()

# Step 5: Analyze correlations between features


# Correlation matrix
correlation_matrix = wine_data.corr()
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Heatmap of Wine Quality Features')
plt.show()

# Step 6: Visualize relationships between key variables and wine quality


# Alcohol vs. Quality
plt.figure(figsize=(8, 6))
sns.boxplot(x='quality', y='alcohol', data=wine_data)
plt.title('Alcohol Content vs Wine Quality')
plt.xlabel('Wine Quality')
plt.ylabel('Alcohol Content')
plt.show()

# Step 7: Acidity vs. Quality (Citric Acid)


plt.figure(figsize=(8, 6))
sns.boxplot(x='quality', y='citric acid', data=wine_data)
plt.title('Citric Acid vs Wine Quality')
plt.xlabel('Wine Quality')
plt.ylabel('Citric Acid Content')
plt.show()

# Step 8: Sulfates vs. Quality


plt.figure(figsize=(8, 6))
sns.boxplot(x='quality', y='sulphates', data=wine_data)
plt.title('Sulfates vs Wine Quality')
plt.xlabel('Wine Quality')
plt.ylabel('Sulfates')
plt.show()
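
Step 8 of the algorithm asks for insights. One simple way to support that step is to rank the features by the strength of their correlation with quality. The helper below is a minimal sketch; the function name `rank_features_by_correlation` is hypothetical, and the small data frame only demonstrates the call and does not contain real wine measurements.

```python
import pandas as pd


def rank_features_by_correlation(data, target="quality"):
    """Return correlations with the target, ordered by absolute value (strongest first)."""
    numeric = data.select_dtypes("number")
    correlations = numeric.corr()[target].drop(target)
    order = correlations.abs().sort_values(ascending=False).index
    return correlations.reindex(order)


# Made-up values, for demonstration only
sample = pd.DataFrame({
    "alcohol": [9.0, 10.0, 11.0, 12.0, 13.0],
    "volatile acidity": [0.9, 0.7, 0.8, 0.5, 0.4],
    "pH": [3.3, 3.1, 3.4, 3.2, 3.3],
    "quality": [3, 4, 5, 6, 7],
})

ranked = rank_features_by_correlation(sample)
print(ranked)
```

With the real dataset, `rank_features_by_correlation(wine_data)` gives a ranking that can be read alongside the heatmap and box plots. Correlation only measures linear association, so it should be treated as a starting point for the conclusions.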
Output

Result: Thus, exploratory data analysis on the Wine Quality data set was performed successfully.


EX.NO: 8
Use a case study on a data set and apply the various EDA and visualization techniques and present an analysis report.
DATE:

Aim:

To use a case study on a data set, apply the various EDA and visualization techniques,
and present an analysis report.

Algorithm

Step 1. Start
Step 2. Assume a dataset (`employee_data.csv`) with the following columns: `Employee_ID`,
`Age`, `Salary`, `Performance_Rating`, and `Years_of_Experience`.
Step 3. Load the dataset and display the first few rows.
Step 4. Compute summary statistics and the correlation matrix for numerical variables.
Step 5. Create a histogram to visualize the distribution of the `Age` variable.
Step 6. Use a box plot to show the distribution of `Salary` by `Performance_Rating`.
Step 7. Generate a pairplot to visualize relationships between numerical variables.
Step 8. Display a correlation heatmap to show the correlation between variables.
Step 9. Stop
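
The algorithm assumes that `employee_data.csv` already exists. When no such file is at hand, a synthetic one can be generated for practice. The sketch below is an illustration only: the function name `make_employee_data`, the value ranges, and the salary formula are invented assumptions, not real employee data.

```python
import numpy as np
import pandas as pd


def make_employee_data(rows=200, seed=42):
    """Generate a synthetic employee table with the columns the algorithm expects."""
    rng = np.random.default_rng(seed)
    age = rng.integers(22, 60, size=rows)        # ages 22 to 59
    # Experience grows with age, never below zero
    experience = np.clip(age - 22 - rng.integers(0, 5, size=rows), 0, None)
    rating = rng.integers(1, 6, size=rows)       # ratings 1 to 5
    # Invented relationship: salary rises with experience and rating, plus noise
    salary = 30000 + 2000 * experience + 3000 * rating + rng.normal(0, 5000, size=rows)
    return pd.DataFrame({
        "Employee_ID": np.arange(1, rows + 1),
        "Age": age,
        "Salary": salary.round(2),
        "Performance_Rating": rating,
        "Years_of_Experience": experience,
    })


employee_df = make_employee_data()
print(employee_df.head())
```

Saving the frame with `employee_df.to_csv("employee_data.csv", index=False)` produces a file that the program can load.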

Program
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset and display the first few rows
df = pd.read_csv('employee_data.csv')
print(df.head())

# Summary statistics
summary_stats = df.describe()
print("\nSummary Statistics:\n", summary_stats)

# Correlation matrix
correlation_matrix = df.corr()
print("\nCorrelation Matrix:\n", correlation_matrix)

# Distribution of Age
plt.figure(figsize=(8, 6))
sns.histplot(df['Age'], bins=20, kde=True)
plt.title('Distribution of Age')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

# Box plot of Salary by Performance Rating
plt.figure(figsize=(10, 6))
sns.boxplot(x='Performance_Rating', y='Salary', data=df)
plt.title('Salary Distribution by Performance Rating')
plt.xlabel('Performance Rating')
plt.ylabel('Salary')
plt.show()

# Pairplot for multiple variables
sns.pairplot(df, hue='Performance_Rating')
plt.suptitle('Pairplot of Employee Data', y=1.02)
plt.show()

# Correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Heatmap')
plt.show()

OUTPUT:

RESULT

Thus, the case study on a data set was carried out by applying the various EDA and
visualization techniques, and the program was executed and verified successfully.
