Ad3301 Data Exploration and Visualization

Uploaded by

vishveswari surendran

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

VARUVAN VADIVELAN INSTITUTE OF TECHNOLOGY

DHARMAPURI – 636701

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

AD3301

DATA EXPLORATION AND VISUALIZATION LABORATORY


PRACTICAL EXERCISES:

1. Install a data analysis and visualization tool: R / Python / Tableau Public / Power BI.
2. Perform exploratory data analysis (EDA) on datasets such as an email data set. Export all your emails as a
dataset, import them into a pandas data frame, visualize them, and get different insights from the data.
3. Work with NumPy arrays, Pandas data frames, and basic plots using Matplotlib.
4. Explore various variable and row filters in R for cleaning data. Apply various plot features in R on sample
data sets and visualize them.
5. Perform time series analysis and apply the various visualization techniques.
6. Perform data analysis and representation on a map using various map data sets with mouse rollover
effect, user interaction, etc.
7. Build cartographic visualizations for multiple datasets involving various countries of the world, and
states and districts in India.
8. Perform EDA on the Wine Quality data set.
9. Use a case study on a data set, apply the various EDA and visualization techniques, and present an
analysis report.


LIST OF EXPERIMENTS
S.NO EXPERIMENTS PAGE NO MARKS SIGNATURE


EX NO: 1
DATE: INSTALLING DATA ANALYSIS AND VISUALIZATION TOOL

AIM:
To write the steps to install a data analysis and visualization tool: R / Python / Tableau Public / Power BI.

PROCEDURE:
R:
• R is a programming language and software environment specifically designed for statistical
computing and graphics.
Windows:
• Download R from the official website: https://cran.r-project.org/mirrors.html
• Run the installer and follow the installation instructions.
macOS:
• Download R for macOS from the official website: https://cran.r-project.org/mirrors.html
• Open the downloaded file and follow the installation instructions.
Linux:
• You can typically install R using your distribution's package manager. For example, on Ubuntu, you
can use the following command:
sudo apt-get install r-base
Python:
• Python is a versatile programming language widely used for data analysis. You can install Python
and data analysis libraries using a package manager like conda or pip.
Windows:
• Download Python from the official website: https://www.python.org/downloads/windows/
• Run the installer, and make sure to check the "Add Python to PATH" option during installation.
• You can install data analysis libraries like NumPy, pandas, and matplotlib using pip.
macOS:
• macOS may come with a version of Python pre-installed, but it is often outdated or missing on recent
releases, so installing a current Python 3 from the official website is safer. You can install additional
packages using pip or set up a virtual environment using Anaconda.
Linux:
• Python is often pre-installed on Linux. Use your distribution's package manager to install Python if
it's not already installed. You can also use conda or pip to manage Python packages.
Tableau Public:
• Tableau Public is a free version of Tableau for creating and sharing interactive data visualizations.
• Go to the Tableau Public website: https://public.tableau.com/s/gallery
• Download and install Tableau Public by following the instructions on the website.
Power BI:
• Power BI is a business analytics service by Microsoft for creating interactive reports and dashboards.
• Go to the Power BI website: https://powerbi.microsoft.com/en-us/downloads/
• Download and install Power BI Desktop, which is the tool for creating reports and dashboards.
• Please note that the installation steps may change over time, so it's a good idea to check the official
websites for the most up-to-date instructions and download links. Additionally, system requirements
may vary, so make sure your computer meets the necessary specifications for these tools.
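
After installing Python, it can help to confirm that the libraries used in the later exercises are actually importable. The following is a minimal sketch, not part of the original procedure; the package list is an assumption taken from the import lines that appear later in this manual, and the function name missing_packages is hypothetical.

```python
import importlib.util
import sys

# Packages used in the later exercises (assumed from the import lines in this manual).
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python", sys.version.split()[0])
missing = missing_packages(REQUIRED)
if missing:
    # Suggest the pip command for whatever is not installed yet.
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

Running the script in the environment you plan to use for the lab shows which packages, if any, still need a pip or conda install.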


Ex no: 2
Date: Exploratory Data Analysis (EDA) on Email Datasets

Aim:
To perform exploratory data analysis (EDA) on datasets such as an email data set.
Procedure:
Exploratory Data Analysis (EDA) on email datasets involves importing the data, cleaning it, visualizing
it, and extracting insights. The steps below describe how to perform EDA on an email dataset using
Python and Pandas.
1. Import Necessary Libraries:
Import the required Python libraries for data analysis and visualization.
2. Load Email Data:
Assuming you have a folder containing email files (e.g., .eml files), you can use the email library to
parse and extract the email contents.
3. Data Cleaning:
Depending on your dataset, you may need to clean and preprocess the data. Common
cleaning steps include handling missing values, converting dates to datetime format, and removing
duplicates.
4. Data Exploration:
Now, you can start exploring the dataset using various techniques. Here are some common EDA
tasks:
Basic Statistics:
Get summary statistics of the dataset.
Distribution of Dates:
Visualize the distribution of email dates.
5. Word Cloud for Subject or Message:
Create a word cloud to visualize common words in email subjects or messages.
6. Top Senders and Recipients:
Find the top email senders and recipients.
Depending on your dataset, you can explore further, analyze sentiment, perform network analysis, or
any other relevant analysis to gain insights from your email data.
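
Step 2 and step 6 above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the two raw messages are hand-written stand-ins for exported .eml files, and the helper name parse_email is hypothetical. It parses each message with the email module and counts the most frequent sender.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Two tiny hand-written messages standing in for exported .eml files (hypothetical data).
RAW_EMAILS = [
    "From: Alice <alice@example.com>\nTo: bob@example.com\n"
    "Subject: Meeting\nDate: Mon, 01 Jan 2024 10:00:00 +0000\n\nSee you at ten.",
    "From: Alice <alice@example.com>\nTo: carol@example.com\n"
    "Subject: Report\nDate: Tue, 02 Jan 2024 09:30:00 +0000\n\nReport attached.",
]


def parse_email(raw):
    """Extract sender, recipient, subject, date and body from one raw message."""
    msg = message_from_string(raw)
    return {
        "sender": parseaddr(msg["From"])[1],        # address part only
        "recipient": parseaddr(msg["To"])[1],
        "subject": msg["Subject"],
        "date": parsedate_to_datetime(msg["Date"]),  # converted to a datetime
        "body": msg.get_payload(),
    }


records = [parse_email(raw) for raw in RAW_EMAILS]

# Top sender: count how often each sender address occurs.
top_senders = Counter(r["sender"] for r in records).most_common(1)
print(top_senders)
```

The list of dictionaries can be passed directly to pd.DataFrame(records) to continue with the Pandas steps described above.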


Program:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
# Load the dataset (raw string so the backslashes in the Windows path are kept literally)
df = pd.read_csv(r'D:\ARCHANA\dxv\LAB\DXV\Emaildataset.csv')
# Display basic information about the dataset
print(df.info())
# Display the first few rows of the dataset
print(df.head())
# Descriptive statistics
print(df.describe())
# Check for missing values
print(df.isnull().sum())
# Visualize the distribution of numerical variables
sns.pairplot(df)
plt.show()
# Visualize the distribution of categorical variables
sns.countplot(x='label', data=df)
plt.show()
# Correlation matrix for numerical variables (text columns are skipped)
correlation_matrix = df.corr(numeric_only=True)
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.show()
# Word cloud for the email text (the 'text' column of this dataset)
text_data = ' '.join(df['text'].astype(str))
wordcloud = WordCloud(width=800, height=400, random_state=21,
                      max_font_size=110).generate(text_data)
plt.figure(figsize=(10, 7))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis('off')
plt.show()

OUTPUT:
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Unnamed: 0  5171 non-null   int64
 1   label       5171 non-null   object
 2   text        5171 non-null   object
 3   label_num   5171 non-null   int64
dtypes: int64(2), object(2)
memory usage: 161.7+ KB
None
   Unnamed: 0 label                                               text  label_num
0         605   ham  Subject: enron methanol ; meter # : 988291\r\n...          0
1        2349   ham  Subject: hpl nom for january 9 , 2001\r\n( see...          0
2        3624   ham  Subject: neon retreat\r\nho ho ho , we ' re ar...          0
3        4685  spam  Subject: photoshop , windows , office . cheap ...          1
4        2030   ham  Subject: re : indian springs\r\nthis deal is t...          0
        Unnamed: 0    label_num
count  5171.000000  5171.000000
mean   2585.000000     0.289886
std    1492.883452     0.453753
min       0.000000     0.000000
25%    1292.500000     0.000000
50%    2585.000000     0.000000
75%    3877.500000     1.000000
max    5170.000000     1.000000
Unnamed: 0    0
label         0
text          0
label_num     0
dtype: int64


Result:
Thus, exploratory data analysis (EDA) on an email data set has been performed successfully.
Ex no: 03

Date: Working with Numpy arrays, Pandas data frames , Basic plots using Matplotlib

Aim:
To write the steps for working with NumPy arrays, Pandas data frames, and basic plots using Matplotlib.
Procedure:
1. NumPy:
NumPy is a fundamental library for numerical computing in Python. It provides support for multi-
dimensional arrays and various mathematical functions. To get started, you'll first need to install NumPy if
you haven't already (you can use pip):

pip install numpy

Once NumPy is installed, you can use it as follows:

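import numpy as np
# Creating NumPy arrays
arr = np.array([1, 2, 3, 4, 5])
print(arr)
# Basic operations
mean = np.mean(arr)
total = np.sum(arr)  # named 'total' so the built-in sum() is not shadowed
print(mean, total)
# Mathematical functions (applied element by element)
square_root = np.sqrt(arr)
exponential = np.exp(arr)
print(square_root)
print(exponential)
# Indexing and slicing
first_element = arr[0]
sub_array = arr[1:4]  # elements at positions 1, 2 and 3
print(first_element, sub_array)
# Array operations
combined_array = np.concatenate([arr, sub_array])
print(combined_array)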

OUTPUT:

2. Pandas:
Pandas is a powerful library for data manipulation and analysis.
You can install Pandas using pip:
pip install pandas
Here's how to work with Pandas DataFrames:
import pandas as pd

# Creating a DataFrame from a dictionary

data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
'Age': [25, 30, 35, 28, 22],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
}

df = pd.DataFrame(data)
# Display the entire DataFrame
print("DataFrame:")
print(df)
# Accessing specific columns
print("\nAccessing 'Name' column:")
print(df['Name'])
# Adding a new column
df['Salary'] = [50000, 60000, 75000, 48000, 55000]
# Filtering data
print("\nPeople older than 30:")
print(df[df['Age'] > 30])
# Sorting by a column
print("\nSorting by 'Age' in descending order:")
print(df.sort_values(by='Age', ascending=False))
# Aggregating data
print("\nAverage age:")
print(df['Age'].mean())
# Grouping and aggregation
grouped_data = df.groupby('City')['Salary'].mean()
print("\nAverage salary by city:")
print(grouped_data)
# Applying a function to a column
df['Age_Squared'] = df['Age'].apply(lambda x: x ** 2)
# Removing a column
df = df.drop(columns=['Age_Squared'])
# Saving the DataFrame to a CSV file
df.to_csv('output.csv', index=False)
# Reading a CSV file into a DataFrame
new_df = pd.read_csv('output.csv')
print("\nDataFrame from CSV file:")
print(new_df)

OUTPUT:


3. Matplotlib:

Matplotlib is a popular library for creating static, animated, or interactive plots and graphs.
Install Matplotlib using pip:
pip install matplotlib
Here's a simple example of creating a basic plot:
import numpy as np
import matplotlib.pyplot as plt
# Sample data: 100 evenly spaced points between 0 and 10
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Create a line plot
plt.figure(figsize=(8, 6))
plt.plot(x, y, label='Sine Wave')
plt.title('Sine Wave Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()


OUTPUT:

RESULT:
Thus, working with NumPy arrays, Pandas data frames, and basic Matplotlib plots has been completed successfully.
Ex no:4
Date: Exploring various variable and row filters in R for cleaning data

Aim:
To explore various variable and row filters in R for cleaning data, and to apply plot features on sample data.
PROCEDURE:
Data Preparation and Cleaning
First, let's create a sample dataset and then explore various variable and row filters to clean the data.

# Create a sample dataset

set.seed(123)
data <- data.frame(
ID = 1:10,
Age = sample(18:60, 10, replace = TRUE),
Gender = sample(c("Male", "Female"), 10, replace = TRUE),
Score = sample(1:100, 10)
)
# Print the sample data
print(data)
OUTPUT:

Variable Filters
1. Filtering by a Specific Value:
To filter rows based on a specific value in a variable (e.g., only show rows where Age is greater than
30):
filtered_data <- data[data$Age > 30, ]


2. Filtering by Multiple Conditions:

You can filter rows based on multiple conditions using the & (AND) or | (OR) operators (e.g., show
rows where Age is greater than 30 and Gender is "Male"):
filtered_data <- data[data$Age > 30 & data$Gender == "Male", ]
Row Filters
1. Removing Duplicate Rows:
To remove duplicate rows based on a key column (e.g., keep only the first row for each 'ID'):
cleaned_data <- data[!duplicated(data$ID), ]
2. Removing Rows with Missing Values:
To remove rows with missing values (NA) from the de-duplicated data:
cleaned_data <- na.omit(cleaned_data)
Data Visualization
1. Apply various plot features using the ggplot2 package to visualize the cleaned data.
# Load the ggplot2 package
library(ggplot2)
# Create a scatterplot of Age vs. Score with points colored by Gender
ggplot(data = cleaned_data, aes(x = Age, y = Score, color = Gender)) +
geom_point() +
labs(title = "Scatterplot of Age vs. Score",
x = "Age",
y = "Score")
# Create a histogram of Age
ggplot(data = cleaned_data, aes(x = Age)) +
geom_histogram(binwidth = 5, fill = "blue", alpha = 0.5) +
labs(title = "Histogram of Age",
x = "Age",
y = "Frequency")
# Create a bar chart of Gender distribution
ggplot(data = cleaned_data, aes(x = Gender)) +
geom_bar(fill = "green", alpha = 0.7) +
labs(title = "Gender Distribution",
x = "Gender",
y = "Count")

RESULT:
Thus, various variable and row filters in R for cleaning data have been explored, and the cleaned data has been visualized successfully.

EXNO: 5 PERFORM EDA ON WINE QUALITY DATA SET.

DATE

AIM:
To write a program to perform EDA on the Wine Quality data set.
PROGRAM:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset
data = pd.read_csv("pathname")
# Display the first few rows of the dataset
print(data.head())
# Get information about the dataset
print(data.info())
# Summary statistics
print(data.describe())
# Distribution of wine quality (column passed by name through x=)
sns.countplot(x='quality', data=data)
plt.title("Wine Quality data set")
plt.show()
# Box plots for selected features by wine quality
features = ['alcohol', 'volatile acidity', 'citric acid', 'residual sugar']
for feature in features:
    plt.figure(figsize=(8, 6))
    sns.boxplot(x='quality', y=feature, data=data)
    plt.title(f'{feature} by Wine Quality')
    plt.show()
# Pair plot of selected features
sns.pairplot(data, vars=features, hue='quality', diag_kind='kde')
plt.suptitle("Pair Plot of Selected Features")
plt.show()
# Correlation heatmap (numeric columns only)
corr_matrix = data.corr(numeric_only=True)
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()
# Histograms of selected features
for feature in features:
    plt.figure(figsize=(6, 4))
    sns.histplot(data[feature], kde=True, bins=20)
    plt.title(f"Distribution of {feature}")
    plt.show()


OUTPUT:


RESULT:
Thus, the program to perform EDA on the Wine Quality data set has been executed successfully.

EX NO:6
DATE: TIME SERIES ANALYSIS USING VARIOUS VISUALIZATION
TECHNIQUES
AIM:
To perform time series analysis and apply the various visualization techniques.

DOWNLOADING DATASET:
Step 1: Open a web browser, enter the following address in the address bar, and download a dataset.
http://github.com/jbrownlee/Datasets
Step 2: Write the following code to load the data and get its details.
from pandas import read_csv
from matplotlib import pyplot
# Assumes the first column of the file holds the dates, so it is parsed and used as the index
series = read_csv('pathname', header=0, index_col=0, parse_dates=True)
print(series.head())
series.plot()
pyplot.show()

OUTPUT:


Step 3: To get the time series line plot:

series.plot(style='-.')
pyplot.show()

Step 4:
To create a Histogram:
series.hist()
pyplot.show()


Step 5:
To create density plot:
series.plot(kind='kde')
pyplot.show()
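
Two further views that are often applied to a time series are a rolling mean and a lagged copy of the values. The sketch below is an illustration only, not part of the original steps: the five-row CSV text is hypothetical stand-in data, and the column names Date and Value are assumptions.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded CSV: one date column, one value column.
CSV_TEXT = """Date,Value
2024-01-01,10
2024-01-02,12
2024-01-03,14
2024-01-04,13
2024-01-05,15
"""

# Parse the first column as dates and use it as the index.
series = pd.read_csv(StringIO(CSV_TEXT), header=0, index_col=0, parse_dates=True)

# 3-point rolling mean: smooths short-term fluctuations (first two entries are NaN).
rolling = series["Value"].rolling(window=3).mean()

# Lagged copy: each observation paired with the previous one.
lagged = pd.DataFrame({"t-1": series["Value"].shift(1), "t": series["Value"]})

print(rolling)
print(lagged)
```

Calling rolling.plot() followed by pyplot.show() draws the smoothed line, and pandas.plotting.lag_plot(series["Value"]) draws the lag scatter plot directly.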


Result:
Thus, time series analysis has been performed using various visualization techniques.


EX NO: 7
DATE: DATA ANALYSIS AND REPRESENTATION ON A MAP

AIM:
Write a program to perform data analysis and representation on a map using various map data sets
with mouse rollover effect, user interaction.
PROCEDURE:
STEP 1:
• Make sure to install the necessary libraries.
pip install geopandas folium bokeh
PROGRAM:
from bokeh.io import show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
from bokeh.layouts import column
import pandas as pd
import folium
# Load your data (raw string keeps the backslashes in the Windows path literal);
# the file is expected to have Latitude, Longitude and Info columns
data = pd.read_csv(r'D:\ARCHANA\dxv\LAB\DXV\geographic.csv')
# Create a Bokeh figure
p = figure(width=800, height=400, tools='pan,wheel_zoom,reset')
# Create a ColumnDataSource to hold data
source = ColumnDataSource(data)
# Add circle markers to the figure (scatter draws circles by default)
p.scatter(x='Longitude', y='Latitude', size=10, source=source, color='orange')
# Create a hover tool for mouse rollover effect
hover = HoverTool()
hover.tooltips = [("Info", "@Info"), ("Latitude", "@Latitude"),
                  ("Longitude", "@Longitude")]
p.add_tools(hover)
# Display the Bokeh plot
layout = column(p)
show(layout)
# Create a map centered on the mean position of the data points
m = folium.Map(location=[data['Latitude'].mean(), data['Longitude'].mean()],
               zoom_start=10)
# Add markers for your data points
for index, row in data.iterrows():
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        tooltip=str(row['Info']),  # Display info on mouse rollover
        popup=str(row['Info']),  # Display additional info on mouse click
    ).add_to(m)
# Save the map to an HTML file
m.save('map.html')

OUTPUT:


RESULT:
Data analysis and representation on a map using various map data sets with mouse rollover effect,
user interaction has been completed successfully.


EX NO: 8
DATE: BUILDING CARTOGRAPHIC VISUALIZATION

AIM:
Build cartographic visualization for multiple datasets involving various countries of the world;
states and districts in India etc
PROCEDURE:
STEP 1:
Collect Datasets
Gather the datasets containing geographical information for countries, states, or districts. Make sure these
datasets include the necessary attributes for mapping (e.g., country/state/district names, codes, and
relevant data).
STEP 2:
Install Required Libraries:
pip install geopandas matplotlib
STEP 3:
Load Geographic Data:
Use Geopandas to load the geographic data for countries, states, or districts. Make sure to match the
geographical data with your datasets based on the common attributes.
STEP 4:
Merge Datasets:
Merge your datasets with the geographic data based on common attributes. This step is crucial for linking
your data to the corresponding geographic regions.
STEP 5:
Create Cartographic Visualizations:
Use Matplotlib to create cartographic visualizations. You can create separate plots for different datasets
or overlay them on a single map.
STEP 6:
Customize and Enhance:
Customize your visualizations based on your needs. You can add legends, labels, titles, and other
elements to enhance the interpretability of your maps.
STEP 7:
Save and Share:
Save your visualizations as image files or interactive plots if needed. You can then share these
visualizations with others.
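
Step 4 (merging) is where mismatched region names or codes usually surface. The sketch below illustrates the idea with plain Pandas tables; the state codes, the column names state_code and value, and all numbers are hypothetical. A left merge with indicator=True keeps every region and flags those that have no matching data row.

```python
import pandas as pd

# Hypothetical region table, standing in for the attribute table of a geographic file.
regions = pd.DataFrame({
    "state_code": ["TN", "KA", "KL"],
    "state_name": ["Tamil Nadu", "Karnataka", "Kerala"],
})

# Hypothetical values to be mapped; "GA" has no matching region above.
stats = pd.DataFrame({
    "state_code": ["TN", "KA", "GA"],
    "value": [10, 20, 30],
})

# Left merge keeps every region; the _merge column records where each row came from.
merged = regions.merge(stats, on="state_code", how="left", indicator=True)

# Regions that found no data row would be drawn empty on the map.
unmatched = merged.loc[merged["_merge"] == "left_only", "state_name"].tolist()

print(merged)
print("Regions without data:", unmatched)
```

The same merge call should work when the left table is a GeoDataFrame loaded with Geopandas, in which case the geometry column is carried along; checking the unmatched list before plotting avoids blank regions caused by spelling or code differences.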

PROGRAM:
import pandas as pd
import geopandas as gpd
# older GeoPandas versions also need 'descartes' to plot polygons
import matplotlib.pyplot as plt
df = pd.DataFrame({'city': ['Berlin', 'Paris', 'Munich'],
                   'latitude': [52.518611111111, 48.856666666667, 48.137222222222],
                   'longitude': [13.408333333333, 2.3516666666667, 11.575555555556]})
# Build point geometries from the longitude/latitude columns (WGS 84 coordinates)
gdf = gpd.GeoDataFrame(df.drop(['latitude', 'longitude'], axis=1),
                       crs='EPSG:4326',
                       geometry=gpd.points_from_xy(df.longitude, df.latitude))
print(gdf)
# Note: gpd.datasets was removed in GeoPandas 1.0; with newer versions, pass
# read_file the path of a Natural Earth shapefile downloaded separately.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
# Draw the country outlines first, then the city points on the same axes
base = world.plot(color='white', edgecolor='black')
gdf.plot(ax=base, marker='o', color='red', markersize=5)
plt.show()

OUTPUT:
city geometry
0 Berlin POINT (13.40833 52.51861)
1 Paris POINT (2.35167 48.85667)
2 Munich POINT (11.57556 48.13722)


RESULT:
Thus, a cartographic visualization for multiple datasets involving various countries of the world
has been built successfully.

EX NO :9
DATE: VISUALIZING VARIOUS EDA TECHNIQUES AS CASE STUDY FOR
EMPLOYMENT DATASET
AIM:
Use a case study on a data set and apply the various EDA and visualization techniques and
present an analysis report.
PROCEDURE:
Case Study: Employee Performance Dataset
1. Data Overview:
Dataset: Employee_Performance.csv
Variables:
Employee_ID
Age
Gender
Department
Years_Experience
Salary
Performance_Score (on a scale of 1 to 10)
Work_Hours
2. EDA and Visualization:
a. Descriptive Statistics:
import pandas as pd
# Load the dataset
data = pd.read_csv('Employee_Performance.csv')
# Display basic statistics
descriptive_stats = data.describe()
print(descriptive_stats)


OUTPUT:

b. Univariate Analysis:
import matplotlib.pyplot as plt
import seaborn as sns
# Age distribution
sns.histplot(data['Age'], bins=20, kde=True)
plt.title('Age Distribution')
plt.show()
# Salary distribution
sns.boxplot(x='Salary', data=data)
plt.title('Salary Distribution')
plt.show()


OUTPUT:

c. Bivariate Analysis:
# Gender vs Performance_Score
sns.boxplot(x='Gender', y='Performance_Score', data=data)
plt.title('Gender vs Performance Score')
plt.show()
# Department vs Work_Hours
sns.barplot(x='Department', y='Work_Hours', data=data)
plt.title('Department vs Work Hours')
plt.show()


OUTPUT:


3. Analysis Report:
Descriptive Statistics:
Provide insights into the central tendency and variability of each variable.
Age Distribution: Most employees fall in the age range of 25-40, with a median age around 30.
Salary Distribution: There are some outliers in the salary, possibly indicating variations in pay scales
or seniority levels.
Gender vs Performance Score: On average, there seems to be little difference in performance scores
between genders.
Department vs Work Hours: The R&D department tends to have slightly higher average work hours
compared to other departments.
Correlation Matrix: Identify relationships between variables. For example, a positive correlation
between years of experience and salary.
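
The correlation matrix mentioned in the report can be computed as sketched here. The five rows are hypothetical and only reuse column names from the data overview; with the real Employee_Performance.csv the same call would apply to the loaded data frame.

```python
import pandas as pd

# Hypothetical rows using column names from the data overview (not real records).
data = pd.DataFrame({
    "Department": ["R&D", "Sales", "HR", "R&D", "Sales"],
    "Years_Experience": [1, 3, 5, 7, 9],
    "Salary": [30000, 38000, 52000, 61000, 75000],
    "Work_Hours": [45, 40, 38, 46, 41],
})

# numeric_only=True skips text columns such as Department.
corr_matrix = data.corr(numeric_only=True)

print(corr_matrix.round(2))
```

Passing corr_matrix to sns.heatmap(corr_matrix, annot=True) gives the same kind of heatmap used in the earlier exercises.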

This is a basic example, and depending on your dataset, you might want to explore more
complex relationships, handle missing values, and outliers, or conduct statistical tests for further
validation.
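
One common way to flag the outliers that a box plot draws as separate points is the 1.5 x IQR rule. The sketch below uses only the standard library; the salary values are hypothetical, and the rule itself is a widely used convention rather than something prescribed by this case study.

```python
from statistics import quantiles

# Hypothetical salaries; the last value is deliberately far from the rest.
salaries = [40000, 42000, 45000, 47000, 50000, 52000, 55000, 120000]

# Quartiles with linear interpolation between data points.
q1, _median, q3 = quantiles(salaries, n=4, method="inclusive")
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in salaries if s < lower_fence or s > upper_fence]

print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```

Whether a flagged value is removed, capped, or kept depends on the analysis; a very high salary may be a genuine senior-level record rather than an error.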

RESULT:
Thus, various EDA and visualization techniques have been applied to an employee dataset as a case
study, and an analysis report has been presented.
