Eda Lab Verified
ISO 9001:2015 Certified Institution, Accredited by NBA (BME, CSE, ECE, EEE, IT & MECH),
Accredited by NAAC.
#42, Avadi-Vel Tech Road, Avadi, Chennai- 600062, Tamil Nadu, India.
NAME :
REGISTER NO :
VM NO : VM -
BRANCH : AI&DS
YEAR :IV
SEMESTER :VII
Vision
□ To promote a centre of excellence through effectual teaching and learning, imparting contemporary, knowledge-centric education through innovative research in multidisciplinary fields.
Mission
□ To impart quality technical skills through practice and continual updating of knowledge in recent technologies, and produce professionals with multidisciplinary and leadership skills.
□ To promote innovative thinking for design and development of software products of varying
complexity with intelligence to fulfil the global standards and demands.
□ To inculcate professional ethics among the graduates and to adapt to changing technologies through lifelong learning.
An Autonomous Institution
Approved by AICTE, Affiliated to Anna University, Chennai.
CERTIFICATE
Name:………………….………………………..................Year:……………Semester:………
Branch: B.TECH–ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
University Register No.: ……………………………….. College Roll No.:…………………
Certified that this is the bonafide record of work done by the above student in the
191ITV63 – EXPLORATORY DATA ANALYSIS LABORATORY during the academic
year 2024-2025.
Submitted for the University Practical Examination held on ………………... at VEL TECH
MULTI TECH Dr.RANGARAJAN Dr.SAKUNTHALA ENGINEERING COLLEGE, No.42,
AVADI – VEL TECH ROAD, AVADI, CHENNAI-600062.
Signature of Examiners:
PROGRAM OUTCOMES (POs)
PO2 – Problem Analysis: Identify, formulate, review research literature and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
PO3 – Design/Development of Solutions: Design solutions for complex engineering problems and design system components or processes that meet specified needs with appropriate consideration for public health and safety, and cultural, societal, and environmental considerations.
PO4 – Conduct Investigations of Complex Problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
PO5 – Modern Tool Usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
PO6 – The Engineer and Society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
PO7 – Environment and Sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for, sustainable development.
PO8 – Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
PO9 – Individual and Teamwork: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
PO10 – Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11 – Project Management and Finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
PO12 – Life-long Learning: Recognize the need for, and have the preparation and ability to engage in, independent and life-long learning in the broadest context of technological change.
PROGRAM EDUCATIONAL OBJECTIVES (PEOs)
PEO1 – Train the graduates with strong knowledge in the respective field and the potential to create innovative multidisciplinary solutions for challenges in society.
PEO2 – Groom the engineers to understand and analyse different kinds of data and use Machine Learning techniques to develop software systems of varying complexity for data-intensive applications.
PEO3 – Practice professionalism among the graduates and reflect good leadership skills with ethical standards and continued professional development through lifelong learning.
PROGRAM SPECIFIC OUTCOMES (PSOs)
PSO1 – Impart theoretical knowledge in the respective field along with recent industrial tools and techniques to solve societal problems.
PSO2 – Apply the core competency obtained in the field of Machine Learning for analysis, design and development of computing systems for multidisciplinary problems.
PSO3 – Acquire knowledge in the fields of intelligence and deep learning and develop software solutions for security and analytics of large volumes of data.
COURSE OBJECTIVES
□ To learn the usage of Data exploration and visualization techniques for multivariate and
time series data.
COURSE OUTCOMES

CO – PO & PSO MAPPING

      PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1    3   3   3   -   3   3   3   2   -   2    -    1    1    -    -
CO2    3   3   3   -   -   3   3   2   -   2    -    -    -    -    -
CO3    3   3   3   -   -   3   3   2   -   2    -    1    1    -    -
CO4    3   3   3   -   -   3   3   2   -   -    -    1    1    -    -
CO5    3   3   3   -   -   3   3   2   -   -    -    1    -    -    -
CO     3   3   3   -   3   3   3   2   -   2    -    1    1    -    -
INDEX:
EX.NO: 1 Install Python and Python packages using pip3 and conda.
DATE:
AIM:
To install Python on a Windows machine and to install Python packages using pip3 and conda.
There are several ways to install Python on a Windows machine. Below are the options we’ll explore in
this tutorial:
• Install Python directly from the Microsoft Store: This quick and easy option will get you up and running with Python in no time. It is especially useful for beginners who want to use Python on their machine for learning purposes.
No matter which method you choose, you'll be able to start using Python on your Windows machine in just a few steps. Sometimes Python may already be pre-installed on your machine; here's how you can check.

Checking if Python is Already Installed on Your Windows Machine
To check if Python is installed on your Windows machine using the terminal, follow these steps:
1. Open a command line tool such as Windows Terminal (the default on Windows 11) or Command Prompt
(the default on Windows 10).
2. In the command line, type `python`. If Python is installed, you should see a message like “Python 3.x.x”
followed by the Python prompt, which looks like this “>>>”. Note that “3.x.x” represents the version
number of Python.
3. If Python is not installed on your machine, you will be automatically taken to the Microsoft Store
installation of Python. Note that the page you are taken to may not be the latest version of Python.
To check if Python is installed on your Windows machine using the Start Menu, follow these steps:
1. Press the Windows key or click on the Start button to open the Start Menu. Type "python".
2. If Python is installed, it should show up as the best match. Press "Enter" or click on the version of Python you want to open. You should see a message like “Python 3.x.x” followed by the Python prompt, which looks like this “>>>”. Note that “3.x.x” represents the version number of Python.
3.If Python is not installed on your machine, you will only see results for web searches for "python", or a
suggestion to search the Microsoft Store for "python".
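Once a Python prompt opens, the installed version can also be confirmed from inside Python itself. A quick check using only the standard library:

```python
import sys

# The interpreter reports its own version; the major number should be 3
print(sys.version)
print(sys.version_info)
```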
8. Once the installation is successfully completed, you will see a “Thanks for installing Anaconda” screen.
Press “Finish.”
9. Once the installation is complete, follow the instructions in the section "Checking if Python is Already Installed on Your Windows Machine" to check that Python has been installed correctly.
Access the Anaconda Installation of Python here
How to Install Python Packages:
Python is modular, with a large ecosystem of packages that provide functionality for specific data science tasks. For example, the pandas package provides functionality for data manipulation, scikit-learn provides machine learning functionality, and PyTorch provides deep learning functionality. There are two package management tools for installing Python packages: pip3 and conda. These tools allow you to install and upgrade Python packages.
Installing packages with pip3:
Use pip3 if you installed Python from the Python website or the Microsoft Store. To install packages with pip3, follow these steps:
1. Before you attempt to install Python packages, make sure Python is installed on your machine. To install
Python, follow the instructions in one of the previous sections (such as downloading and installing from the
website using the Microsoft Store).
2. To install a package using pip3, open a Terminal on macOS or Command Prompt on Windows and type the following command:
pip3 install {package_name}
The {package_name} here refers to a package you want to install. For example, to install the numpy package, you would type: pip3 install numpy
3. If the package has dependencies (i.e., it requires other packages for it to function), pip3 will
automatically install them as well.
4. Once the installation is complete, you can import the package into your Python code. For example, if you installed the numpy package, you could import it and use it like this:
import numpy as np
arr = np.array(["I", "love", "Python", "package", "management"])
5. If you want to update a package to the latest version, you can use the pip3 install --upgrade command:
pip3 install --upgrade {package_name}
For example, you update the numpy package to the latest version by following this command:
pip3 install --upgrade numpy
7. If you want to uninstall a package, you can use the pip3 uninstall command:
pip3 uninstall {package_name}

Installing packages with conda:
Use conda if you installed Python from Anaconda. conda comes with many Python packages for data
science installed, so you don't need to install common packages yourself.
To install packages with conda, follow these steps:
1. Before you attempt to install Python packages, make sure Python is installed on your machine. To install
Python, follow the instructions in one of the previous sections (such as downloading and installing from
Anaconda).
2. To install a package using conda, open a Terminal on macOS or Command Prompt on Windows and type:
conda install {package_name}
For example, to install the pytorch package, you would type:
conda install pytorch
3. If you want to update a package to the latest compatible version, you can use theconda update command.
conda update {package_name}
For example, to update the pytorch package to the latest version, you would type:
conda update pytorch
4. If you want to uninstall a package, you can use the conda remove command.
conda remove {package_name}
For example, to uninstall the pytorch package, you would type:
conda remove pytorch
5. To list all the packages that are installed, use conda's list command:
conda list
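Whichever tool you use, an installed package's presence and version can also be checked from within Python via the standard library. A minimal sketch (the helper name `package_version` is illustrative, not part of any library):

```python
from importlib import metadata

def package_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

print(package_version("pip"))   # a version string if pip is installed
print(package_version("surely-not-a-real-package"))
```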
Inference:
Result:
Thus the installation process was executed successfully.
EX.NO: 2 Perform exploratory data analysis (EDA) with datasets like email datasets. Export all your emails as a dataset, import them into a pandas data frame, visualize them and get different insights from the data.
DATE:

AIM:
To perform exploratory data analysis (EDA) on datasets like an email dataset: export all your emails as a dataset, import them into a pandas data frame, visualize them and get different insights from the data.
Algorithm:
Step 1. Start
Step 2. Export email datasets from email client and save it as a csv file
Step 3. Load the email datasets using pandas and numpy
Step 4. Perform EDA
Step 5. Stop
Program:
import pandas as pd
import numpy as np
# for visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
import plotly.express as px
import plotly.figure_factory as ff
from plotly.subplots import make_subplots

# Load the exported email dataset (CSV path assumed)
df = pd.read_csv("emails.csv")

print(df.shape)
category_ct = df['Category'].value_counts()
fig = px.pie(values=category_ct.values, names=category_ct.index,
             color_discrete_sequence=px.colors.sequential.OrRd,
             title='Pie Graph: spam or not')
fig.show()
df.head(3)
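Beyond the spam/ham pie chart, simple pandas aggregations yield further insights. A minimal self-contained sketch — the `Category`/`Message` column names follow the program above, and the inline rows are illustrative, not real email data:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["ham", "spam", "ham", "ham", "spam"],
    "Message": ["hi there", "win a prize now", "lunch?", "see you soon", "free offer"],
})

# Share of each category
print(df["Category"].value_counts(normalize=True))

# Average message length per category
df["length"] = df["Message"].str.len()
print(df.groupby("Category")["length"].mean())
```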
Output:
Inference:
Result:
The experiment has been implemented successfully.
EX.NO: 3
Working with Numpy arrays, Pandas data frames, Basic plots using Matplotlib.
DATE:
AIM :
Working with Numpy arrays, Pandas data frames, Basic plots using Matplotlib.
Algorithm:
Step 1. Start
Step 2: Install required libraries (if not already installed).
Step 3: Create and manipulate NumPy arrays.
Step 4: Create and manipulate Pandas DataFrames.
Step 5: Perform basic statistical operations on NumPy arrays and Pandas DataFrames.
Step 6: Plot basic visualizations using Matplotlib.
Step 7: Display outputs and plots.
Step 8. Stop
Program:
# Step 1: Import required libraries
import numpy as np
import pandas as pd

# Step 2: Create and manipulate a NumPy array
arr = np.array([25, 30, 35, 40])
print("NumPy array:", arr, "Mean:", arr.mean())

# Step 3: Create a Pandas DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 35, 40],
        'Salary': [50000, 60000, 70000, 80000]}
df = pd.DataFrame(data)
print("\nPandas DataFrame:\n", df)
# Step 4: Perform basic statistical operations on the DataFrame
df_summary = df.describe()
print("\nSummary Statistics of DataFrame:\n", df_summary)
# Step 5: Plot basic visualizations using Matplotlib
import matplotlib.pyplot as plt

# Plotting Salary vs Age (a simple line plot completes the truncated listing)
plt.figure(figsize=(8, 6))
plt.plot(df['Age'], df['Salary'], marker='o')
plt.xlabel('Age')
plt.ylabel('Salary')
plt.title('Salary vs Age')
plt.show()
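Step 5 of the algorithm also calls for basic statistical operations on NumPy arrays; a short sketch (the array values are illustrative):

```python
import numpy as np

scores = np.array([12, 15, 20, 13])
print("Mean:", scores.mean())            # 60 / 4 = 15.0
print("Std dev:", scores.std())
print("Min/Max:", scores.min(), scores.max())
print("Cumulative sum:", np.cumsum(scores))
```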
Output:
Inference:
Result:
Thus the program has been executed successfully
EX.NO: 4 Explore various variable and row filters in R for cleaning data. Apply various plot features in R on sample data sets and visualize.
DATE:
AIM:
Explore various variable and row filters in R for cleaning data. Apply various plot features in R on sample
data sets and visualize.
Algorithm:
Step 1. Start
Step 2. Clean data
Step 3. Filter data
Step 4. Select variables
Step 5. Drop variables
Step 6. Rename variables
Step 7. Filtering row variables
Step 8. Plotting features in R
Step 9. Visualization techniques implementation
Step 10. Stop
Program:
import numpy as np
a=np.array([1,2,3,4])
b=np.array([9,5,6,7])
print(a)
print("After Inserted:", np.insert(a,1,5))
print("After Deleted:", np.delete(a, [1]))
print("Concatenate:",np.concatenate ((a,b), axis=0))
c=a.reshape(2,2)
d=b.reshape(2,2)
print(c)
print(d)
print("Vstack:", np.vstack((c,d)))
print("hstack:", np.hstack((c,d)))
import matplotlib.pyplot as plt
plt.scatter(a,b)
plt.show()
OUTPUT:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv("heart.csv")
OUTPUT:
data.shape
OUTPUT:
data.info()
OUTPUT:
data.describe()
OUTPUT:
data.isnull().sum()
OUTPUT:
data_dup=data.duplicated().any()
print(data_dup)
OUTPUT:
data=data.drop_duplicates()
data_dup=data.duplicated().any()
print(data_dup)
print(data.shape)
OUTPUT:
data.columns
OUTPUT:
data['target'].value_counts()
OUTPUT:
plt.hist(data['target'])
plt.xlabel('Affected people')
plt.ylabel("Count of affected people")
plt.title("Heart attack analysis")
plt.show()
OUTPUT:
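Although the experiment title mentions R, the program above is written in Python. The row- and column-filter steps from the algorithm (filter rows, select/drop/rename variables) can be sketched in pandas as follows; the inline rows are illustrative stand-ins for heart.csv, whose `age`/`sex`/`target` column names are assumed:

```python
import pandas as pd

data = pd.DataFrame({"age": [63, 37, 41], "sex": [1, 1, 0], "target": [1, 1, 0]})

older = data[data["age"] > 40]             # filter rows by a condition
subset = data[["age", "target"]]           # select variables
dropped = data.drop(columns=["sex"])       # drop a variable
renamed = data.rename(columns={"target": "heart_attack"})  # rename a variable
print(older.shape, subset.shape, dropped.shape, list(renamed.columns))
```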
Inference:
Result:
Thus the data cleaning, filtering and plotting operations were executed successfully.
EX.NO: 5
Perform Time Series Analysis and apply the various visualization techniques
DATE:
AIM:
Perform time series analysis and apply the various visualization techniques
Algorithm:
Step 1. Start
Step 2. Import necessary libraries
Step 3. Load the time series dataset
Step 4. Data Preprocessing
Step 5. Visualize Raw Time Series Data
Step 6. Decompose the Time Series
Step 7. Moving Average/Exponential Smoothing
Step 8. Correlation and Autocorrelation
Step 9. Seasonality Check
Step 10. Apply Advanced Visualization Techniques
Step 11. Trend/Seasonality Analysis
Step 12. Forecasting and Prediction
Step 13. End
Program:
# Install the data-generation helpers first (run in a terminal):
#   pip install radar
#   pip install faker
import datetime
import random

import pandas as pd
import radar
from faker import Faker

fake = Faker()

def generateData(n):
    listdata = []
    start = datetime.datetime(2019, 8, 1)
    end = datetime.datetime(2019, 8, 30)
    delta = end - start
    for _ in range(n):
        date = radar.random_datetime(start='2019-08-01', stop='2019-08-30').strftime("%Y-%m-%d")
        price = round(random.uniform(900, 1000), 4)
        listdata.append([date, price])
    # Assemble a data frame from the generated rows
    df = pd.DataFrame(listdata, columns=['Date', 'Price'])
    return df

df = generateData(50)
df.head(10)
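The moving-average step of the algorithm can be sketched with pandas' rolling window on a small synthetic series (dates and prices illustrative):

```python
import pandas as pd

prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 107.0],
    index=pd.date_range("2019-08-01", periods=5, freq="D"),
)
ma3 = prices.rolling(window=3).mean()   # 3-day moving average
print(ma3)
# The first two entries are NaN; the third is (100 + 102 + 101) / 3 = 101.0
```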
import calendar
import random
import matplotlib.pyplot as plt

# Vertical bar chart of monthly sold quantities
months = list(range(1, 13))
sold_quantity = [round(random.uniform(100, 200)) for x in range(1, 13)]
figure, axis = plt.subplots()  # Outline plotting
plt.xticks(months, calendar.month_name[1:13], rotation=20)
plot = axis.bar(months, sold_quantity)  # for barchart drawing
for rectangle in plot:
    height = rectangle.get_height()
    axis.text(rectangle.get_x() + rectangle.get_width() / 2., 1.002 * height, '%d' % int(height))
plt.show()

# Horizontal bar chart of the same data
figure, axis = plt.subplots()
plt.yticks(months, calendar.month_name[1:13], rotation=20)
plot = axis.barh(months, sold_quantity)
for rectangle in plot:
    width = rectangle.get_width()
    axis.text(1.002 * width, rectangle.get_y() + rectangle.get_height() / 2., '%d' % int(width))
plt.show()

# Scatter plot (x-values assumed: one index per observation)
x = list(range(1, 14))
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78, 77, 85, 86]
plt.scatter(x, y, c="blue")
# To show the plot
plt.show()
OUTPUT:
Inference:
Result:
The Time Series Analysis and the various visualization techniques are implemented successfully
EX.NO: 6 Build cartographic visualization for multiple datasets involving various countries of the world; states and districts in India etc.
DATE:
Aim:
To build cartographic visualization for multiple datasets involving various countries of the world; states and districts in India etc.
Algorithm:
Step 1. Start
Step 2: Install required libraries: geopandas, matplotlib, and plotly.
Step 3: Load world shape-file data using GeoPandas.
Step 4: Load or create the dataset (e.g., population, GDP) for different countries or regions.
Step 5: Merge the shape-file data with your dataset using GeoPandas for geographical plotting.
Step 6: Plot the data on a world map using Matplotlib.
Step 7: Load shape-files for Indian states and districts.
Step 8: Plot specific regions like India's states or districts on a map.
Step 9: (Optional) Use interactive Plotly for creating interactive maps.
Step 10: Display the cartographic visualizations.
Step 11. Stop
Program:
# Step 1: Import necessary libraries
import geopandas as gpd
import matplotlib.pyplot as plt
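Steps 4-6 (attach a per-country metric to world geometry and color by it) can be sketched as below. The population figures and the `plot_world_metric` helper are illustrative, and `gpd.datasets.get_path` is only available in older GeoPandas releases — newer versions require downloading the Natural Earth shape-file separately:

```python
import pandas as pd

# Hypothetical per-country metric keyed by ISO-3166 alpha-3 code
pop = pd.DataFrame({
    "iso_a3": ["IND", "USA", "BRA"],
    "population_millions": [1400, 331, 213],
})

def plot_world_metric(df, column="population_millions"):
    # Imported lazily so the data-prep part runs without GeoPandas installed
    import geopandas as gpd
    import matplotlib.pyplot as plt
    # Older GeoPandas bundles a low-resolution world map
    world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
    # Merge the shape-file data with the dataset for geographical plotting
    merged = world.merge(df, on="iso_a3", how="left")
    merged.plot(column=column, legend=True, missing_kwds={"color": "lightgrey"})
    plt.title(column)
    plt.show()
```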
Output:
Inference:
Result:
EX.NO: 7
Perform EDA on Wine Quality Data Set.
DATE:
Aim:
Perform EDA on Wine Quality Data Set
Algorithm :
Step 1. Start
Step 2: Import necessary libraries.
Step 3: Load the Wine Quality dataset.
Step 4: Perform basic dataset exploration (e.g., checking missing values, data types, summary statistics).
Step 5: Visualize distributions of key features.
Step 6: Analyze correlations between features.
Step 7: Visualize relationships between key variables and the target variable (wine quality).
Step 8: Conclude with insights derived from the analysis
Step 9:stop
Program:
# Step 1: Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Step 2: Load the Wine Quality dataset (CSV path assumed)
wine_data = pd.read_csv("winequality-red.csv")
print(wine_data.isnull().sum())

# Correlation heatmap of the numeric features
correlation_matrix = wine_data.corr()
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Heatmap of Wine Quality Features')
plt.show()
# Step 8: Sulfates vs. Quality
plt.figure(figsize=(8, 6))
sns.boxplot(x='quality', y='sulphates', data=wine_data)
plt.title('Sulfates vs Wine Quality')
plt.xlabel('Wine Quality')
plt.ylabel('Sulfates')
plt.show()
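Step 6's feature-to-target correlations can also be ranked directly. A minimal sketch with an inline frame standing in for the real wine data (the values are illustrative):

```python
import pandas as pd

wine = pd.DataFrame({
    "alcohol":   [9.4, 9.8, 10.5, 11.2, 12.0],
    "sulphates": [0.56, 0.68, 0.65, 0.58, 0.80],
    "quality":   [5, 5, 6, 6, 7],
})

# Correlation of every feature with the target, strongest first
corr_with_quality = (wine.corr()["quality"]
                         .drop("quality")
                         .sort_values(ascending=False))
print(corr_with_quality)
```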
Output:
Inference:
Result:
Thus EDA on the Wine Quality data set was performed successfully.
Perform Data Analysis and representation on a Map using various Map
EX.NO.:8
data sets with Mouse Rollover effect, user interaction, etc.
DATE:
Aim:
This experiment aims to conduct comprehensive data analysis and visualization on geographic
maps using various map datasets. The goal is to leverage interactive features such as mouse
rollover effects and user interactions to enhance data exploration and presentation on the map
interface.
Instructions :
1. Utilize the provided Python program to conduct data analysis and visualization on a
map using the Folium library.
2. Ensure that you have the necessary libraries installed, including Folium and Pandas, to
run the code successfully.
3. Load the crime dataset for Boston, ensuring to handle missing values by dropping rows
with null latitude, longitude, and district information.
4. Filter the dataset to focus on specific types of offenses, such as larceny, auto theft,
robbery, etc., occurring after 2018.
5. Further filter the dataset to extract daytime robbery incidents, considering offenses
occurring between 9 AM and 6 PM.
6. Create a base map (m_1) using Folium, specifying the location, map tiles, and zoom
level.
7. Customize the map visualization by selecting suitable tile layers, such as
'openstreetmap', and adjusting the zoom level to focus on the desired area.
8. Plot the daytime robbery incidents as markers on the map using latitude and longitude
information from the filtered dataset.
Code :
import folium
import pandas as pd
# Create a base map
m_1 = folium.Map(location=[42.32,-71.0589], tiles='openstreetmap', zoom_start=10)
m_1
# Load and preprocess crime dataset
crimes = pd.read_csv("/content/crime.csv", encoding='latin-1')
crimes.dropna(subset=['Lat', 'Long', 'DISTRICT'], inplace=True)
crimes = crimes[crimes.OFFENSE_CODE_GROUP.isin(['Larceny', 'Auto Theft', 'Robbery',
'Larceny From Motor Vehicle', 'Residential Burglary', 'Simple Assault', 'Harassment',
'Ballistics', 'Aggravated Assault', 'Other Burglary', 'Arson', 'Commercial Burglary', 'HOME
INVASION', 'Homicide', 'Criminal Harassment', 'Manslaughter'])]
crimes = crimes[crimes.YEAR >= 2018]
crimes.head()

# Filter daytime robberies (9 AM to 6 PM) and plot them as markers
# (the HOUR column of the Boston crime dataset is assumed here)
daytime_robberies = crimes[((crimes.OFFENSE_CODE_GROUP == 'Robbery') &
                            (crimes.HOUR.isin(range(9, 18))))]
m_2 = folium.Map(location=[42.32, -71.0589], tiles='cartodbpositron', zoom_start=13)
for idx, row in daytime_robberies.iterrows():
    folium.Marker([row['Lat'], row['Long']]).add_to(m_2)
m_2
Observations :
1. The provided Python code utilizes Folium to create interactive maps for visualizing
crime data in Boston.
2. The base map (m_1) is initially created with the 'openstreetmap' tile layer, providing an
overview of the Boston area.
3. Crime data is loaded and preprocessed, ensuring that rows with missing latitude,
longitude, and district information are dropped.
4. The dataset is filtered to focus on specific types of offenses occurring after the year 2018.
5. Further filtering is applied to extract daytime robbery incidents, considering offenses
occurring between 9 AM and 6 PM.
6. The final map (m_2) is generated with markers representing the locations of daytime
robberies in Boston, providing a visual representation of crime hotspots.
7. Users can interact with the map by zooming in/out and clicking on individual markers
to view specific crime details, enhancing data exploration capabilities.
Inference:
Result:
The result of the experiment is the successful implementation of data analysis and visualization on geographic maps using various map datasets, leveraging interactive features such as mouse rollover effects and user interaction to enhance data exploration and presentation on the map interface.
EX.NO: 9 Use a case study on a data set and apply the various EDA and visualization techniques and present an analysis report.
DATE:
Aim:
Use a case study on a data set and apply the various EDA and visualization techniques
and present an analysis report.
Algorithm :
Step 1. Start
Step 2. Loads the dataset and displays the first few rows.
Step 3. Provides summary statistics and correlation matrix for numerical variables.
Step 4. Creates a histogram to visualize the distribution of the `Age` variable.
Step 5. Uses a box plot to show the distribution of `Salary` by `Performance_Rating`.
Step 6. Generates a pairplot to visualize relationships between numerical variables.
Step 7. Displays a correlation heatmap to show the correlation between variables
Step 8. Assume we have a dataset (`employee_data.csv`) with the following columns: Employee_ID`
`Age`, `Salary`, `Performance_Rating`, and `Years_of_Experience`.
Step 9. Stop
Program:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset (employee_data.csv path assumed)
df = pd.read_csv('employee_data.csv')
print(df.head())

# Summary statistics
summary_stats = df.describe()

# Correlation matrix (numeric columns only)
correlation_matrix = df.select_dtypes('number').corr()

# Distribution of Age
plt.figure(figsize=(8, 6))
plt.hist(df['Age'], bins=10)
plt.title('Distribution of Age')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

# Box plot of Salary by Performance Rating
plt.figure(figsize=(10, 6))
sns.boxplot(x='Performance_Rating', y='Salary', data=df)
plt.xlabel('Performance Rating')
plt.ylabel('Salary')
plt.show()

# Pairplot of numerical variables
sns.pairplot(df, hue='Performance_Rating')
plt.show()

# Correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
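A report-ready summary table for this case study can be produced with a groupby. A sketch with inline rows standing in for employee_data.csv (the values are illustrative; the column names follow Step 8):

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 32, 41, 28, 36],
    "Salary": [40000, 55000, 72000, 48000, 61000],
    "Performance_Rating": [3, 4, 5, 3, 4],
    "Years_of_Experience": [2, 7, 15, 4, 10],
})

# Head-count and average salary per performance rating
report = df.groupby("Performance_Rating")["Salary"].agg(["count", "mean"])
print(report)
```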
Output:
Inference:
Result:
Thus the case study on a data set applying the various EDA and visualization techniques was executed and verified successfully.