Eda Lab Verified
ISO 9001:2015 Certified Institution, Accredited by NBA (BME, CSE, ECE, EEE, IT & MECH),
Accredited by NAAC.
#42, Avadi-Vel Tech Road, Avadi, Chennai- 600062, Tamil Nadu, India.
NAME :
REGISTER NO :
VM NO : VM -
BRANCH : AI&DS
YEAR :IV
SEMESTER :VII
Vision
□ To promote a centre of excellence through effectual teaching and learning, imparting contemporary, knowledge-centric education through innovative research in multidisciplinary fields.
Mission
□ To impart quality technical skills through practice and continual updating of knowledge in recent technologies, and produce professionals with multidisciplinary and leadership skills.
□ To promote innovative thinking for design and development of software products of varying
complexity with intelligence to fulfil the global standards and demands.
□ To inculcate professional ethics among the graduates and to adapt to changing technologies through lifelong learning.
An Autonomous Institution
Approved by AICTE, Affiliated to Anna University, Chennai.
CERTIFICATE
Name:………………….………………………..................Year:……………Semester:………
Branch: B.TECH–ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
University Register No.: ……………………………….. College Roll No.:…………………
Certified that this is the bonafide record of work done by the above student in the
191ITV63 – EXPLORATORY DATA ANALYSIS LABORATORY during the academic
year 2024-2025.
Submitted for the University Practical Examination held on ………………... at VEL TECH
MULTI TECH Dr.RANGARAJAN Dr.SAKUNTHALA ENGINEERING COLLEGE, No.42,
AVADI – VEL TECH ROAD, AVADI, CHENNAI-600062.
Signature of Examiners:
PROGRAM OUTCOMES (POs)
PO2 – Problem Analysis: Identify, formulate, review research literature and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
PO3 – Design/Development of Solutions: Design solutions for complex engineering problems and design system components or processes that meet specified needs with appropriate consideration for public health and safety, and cultural, societal, and environmental considerations.
PO4 – Conduct Investigations of Complex Problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
PO5 – Modern Tool Usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
PO6 – The Engineer and Society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
PO7 – Environment and Sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for, sustainable development.
PO8 – Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
PO9 – Individual and Teamwork: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
PO10 – Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11 – Project Management and Finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
PO12 – Life-long Learning: Recognize the need for, and have the preparation and ability to engage in, independent and life-long learning in the broadest context of technological change.
PROGRAM EDUCATIONAL OBJECTIVES (PEOs)
PEO1 – Train the graduates with strong knowledge in the respective field and the potential to create innovative multidisciplinary solutions for challenges in society.
PEO2 – Groom the engineers to understand and analyse different kinds of data and use Machine Learning techniques to develop software systems of varying complexity for data-intensive applications.
PEO3 – Practice professionalism among the graduates and reflect good leadership skills with ethical standards and continued professional development through lifelong learning.
PROGRAM SPECIFIC OUTCOMES (PSOs)
PSO1 – Impart theoretical knowledge in the respective field along with recent industrial tools and techniques to solve societal problems.
PSO2 – Apply the core competency obtained in the field of Machine Learning for analysis, design and development of computing systems for multidisciplinary problems.
PSO3 – Acquire knowledge in the fields of intelligence and deep learning and develop software solutions for security and analytics of large volumes of data.
COURSE OBJECTIVES
□ To learn the usage of Data exploration and visualization techniques for multivariate and
time series data.
COURSE OUTCOMES

CO – PO & PSO MAPPING

      PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1    3   3   3   -   3   3   3   2   -   2    -    1    1    -    -
CO2    3   3   3   -   -   3   3   2   -   2    -    -    -    -    -
CO3    3   3   3   -   -   3   3   2   -   2    -    1    1    -    -
CO4    3   3   3   -   -   3   3   2   -   -    -    1    1    -    -
CO5    3   3   3   -   -   3   3   2   -   -    -    1    -    -    -
CO     3   3   3   -   3   3   3   2   -   2    -    1    1    -    -
INDEX:
EX.NO: 1 Install Python and Python packages using pip3 and conda.
DATE:
AIM:
To install Python on a Windows machine and to install Python packages using pip3 and conda.
There are several ways to install Python on a Windows machine. Below are the options we’ll explore in
this tutorial:
• Install Python directly from the Microsoft Store: This quick and easy option will get you up and running with Python in no time. It is especially useful for beginners who want to use Python on their machine for learning purposes.
No matter which method you choose, you'll be able to start using Python on your Windows machine in just a few steps. Sometimes Python may already be pre-installed on your machine; here's how you can check.

Checking if Python is Already Installed on Your Windows Machine
To check if Python is installed on your Windows machine using the terminal, follow these steps:
1. Open a command line tool such as Windows Terminal (the default on Windows 11) or Command Prompt
(the default on Windows 10).
2. In the command line, type `python`. If Python is installed, you should see a message like “Python 3.x.x”
followed by the Python prompt, which looks like this “>>>”. Note that “3.x.x” represents the version
number of Python.
3. If Python is not installed on your machine, you will be automatically taken to the Microsoft Store
installation of Python. Note that the page you are taken to may not be the latest version of Python.
To check if Python is installed on your Windows machine using the Start Menu, follow these steps:
1. Press the Windows key or click on the Start button to open the Start Menu. Type "python".
2. If Python is installed, it should show up as the best match. Press "Enter" or click on the version of Python you want to open. You should see a message like “Python 3.x.x” followed by the Python prompt, which looks like this “>>>”. Note that “3.x.x” represents the version number of Python.
3.If Python is not installed on your machine, you will only see results for web searches for "python", or a
suggestion to search the Microsoft Store for "python".
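Once a Python prompt opens, the installed version can also be confirmed from inside Python itself. A quick check using only the standard library:

```python
import sys

# The interpreter reports its own version; the major number should be 3
print(sys.version)
print(sys.version_info)
```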
8. Once the installation is successfully completed, you will see a “Thanks for installing Anaconda” screen.
Press “Finish.”
9. Once the installation is complete, follow the instructions in the section "Checking if Python is Already Installed on Your Windows Machine" to check that Python has been installed correctly.
Access the Anaconda Installation of Python here
How to Install Python Packages:
Python is modular, with a large ecosystem of packages that provide functionality for specific data science tasks. For example, the pandas package provides functionality for data manipulation, scikit-learn provides machine learning functionality, and PyTorch provides deep learning functionality. There are two package management tools for installing Python packages: pip3 and conda. These tools allow you to install and upgrade Python packages.
Installing packages with pip3:
Use pip3 if you installed Python from the Python website or the Microsoft Store. To install packages with pip3, follow these steps:
1. Before you attempt to install Python packages, make sure Python is installed on your machine. To install
Python, follow the instructions in one of the previous sections (such as downloading and installing from the
website using the Microsoft Store).
2. To install a package using pip3, open a Terminal on macOS or Command Prompt on Windows and type the following command:
pip3 install {package_name}
The {package_name} here refers to a package you want to install. For example, to install the numpy package, you would type: pip3 install numpy
3. If the package has dependencies (i.e., it requires other packages for it to function), pip3 will
automatically install them as well.
4. Once the installation is complete, you can import the package into your Python code. For example, if you installed the numpy package, you could import it and use it like this:
import numpy as np
arr = np.array(["I", "love", "Python", "package", "management"])
5. If you want to update a package to the latest version, you can use the pip3 install --upgrade command:
pip3 install --upgrade {package_name}
For example, you update the numpy package to the latest version by following this command:
pip3 install --upgrade numpy
7. If you want to uninstall a package, you can use the pip3 uninstall command:
pip3 uninstall {package_name}

Installing packages with conda:
Use conda if you installed Python from Anaconda. conda comes with many Python packages for data
science installed, so you don't need to install common packages yourself.
To install packages with conda, follow these steps:
1. Before you attempt to install Python packages, make sure Python is installed on your machine. To install
Python, follow the instructions in one of the previous sections (such as downloading and installing from
Anaconda).
2. To install a package using conda, open a Terminal on macOS or Command Prompt on Windows and type:
conda install {package_name}
For example, to install the pytorch package, you would type:
conda install pytorch
3. If you want to update a package to the latest compatible version, you can use theconda update command.
conda update {package_name}
For example, to update the pytorch package to the latest version, you would type:
conda update pytorch
4. If you want to uninstall a package, you can use the conda remove command.
conda remove {package_name}
For example, to uninstall the pytorch package, you would type:
conda remove pytorch
5. To list all the packages that are installed, use conda's list command:
conda list
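Whichever tool you use, an installed package's presence and version can also be checked from within Python via the standard library. A minimal sketch (the helper name `package_version` is illustrative, not part of any library):

```python
from importlib import metadata

def package_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

print(package_version("pip"))   # a version string if pip is installed
print(package_version("surely-not-a-real-package"))
```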
Inference:
Result:
Thus the installation process was executed successfully.
EX.NO: 2 Perform exploratory data analysis (EDA) with datasets like email datasets. Export all your emails as a dataset, import them into a pandas data frame, visualize them and get different insights from the data.
DATE:

AIM:
To perform exploratory data analysis (EDA) on datasets like an email dataset: export all your emails as a dataset, import them into a pandas data frame, visualize them and get different insights from the data.
Algorithm:
Step 1. Start
Step 2. Export email datasets from email client and save it as a csv file
Step 3. Load the email datasets using pandas and numpy
Step 4. Perform EDA
Step 5. Stop
Program:
import pandas as pd
import numpy as np
# for visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
import plotly.express as px
import plotly.figure_factory as ff
from plotly.subplots import make_subplots

# Load the exported email dataset (CSV path assumed)
df = pd.read_csv("emails.csv")

print(df.shape)
category_ct = df['Category'].value_counts()
fig = px.pie(values=category_ct.values, names=category_ct.index,
             color_discrete_sequence=px.colors.sequential.OrRd,
             title='Pie Graph: spam or not')
fig.show()
df.head(3)
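Beyond the spam/ham pie chart, simple pandas aggregations yield further insights. A minimal self-contained sketch — the `Category`/`Message` column names follow the program above, and the inline rows are illustrative, not real email data:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["ham", "spam", "ham", "ham", "spam"],
    "Message": ["hi there", "win a prize now", "lunch?", "see you soon", "free offer"],
})

# Share of each category
print(df["Category"].value_counts(normalize=True))

# Average message length per category
df["length"] = df["Message"].str.len()
print(df.groupby("Category")["length"].mean())
```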
Output:
Inference:
Result:
The experiment has been implemented successfully.
EX.NO: 3
Working with Numpy arrays, Pandas data frames, Basic plots using Matplotlib.
DATE:
AIM :
Working with Numpy arrays, Pandas data frames, Basic plots using Matplotlib.
Algorithm:
Step 1. Start
Step 2: Install required libraries (if not already installed).
Step 3: Create and manipulate NumPy arrays.
Step 4: Create and manipulate Pandas DataFrames.
Step 5: Perform basic statistical operations on NumPy arrays and Pandas DataFrames.
Step 6: Plot basic visualizations using Matplotlib.
Step 7: Display outputs and plots.
Step 8. Stop
Program:
# Step 1: Import required libraries
import numpy as np
import pandas as pd

# Step 2: Create and manipulate a NumPy array
arr = np.array([25, 30, 35, 40])
print("NumPy array:", arr, "Mean:", arr.mean())

# Step 3: Create a Pandas DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 35, 40],
        'Salary': [50000, 60000, 70000, 80000]}
df = pd.DataFrame(data)
print("\nPandas DataFrame:\n", df)
# Step 4: Perform basic statistical operations on the DataFrame
df_summary = df.describe()
print("\nSummary Statistics of DataFrame:\n", df_summary)
# Step 5: Plot basic visualizations using Matplotlib
import matplotlib.pyplot as plt

# Plotting Salary vs Age (a simple line plot completes the truncated listing)
plt.figure(figsize=(8, 6))
plt.plot(df['Age'], df['Salary'], marker='o')
plt.xlabel('Age')
plt.ylabel('Salary')
plt.title('Salary vs Age')
plt.show()
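Step 5 of the algorithm also calls for basic statistical operations on NumPy arrays; a short sketch (the array values are illustrative):

```python
import numpy as np

scores = np.array([12, 15, 20, 13])
print("Mean:", scores.mean())            # 60 / 4 = 15.0
print("Std dev:", scores.std())
print("Min/Max:", scores.min(), scores.max())
print("Cumulative sum:", np.cumsum(scores))
```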
Output:
Inference:
Result:
Thus the program has been executed successfully
EX.NO: 4 Explore various variable and row filters in R for cleaning data. Apply various plot features in R on sample data sets and visualize.
DATE:
AIM:
Explore various variable and row filters in R for cleaning data. Apply various plot features in R on sample
data sets and visualize.
Algorithm:
Step 1. Start
Step 2. Clean data
Step 3. Filter data
Step 4. Select variables
Step 5. Drop variables
Step 6. Rename variables
Step 7. Filtering row variables
Step 8. Plotting features in R
Step 9. Visualization techniques implementation
Step 10. Stop
Program:
import numpy as np
a=np.array([1,2,3,4])
b=np.array([9,5,6,7])
print(a)
print("After Inserted:", np.insert(a,1,5))
print("After Deleted:", np.delete(a, [1]))
print("Concatenate:",np.concatenate ((a,b), axis=0))
c=a.reshape(2,2)
d=b.reshape(2,2)
print(c)
print(d)
print("Vstack:", np.vstack((c,d)))
print("hstack:", np.hstack((c,d)))
import matplotlib.pyplot as plt
plt.scatter(a,b)
plt.show()
OUTPUT:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv("heart.csv")
OUTPUT:
data.shape
OUTPUT:
data.info()
OUTPUT:
data.describe()
OUTPUT:
data.isnull().sum()
OUTPUT:
data_dup=data.duplicated().any()
print(data_dup)
OUTPUT:
data=data.drop_duplicates()
data_dup=data.duplicated().any()
print(data_dup)
print(data.shape)
OUTPUT:
data.columns
OUTPUT:
data['target'].value_counts()
OUTPUT:
plt.hist(data['target'])
plt.xlabel('Affected people')
plt.ylabel("Count of affected people")
plt.title("Heart attack analysis")
plt.show()
OUTPUT:
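Although the experiment title mentions R, the program above is written in Python. The row- and column-filter steps from the algorithm (filter rows, select/drop/rename variables) can be sketched in pandas as follows; the inline rows are illustrative stand-ins for heart.csv, whose `age`/`sex`/`target` column names are assumed:

```python
import pandas as pd

data = pd.DataFrame({"age": [63, 37, 41], "sex": [1, 1, 0], "target": [1, 1, 0]})

older = data[data["age"] > 40]             # filter rows by a condition
subset = data[["age", "target"]]           # select variables
dropped = data.drop(columns=["sex"])       # drop a variable
renamed = data.rename(columns={"target": "heart_attack"})  # rename a variable
print(older.shape, subset.shape, dropped.shape, list(renamed.columns))
```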
Inference:
Result:
Thus the data cleaning, filtering and plotting operations were executed successfully.
EX.NO: 5
Perform Time Series Analysis and apply the various visualization techniques
DATE:
AIM:
Perform time series analysis and apply the various visualization techniques
Algorithm:
Step 1. Start
Step 2. Import necessary libraries
Step 3. Load the time series dataset
Step 4. Data Preprocessing
Step 5. Visualize Raw Time Series Data
Step 6. Decompose the Time Series
Step 7. Moving Average/Exponential Smoothing
Step 8. Correlation and Autocorrelation
Step 9. Seasonality Check
Step 10. Apply Advanced Visualization Techniques
Step 11. Trend/Seasonality Analysis
Step 12. Forecasting and Prediction
Step 13. End
Program:
# Install the data-generation helpers first (run in a terminal):
#   pip install radar
#   pip install faker
import datetime
import random

import pandas as pd
import radar
from faker import Faker

fake = Faker()

def generateData(n):
    listdata = []
    start = datetime.datetime(2019, 8, 1)
    end = datetime.datetime(2019, 8, 30)
    delta = end - start
    for _ in range(n):
        date = radar.random_datetime(start='2019-08-01', stop='2019-08-30').strftime("%Y-%m-%d")
        price = round(random.uniform(900, 1000), 4)
        listdata.append([date, price])
    # Assemble a data frame from the generated rows
    df = pd.DataFrame(listdata, columns=['Date', 'Price'])
    return df

df = generateData(50)
df.head(10)
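The moving-average step of the algorithm can be sketched with pandas' rolling window on a small synthetic series (dates and prices illustrative):

```python
import pandas as pd

prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 107.0],
    index=pd.date_range("2019-08-01", periods=5, freq="D"),
)
ma3 = prices.rolling(window=3).mean()   # 3-day moving average
print(ma3)
# The first two entries are NaN; the third is (100 + 102 + 101) / 3 = 101.0
```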
import calendar
import random
import matplotlib.pyplot as plt

# Vertical bar chart of monthly sold quantities
months = list(range(1, 13))
sold_quantity = [round(random.uniform(100, 200)) for x in range(1, 13)]
figure, axis = plt.subplots()  # Outline plotting
plt.xticks(months, calendar.month_name[1:13], rotation=20)
plot = axis.bar(months, sold_quantity)  # for barchart drawing
for rectangle in plot:
    height = rectangle.get_height()
    axis.text(rectangle.get_x() + rectangle.get_width() / 2., 1.002 * height, '%d' % int(height))
plt.show()

# Horizontal bar chart of the same data
figure, axis = plt.subplots()
plt.yticks(months, calendar.month_name[1:13], rotation=20)
plot = axis.barh(months, sold_quantity)
for rectangle in plot:
    width = rectangle.get_width()
    axis.text(1.002 * width, rectangle.get_y() + rectangle.get_height() / 2., '%d' % int(width))
plt.show()

# Scatter plot (x-values assumed: one index per observation)
x = list(range(1, 14))
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78, 77, 85, 86]
plt.scatter(x, y, c="blue")
# To show the plot
plt.show()
OUTPUT:
Inference:
Result:
The Time Series Analysis and the various visualization techniques are implemented successfully
EX.NO: 6 Build cartographic visualization for multiple datasets involving various countries of the world; states and districts in India etc.
DATE:
Aim:
To build cartographic visualization for multiple datasets involving various countries of the world; states and districts in India etc.
Algorithm:
Step 1. Start
Step 2: Install required libraries: geopandas, matplotlib, and plotly.
Step 3: Load world shape-file data using GeoPandas.
Step 4: Load or create the dataset (e.g., population, GDP) for different countries or regions.
Step 5: Merge the shape-file data with your dataset using GeoPandas for geographical plotting.
Step 6: Plot the data on a world map using Matplotlib.
Step 7: Load shape-files for Indian states and districts.
Step 8: Plot specific regions like India's states or districts on a map.
Step 9: (Optional) Use interactive Plotly for creating interactive maps.
Step 10: Display the cartographic visualizations.
Step 11. Stop
Program:
# Step 1: Import necessary libraries
import geopandas as gpd
import matplotlib.pyplot as plt
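Steps 4-6 (attach a per-country metric to world geometry and color by it) can be sketched as below. The population figures and the `plot_world_metric` helper are illustrative, and `gpd.datasets.get_path` is only available in older GeoPandas releases — newer versions require downloading the Natural Earth shape-file separately:

```python
import pandas as pd

# Hypothetical per-country metric keyed by ISO-3166 alpha-3 code
pop = pd.DataFrame({
    "iso_a3": ["IND", "USA", "BRA"],
    "population_millions": [1400, 331, 213],
})

def plot_world_metric(df, column="population_millions"):
    # Imported lazily so the data-prep part runs without GeoPandas installed
    import geopandas as gpd
    import matplotlib.pyplot as plt
    # Older GeoPandas bundles a low-resolution world map
    world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
    # Merge the shape-file data with the dataset for geographical plotting
    merged = world.merge(df, on="iso_a3", how="left")
    merged.plot(column=column, legend=True, missing_kwds={"color": "lightgrey"})
    plt.title(column)
    plt.show()
```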
Output:
Inference:
Result:
EX.NO: 7
Perform EDA on Wine Quality Data Set.
DATE:
Aim:
Perform EDA on Wine Quality Data Set
Algorithm :
Step 1. Start
Step 2: Import necessary libraries.
Step 3: Load the Wine Quality dataset.
Step 4: Perform basic dataset exploration (e.g., checking missing values, data types, summary statistics).
Step 5: Visualize distributions of key features.
Step 6: Analyze correlations between features.
Step 7: Visualize relationships between key variables and the target variable (wine quality).
Step 8: Conclude with insights derived from the analysis
Step 9:stop
Program:
# Step 1: Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Step 2: Load the Wine Quality dataset (CSV path assumed)
wine_data = pd.read_csv("winequality-red.csv")
print(wine_data.isnull().sum())

# Correlation heatmap of the numeric features
correlation_matrix = wine_data.corr()
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Heatmap of Wine Quality Features')
plt.show()
# Step 8: Sulfates vs. Quality
plt.figure(figsize=(8, 6))
sns.boxplot(x='quality', y='sulphates', data=wine_data)
plt.title('Sulfates vs Wine Quality')
plt.xlabel('Wine Quality')
plt.ylabel('Sulfates')
plt.show()
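Step 6's feature-to-target correlations can also be ranked directly. A minimal sketch with an inline frame standing in for the real wine data (the values are illustrative):

```python
import pandas as pd

wine = pd.DataFrame({
    "alcohol":   [9.4, 9.8, 10.5, 11.2, 12.0],
    "sulphates": [0.56, 0.68, 0.65, 0.58, 0.80],
    "quality":   [5, 5, 6, 6, 7],
})

# Correlation of every feature with the target, strongest first
corr_with_quality = (wine.corr()["quality"]
                         .drop("quality")
                         .sort_values(ascending=False))
print(corr_with_quality)
```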
Output:
Inference:
Result:
Thus EDA on the Wine Quality data set was performed successfully.
Perform Data Analysis and representation on a Map using various Map
EX.NO.:8
data sets with Mouse Rollover effect, user interaction, etc.
DATE:
Aim:
This experiment aims to conduct comprehensive data analysis and visualization on geographic
maps using various map datasets. The goal is to leverage interactive features such as mouse
rollover effects and user interactions to enhance data exploration and presentation on the map
interface.
Instructions :
1. Utilize the provided Python program to conduct data analysis and visualization on a
map using the Folium library.
2. Ensure that you have the necessary libraries installed, including Folium and Pandas, to
run the code successfully.
3. Load the crime dataset for Boston, ensuring to handle missing values by dropping rows
with null latitude, longitude, and district information.
4. Filter the dataset to focus on specific types of offenses, such as larceny, auto theft,
robbery, etc., occurring after 2018.
5. Further filter the dataset to extract daytime robbery incidents, considering offenses
occurring between 9 AM and 6 PM.
6. Create a base map (m_1) using Folium, specifying the location, map tiles, and zoom
level.
7. Customize the map visualization by selecting suitable tile layers, such as
'openstreetmap', and adjusting the zoom level to focus on the desired area.
8. Plot the daytime robbery incidents as markers on the map using latitude and longitude
information from the filtered dataset.
Code :
import folium
import pandas as pd
# Create a base map
m_1 = folium.Map(location=[42.32,-71.0589], tiles='openstreetmap', zoom_start=10)
m_1
# Load and preprocess crime dataset
crimes = pd.read_csv("/content/crime.csv", encoding='latin-1')
crimes.dropna(subset=['Lat', 'Long', 'DISTRICT'], inplace=True)
crimes = crimes[crimes.OFFENSE_CODE_GROUP.isin(['Larceny', 'Auto Theft', 'Robbery',
'Larceny From Motor Vehicle', 'Residential Burglary', 'Simple Assault', 'Harassment',
'Ballistics', 'Aggravated Assault', 'Other Burglary', 'Arson', 'Commercial Burglary', 'HOME
INVASION', 'Homicide', 'Criminal Harassment', 'Manslaughter'])]
crimes = crimes[crimes.YEAR >= 2018]
crimes.head()

# Filter daytime robberies (9 AM to 6 PM) and plot them as markers
# (the HOUR column of the Boston crime dataset is assumed here)
daytime_robberies = crimes[((crimes.OFFENSE_CODE_GROUP == 'Robbery') &
                            (crimes.HOUR.isin(range(9, 18))))]
m_2 = folium.Map(location=[42.32, -71.0589], tiles='cartodbpositron', zoom_start=13)
for idx, row in daytime_robberies.iterrows():
    folium.Marker([row['Lat'], row['Long']]).add_to(m_2)
m_2
Observations :
1. The provided Python code utilizes Folium to create interactive maps for visualizing
crime data in Boston.
2. The base map (m_1) is initially created with the 'openstreetmap' tile layer, providing an
overview of the Boston area.
3. Crime data is loaded and preprocessed, ensuring that rows with missing latitude,
longitude, and district information are dropped.
4. The dataset is filtered to focus on specific types of offenses occurring after the year 2018.
5. Further filtering is applied to extract daytime robbery incidents, considering offenses
occurring between 9 AM and 6 PM.
6. The final map (m_2) is generated with markers representing the locations of daytime
robberies in Boston, providing a visual representation of crime hotspots.
7. Users can interact with the map by zooming in/out and clicking on individual markers
to view specific crime details, enhancing data exploration capabilities.
Inference:
Result:
The result of the experiment is the successful implementation of data analysis and visualization on geographic maps using various map datasets, leveraging interactive features such as mouse rollover effects and user interaction to enhance data exploration and presentation on the map interface.
EX.NO: 9 Use a case study on a data set and apply the various EDA and visualization techniques and present an analysis report.
DATE:
Aim:
Use a case study on a data set and apply the various EDA and visualization techniques
and present an analysis report.
Algorithm :
Step 1. Start
Step 2. Loads the dataset and displays the first few rows.
Step 3. Provides summary statistics and correlation matrix for numerical variables.
Step 4. Creates a histogram to visualize the distribution of the `Age` variable.
Step 5. Uses a box plot to show the distribution of `Salary` by `Performance_Rating`.
Step 6. Generates a pairplot to visualize relationships between numerical variables.
Step 7. Displays a correlation heatmap to show the correlation between variables
Step 8. Assume we have a dataset (`employee_data.csv`) with the following columns: Employee_ID`
`Age`, `Salary`, `Performance_Rating`, and `Years_of_Experience`.
Step 9. Stop
Program:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset (employee_data.csv path assumed)
df = pd.read_csv('employee_data.csv')
print(df.head())

# Summary statistics
summary_stats = df.describe()

# Correlation matrix (numeric columns only)
correlation_matrix = df.select_dtypes('number').corr()

# Distribution of Age
plt.figure(figsize=(8, 6))
plt.hist(df['Age'], bins=10)
plt.title('Distribution of Age')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

# Box plot of Salary by Performance Rating
plt.figure(figsize=(10, 6))
sns.boxplot(x='Performance_Rating', y='Salary', data=df)
plt.xlabel('Performance Rating')
plt.ylabel('Salary')
plt.show()

# Pairplot of numerical variables
sns.pairplot(df, hue='Performance_Rating')
plt.show()

# Correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
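A report-ready summary table for this case study can be produced with a groupby. A sketch with inline rows standing in for employee_data.csv (the values are illustrative; the column names follow Step 8):

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 32, 41, 28, 36],
    "Salary": [40000, 55000, 72000, 48000, 61000],
    "Performance_Rating": [3, 4, 5, 3, 4],
    "Years_of_Experience": [2, 7, 15, 4, 10],
})

# Head-count and average salary per performance rating
report = df.groupby("Performance_Rating")["Salary"].agg(["count", "mean"])
print(report)
```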
Output:
Inference:
Result:
Thus the case study on a data set applying the various EDA and visualization techniques was executed and verified successfully.