Eda Lab Manual Without Output
ISO 9001:2015 Certified Institution, Accredited by NBA (BME, CSE, ECE, EEE, IT & MECH), Accredited by NAAC.
#42, Avadi-Vel Tech Road, Avadi, Chennai- 600062, Tamil Nadu, India.
NAME :
REGISTER NO :
VM NO : VM -
BRANCH : AI&DS
YEAR : IV
SEMESTER : VII
Vision
To promote centre of excellence through effectual Teaching and Learning, imparting the
contemporary knowledge centric education through innovative research in multidisciplinary fields.
Mission
To impart quality technical skills through practicing, knowledge updating in recent technology
and produce professionals with multidisciplinary and leadership skills.
To promote innovative thinking for design and development of software products of varying
complexity with intelligence to fulfil the global standards and demands.
To inculcate professional ethics among the graduates and to adapt the changing technologies
through lifelong learning.
An Autonomous Institution
CERTIFICATE
No………………….. College Roll No: ................................. Certified that this is the bonafide record of work
done by the above student in the 191ITV63– EXPLORATORY DATA ANALYSIS LABORATORY
during the academic year 2024-2025.
Submitted for the University Practical Examination held on ………………... at VEL TECH MULTI
TECH Dr.RANGARAJAN Dr.SAKUNTHALA ENGINEERING COLLEGE, No.42, AVADI – VEL
TECH ROAD, AVADI, CHENNAI-600062.
Signature of Examiners
Date: ………………
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA
SCIENCE
POs Programme Outcomes
(POs)
PO2 Problem Analysis: Identify, formulate, review research literature and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.
PO3 Design / Development of solutions: Design solutions for complex engineering problems and
design system components or processes that meet specified needs with appropriate consideration
for public health and safety, cultural, societal, and environmental considerations.
PO4 Conduct Investigations of Complex Problems: Use research-based knowledge and research
methods including design of experiments, analysis and interpretation of data, and synthesis of
the information to provide valid conclusions.
PO5 Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities
with an understanding of the limitations.
PO6 The Engineer and Society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to
the professional engineering practice.
PO7 Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and need
for sustainable development.
PO8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
PO9 Individual and team work: Function effectively as an individual, and as a member or leader
in diverse teams, and in multidisciplinary settings.
PO11 Project Management and Finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
PO12 Life-long learning: Recognize the need for, and have the preparation and ability to engage
in independent and life-long learning in the broadest context of technological change.
PROGRAM EDUCATIONAL OBJECTIVES (PEOs)
PEO1 Train the graduates with the potential of strong knowledge in the respective field and to create
innovative multidisciplinary solutions for challenges in the society.
PEO2 Groom the engineers to understand, analyse different nature of data and use Machine Learning
techniques to develop software systems with varying complexity for data intensive applications.
PEO3 To practice professionalism among the graduates and reflect good leadership skills with ethical
standards and continued professional development through lifelong learning.
PSO1 To impart theoretical knowledge in the respective field along with recent industrial tools and
techniques to solve societal problems.
PSO2 Apply the core competency obtained in the field of Machine Learning for analysis, design
and development of computing systems for multi-disciplinary problems.
PSO3 Acquire knowledge in the field of intelligence, deep learning and develop software solutions
for security and analytics of large volume of data.
COURSE OBJECTIVES
COURSE OUTCOMES
Course Outcome PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
-
CO1 - - -
- - -
CO2 - - - -
- -
CO3 - - - -
CO4 - - - - - - -
CO5 - - - - - - - -
CO - - - - -
LIST OF EXPERIMENTS
S.NO | DATE | EXPERIMENT | CO | PAGE NO | FACULTY SIGN
1 | | Install the data analysis and visualization tool: Python | CO1 | |
3 | | Working with NumPy arrays, Pandas data frames, basic plots using Matplotlib. | CO3 | |
8 | | Use a case study on a data set and apply the various EDA and visualization techniques and present an analysis report. | CO6 | |
EX.NO: 1
Install the data Analysis and Visualization tool: Python
DATE:
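AIM:
To install the data analysis and visualization tool Python, along with its packages, on a Windows machine.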
There are several ways to install Python on a Windows machine. Below are the options we’ll explore in
this tutorial:
• Install Python directly from the Microsoft Store: This quick and easy option will get you up and running with
Python in no time. It is especially useful for beginners who want to use Python on their machine for learning
purposes.
• Install Python directly from the Python website: This method gives you more control over the installation
process and allows you to customize your installation.
• Install Python using an Anaconda distribution: Anaconda is a popular Python distribution that comes with a
large number of pre-installed packages and tools, making it a good option for scientific computing and data
science.
No matter which method you choose, you'll be able to start using Python on your Windows machine in just a few
steps. Python may also already be installed on your machine. Here’s how you can check if your
Windows machine has Python installed.
Checking if Python is Already Installed on Your Windows Machine
To check if Python is installed on your Windows machine using the terminal, follow these steps:
1. Open a command line tool such as Windows Terminal (the default on Windows 11) or Command Prompt (the
default on Windows 10).
2. In the command line, type `python`. If Python is installed, you should see a message like “Python 3.x.x”
followed by the Python prompt, which looks like this “>>>”. Note that “3.x.x” represents the version number of
Python.
3. If Python is not installed on your machine, you will be automatically taken to the Microsoft Store page for
Python. Note that the page you are taken to may not offer the latest version of Python.
To check if Python is installed on your Windows machine using the Start Menu, follow these steps:
1. Press the Windows key or click on the Start button to open the Start Menu. Type "python".
2. If Python is installed, it should show up as the best match. Press "Enter" or click on the version of Python you
want to open. You should see a message like “Python 3.x.x” followed by the Python prompt, which looks like
this “>>>”. Note that “3.x.x” represents the version number of Python.
3. If Python is not installed on your machine, you will only see results for web searches for "python", or a
suggestion to search the Microsoft Store for "python".
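The same check can also be scripted once an interpreter is available. The following is a minimal sketch, not part of the original manual; the helper name `describe_python` is illustrative, and the script assumes it is run with the Python installation you want to inspect.

```python
import shutil
import sys


def describe_python():
    """Return the running interpreter's version and the path of `python` on PATH."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    # shutil.which returns None when no `python` executable is found on PATH
    launcher = shutil.which("python")
    return version, launcher


version, launcher = describe_python()
print("Running Python", version)
print("python on PATH:", launcher if launcher else "not found")
```

If the second line prints "not found", the interpreter is installed but was not added to the PATH environment variable.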
How to Install Python on Windows Using Anaconda
There are different distributions of Python that come pre-installed with relevant packages and tools.
One of the most popular distributions is the Anaconda Python distribution, which comes
preinstalled with a host of data science packages and tools for scientific computing. To download
Python using an Anaconda distribution, follow these steps:
1. Open the Anaconda download page in a web browser.
2. Scroll down to the “Anaconda Installers” section, where you will find different versions of the
Anaconda Installer. Click on the Windows installation for the latest version of Python (at the time of
writing, it is
3. Download the installer file to your local machine.
4. Once the download is complete, double-click the file to begin the installation process.
5. Click Continue through the setup screens and accept the license agreement.
6. In the “Advanced Installation Options” screen, you have the option to “Add Anaconda3 to my
PATH environment variable”. This is recommended only if Anaconda is your only Python
installation (rather than one of multiple versions) and you want to use the conda tool from the terminal
(rather than from an IDE).
7. The installer will extract the files and start the installation process. This may take a few minutes.
When the installation is complete, you will be prompted to optionally install DataSpell, a data
science IDE developed by JetBrains.
8. Once the installation is successfully completed, you will see a “Thanks for installing Anaconda”
screen. Press “Finish.”
9. Follow the instructions in the section "Checking if Python is
Already Installed on Your Windows Machine" to check that Python has been installed correctly.
Python is modular, with a large ecosystem of packages that provide functionality for specific data
science tasks. For example, the pandas package provides functionality for data manipulation,
scikit-learn provides machine learning functionality, and PyTorch provides deep learning
functionality. Two package management tools are commonly used for installing Python packages: pip3 and
conda. These tools allow you to install and upgrade Python packages.
Use pip3 if you installed Python from the Python website or the Microsoft Store. To install
packages with pip3, follow these steps:
1. Before you attempt to install Python packages, make sure Python is installed on your machine. To
install Python, follow the instructions in one of the previous sections (such as downloading and
installing from the Python website or using the Microsoft Store).
2. To install a package using pip3, open a Terminal on macOS or Command Prompt on Windows
and type the following command:
pip3 install {package_name}
The {package_name} here refers to a package you want to install.
For example, to install the numpy package, you would type:
pip3 install numpy
3. If the package has dependencies (i.e., it requires other packages for it to function), pip3 will
automatically install them as well.
4. Once the installation is complete, you can import the package into your Python code. For
example, if you installed the numpy package, you could import it like this:
import numpy as np
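A minimal sketch of using the package after installation, assuming numpy was installed as described above; the sample values are illustrative only.

```python
import numpy as np

# Create a one-dimensional array from a Python list (sample values)
values = np.array([1, 2, 3, 4])

# Element-wise arithmetic and a simple aggregate
doubled = values * 2
total = int(values.sum())

print(doubled)
print(total)
```

If the import fails with ModuleNotFoundError, the package was most likely installed for a different Python installation than the one running the script.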
Use conda if you installed Python from Anaconda. Anaconda comes with many Python packages for
data science pre-installed, so you don't need to install common packages yourself.
To install packages with conda, follow these steps:
1. Before you attempt to install Python packages, make sure Python is installed on your machine. To
install Python, follow the instructions in one of the previous sections (such as downloading and
installing from Anaconda).
2. To install a package using conda, open a Terminal on macOS or Command Prompt on Windows
and type the following command:
conda install {package_name}
3. If you want to update a package to the latest compatible version, you can use the conda update
command.
For example, to update the pytorch package to the latest version, you would type:
conda update pytorch
4. If you want to uninstall a package, you can use the conda remove command:
conda remove {package_name}
5. To list all the packages that are installed, use conda's list command:
conda list
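Whichever tool was used, the installation can be verified from Python itself. The sketch below is an illustration rather than part of the original procedure; the helper names `installed_version` and `can_import` are hypothetical, and the package names in the loop are examples only.

```python
import importlib.util
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def can_import(module_name):
    """Return True if the import system can locate the module."""
    return importlib.util.find_spec(module_name) is not None


# Example package names; replace them with the packages you installed
for name in ["numpy", "pandas", "matplotlib"]:
    print(name, installed_version(name), can_import(name))
```

A version of None alongside a False import result suggests the package is missing from the active environment.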
Result:
AIM :
Perform exploratory data analysis (EDA) on datasets such as an email data set. Export all your
emails as a dataset, import them into a pandas data frame, visualize them, and derive different insights from
the data.
Algorithm
Step 1. Start
Step 2. Export email dataset from email client and save it as a csv file
Step 3. Load the email dataset using pandas and numpy
Step 4. Perform EDA
Step 5. Stop
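Step 2 can be sketched with the standard library alone. The example below is a hedged illustration, not the manual's own method: it assumes the exported emails are available as raw RFC 822 message text, the function names `emails_to_rows` and `rows_to_csv` are hypothetical, and the sample message is invented purely to exercise the code.

```python
import csv
import io
from email import message_from_string
from email.utils import parsedate_to_datetime

FIELDS = ["from", "to", "subject", "date"]


def emails_to_rows(raw_messages):
    """Parse raw message strings into dictionaries holding a few header fields."""
    rows = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        date_header = msg["Date"]
        rows.append({
            "from": msg["From"],
            "to": msg["To"],
            "subject": msg["Subject"],
            # Normalise the date to ISO 8601 when the header is present
            "date": parsedate_to_datetime(date_header).isoformat() if date_header else "",
        })
    return rows


def rows_to_csv(rows):
    """Serialise the rows to CSV text that pandas.read_csv can load."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample message used only to exercise the functions
sample = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lab schedule\n"
    "Date: Mon, 01 Jan 2024 10:00:00 +0000\n"
    "\n"
    "Body text.\n"
)
rows = emails_to_rows([sample])
csv_text = rows_to_csv(rows)
print(csv_text)
```

Writing `csv_text` to a file gives a CSV that Step 3 can load with `pd.read_csv`. For an mbox export, the standard library's `mailbox.mbox` class can iterate over the stored messages instead of a list of strings.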
Program
import pandas as pd
import missingno as msno
import plotly.express as px
# Load the email dataset and inspect its size and first rows
df = pd.read_csv('../input/spam-email/spam.csv')
print(df.shape)
print(df.head(3))
# Visualize where values are missing
msno.matrix(df).set_title('Distribution of missing values', fontsize=20)
# Count the emails in each category and draw a pie chart
category_ct = df['Category'].value_counts()
fig = px.pie(values=category_ct.values,
             names=category_ct.index,
             color_discrete_sequence=px.colors.sequential.OrRd,
             title='Pie Graph: spam or not')
fig.update_traces(hoverinfo='label+percent', textinfo='label+value+percent', textfont_size=15,
                  marker=dict(line=dict(color='#000000', width=2)))
fig.show()
Output:
Result:
The experiment has been implemented successfully.
EX.NO: 3
Working with NumPy arrays, Pandas data frames, basic plots using Matplotlib.
DATE:
AIM :
To work with NumPy arrays and Pandas data frames, and to draw basic plots using Matplotlib.
Algorithm
Step 1. Start
Step 2: Install required libraries (if not already installed).
Step 3: Create and manipulate NumPy arrays.
Step 4: Create and manipulate Pandas DataFrames.
Step 5: Perform basic statistical operations on NumPy arrays and Pandas DataFrames.
Step 6: Plot basic visualizations using Matplotlib.
Step 7: Display outputs and plots.
Step 8. Stop
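Steps 3 to 5 above can be sketched as follows. This is an illustrative example under stated assumptions, not the manual's listing: the array values and the `Name` and `Age` records are made-up sample data.

```python
import numpy as np
import pandas as pd

# NumPy: create, reshape, and filter an array (sample values)
arr = np.arange(1, 7)              # values 1 through 6
matrix = arr.reshape(2, 3)         # 2 rows, 3 columns
column_sums = matrix.sum(axis=0)   # sum down each column
evens = arr[arr % 2 == 0]          # boolean mask keeps the even values

# Pandas: build a small DataFrame, add a column, and filter rows (sample records)
df = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [25, 32, 41]})
df["Age_in_5_years"] = df["Age"] + 5
over_30 = df[df["Age"] > 30]

print(column_sums)
print(evens)
print(over_30)
```

Boolean masks work the same way in both libraries: the comparison produces a True/False array, and indexing with it keeps only the matching elements or rows.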
Program
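# Step 1: Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Step 2: Create a NumPy array (sample values) and compute basic statistics
numpy_array = np.array([10, 20, 30, 40, 50])
array_sum = np.sum(numpy_array)
array_mean = np.mean(numpy_array)
print("Sum:", array_sum, "Mean:", array_mean)
# Step 3: Create a Pandas DataFrame (sample records) and summarize it
data = {
    'Name': ['A', 'B', 'C', 'D'],
    'Age': [25, 30, 35, 40],
    'Salary': [30000, 40000, 50000, 60000],
}
df = pd.DataFrame(data)
df_summary = df.describe()
print(df_summary)
# Step 4: Line plot of Salary against Age
plt.figure(figsize=(8, 6))
plt.plot(df['Age'], df['Salary'], marker='o', label='Salary')
plt.title('Salary vs Age')
plt.xlabel('Age')
plt.ylabel('Salary')
plt.grid(True)
plt.legend()
plt.show()
# Step 5: Bar plot of Salary for each Name
plt.figure(figsize=(8, 6))
plt.bar(df['Name'], df['Salary'])
plt.xlabel('Name')
plt.ylabel('Salary')
plt.show()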
OUTPUT
Result:
Thus the program has been executed successfully
EX.NO: 4
Explore various variable and row filters in R for cleaning data. Apply various plot features in R on sample data sets and visualize.
DATE:
AIM:
To explore various variable and row filters in R for cleaning data, and to apply various plot features in R on sample
data sets and visualize them.
Algorithm
Step 1. Start
Step 2. Clean data
Step 3. Filter data
Step 4. Select variables
Step 5. Drop variables
Step 6. Rename variables
Step 7. Filtering row variables
Step 8. Plotting features in R
Step 9. Visualization techniques implementation
Step 10. Stop
Program
AIM:
Algorithm
Step 1. Start
Step 2.
Step 3.
Step 4.
Step 5.
Step 6.
Step 7.
Step 8.
Step 9.
Step 10.
Program
OUTPUT
RESULT
The Time Series Analysis and the various visualization techniques are implemented successfully
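EX.NO: 6
Build cartographic visualization for multiple datasets involving various countries of the world; states and districts in India etc.
DATE:
Aim:
To build cartographic visualizations for multiple datasets involving various countries of the world, and states and
districts in India.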
Algorithm
Step 1. Start
Step 2: Install required libraries: geopandas, matplotlib, and plotly.
Step 3: Load world shapefile data using GeoPandas.
Step 4: Load or create the dataset (e.g., population, GDP) for different countries or regions.
Step 5: Merge the shapefile data with your dataset using GeoPandas for geographical plotting.
Step 6: Plot the data on a world map using Matplotlib.
Step 7: Load shapefiles for Indian states and districts.
Step 8: Plot specific regions like India's states or districts on a map.
Step 9: (Optional) Use interactive Plotly for creating interactive maps.
Step 10: Display the cartographic visualizations.
Step 11. Stop
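Step 5, the merge, is where mismatched country names most often cause blank regions on the map. The sketch below illustrates the join logic with plain pandas DataFrames standing in for the shapefile's attribute table; the `name` and `value` columns and all values are placeholders, not real statistics. A GeoDataFrame offers the same `merge` method.

```python
import pandas as pd

# Stand-in for the attribute table of a shapefile (country names only)
shapes = pd.DataFrame({"name": ["India", "Japan", "Brazil"]})

# Stand-in dataset; the values are placeholders, not real statistics
values = pd.DataFrame({"name": ["India", "Brazil", "Atlantis"], "value": [1.0, 2.0, 3.0]})

# A left join keeps every shape; unmatched shapes get NaN and can be drawn as "No Data"
merged = shapes.merge(values, on="name", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "name"].tolist()

print(merged)
print("Shapes without data:", unmatched)
```

Checking `unmatched` before plotting makes it easier to spot spelling differences between the shapefile and the dataset.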
Program
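# Step 1: Import necessary libraries
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
# Step 2: Load the world shapefile (placeholder path; point it to your own file)
world = gpd.read_file('world_countries.shp')
# Steps 3-4: Load the dataset and merge it (assumes 'name' and 'population' columns)
data = pd.read_csv('country_population.csv')
world = world.merge(data, on='name', how='left')
# Step 5: Plot the world map and display population data using Matplotlib
fig, ax = plt.subplots(1, 1, figsize=(15, 10))
world.boundary.plot(ax=ax)
world.plot(column='population', cmap='OrRd', legend=True, ax=ax, missing_kwds={
    "color": "lightgrey",
    "label": "No Data"
})
plt.title("World Population Distribution")
plt.show()
# Step 9: Optional - Interactive Map using Plotly (if you want interactivity)
import plotly.express as px
fig = px.choropleth(data, locations='name', locationmode='country names',
                    color='population', title='World Population Distribution')
fig.show()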
Result:
Aim:
Perform EDA on Wine Quality Data Set
Algorithm
Step 1. Start
Step 2: Import necessary libraries.
Step 3: Load the Wine Quality dataset.
Step 4: Perform basic dataset exploration (e.g., checking missing values, data types, summary statistics).
Step 5: Visualize distributions of key features.
Step 6: Analyze correlations between features.
Step 7: Visualize relationships between key variables and the target variable (wine quality).
Step 8: Conclude with insights derived from the analysis
Step 9: Stop
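Steps 6 and 7 can be sketched as a small helper that ranks features by their correlation with the target. This is an illustration under stated assumptions: the function name `correlations_with_target` is hypothetical, and the demo frame is synthetic data, not the Wine Quality dataset.

```python
import pandas as pd


def correlations_with_target(frame, target):
    """Return Pearson correlations of each numeric column with the target, strongest first."""
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Sort by absolute strength so strong negative relationships are not buried
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Tiny synthetic frame used only to exercise the function (not wine data)
demo = pd.DataFrame({
    "feature_a": [1, 2, 3, 4, 5],
    "feature_b": [5, 3, 4, 1, 2],
    "quality": [2, 4, 6, 8, 10],
})
result = correlations_with_target(demo, "quality")
print(result)
```

With the real dataset loaded into a DataFrame, the same call with the quality column as the target would list the features most strongly related to wine quality.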
Program
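# Step 1: Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Step 2: Load the dataset (file name and ';' separator assume the UCI red wine CSV)
wine_data = pd.read_csv('winequality-red.csv', sep=';')
# Step 3: Basic exploration: data types, missing values, summary statistics
wine_data.info()
print("\nMissing values:")
print(wine_data.isnull().sum())
print("\nSummary statistics:")
print(wine_data.describe())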
Aim:
Use a case study on a data set and apply the various EDA and visualization techniques
and present an analysis report.
Algorithm
Step 1. Start
Step 2. Load the dataset and display the first few rows.
Step 3. Compute summary statistics and the correlation matrix for numerical variables.
Step 4. Create a histogram to visualize the distribution of the `Age` variable.
Step 5. Use a box plot to show the distribution of `Salary` by `Performance_Rating`.
Step 6. Generate a pairplot to visualize relationships between numerical variables.
Step 7. Display a correlation heatmap to show the correlation between variables.
Step 8. The dataset (`employee_data.csv`) is assumed to have the following columns: `Employee_ID`,
`Age`, `Salary`, `Performance_Rating`, and `Years_of_Experience`.
Step 9. Stop
Program
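import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset and display the first few rows
df = pd.read_csv('employee_data.csv')
print(df.head())
# Summary statistics
summary_stats = df.describe()
print(summary_stats)
# Correlation matrix
correlation_matrix = df.corr(numeric_only=True)
print(correlation_matrix)
# Distribution of Age
plt.figure(figsize=(8, 6))
sns.histplot(df['Age'], kde=True)
plt.title('Distribution of Age')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()
# Box plot of Salary by Performance Rating
plt.figure(figsize=(10, 6))
sns.boxplot(x='Performance_Rating', y='Salary', data=df)
plt.title('Salary by Performance Rating')
plt.xlabel('Performance Rating')
plt.ylabel('Salary')
plt.show()
# Pairplot of the numerical variables, colored by Performance Rating
sns.pairplot(df, hue='Performance_Rating')
plt.show()
# Correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()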
OUTPUT:
RESULT
Thus the case study on a data set was carried out, and the various EDA and
visualization techniques were executed and verified successfully.