PDF Experiments-1 DADV

The document outlines the steps for performing Exploratory Data Analysis (EDA) using Python, emphasizing the importance of data pre-processing and feature engineering. It details key processes such as data loading, cleaning, univariate and bivariate analysis, and visualization techniques using libraries like Pandas, NumPy, Matplotlib, and Seaborn. The conclusion highlights the significance of EDA in understanding datasets and informing further analysis.

Uploaded by

okkshrutii


CSE-5th SEM

303105315 - Data Analytics and Data Visualization Laboratory

Dr. Vinod Patidar

Asst. Prof. (CSE Dept.)
9009526270
Experiment-1

1. Perform Exploratory Data Analysis on the given dataset using Python.
Introduction to EDA
• The main objective of this Experiment is to cover
the steps involved in Data pre-processing, Feature
Engineering, and different stages of Exploratory
Data Analysis, which is an essential step in any
research analysis.
• Data pre-processing, Feature Engineering, and
EDA are fundamental early steps after data
collection.
What is Exploratory Data Analysis?
• Exploratory Data Analysis (EDA) is a method of analyzing
datasets to understand their main characteristics.
• It involves summarizing data features, detecting patterns, and
uncovering relationships through visual and statistical
techniques.
• EDA helps in gaining insights and formulating hypotheses for
further analysis.
What is Data Pre-processing and Feature Engineering?

• Data pre-processing involves cleaning and preparing raw data
to facilitate feature engineering. Feature engineering, in turn,
applies various techniques to manipulate the data. This may
include adding or removing features, handling missing data,
and encoding categorical variables, among other tasks.
• Feature engineering is a critical task that significantly
influences the outcome of a model. It involves crafting new
features from existing data, whereas pre-processing primarily
focuses on cleaning and organizing the data.
Let’s look at how to perform EDA using Python!

Step 1: Import Python Libraries

• Import all the libraries required for the analysis: data loading, statistical
analysis, visualization, data transformation, merges and joins, etc.
• The dataset is available at https://www.kaggle.com/datasets/sukhmanibedi/cars4u/data.
Pandas and NumPy are used for data manipulation and
numerical calculations.

Matplotlib and Seaborn are used for data visualization.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Suppress library warnings to keep the output readable
import warnings
warnings.filterwarnings('ignore')
Step 2: Reading Dataset

• The pandas library offers a wide range of functions for loading data into a pandas
DataFrame from sources such as JSON, .csv, .xlsx, SQL databases, .pickle, .html, and .txt files.

• Most data is available in the tabular CSV format, which is widely used and easy to
access. The read_csv() function loads a CSV file into a pandas DataFrame.

• This example uses a dataset for predicting used car prices. The EDA focuses on
identifying the factors that influence the car price. The data is stored in the
DataFrame data.
data.shape

OUTPUT: (27, 7)
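As a minimal sketch of this step, the snippet below loads CSV text into a DataFrame and checks its shape. The three rows and the column names are made-up stand-ins (an assumption, not the actual file contents), so the printed shape will differ from the one above; with the real download you would pass the file path to read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV; values are illustrative only.
csv_text = """S.No.,Name,Year,Kilometers_Driven,Fuel_Type,Price
0,Maruti Wagon R,2010,72000,CNG,1.75
1,Hyundai Creta,2015,41000,Diesel,12.5
2,Honda Jazz,2011,46000,Petrol,4.5
"""

# For a real file, use pd.read_csv("path/to/your_file.csv") instead.
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)   # (number of rows, number of columns)
print(data.head())  # first rows, to confirm the columns parsed as expected
```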
Step 3: Data Reduction

• Some columns or variables can be dropped if they do not add value to our analysis.

• In our dataset, the column S.No contains only ID values, which have no
predictive power for the dependent variable, so it can be dropped.
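Dropping the ID column could look like the sketch below. The tiny DataFrame and the exact header spelling "S.No." are assumptions for illustration; check data.columns for the actual name in your file.

```python
import pandas as pd

# Illustrative stand-in for the loaded dataset.
data = pd.DataFrame({
    "S.No.": [0, 1, 2],
    "Name": ["Maruti Wagon R", "Hyundai Creta", "Honda Jazz"],
    "Price": [1.75, 12.5, 4.5],
})

# Remove the ID column; it only numbers the rows.
data = data.drop(columns=["S.No."])

print(data.columns.tolist())
```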
Step 4: Feature Engineering

• Feature engineering refers to the process of using domain knowledge to select and
transform the most relevant variables from raw data when creating a predictive
model using machine learning or statistical modeling.

• The main goal of Feature engineering is to create meaningful data from raw data.
Step 5: Creating Features

• We will work with the variables Year and Name in our dataset. In the
sample data, the column “Year” shows the manufacturing year of the car.

• The raw year does not directly express the car’s age, yet the age of the
car is a contributing factor to the car price.

• We therefore introduce a new column, “Car_Age”, that holds the age of the car.
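One way to derive Car_Age is to subtract the manufacturing year from a reference year. The sketch below uses the current calendar year as that reference, which is an assumption; a fixed year may be preferable for reproducible results. The Brand column is a hypothetical extra showing how Name could also be split, and the sample rows are made up.

```python
from datetime import date

import pandas as pd

# Illustrative stand-in for the dataset.
data = pd.DataFrame({
    "Name": ["Maruti Wagon R", "Hyundai Creta"],
    "Year": [2010, 2015],
})

# Age of the car relative to the current year (assumed reference point).
current_year = date.today().year
data["Car_Age"] = current_year - data["Year"]

# Hypothetical extra feature: take the first word of Name as the brand.
data["Brand"] = data["Name"].str.split().str[0]

print(data)
```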

Step 6: Exploratory Data Analysis (EDA)
Exploratory Data Analysis refers to the crucial process of performing initial
investigations on data to discover patterns and check assumptions with the help
of summary statistics and graphical representations.

• EDA can be leveraged to check for outliers, patterns, and trends in the given
data.
• EDA helps to find meaningful patterns in data.
• EDA provides in-depth insights into the data sets to solve our business
problems.
• EDA gives clues about how to impute missing values in the dataset.
Step 7: Statistics Summary
• A statistics summary gives a quick and simple description of the data.
• It can include count, mean, standard deviation, median, mode,
minimum value, maximum value, range, etc.
• It gives a high-level view of whether the data has outliers or
data entry errors, and of how the data is distributed (normally
distributed or left/right skewed).
Step 8: Statistics Summary…
In Python, this can be achieved using describe().

• describe() provides a statistics summary of the columns with a numerical
datatype, such as int and float.
• describe(include='all') provides a statistics summary of all columns,
including object, category, etc.
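The two calls can be compared on a small example. The DataFrame below is made up for illustration; only describe() and its include parameter come from the text above.

```python
import pandas as pd

# Illustrative data with one text column and two numeric columns.
data = pd.DataFrame({
    "Fuel_Type": ["CNG", "Diesel", "Petrol", "Diesel"],
    "Year": [2010, 2015, 2011, 2012],
    "Price": [1.75, 12.5, 4.5, 6.0],
})

# Numeric columns only: count, mean, std, min, quartiles, max.
numeric_summary = data.describe()

# All columns: adds unique, top, and freq for non-numeric columns.
full_summary = data.describe(include="all")

print(numeric_summary)
print(full_summary)
```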
Before we do EDA, let's separate numerical and
categorical variables for easy analysis.
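A possible way to make this split is select_dtypes(), treating every non-numeric column as categorical. That rule is a simplifying assumption; numeric codes that really represent categories would need to be moved by hand. The sample data is made up.

```python
import pandas as pd

# Illustrative stand-in for the dataset.
data = pd.DataFrame({
    "Name": ["Maruti Wagon R", "Hyundai Creta", "Honda Jazz"],
    "Fuel_Type": ["CNG", "Diesel", "Petrol"],
    "Year": [2010, 2015, 2011],
    "Price": [1.75, 12.5, 4.5],
})

# Numeric columns (int, float) versus everything else.
num_cols = data.select_dtypes(include="number").columns.tolist()
cat_cols = data.select_dtypes(exclude="number").columns.tolist()

print("Numerical:", num_cols)
print("Categorical:", cat_cols)
```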
Step 9: EDA Univariate Analysis
• Univariate analysis means analyzing/visualizing the dataset by taking one variable at a time.
• Data visualization is essential; we must decide what charts to plot to
better understand the data. In this experiment, we visualize our data using
the Matplotlib and Seaborn libraries.
• Matplotlib is a Python 2D plotting library used to draw basic charts.
• Seaborn is a Python library built on top of Matplotlib that uses
short lines of code to create and style statistical plots from pandas
and NumPy data.
• Univariate analysis can be done for both Categorical and Numerical
variables.
• Categorical variables can be visualized using a Count plot,
Bar Chart, Pie Plot, etc.
• Numerical Variables can be visualized using Histogram, Box
Plot, Density Plot, etc.
• In our example, we have done a univariate analysis using a
histogram and a box plot for continuous variables.
• In the figure below, a histogram and a box plot are used to show
the pattern of each variable, since some variables have
skewness and outliers.
Exploratory Data Analysis in Python
Exploratory data analysis (EDA) is a critical initial step in the data
science workflow. It involves using Python libraries to inspect,
summarize, and visualize data to uncover trends, patterns, and
relationships. Here’s a breakdown of the key steps in performing
EDA with Python:
1. Importing Libraries:
• pandas (pd): For data manipulation and analysis.
• NumPy (np): For numerical computations.
• Matplotlib.pyplot (plt): For basic plotting functionalities.
• Seaborn (sns): Built on top of Matplotlib, providing high-level statistical visualization.
2. Loading the Data:
• Use pd.read_csv() for CSV files; similar functions exist for other data
formats (e.g., pd.read_excel() for .xlsx, pd.read_json() for .json).
3. Initial Inspection:
• Get an overview of the data using df.head(), df.tail(), and df.info().
• Check data types with df.dtypes.
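The inspection calls above can be tried on a small made-up DataFrame (the rows and column names are illustrative assumptions):

```python
import pandas as pd

# Illustrative stand-in for a freshly loaded dataset.
df = pd.DataFrame({
    "Name": ["Maruti Wagon R", "Hyundai Creta", "Honda Jazz", "Audi A4"],
    "Year": [2010, 2015, 2011, 2013],
    "Price": [1.75, 12.5, 4.5, 17.74],
})

print(df.head(2))  # first two rows
print(df.tail(2))  # last two rows
df.info()          # column names, non-null counts, dtypes, memory usage
print(df.dtypes)   # data type of each column
```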

4. Data Cleaning:
• Identify missing values with df.isnull().sum(), then handle them
(for example, by dropping or imputing them).
• Find duplicates with df.duplicated().sum() and address them.
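A short sketch of these checks follows. The data is made up, and the two handling choices (dropping duplicate rows, filling missing Seats with the median) are assumptions for illustration; the right strategy depends on the dataset.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value and one duplicated row.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "C"],
    "Seats": [5.0, np.nan, 7.0, 7.0],
    "Price": [1.75, 12.5, 4.5, 4.5],
})

missing_per_column = df.isnull().sum()  # count of missing values per column
duplicate_rows = df.duplicated().sum()  # rows identical to an earlier row

print(missing_per_column)
print("Duplicate rows:", duplicate_rows)

# One possible treatment: drop duplicates, then impute with the median.
cleaned = df.drop_duplicates().copy()
cleaned["Seats"] = cleaned["Seats"].fillna(cleaned["Seats"].median())
print(cleaned)
```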

5. Univariate Analysis:
• Analyze single variables at a time.
• Use descriptive statistics with df.describe() for numerical data.
• Create histograms, box plots, and density plots to visualize distributions.

6. Bivariate Analysis:
• Explore relationships between two variables.
• Create scatter plots to identify trends and potential correlations.
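For two numerical variables, a scatter plot can be paired with a correlation coefficient, as in this sketch. The Car_Age and Price values are invented to show a falling trend; they are not taken from the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove when plotting in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data: older cars with lower prices.
df = pd.DataFrame({
    "Car_Age": [2, 4, 6, 8, 10],
    "Price": [12.0, 9.5, 7.0, 4.0, 2.5],
})

# Pearson correlation between the two variables (pandas default).
correlation = df["Car_Age"].corr(df["Price"])
print("Correlation:", round(correlation, 3))

# Scatter plot to see the trend visually.
fig, ax = plt.subplots()
ax.scatter(df["Car_Age"], df["Price"])
ax.set_xlabel("Car_Age")
ax.set_ylabel("Price")
ax.set_title("Price vs. Car_Age")
```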
7. Visualization:
• Effective visualizations are crucial for understanding data.
• Use various plots like bar charts, pie charts, and heatmaps to represent categorical data.
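As one possible illustration for categorical data, the sketch below draws a count plot of a single variable and a heatmap of a cross-tabulation of two variables. The Fuel_Type and Transmission values are made up, and using a crosstab as the heatmap input is an assumption about how the heatmap would be applied here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove when plotting in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative categorical data.
df = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "Diesel", "CNG", "Petrol", "Diesel"],
    "Transmission": ["Manual", "Manual", "Automatic", "Manual", "Automatic", "Manual"],
})

# Frequency table of the two categorical variables.
counts = pd.crosstab(df["Fuel_Type"], df["Transmission"])
print(counts)

fig, (ax_bar, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))
sns.countplot(data=df, x="Fuel_Type", ax=ax_bar)  # bar per category
ax_bar.set_title("Cars per fuel type")
sns.heatmap(counts, annot=True, fmt="d", cbar=False, ax=ax_heat)  # counts as colors
ax_heat.set_title("Fuel type vs. transmission")
fig.tight_layout()
```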
Conclusion
In conclusion,
• Exploratory Data Analysis (EDA) is crucial for understanding datasets,
identifying patterns, and informing subsequent analysis.
• Data pre-processing and feature engineering are essential steps in
preparing data for analysis, involving tasks such as data reduction,
cleaning, and transformation.
• Python libraries offer powerful tools for executing these steps
efficiently.
Thanks…
