Perform Exploratory Data Analysis

what is perform exploratory data analysis?

Uploaded by

Abu Sufian

How to perform exploratory data analysis

Imagine you have a dataset of students' test scores and demographics.

Here's a simplified step-by-step approach:

1. Load Data:
o Read the data into a table.
o Example: students = pd.read_csv('students.csv')
2. Initial Exploration:
o Look at the first few rows: students.head()
o Check data structure: students.info()
3. Summary Statistics:
o Calculate mean and median of test scores.
o Count unique values in gender column.
4. Handle Missing Data:
o Identify missing entries: students.isnull().sum()
o Fill missing scores with the mean or remove those rows.
5. Visualize Data:
o Histogram of test scores.
o Box plot of test scores by gender.
o Scatter plot of test scores versus study hours.
6. Find Patterns:
o Calculate correlation between study hours and test scores.
o Cross-tabulate test scores and extracurricular participation.
7. Identify Outliers:
o Use IQR to find unusually high or low test scores.
o Use Z-score to find test scores that are far from the average.
8. Feature Engineering:
o Create a new feature combining study hours and class participation.
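
As a rough illustration of steps 3, 4, and 6, the sketch below runs them on a tiny made-up table. The data, the column names (gender, study_hours, extracurricular, score), and the pass threshold of 60 are all assumptions chosen for the example, not part of any real students.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; every column name and value here is an assumption.
students = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "study_hours": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "extracurricular": ["yes", "no", "yes", "no", "yes", "no"],
    "score": [55.0, 60.0, np.nan, 80.0, 85.0, 95.0],
})

# Step 4: count missing entries per column, then fill missing scores with the mean.
missing_counts = students.isnull().sum()
mean_score = students["score"].mean()  # NaN values are skipped by default
students["score"] = students["score"].fillna(mean_score)

# Step 3: summary statistics on the cleaned column.
median_score = students["score"].median()
n_genders = students["gender"].nunique()

# Step 6: correlation between study hours and scores, plus a cross-tabulation
# of extracurricular participation against a pass/fail flag.
corr = students["study_hours"].corr(students["score"])
students["passed"] = students["score"] >= 60  # 60 is an arbitrary threshold
table = pd.crosstab(students["extracurricular"], students["passed"])

print(missing_counts)
print(f"mean={mean_score}, median={median_score}, genders={n_genders}")
print(f"correlation={corr:.3f}")
print(table)
```

Filling with the mean keeps every row but pulls the distribution toward its center; dropping the rows instead (students.dropna(subset=['score'])) avoids that at the cost of losing data. Which is preferable depends on how much is missing and why.
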
Performing Exploratory Data Analysis (EDA) involves several steps, from understanding the structure of
the data to summarizing its main characteristics. Below is a detailed guide on how to perform EDA using
Python with libraries like Pandas, Matplotlib, Seaborn, and SciPy.

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

from scipy import stats

# Load data

df = pd.read_csv('your_dataset.csv')

# Understand data structure

print(df.head())

print(df.shape)

df.info()  # prints its own summary and returns None, so no print() is needed

print(df.describe())

# Data cleaning

df.dropna(inplace=True)

df.drop_duplicates(inplace=True)

# Univariate analysis

df['column_name'].hist(bins=30)

plt.show()

sns.boxplot(x=df['column_name'])

plt.show()

# Bivariate analysis

plt.scatter(df['column_x'], df['column_y'])

plt.xlabel('column_x')

plt.ylabel('column_y')

plt.show()

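# numeric_only=True (pandas >= 1.5) skips text columns, which would otherwise raise an error in pandas 2.x
correlation_matrix = df.corr(numeric_only=True)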

sns.heatmap(correlation_matrix, annot=True)

plt.show()

# Categorical data analysis

df['categorical_column'].value_counts().plot(kind='bar')

plt.show()

# Identifying outliers using IQR

Q1 = df['column_name'].quantile(0.25)

Q3 = df['column_name'].quantile(0.75)

IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR

upper_bound = Q3 + 1.5 * IQR

outliers = df[(df['column_name'] < lower_bound) | (df['column_name'] > upper_bound)]

print(outliers)

# Identifying outliers using Z-Score

df['z_score'] = stats.zscore(df['column_name'])

outliers = df[df['z_score'].abs() > 3]  # Series.abs() avoids the missing numpy import

print(outliers)

# Feature engineering

df['new_feature'] = df['feature1'] + df['feature2']

# Visualizing relationships

sns.pairplot(df)

plt.show()

# Hypothesis testing

group1 = df[df['group_column'] == 'group1']['numeric_column']

group2 = df[df['group_column'] == 'group2']['numeric_column']

t_stat, p_value = stats.ttest_ind(group1, group2)  # ttest_ind lives in scipy.stats, imported above as stats

print(f'T-statistic: {t_stat}, P-value: {p_value}')
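
The IQR and Z-score checks above can be wrapped in small reusable helpers, which also makes it easy to compare the two rules on the same column. The sketch below is one possible way to do that; the function names (iqr_outliers, zscore_outliers) and the sample scores are made up for illustration.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def zscore_outliers(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Boolean mask of values more than `threshold` standard deviations from the mean."""
    # ddof=0 (population standard deviation) matches the default of scipy.stats.zscore
    z = (values - values.mean()) / values.std(ddof=0)
    return z.abs() > threshold


# Hypothetical scores with one obviously extreme value.
scores = pd.Series([70, 72, 68, 75, 71, 69, 73, 74, 150], name="score")

iqr_mask = iqr_outliers(scores)
z_mask = zscore_outliers(scores)

print(scores[iqr_mask])  # values flagged by the IQR rule
print(scores[z_mask])    # values flagged by the Z-score rule
```

On this toy series the IQR rule flags 150, while the Z-score rule with a threshold of 3 flags nothing: a single extreme value in a small sample inflates the standard deviation enough to hide itself. It is usually worth running both checks before deciding what counts as an outlier.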

This workflow provides a structured approach to performing EDA, helping you understand the dataset's
characteristics and relationships before moving on to more complex analysis or modeling.
