Introduction To EDA: Exploratory Data Analysis (EDA) in Data Science

Exploratory Data Analysis (EDA) is essential in data science for summarizing datasets, identifying patterns, and detecting anomalies. The process involves steps such as loading data, handling missing values, visualizing data, and feature engineering to improve data quality. EDA ultimately enhances model accuracy by ensuring a thorough understanding of the data before applying predictive models.


Exploratory Data Analysis (EDA) in Data Science

1. Introduction to EDA
Exploratory Data Analysis (EDA) is a fundamental step in data science and machine
learning that involves analyzing datasets to summarize their key characteristics, identify
patterns, and detect anomalies before applying predictive models.

Objectives of EDA:

- Understand data structure and patterns.
- Identify missing values, outliers, and inconsistencies.
- Discover relationships between variables.
- Validate assumptions before building models.
- Improve data quality through feature engineering.

2. Steps in Exploratory Data Analysis


Step                      Description
Load Data                 Import the dataset using Pandas
Understand Structure      View column types, missing values, and basic stats
Handle Missing Values     Remove or fill NaNs (mean, median, mode)
Remove Duplicates         Identify and drop duplicate rows
Visualize Data            Histograms, boxplots, scatter plots, heatmaps
Outlier Detection         Use the IQR method or boxplots
Handle Categorical Data   Convert to numeric format (one-hot, label encoding)
Feature Engineering       Create new features and scale data
Save Cleaned Data         Store the processed dataset for modeling
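The steps in the table can be sketched as a single short pipeline. The toy columns ("age", "city") and values below are illustrative stand-ins for the real dataset, not from the original:

```python
import pandas as pd

# Toy dataset standing in for "data.csv" (illustrative columns and values)
df = pd.DataFrame({
    "age": [25, 30, None, 30, 120],
    "city": ["NY", "LA", "NY", "LA", None],
})

df = df.drop_duplicates()                                   # drop duplicate rows
df["age"] = df["age"].fillna(df["age"].mean())              # fill numeric NaNs with the mean
df["city"] = df["city"].fillna(df["city"].mode().iloc[0])   # fill categorical NaNs with the mode
df = pd.get_dummies(df, columns=["city"], drop_first=True)  # one-hot encode categoricals
df.to_csv("cleaned_data.csv", index=False)                  # save for modeling
print(df.shape)  # (4, 2): one duplicate row dropped, "city" replaced by a dummy column
```

Each line here corresponds to one row of the table; the sections below walk through the same steps in detail.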

Step 1: Load the Dataset

- Import the necessary libraries and read the dataset.

import pandas as pd

df = pd.read_csv("data.csv") # Replace with actual file path


print(df.head()) # Display first five rows

Step 2: Understand Data Structure

- View column types, null values, and basic information.


print(df.info()) # Column names, data types, non-null values
print(df.describe()) # Summary statistics (mean, median, etc.)

3. Handling Missing Data


Missing data can impact model accuracy. Common techniques to handle missing values:

- Remove rows with missing values: df.dropna()
- Fill missing values with the mean, median, or mode:

df.fillna(df.mean(numeric_only=True), inplace=True)  # Fill numerical NaNs with mean
df.fillna(df.mode().iloc[0], inplace=True)           # Fill remaining (categorical) NaNs with mode
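A quick check of the fill strategy above on a toy frame (column names and values are illustrative, not from the original dataset):

```python
import pandas as pd

df = pd.DataFrame({"score": [10.0, None, 30.0], "grade": ["A", "B", None]})

# Mean fill for the numeric column: mean of 10 and 30 is 20
df["score"] = df["score"].fillna(df["score"].mean())
# Mode fill for the categorical column
df["grade"] = df["grade"].fillna(df["grade"].mode().iloc[0])

print(df["score"].tolist())  # [10.0, 20.0, 30.0]
```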

4. Handling Duplicate Data


- Detect and remove duplicate rows to avoid redundancy.

print("Duplicates:", df.duplicated().sum()) # Count duplicate rows


df.drop_duplicates(inplace=True) # Remove duplicates
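On a toy frame (illustrative values), the two calls above behave like this:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 2, 3], "val": ["a", "b", "b", "c"]})

print("Duplicates:", df.duplicated().sum())  # 1: the second (2, "b") row repeats the first
df = df.drop_duplicates()                    # keeps the first occurrence of each row
print(len(df))                               # 3
```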

5. Data Visualization for EDA


A. Univariate Analysis (Single Variable)

1. Histogram (data distribution)
   - Helps understand the spread of numerical features.

import matplotlib.pyplot as plt
df["column_name"].hist(bins=30)
plt.show()

2. Boxplot (outlier detection)
   - Shows quartiles and outliers.

import seaborn as sns
sns.boxplot(df["column_name"])
plt.show()

B. Bivariate Analysis (Two Variables)

1. Scatter plot (correlation between two features)
   - Used for continuous variables.

sns.scatterplot(x="feature1", y="feature2", data=df)
plt.show()

2. Correlation heatmap
   - Shows relationships between numerical variables.

plt.figure(figsize=(10, 6))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.show()

3. Pairplot
   - Visualizes pairwise relationships.

sns.pairplot(df)
plt.show()

6. Outlier Detection and Handling


A. Using IQR (Interquartile Range) Method

- Remove data points beyond 1.5 times the IQR from the quartiles.

num = df.select_dtypes(include="number")  # IQR comparisons only apply to numeric columns
Q1 = num.quantile(0.25)
Q3 = num.quantile(0.75)
IQR = Q3 - Q1
df_cleaned = df[~((num < (Q1 - 1.5 * IQR)) | (num > (Q3 + 1.5 * IQR))).any(axis=1)]
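On a single toy series (illustrative values), the IQR rule keeps the bulk of the data and drops the extreme point:

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 100])  # 100 is an obvious outlier

Q1, Q3 = s.quantile(0.25), s.quantile(0.75)
IQR = Q3 - Q1
mask = (s >= Q1 - 1.5 * IQR) & (s <= Q3 + 1.5 * IQR)  # keep points inside the fences
print(s[mask].tolist())  # [10, 12, 11, 13, 12]
```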

7. Handling Categorical Data


A. Encoding Categorical Variables

1. One-Hot Encoding (Best for nominal categories)

df = pd.get_dummies(df, columns=["categorical_column"], drop_first=True)
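A toy example of what get_dummies produces (column and category names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"]})

# drop_first=True drops the first category (alphabetically, "blue")
# to avoid redundant, perfectly correlated dummy columns
encoded = pd.get_dummies(df, columns=["color"], drop_first=True)
print(list(encoded.columns))  # ['color_red']
```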

2. Label Encoding (assigns integer codes; note that LabelEncoder orders categories alphabetically, not by meaning)

from sklearn.preprocessing import LabelEncoder


encoder = LabelEncoder()
df["encoded_column"] = encoder.fit_transform(df["categorical_column"])
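Because LabelEncoder assigns codes in alphabetical order, a truly ordinal column such as Low/Medium/High is better handled with an explicit mapping (column name here is illustrative), so the codes respect Low < Medium < High:

```python
import pandas as pd

df = pd.DataFrame({"priority": ["Low", "High", "Medium", "Low"]})

order = {"Low": 0, "Medium": 1, "High": 2}  # explicit ordinal order
df["priority_encoded"] = df["priority"].map(order)
print(df["priority_encoded"].tolist())  # [0, 2, 1, 0]
```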

8. Feature Engineering

- Create new, meaningful features to improve model performance.

A. Creating a New Feature


df["new_feature"] = df["feature1"] * df["feature2"]

B. Feature Scaling

1. Min-Max Scaling (rescale to range 0-1)

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df_scaled = scaler.fit_transform(df.select_dtypes(include="number"))  # scale numeric columns only

2. Standardization (mean = 0, standard deviation = 1)

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df.select_dtypes(include="number"))  # scale numeric columns only
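A minimal check of Min-Max scaling on one toy numeric column (illustrative values): the minimum maps to 0, the maximum to 1, and everything else falls in between.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"x": [0.0, 5.0, 10.0]})

scaler = MinMaxScaler()
scaled = scaler.fit_transform(df)  # (x - min) / (max - min)
print(scaled.ravel().tolist())  # [0.0, 0.5, 1.0]
```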
9. Saving the Cleaned Dataset

df.to_csv("cleaned_data.csv", index=False)

EDA is a crucial step in data science that ensures data quality and model accuracy. By
exploring and visualizing the dataset, we can make informed decisions before applying
machine learning models.
