Module 1 - 2 - EDA

Dr. Abhishek Bhatt, Faculty, Centre for AI, MITS-DU, Gwal

Exploratory data analysis: The graphical summaries
❖ The set of observations is called a dataset.

❖ By exploring the dataset we can gain insight into what probability model suits the phenomenon.

❖ To graphically represent univariate datasets, consisting of repeated measurements of one particular quantity, we discuss the classical histogram, the more recently introduced kernel density estimates, and the empirical distribution function.

❖ To represent a bivariate dataset, which consists of repeated measurements of two quantities, we use the scatterplot.
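The three univariate summaries named above can be computed in a few lines. The following is a minimal sketch in Python with NumPy, not the course's own code: the dataset is simulated, the function names gaussian_kde and ecdf are illustrative, and the bandwidth uses the normal reference rule of thumb (1.06 times the sample standard deviation times n^(-1/5)), which is only one of several possible choices.

```python
import numpy as np

# Hypothetical univariate dataset: 200 draws from a normal distribution.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200)

# Histogram: bin counts rescaled so that the bar areas sum to 1.
heights, edges = np.histogram(x, bins=10, density=True)


def gaussian_kde(data, grid, bandwidth):
    """Kernel density estimate with a Gaussian kernel, evaluated on grid."""
    u = (grid[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def ecdf(data, t):
    """Empirical distribution function: fraction of observations <= t."""
    return np.mean(data <= t)


# Evaluation grid that extends a little beyond the observed range.
grid = np.linspace(x.min() - 3.0, x.max() + 3.0, 400)

# Assumed bandwidth choice: normal reference rule of thumb.
h = 1.06 * x.std(ddof=1) * len(x) ** (-1 / 5)
density = gaussian_kde(x, grid, h)

print("histogram area:", np.sum(heights * np.diff(edges)))
print("ECDF at the sample median:", ecdf(x, np.median(x)))
```

The histogram and the kernel density estimate both approximate the same underlying density; the kernel estimate replaces fixed bins with a smooth bump centred on every observation, so its shape depends on the bandwidth rather than on where the bin edges fall.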

Exploratory vs Confirmatory Data Analysis
EDA
• No hypothesis at first
• Generates hypotheses
• Uses graphical methods (mostly)
CDA
• Starts with a hypothesis
• Tests the null hypothesis
• Uses statistical models

Exploratory data analysis (EDA):
What is EDA?
• It involves analyzing and visualizing datasets to
• understand their key characteristics,
• uncover patterns,
• locate outliers, and
• identify relationships between variables.
• EDA is normally carried out as a preliminary step before undertaking more formal statistical analyses or modeling.
Key aspects of EDA include:
• Distribution of Data: Understand the range, central tendency (mean, median), and dispersion (variance, standard deviation) of each variable.
• Graphical Representations: Visualize the data and the relationships between variables.
• Outlier Detection: Identify unusual values.
• Correlation Analysis: Find the relationships between variables to understand how they might affect each other.
• Handling Missing Values: Apply imputation or removal, depending on the impact of the missing values.
• Summary Statistics: Gain insight into data trends.
• Testing Assumptions: Check that the data meet the conditions required by the intended statistical methods.
Why is Exploratory Data Analysis Important?
• Understanding Data Structures: the dataset, its features, and its key aspects, especially in the context of statistical modelling.
• Identifying Patterns and Relationships
• Detecting Anomalies and Outliers: Identifying errors or unusual data points that may adversely affect the results of your analysis.
• Testing Assumptions: If the assumptions do not hold, the conclusions drawn from the model could be invalid.
• Informing Feature Selection and Engineering: Deciding which features are most relevant to include in a model and how to transform them (scaling, encoding) to improve model performance.
• Optimizing Model Design: Deciding on the complexity of the model and tuning model parameters better.
• Facilitating Data Cleaning: Spotting missing values and errors in the data.
• Enhancing Communication: Visual and statistical summaries make the findings easy to understand for people without technical backgrounds.
Types of Exploratory Data Analysis
• Univariate
• Bivariate
• Multivariate
Univariate
• Histograms: Used to visualize the distribution of a variable.
• Box plots: Useful for detecting outliers and understanding the
spread and skewness of the data.
• Bar charts: Employed for categorical data to show the
frequency of each category.
• Summary statistics: Calculations like mean, median, mode,
variance, and standard deviation that describe the central
tendency and dispersion of the data.
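As an illustration of the univariate summaries listed above, here is a minimal sketch in Python with NumPy. The sample values are hypothetical, and the 1.5 × IQR fences are the conventional box-plot rule for flagging outliers, assumed here rather than taken from the slides.

```python
import numpy as np

# Hypothetical sample with one unusually large value.
values = np.array([12, 15, 14, 10, 13, 15, 11, 14, 15, 40], dtype=float)

# Central tendency.
mean = values.mean()
median = np.median(values)
uniq, counts = np.unique(values, return_counts=True)
mode = uniq[counts.argmax()]  # most frequent value

# Dispersion (sample versions, dividing by n - 1).
variance = values.var(ddof=1)
std_dev = values.std(ddof=1)

# Box-plot quantities: quartiles, IQR, and the 1.5 * IQR fences.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(f"mean={mean:.2f} median={median:.2f} mode={mode:.0f}")
print(f"variance={variance:.2f} std={std_dev:.2f}")
print("values outside the fences:", outliers)
```

In this sample the single large value pulls the mean above the median, which is the kind of skewness a box plot makes visible at a glance.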
Bivariate
• Scatter Plots: These are one of the most common tools used in bivariate analysis. A
scatter plot helps visualize the relationship between two continuous variables.
• Correlation Coefficient: This statistical measure (often Pearson’s correlation coefficient
for linear relationships) quantifies the degree to which two variables are related.
• Cross-tabulation: Also known as contingency tables, cross-tabulation is used to analyze
the relationship between two categorical variables. It shows the frequency distribution
of categories of one variable in rows and the other in columns, which helps in
understanding the relationship between the two variables.
• Line Graphs: In the context of time series data, line graphs can be used to compare two
variables over time. This helps in identifying trends, cycles, or patterns that emerge in
the interaction of the variables over the specified period.
• Covariance: Covariance is a measure used to determine how much two random variables
change together. However, it is sensitive to the scale of the variables, so it’s often
supplemented by the correlation coefficient for a more standardized assessment of the
relationship.
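The scale sensitivity of covariance can be checked numerically. The sketch below, in Python with NumPy, uses hypothetical height and weight measurements and two illustrative helper functions (covariance and pearson_r); it changes the unit of one variable and compares both measures before and after.

```python
import numpy as np

# Hypothetical paired measurements: height in metres, weight in kilograms.
height_m = np.array([1.50, 1.60, 1.65, 1.70, 1.80, 1.85])
weight_kg = np.array([52.0, 58.0, 63.0, 66.0, 75.0, 80.0])


def covariance(x, y):
    """Sample covariance: summed product of deviations, divided by n - 1."""
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)


def pearson_r(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(x, y) / (x.std(ddof=1) * y.std(ddof=1))


# Same data, different unit for height.
height_cm = height_m * 100.0

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
r_m = pearson_r(height_m, weight_kg)
r_cm = pearson_r(height_cm, weight_kg)

print("covariance (m, kg): ", cov_m)
print("covariance (cm, kg):", cov_cm)
print("correlation (m, kg): ", r_m)
print("correlation (cm, kg):", r_cm)
```

Converting metres to centimetres multiplies the covariance by 100 while leaving the correlation coefficient unchanged, which is why the correlation coefficient is the more convenient measure for comparing the strength of relationships across variables with different units.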
Multivariate
• Pair plots: Visualize relationships across several variables
simultaneously to capture a comprehensive view of potential
interactions.
• Principal Component Analysis (PCA): A technique that reduces the dimensionality of large datasets while preserving as much of their variance as possible.
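To make the idea of PCA concrete, here is a minimal sketch in Python with NumPy that computes principal components from the singular value decomposition of the centred data. The dataset is simulated (three variables, the third nearly a sum of the first two), and computing PCA through the SVD is one standard route, assumed here rather than prescribed by the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset: 100 observations of 3 variables, where the third
# is nearly a linear combination of the first two.
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack([a, b, a + b + 0.05 * rng.normal(size=100)])

# Centre each column, then take the SVD of the centred data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Variance captured by each principal component, and its share of the total.
explained_variance = s**2 / (len(X) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Project onto the first two components (3 variables reduced to 2).
scores = Xc @ Vt[:2].T

print("share of variance per component:", np.round(explained_ratio, 4))
print("reduced data shape:", scores.shape)
```

Because the third variable carries almost no information beyond the first two, the first two components retain nearly all of the variance, so dropping the third component loses very little.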
Steps for Performing Exploratory Data Analysis
Steps
• Step 1: Understand the Problem and the Data: Knowing the problem
• Step 2: Import and Inspect the Data: Structure, variable types
• Step 3: Handle Missing Data: Noisy, NA, and missing values
• Step 4: Explore Data Characteristics: Statistical description (mean, mode, variance, skewness, kurtosis, etc.)
• Step 5: Perform Data Transformation: Scaling, normalizing, aggregation, encoding
• Step 6: Visualize Data Relationships: Create frequency tables, charts, plots, and a correlation matrix
• Step 7: Handle Outliers: Z-score, IQR, etc.
• Step 8: Communicate Findings and Insights: Patterns and critical analysis of results
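Steps 2 to 7 above can be sketched on a small table. The example below uses Python with pandas and NumPy; the column names, the values, and the choice of median imputation are all hypothetical and stand in for whatever a real dataset and problem would call for.

```python
import numpy as np
import pandas as pd

# Step 2: import and inspect (a small hypothetical table stands in for a file).
df = pd.DataFrame({
    "age": [23, 25, np.nan, 31, 29, 27, 24, 95],
    "income": [30.0, 32.5, 31.0, np.nan, 40.0, 38.5, 29.0, 41.0],
    "city": ["A", "B", "A", "A", "B", "B", "A", "B"],
})
missing_per_column = df.isna().sum()

# Step 3: handle missing data (median imputation is one option among several).
num_cols = ["age", "income"]
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# Step 4: statistical description.
summary = df[num_cols].describe()
skewness = df[num_cols].skew()

# Step 5: transformation (z-score scaling and one-hot encoding).
z = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std()
encoded = pd.get_dummies(df["city"], prefix="city")

# Step 6: relationships (correlation matrix and a frequency table).
corr = df[num_cols].corr()
city_counts = df["city"].value_counts()

# Step 7: outliers by the IQR rule.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
age_outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]

print(missing_per_column)
print(summary)
print(age_outliers)
```

In this table the IQR rule flags the age of 95, while its z-score stays below 3: with only eight rows no z-score can exceed roughly 2.47, so the choice between the two rules in Step 7 depends on the sample size as well as on the data.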
