Lecture 21

The document discusses Exploratory Data Analysis (EDA), highlighting its importance in summarizing dataset characteristics through visual methods. It outlines the steps involved in EDA, including data sourcing, cleaning, and analysis, while emphasizing its role in preparing datasets for deeper analysis and identifying business problems. Additionally, it mentions the use of confusion matrices in evaluating model performance, particularly in classifying images accurately.

Lecture 21

Data Analytics and Visualization
Course Code: CS2205

Dr. Rahul Mishra


IIT Patna

True/False, Positive/Negative

One benefit of our six-image dataset is that we know the true category of each image: animal or
not animal. If we tabulate the images with predicted categories as columns and actual categories
as rows, we get a confusion matrix.
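
The tabulation described above can be sketched in a few lines of Python. The label lists below are hypothetical (the slides do not list the individual images or predictions); 1 stands for "animal" and 0 for "not animal". The function name `confusion_matrix` is chosen here for illustration and is not taken from any library.

```python
from collections import Counter

# Hypothetical labels for six images (1 = animal, 0 = not animal).
# These values are illustrative; the actual images are not given in the text.
actual = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]

def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) combination for a binary problem."""
    counts = Counter(zip(actual, predicted))
    return {
        "TP": counts[(1, 1)],  # animal predicted as animal
        "FN": counts[(1, 0)],  # animal predicted as not animal
        "FP": counts[(0, 1)],  # not animal predicted as animal
        "TN": counts[(0, 0)],  # not animal predicted as not animal
    }

cm = confusion_matrix(actual, predicted)

# Rows are actual categories, columns are predicted categories.
print(f"{'':<20}{'pred animal':>14}{'pred not animal':>18}")
print(f"{'actual animal':<20}{cm['TP']:>14}{cm['FN']:>18}")
print(f"{'actual not animal':<20}{cm['FP']:>14}{cm['TN']:>18}")
```

Each cell of the printed table corresponds to one of the four outcomes: true positive, false negative, false positive, and true negative.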

Model 4 (Over-predicts images as animal)

The last model is similar to Model 3: it correctly classifies all the animals (true positives), but it
mistakes the mop for an animal (false positive). Recall is now 100% because the model does not
produce any false negatives. This model produces the highest F1 score (86%).
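
As a check on the 86% figure, here is a minimal sketch of the precision, recall, and F1 calculation. The counts are an inference, not stated on the slide: with no false negatives and one false positive, an F1 of about 86% implies three true positives, since F1 = 2TP / (2TP + FP + FN) = 6/7. The helper name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Model 4 (counts inferred from the text, see the note above):
# all animals found, and the mop wrongly labeled as an animal.
precision, recall, f1 = precision_recall_f1(tp=3, fp=1, fn=0)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Under these assumed counts, precision is 75% and recall is 100%, and their harmonic mean gives the F1 score of roughly 86%.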

Exploratory Data Analysis

Agenda

1. What is Exploratory Data Analysis?
2. Why is EDA important?
3. Visualization
- Important charts for visualization.
4. Steps involved in EDA:
- Data Sourcing
- Data Cleaning
- Univariate analysis with visualization
- Bivariate analysis with visualization
- Derived Metrics
5. Use Cases

What is Exploratory Data Analysis?

• Exploratory Data Analysis (EDA) is an approach to analyzing datasets that summarizes their main
characteristics using visual methods.
• EDA is a data exploration technique for understanding various aspects of the data.
• The main aim of EDA is to gain enough confidence in the data that we are ready to
apply a machine learning model.
• EDA is an important first step in the data analysis process.
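
As a small illustration of summarizing a dataset's main characteristics, the sketch below computes basic descriptive statistics for one numeric column using only the Python standard library. The `ages` values and the `summarize` helper are hypothetical and exist only for this example; in practice these numbers would usually be paired with a plot such as a histogram.

```python
import statistics

# Hypothetical numeric column; the values are illustrative only.
ages = [23, 25, 31, 35, 35, 41, 52]

def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

summary = summarize(ages)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```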

• EDA gives a basic understanding of the data, which helps you work out which questions to
ask and how best to manipulate the dataset to answer them.
• EDA helps us find errors, discover data, map out the data structure, and find anomalies.
• EDA is important for business processes because it prepares datasets for the deep,
thorough analysis that will detect business problems.
• EDA helps to build a quick-and-dirty baseline model, which can serve as a
comparison against the later models that you will build.
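
Two of the checks mentioned above, finding errors such as missing values and finding anomalies, can be sketched with the standard library alone. The records are hypothetical, and the 1.5 × IQR rule (Tukey's rule) is one common convention for flagging outliers; the slides do not prescribe a specific method.

```python
import statistics

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 23, "income": 31000},
    {"age": 25, "income": None},
    {"age": 31, "income": 42000},
    {"age": None, "income": 39000},
    {"age": 35, "income": 45000},
    {"age": 41, "income": 400000},  # suspiciously large value
    {"age": 52, "income": 48000},
]

def missing_counts(rows):
    """Count missing (None) values per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

missing = missing_counts(rows)
incomes = [r["income"] for r in rows if r["income"] is not None]
outliers = iqr_outliers(incomes)

print("missing per column:", missing)
print("income outliers:", outliers)
```

Flagged values are candidates for inspection, not automatic deletions: an extreme income may be a data-entry error or a genuine observation.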

https://github.com/pik1989/EDA/blob/main/Handling_Missing_Values.ipynb
