03b EDA-Tutorial

Uploaded by

Van loi Ha

Lesson 03 - Tutorial

Tutorial
Exploratory Data Analysis
and Tools – Orange and Python
Exploratory Data Analysis
• EDA is an iterative cycle:
  - Generate questions about your data
  - Search for answers by visualising, transforming, and modelling your data
  - Use what you learn to refine your questions and/or generate new questions

https://duo.com/labs/research/gamifying-data-science-education

We are considering the popular data set “iris”
- The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems" as an example of linear discriminant analysis [1] (Wikipedia).

• UCI Machine Learning Repository

[1] R. A. Fisher (1936). "The use of multiple measurements in taxonomic problems". Annals of Eugenics. 7 (2): 179–188. doi:10.1111/j.1469-1809.1936.tb02137.x. hdl:2440/15227.
Let’s start with Orange
- Load the data set

• What to notice?
- The target is categorical → this is a classification problem
- The data set is small → cross-validation might be needed
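The same first checks can be sketched in Python. This is a minimal sketch, assuming scikit-learn's bundled copy of iris (`load_iris`) rather than the file loaded in Orange; the k-nearest-neighbours classifier and the 5-fold split are illustrative choices, not part of the tutorial.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Load iris as pandas objects (assumption: scikit-learn's bundled copy).
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Size of the data and the categories of the target.
n_samples, n_features = X.shape
class_names = [str(name) for name in iris.target_names]
print(n_samples, n_features, class_names)

# With so few rows, a single train/test split is noisy, so score the
# (illustrative) classifier with stratified 5-fold cross-validation.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=cv)
print(scores.mean())
```

Stratified folds keep the class proportions the same in every fold, which matters when each class has only a few dozen samples.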
Orange EDA: A first look

• What to notice?
- The variables are on different scales (e.g. sepal length has the largest values and petal width the smallest)
→ might need normalization
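To see the scale differences in Python and one possible way to remove them, here is a small sketch. It assumes scikit-learn's copy of iris and uses `StandardScaler` (zero mean, unit variance) as one normalization option among several.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris(as_frame=True).data

# One row per variable: its minimum, maximum and mean on the raw scale.
ranges = X.agg(["min", "max", "mean"]).T
print(ranges)

# Standardize every column to mean 0 and standard deviation 1.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
```

Distance-based methods are the usual reason to normalize, since otherwise the variable with the largest values dominates the distance.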
Orange EDA:
What are the stats of the variables?

• What to notice?
- Centre and spread of each variable; no missing values
- Distribution of each variable over the classes
- Class balance
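The same statistics can be computed with pandas. This is a sketch under the assumption that scikit-learn's copy of iris is used; the `species` column is a helper added here for readability.

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.copy()
# Replace the integer target codes with readable species names.
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))
df = df.drop(columns="target")

summary = df.describe()                       # centre and spread per variable
missing = df.isna().sum()                     # missing values per column
balance = df["species"].value_counts()        # class balance
per_class_mean = df.groupby("species").mean() # centre of each variable per class

print(summary, missing, balance, per_class_mean, sep="\n\n")
```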
Orange EDA:
What are the most important features?

• What to notice?
- Petal length seems to be the most important feature and sepal width the least
→ Feature selection
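A comparable ranking can be sketched in Python. The scoring method here is an assumption: the ANOVA F-score from scikit-learn (`f_classif`), which is only one of several possible scores and not necessarily the one shown on the slide.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif

iris = load_iris(as_frame=True)

# ANOVA F-score of each feature against the class label:
# a larger value means the class means are better separated.
f_scores, p_values = f_classif(iris.data, iris.target)
ranking = pd.Series(f_scores, index=iris.data.columns).sort_values(ascending=False)
print(ranking)
```

Different scores can order the features differently, so it is worth comparing more than one before dropping a feature.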
Orange EDA:
What are the most important features?

• What to notice?
- Correlation values lie in [-1, 1]: negative/positive, strong/weak…
- Petal length and petal width look strongly correlated
→ Feature selection
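The pairwise correlations can be reproduced with pandas. A minimal sketch, assuming scikit-learn's copy of iris and Pearson correlation; the 0.9 cut-off for "strong" is an illustrative choice.

```python
from sklearn.datasets import load_iris

X = load_iris(as_frame=True).data

# Pearson correlation between every pair of input variables.
corr = X.corr(method="pearson")
print(corr.round(2))

# List each pair once and keep those above an illustrative threshold.
threshold = 0.9
cols = list(corr.columns)
strong_pairs = [
    (a, b, round(float(corr.loc[a, b]), 3))
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(strong_pairs)
```

A strongly correlated pair carries largely redundant information, which is why it is a candidate for feature selection.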
Orange EDA:
What is the relationship between two variables (e.g. sepal length and sepal width), per class or regardless of class?

- Change variables
- What to notice?
- Compare with the correlation shown previously
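A scatter plot coloured by class, together with the correlation computed overall and per class, can be sketched as follows. It assumes scikit-learn's copy of iris and matplotlib; the `Agg` backend line is only there so the sketch runs without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove when working interactively
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame
x_col, y_col = "sepal length (cm)", "sepal width (cm)"

# One scatter layer per species so that each class gets its own colour.
fig, ax = plt.subplots()
per_class_r = {}
for code, name in enumerate(iris.target_names):
    subset = df[df["target"] == code]
    ax.scatter(subset[x_col], subset[y_col], label=str(name))
    per_class_r[str(name)] = subset[x_col].corr(subset[y_col])
ax.set_xlabel(x_col)
ax.set_ylabel(y_col)
ax.legend()

# Correlation regardless of class versus correlation within each class.
overall_r = df[x_col].corr(df[y_col])
print(overall_r, per_class_r)
```

Comparing `overall_r` with the per-class values shows why it helps to look at the relationship both per class and regardless of class: the two views need not agree.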
Orange EDA:
How are the values of a certain variable (e.g. sepal length) distributed?

Univariate

• What to notice?
- Graphical presentation of the stats
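The numbers that a box plot draws can be computed directly. This sketch assumes scikit-learn's copy of iris and the common 1.5 × IQR rule for the whiskers, which may differ from the convention used by a particular plotting tool.

```python
import numpy as np
from sklearn.datasets import load_iris

sepal_length = load_iris(as_frame=True).data["sepal length (cm)"]

# Quartiles: the box spans q1..q3 and the line inside it is the median.
q1, median, q3 = sepal_length.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Assumed whisker rule: points beyond 1.5 * IQR from the box are outliers.
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = sepal_length[(sepal_length < lower_fence) | (sepal_length > upper_fence)]

# A histogram gives the same distribution as counts per bin.
counts, bin_edges = np.histogram(sepal_length, bins=10)
print(q1, median, q3, len(outliers), counts)
```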
Orange EDA:
How are the values of a certain variable (e.g. sepal length) distributed per target class (iris species)?

Multivariate

• What to notice?
- Graphical presentation of the stats per class
- Small sepal length → Iris-setosa class
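The per-class statistics behind such a grouped box plot can be sketched with a pandas group-by, again assuming scikit-learn's copy of iris.

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
# Map the integer target codes to species names for readable group labels.
species = iris.target.map(dict(enumerate(iris.target_names))).rename("species")

# count, mean, std, min, quartiles and max of sepal length for each species.
per_class = iris.data["sepal length (cm)"].groupby(species).describe()
print(per_class)
```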
Orange EDA:
How are the values of a certain variable (e.g. sepal length) distributed per target class (iris species)?

• What to notice?
- Similar to a box plot, but the density/frequency of the samples over the variable's values is also visualized
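The density outline of such a plot can be approximated with a kernel density estimate. This is a sketch, assuming scikit-learn's copy of iris and SciPy's `gaussian_kde` with its default bandwidth; a plotting tool may use a different estimator or bandwidth.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
values = iris.data["sepal length (cm)"]

# Common grid of sepal lengths on which every class density is evaluated.
grid = np.linspace(values.min(), values.max(), 200)

densities = {}
for code, name in enumerate(iris.target_names):
    class_values = values[iris.target == code].to_numpy()
    densities[str(name)] = gaussian_kde(class_values)(grid)

# Sepal length at which each class density is highest.
peaks = {name: float(grid[d.argmax()]) for name, d in densities.items()}
print(peaks)
```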
Orange EDA:
How are the values of a certain variable (e.g. sepal length) distributed per target class (iris species)?

Change

• What to notice?
- Show the points for clearer visualization
Orange EDA:
How are the values of a certain variable (e.g. sepal length) distributed per target class (iris species)?

• What to notice?
- Shorter sepal → Iris-setosa
- Longer sepal → more likely Iris-virginica
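The two observations above can be checked with a binned cross-tabulation. A sketch, assuming scikit-learn's copy of iris; the bin edges (4, 5, 6, 7, 8 cm) are an arbitrary illustrative choice.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
species = iris.target.map(dict(enumerate(iris.target_names))).rename("species")

# Cut sepal length into intervals (4, 5], (5, 6], (6, 7], (7, 8].
bins = pd.cut(iris.data["sepal length (cm)"], bins=[4.0, 5.0, 6.0, 7.0, 8.0])

# Rows: sepal-length interval; columns: species; cells: number of flowers.
table = pd.crosstab(bins, species)
print(table)
```

Reading the table row by row shows which species dominates each range of sepal length.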
Orange EDA:
How are the values of the input variables distributed w.r.t. another variable?
