Lecture W6 EDA

The document outlines the course content for 'Data Science Fundamentals' focusing on Exploratory Data Analysis (EDA). It discusses the importance of EDA in discovering patterns, interpreting results, and the techniques used for data analysis. Additionally, it contrasts exploratory and confirmatory data analysis, emphasizing the need for effective data representation and interpretation.

Uploaded by

Manish Verma
Copyright
© All Rights Reserved
School of Computing Science and Engineering

Program: M.C.A.
Course Code: MCAS9220
Course Name: Data Science Fundamentals
Exploratory Data Analysis
Lecture overview

• Data analysis template

• Exploratory Data Analysis (EDA)


– The role of EDA
– Doing EDA
– Interpreting EDA results
Discover patterns in data

• Why is it important to find patterns?


• What counts as a pattern?
• What techniques can we use to find patterns?
• When can such techniques be used?
• How should the results be interpreted?
Data analysis template

1. Exploratory Data Analysis
– Summary of the data
– Accidental and unexpected patterns

2. Data Screening
– Check for statistical hiccups

3. Fit a model, e.g. ANOVA, and do specific tests

4. Exploratory Data Analysis & Data Screening revisited: check residuals
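
Steps 3 and 4 of the template can be sketched in Python. This is a minimal illustration, not the course's own procedure: it swaps ANOVA for an ordinary least-squares straight line, it reuses the class-attendance figures from this lecture's simple example, and the helper name `fit_line` is hypothetical.

```python
# Sketch of template steps 3-4: fit a model, then check the residuals.
# Assumption: a least-squares straight line stands in for the ANOVA model.

# Class attendance example from this lecture: name -> (classes, words learned)
data = {
    "Bob": (10, 100),
    "Carol": (15, 150),
    "Dave": (12, 120),
    "Ann": (17, 170),
    "Steve": (13, 95),
}


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


xs = [classes for classes, _ in data.values()]
ys = [words for _, words in data.values()]
intercept, slope = fit_line(xs, ys)

# Residual = observed value minus the value the fitted line predicts.
residuals = {
    name: words - (intercept + slope * classes)
    for name, (classes, words) in data.items()
}

# The case with the largest absolute residual is the one the model fits worst.
worst = max(residuals, key=lambda name: abs(residuals[name]))

for name, r in residuals.items():
    print(f"{name:6s} residual = {r:+.2f}")
print("Largest residual:", worst)
```

Looking at residuals after fitting is what makes step 4 a second round of EDA: a single large residual may point to an unusual case or to a model that does not suit the data.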
The role of EDA

• Exploratory Data Analysis

Explore a data set

Use methods that help you understand the data
- to help you understand the events that generated the data
- to help you see what happened, sometimes in spite of your expectations
Simple example

Class attendance and language learning

Bob: 10 classes; 100 words
Carol: 15 classes; 150 words
Dave: 12 classes; 120 words
Ann: 17 classes; 170 words
Steve: 13 classes; 95 words
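
The pattern in this example can be made explicit with a short Python sketch. It is an illustration only: the choice of words-per-class as the summary and the 10% tolerance are assumptions made here, not part of the lecture.

```python
from statistics import median

# Class attendance example: name -> (classes attended, words learned)
records = {
    "Bob": (10, 100),
    "Carol": (15, 150),
    "Dave": (12, 120),
    "Ann": (17, 170),
    "Steve": (13, 95),
}

# Reduce each pair of numbers to one number: words learned per class.
rates = {name: words / classes for name, (classes, words) in records.items()}

# The median rate describes the typical learner.
typical = median(rates.values())

# Assumption: a rate more than 10% away from the typical rate is worth a look.
TOLERANCE = 0.10
exceptions = [
    name for name, rate in rates.items()
    if abs(rate - typical) > TOLERANCE * typical
]

for name, rate in rates.items():
    print(f"{name:6s} {rate:5.2f} words per class")
print("Typical rate:", typical)
print("Does not fit the pattern:", exceptions)
```

Four learners follow the same rate and one does not; the exception is what prompts the next question about the data.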
Recognising patterns

EDA supplies statistical techniques

Ways to tabulate, summarise, display, reduce … data
that work in combination with a very powerful pattern recognition device…
Data Analysis (DA)

• DA can't be done mechanically
• Often there has to be a "creative" element
• Conventional DA is in a sense idealistic
• Trade-off between "ideal" experimentation and ecological validity
• Sometimes questions are tentative
• We need data analysis skills that allow data to speak to us despite our expectations
More interesting example

NameVoyager

NameMapper
NameVoyager

Variable               Method used to represent
Time                   horizontal axis
No. / billion babies   vertical axis
Sex                    colour hue
Rank in 2007           colour saturation
Name                   label
Detail                 pop-up, click-through
Confirmatory vs. exploratory data analysis

Confirmatory data analysis      Exploratory data analysis
• tests a hypothesis            • finds a good description
• settles questions             • raises new questions
(Inferential statistics)        (Descriptive statistics)


What is data?

• A bunch of numbers (usually)
• Each number summarises some property or event of interest
e.g. 18
– Age, Beck Depression Inventory (BDI) score, Income in £’000s
• Data: lots of numbers
– e.g. 18, 24, 43, 22, 37, …

Is there a pattern?
Data reduction – fewer numbers

• Summarise proportion
27 / 48 children in class A are boys
16 / 23 children in class B are boys
Re-presented: 56% of class A, 70% of class B are boys

• Summarise change
Before: 112, 134, 121, 97
After: 116, 132, 140, 108
Re-presented
Change: 4, -2, 19, 11
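
Both reductions are simple arithmetic and can be reproduced with a short Python sketch. The function name `percent` is hypothetical, and rounding to the nearest whole percent is an assumption about how the slide's figures were obtained.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Summarise proportion: boys out of all children in each class.
boys_class_a = percent(27, 48)
boys_class_b = percent(16, 23)

# Summarise change: one difference per case instead of two raw scores.
before = [112, 134, 121, 97]
after = [116, 132, 140, 108]
change = [a - b for b, a in zip(before, after)]

print(f"Class A: {boys_class_a}% boys, class B: {boys_class_b}% boys")
print("Change:", change)
```

In both cases fewer numbers remain, and those numbers answer the question directly: which class has the larger share of boys, and who changed by how much.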
Simpler descriptions are better

"Anything that looks below the previously described surface makes the description more effective" (Tukey, 1977)
Revealing patterns

• Raw data is hard to understand
• EDA provides ways of presenting data that make the data easier to understand

• Example of Lord Rayleigh's research on the weight of nitrogen
– used a chemical compound to isolate a fixed amount of nitrogen
– repeated this experiment 15 times
Date       Source compound   Extraction method   Weight observed
29.11.93   NO                hot iron            2.30143
5.12.93    NO                hot iron            2.29816
6.12.93    NO                hot iron            2.30182
8.12.93    NO                hot iron            2.29890
12.12.93   Air               hot iron            2.31017
14.12.93   Air               hot iron            2.30986
19.12.93   Air               hot iron            2.31010
22.12.93   Air               hot iron            2.31001
26.12.93   N2O               hot iron            2.29889
28.12.93   N2O               hot iron            2.29940
9.1.94     NH4NO2            hot iron            2.29849
13.1.94    NH4NO2            hot iron            2.29889
27.1.94    Air               ferrous hydrate     2.31024
30.1.94    Air               ferrous hydrate     2.31030
1.2.94     Air               ferrous hydrate     2.31028
[Figure: box & whisker plot]
[Figure: dot plot]
[Figure: two separate box & whisker plots]
Technique

• Find a graph that shows clearly that the data can be divided into two different groups

• Appropriate representation depends on your practical goal
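
The two groups in Rayleigh's table can also be shown numerically. The Python sketch below is one possible summary, not the lecture's own analysis: splitting the weights by source (air versus chemical compound) and using minimum, median and maximum are choices made here for illustration.

```python
from statistics import median

# Rayleigh's weights from the table above, as (source compound, weight).
observations = [
    ("NO", 2.30143), ("NO", 2.29816), ("NO", 2.30182), ("NO", 2.29890),
    ("Air", 2.31017), ("Air", 2.30986), ("Air", 2.31010), ("Air", 2.31001),
    ("N2O", 2.29889), ("N2O", 2.29940),
    ("NH4NO2", 2.29849), ("NH4NO2", 2.29889),
    ("Air", 2.31024), ("Air", 2.31030), ("Air", 2.31028),
]

# Assumption: group by whether the nitrogen came from air or from a compound.
from_air = [w for source, w in observations if source == "Air"]
from_compound = [w for source, w in observations if source != "Air"]


def summarise(weights):
    """Return (minimum, median, maximum), the core of a box & whisker plot."""
    return min(weights), median(weights), max(weights)


for label, weights in (("Air", from_air), ("Compound", from_compound)):
    low, mid, high = summarise(weights)
    print(f"{label:9s} n={len(weights)}  min={low:.5f}  median={mid:.5f}  max={high:.5f}")

# The groups are separate if the lightest air sample still outweighs
# the heaviest compound sample.
groups_separate = min(from_air) > max(from_compound)
print("Groups do not overlap:", groups_separate)
```

A single box & whisker plot of all 15 weights hides this split; summarising the two sources separately makes it visible.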
Precise descriptions are better

• "Most of the key questions in our world sooner or later demand answers to 'by how much?' rather than merely to 'in which direction?'" (Tukey, 1977)

• Hick's Law
• Choice Reaction Time experiment
• RT increases with number of possible response alternatives
[Figures: Hick's law]
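
Hick's law gives a "by how much?" answer. The slides do not state the formula, so the Python sketch below assumes the commonly quoted form RT = a + b · log2(n + 1), where n is the number of response alternatives. The function name `hick_rt` and the coefficient values for a and b are hypothetical, chosen only to make the example run.

```python
import math


def hick_rt(n_alternatives, a=0.2, b=0.15):
    """Predicted reaction time in seconds under Hick's law.

    Assumed form: RT = a + b * log2(n + 1).
    a and b are hypothetical illustrative coefficients, not fitted values.
    """
    return a + b * math.log2(n_alternatives + 1)


# Predicted reaction time for an increasing number of alternatives.
predictions = {n: hick_rt(n) for n in (1, 2, 4, 8)}
for n, rt in predictions.items():
    print(f"{n} alternatives: predicted RT = {rt:.3f} s")
```

The prediction is precise in Tukey's sense: RT does not just increase with the number of alternatives, it increases by a fixed amount each time the quantity n + 1 doubles.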
Interpreting EDA

Multiplicity
Interpreting EDA

• Summarise the results
• Discover unanticipated results
– new line of research, new experiment
– qualify conclusion from the present study
• Generate hypotheses
• Check assumptions
– qualify conclusion from the present study
– address anomalies

• NOT (or, rarely) a definitive conclusion


Practical week 7

1. Using EDA for data screening in simple & multiple regression

2. Visualisation
(a) NameVoyager
(b) Bullying data

Register for bullying data before the practical!
