Chapter 3 Data Analysis
Slide 1/96
Good graphics Descriptive statistics Describing Data Exercises
Outline
Good graphics
Exploratory Data Analysis
Types of Variables
Some Graphs
Descriptive statistics
Measures of location
Measures of spread
(Potential) Outliers
Correlation
Describing Data
Numerical data
Categorical data
Relationships
Exercises
References:
DeVeaux, Velleman, Bock – Chapters 3, 4, 5
Utts and Heckard (4th edition) – Chapters 2, 3 (not 3.4), 4.1, 4.3
Utts and Heckard (5th edition) – Chapters 2, 3.1, 3.3, 3.4, 4.1, 4.3
Learning Outcomes:
Displaying Data
... involves
▶ examining the variability in the sample data (and looking for any patterns)
▶ uncovering possible explanations for the variability ... modelling the data
▶ detecting the story that the data in hand has to tell
Types of Variables
Hierarchy of Information
There is a hierarchy in the amount of information contained in
each variable type.
Types of Questions
The type of questions that we can ask is also related to the type of
variable that we have recorded.
Examples: One categorical variable
▶ What is the degree of myopia for children who slept with
night-time lighting in early childhood?
▶ Which major is most common for students enrolled in an
Introductory Statistics class?
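A question about one categorical variable, such as which major is most common, comes down to counting how often each category occurs. A minimal sketch in Python, assuming a small made-up list of responses (the data and the name `majors` are hypothetical, not taken from the class survey):

```python
from collections import Counter

# Hypothetical responses: one recorded major per student (made-up data).
majors = ["Science", "Commerce", "Science", "Arts", "Science", "Commerce"]

# Count how often each category occurs.
counts = Counter(majors)

# The most common category and its frequency (ties are not handled here).
top_major, top_count = counts.most_common(1)[0]

# Print a simple frequency table with relative frequencies.
for major, n in counts.most_common():
    print(f"{major:10s} {n:2d}  ({n / len(majors):.0%})")
```

The same counts are what a frequency table or bar chart of the variable would display.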
Some Graphs
Graphical displays
Note:
For further discussion of the relative merits of the different graphical displays
discussed in this section see Utts and Heckard, Chapter 2, pages 35–37.
Numerical variables   Categorical variables   Displays
3                     0                       surface plot, 3-D plot (not covered in this subject)
2                     1                       scatterplot with groups
1                     2                       interaction plot (see Exercise 4: Lymphocytes)
0                     3                       cross-tabulation, comparative bar chart (3 variables on one axis)
Measures of location
Mean
Order statistics
If we arrange a set of observations $x_1, x_2, \ldots, x_n$ from smallest to largest, then the values are referred to as the order statistics and are denoted as
$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$
Example: The first 10 values of Pulse1.
$x_1, \ldots, x_{10}$:         64 58 62 66 64 74 84 68 62 76
$x_{(1)}, \ldots, x_{(10)}$:   58 62 62 64 64 66 68 74 76 84
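The order statistics in this example can be reproduced by sorting the values. A minimal sketch in Python; the helper name `order_statistic` is introduced here for illustration only:

```python
# First 10 values of Pulse1, as listed in the example.
pulse1 = [64, 58, 62, 66, 64, 74, 84, 68, 62, 76]

# Sorting from smallest to largest gives the order statistics.
order_stats = sorted(pulse1)


def order_statistic(values, k):
    """Return the k-th smallest value, where k counts from 1."""
    return sorted(values)[k - 1]


print(order_stats)                  # [58, 62, 62, 64, 64, 66, 68, 74, 76, 84]
print(order_statistic(pulse1, 1))   # smallest value: 58
print(order_statistic(pulse1, 10))  # largest value: 84
```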
Median
Quartiles
Calculating Quartiles
Measures of spread
Range
The range is the difference between the largest and smallest values.
Five-number summary
Slide 38/96
Good graphics Descriptive statistics Desctibing Data Exercises
Measures of spread
Standard deviation
Slide 39/96
Good graphics Descriptive statistics Desctibing Data Exercises
Measures of spread
Player A 1 2 3 9 11 10
Player B 6 6 5 7 6 6
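The two sets of values can be compared numerically. A minimal sketch in Python, assuming the sample standard deviation (divisor n − 1) is the version intended; that choice is an assumption, not something stated on the slide:

```python
from statistics import mean, stdev

# Values for the two players, as listed above.
player_a = [1, 2, 3, 9, 11, 10]
player_b = [6, 6, 5, 7, 6, 6]

for name, values in [("Player A", player_a), ("Player B", player_b)]:
    value_range = max(values) - min(values)
    # stdev() uses the n - 1 divisor (sample standard deviation): an assumption.
    print(f"{name}: mean = {mean(values):.2f}, "
          f"range = {value_range}, sd = {stdev(values):.3f}")
```

Both players have the same mean (6), but Player A's values are far more spread out, which shows up in both the range and the standard deviation.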
(Potential) Outliers
Outliers
2021 Australian Olympic women’s rowing eight
Questions to ask
▶ Is it a legitimate data value?
▶ Is it likely to be a data entry mistake?
▶ Does the individual really belong to a different group?
For Pulse2:
IQR =
Q1 − (1.5 × IQR) =
Q3 + (1.5 × IQR) =
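The three quantities above can be computed in a few lines. A minimal sketch in Python, with two assumptions: the Pulse2 values are not listed here, so the first 10 Pulse1 values are used as stand-in data, and quartiles are computed with the `inclusive` method of `statistics.quantiles`, which may differ from the quartile rule used in this subject:

```python
from statistics import quantiles

# Stand-in data: the first 10 Pulse1 values (Pulse2 is not listed here).
values = [64, 58, 62, 66, 64, 74, 84, 68, 62, 76]

# Quartile conventions differ between textbooks and packages;
# method="inclusive" is one common choice and is an assumption here.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers.
potential_outliers = [x for x in values if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print(f"potential outliers: {potential_outliers}")
```

A flagged value is only a potential outlier; the questions listed earlier still need to be asked before deciding what to do with it.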
Correlation
Correlation coefficient (r)
Properties:
▶ −1 ≤ r ≤ 1. Positive r indicates a positive association and negative r indicates a negative association.
▶ r = −1 and r = 1 occur only when the data lie exactly on a straight line (with negative and positive slope, respectively).
▶ r ≈ 0 indicates no linear relationship between x and y (a non-linear relationship may still be present).
▶ The correlation r is affected by outliers.
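These properties can be checked numerically. A minimal sketch in Python, using the usual sample correlation formula and made-up data; the function name `pearson_r` and all values are illustrative assumptions:

```python
from math import sqrt


def pearson_r(x, y):
    """Sample correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
on_line_up = [3, 5, 7, 9, 11]      # exactly y = 2x + 1
on_line_down = [10, 8, 6, 4, 2]    # exactly y = 12 - 2x
with_outlier = [3, 5, 7, 9, -20]   # last point replaced by an outlier

print(pearson_r(x, on_line_up))     # close to 1
print(pearson_r(x, on_line_down))   # close to -1
print(pearson_r(x, with_outlier))   # a single outlier changes r markedly
```

In the last case four of the five points still lie on an increasing line, yet the one outlier is enough to make r negative.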
Exploring correlation (r)
Anscombe’s Data
Datasaurus Dozen
https://www.autodeskresearch.com/publications/samestats
An Unusual case!
Oddity? ... See Utts and Heckard (5th edition), p. 123 (Section 4.3), for other examples.
Numerical data
Dotplot
Histogram
[Figure: Histogram of Pulse1. Horizontal axis: Pulse1 (50 to 100); vertical axis: Frequency (0 to 25).]
About Histograms
Boxplot
Categorical data
Comments?
Frequency Table
Comments about:
▶ the incidence of myopia ...?
▶ the association between degree of myopia and night-time lighting?
Relationships
Useful Displays
▶ dotplots, histograms or boxplots of the response variable (root
depth), by the explanatory variable (soil density), using the
same scale
▶ these are often called comparative plots
Comparative dotplots
Comparative boxplots
Summary Measures
Comments
Shape
Centre
Spread
Unusual?
Useful Displays:
▶ scatterplots
▶ dotplots of differences or ratios
16 14 15 16 14 20 18 16 13 17 16 13 18
16 22 20 24 18 16 20 26 19 14 17 14 15
19 12 11 16 19 13 13 18 8 16 28 16 27
17 12 15 7 12 22 20 16 10 8 16 13 14
14 16 18 15 21 23 16 5 14 23 17 15 14
22 20 22 13 20 18 13 14 21 14 18 10 20
24 17 21 15 18 12 23 17 10 15 11 5 16
19 22 10 15 17 13 23 20 3 18 15 22 12
9 20 16 17 17 16 21 18 11 14 6 21 25
18 26 18 18 15
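One way to get a first look at the values above is a frequency tally, which doubles as a rough text dotplot. A minimal sketch in Python, assuming the values are read row by row; what the numbers measure is not stated here, so the name `data` is a placeholder:

```python
from collections import Counter
from statistics import median

# The values listed above, read row by row.
data = [
    16, 14, 15, 16, 14, 20, 18, 16, 13, 17, 16, 13, 18,
    16, 22, 20, 24, 18, 16, 20, 26, 19, 14, 17, 14, 15,
    19, 12, 11, 16, 19, 13, 13, 18, 8, 16, 28, 16, 27,
    17, 12, 15, 7, 12, 22, 20, 16, 10, 8, 16, 13, 14,
    14, 16, 18, 15, 21, 23, 16, 5, 14, 23, 17, 15, 14,
    22, 20, 22, 13, 20, 18, 13, 14, 21, 14, 18, 10, 20,
    24, 17, 21, 15, 18, 12, 23, 17, 10, 15, 11, 5, 16,
    19, 22, 10, 15, 17, 13, 23, 20, 3, 18, 15, 22, 12,
    9, 20, 16, 17, 17, 16, 21, 18, 11, 14, 6, 21, 25,
    18, 26, 18, 18, 15,
]

counts = Counter(data)

# Text dotplot: one row per integer value between the minimum and maximum.
# Counter returns 0 for values that never occur, giving an empty row.
for value in range(min(data), max(data) + 1):
    print(f"{value:2d} | {'*' * counts[value]}")

print(f"n = {len(data)}, min = {min(data)}, "
      f"median = {median(data)}, max = {max(data)}")
```

The shape, centre, spread and any unusual values can then be read off the tally.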
Comments
Air speed 0.5 0.5 0.5 1.0 1.0 1.0 1.5 1.5
Evaporation 5.39 4.43 5.50 7.70 6.20 6.14 5.62 6.12
Comments:
Generally speaking, increased evaporation of water from soil seems
to be related to increased air flow.
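The comment can be checked against the tabulated values by averaging evaporation at each air speed. A minimal sketch in Python, using only the eight pairs printed in the table (the table may be incomplete, so the numbers are illustrative rather than a full analysis):

```python
from collections import defaultdict

# Values as printed in the table (the printed table may be incomplete).
air_speed = [0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.5, 1.5]
evaporation = [5.39, 4.43, 5.50, 7.70, 6.20, 6.14, 5.62, 6.12]

# Collect the evaporation readings observed at each air speed.
by_speed = defaultdict(list)
for speed, evap in zip(air_speed, evaporation):
    by_speed[speed].append(evap)

# Mean evaporation within each air-speed group.
group_means = {speed: sum(v) / len(v) for speed, v in by_speed.items()}

for speed in sorted(group_means):
    print(f"air speed {speed}: n = {len(by_speed[speed])}, "
          f"mean evaporation = {group_means[speed]:.2f}")
```

For these eight pairs, the mean evaporation at air speeds 1.0 and 1.5 is higher than at 0.5, although the increase is not steady from one speed to the next.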
Melon yields
Comments:
Possible model
Drug   Litter
       1     2     3     4     5
A      7.1   6.1   6.9   5.6   6.4
B      6.7   5.1   5.9   5.1   5.8
C      7.1   5.8   6.2   5.0   6.2
D      6.7   5.4   5.7   5.2   5.3
[Figure: Lymphocyte counts. Horizontal axis: Drug (A, B, C, D); vertical axis: Count (5.0 to 7.0).]
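The table of lymphocyte counts can be summarised by averaging over each drug and each litter. A minimal sketch in Python, assuming the rows A to D are the drugs (as the plot's axis label suggests) and the columns 1 to 5 are the litters:

```python
# Lymphocyte counts from the table: one row per drug, one column per litter.
# Treating rows as drugs is an assumption based on the plot's axis label.
counts = {
    "A": [7.1, 6.1, 6.9, 5.6, 6.4],
    "B": [6.7, 5.1, 5.9, 5.1, 5.8],
    "C": [7.1, 5.8, 6.2, 5.0, 6.2],
    "D": [6.7, 5.4, 5.7, 5.2, 5.3],
}
litters = [1, 2, 3, 4, 5]

# Mean count for each drug (averaging across litters).
drug_means = {drug: sum(row) / len(row) for drug, row in counts.items()}

# Mean count for each litter (averaging across drugs).
litter_means = {
    litter: sum(row[i] for row in counts.values()) / len(counts)
    for i, litter in enumerate(litters)
}

for drug, m in drug_means.items():
    print(f"drug {drug}: mean = {m:.2f}")
for litter, m in litter_means.items():
    print(f"litter {litter}: mean = {m:.3f}")
```

Differences among the drug means and among the litter means suggest that both factors may account for some of the variability in the counts.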
Comments
Possible models
or