Chapter 3 Data Analysis


Good graphics Descriptive statistics Describing Data Exercises

Chapter 3: Exploratory Data Analysis

MAST10010 Data Analysis

Department of Mathematics & Statistics


University of Melbourne


Outline
Good graphics
Exploratory Data Analysis
Types of Variables
Some Graphs
Descriptive statistics
Measures of location
Measures of spread
(Potential) Outliers
Correlation
Describing Data
Numerical data
Categorical data
Relationships
Exercises

References

References:
DeVeaux, Velleman, Bock – Chapters 3, 4, 5
Utts and Heckard (4th edition) – Chapters 2, 3 (not 3.4), 4.1, 4.3
Utts and Heckard (5th edition) – Chapters 2, 3.1, 3.3, 3.4, 4.1, 4.3


Learning Outcomes:

At the end of this topic you should:


▶ Understand the importance of presenting data effectively
▶ Be able to distinguish different types of variables
▶ Understand and use common summary statistics
▶ Be able to produce good graphics to display a variety of data
types
▶ Be able to describe data using relevant graphs and statistics


Displaying Data


Displaying Data

Some good things about the graph:





Some issues with the graph:



Displaying Data


Exploratory Data Analysis

Exploratory Data Analysis

. . . involves
▶ examining the variability in the sample data (& looking for
any patterns)
▶ uncovering possible explanations for the
variability. . . modelling the data
▶ detecting the story that the data-in-hand has to tell


Exploratory Data Analysis

Steps in an Exploratory Data Analysis

1. Display the sample data (Graphs, tables)


2. Summarise the distribution of the sample data using
summary measures (statistics)
3. Describe (in words) what is revealed by our pictures and
summary statistics

4. Conjecture about what could be happening in the population


Types of Variables

How to think about Exploratory Data Analysis. . .


Types of Variables

Types of Variables

The type of variables we have measured determines what questions
we can ask of the data and how we can meaningfully display and
summarise our data.
For example: Suppose we recorded ratings (G, PG, M) for movies
screened at a cinema over a period of time. Would it really make
sense to talk about an average rating for the period?

Types of Variables

The type of variable is specified by the possible values that the
variable can take.
▶ categorical
   ▶ nominal
   ▶ ordered
▶ numerical
   ▶ discrete
   ▶ continuous


Types of Variables

Example: Pulse Rates

▶ 92 students measured their resting pulse (Pulse1).
▶ Each student tossed a coin to determine whether or not to
  run on the spot for 1 minute.
▶ After running (or not running) all re-measured their pulse
  (Pulse2).
▶ Also recorded were:
   ▶ gender (1 = male, 2 = female)
   ▶ whether ran (1 = yes, 2 = no)
   ▶ level of other physical activity
     (1 = slight, 2 = moderate, 3 = a lot)
   ▶ smoking status (1 = regularly, 2 = not regularly)
   ▶ height (in inches) and weight (in pounds).


Types of Variables

The Pulse Data


Some of the data
PULSE1 PULSE2 RAN SMOKES GENDER HEIGHT WEIGHT ACTIVITY
64 88 1 2 1 66.00 140 2
58 70 1 2 1 72.00 145 2
62 76 1 1 1 73.50 160 3
66 78 1 1 1 73.00 190 1
64 80 1 2 1 69.00 155 2
74 84 1 2 1 73.00 165 1
84 84 1 2 1 72.00 150 3
68 72 1 2 1 74.00 190 2
62 75 1 2 1 72.00 195 2
76 118 1 2 1 71.00 138 2
: : : : : : : :
76 76 2 2 2 61.75 108 2
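
The coded columns in this table are categorical even though they are stored as numbers, so tabulating them is more meaningful than averaging them. A minimal Python sketch, using only the ACTIVITY codes of the first ten rows shown above (not all 92 students) and an illustrative name `ACTIVITY_LABELS` for the code book:

```python
from collections import Counter

# ACTIVITY codes for the first ten rows shown in the table (not the full data set).
activity = [2, 2, 3, 1, 2, 1, 3, 2, 2, 2]

# Code book taken from the study description; the dictionary name is illustrative.
ACTIVITY_LABELS = {1: "slight", 2: "moderate", 3: "a lot"}

decoded = [ACTIVITY_LABELS[code] for code in activity]
counts = Counter(decoded)

# Report the categories in their natural order instead of averaging the codes.
for code in sorted(ACTIVITY_LABELS):
    label = ACTIVITY_LABELS[code]
    print(f"{label:>8}: {counts[label]}")
```

Because ACTIVITY is an ordered categorical variable, the loop keeps the categories in code order; for a nominal variable such as GENDER the order would carry no meaning.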


Types of Variables

Some possible questions of interest


▶ what is the distribution of PULSE1?
▶ what influences PULSE1?
▶ what is the effect of running on pulse?
▶ do other variables, such as smoking, appear to influence the
effect of running on pulse?

You will be investigating these questions (and more) in your lab
class next week.


Types of Variables

Variable types Quiz


https://www.pollev.com/paulfijn


Types of Variables

Hierarchy of Information
There is a hierarchy in the amount of information contained in
each variable type.


Types of Variables

Types of Questions

The type of questions that we can ask is also related to the type of
variable that we have recorded.
Examples: One categorical variable
▶ What is the degree of myopia for children who slept with
night-time lighting in early childhood?
▶ Which major is most common for students enrolled in an
Introductory Statistics class?


Types of Variables

Types of Questions

Examples: Two categorical variables


▶ What is the association between night-time lighting in early
childhood and the degree of myopia in later childhood?
▶ For dolphins, what is the association between type of activities
undertaken and time of day (am, pm, afternoon)?


Types of Variables

Types of Questions

Examples: One numerical variable
Cloud seeding and rainfall experiment: amount of rainfall from seeded clouds.
▶ What is the average rainfall from seeded clouds?
▶ How variable is the rainfall from seeded clouds?
▶ Smallest/largest recorded rainfall from seeded clouds?
▶ Any unusual readings?


Types of Variables

Types of Questions

Examples: One categorical and one numerical variable


▶ Does cloud seeding affect amount of rainfall? (rainfall data
collected from both seeded and unseeded clouds)
If so, how?
▶ Do different soil densities affect root growth (lengths)?


Types of Variables

Types of Questions

Examples: Two numerical variables


▶ What is the relationship between soil water evaporation and
air speed?
▶ Can we reliably predict downstream pollution levels from the
upstream pollution levels (that is, are downstream pollution levels
related to upstream pollution levels)?
▶ Is there a trend in the ice break-up times over years?


Some Graphs

Graphical displays

. . . ‘a picture is worth a thousand words’. . .

This statement applies equally to displays for numerical variables,
categorical variables and combinations of variables.
The skill is in choosing an informative display!

Note:
For further discussion of the relative merits of the different graphical displays
discussed in this section see Utts and Heckard, Chapter 2, pages 35–37.


Some Graphs

Barchart of myopia by night light


Some Graphs

Dotchart of carbon commitments


Some Graphs

Dotplots comparing mpox vaccination methods


Some Graphs

Boxplots comparing mpox vaccination methods


Some Graphs

Scatterplot of cannabidiol (CBD) oil and back pain


Some Graphs

Summary of Graphical Displays


numerical   categorical
variables   variables     displays
1           0             dotplot, histogram, boxplot
0           1             table (with counts or percentages); bar chart
2           0             scatterplot; dotplot of differences
1           1             comparative dotplots, boxplots
0           2             contingency table (Row/Col %); (comparative) bar chart (%)


Some Graphs

Summary of Graphical Displays

numerical   categorical
variables   variables     displays
3           0             surface plot; 3-D plot (not covered in this subject)
2           1             scatterplot with groups
1           2             interaction plot (see Exercise 4: Lymphocytes)
0           3             cross-tabulation; comparative bar chart (3 variables on one axis)


Measures of location

Mean

For n observations, let $x_1$ denote the first observation, $x_2$ the
second, and so on up to $x_n$.

Sample mean: $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
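
As a quick check of the definition, the sketch below computes the sample mean of the first ten Pulse1 values listed in the data table. The variable names are illustrative only.

```python
# First ten Pulse1 values from the data table (not the full 92 observations).
pulse1 = [64, 58, 62, 66, 64, 74, 84, 68, 62, 76]

n = len(pulse1)
x_bar = sum(pulse1) / n  # (x_1 + x_2 + ... + x_n) / n

print(f"n = {n}, sample mean = {x_bar:.1f}")
```

The ten values sum to 678, so the mean is 67.8 beats per minute.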


Measures of location

Order statistics
If we arrange a set of observations $x_1, x_2, \ldots, x_n$ from smallest to
largest, then the values are referred to as the order statistics and
are denoted as
$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$
Example: The first 10 values of Pulse1.

$x_1, \ldots, x_{10}$          64 58 62 66 64 74 84 68 62 76
$x_{(1)}, \ldots, x_{(10)}$    58 62 62 64 64 66 68 74 76 84

minimum = $x_{(1)}$ (= 58)
maximum = $x_{(n)}$ (= $x_{(10)}$ = 84)


Measures of location

Median

Defined in terms of the order statistics as

Sample median: $M = x_{\left(\frac{n+1}{2}\right)}$

What happens if n is even?

e.g. for the first 10 values of Pulse1:

The median is robust to outliers.
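
One way to explore the even-n question is in code. The sketch below assumes the common convention of averaging the two middle order statistics when n is even, which is what position (n+1)/2 = 5.5 suggests for ten values. The function name `sample_median` is illustrative.

```python
def sample_median(values):
    """Median from the order statistics; averages the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # x_((n+1)/2), written with a 0-based index
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of x_(n/2) and x_(n/2 + 1)


pulse1 = [64, 58, 62, 66, 64, 74, 84, 68, 62, 76]
median = sample_median(pulse1)

# Robustness check: corrupt the largest value and compare median and mean.
corrupted = [840 if x == 84 else x for x in pulse1]
median_corrupted = sample_median(corrupted)
mean_shift = sum(corrupted) / len(corrupted) - sum(pulse1) / len(pulse1)

print(median, median_corrupted, mean_shift)
```

Here the two middle order statistics are 64 and 66, so the median is 65. Replacing 84 by 840 leaves the median unchanged while the mean moves by 75.6.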


Measures of location

Quartiles

The quartiles (lower and upper), together with the median,
partition a data set into four groups of (approximately) equal size.
The definitions that we use are those used in MINITAB, which may
differ from those found elsewhere.

Lower quartile: $Q_1 = x_{\left(\frac{1}{4}(n+1)\right)}$
Upper quartile: $Q_3 = x_{\left(\frac{3}{4}(n+1)\right)}$


Measures of location

Calculating Quartiles

For the first 10 values of Pulse1:
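
A sketch of the calculation, assuming that a fractional position such as (n+1)/4 = 2.75 is handled by linear interpolation between the neighbouring order statistics. This assumption reproduces the quartiles 62 and 74.5 quoted in the IQR example later in the chapter. The helper name `order_statistic` is illustrative.

```python
def order_statistic(ordered, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    n = len(ordered)
    position = min(max(position, 1), n)  # clamp to the valid range 1..n
    lower = int(position)                # integer part of the position
    frac = position - lower              # fractional part
    if lower == n:
        return float(ordered[-1])
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


pulse1 = sorted([64, 58, 62, 66, 64, 74, 84, 68, 62, 76])
n = len(pulse1)

q1 = order_statistic(pulse1, (n + 1) / 4)      # position 2.75
q3 = order_statistic(pulse1, 3 * (n + 1) / 4)  # position 8.25

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```

Python's `statistics.quantiles(data, n=4)` with its default `method="exclusive"` uses the same (n+1)-based positions and gives the same quartiles for these ten values.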


Measures of spread

Range

The range is the difference between the largest and smallest values.

(= 84 − 58 = 26 for the first 10 values of Pulse1).

The range is very sensitive to outliers.

When used in conjunction with the IQR it becomes more
informative as a measure of spread.


Measures of spread

Interquartile range (IQR)

The Interquartile range (IQR) is the range covered by the middle
50% of the data. So, in a sense, it provides information about how
consistent the observations are with their median.

Interquartile range = IQR = Q3 − Q1


Example: For first 10 values of Pulse1: IQR = 74.5 − 62 = 12.5


Measures of spread

Five-number summary

minimum lower quartile median upper quartile maximum


x(1) Q1 M Q3 x(10)


Measures of spread

Standard deviation

The standard deviation measures how consistent the observations
are with their arithmetic mean (not to be confused with
consistency with the median!)

Sample standard deviation: $s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$


Measures of spread

Standard deviation — in practice

You have $150,000 to spend on recruiting a new football player to
your team.
The only data available to you (to help you make your decision) is
the number of goals kicked per game for the past 6 games.

Player A 1 2 3 9 11 10
Player B 6 6 5 7 6 6

Which player will you buy?
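
To put numbers on the comparison, the sketch below applies the sample standard deviation formula to both players. The function name `sample_sd` is illustrative, and the code is a hand-rolled version of what `statistics.stdev` computes.

```python
from math import sqrt


def sample_sd(values):
    """Sample standard deviation with divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


goals = {
    "Player A": [1, 2, 3, 9, 11, 10],
    "Player B": [6, 6, 5, 7, 6, 6],
}

for player, scores in goals.items():
    mean = sum(scores) / len(scores)
    print(f"{player}: mean = {mean:.1f}, s = {sample_sd(scores):.2f}")
```

Both players average 6 goals per game, but Player A's standard deviation is about 4.47 against about 0.63 for Player B, so the choice comes down to how much consistency matters.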


Measures of spread

Standard deviation — Nice things

▶ It is in the units of the data.


▶ For any distribution at least 75% of the data lies within two
standard deviations of the mean (Chebyshev’s Inequality).
▶ In practice, for most data sets, approximately 95% of the
data lie within two standard deviations of the mean (useful
‘rule of thumb’).
This is a better rule when data are (close to) normally
distributed.
▶ The square of the standard deviation is the variance.


(Potential) Outliers

Outliers
2021 Australian Olympic women’s rowing eight

(adapted from Utts & Heckard Ch 2)



(Potential) Outliers

Identifying (Potential) Outliers

Questions to ask
▶ Is it a legitimate data value?
▶ Is it likely to be a data entry mistake?
▶ Does the individual really belong to a different group?

(Additional reading reference: Utts and Heckard, Chapter 2.6)


(Potential) Outliers

Identifying (Potential) Outliers

Approach taken by MINITAB:


▶ IQR rule: any observations that are more than (1.5 × IQR)
above Q3 or below Q1 are marked as potential outliers on a
MINITAB boxplot
▶ some texts refer to extreme outliers as observations that are
more than 3 × IQR above Q3 or below Q1
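
The IQR rule can be sketched as follows. The example uses only the ten Pulse2 values shown in the data table, not the full 92, and relies on `statistics.quantiles` with its default "exclusive" method, which is assumed here to match the (n+1)-based quartile definition used in this chapter.

```python
import statistics

# Pulse2 for the first ten rows of the data table (not the full data set).
pulse2 = [88, 70, 76, 78, 80, 84, 84, 72, 75, 118]

q1, median, q3 = statistics.quantiles(pulse2, n=4)  # default method="exclusive"
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Observations beyond the fences are only *potential* outliers.
potential_outliers = [x for x in pulse2 if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print("potential outliers:", potential_outliers)
```

For these ten values the fences are 58.125 and 101.125, so only the reading of 118 is flagged. Whether it is a mistake or a legitimate value still has to be judged using the questions listed earlier.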


(Potential) Outliers

Example — Outliers in Pulse Data?

For Pulse2:

minimum lower quartile median upper quartile maximum


x(1) Q1 M Q3 x(92)


(Potential) Outliers

For Pulse2:

IQR =
Q1 − (1.5 × IQR) =
Q3 + (1.5 × IQR) =


(Potential) Outliers

Best Statistics to Use for Describing. . .


Correlation

Correlation coefficient (r )

A measure of the strength and direction of the linear relationship
between two (numerical) variables. For variables X and Y:

$r = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(\dfrac{x_i - \bar{x}}{s_x}\right)\left(\dfrac{y_i - \bar{y}}{s_y}\right)$

This can (usefully) be re-written as:

$r = \dfrac{1}{n-1}\sum_{i=1}^{n} z_{x_i}\, z_{y_i}$
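
The z-score form translates directly into code. The sketch below is a minimal illustration on the ten Pulse1 and Pulse2 pairs shown in the data table, so its value of r is not the value for the full data set. The helper names are illustrative.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    """r as the sum of products of z-scores, divided by n - 1."""
    n = len(x)
    zx = [(xi - mean(x)) / sample_sd(x) for xi in x]
    zy = [(yi - mean(y)) / sample_sd(y) for yi in y]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)


# First ten rows of the pulse data table.
pulse1 = [64, 58, 62, 66, 64, 74, 84, 68, 62, 76]
pulse2 = [88, 70, 76, 78, 80, 84, 84, 72, 75, 118]

r = correlation(pulse1, pulse2)
print(f"r = {r:.3f}")
```

A pair of z-scores with the same sign adds to r and a pair with opposite signs subtracts from it, which is why points in the upper-right and lower-left of a scatterplot push r towards +1.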


Correlation

Correlation coefficient (r )

Properties:
▶ −1 ≤ r ≤ 1. Positive r indicates positive association and
negative r indicates negative association.
▶ r = −1 and r = 1 occur when the data lie on a straight line.
▶ r ≈ 0 indicates no linear relation between x and y .
▶ correlation (r ) is affected by outliers


Correlation

Exploring correlation (r )

Correlation, r, is a scale-free measure (it has no units). This means
that changing the scale of the measured variables has no effect on
the value of the correlation.

Recall from earlier the re-expressed formula for correlation:

$r = \dfrac{1}{n-1}\sum_{i=1}^{n} z_{x_i}\, z_{y_i}$
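
The scale-free property can be checked numerically. The sketch below takes the heights (inches) and weights (pounds) from the ten rows shown in the pulse data table, converts them to centimetres and kilograms, and compares the two correlations. The conversion factors are standard; the function name is illustrative.

```python
from math import sqrt


def correlation(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# First ten rows of the pulse data table.
height_in = [66.0, 72.0, 73.5, 73.0, 69.0, 73.0, 72.0, 74.0, 72.0, 71.0]
weight_lb = [140, 145, 160, 190, 155, 165, 150, 190, 195, 138]

# Change of units: a positive rescaling of each variable.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

r_imperial = correlation(height_in, weight_lb)
r_metric = correlation(height_cm, weight_kg)

print(f"{r_imperial:.6f} {r_metric:.6f}")
```

The two values agree up to floating-point rounding, because rescaling a variable rescales its deviations and its standard deviation by the same factor, leaving every z-score unchanged.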


Correlation

more. . .

▶ x̄ and sx indicate the location and spread of the data along
the x-axis (i.e. the location and spread of the x-data)
▶ ȳ and sy indicate the location and spread of the data along
the y-axis (i.e. the location and spread of the y-data)
▶ correlation, r, measures how the points are distributed in the
x-y plane


Correlation

Exploring correlation (r )

▶ The above plots are all of the same data set!


▶ Correlation (r ) is scale-free but scatterplots are not.
▶ Correlation needs to be interpreted hand-in-hand with a
scatterplot.

Correlation

Anscombe’s Data


Correlation

Datasaurus Dozen

https://www.autodeskresearch.com/publications/samestats


Correlation

An Unusual case!

Simpson’s Paradox is an oddity that can happen when the number
of units in one category greatly outweighs the other.
e.g. Kidney stone treatment (based upon BMJ 1986;292:879–82).
Recorded is the number of patients with successful outcomes, for two
different types of treatment.

Stone Size   Treatment A   Treatment B   Total
Large        192 (78%)     55 (22%)      247 (100%)
Small        81 (26%)      234 (74%)     315 (100%)
Overall      273 (49%)     289 (51%)     562 (100%)

Oddity?. . . See Utts 5th edition p. 123 (section 4.3) for other
examples
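
The table above gives only the numbers of successes, so the paradox is easier to see once the numbers of patients treated are added. The sketch below assumes the group sizes usually quoted for this study (87 and 263 patients on Treatment A, 270 and 80 on Treatment B). These totals are not on the slide and should be checked against the original paper.

```python
# (successes, patients treated) for each stone size and treatment.
# Success counts come from the table above; the patient totals are an
# assumption taken from the figures usually quoted for this study.
data = {
    "Small": {"A": (81, 87), "B": (234, 270)},
    "Large": {"A": (192, 263), "B": (55, 80)},
}

# Success rate within each stone-size group.
stratum_rates = {
    size: {arm: s / n for arm, (s, n) in arms.items()}
    for size, arms in data.items()
}

# Success rate for each treatment when the groups are pooled.
overall_rates = {}
for arm in ("A", "B"):
    successes = sum(data[size][arm][0] for size in data)
    patients = sum(data[size][arm][1] for size in data)
    overall_rates[arm] = successes / patients

for size, rates in stratum_rates.items():
    print(f"{size:>7}: A = {rates['A']:.0%}, B = {rates['B']:.0%}")
print(f"Overall: A = {overall_rates['A']:.0%}, B = {overall_rates['B']:.0%}")
```

Under these assumed totals Treatment A has the higher success rate for small stones and for large stones, yet Treatment B has the higher rate overall. The reversal arises because A was used mostly on large stones, which are harder to treat, while B was used mostly on small ones.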


Numerical data

Describing numerical data

. . . involves commenting on the


shape — symmetric/skewed, unimodal/multimodal
location — the ‘centre’ of the data
spread — variability in the values
unusual features such as ‘outliers’ or ‘groupings’
association — between (pairs of) variables

These should be supported with (appropriate) statistics, where
possible.


Numerical data

Dotplot


Numerical data

Histogram

[Figure: Histogram of Pulse1. Horizontal axis: Pulse1 (50 to 100); vertical axis: Frequency.]


Numerical data

About Histograms

▶ Histogram: a bar chart representing frequencies in ranges of
the values. To construct the classes:
1. divide the range of the data by k, the number of desired
intervals;
2. round the answer to a sensible unit;
3. choose the first class to contain the smallest observation, and
make sure intervals don’t overlap.
Choosing k: A rough rule sometimes used by computer
packages is: k ≈ log2 (n).

For example, if n = 30 take k ≈ 5. If n = 600, take k ≈ 9.
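
The steps above can be sketched in code. The rounding to a "sensible unit" is a judgement call, so the class width and starting point are passed in by hand here. The function names are illustrative, and the example uses only the first ten Pulse1 values.

```python
from math import log2


def suggested_class_count(n):
    """Rough rule from the text: k is approximately log2(n)."""
    return round(log2(n))


def class_counts(values, start, width, k):
    """Frequencies for k non-overlapping classes [start + j*width, start + (j+1)*width)."""
    counts = [0] * k
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < k:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    return counts


print(suggested_class_count(30), suggested_class_count(600))

pulse1 = [64, 58, 62, 66, 64, 74, 84, 68, 62, 76]
# Range 26 over about 3 classes suggests a width near 9; 10 is a more sensible unit,
# which then needs 4 classes starting at 50 to cover the data.
counts = class_counts(pulse1, start=50, width=10, k=4)
for j, c in enumerate(counts):
    print(f"{50 + 10 * j}-{60 + 10 * j}: {'*' * c}")
```

Using half-open classes guarantees that the intervals do not overlap, so each observation is counted exactly once.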


Numerical data

Boxplot

A visual display of the five-number summary.


1. The ‘box’ is drawn between the lower and upper quartiles,
with a line at the median.
2. ‘Whiskers’ are drawn to the largest and smallest values that
are not ‘outliers’; outliers are often marked specially.
3. The quantitative variable could go on the horizontal or the
vertical axis. Horizontal boxplots are easier to read in most
cases.


Numerical data

Boxplot — Final pulse


Categorical data

Comments?


Categorical data

Myopia & Night lighting

▶ A survey of 479 children


▶ Recorded current degree of myopia for each child and what
type of night lighting child slept with up until age 2.
▶ Investigated association between night time lighting in infancy
(before age 2) and eyesight (degree of myopia)
Reference: Utts and Heckard; Example 2.2 (page 22)
Source: Nature 1999, Vol 399, pp113-114.


Categorical data

Frequency Table

Slept with    No Myopia   Myopia   High Myopia   Total
Darkness      155         15       2             172
Night Light   153         72       7             232
Full Light    34          36       5             75
Total         342         123      14            479

What does this tell us about:


▶ the incidence of Myopia, in general. . . ?
▶ the association between degree of myopia and nighttime
lighting?


Categorical data

Tabled (row) Percentages

Slept with    No Myopia   Myopia   High Myopia   Total
Darkness      90          9        1             100
Night Light   66          31       3             100
Full Light    45          48       7             100
Total         71          26       3             100

Comments about:
▶ the incidence of Myopia. . . ?
▶ the association between degree of myopia and night time
lighting?
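
The row percentages can be reproduced from the frequency table. The sketch below recomputes them, together with the percentage in each lighting group showing some degree of myopia. The variable names are illustrative.

```python
# Counts from the frequency table: [No Myopia, Myopia, High Myopia].
counts = {
    "Darkness": [155, 15, 2],
    "Night Light": [153, 72, 7],
    "Full Light": [34, 36, 5],
}
# Column totals form the "Total" row.
counts["Total"] = [sum(column) for column in zip(*counts.values())]

# Row percentages: each count divided by its row total.
row_pct = {
    group: [round(100 * c / sum(row)) for c in row]
    for group, row in counts.items()
}

# Percentage with some degree of myopia (Myopia or High Myopia).
any_myopia_pct = {
    group: round(100 * (row[1] + row[2]) / sum(row))
    for group, row in counts.items()
}

for group in counts:
    print(f"{group:<12} {row_pct[group]}  any myopia: {any_myopia_pct[group]}%")
```

Row percentages put the three lighting groups on a common scale even though their sizes differ (172, 232 and 75 children), which is what allows the groups to be compared with each other and with the overall row.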


Categorical data

Degree of Myopia by Night Light


Categorical data

What is the data telling us?

▶ Overall: 29% have some degree of myopia
   Darkness: incidence of some degree of myopia is much lower
   than expected (10% cf 29%)
   Night Light: incidence is higher than expected (34% cf 29%)
   Full Light: incidence of some degree of myopia is much
   higher than expected (55% cf 29%)
▶ Conclude: There appears to be an association between
incidence of myopia and type of night-time lighting. Increased
night-time lighting is associated with increased incidence of
(some degree of) myopia.
▶ Does night-time lighting cause myopia?


Relationships

One numerical and one categorical

Example: Root depth and Soil Density data

Useful Displays
▶ dotplots, histograms or boxplots of the response variable (root
depth), by the explanatory variable (soil density), using the
same scale
▶ these are often called comparative plots


Relationships

Comparative dotplots


Relationships

Comparative boxplots


Relationships

Improving the boxplot


Relationships

Summary Measures


Relationships

Comments

Shape

Centre

Spread

Unusual?


Relationships

Other useful summary measures

If we are just comparing two groups then:


▶ difference between means (medians)
▶ ratio of standard deviations (or IQRs)
. . . may be useful for comparing the two groups.


Relationships

Two numeric variables

It is important to acknowledge that we have data pairs; that is,
two observations from one study unit.
This needs to be reflected in our choice of graphical display.

Useful Displays:
▶ scatterplots
▶ dotplots of differences or ratios


Relationships

Relationship between initial and final pulse?

▶ Positive association between Pulse2 and Pulse1.


▶ Strength of the association depends upon whether or not they
Ran.

Relationships

Describing the relationship between initial and final pulse

▶ There is a positive linear relationship between initial and final
pulse for all students.
▶ The association is stronger in those who did not run
(r = 0.92) than in those who did (r = 0.61).


Exercise 1: Test scores

Exercise 1 — Test scores

16 14 15 16 14 20 18 16 13 17 16 13 18
16 22 20 24 18 16 20 26 19 14 17 14 15
19 12 11 16 19 13 13 18 8 16 28 16 27
17 12 15 7 12 22 20 16 10 8 16 13 14
14 16 18 15 21 23 16 5 14 23 17 15 14
22 20 22 13 20 18 13 14 21 14 18 10 20
24 17 21 15 18 12 23 17 10 15 11 5 16
19 22 10 15 17 13 23 20 3 18 15 22 12
9 20 16 17 17 16 21 18 11 14 6 21 25
18 26 18 18 15

Produce appropriate displays & summaries of these data.


Exercise 1: Test scores

Comments


Exercise 2: Soil water evaporation

Exercise 2 — Soil water evaporation

To investigate how evaporation of water from soil is affected by air
flow above it, a single soil sample was used, the speed of air flow
was varied and the rate of evaporation was measured.

Air speed     0.5   0.5   0.5   1.0   1.0   1.0   1.5   1.5
Evaporation   5.39  4.43  5.50  7.70  6.20  6.14  5.62  6.12

Air speed     1.5   2.0   2.0   2.0   2.5   2.5   2.5
Evaporation   7.20  6.88  7.73  6.01  5.10  7.29  7.28

Make a graphical study of the effect of air speed on evaporation.


Exercise 2: Soil water evaporation

What is the data telling us?

Comments:
Generally speaking, increased evaporation of water from soil seems
to be related to increased air flow.

Suggested models:
Evaporation at speed(i) = mean for speed level(i) + random error
or
Rate of Evaporation = some function of Air speed + random error
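
For the first suggested model, the natural estimate of each level mean is the average evaporation at that air speed. The sketch below computes these averages from the Exercise 2 data and prints a rough text dotplot. The variable names are illustrative and this is only one possible summary.

```python
from collections import defaultdict

air_speed = [0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.5, 1.5, 1.5,
             2.0, 2.0, 2.0, 2.5, 2.5, 2.5]
evaporation = [5.39, 4.43, 5.50, 7.70, 6.20, 6.14, 5.62, 6.12, 7.20,
               6.88, 7.73, 6.01, 5.10, 7.29, 7.28]

# Group the evaporation readings by air speed level.
by_speed = defaultdict(list)
for speed, evap in zip(air_speed, evaporation):
    by_speed[speed].append(evap)

# Level means: the fitted values under "mean for speed level(i) + random error".
level_means = {speed: sum(v) / len(v) for speed, v in by_speed.items()}

for speed in sorted(level_means):
    readings = "  ".join(f"{e:.2f}" for e in sorted(by_speed[speed]))
    print(f"speed {speed}: mean = {level_means[speed]:.2f}   readings: {readings}")
```

The level means rise from about 5.1 at the lowest air speed to between 6.3 and 6.9 at the higher speeds, but with only three readings per level the spread within each level is large relative to the differences between levels.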


Exercise 3: Melon yields

Exercise 3 — Melon yields

Six plots of each of four varieties of melons gave the following
yields.
Variety
A B C D
25 40 18 28
17 35 23 29
26 32 26 33
16 37 15 32
22 43 11 30
16 37 24 28

Make a graphical study of the effect of variety on melon yields.


Exercise 3: Melon yields

Melon yields


Exercise 3: Melon yields

Melon yields


Exercise 3: Melon yields

Comments & Possible Model

Comments:

Possible model


Exercise 4: Lymphocyte counts

Exercise 4 — Lymphocyte counts


To compare the effects of four drugs (A, B, C and a placebo D) on
lymphocyte counts in mice, a randomised block design with four
mice from each of five litters was used. The lymphocyte counts
(’000s per mm³ of blood) were:

        Litter
Drug    1     2     3     4     5
A       7.1   6.1   6.9   5.6   6.4
B       6.7   5.1   5.9   5.1   5.8
C       7.1   5.8   6.2   5.0   6.2
D       6.7   5.4   5.7   5.2   5.3

Make a graphical study of the effect of drugs on lymphocyte
counts in mice, taking account of litter effects.

Exercise 4: Lymphocyte counts

Lymphocyte counts

[Figure: Scatterplot of Count vs Drug. Horizontal axis: Drug (A, B, C, D); vertical axis: Count (5.0 to 7.0).]


Exercise 4: Lymphocyte counts

Lymphocyte counts

[Figure: Scatterplot of Count vs Drug, with points marked by Litter (1 to 5). Horizontal axis: Drug (A, B, C, D); vertical axis: Count (5.0 to 7.0).]

Exercise 4: Lymphocyte counts

Lymphocyte counts

[Figure: Count versus Drug, by Litter (1 to 5). Horizontal axis: Drug (A, B, C, D); vertical axis: Count (5.0 to 7.0).]

Exercise 4: Lymphocyte counts

Comments


Exercise 4: Lymphocyte counts

Possible models

Lymphocyte count (for a mouse on Drug i) = mean for Drug(i) + error

or

Lymphocyte count (ijk) = overall mean + effect for Drug(i) + effect for Litter(j) + error
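
One simple way to put numbers on the second model, assumed here purely for illustration, is to estimate the overall mean from all twenty counts, each drug effect as the drug mean minus the overall mean, and each litter effect as the litter mean minus the overall mean. The variable names are illustrative.

```python
# Lymphocyte counts from Exercise 4: one list per drug, ordered by litter 1..5.
counts = {
    "A": [7.1, 6.1, 6.9, 5.6, 6.4],
    "B": [6.7, 5.1, 5.9, 5.1, 5.8],
    "C": [7.1, 5.8, 6.2, 5.0, 6.2],
    "D": [6.7, 5.4, 5.7, 5.2, 5.3],
}
n_litters = 5

all_values = [v for row in counts.values() for v in row]
overall_mean = sum(all_values) / len(all_values)

# Drug effect: drug mean minus overall mean.
drug_effect = {
    drug: sum(row) / len(row) - overall_mean for drug, row in counts.items()
}

# Litter effect: litter mean (across the four drugs) minus overall mean.
litter_effect = [
    sum(counts[drug][j] for drug in counts) / len(counts) - overall_mean
    for j in range(n_litters)
]

# Residual: what is left after removing overall mean, drug and litter effects.
residuals = {
    drug: [row[j] - (overall_mean + drug_effect[drug] + litter_effect[j])
           for j in range(n_litters)]
    for drug, row in counts.items()
}

print(f"overall mean = {overall_mean:.3f}")
print({drug: round(e, 3) for drug, e in drug_effect.items()})
print([round(e, 3) for e in litter_effect])
```

With these data the litter effects range from about −0.74 to +0.94, a wider spread than the drug effects (about −0.31 to +0.46), which is why a display of drug differences should take litter into account.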


Exercise 5: Nenana Classic

Exercise 5 — Nenana Classic


▶ The Nenana River, located in Alaska, usually freezes over
during October and November.
▶ The Nenana River Ice Classic competition began in
1917. . . guessing the exact time ice on the river would break
up.
▶ Ice breakup time: A tripod, connected to an on-shore clock
with a string, is planted in two feet of river ice during river
freeze-up in October or November. The clock stops when the
tripod moves as the ice breaks up.
▶ Do the river ice breakup dates (and times) provide evidence of
climate change in the region?
You can watch the ice breaking up in this video:
https://www.youtube.com/watch?v=AV5hu-xSBqU

Exercise 5: Nenana Classic

Exercise 5 — Nenana Classic


Do ice break-up times provide any indication of climate change in
the region?
