SCI2010 - Week 4


SCI1020: Introduction to Statistical Reasoning

WEEK 4:
EXPLORING DATA: VARIABLES AND DISTRIBUTIONS

Student's Name: Tutorial Day/Time:

PRELIMINARY READING: D S Moore et al, “Basic Practice of Statistics”, Chs 1-3


On completion of this workshop you should be able to:
1. Summarize data by calculating a five-number summary or mean and standard deviation;
2. Produce and describe a histogram to show the distribution in a set of quantitative data;
3. Produce a box plot of data and compare box-plots for different sets of data;
4. Describe the shape of the Normal distribution and skewed distributions;
5. Apply the 68-95-99.7 rule to Normally distributed data; and
6. Calculate a z-score and use the Standard Normal Probability table.

PRELIMINARY QUESTIONS:
These problems are to help you engage with the lecture material, and also to make sure that everyone is up to
speed before the workshop starts. Please make sure you do them before class each week!
Q.1 State in your own words what is meant by each of these terms. Be SPECIFIC.

Term Definition

Continuous or
Quantitative data

Categorical data

Distribution

Histogram

Q.2 Do Q1.24 from Moore text, 8th ed, p.37 (7th ed: p.38). Identify the variables of each
type in the scenario:
Quantitative
variables:

Categorical
variables

Q.3
a) Briefly sketch a density curve for each of the following shapes (e.g. see Moore Ch.3).
ALSO indicate on each the approx. positions of the mean and the median.
Skewed to right Normal distribution Skewed to left

Week 3 Copyright 2019: Monash University Page | 1


b) All Normal distribution curves show the 68-95-99.7 Rule. State what this rule indicates.

Q.4 The standardised score in a normal population, the z-score


It can be difficult to determine how extreme an individual data value is, compared to the
values in the whole population. If the population has a Normal (z) distribution of values then
this can be calculated via a standardised score… a z-score.
a) What is the formula for calculating a standardised z-score of an individual in a population?
Define all terms. (Moore, Ch.3)

WORKSHOP PROBLEMS:
Q.5 Do Q2.31 from Moore et al, 8th ed, p.69 (7th ed: p.68) Construction of a histogram
using Excel. Download the data set “Guinea pigs” for this question from Moodle/ “Part I
Exploring data” block.

To construct a histogram in Excel: See “Introduction to Excel” Section 2.4 pp. 9-10.
1. Identify the range (= max-min) of the data:
2. Identify what bin size you will use for your histogram. The x-axis of the histogram should include
approx. 8-12 equally-sized intervals for counting ... called bins (intervals), covering the range from
the min to max. You have to tell Excel what the intervals are by creating a new column in your
spreadsheet that includes the upper limit of each bin. For example, if you want your bins to be "0 to
9", "10 to 19", "20 to 29" etc, the entry (in a column) would be:
BIN 0 10 20 30 40 and so on (in a column!)
What BIN interval size is most suitable in this question?
3. Obtain the frequency distribution (histogram) by using the Excel function “Data Analysis” >
Histogram. Specify your input range (the column with the survival times in it), and the bin range
(the column with the bin limits in it that you just created) and an empty region of your spreadsheet
for the 'Output range' column. Make sure you click 'chart output' to get the histogram!
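
The counting that the Histogram tool does in step 3 can be cross-checked with a short Python sketch. This is a sketch under stated assumptions, not a copy of Excel's implementation: it assumes each bin entry is an inclusive upper limit (a value is counted in the first bin whose limit it does not exceed) and that anything above the last limit goes in an overflow bin. The function name `bin_counts` and the `times` values are made up for illustration; they are NOT the Guinea pigs data.

```python
from bisect import bisect_left


def bin_counts(values, upper_limits):
    """Count values per bin.

    Each bin holds the values that are <= its upper limit and > the
    previous limit. One extra overflow count is appended at the end for
    values larger than the last limit.
    """
    limits = sorted(upper_limits)
    counts = [0] * (len(limits) + 1)  # last slot is the overflow bin
    for v in values:
        # Index of the first limit that is >= v
        counts[bisect_left(limits, v)] += 1
    return counts


# Hypothetical survival times in days (placeholder data only)
times = [12, 25, 31, 38, 44, 47, 52, 58, 63, 71, 95]

data_range = max(times) - min(times)  # step 1: range = max - min
limits = [20, 40, 60, 80, 100]        # step 2: chosen bin upper limits
counts = bin_counts(times, limits)    # step 3: frequency per bin

print("range:", data_range)
for limit, count in zip(limits, counts):
    print(f"<= {limit:>3}: {'*' * count}")
print(f" more : {'*' * counts[-1]}")
```

For the real question, replace `times` with the survival-time column and choose `limits` so that roughly 8-12 bins cover the range.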
a) Draw the histogram here or attach a printout from Excel. (remember to label the axes!!)

Is this histogram skewed? If so, which way? Do there appear to be any obvious outliers
in the data?

The five-number summary is:


(minimum, first quartile (Q1), median, third quartile (Q3) and maximum).
Use the ordered list of data directly and count, OR
Use the Excel command "=QUARTILE(data range cells,Q)" where Q is the quartile wanted.
Example: =QUARTILE($A$1:$A$12,3) to find the 3rd Quartile of data in cells A1 to A12.
(Investigate Quartile in the function menu, fx).
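
As a cross-check on the Excel command, the five-number summary can also be sketched in Python. This sketch assumes that the `"inclusive"` method of `statistics.quantiles` (linear interpolation between data points, with the minimum and maximum treated as the 0th and 4th quartiles) corresponds to what =QUARTILE returns; the helper name `five_number_summary` and the `sample` values are hypothetical. Note that counting by hand with the "median of each half" method may give slightly different quartiles than interpolation does.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Quartiles use the 'inclusive' interpolation method, assumed here to
    match the Excel QUARTILE function.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])


# Hypothetical data (not the Guinea pigs data set)
sample = [12, 25, 31, 38, 44, 47, 52, 58, 63, 71, 95]

summary = five_number_summary(sample)
for label, value in zip(["Min", "Q1", "Median", "Q3", "Max"], summary):
    print(f"{label:>6}: {value}")
```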
A box plot is a graphical representation of a five-number summary for the data.

b) Calculate (by hand or in Excel) the five-number summary for the “Guinea pigs” data above.
Q Quartile Survival Time (days)

0 Minimum

1 Lower Quartile (Q1)

2 Median

3 Upper Quartile (Q3)

4 Maximum

c) ALSO: Draw the box plot, including outliers, of these data.


• Determine the inter-quartile range for Survival Time, IQR = Q3 - Q1 = ______________
• Determine the 'boundaries' for the box plot. Values outside these are outliers.
Lower boundary = Q1 - (1.5 x IQR) Upper boundary = Q3 + (1.5 x IQR)

• Are there any outliers? If so, what are their values?

• Identify the last data values that are within these outlier “boundaries”. These are the
values at which the ‘whisker’ lines end.
Lowest DATA value on or inside the Lower boundary:

Highest DATA value on or inside the Upper boundary:
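
The boundary and whisker steps above can be sketched in Python as a way of checking hand calculations. This is a minimal sketch: the function name `box_plot_limits`, the `sample` data and the quartile values passed in are all hypothetical, and the quartiles are taken as given rather than recomputed.

```python
def box_plot_limits(values, q1, q3):
    """Apply the 1.5 x IQR rule to find outliers and whisker ends."""
    data = sorted(values)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # lower boundary
    upper = q3 + 1.5 * iqr  # upper boundary
    # Values strictly outside the boundaries are outliers
    outliers = [v for v in data if v < lower or v > upper]
    # Whiskers end at the most extreme DATA values on or inside the boundaries
    inside = [v for v in data if lower <= v <= upper]
    return {
        "iqr": iqr,
        "lower": lower,
        "upper": upper,
        "outliers": outliers,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
    }


# Hypothetical data and quartiles (not the Guinea pigs results)
sample = [12, 25, 31, 38, 44, 47, 52, 58, 63, 71, 120]
result = box_plot_limits(sample, q1=34.5, q3=60.5)

for key, value in result.items():
    print(f"{key}: {value}")
```

The whiskers stop at actual data values, not at the boundaries themselves, which is why the function returns `whisker_low` and `whisker_high` separately from `lower` and `upper`.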

You are now ready to draw the box plot for the survival time of the guinea pigs in the
experiment! Sketch it below WITH labelled scale.
Note that the box-plot already on the graph is for the survival time data of rats under the same
experimental conditions. Use a ruler!
Box plot for Guinea pig survival times:

[Figure: blank box-plot grid with rows labelled "Guinea pigs" and "Rats"; horizontal axis marked 0, 100, 200, 300, 400, 500, 600]

Note: MS Excel does NOT have a direct way of drawing box-plots for you.

Compare the survival times of the guinea pigs with those of the rats by interpreting the
comparative box plot above. Consider all four features indicated.
Position (boxes in general?):

Spread (IQRs?):

Outlier presence:

Shape of distribution:

Q.6 Do Q3.7 from Moore et al, p.85. Monsoon yearly rainfall distribution.
Note given parameters: mean, µ = ________ Standard deviation, σ = ________
Use the 68-95-99.7 rule to draw the Normal distribution for this system with the x-axis values
in place:

a) 95% of all years will have rainfall between _______ and _______ mm.
b) The driest 2.5% of years will have rainfall less than ________ mm.
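
The arithmetic behind the 68-95-99.7 rule can be sketched in Python. The parameters below are hypothetical round numbers chosen for illustration; they are NOT the mean and standard deviation given in Moore Q3.7, and the function name `empirical_rule` is made up for this sketch.

```python
def empirical_rule(mu, sigma):
    """Return the 68%, 95% and 99.7% intervals around the mean."""
    return {
        68: (mu - sigma, mu + sigma),            # within 1 standard deviation
        95: (mu - 2 * sigma, mu + 2 * sigma),    # within 2 standard deviations
        99.7: (mu - 3 * sigma, mu + 3 * sigma),  # within 3 standard deviations
    }


# Hypothetical parameters in mm (not the textbook values)
mu, sigma = 1000, 100
intervals = empirical_rule(mu, sigma)

low95, high95 = intervals[95]
print(f"95% of years: between {low95} and {high95} mm")

# 5% of years fall outside the 95% interval. The curve is symmetric,
# so 2.5% lie in each tail; the driest 2.5% are below the lower end.
print(f"Driest 2.5% of years: less than {low95} mm")
```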

Q.7
a) Estimate your height: cm
What is your height’s z-score compared to recent Australian population data:
male average height is 178 cm with a standard deviation of 6.5 cm, and
female average height is 164 cm with a standard deviation of 6.1 cm.
b) Your z-score within the height distribution for your gender in Australia:

c) Sketch a Standard Normal curve (x-axis is now a z-axis) and show your position in the
population using this z-score. What percentage of the population of your gender is
TALLER than you? Use the Table A of Standard Normal Probabilities: Moore et al,
8th ed, pp. 696-697 or inside back cover (7th ed, pp. 678-699).
See Moore et al, pp.88-89 for how to use this table.
NB. Table A gives the area (probability) for LESS THAN a given z-value. See the shaded
area in the diagram at the top of the table.
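
A Table A lookup can be checked with Python's `statistics.NormalDist`, whose `cdf` method gives the area to the LEFT of a z-value, just as the table does. The sketch below uses the male height parameters from this question with a hypothetical height of 184.5 cm; the function names `z_score` and `proportion_taller` are made up for illustration.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardised score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


def proportion_taller(z):
    """Area to the RIGHT of z under the Standard Normal curve."""
    area_less_than = NormalDist().cdf(z)  # what Table A reports
    return 1 - area_less_than


# Hypothetical height; mean and standard deviation are the male values above
height_cm = 184.5
z = z_score(height_cm, mu=178, sigma=6.5)
taller = proportion_taller(z)

print(f"z-score: {z:.2f}")
print(f"Proportion taller: {taller:.4f}")
```

Because the table gives the area for LESS THAN a given z, the proportion TALLER is one minus the table value.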

MARK : /10
