SCI2010 - Week 4
WEEK 4: EXPLORING DATA: VARIABLES AND DISTRIBUTIONS
PRELIMINARY QUESTIONS:
These problems help you engage with the lecture material and make sure that everyone is up to speed
before the workshop starts. Please make sure you do them before class each week!
Q.1 State in your own words what is meant by each of these terms. Be SPECIFIC.
Term Definition
Continuous or Quantitative data
Categorical data
Distribution
Histogram
Q.2 Do Q1.24 from Moore text, 8th ed, p.37 (7th ed: p.38). Identify the variables of each
type in the scenario:
Quantitative variables:
Categorical variables:
Q.3
a) Briefly sketch each of the following shapes of density curve (see e.g. Moore Ch.3).
ALSO indicate on each the approximate positions of the mean and the median.
Skewed to right Normal distribution Skewed to left
WORKSHOP PROBLEMS:
Q.5 Do Q2.31 from Moore et al, 8th ed, p.69 (7th ed: p.68): construction of a histogram
using Excel. Download the data set “Guinea pigs” for this question from the Moodle “Part I
Exploring data” block.
To construct a histogram in Excel: See “Introduction to Excel” Section 2.4 pp. 9-10.
1. Identify the range (= max-min) of the data:
2. Identify what bin size you will use for your histogram. The x-axis of the histogram should be
divided into approx. 8-12 equally sized counting intervals, called bins, covering the range from
the min to the max. You have to tell Excel what the intervals are by creating a new column in your
spreadsheet that contains the upper limit of each bin. For example, if you want your bins to be "0 to
9", "10 to 19", "20 to 29" etc, the entries would be:
BIN 9 19 29 39 49 and so on (in a column!)
What BIN interval size is most suitable in this question?
3. Obtain the frequency distribution (histogram) by using the Excel tool “Data Analysis” >
Histogram. Specify your input range (the column containing the survival times), the bin range
(the column of bin limits that you just created) and an empty region of your spreadsheet
for the 'Output range'. Make sure you tick 'Chart Output' to get the histogram!
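If you want to check your Excel output, steps 1-3 can be sketched in Python as below. This is only an illustrative sketch: the survival times are made-up values (NOT the real "Guinea pigs" data set), the bin width of 20 is an assumed choice, and the counting rule (a value goes in the first bin whose upper limit it does not exceed) is meant to mirror how the Excel Histogram tool treats the bin column.

```python
import math

# Hypothetical survival times in days (NOT the real "Guinea pigs" data set).
times = [43, 45, 53, 56, 56, 57, 58, 66, 67, 73, 74, 79, 80, 81, 102, 110, 137, 211]

# Step 1: range = max - min
data_min, data_max = min(times), max(times)
data_range = data_max - data_min

# Step 2: assumed bin width, chosen so that roughly 8-12 bins cover the data.
bin_width = 20
# Upper limits of the bins: multiples of bin_width from just above the minimum
# up to the first multiple at or above the maximum.
first_limit = math.ceil(data_min / bin_width) * bin_width
last_limit = math.ceil(data_max / bin_width) * bin_width
bin_limits = list(range(first_limit, last_limit + bin_width, bin_width))

# Step 3: count each value in the first bin whose upper limit it does not exceed.
counts = [0] * len(bin_limits)
for t in times:
    for i, limit in enumerate(bin_limits):
        if t <= limit:
            counts[i] += 1
            break

# Text version of the histogram: one row per bin.
for limit, count in zip(bin_limits, counts):
    print(f"<= {limit:3d}: {'*' * count} ({count})")
```

Changing `bin_width` and re-running shows how the apparent shape of the histogram depends on the bin size you choose.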
a) Draw the histogram here or attach a printout from Excel. (remember to label the axes!!)
Is this histogram skewed? If so, which way? Do there appear to be any obvious outliers
in the data?
b) Calculate (by hand or in Excel) the five-number summary for the “Guinea pig” data above.
Q Quartile Survival Time (days)
0 Minimum
1 First quartile (Q1)
2 Median
3 Third quartile (Q3)
4 Maximum
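As a cross-check on a hand calculation, the five-number summary can be sketched in Python as below. The sketch assumes the textbook-style rule that Q1 and Q3 are the medians of the lower and upper halves of the sorted data (leaving out the overall median when the number of observations is odd); Excel's built-in quartile functions interpolate differently and may give slightly different values. The function name and the data are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the two halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]             # observations below the median position
    upper = data[half + (n % 2):]   # observations above it (skips the median if n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

# Hypothetical survival times in days (NOT the real "Guinea pigs" data set).
sample = [43, 45, 53, 56, 56, 57, 58, 66, 67, 73, 74, 79, 80, 81, 102, 110, 137, 211]
summary = five_number_summary(sample)
print(summary)
```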
Identify the last data values that are within these outlier “boundaries”. These are the
values at which the ‘whisker’ lines end.
Lowest DATA value on or inside the Lower boundary:
Highest DATA value on or inside the Upper boundary:
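The whisker ends can be checked with a short Python sketch. It assumes the outlier boundaries are the usual 1.5 x IQR fences (Q1 - 1.5 x IQR and Q3 + 1.5 x IQR), with quartiles taken as medians of the lower and upper halves; if your boundaries were defined differently, adjust accordingly. The data are made-up values, not the real data set.

```python
from statistics import median

# Hypothetical survival times in days (NOT the real "Guinea pigs" data set).
data = sorted([43, 45, 53, 56, 56, 57, 58, 66, 67, 73, 74, 79, 80, 81, 102, 110, 137, 211])
n = len(data)

# Quartiles as medians of the lower and upper halves of the sorted data.
q1 = median(data[: n // 2])
q3 = median(data[(n + 1) // 2 :])
iqr = q3 - q1

# Assumed outlier boundaries: the 1.5 x IQR rule.
lower_boundary = q1 - 1.5 * iqr
upper_boundary = q3 + 1.5 * iqr

# Whiskers end at the most extreme data values on or inside the boundaries.
inside = [x for x in data if lower_boundary <= x <= upper_boundary]
whisker_low, whisker_high = inside[0], inside[-1]

# Anything outside the boundaries is plotted individually as a suspected outlier.
outliers = [x for x in data if x < lower_boundary or x > upper_boundary]

print("boundaries:", lower_boundary, upper_boundary)
print("whiskers end at:", whisker_low, whisker_high)
print("suspected outliers:", outliers)
```

Note that the whiskers stop at actual data values, not at the boundaries themselves.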
You are now ready to draw the box plot for the survival time of the guinea pigs in the
experiment! Sketch it below WITH labelled scale.
Note that the box-plot already on the graph is for the survival time data of rats under the same
experimental conditions. Use a ruler!
Box plot for Guinea pig survival times:
[Figure: comparative box plot grid with rows for Guinea pigs and Rats; horizontal axis from 0 to 600 in steps of 100]
Note: versions of MS Excel before Excel 2016 do NOT have a direct way of drawing box plots for you.
Compare the survival times of the guinea pigs with those of the rats by interpreting the
comparative box plot above. Consider all four features listed below.
Position (boxes in general?):
Spread (IQRs?):
Outlier presence:
Shape of distribution:
Q.6 Do Q3.7 from Moore et al, p.85. Monsoon yearly rainfall distribution.
Note the given parameters: mean, µ = ________ Standard deviation, σ = ________
Use the 68-95-99.7 rule to draw the Normal distribution for this system with the x-axis values
in place:
a) 95% of all years will have rainfall between _______ and _______ mm.
b) The driest 2.5% of years will have rainfall less than ________ mm.
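The arithmetic behind the 68-95-99.7 rule can be sketched in Python as below. The values of `mu` and `sigma` are made-up placeholders, NOT the parameters given in the textbook question; substitute the values you noted above.

```python
# Hypothetical parameters (placeholders only, NOT the values from the textbook question).
mu = 1000.0     # mean yearly rainfall, mm
sigma = 100.0   # standard deviation, mm

# 68-95-99.7 rule: about 68%, 95% and 99.7% of values lie within
# 1, 2 and 3 standard deviations of the mean respectively.
for k, pct in [(1, 68), (2, 95), (3, 99.7)]:
    print(f"about {pct}% of years: {mu - k * sigma} to {mu + k * sigma} mm")

# a) The middle 95% lies within 2 standard deviations of the mean.
middle_95 = (mu - 2 * sigma, mu + 2 * sigma)

# b) The remaining 5% is split equally between the two tails (the curve is symmetric),
#    so the driest 2.5% of years fall below mu - 2 * sigma.
driest_cutoff = mu - 2 * sigma
print("middle 95%:", middle_95, " driest 2.5% below:", driest_cutoff)
```

The x-axis of your sketch should be marked at the same points: µ, µ ± σ, µ ± 2σ and µ ± 3σ.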
Q.7
a) Estimate your height: cm
What is your height’s z-score compared to recent Australian population data:
male average height is 178 cm with a standard deviation of 6.5 cm, and
female average height is 164 cm with a standard deviation of 6.1 cm.
b) Your z-score within the height distribution for your gender in Australia:
c) Sketch a Standard Normal curve (x-axis is now a z-axis) and show your position in the
population using this z-score. What percentage of the population of your gender is
TALLER than you? Use the Table A of Standard Normal Probabilities: Moore et al,
8th ed, pp. 696-697 or inside back cover (7th ed, pp. 678-699).
See Moore et al, pp.88-89 for how to use this table.
NB. Table A gives the area (probability) for LESS THAN a given z-value. See the shaded
area in the diagram at the top of the table.
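To check a Table A lookup, the calculation can be sketched in Python with the standard library's `statistics.NormalDist`. The height of 185 cm is a made-up example value; the mean and standard deviation are the male figures quoted in the question. The sketch assumes heights are approximately Normally distributed.

```python
from statistics import NormalDist

height_cm = 185.0              # hypothetical height, for illustration only
mean_cm, sd_cm = 178.0, 6.5    # male mean and standard deviation quoted in the question

# z-score: how many standard deviations the height lies above (+) or below (-) the mean.
z = (height_cm - mean_cm) / sd_cm

# Table A gives the area to the LEFT of z (proportion shorter than you) ...
area_below = NormalDist().cdf(z)
# ... so the proportion TALLER than you is the complement.
fraction_taller = 1 - area_below

print(f"z = {z:.2f}")
print(f"proportion shorter: {area_below:.4f}")
print(f"proportion taller:  {fraction_taller:.4f}")
```

Table A is tabulated to two decimal places of z, so a value read from the table may differ slightly in the last digits from the computed one.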
MARK : /10