Data Representation Interpretation
booklet
Data Representation & Interpretation (Statistics)
Year 10 Mathematics
Name:
Checklist
Types of Data
Mean
Median
Mode
Range
Outliers
Quartiles
IQR
Histograms
Boxplots
Graphing with Technology
Calculations with Technology
Standard Deviation
Bivariate Numerical Data
Scatterplots
Line of Best Fit & $r^2$
https://www.did-you-knows.com/did-you-know-facts/statistics.php
Displaying statistics
https://www.statista.com/chart/12975/top-ten-movies-by-first-weekend-release-box-office-revenue/
Displaying statistics
https://www.statista.com/statistics/262926/box-office-revenue-of-the-most-successful-movies-of-all-time/
Top Ten Lists …
According to …
Data Collection
Populations: The whole
population is surveyed.
5 Number summary
- Minimum
- Q1
- Median
- Q3
- Maximum
Find the 5 number summary for the
following data set:
4, 2, 3, 5, 2, 4, 6, 7, 8, 1, 3, 4, 2
1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 6, 7, 8
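The five-number summary for this data set can be checked with a short Python sketch. It assumes the median-of-halves convention for quartiles (the median is left out of both halves when the number of values is odd); a calculator or spreadsheet set to a different quartile convention may give slightly different Q1 and Q3. The function name `five_number_summary` is illustrative only.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]             # values below the median position
    upper = data[half + (n % 2):]   # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

scores = [4, 2, 3, 5, 2, 4, 6, 7, 8, 1, 3, 4, 2]
print(five_number_summary(scores))  # (min, Q1, median, Q3, max)
```

Under this convention the summary is minimum 1, Q1 2, median 4, Q3 5.5, maximum 8.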
[Diagram: sorted data 1 2 2 2 3 6 6 7 8 with Q1 and Q3 marked]
11 13 23 26 29
So the range is 18, but roughly the middle 68% of the data values lie within one standard deviation (7.14) of the mean.
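The two figures quoted for the data 11, 13, 23, 26, 29 can be reproduced with Python's `statistics` module. This is a minimal sketch; it assumes the 7.14 figure is the population standard deviation, which is the version that matches (the sample standard deviation for the same data is larger).

```python
from statistics import pstdev

data = [11, 13, 23, 26, 29]

data_range = max(data) - min(data)  # largest value minus smallest value
sd = pstdev(data)                   # population standard deviation

print(data_range, round(sd, 2))
```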
Enter this data into the Graphic Calculator …
[Diagram: sorted data 1 2 2 2 3 6 6 7 8 with min, Q1, median, Q3 and max marked]
* later
Try this one. Enter this data into the Graphic Calculator …
2 3 2 7 9 1 2 6 17 8
Sort first. Enter into List 1 and sort (F6, TOOL (F1), SRT-A, etc. …)
• A third limitation …
• Another limitation is …
Distribution of Data: Symmetrical Vs Skewed
This graph is typical of a data
set that displays a
symmetrical distribution.
Note the position of the mode.
Distribution Curve & Standard Deviation
For western suburbs cats …
$\bar{x} = 7.22$
$s = 1.72$
$\approx 68\%$ of values lie between $\bar{x} - s$ and $\bar{x} + s$
Distribution Curve & Standard Deviation
For eastern suburbs cats …
$\bar{x} = 5.81$
$s = 1.11$
$\approx 68\%$ of values lie between $\bar{x} - s$ and $\bar{x} + s$
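Taking the two summaries above (mean 7.22 and standard deviation 1.72 for the western suburbs, mean 5.81 and standard deviation 1.11 for the eastern suburbs), the interval that holds roughly 68% of values can be worked out as mean minus one standard deviation to mean plus one standard deviation. The sketch below assumes both data sets are approximately normally distributed; the helper name `one_sd_interval` is illustrative.

```python
def one_sd_interval(mean, sd):
    """Return the interval (mean - sd, mean + sd), rounded to 2 decimal places."""
    return round(mean - sd, 2), round(mean + sd, 2)

west = one_sd_interval(7.22, 1.72)
east = one_sd_interval(5.81, 1.11)

print("Western suburbs:", west)
print("Eastern suburbs:", east)
```

The western interval is wider than the eastern one, which reflects the larger standard deviation (greater spread) in the western suburbs data.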
Desmos
https://www.desmos.com/calculator
Desmos:
For Sketching a Normal Distribution Curve
For example, a normal distribution curve where …
mean = 2
standard deviation = 0.7
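The same curve can be explored outside Desmos. The sketch below uses `statistics.NormalDist` from the Python standard library and assumes that the two values in the example are the mean (2) and the standard deviation (0.7); the symbols did not survive on the slide, so treat that reading as an assumption.

```python
from statistics import NormalDist

# Assumption: 2 is the mean and 0.7 is the standard deviation of the curve.
curve = NormalDist(mu=2, sigma=0.7)

peak_height = curve.pdf(2)                       # height of the curve at the mean
within_one_sd = curve.cdf(2.7) - curve.cdf(1.3)  # area between mean - sd and mean + sd

print(round(peak_height, 3), round(within_one_sd, 3))
```

The area within one standard deviation of the mean comes out at about 0.68, which is where the 68% figure comes from.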
How To Display Group Data
The following is a list of marks from a Year 8 science test.
27 31 33 38 38 39 41 42 42 43 45 49 50 50 51 52
52 55 58 59 61 62 62
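One way to group these marks is to tally them into class intervals. The sketch below assumes a class width of 10 (20-29, 30-39 and so on), which is a choice made here for illustration and not something fixed by the data; a different width would give a different table.

```python
from collections import Counter

marks = [27, 31, 33, 38, 38, 39, 41, 42, 42, 43, 45, 49,
         50, 50, 51, 52, 52, 55, 58, 59, 61, 62, 62]

# Assumed class width of 10: each mark is mapped to the lower bound of its class.
frequencies = Counter((mark // 10) * 10 for mark in marks)

for lower in sorted(frequencies):
    print(f"{lower}-{lower + 9}: {frequencies[lower]}")
```

The resulting frequencies are the column heights for a histogram of the marks.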
Outliers and Data
- Outliers are data points which are
significantly different to those in the overall
data set.
- Outliers may be a true natural variation or
have arisen due to an error.
- Outliers which have arisen due to errors
should be discarded.
- If an outlier is a genuine data value we have
to carefully consider whether to retain or
discard the value and the impact on the
conclusions we can draw from the data set.
Measures of centre vs outliers
• The mean is a non-resistant measure of centre.
This means that extreme outliers can have a
significant impact on the mean.
• The median is a resistant measure of centre.
This means that extreme outliers have little or
no impact on the median.
• The mode is a resistant measure of center.
Outliers have no impact on the mode.
Example: The house sales for a particular
suburb over the last month are recorded below:
$ 763,500 $ 670,600 $ 660,570 $ 799,000 $ 655,700 $ 686,800
$ 741,420 $ 699,900 $ 686,165 $ 731,100 $ 765,240 $ 799,590
1. Calculate the mean and median sale price for the suburb for the month.
2. It turns out that one more property sold in the suburb: an old house on a
very large block, which was bought by a developer for $2,150,000.
Recalculate the mean and median for the suburb.
Recalculated mean = $831,507
Recalculated median = $731,100
3. Explain how the outlier affected the mean
and median value.
The outlier had a much bigger impact on the mean than on the median.
The outlier increased the median sale price by $15,600 to $731,100,
while the mean increased by $109,874.46 to $831,507. We can see that
the changed median is still close to the centre of the data set, while the
mean is now higher than any of the sale prices in the suburb except for
the outlier.
4. If you were wishing to describe the average
home value for this suburb, which number
would you choose? The mean or the median?
Why?
Using the median is more appropriate in this case as it gives a much truer
picture of the average house price. If we were to use the mean, we would be using a
value that is higher than 12 of the 13 sales in the suburb. This is not a true
representation of the average sale price of a home in that suburb and not what
people could genuinely expect if they were to sell their home.
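The figures in the house sales example can be checked with a few lines of Python. This sketch uses the twelve listed sale prices, then adds the $2,150,000 sale and recalculates; the variable names are illustrative.

```python
from statistics import mean, median

sales = [763500, 670600, 660570, 799000, 655700, 686800,
         741420, 699900, 686165, 731100, 765240, 799590]
with_outlier = sales + [2150000]  # add the developer's purchase

mean_before, median_before = mean(sales), median(sales)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"Before: mean ${mean_before:,.2f}, median ${median_before:,.0f}")
print(f"After:  mean ${mean_after:,.2f}, median ${median_after:,.0f}")
print(f"Change: mean ${mean_after - mean_before:,.2f}, "
      f"median ${median_after - median_before:,.0f}")
```

The original mean is about $721,632 and the original median is $715,500, so the single outlier moves the mean by roughly seven times as much as it moves the median.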
Group Data, Midpoint, Graphics Calculator
Same as
Bivariate Data
When analysing data, it is often necessary to
know whether one variable is related to another.
Strength of Correlations
https://www.econometrics-with-r.org/3-7-scatterplots-sample-covariance-and-sample-correlation.html
Correlation: Some Examples
https://1.bp.blogspot.com
Bivariate Data Case Study
A researcher investigated and tabulated the mean temperature of
twelve cities around the world and their elevation above sea level.
Describe a likely intention of this research.
Elevation (m):          600  850  150  300  100  200  500  450  750   30  300   50
Mean temperature (°C):   15   10   16   15   25   20   21   19    9   27   22   28
Bivariate Data & Scatterplot
Using Excel, enter the data into a table as shown below and construct a
scatterplot.
Construct the table, highlight the table, Insert Scatterplot, add axis labels.
Bivariate Data & Line of Best Fit
Insert a line of best fit (trend line).
Bivariate Data, Equation of Line & $R^2$
Double click on the line of best fit.
On the right, tick Display Equation on chart, tick Display R squared on chart.
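Since $r^2 = 0.6927$, $r = \pm\sqrt{0.6927} = \pm 0.8323$. The line of best fit slopes downwards (temperature falls as elevation rises), so $r = -0.8323$.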
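Excel's trend line reports only the squared value, so the sign of the correlation coefficient has to come from the direction of the line. As a cross-check, the sketch below computes Pearson's correlation coefficient directly from the elevation and temperature data in the case study, using the standard sums-of-squares formula; the variable names are illustrative.

```python
from math import sqrt

elevation = [600, 850, 150, 300, 100, 200, 500, 450, 750, 30, 300, 50]
temperature = [15, 10, 16, 15, 25, 20, 21, 19, 9, 27, 22, 28]

n = len(elevation)
mean_x = sum(elevation) / n
mean_y = sum(temperature) / n

# Sums of products and squares of deviations from the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(elevation, temperature))
sxx = sum((x - mean_x) ** 2 for x in elevation)
syy = sum((y - mean_y) ** 2 for y in temperature)

r = sxy / sqrt(sxx * syy)  # Pearson's correlation coefficient
r_squared = r ** 2

print(round(r, 4), round(r_squared, 4))
```

The squared value agrees with the 0.6927 shown on the chart, and the coefficient itself is negative: higher cities in this sample tend to have lower mean temperatures.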
Finding $r^2$ using the graphics calculator
2. Hit F3 for REG (regression), then F1 for x (linear)
$R^2$ & $R$ (Correlation Coefficient)
$R$ (or $r$) is the Correlation Coefficient. In summary …
• $r$ ranges from $-1$ to $+1$
• The closer $r$ is to $-1$ or $+1$ (that is, the closer $r^2$ is to 1), the stronger the correlation.
$r^2 = 1$: Perfect Correlation
Pearson’s Coefficient