
Data Representation Interpretation

Uploaded by

alexaserafimides
Copyright
© © All Rights Reserved

Please pick up new booklet
Data Representation & Interpretation (Statistics)
Year 10 Mathematics
Name:
Checklist
- Types of Data
- Mean
- Median
- Mode
- Range
- Outliers
- Quartiles
- IQR
- Histograms
- Boxplots
- Graphing with Technology
- Calculations with Technology
- Standard Deviation
- Bivariate Numerical Data
- Scatterplots
- Line of Best Fit & r²
https://www.did-you-knows.com/did-you-know-facts/statistics.php
Displaying statistics

https://www.statista.com/chart/12975/top-ten-movies-by-first-weekend-release-box-office-revenue/
Displaying statistics

https://www.statista.com/statistics/262926/box-office-revenue-of-the-most-successful-movies-of-all-time/
Top Ten Lists …
According to …
Data Collection
Population: The whole population is surveyed.

Sample: Only a sample of the population is surveyed.
Types of Data
Discrete data (a discrete variable) takes exact numerical values.
This is usually the result of counting.

Continuous data (a continuous variable) takes any numerical value within a certain range.
This is usually the result of measuring.
Types of Data…Example
Discrete: The number of runners in a race.
Continuous: The race time of runners in a race.
Finding Quartiles
• Quartiles divide the data into quarters.
• We can find the upper and lower quartiles by finding the median, then finding the median of the upper and lower halves.
• REMEMBER: The data must be IN ORDER before finding the median and quartiles.

5 Number summary
- Minimum
- Q1
- Median
- Q3
- Maximum
Find the 5 number summary for the following data set:

4, 2, 3, 5, 2, 4, 6, 7, 8, 1, 3, 4, 2

In order: 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 6, 7, 8

Minimum = 1, Q1 = (2 + 2)/2 = 2, Median = 4, Q3 = (5 + 6)/2 = 5.5, Maximum = 8
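The median-of-halves method described above can be sketched in Python as a check on the hand calculation. The function name `five_number_summary` is illustrative only, and other quartile conventions (used by some software) can give slightly different Q1 and Q3 values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)            # the data must be in order first
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + n % 2:]      # values above it (skips the middle value when n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])

summary = five_number_summary([4, 2, 3, 5, 2, 4, 6, 7, 8, 1, 3, 4, 2])
print(summary)  # (1, 2.0, 4, 5.5, 8)
```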


Statistical Measurements on the Graphic Calculator
• Turn the calculator on.
• Select MENU.
• Select STAT, then EXE.
• Delete all lists if old data is present: F6, then DEL-A (F4), EXE.
• Enter the data in List 1: 1, 2, 2, 2, 3, 6, 6, 7, 8
• Select CALC (F2), then 1VAR.
• Select SET (F6), check “1VAR XList :List1”, EXE, EXIT.
• Go back to CALC (F2), then 1VAR.
• Scroll down to find all relevant calculations.
Enter this data into the Graphic Calculator …

1, 2, 2, 2, 3, 6, 6, 7, 8

… record these statistical measurements: min, Q1, Median, Q3, max.
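For checking the calculator output, the same one-variable statistics can be sketched with Python's standard `statistics` module. This is only a cross-check, not part of the calculator procedure, and the variable names are illustrative.

```python
import statistics as st

data = [1, 2, 2, 2, 3, 6, 6, 7, 8]

mean = st.mean(data)               # sum of values / number of values
median = st.median(data)           # middle value of the sorted data
mode = st.mode(data)               # most frequent value
data_range = max(data) - min(data)
sample_sd = st.stdev(data)         # divides by n - 1
population_sd = st.pstdev(data)    # divides by n

print(f"mean={mean:.3f} median={median} mode={mode} range={data_range}")
print(f"sample sd={sample_sd:.3f} population sd={population_sd:.3f}")
```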


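IQR & Standard Deviation?
Consider the data set 11, 23, 26, 29, 13

In order: 11 13 23 26 29

• The IQR measures the spread of the middle 50% of the data. Taking Q1 = 13 and Q3 = 26, IQR = 26 – 13 = 13. (Quartile conventions differ: the median-of-halves method gives Q1 = 12 and Q3 = 27.5, so IQR = 15.5.)

• Standard deviation is a more refined measure of spread. For roughly bell-shaped data, about 68% of the values lie within one standard deviation of the mean.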
What is Standard Deviation?
• Standard deviation is a similar measurement to the interquartile range (IQR).
• It is a measure of the variation or spread of a set of values.
• σ is the population standard deviation; s is the sample standard deviation.
• The standard deviation is a more detailed measure of the spread of the
distribution.
• It tells us to what degree the data points deviate from the mean.
Standard deviation

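The approximate average weight of a pumpkin is 5.6 kg, with a standard deviation of 1.8 kg.
[Figure: weight axis marked at 0.2 kg, 2 kg, 3.8 kg, 5.6 kg, 7.4 kg, 9.2 kg and 11 kg, i.e. the mean and 1, 2 and 3 standard deviations either side]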
How to Calculate Standard Deviation?

Σ means “the sum of”

So the range is 18 (29 – 11), while the population standard deviation (dividing by n = 5) is about 7.14. For roughly bell-shaped data, the middle 68% of the values lie within about 7.14 of the mean.
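The calculation for the data set 11, 23, 26, 29, 13 can be followed step by step in a short Python sketch: find the mean, square each deviation from it, sum the squares, divide, and take the square root. Dividing by n gives the population value and dividing by n - 1 gives the sample value; the variable names are illustrative.

```python
import math

data = [11, 23, 26, 29, 13]
n = len(data)

mean = sum(data) / n                              # 102 / 5 = 20.4
squared_devs = [(x - mean) ** 2 for x in data]    # squared distance of each value from the mean
sum_sq = sum(squared_devs)                        # the "sum of" (sigma) step

population_sd = math.sqrt(sum_sq / n)             # divide by n
sample_sd = math.sqrt(sum_sq / (n - 1))           # divide by n - 1

print(round(population_sd, 2))  # 7.14
print(round(sample_sd, 2))      # 7.99
```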
Try this one. Enter this data into the Graphic Calculator …
2, 3, 2, 7, 9, 1, 2, 6, 17, 8
Sort first: enter into List 1 and sort (F6, TOOL (F1), SRT-A, etc.).

… record these statistical measurements.




Too Many Cats?
Statistical Case Study Example: Frequency Table
Residents of a western suburb complained to the council of too many cats. The council
decided to conduct a survey. A number of streets in that suburb were selected to
determine whether cats were a problem in that suburb. The western suburb council then
asked an eastern suburb council for the results of a similar survey.
The results of the two surveys are shown below in these frequency tables.
Western Suburbs Survey Results
Number of cats   Tally     Frequency
4                          0
5                |         1
6                ||        2
7                |||       3
8                ||        2
9                          0
10                         0
11               |         1
Total                      9

Eastern Suburbs Survey Results
Number of cats   Tally     Frequency
3                          0
4                ||        2
5                ||||      4
6                |||| |    6
7                |||       3
8                |         1
9                          0
10                         0
Total                      16
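As a cross-check on the calculator work that follows, the two frequency tables can be expanded into raw data lists and summarised in Python. This is a sketch only; the helper name `expand` is illustrative, and the sample standard deviation (dividing by n - 1) is used.

```python
import statistics as st

# Frequency tables from the survey: {number of cats: frequency}
western = {4: 0, 5: 1, 6: 2, 7: 3, 8: 2, 9: 0, 10: 0, 11: 1}
eastern = {3: 0, 4: 2, 5: 4, 6: 6, 7: 3, 8: 1, 9: 0, 10: 0}

def expand(freq_table):
    """Turn a frequency table into the raw list of data values."""
    return [value for value, freq in freq_table.items() for _ in range(freq)]

for name, table in [("Western", western), ("Eastern", eastern)]:
    data = expand(table)
    print(name, data)
    print(f"  n={len(data)} mean={st.mean(data):.4f} s={st.stdev(data):.4f}")
```

The expanded lists are the same data values that are later typed into Desmos for the boxplots.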
Western Suburb Survey Results:
Number of Cats
Use the graphics calculator to calculate and record
the following statistical measurements.
Using desmos.com for Boxplots: Western
Log on to https://www.desmos.com/calculator/avntywmp7c
Or search “Boxplot – Desmos”

Number of cats Data: 5, 6, 6, 7, 7, 7, 8, 8, 11

Number of Cats: Western


Using Excel for Histograms: Western
Eastern Suburb Survey Results:
Number of Cats
Use the graphics calculator to calculate and record
the following statistical measurements.
Using desmos.com for Boxplots: Eastern
Log on to https://www.desmos.com/calculator/avntywmp7c
Or search “Boxplot – Desmos”

Number of cats Data: 4,4,5,5,5,5,6,6,6,6,6,6,7,7,7,8

Number of Cats: Eastern


Using Excel for Histograms: Eastern
Limitations
• One limitation of this statistical investigation is the size of the survey
samples. Both sample sizes, 9 for the western suburbs and 16 for the eastern
suburbs, may be considered small for a statistical investigation. This means
that the results may not be definitive.

• A second limitation of this investigation is the difference between the two sample
sizes. The results may be considered questionable in terms of fairness and
consistency.

• A third limitation …

• Another limitation is …
Distribution of Data: Symmetrical Vs Skewed
This graph is typical of a data
set that displays a
symmetrical distribution.
Note the position of the mode.
Distribution Curve & Standard Deviation
For western suburbs cats …
x̄ = 7.22
s = 1.72

[Figure: distribution curve with ≈ 68% of the data between x̄ − s and x̄ + s]
Distribution Curve & Standard Deviation
For eastern suburbs cats …
x̄ = 5.81
s = 1.11

[Figure: distribution curve with ≈ 68% of the data between x̄ − s and x̄ + s]
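The 68% figure is an approximation that holds for bell-shaped (normal) data. A short Python sketch can count how much of each small survey sample actually falls within one sample standard deviation of the mean. The function name `share_within_one_sd` is illustrative.

```python
import statistics as st

western = [5, 6, 6, 7, 7, 7, 8, 8, 11]
eastern = [4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 8]

def share_within_one_sd(data):
    """Fraction of values lying between mean - s and mean + s (sample s)."""
    mean = st.mean(data)
    s = st.stdev(data)
    inside = [x for x in data if mean - s <= x <= mean + s]
    return len(inside) / len(data)

print(f"Western: {share_within_one_sd(western):.1%}")  # 77.8%
print(f"Eastern: {share_within_one_sd(eastern):.1%}")  # 62.5%
```

With samples this small the observed shares sit on either side of 68%, which is consistent with treating the figure as approximate.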
Desmos
https://www.desmos.com/calculator
Desmos:
For Sketching a Normal Distribution Curve
For example, a normal distribution curve where …
mean = 2
standard deviation = 0.7
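The same curve can be sketched outside Desmos with Python's standard library. This assumes the two values on the slide are the mean (2) and the standard deviation (0.7), since the symbols did not survive in the source; the x values sampled are arbitrary.

```python
from statistics import NormalDist

# Assumption: the two slide values are the mean (2) and standard deviation (0.7).
curve = NormalDist(mu=2, sigma=0.7)

# Height of the curve at a few x values (the points a graphing tool would plot).
for x in [0.6, 1.3, 2.0, 2.7, 3.4]:
    print(f"x={x:.1f}  height={curve.pdf(x):.3f}")

# Area under the curve between mean - sd and mean + sd.
within_one_sd = curve.cdf(2 + 0.7) - curve.cdf(2 - 0.7)
print(f"{within_one_sd:.1%}")  # 68.3%
```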
How To Display Group Data
The following is a list of marks from a Year 8 science test.
27 31 33 38 38 39 41 42 42 43 45 49 50 50 51 52 52 55 58 59 61 62 62
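One way to group these marks is into class intervals and count the frequency of each, as sketched below in Python. The class width of 10 is an assumption (the source does not state one), and the midpoints are included because grouped-data calculations use them in place of the raw values.

```python
from collections import Counter

marks = [27, 31, 33, 38, 38, 39, 41, 42, 42, 43, 45, 49,
         50, 50, 51, 52, 52, 55, 58, 59, 61, 62, 62]

CLASS_WIDTH = 10  # assumed class width; the source does not specify one

# Map each mark to the lower bound of its class interval, e.g. 38 -> 30.
counts = Counter((m // CLASS_WIDTH) * CLASS_WIDTH for m in marks)

for lower in sorted(counts):
    upper = lower + CLASS_WIDTH - 1
    midpoint = (lower + upper) / 2
    print(f"{lower}-{upper}  midpoint {midpoint}  frequency {counts[lower]}")
```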
Outliers and Data
- Outliers are data points which are
significantly different to those in the overall
data set.
- Outliers may be a true natural variation or
have arisen due to an error.
- Outliers which have arisen due to errors
should be discarded.
- If an outlier is a genuine data value we have
to carefully consider whether to retain or
discard the value and the impact on the
conclusions we can draw from the data set.
Measures of centre vs outliers
• The mean is a non-resistant measure of centre.
This means that extreme outliers can have a
significant impact on the mean.
• The median is a resistant measure of centre.
This means that extreme outliers have no or
minimal impact on the median.
• The mode is a resistant measure of centre.
Outliers have no impact on the mode.
Example: The house sales for a particular
suburb over the last month are recorded below:
$ 763,500 $ 670,600 $ 660,570 $ 799,000 $ 655,700 $ 686,800
$ 741,420 $ 699,900 $ 686,165 $ 731,100 $ 765,240 $ 799,590

1. Calculate the mean and median sale price for the suburb for the month.

Mean = 8,659,585 /12 = $721,632


$655,700 $660,570 $670,600 $686,165 $686,800 $699,900 | $731,100 $741,420 $763,500 $765,240 $799,000 $799,590
Median = (699,900 + 731,100)/2 = 715,500

2. It turns out that one more property sold in the suburb: an old house on a
very large block, which was bought by a developer for $2,150,000.
Recalculate the mean and median for the suburb.
Recalculated mean = $831,507
Recalculated median = $731,100
3. Explain how the outlier affected the mean
and median value.
The outlier had a much bigger impact on the mean than the median.
The outlier increased the median sale price by $15,600 to $731,100,
while the mean increased by $109,874.46 to $831,507. We can see that
the changed median is still close to the centre of the data set, while the
mean has moved outside the main body of the data, meaning it is higher than any
of the sale prices in the suburb except for the outlier.
4. If you were wishing to describe the average
home value for this suburb, which number
would you choose? The mean or the median?
Why?
Using the median is more appropriate in this case as it gives a much truer
picture of the average house price. If we were to use the mean, we would be using a
value that is higher than 12 of the 13 sales in the suburb. This is not a true
representation of the average sale price of a home in that suburb and not what
people could genuinely expect if they were to sell their home.
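The effect of the outlier in this example can be reproduced with a few lines of Python, which is a handy way to check the arithmetic. The variable names are illustrative.

```python
import statistics as st

sales = [763_500, 670_600, 660_570, 799_000, 655_700, 686_800,
         741_420, 699_900, 686_165, 731_100, 765_240, 799_590]
with_outlier = sales + [2_150_000]   # the developer's purchase

mean_before, median_before = st.mean(sales), st.median(sales)
mean_after, median_after = st.mean(with_outlier), st.median(with_outlier)

print(f"mean:   {mean_before:,.0f} -> {mean_after:,.0f}")
print(f"median: {median_before:,.0f} -> {median_after:,.0f}")
print(f"mean moved by {mean_after - mean_before:,.2f}")
print(f"median moved by {median_after - median_before:,.0f}")
```

The mean shifts by more than $100,000 while the median shifts by $15,600, which is the contrast the example is drawing.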
Group Data, Midpoint, Graphics Calculator

Then select SET

Same as
Bivariate Data
When analysing data, it is often necessary to
know whether one variable is related to another.

To analyse the relationship between variables, one
variable is labelled the dependent variable
and the other the independent variable (often
time).
Scatter Plots
Consider the following scatter plot that shows the
age and work done by 5 employees in a small
business.

Which person is the oldest? E
Which person did the most work? B and E
Which person did the least work? A
Correlation
Correlation is a measure of the strength of the
relationship between two variables.
Positive or Negative Correlations
[Figure: three scatterplots showing Positive Correlation, Negative Correlation and No Correlation]

Strength of Correlations
[Figure: three scatterplots showing Strong, Moderate and Weak correlation]


Correlation: Some Examples

https://www.econometrics-with-r.org/3-7-scatterplots-sample-covariance-and-sample-correlation.html
Correlation: Some Examples

https://1.bp.blogspot.com
Bivariate Data Case Study
A researcher investigated and tabulated the mean temperature of
twelve cities around the world and their elevation above sea level.
Describe a likely intention of this research.

Elevation (m):         600, 850, 150, 300, 100, 200, 500, 450, 750, 30, 300, 50
Mean Temperature (°C): 15, 10, 16, 15, 25, 20, 21, 19, 9, 27, 22, 28
Bivariate Data & Scatterplot
Using Excel, enter the data into a table and construct a scatterplot.

Construct the table, highlight the table, Insert Scatterplot, then add axis labels.
Bivariate Data & Line of Best Fit
Insert a line of best fit (trend line).
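A linear trend line is a least-squares fit, so its equation can be reproduced by hand or in code. The Python sketch below computes the slope and intercept for the elevation and temperature data using the standard least-squares formulas; the variable names are illustrative.

```python
elevation = [600, 850, 150, 300, 100, 200, 500, 450, 750, 30, 300, 50]   # metres
temperature = [15, 10, 16, 15, 25, 20, 21, 19, 9, 27, 22, 28]            # degrees C

n = len(elevation)
mean_x = sum(elevation) / n
mean_y = sum(temperature) / n

# Least-squares slope = sum of (x - mean_x)(y - mean_y) / sum of (x - mean_x)^2
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(elevation, temperature))
s_xx = sum((x - mean_x) ** 2 for x in elevation)

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)

print(f"temperature = {slope:.4f} * elevation + {intercept:.2f}")
```

The slope comes out negative, so the fitted temperature falls as elevation rises.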
Bivariate Data, Equation of Line & R²
Double click on the line of best fit.
On the right, tick Display Equation on chart, tick Display R squared on chart.

Since r² = 0.6927, r = ±√0.6927 ≈ ±0.8323. The line of best fit slopes downward (temperature falls as elevation rises), so r ≈ −0.8323.
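The r² value can be checked directly from the data. The sketch below computes Pearson's correlation coefficient from its definition in plain Python; the function name `pearson_r` is illustrative. Computing r itself, rather than taking the square root of r², also recovers its sign.

```python
import math

elevation = [600, 850, 150, 300, 100, 200, 500, 450, 750, 30, 300, 50]   # metres
temperature = [15, 10, 16, 15, 25, 20, 21, 19, 9, 27, 22, 28]            # degrees C

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(elevation, temperature)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")  # r = -0.8323, r^2 = 0.6927
```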
Finding r² using the graphics calculator
2. Hit F3 for REG (regression), then F1 for X (linear).

3. Read the required value from the results screen and interpret it.
R² & R (Correlation Coefficient)
R (or r) is the Correlation Coefficient. In summary …
• r ranges from -1 to +1
• The closer r² is to 1, the stronger the correlation.

VALUE                 STRENGTH OF ASSOCIATION
r² = 0                No Correlation
0 < r² ≤ 0.25         Very Weak Correlation
0.25 < r² ≤ 0.50      Weak Correlation
0.50 < r² ≤ 0.75      Moderate Correlation
0.75 < r² ≤ 0.90      Strong Correlation
0.90 < r² < 1         Very Strong Correlation
r² = 1                Perfect Correlation
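The bands in this table can be turned into a small lookup, sketched below in Python. The function name `correlation_strength` is illustrative, and one assumption is made: a value that falls exactly on a boundary (for example 0.25) is placed in the weaker band.

```python
def correlation_strength(r_squared):
    """Label an r^2 value using the strength-of-association bands."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r^2 must be between 0 and 1")
    if r_squared == 0:
        return "No Correlation"
    if r_squared == 1:
        return "Perfect Correlation"
    # Assumption: a value exactly on a boundary (e.g. 0.25) goes in the weaker band.
    bands = [(0.25, "Very Weak"), (0.50, "Weak"), (0.75, "Moderate"),
             (0.90, "Strong"), (1.00, "Very Strong")]
    for upper, label in bands:
        if r_squared <= upper:
            return f"{label} Correlation"

print(correlation_strength(0.6927))  # Moderate Correlation
```

Applied to the elevation and temperature case study, r² = 0.6927 falls in the moderate band.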

Pearson’s Coefficient
