Statistical Concepts
2) Click the Try CODAP button in the top right corner to open CODAP in a new
tab.
3) In the ‘What Would you Like to Do?’ box, select ‘Open Document or Browse
Examples’, select ‘Getting started with CODAP’, then click Open.
4) Complete the five basic CODAP tasks listed on the screen. If you need
assistance with a task, click ‘Show me’. You’re finished with these tasks when all
task checkboxes have been checked.
5) Next you will add a ‘Body Mass Index’ attribute to the case table to learn how to
add an attribute based on a formula.
● Resize the table by dragging the right edge or a lower corner until you can see
all nine attributes.
● Add a new attribute to this table by clicking the grey plus button in the top right
corner of the table. Make sure the table is selected to see this option.
Type ‘BMI’ then press Enter to name the new attribute.
● Click on the BMI attribute heading, then select ‘Edit Formula’. Enter the
formula Mass/Height^2. You can find attribute names, like ‘Mass’ and
‘Height’, under the ---Insert Value--- button.
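For readers who want to check the formula outside CODAP, the same derived attribute can be sketched in R. The three rows below are hypothetical values made up for illustration (they are not taken from the CODAP data set), and the units (kilograms and metres) are an assumption.

```r
# Hypothetical rows for illustration only; not values from the CODAP table.
mammals <- data.frame(
  Mammal = c("Example A", "Example B", "Example C"),
  Mass   = c(70, 500, 4),     # assumed to be in kilograms
  Height = c(1.7, 2.0, 0.3)   # assumed to be in metres
)

# Same formula as typed into the CODAP formula editor: Mass/Height^2.
# In R, ^ binds tighter than /, so this is Mass / (Height^2).
mammals$BMI <- mammals$Mass / mammals$Height^2

print(mammals)
```

As in CODAP, the new column is computed for every case at once, so there is no need to loop over rows.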
7) Select those points in the graph by dragging a rectangle around them and look
at the table to see what mammals they are. CODAP’s representations are linked
dynamically, so if you select items in one representation they are automatically
selected in all other representations.
8) Hide Selected Cases by clicking the Eye icon in the Graph Menu (make sure
you have the graph selected for the menu to appear). This will remove the two
selected outliers from the graph.
9) Rescale the graph by moving the mouse to the x-axis where it changes to a
hand icon, then dragging to the right.
● Save a File to Google Drive by clicking on the menu in the top left corner
of the header bar, selecting ‘Save…’, selecting the Google Drive tab in the
prompt box (second option), then following the Google Drive dialogue.
● Save a File to a shared URL by clicking on the menu in the top left corner
of the header bar, selecting ‘Share…’, selecting ‘Get link to shared view’,
enabling sharing, then copying the displayed URL. To save additional work
done after initially saving a file, select ‘Share… > Update shared view’.
Initial Terminology
INTRODUCING CRITICAL STATISTICAL LITERACY
What is statistics?
“Statistics is the science of learning from data and of measuring, controlling, and
communicating uncertainty.” American Statistical Association (ASA)
“Statistics has three primary components: How best can we collect data? How
should it be analyzed? And what can we infer from the analysis?” (Diez, et al.,
2015)
What questions from current events or from your own life can you think of
that could be answered by collecting and analyzing data?
The Statistical Investigation Cycle
(Figure source: GAISEIIPreK-12_Full.pdf)
Population and Sample
Descriptive Statistics
“Consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures”.
Inferential Statistics
“Consists of methods that use sample results to help make decisions or predictions about a population”.
VARIABLE
“It is a characteristic or measurement that can be determined for each member of a population”.
Categorical
● Nominal: e.g., party affiliation
Numerical
● Discrete: e.g., # of siblings
● Continuous: e.g., height
What kind of variable do you think “phone number” is?
Measures of Central Tendency
Example
Data about students’ height (in cm) from a classroom
Data set
(195,170,165,165,160) Sample size (n) = 5
165
Position of the median in the sorted data:
n is odd: the value at position (n + 1) ÷ 2
n is even: the average of the values at positions n ÷ 2 and (n ÷ 2) + 1
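The position rule for the median can be checked on the classroom heights with a short R sketch. The variable names are illustrative, and the mode is included only as a further common measure of central tendency; base R has no built-in function for it, so it is computed from a frequency table.

```r
heights <- c(195, 170, 165, 165, 160)   # students' heights in cm
n <- length(heights)
sorted <- sort(heights)

# Mean: sum of the values divided by the sample size
mean_height <- mean(heights)

# Median by position in the sorted data
if (n %% 2 == 1) {
  median_height <- sorted[(n + 1) / 2]                 # n is odd
} else {
  median_height <- mean(sorted[c(n / 2, n / 2 + 1)])   # n is even
}

# Mode: the most frequent value(s)
counts <- table(heights)
mode_height <- as.numeric(names(counts)[counts == max(counts)])

cat("mean:", mean_height, " median:", median_height, " mode:", mode_height, "\n")
```

The hand-computed median can be compared with R's built-in `median(heights)`, which applies the same rule.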
DATA MATRIX
Each row is a case; each column is a variable.
Bar Graph
Fruit:  Apple | Orange | Banana | Kiwifruit | Blueberry | Grapes
People: 35    | 30     | 10     | 25        | 40        | 5
- In a bar graph, the length of the bar for each
category represents the number of observations
in that category (frequency).
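A bar graph of the fruit table can be drawn in R as a rough sketch. The counts come from the table; the axis labels and title are illustrative choices.

```r
# Frequencies from the fruit table, stored as a named vector
people <- c(Apple = 35, Orange = 30, Banana = 10,
            Kiwifruit = 25, Blueberry = 40, Grapes = 5)

# One bar per category; bar length = frequency
barplot(people,
        xlab = "Fruit",
        ylab = "Number of people",
        main = "People per fruit category")

# The tallest bar corresponds to the most frequent category
most_frequent <- names(which.max(people))
print(most_frequent)
```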
- Activity 1
- Activity 2
- Activity 3
For the data set (195,170,165,165,160)
Dispersion
Some measures of dispersion
Ex. 2
Dataset 1: 3, 5, 6, 8, 11, 14, 17, 24
Dataset 2: 3, 5, 6, 8, 11, 14, 17, 200
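The two data sets differ only in their largest value, so they make a convenient test of how sensitive each summary is to a single extreme observation. The sketch below is one way to compare them in R; the helper name `spread_summary` is hypothetical.

```r
d1 <- c(3, 5, 6, 8, 11, 14, 17, 24)
d2 <- c(3, 5, 6, 8, 11, 14, 17, 200)   # same values, but the maximum is 200

# Hypothetical helper: a few summaries for one numeric vector
spread_summary <- function(v) {
  c(mean   = mean(v),
    median = median(v),
    range  = max(v) - min(v),
    sd     = sd(v))
}

s1 <- spread_summary(d1)
s2 <- spread_summary(d2)
print(rbind(dataset1 = s1, dataset2 = s2))
```

The median is identical for both data sets, while the mean, range, and standard deviation all change when 24 is replaced by 200.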
Q1 represents the first quartile, which is the 25th percentile, and is the
median of the smaller half of the data set. Likewise, Q3 is the third quartile
(the 75th percentile), the median of the larger half of the data set.
We measure the variability in the data using the range of the middle
50% of the data: Q3 - Q1, the interquartile range (IQR, for short).
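The "median of each half" definition can be sketched in R as follows. The function name `quartiles_by_halves` is hypothetical, and for an odd sample size this sketch leaves the middle value out of both halves, which is one common convention and an assumption here. The data are the values 3, 5, 6, 8, 11, 14, 17, 24.

```r
# Hypothetical helper: quartiles as medians of the lower and upper halves
quartiles_by_halves <- function(v) {
  s <- sort(v)
  n <- length(s)
  half <- n %/% 2                    # size of each half (middle value dropped if n is odd)
  lower <- s[seq_len(half)]          # smaller half of the data
  upper <- s[(n - half + 1):n]       # larger half of the data
  c(Q1 = median(lower), Q2 = median(s), Q3 = median(upper))
}

d <- c(3, 5, 6, 8, 11, 14, 17, 24)
q <- quartiles_by_halves(d)
iqr <- unname(q["Q3"] - q["Q1"])     # range of the middle 50% of the data

print(q)
print(iqr)
```

R's built-in `quantile()` and `IQR()` interpolate between observations by default, so their results may differ slightly from this hand method.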
Box plots
min = 5
Q1 = 7
Q2 = 15
Q3 = 25
max = 80
IQR = Q3 - Q1 = 25 - 7 = 18
Q1 - 1.5*IQR = -20
Q3 + 1.5*IQR = 52
Example 2
Consider the following data set:
min = 1
Q1 = 3
Q3 = 18
max = 20
Q1 - 1.5*IQR = -19.5
Q3 + 1.5*IQR = 40.5
Rules of thumb for identifying outliers
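One common rule of thumb, and the one the fence values in the two box plot examples follow, flags any observation below Q1 - 1.5*IQR or above Q3 + 1.5*IQR as a potential outlier. A minimal R sketch, with the hypothetical helper name `outlier_fences`:

```r
# Hypothetical helper: lower and upper fences from the two quartiles
outlier_fences <- function(q1, q3) {
  iqr <- q3 - q1
  c(lower = q1 - 1.5 * iqr, upper = q3 + 1.5 * iqr)
}

# First box plot example: Q1 = 7, Q3 = 25, min = 5, max = 80
f1 <- outlier_fences(7, 25)
ex1_has_outlier <- 5 < f1[["lower"]] || 80 > f1[["upper"]]

# Example 2: Q1 = 3, Q3 = 18, min = 1, max = 20
f2 <- outlier_fences(3, 18)
ex2_has_outlier <- 1 < f2[["lower"]] || 20 > f2[["upper"]]

print(f1); print(ex1_has_outlier)
print(f2); print(ex2_has_outlier)
```

Checking only the minimum and maximum is enough to tell whether any outlier exists, because every other value lies between them.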
Are these variables associated? How would you describe the association? Who is affecting whom?
Explanatory and response variable
Explanatory variable(s) (independent variable) → might affect → Response variable (dependent variable)
https://isaim2018.cs.ou.edu/papers/ISAIM2018_Deebani_Kachouie.pdf
Simple Linear Regression Model
Lea (1965) discussed the relationship between mean annual temperature and a mortality index for a type
of breast cancer in women. The data taken from certain regions of Great Britain, Norway, and Sweden,
consist of the mean annual temperature (in degrees Fahrenheit), and a mortality index for neoplasms of
the female breast.
What should be the first step in analyzing any possible relationship between mean annual temperature and
mortality index?
Let’s make a scatter plot
What is this plot revealing?
$e_i = y_i - \hat{y}_i, \quad i = 1, \dots, N$
$\hat{M} = -21.79 + 2.36\,T$
$\hat{\beta}_1 = 2.36$
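The fit can be reproduced in R with `lm()`. This sketch uses the 16 value pairs from the x and y vectors listed in this section, and assumes that x holds the mortality index and y the mean annual temperature (judging by their magnitudes and by the fitted equation); the variable names are illustrative.

```r
# Assumed pairing: temperature = the y vector, mortality = the x vector
temperature <- c(51.3, 49.9, 50, 49.2, 48.5, 47.8, 47.3, 45.1,
                 46.3, 42.1, 44.2, 43.5, 42.3, 40.2, 31.8, 34)
mortality   <- c(102.5, 104.5, 100.4, 95.9, 87, 95, 88.6, 89.2,
                 78.9, 84.6, 81.7, 72.2, 65.1, 68.1, 67.3, 52.5)

# First step: look at the data
plot(temperature, mortality,
     xlab = "Mean annual temperature (degrees F)",
     ylab = "Mortality index")

# Least-squares fit of M on T, with the fitted line added to the plot
fit <- lm(mortality ~ temperature)
abline(fit)

# Intercept and slope; these should land close to -21.79 and 2.36
print(coef(fit))

# Residuals e_i = y_i - yhat_i (observed minus fitted)
e <- residuals(fit)
print(round(e, 2))
```

A positive slope means that regions with higher mean annual temperature tend to have a higher mortality index in these data; it does not by itself establish that temperature causes the difference.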
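Besides plotting, the Pearson correlation coefficient (r) is an effect size measure that can be used to assess the
strength and direction of the linear relationship between two variables.
$r = \frac{1}{n-1} \sum_{i=1}^{n} Z_{x_i} Z_{y_i}$
where $Z_{x_i}$ and $Z_{y_i}$ are the standardized values (z-scores) of the i-th observation of x and y.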
# x: mortality index; y: mean annual temperature (degrees F)
x <- c(102.5, 104.5, 100.4, 95.9, 87, 95, 88.6, 89.2, 78.9, 84.6, 81.7, 72.2, 65.1, 68.1, 67.3, 52.5)
y <- c(51.3, 49.9, 50, 49.2, 48.5, 47.8, 47.3, 45.1, 46.3, 42.1, 44.2, 43.5, 42.3, 40.2, 31.8, 34)
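As a check on the z-score formula for r, the sketch below computes the correlation by hand from standardized values and compares it with R's built-in `cor()`. The names `zx`, `zy`, `r_manual`, and `r_builtin` are illustrative.

```r
x <- c(102.5, 104.5, 100.4, 95.9, 87, 95, 88.6, 89.2,
       78.9, 84.6, 81.7, 72.2, 65.1, 68.1, 67.3, 52.5)
y <- c(51.3, 49.9, 50, 49.2, 48.5, 47.8, 47.3, 45.1,
       46.3, 42.1, 44.2, 43.5, 42.3, 40.2, 31.8, 34)
n <- length(x)

# Standardize each variable: subtract the mean, divide by the sample sd
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

# r = (1 / (n - 1)) * sum of the products of paired z-scores
r_manual <- sum(zx * zy) / (n - 1)

# Built-in Pearson correlation for comparison
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```

Because `sd()` uses the n - 1 denominator, the two values agree up to floating-point rounding; the correlation is also unchanged if x and y are swapped.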