
Lecture 4

1) Variability refers to how different or similar the values in a data set are from each other. The more dissimilar the values, the higher the variability. 2) Range is a simple measure of variability that tells the span between the highest and lowest values. However, it does not fully capture variability. 3) Variance and standard deviation are better measures that account for how far all values are from the mean by squaring the differences. This prevents values from cancelling out.

Uploaded by

addis zewd

Variation

• Variability: The extent to which the numbers in a
data set are dissimilar (different) from each other.
• When all elements measured receive the same
scores (e.g., everyone in the data set is the same
age, in years), there is no variability in the data set.
• As the scores in a data set become more
dissimilar, variability increases.
Variation: Range
• The range tells us the span over which the data are
distributed, and is only a very rough measure of
variability.
• Range: The difference between the maximum and
minimum scores.
Example: The youngest student in a class is 19 and the
oldest is 46. Therefore, the age range of the class is 46
– 19 = 27 years.
X X X
5 0.00 This is an example of data
5 0.00 with NO variability
5 0.00
5 0.00
5 0.00

 X= 25 n=5 X =5
X X X
6 +1.00 This is an example of data
4 -1.00 with low variability
6 +1.00
5 0.00
4 -1.00

 X= 25 n=5 X =5
X X X
8 +3.00 This is an example of data
1 -4.00 with higher variability
9 +4.00
5 0.00
2 -3.00

 X= 25 n=5 X =5
Note:
• Let’s say we wanted to figure out the average
deviation from the mean. Normally, we would want
to sum all deviations from the mean and then divide
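by n, i.e.,

$$\frac{\sum (X - \bar{X})}{n}$$

• BUT: We have a problem. $\sum (X - \bar{X})$ will always
add up to zero, because the positive and negative
deviations cancel each other out.
• However, if we square each of the deviations from
the mean, we obtain a sum that is not equal to
zero (unless every score equals the mean).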
• This is the basis for the measures of variance and
standard deviation, the two most common
measures of variability of data.
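The cancellation described above can be checked directly. The following is a minimal Python sketch using the scores 8, 1, 9, 5, 2 from the higher-variability example; the variable names are chosen here for illustration and are not part of the lecture.

```python
# Scores from the higher-variability example in the lecture.
scores = [8, 1, 9, 5, 2]

mean = sum(scores) / len(scores)             # 25 / 5
deviations = [x - mean for x in scores]      # distance of each score from the mean
squared = [d ** 2 for d in deviations]       # squaring removes the signs

sum_of_deviations = sum(deviations)          # positive and negative values cancel
sum_of_squares = sum(squared)                # no cancellation once squared

print(sum_of_deviations)  # 0.0
print(sum_of_squares)     # 50.0
```

The plain deviations sum to zero whatever the data look like, so only the squared version carries information about spread.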
X     $X - \bar{X}$     $(X - \bar{X})^2$
8     +3.00              9.00
1     -4.00             16.00
9     +4.00             16.00
5      0.00              0.00
2     -3.00              9.00

$\sum X = 25$, $\sum (X - \bar{X}) = 0.00$, $\sum (X - \bar{X})^2 = 50.00$

Note: $\sum (X - \bar{X})^2$ is called the Sum of Squares.


Variance of a Population
• VARIANCE OF A POPULATION: the sum of
squared deviations from the mean divided by the
number of scores (sigma squared):

$$\sigma^2 = \frac{\sum (X - \mu)^2}{n}$$
Population Standard Deviation
• Square root of the variance $\sigma^2$:

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{n}}$$
Sample Variance
• The sum of squared deviations from the mean
divided by the number of degrees of freedom, n-1
(an estimate of the population variance):

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Sample Standard Deviation
• Square root of the variance $s^2$:

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
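As a rough illustration of the four formulas above, the sketch below applies Python's standard `statistics` module to the scores 8, 1, 9, 5, 2 used earlier (sum of squares = 50, n = 5). Treating the same five scores first as a whole population and then as a sample is an assumption made here only to compare the two denominators.

```python
import statistics

scores = [8, 1, 9, 5, 2]

pop_var = statistics.pvariance(scores)   # sum of squares / n       = 50 / 5
samp_var = statistics.variance(scores)   # sum of squares / (n - 1) = 50 / 4
pop_sd = statistics.pstdev(scores)       # square root of the population variance
samp_sd = statistics.stdev(scores)       # square root of the sample variance

print(f"population variance = {pop_var:.2f}, SD = {pop_sd:.2f}")
print(f"sample variance     = {samp_var:.2f}, SD = {samp_sd:.2f}")
```

Because the sample formulas divide by n-1 rather than n, they always give the larger of the two values for the same data.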
Why Use Standard Deviation and Not Variance?
• Normally, you will only calculate variance in
order to calculate standard deviation, as standard
deviation is what we typically want.

• Why? Because standard deviation expresses
variability in the same units as the data.

• Example: Standard deviation of ages in a class is
3.7 years.
Degrees of Freedom
• Degrees of Freedom: The number of
independent observations, or, the number of
observations that are free to vary.
• In our data example above, there are 5
numbers that total 25 ($\sum X = 25$, $n = 5$).
Degrees of Freedom
• Many combinations of numbers can total 25, but only the
first 4 can be any value.
• The 5th number cannot vary if $\sum X = 25$.
• This example has 4 degrees of freedom, as four of the
five numbers are free to vary.
• A standard deviation computed from a sample with n in
the denominator usually underestimates the population
standard deviation.
• Using n-1 in the denominator corrects for this and gives
us a better estimate of the population standard deviation.
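The effect of the n-1 denominator can be seen by brute force. The sketch below is an illustration rather than part of the lecture: it treats the five scores 8, 1, 9, 5, 2 as an entire population, lists every possible sample of size 2 (drawn with replacement), and averages the two candidate estimates across all samples. It works with variances, since that is where the correction is exact; the helper name `spread` is hypothetical.

```python
from fractions import Fraction
from itertools import product

population = [8, 1, 9, 5, 2]
N = len(population)
mu = Fraction(sum(population), N)
pop_var = sum((x - mu) ** 2 for x in population) / N   # 50 / 5

def spread(sample, denominator):
    """Sum of squared deviations from the sample mean, divided by `denominator`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / denominator

n = 2
samples = list(product(population, repeat=n))   # all 25 ordered samples of size 2

avg_div_n = sum(spread(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(spread(s, n - 1) for s in samples) / len(samples)

print(pop_var, avg_div_n, avg_div_n_minus_1)  # 10 5 10
```

Dividing by n gives estimates that average only half the true variance in this small case, while dividing by n-1 averages exactly the population variance.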
Normal Distribution
• The normal distribution is a theoretical
distribution.
• “Normal” does not mean typical or average, it is a
technical term given to this mathematical
function.
• The normal distribution is unimodal and
symmetrical, and is often referred to as the Bell
Curve.
Normal Distribution

[Figure: normal curve, with the mean, median, and mode marked at the center]
Normal Distribution
• We study the normal distribution because many
naturally occurring events yield a distribution
that approximates the normal distribution.
Properties of Area Under the Normal
Distribution
• One of the properties of the Normal Distribution
is the fixed area under the curve.
• If we split the distribution in half, 50% of the
scores of the sample lie to the left of the mean (or
median, or mode), and 50% of the scores lie to
the right of the mean (or median, or mode).
• The mean, median, and mode always cut the
Normal Distribution in half, and are equal since
the Normal Distribution is unimodal and
symmetrical.
[Figure: normal curve divided at the mean, median, and mode, with 50% of scores on each side]


• The entire area under the normal curve can be
considered to be a proportion of 1.0000.
• Thus, half, or .5000 of the scores lie in the bottom
half (i.e., left of the mean) of the distribution, and
half, or .5000 of the scores lie in the top half (i.e.,
right of the mean).
[Figure: normal curve divided at the mean, median, and mode, with .5000 of scores on each side]
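The half-and-half split can be confirmed numerically. This is a small sketch using `statistics.NormalDist` from Python's standard library (Python 3.8 or later); the mean of 100 and standard deviation of 15 are made-up values, and any other choice gives the same proportions.

```python
import math
from statistics import NormalDist

# Hypothetical normal distribution; mu and sigma are invented for illustration.
dist = NormalDist(mu=100, sigma=15)

left_of_mean = dist.cdf(dist.mean)     # proportion of scores below the mean
right_of_mean = 1 - left_of_mean       # the total area under the curve is 1.0000

print(f"{left_of_mean:.4f} {right_of_mean:.4f}")
print(dist.mean == dist.median == dist.mode)
print(math.isclose(left_of_mean + right_of_mean, 1.0))
```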


Z-scores
• Z-Scores (or standard scores) are a way of
expressing a raw score’s place in a distribution.

• Z-score formula:

$$z = \frac{X - \mu}{\sigma}$$

• The mean $\mu$ and standard deviation $\sigma$ are
always notated in Greek letters.

• Z-scores only reflect the data points' position relative to
the overall data set (so you're now considering the data
as a population, as you're not looking to infer to a
greater population).

• This means use the population formula for standard
deviation rather than the sample formula whenever you
calculate Z.
• A z-score is a better indicator of where your score
falls in a distribution than a raw score.
• A student could get a 75/100 on a test (75%) and
consider this to be a very high score.
• If the average of the class marks is 89 and the
(population) standard deviation is 5.2, then the z-score
for a mark of 75 would be:

$z = \frac{X - \mu}{\sigma}$
z = (75-89)/5.2
z = (-14)/5.2
z = -2.69

• This means that a mark of 75% is actually 2.69
standard deviations BELOW the mean.
• The student would have done poorly on this test,
as compared to the rest of the class.
• z = 0 represents the mean score (which would be
89 in this example).
• z < 0 represents a score less than the mean (which
would be less than 89).
• z > 0 represents a score greater than the mean
(which would be greater than 89).
• A z-score expresses the position of the raw score
above or below the mean in standard deviation
sized units.
• E.g.,
z = +1.50 means that the raw score is 1 and one-half
standard deviations above the mean.
z = -2.00 means that the raw score is 2 standard
deviations below the mean.
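The z-score calculation above can be written as a one-line function. This is a minimal sketch; the function name `z_score` is chosen here for illustration, and the numbers are the class mean of 89 and population standard deviation of 5.2 from the example.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

z = z_score(75, mu=89, sigma=5.2)
print(round(z, 2))                       # -2.69

# A score equal to the mean always has z = 0.
print(z_score(89, mu=89, sigma=5.2))     # 0.0
```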
Z-score Example
• If you write two exams, in Math and English, and
get the following scores:

Math 70% (class $\mu = 55$, $\sigma = 10$)
English 60% (class $\mu = 50$, $\sigma = 5$)
• Which test mark represents the better performance
(relative to the class)?
• Math mark:
z = (70-55)/10
z = +1.50
• English mark:
z = (60-50)/5
z = +2.00

Z-score Example Illustration

[Figure: normal curve with the mean at Z = 0.00, the Math mark at Z = +1.50, and the English mark at Z = +2.00]
The Answer
• Because: Z = +2.00 is greater than Z = +1.50, the
English class mark of 60% reflects a better
performance relative to that class than does the
Math class mark of 70%.
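The same comparison can be sketched in Python. The dictionary layout and the names `exams` and `z_scores` are assumptions made for this illustration; the marks, means, and standard deviations are the ones given in the example.

```python
def z_score(x, mu, sigma):
    """Standard score: distance from the mean in standard deviation units."""
    return (x - mu) / sigma

exams = {
    "Math": {"mark": 70, "mu": 55, "sigma": 10},
    "English": {"mark": 60, "mu": 50, "sigma": 5},
}

z_scores = {name: z_score(e["mark"], e["mu"], e["sigma"]) for name, e in exams.items()}
best = max(z_scores, key=z_scores.get)   # exam with the highest z-score

print(z_scores)  # {'Math': 1.5, 'English': 2.0}
print(best)      # English
```

The raw mark is higher in Math, but once each mark is expressed relative to its own class, the English mark comes out ahead.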
Z-score: Solving for X
• The z-score formula can be rearranged to solve
for X:

$$z = \frac{X - \mu}{\sigma} \quad\Rightarrow\quad X = (z)(\sigma) + \mu$$

• This formula is used when you know the z-score
of a data point, and want to solve for the raw
score.
Example
• E.g., if a class midterm exam has $\mu = 65$ and $\sigma = 5$,
what exam mark has a z-score value of 1.25?

$X = (z)(\sigma) + \mu$
X = (1.25)(5) + 65
X = 6.25 + 65
X = 71.25

So, a person whose test is 1.25 standard deviations above the
mean obtained a score of 71.25%.
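The rearranged formula is easy to check in code. This is a minimal sketch with a hypothetical function name, `raw_score`, using the midterm values from the example; converting the result back to a z-score recovers the starting value.

```python
def raw_score(z, mu, sigma):
    """Convert a z-score back to a raw score: X = (z)(sigma) + mu."""
    return z * sigma + mu

x = raw_score(1.25, mu=65, sigma=5)
print(x)               # 71.25

# Round trip: plugging X back into the z-score formula gives 1.25 again.
print((x - 65) / 5)    # 1.25
```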
Skewed Distributions
• Outliers skew distributions.
• If a group has a high outlier,
the curve has a positive
skew (most scores are low,
with a tail toward the high end).
• If a group has a low outlier,
the curve has a negative
skew (most scores are high,
with a tail toward the low end).
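The direction of skew can be illustrated with a small calculation. The lecture does not define a numerical measure of skew, so the sketch below assumes one common choice, the average cubed z-score, and applies it to made-up test marks. Only the sign of the result matters here.

```python
import statistics

def skewness(data):
    """Average cubed z-score (an assumed measure of skew, not from the lecture)."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)   # population formula, as for z-scores
    return sum(((x - mu) / sigma) ** 3 for x in data) / len(data)

# Hypothetical test marks, invented for illustration.
symmetric = [50, 55, 60, 65, 70]
high_outlier = [50, 55, 60, 65, 100]   # one unusually high score
low_outlier = [10, 55, 60, 65, 70]     # one unusually low score

print(skewness(high_outlier) > 0)   # True: positive skew
print(skewness(low_outlier) < 0)    # True: negative skew
```

In both outlier cases the mean is pulled toward the outlier while the median stays at 60, which is another quick way to spot the direction of skew.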
