Midterm Psych


Summary of Measures

• Central Tendency: Mean, Median, Mode
• Quartile
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation

Measures of Variation

Measures of Variation
How Can We Measure Variability?

• Range
• Variance
• Standard Deviation
• Coefficient of Variation
• Chebyshev’s Theorem
• Empirical Rule (Normal)
Example for Range : Outdoor Paint
Two experimental brands of outdoor paint are
tested to see how long each will last before
fading. Six cans of each brand constitute a small
population. The results (in months) are shown.
Find the mean and range of each group.
Brand A Brand B
10 35
60 45
50 30
30 35
40 40
20 25

Bluman, Chapter 3
Example: Outdoor Paint
Brand A   Brand B
10        35
60        45
50        30
30        35
40        40
20        25

Brand A: $\mu = \frac{\sum X}{N} = \frac{210}{6} = 35$, $R = 60 - 10 = 50$
Brand B: $\mu = \frac{\sum X}{N} = \frac{210}{6} = 35$, $R = 45 - 25 = 20$

The average for both brands is the same, but the range
for Brand A is much greater than the range for Brand B.

Which brand would you buy?
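As a rough check of the comparison above, the following minimal Python sketch recomputes the mean and range for both brands. The variable names and the `value_range` helper are illustrative choices, not part of the original slides.

```python
from statistics import mean

# Months until fading for the six cans of each brand (from the table above).
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

def value_range(data):
    """Range = highest value minus lowest value."""
    return max(data) - min(data)

mean_a, range_a = mean(brand_a), value_range(brand_a)
mean_b, range_b = mean(brand_b), value_range(brand_b)

print("Brand A: mean =", mean_a, "range =", range_a)
print("Brand B: mean =", mean_b, "range =", range_b)
```

Both means agree, so the mean alone cannot distinguish the brands; the range is what shows that Brand A is less consistent.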


Variance

•Important Measure of Variation


•Shows Variation About the Mean:
•For the Population:
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
•For the Sample:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
For the population, use N in the denominator. For the sample, use n - 1 in the denominator.
Standard Deviation

•Important Measure of Variation


•Shows Variation About the Mean:
•For the Population:
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$$
•For the Sample:
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Measures of Variation: Variance & Standard Deviation
• The variance is the average of the
squares of the distance each value is
from the mean.
• The standard deviation is the square
root of the variance.
• The standard deviation is a measure of
how spread out your data are.
Uses of the Variance and Standard Deviation

•To determine the spread of the data.


•To determine the consistency of a variable.
•To determine the number of data values that
fall within a specified interval in a
distribution (Chebyshev’s Theorem).
•Used in inferential statistics.
Variance
Example 3-21: Outdoor Paint
Find the variance and standard deviation for the
data set for Brand A paint. 10, 60, 50, 30, 40, 20
The six cans were defined as a small population, so the population formulas (dividing by N) apply.

Months, X   µ    X – µ   (X – µ)²
10          35   –25     625
60          35    25     625
50          35    15     225
30          35    –5      25
40          35     5      25
20          35   –15     225
                 Total   1750

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{1750}{6} \approx 291.7$$
$$\sigma = \sqrt{\frac{1750}{6}} \approx 17.1$$
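The table above can be reproduced step by step. This is a minimal sketch, assuming the Brand A data are treated as a population as in the example; the variable names are illustrative. It also shows what the sample formula (dividing by n - 1) would give for the same numbers, purely for comparison.

```python
from math import sqrt

months = [10, 60, 50, 30, 40, 20]  # Brand A, treated as a population
N = len(months)

mu = sum(months) / N                                  # population mean
squared_deviations = [(x - mu) ** 2 for x in months]  # (X - mu)^2 column

pop_variance = sum(squared_deviations) / N            # divide by N
pop_std = sqrt(pop_variance)

# For comparison only: the sample formula divides by N - 1 instead.
sample_variance = sum(squared_deviations) / (N - 1)

print(round(pop_variance, 1), round(pop_std, 1), sample_variance)
```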

Measures of Variation:
Variance & Standard Deviation
(Sample Theoretical Model)
• The sample variance is
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
• The sample standard deviation is
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Measures of Variation:
Variance & Standard Deviation
(Sample Computational Model)
• Is mathematically equivalent to the
theoretical formula.
• Saves time when calculating by hand
• Does not use the mean
• Is more accurate when the mean has been
rounded.

Measures of Variation:
Variance & Standard Deviation
(Sample Computational Model)
• The sample variance is
$$s^2 = \frac{n \sum X^2 - \left(\sum X\right)^2}{n(n - 1)}$$
• The sample standard deviation is
$$s = \sqrt{s^2}$$
Example: European Auto Sales
Find the variance and standard deviation for the
amount of European auto sales for a sample of 6
years. The data are in millions of dollars.
11.2, 11.9, 12.0, 12.8, 13.4, 14.3

X       X²
11.2    125.44
11.9    141.61
12.0    144.00
12.8    163.84
13.4    179.56
14.3    204.49
75.6    958.94   (totals)

$$s^2 = \frac{n \sum X^2 - \left(\sum X\right)^2}{n(n - 1)} = \frac{6(958.94) - (75.6)^2}{6(5)} \approx 1.28$$
$$s = \sqrt{1.28} \approx 1.13$$
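To see that the computational model agrees with the theoretical one, the sketch below evaluates the computational formula on the auto sales data and compares it with Python's standard-library `statistics.variance`, which uses the n - 1 definition. Variable names are illustrative.

```python
from math import sqrt
from statistics import variance

sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]  # millions of dollars
n = len(sales)

sum_x = sum(sales)                  # sum of X
sum_x2 = sum(x * x for x in sales)  # sum of X squared

# Computational model: s^2 = [n * sum(X^2) - (sum X)^2] / [n (n - 1)]
s2 = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
s = sqrt(s2)

# Theoretical model (deviations from the mean), via the standard library.
s2_theoretical = variance(sales)

print(round(s2, 2), round(s, 2), round(s2_theoretical, 2))
```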
Shortcut

The shortcut formulas for computing the variance and standard deviation for data obtained from samples are as follows.
$$s^2 = \frac{\sum X^2 - \left(\sum X\right)^2 / n}{n - 1}, \qquad s = \sqrt{s^2}$$
EXAMPLE:
The number of incidents where police were needed for a
sample of ten schools in Allegheny County is
7, 37, 3, 8, 48, 11, 6, 0, 10, 3. Assume the data
represent a sample.
Find the range. range = 48 – 0 = 48

Use the shortcut formula for the unbiased estimator to compute the variance and standard deviation.
With $\sum X = 133$ and $\sum X^2 = 4061$:
$$s^2 = \frac{4061 - (133)^2 / 10}{10 - 1} \approx 254.7$$
$$s = \sqrt{254.7} \approx 16$$

range = 48, s² ≈ 254.7, s ≈ 16

Are the data consistent or do they vary? Explain.

By any of these measures, it can be said that the data vary.
Finding the Sample Variance and Standard
Deviation for Grouped Data
EXAMPLE:
The data show the number of murders in 25
selected cities. Find the variance and standard deviation.

Class      Xm      f    f·Xm      f·Xm²
27-90      58.5    13   760.5     44,489.25
91-154     122.5   2    245       30,012.5
155-218    186.5   0    0         0
219-282    250.5   5    1252.5    313,751.25
283-346    314.5   0    0         0
347-410    378.5   2    757       286,524.5
411-474    442.5   0    0         0
475-538    506.5   1    506.5     256,542.25
539-602    570.5   2    1141      650,940.5
Total              25   4662.5    1,582,260.25

$$s^2 = \frac{\sum f \cdot X_m^2 - \left(\sum f \cdot X_m\right)^2 / n}{n - 1} = \frac{1{,}582{,}260.25 - (4662.5)^2 / 25}{24} = 29{,}696$$
$$s = \sqrt{29{,}696} \approx 172.3$$
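The grouped-data calculation can be sketched in a few lines. This is an illustration under the usual assumption that every value in a class is represented by the class midpoint; the data layout and variable names are choices made for this sketch.

```python
from math import sqrt

# (lower class limit, upper class limit, frequency) from the murders table
classes = [
    (27, 90, 13), (91, 154, 2), (155, 218, 0), (219, 282, 5), (283, 346, 0),
    (347, 410, 2), (411, 474, 0), (475, 538, 1), (539, 602, 2),
]

n = sum(f for _, _, f in classes)  # total frequency
sum_fx = 0.0                       # sum of f * Xm
sum_fx2 = 0.0                      # sum of f * Xm^2
for lower, upper, f in classes:
    midpoint = (lower + upper) / 2
    sum_fx += f * midpoint
    sum_fx2 += f * midpoint ** 2

# Shortcut formula for a sample: s^2 = [sum(f Xm^2) - (sum f Xm)^2 / n] / (n - 1)
grouped_variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
grouped_std = sqrt(grouped_variance)

print(n, sum_fx, sum_fx2, grouped_variance, round(grouped_std, 1))
```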

Measures of Variation:
Coefficient of Variation
The coefficient of variation is the standard
deviation divided by the mean, expressed as
a percentage.
$$\text{CVar} = \frac{s}{\bar{X}} \cdot 100\%$$
Use CVAR to compare standard deviations
when the units are different.
Coefficient of Variation
•Measure of Relative Variation
•Always a %
•Shows Variation Relative to Mean
•Used to Compare 2 or More Groups
•Formula (for Sample):
$$CV = \left(\frac{SD}{\bar{X}}\right) \cdot 100\%$$
Comparing Coefficient of Variation

• Stock A: Average Price last year = $50, Standard Deviation = $5
• Stock B: Average Price last year = $100, Standard Deviation = $5

Coefficient of Variation:
Stock A: CV = 10%
Stock B: CV = 5%
Example: Sales of Automobiles
The mean of the number of sales of cars over a
3-month period is 87, and the standard
deviation is 5. The mean of the commissions is
$5225, and the standard deviation is $773.
Compare the variations of the two.
Sales: $\text{CVar} = \frac{5}{87} \cdot 100\% \approx 5.7\%$
Commissions: $\text{CVar} = \frac{773}{5225} \cdot 100\% \approx 14.8\%$

Commissions are more variable than sales.
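Both coefficient-of-variation examples above (the two stocks, and sales versus commissions) follow the same one-line computation. The helper below is a minimal sketch; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation divided by the mean, expressed as a percentage."""
    return std_dev / mean * 100

cv_stock_a = coefficient_of_variation(5, 50)
cv_stock_b = coefficient_of_variation(5, 100)
cv_sales = coefficient_of_variation(5, 87)
cv_commissions = coefficient_of_variation(773, 5225)

print(cv_stock_a, cv_stock_b)
print(round(cv_sales, 1), round(cv_commissions, 1))
```

Because the result is unit-free, the number of cars sold and the dollar commissions can be compared directly even though their standard deviations are in different units.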


Chebyshev’s theorem

The proportion of values from a data set that will fall within k standard deviations of the mean will be at least $1 - 1/k^2$, where k is a number greater than 1 (k is not necessarily an integer).
Example
The mean of a distribution is 20 and the standard
deviation is 2. Answer each. Use Chebyshev’s theorem.

a. At least what percentage of the values will fall between 10 and 30?

b. At least what percentage of the values will fall between 12 and 28?

a. Subtract the mean from the larger value: 30 – 20 = 10.
Divide by the standard deviation to get k: $k = \frac{10}{2} = 5$.
$$1 - \frac{1}{5^2} = 0.96 \text{ or } 96\%$$

b. Subtract the mean from the larger value: 28 – 20 = 8.
Divide by the standard deviation to get k: $k = \frac{8}{2} = 4$.
$$1 - \frac{1}{4^2} = 0.9375 \text{ or } 93.75\%$$
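The two-step procedure above (find k, then apply the bound) can be written as a small helper. This is a sketch, not a standard library function: the name `chebyshev_min_proportion` is made up here, and taking the smaller of the two distances to the mean for an interval that is not centered on the mean is an assumption added for safety, since the examples only use symmetric intervals.

```python
def chebyshev_min_proportion(mean, std_dev, low, high):
    """Minimum proportion of values between low and high, by Chebyshev's theorem."""
    # k = distance from the mean to the nearer endpoint, in standard deviations.
    k = min(mean - low, high - mean) / std_dev
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

part_a = chebyshev_min_proportion(20, 2, 10, 30)  # k = 5
part_b = chebyshev_min_proportion(20, 2, 12, 28)  # k = 4

print(part_a, part_b)
```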
Find the Variance, SD
1. 10, 13, 15, 17, 18, 19
2. 45, 60, 55, 33, 24, 25, 27, 38
3. 55, 66, 77, 88, 99, 22, 33, 44
4. 11, 140, 98, 23, 45, 14, 56, 78, 93, 200, 123, 165
Find the Variance, SD & CV
1. 5 test scores for Calculus I are 95, 83, 92, 81, 75.

2. Consider this dataset showing the retirement age of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
3. Here are a bunch of 10 point quizzes from MAT117:
9, 6, 7, 10, 9, 4, 9, 2, 9, 10, 7, 7, 5, 6, 7
4. 11, 140, 98, 23, 45, 14, 56, 78, 93, 200, 123, 165
Find the Variance, SD, & CVar
1. A sample of scores in Math 1 are as follows: 12, 13,
15, 17, 18, 19, 22, 25
2. Jane’s Science scores: 25, 33, 46, 47, 50, 51, 32, 23, 41
3. Mary’s “baon” in 7 days: 50, 65, 55, 80, 90, 30, 45
4. Sample bills are as follows: 110, 100, 120, 102, 105,
106, 110, 104, 107, 108
Find the Variance, SD & CV
1. Exam marks for 60 students (marked out of 65)

2. Class Interval   Frequency
   2–4              3
   5–7              12
   8–10             9
   11–13            7
   14–16            5
Find the Variance, SD
1. Exam marks for 50 students (marked out of 40)
20 23 24 30 32 23 21 33 40 39
25 33 37 38 40 35 28 31 33 40
30 27 35 24 22 26 31 34 40 34
35 26 25 35 27 33 24 35 36 23
22 36 20 34 40 40 25 34 36 34

2. Class Interval   Frequency
   2–6              4
   7–11             12
   12–16            9
   17–21            8
   22–26            7
Shape of Curve
• Describes How Data Are Distributed
• Measures of Shape:
• Symmetric or skewed

Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
The Empirical (Normal) Rule

Chebyshev’s theorem applies to any distribution regardless of its shape. However, when a distribution is bell-shaped (or what is called normal), the following statements, which make up the empirical rule, are true.
• Approximately 68% of the data values will fall within 1 standard deviation of the mean.
• Approximately 95% of the data values will fall within 2 standard deviations of the mean.
• Approximately 99.7% of the data values will fall within 3 standard deviations of the mean.

Examples of the Empirical Rule


Let's assume a population of animals in a zoo is known to be normally
distributed. Each animal lives to be 13.1 years old on average (mean),
and the standard deviation of the lifespan is 1.5 years. If someone
wants to know the probability that an animal will live longer than
14.6 years, they could use the empirical rule. Knowing the
distribution's mean is 13.1 years old, the following age ranges occur
for each standard deviation:
• One standard deviation (µ ± σ): 13.1 - 1.5 to 13.1 + 1.5, or 11.6 to 14.6
• Two standard deviations (µ ± 2σ): 13.1 - (2 x 1.5) to 13.1 + (2 x 1.5), or 10.1 to 16.1
• Three standard deviations (µ ± 3σ): 13.1 - (3 x 1.5) to 13.1 + (3 x 1.5), or 8.6 to 17.6

Examples of the Empirical Rule

You need to calculate the total probability of the animal living 14.6 years or longer. The empirical rule shows that 68% of the distribution lies within one standard deviation, in this case, from 11.6 to 14.6 years. Thus, the remaining 32% of the distribution lies outside this range. One half lies above 14.6 and the other below 11.6. So, the probability of the animal living for more than 14.6 years is 16% (calculated as 32% divided by two).
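The empirical rule's 68% is itself a rounded figure. As a hedged illustration, the sketch below uses Python's standard-library `statistics.NormalDist` to compare the rule-of-thumb 16% with the exact normal tail probability for the zoo example; the variable names are assumptions made for this sketch.

```python
from statistics import NormalDist

# Lifespan model from the example: mean 13.1 years, standard deviation 1.5 years.
lifespan = NormalDist(mu=13.1, sigma=1.5)

# Exact share of the distribution within one standard deviation (11.6 to 14.6).
within_one_sd = lifespan.cdf(14.6) - lifespan.cdf(11.6)

# Exact probability of living longer than 14.6 years.
p_longer = 1 - lifespan.cdf(14.6)

# Empirical-rule estimate used in the text: half of the remaining 32%.
empirical_estimate = (1 - 0.68) / 2

print(round(within_one_sd, 4), round(p_longer, 4), empirical_estimate)
```

The exact tail is slightly below 16%, which is why the empirical rule is stated with "approximately".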
Example
The average U.S. yearly per capita consumption of citrus
fruits is 26.8 pounds. Suppose that the distribution of
fruit amounts consumed is bell-shaped with a standard
deviation equal to 4.2 pounds.

What percentage of Americans would you expect to consume more than 31 pounds of citrus fruit per year?
Here 31 = 26.8 + 4.2, so 31 pounds is exactly 1 standard deviation above the mean.
By the Empirical Rule, 68% of consumption is within 1 standard deviation of the mean. Then 1/2 of 32%, or 16%, of consumption would be more than 31 pounds of citrus fruit per year.
[Figure: bell curve with µ = 26.8 and σ = 4.2, marked at 22.6, 26.8, and 31; areas 16%, 34%, 34%, 16%]

Measures of Variation:
Range Rule of Thumb
The Range Rule of Thumb approximates the
standard deviation as
$$s \approx \frac{\text{Range}}{4}$$
when the distribution is unimodal and
approximately symmetric.

Measures of Variation:
Range Rule of Thumb
Use $\bar{X} - 2s$ to approximate the lowest value and $\bar{X} + 2s$ to approximate the highest value in a data set.
Example: $\bar{X} = 10$, Range = 12
$$s \approx \frac{12}{4} = 3$$
LOW ≈ 10 – 2(3) = 4
HIGH ≈ 10 + 2(3) = 16

Measures of Variation: Chebyshev’s Theorem
The proportion of values from any data set that fall within k standard deviations of the mean will be at least $1 - 1/k^2$, where k is a number greater than 1 (k is not necessarily an integer).

# of standard      Minimum proportion within   Minimum percentage within
deviations, k      k standard deviations       k standard deviations
2                  1 – 1/4 = 3/4               75%
3                  1 – 1/9 = 8/9               88.89%
4                  1 – 1/16 = 15/16            93.75%

Measures of Variation: Chebyshev’s Theorem
Example: Prices of Homes
The mean price of houses in a certain
neighborhood is $50,000, and the standard
deviation is $10,000. Find the price range for
which at least 75% of the houses will sell.

Chebyshev’s Theorem states that at least 75% of a data set will fall within 2 standard deviations of the mean.
50,000 – 2(10,000) = 30,000
50,000 + 2(10,000) = 70,000

At least 75% of all homes sold in the area will have a price between $30,000 and $70,000.
Example: Travel Allowances
A survey of local companies found that the mean
amount of travel allowance for executives was
$0.25 per mile. The standard deviation was $0.02.
Using Chebyshev’s theorem, find the minimum
percentage of the data values that will fall
between $0.20 and $0.30.

$$k = \frac{0.30 - 0.25}{0.02} = 2.5, \qquad k = \frac{0.25 - 0.20}{0.02} = 2.5$$
$$1 - \frac{1}{k^2} = 1 - \frac{1}{2.5^2} = 0.84$$
At least 84% of the data values will fall between
$0.20 and $0.30.

Measures of Variation:
Empirical Rule (Normal)
The percentage of values from a data set that
fall within k standard deviations of the mean
in a normal (bell-shaped) distribution is
listed below.
# of standard      Proportion within k
deviations, k      standard deviations
1                  68%
2                  95%
3                  99.7%


Measures of Position

Measures of Position
• z-score
• Percentile
• Quartile
• Outlier

Measures of Position: z-score

• A z-score or standard score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation.
For a sample: $z = \frac{X - \bar{X}}{s}$. For a population: $z = \frac{X - \mu}{\sigma}$.
• A z-score represents the number of standard deviations a value
is above or below the mean.
Example: Test Scores
A student scored 65 on a calculus test that had a
mean of 50 and a standard deviation of 10; she
scored 30 on a history test with a mean of 25 and
a standard deviation of 5. Compare her relative
positions on the two tests.
Calculus: $z = \frac{X - \bar{X}}{s} = \frac{65 - 50}{10} = 1.5$
History: $z = \frac{X - \bar{X}}{s} = \frac{30 - 25}{5} = 1.0$
She has a higher relative position in the Calculus class.
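The comparison of relative positions comes down to computing one z-score per test. A minimal sketch follows; the helper name `z_score` is illustrative.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

z_calculus = z_score(65, 50, 10)
z_history = z_score(30, 25, 5)

# The test with the larger z-score is the one with the higher relative position.
better = "Calculus" if z_calculus > z_history else "History"
print(z_calculus, z_history, better)
```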
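Example
Which of the following exam scores has a better
relative position?
a. A score of 42 on an exam with $\bar{X} = 39$ and s = 4
$$z = \frac{42 - 39}{4} = \frac{3}{4} = 0.75$$
b. A score of 76 on an exam with $\bar{X} = 71$ and s = 3
$$z = \frac{76 - 71}{3} = \frac{5}{3} \approx 1.67$$
The score of 76 has the better relative position because its z-score is higher.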

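Measures of Position: Percentiles

• Percentiles separate the data set into 100 equal groups.
• A percentile rank for a datum represents the percentage of data values below the datum.
$$\text{Percentile} = \frac{(\text{\# of values below } X) + 0.5}{\text{total \# of values}} \cdot 100\%$$
• To find the data value corresponding to a given percentile p, compute its position c in the sorted data:
$$c = \frac{n \cdot p}{100}$$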

Measures of Position: Example of a Percentile Graph
Example: Test Scores
A teacher gives a 20-point test to 10 students.
Find the percentile rank of a score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10

Sort in ascending order.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20

There are 6 values below 12.
$$\text{Percentile} = \frac{(\text{\# of values below } X) + 0.5}{\text{total \# of values}} \cdot 100\% = \frac{6 + 0.5}{10} \cdot 100\% = 65\%$$
A student whose score was 12 did better than 65% of the class.
Example: Test Scores
A teacher gives a 20-point test to 10 students. Find
the value corresponding to the 25th percentile.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10

Sort in ascending order.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20

$$c = \frac{n \cdot p}{100} = \frac{10 \cdot 25}{100} = 2.5 \approx 3 \text{ (round up)}$$
The third value in the sorted list, 5, corresponds to the 25th percentile.
Example
Find the percentile ranks of each weight in the data set.
The weights are in pounds.
Data: 78, 82, 86, 88, 92, 97
$$\text{Percentile} = \frac{\text{number of values below} + 0.5}{\text{total number of values}} \cdot 100\%$$
For 78: $\frac{0 + 0.5}{6} \cdot 100\% \approx$ 8th percentile
For 82: $\frac{1 + 0.5}{6} \cdot 100\% =$ 25th percentile
For 86: $\frac{2 + 0.5}{6} \cdot 100\% \approx$ 42nd percentile
For 88: $\frac{3 + 0.5}{6} \cdot 100\% \approx$ 58th percentile
For 92: $\frac{4 + 0.5}{6} \cdot 100\% =$ 75th percentile
For 97: $\frac{5 + 0.5}{6} \cdot 100\% \approx$ 92nd percentile
Example
What value corresponds to the 30th percentile of the weights below?
The weights are in pounds.
78, 82, 86, 88, 92, 97
$$c = \frac{6(30)}{100} = 1.8, \text{ rounded up to } 2$$
Therefore, the answer is the 2nd value in the series, or 82.
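The two percentile procedures used in these examples (rank of a value, and value at a percentile) can be sketched as follows. The function names are illustrative. One assumption goes beyond the slides: when c is a whole number, this sketch averages the c-th and (c+1)-th values, which is the convention in Bluman's textbook; the slides themselves only show the round-up case.

```python
import math

def percentile_rank(data, x):
    """Percentile rank of x: ((# of values below x) + 0.5) / n * 100."""
    below = sum(1 for value in data if value < x)
    return (below + 0.5) / len(data) * 100

def value_at_percentile(data, p):
    """Data value corresponding to the p-th percentile, for 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    c = len(ordered) * p / 100
    if c != int(c):
        # Not a whole number: round up and take the c-th value (1-based).
        return ordered[math.ceil(c) - 1]
    # Whole number (assumed convention): average the c-th and (c+1)-th values.
    c = int(c)
    return (ordered[c - 1] + ordered[c]) / 2

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
weights = [78, 82, 86, 88, 92, 97]

rank_of_12 = percentile_rank(scores, 12)
weight_ranks = [round(percentile_rank(weights, w)) for w in weights]
p25_score = value_at_percentile(scores, 25)
p30_weight = value_at_percentile(weights, 30)

print(rank_of_12, weight_ranks, p25_score, p30_weight)
```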

Measures of Position:
Quartiles and Deciles
• Deciles separate the data set into 10 equal groups. D1 = P10, D4 = P40
• Quartiles separate the data set into 4 equal groups. Q1 = P25, Q2 = MD, Q3 = P75
• The Interquartile Range, IQR = Q3 – Q1.
Finding Data Values Corresponding to Q1, Q2, and Q3
Step 1 Arrange the data in order from lowest to highest.

Step 2 Find the median of the data values. This is the value for Q2.

Step 3 Find the median of the data values that fall below Q2. This is the value for Q1.

Step 4 Find the median of the data values that fall above Q2. This is the value for Q3.
Example: Quartiles
Find Q1, Q2, and Q3 for the data set.
15, 13, 6, 5, 12, 50, 22, 18

Sort in ascending order.
5, 6, 12, 13, 15, 18, 22, 50

$$Q_2 = \frac{13 + 15}{2} = 14$$
$$Q_1 = \frac{6 + 12}{2} = 9$$
$$Q_3 = \frac{18 + 22}{2} = 20$$
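The four-step quartile procedure can be sketched with the standard-library `statistics.median`. The function name `quartiles_by_halves` is illustrative, and the sketch assumes that with an odd number of values the median itself is left out of both halves, which matches the worked examples in these slides. Other software may use different quartile conventions and return slightly different values.

```python
from statistics import median

def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) using the median-of-halves procedure."""
    ordered = sorted(data)            # Step 1: arrange lowest to highest
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the median position
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    q2 = median(ordered)              # Step 2
    q1 = median(lower)                # Step 3
    q3 = median(upper)                # Step 4
    return q1, q2, q3

q1, q2, q3 = quartiles_by_halves([15, 13, 6, 5, 12, 50, 22, 18])
iqr = q3 - q1
print(q1, q2, q3, iqr)
```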

Measures of Position: Outliers
•An outlier is an extremely high or low data value when compared with the rest of the data values.
•A data value less than Q1 – 1.5(IQR) or greater than Q3 + 1.5(IQR) can be considered an outlier.
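As an illustration of the 1.5(IQR) rule, the sketch below applies it to the data set from the quartiles example (5, 6, 12, 13, 15, 18, 22, 50), taking Q1 = 9 and Q3 = 20 as found there. The function names are made up for this sketch.

```python
def outlier_fences(q1, q3):
    """Lower and upper cutoffs: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Values that fall outside the two fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

data = [5, 6, 12, 13, 15, 18, 22, 50]
fences = outlier_fences(9, 20)
outliers = find_outliers(data, 9, 20)
print(fences, outliers)
```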

Exploratory Data Analysis

• The Five-Number Summary is composed of the following numbers: Low, Q1, MD, Q3, High
• The Five-Number Summary can be graphically represented using a Boxplot.
Constructing Boxplots
1. Find the five-number summary.
2. Draw a horizontal axis with a scale that includes the maximum and minimum data values.
3. Draw a box with vertical sides through Q1 and Q3, and draw a vertical line through the median.
4. Draw a line from the minimum data value to the left side of the box and a line from the maximum data value to the right side of the box.
Example 3-38: Meteorites
The number of meteorites found in 10 U.S. states
is shown. Construct a boxplot for the data.
89, 47, 164, 296, 30, 215, 138, 78, 48, 39

30, 39, 47, 48, 78, 89, 138, 164, 215, 296

Five-Number Summary (Low, Q1, MD, Q3, High): 30, 47, 83.5, 164, 296
[Boxplot: box from 47 to 164 with the median line at 83.5; whiskers extend to 30 and 296]
Example
Identify the five number summary and find
the interquartile range.
8, 12, 32, 6, 27, 19, 54
Data arranged in order:
6, 8, 12, 19, 27, 32, 54
Minimum: 6
Median: 19
Maximum: 54
Q1: 8 Q3: 32

Interquartile Range: 32 – 8 = 24
Example
Use the boxplot to identify the maximum value, minimum
value, median, first quartile, third quartile, and
interquartile range.

[Boxplot on a horizontal scale from 50 to 100]
Minimum: 55
Q1: 65
Median: 70
Q3: 90
Maximum: 95
Interquartile Range: 90 – 65 = 25
Information Obtained from a Boxplot
1. a. If the median is near the center of the box, the
distribution is approximately symmetric.
b. If the median falls to the left of the center of the box,
the distribution is positively skewed.
c. If the median falls to the right of the center, the
distribution is negatively skewed.
2. a. If the lines are about the same length, the
distribution is approximately symmetric.
b. If the right line is larger than the left line, the
distribution is positively skewed.
c. If the left line is larger than the right line, the
distribution is negatively skewed.
Example
9.8 8.0 13.9 4.4 3.9 21.7
15.9 3.2 11.7 24.8 34.1 17.6

These data are the number of inches of snow reported in randomly selected cities for September 1 through January 10. Construct a boxplot and comment on the skewness of the data.
Data arranged in order:
3.2 3.9 4.4 8.0 9.8 11.7
13.9 15.9 17.6 21.7 24.8 34.1

Minimum: 3.2 Maximum: 34.1
MD: $\frac{11.7 + 13.9}{2} = 12.8$
Q1: $\frac{4.4 + 8.0}{2} = 6.2$
Q3: $\frac{17.6 + 21.7}{2} = 19.65$

Comment on the skewness of the data.
[Boxplot: five-number summary 3.2, 6.2, 12.8, 19.65, 34.1 on a scale from 0 to 35]
The distribution is positively skewed.
Exercise
These data represent the volumes in cubic yards of the largest dams in the United States and in South America.
Construct a boxplot of the data for each region and compare the distributions.

United States   South America
125,628         311,539
92,000          274,026
78,008          105,944
77,700          102,014
66,500          56,242
62,850          46,563
52,435
50,000
For USA: Min = 50,000, Q1 = 57,642.5, MD = 72,100, Q3 = 85,004, Max = 125,628
[Boxplot: United States, horizontal axis from 50 to 350 (thousands)]

For South America: Min = 46,563, Q1 = 56,242, MD = 103,979, Q3 = 274,026, Max = 311,539
[Boxplot: South America, horizontal axis from 50 to 300 (thousands)]

Compare the distributions:
[Boxplots: USA and South America drawn on a common axis from 50 to 300 (thousands)]
FORMULA for Quartiles of GROUPED DATA
$$Q_r = lb_{qc} + \left(\frac{\frac{r \sum f}{4} - cf}{f_{qc}}\right) i$$
Where:
r = which quartile is being found (e.g., r = 1 for the 1st quartile, Q1)
lbqc = lower boundary of the quartile class
Σf = sum of the frequencies
cf = cumulative frequency of the class lower than the quartile class
fqc = frequency of the quartile class
i = class interval
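A minimal sketch of the grouped-data quartile formula is shown below, applied to the class-interval table used in the exercises (2–4, 5–7, 8–10, 11–13, 14–16). Two assumptions are not stated in the slides: the lower boundary is taken as the lower class limit minus 0.5 (half the gap between consecutive classes), and the quartile class is taken as the first class whose cumulative frequency reaches rΣf/4. The function and parameter names are illustrative.

```python
def grouped_quartile(r, classes, gap=1):
    """Q_r for grouped data.

    classes: list of (lower_limit, upper_limit, frequency) in ascending order.
    gap: distance between one class's upper limit and the next lower limit.
    """
    total = sum(f for _, _, f in classes)  # sum of the frequencies
    target = r * total / 4                 # r * (sum f) / 4
    cf = 0                                 # cumulative frequency below the class
    for lower, upper, f in classes:
        if f > 0 and cf + f >= target:
            lb = lower - gap / 2           # lower boundary of the quartile class
            i = upper - lower + gap        # class interval (class width)
            return lb + (target - cf) / f * i
        cf += f
    raise ValueError("no quartile class found")

table = [(2, 4, 3), (5, 7, 12), (8, 10, 9), (11, 13, 7), (14, 16, 5)]
q1_grouped = grouped_quartile(1, table)
q2_grouped = grouped_quartile(2, table)
q3_grouped = grouped_quartile(3, table)
print(q1_grouped, q2_grouped, round(q3_grouped, 2))
```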

Exercise

Class Interval   Frequency
2–4              3
5–7              12
8–10             9
11–13            7
14–16            5