
Chapter 3 A


Chapter 3

Data Description
Outline

3-1 Measures of Central Tendency

3-2 Measures of Variation

3-3 Measures of Position

3-4 Exploratory Data Analysis


OBJECTIVES

1 Summarize data, using measures of central tendency, such as the mean, median, mode, and midrange.

2 Describe data, using measures of variation, such as the range, variance, and standard deviation.

3 Identify the position of a data value in a data set, using various measures of position, such as percentiles, deciles, and quartiles.

4 Use the techniques of exploratory data analysis, including boxplots and five-number summaries, to discover various aspects of data.
Introduction

• Measures of central tendency: measures of average, which indicate the center of the distribution or the most typical value.
• Measures of variation: measures of dispersion, which determine the spread of the data values.
• Measures of position: measures that tell where a data value falls within the data set.
The measures covered in this chapter are:

• Measures of central tendency: mean, median, mode, midrange
• Measures of variation: range, variance, standard deviation
• Measures of position: percentiles, deciles, quartiles
3.1 Measures of Central Tendency

A measure is either a statistic or a parameter:

• Statistic: a characteristic or measure obtained by using the data values from a sample.
• Parameter: a characteristic or measure obtained by using all the data values for a specific population.

Thus, the average household (HH) income obtained from a sample of households is a statistic, and the average household income obtained from the entire population of households is a parameter.
Measures of Central Tendency:
The Mean
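The mean, also known as the arithmetic average, is found by adding the values of the data and dividing by the total number of values.

Sample mean:
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Population mean:
\[ \mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N} \]

Here n is the number of values in the sample, N is the number of values in the population, and μ is the Greek letter mu.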
Example 3-1-page 112 Avian Flu Cases:

Example 3-1: Days Off per Year

The data represent the number of days off per year for a
sample of individuals selected from nine different
countries. Find the mean.
20, 26, 40, 36, 23, 42, 35, 24, 30

\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9} \approx 30.7 \]

The mean number of days off is 30.7 days.

*The mean, in most cases, is not an actual data value.
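
The calculation in Example 3-1 can be reproduced with a short Python sketch. The variable names are illustrative and are not part of the original slides.

```python
from statistics import mean

# Days off per year for the nine sampled individuals (Example 3-1)
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]

total = sum(days_off)   # numerator of the sample-mean formula
n = len(days_off)       # number of values in the sample
x_bar = total / n       # sample mean

print(f"sum = {total}, n = {n}, mean = {x_bar:.1f}")

# The standard library gives the same value.
library_mean = mean(days_off)
```

Note that the rounded result, 30.7, does not appear in the data, which illustrates the remark that the mean is usually not an actual data value.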

Measures of Central Tendency:
mean for grouped data

Finding the Mean for Grouped Data

• Step 1 Make a table as shown.

A        B              C              D
Class    Frequency f    Midpoint Xm    f · Xm

• Step 2 Find the midpoint of each class and place it in column C.
• Step 3 Multiply the frequency by the midpoint for each class, and place the product in column D.
• Step 4 Find the sum of column D.
• Step 5 Divide the sum obtained in column D by the sum of the frequencies obtained in column B.
• The formula for the mean is
\[ \bar{X} = \frac{\sum f \cdot X_m}{n} \]
where n = Σf is the sum of the frequencies.
Example 3-3: Miles Run

Class          Frequency f    Midpoint Xm             f · Xm
5.5 - 10.5     1              (5.5 + 10.5)/2 = 8      1 × 8 = 8
10.5 - 15.5    2              13                      2 × 13 = 26
15.5 - 20.5    3              18                      54
20.5 - 25.5    5              23                      115
25.5 - 30.5    4              28                      112
30.5 - 35.5    3              33                      99
35.5 - 40.5    2              38                      76
               n = Σf = 20                            Σ f · Xm = 490

\[ \bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{490}{20} = 24.5 \text{ miles} \]
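
The five-step procedure for grouped data can be sketched in Python as follows. The tuple layout and variable names are assumptions made for illustration; the class boundaries and frequencies are those of Example 3-3.

```python
# Each class as (lower boundary, upper boundary, frequency), from Example 3-3
classes = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]

# Step 2: midpoint of each class (column C)
midpoints = [(low + high) / 2 for low, high, _ in classes]
freqs = [f for _, _, f in classes]

# Step 3: frequency times midpoint (column D)
products = [f * xm for f, xm in zip(freqs, midpoints)]

# Steps 4 and 5: sum column D and divide by the sum of the frequencies
n = sum(freqs)
sum_f_xm = sum(products)
grouped_mean = sum_f_xm / n

print(f"n = {n}, sum of f*Xm = {sum_f_xm}, mean = {grouped_mean} miles")
```

Because every value in a class is represented by the class midpoint, the result is an approximation of the mean of the raw data.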
Measures of Central Tendency: Weighted Mean

• Sometimes, you must find the mean of a data set in which not all values are equally represented.
• The type of mean that considers an additional factor is called the weighted mean, and it is used when the values are not all equally represented.
• The weighted mean of a variable is obtained by multiplying each value (X) by its corresponding weight (w) and dividing the sum of the products by the sum of the weights.
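In symbols, the weighted mean is
\[ \bar{X} = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum wX}{\sum w} \]
where w_1, w_2, ..., w_n are the weights and X_1, X_2, ..., X_n are the values.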
Example 3-14: Grade Point Average

A student received the following grades. Find the corresponding GPA.

Course                        Credits, w    Grade, X
English Composition           3             A (4 points)
Introduction to Psychology    3             C (2 points)
Biology                       4             B (3 points)
Physical Education            2             D (1 point)

\[ \bar{X} = \frac{\sum wX}{\sum w} = \frac{3 \cdot 4 + 3 \cdot 2 + 4 \cdot 3 + 2 \cdot 1}{3 + 3 + 4 + 2} = \frac{32}{12} \approx 2.7 \]

The grade point average is 2.7.
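
A minimal Python sketch of the weighted mean, using the credits as weights and the grade points as values. The list layout and names are illustrative assumptions.

```python
# (course, credits w, grade points X) from Example 3-14
grades = [
    ("English Composition", 3, 4),
    ("Introduction to Psychology", 3, 2),
    ("Biology", 4, 3),
    ("Physical Education", 2, 1),
]

sum_wx = sum(w * x for _, w, x in grades)   # sum of weight times value
sum_w = sum(w for _, w, _ in grades)        # sum of the weights
gpa = sum_wx / sum_w

print(f"sum wX = {sum_wx}, sum w = {sum_w}, GPA = {gpa:.1f}")

# For comparison: the unweighted mean of the four grades ignores credits.
unweighted = sum(x for _, _, x in grades) / len(grades)
```

The unweighted mean of the four grades is 2.5, so taking the credits into account changes the answer.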


Measures of Central Tendency: Median

• The median is the midpoint of the data array. The symbol for the median is MD.

Steps in computing the median of a data array
Step 1 Arrange the data in order.
Step 2 Select the middle point.
• The median will be one of the data values if there is an odd number of values, e.g., for 1, 2, 3 the median is 2.
• The median will be the average of the two middle data values if there is an even number of values, e.g., for 1, 2, 3, 4 the median is (2 + 3)/2 = 2.5.
Example 3-4: Tablet Sales

Example 3-5: Tornadoes in the U.S.

The number of tornadoes that have occurred in the United States over an 8-year period follows. Find the median.
684, 764, 656, 702, 856, 1133, 1132, 1303

Arrange the data in order, then find the average of the two middle values.
656, 684, 702, 764, 856, 1132, 1133, 1303

\[ MD = \frac{764 + 856}{2} = \frac{1620}{2} = 810 \]

The median number of tornadoes is 810.
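
The two-step median procedure can be sketched in Python as below. The function name `median_by_steps` is hypothetical; the standard-library `statistics.median` is used only as a cross-check.

```python
from statistics import median

def median_by_steps(values):
    """Median via the two steps on the slide: sort, then pick the middle."""
    ordered = sorted(values)          # Step 1: arrange the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the middle value itself
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

tornadoes = [684, 764, 656, 702, 856, 1133, 1132, 1303]
md = median_by_steps(tornadoes)
print(f"MD = {md}")

# Cross-check against the standard library
library_md = median(tornadoes)
```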


Measures of Central Tendency: Mode

• The mode is the value that occurs most often in a data set.
• It is sometimes said to be the most typical case.
• There may be no mode, one mode (unimodal), two modes (bimodal), or many modes (multimodal).
Examples: Mode
Unimodal: Find the mode of the signing bonuses of eight NFL players
for a specific year. The bonuses in millions of dollars are
18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10
You may find it easier to sort first.
10, 10, 10, 11.3, 12.4, 14.0, 18.0, 34.5

The mode is 10.

Find the mode for the number of branches that six banks have.
401, 344, 209, 201, 227, 353

Since each value occurs only once, there is no mode.

Note: Do not say that the mode is zero. That would be incorrect,
because in some data, such as temperature, zero can be an actual
value.
Example 3-7: Licensed Nuclear Reactors

The data show the number of licensed nuclear reactors in the United States for a recent 15-year period. Find the mode.
104 104 104 104 104
107 109 109 109 110
109 111 112 111 109

104 and 109 both occur the most (5 times). The data
set is said to be bimodal.

The modes are 104 and 109.
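
The three mode examples (one mode, no mode, two modes) can be checked with a small Python sketch. The helper `modes` is hypothetical; it follows the convention used on these slides that a data set in which every value occurs only once has no mode, and it assumes a non-empty list.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes, or [] when every value occurs once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                      # slide convention: no mode
    return sorted(v for v, c in counts.items() if c == highest)

bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]        # NFL signing bonuses
branches = [401, 344, 209, 201, 227, 353]                   # bank branches
reactors = [104, 104, 104, 104, 104, 107, 109, 109, 109, 110,
            109, 111, 112, 111, 109]                        # Example 3-7

print(modes(bonuses))    # one mode (unimodal)
print(modes(branches))   # empty list: no mode
print(modes(reactors))   # two modes (bimodal)
```

Returning an empty list, rather than zero, mirrors the note that "no mode" must not be reported as a mode of zero.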


Example 3-9,3-10: Mode of grouped data

Measures of Central Tendency: Midrange

• The midrange is a rough estimate of the middle.
• The midrange is the average of the lowest and highest values in a data set.
\[ MR = \frac{\text{lowest value} + \text{highest value}}{2} \]
Properties and Uses of Central Tendency

The Mean
• Defined rigorously by a mathematical formula, which makes it highly amenable to mathematical treatment
• Uses all data values
• Varies less than the median or mode when samples are taken from the same population and all three measures are computed for these samples
• Used in computing other statistics, such as the variance
• Unique, and usually not one of the data values
• Cannot be computed for data grouped into open-ended classes
• Affected by extremely high or low values, called outliers
Properties of the Median

➢Gives the midpoint
➢Used when it is necessary to find out whether the data values fall into the upper half or lower half of the distribution
➢Can be used for an open-ended distribution
➢Affected less than the mean by extremely high or extremely low values
Properties of the Mode

➢Used when the most typical case is desired
➢Easiest average to compute
➢Can be used with nominal data
➢Not always unique, and may not exist
Properties of the Midrange

➢Easy to compute
➢Gives the midpoint
➢Affected by extremely high or low values in a data set
Distribution Shapes
Exercise

3. High Temperatures The reported high temperatures (in degrees Fahrenheit) for
selected world cities on an October day are shown below. Find (i) the mean, (ii) the
median, (iii) the mode, and (iv) the mid-range. Which measure of central tendency
do you think best describes these data?
62 72 66 79 83 61 62 85 72 64 74 71
42 38 91 66 77 90 74 63 64 68 42
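Solution:
(i) Mean:
\[ \bar{x} = \frac{62 + 72 + 66 + \cdots + 42}{23} = \frac{1566}{23} \approx 68.1 \]
(ii) Arrange the observations in ascending order:
38 42 42 61 62 62 63 64 64 66 66 68 71 72 72 74 74 77 79 83 85 90 91
With 23 values, the median is the 12th value: Median = 68.
(iii) The modes are 42, 62, 64, 66, 72, and 74; each occurs twice.
(iv) Midrange:
\[ MR = \frac{38 + 91}{2} = \frac{129}{2} = 64.5 \]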
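
All four measures for the temperature exercise can be verified with one short Python sketch. The variable names are illustrative; only standard-library functions are used.

```python
from collections import Counter
from statistics import mean, median

# Reported high temperatures (degrees Fahrenheit) from Exercise 3
temps = [62, 72, 66, 79, 83, 61, 62, 85, 72, 64, 74, 71,
         42, 38, 91, 66, 77, 90, 74, 63, 64, 68, 42]

temp_mean = mean(temps)                       # (i) mean
temp_median = median(temps)                   # (ii) median

counts = Counter(temps)                       # (iii) modes: most frequent values
top = max(counts.values())
temp_modes = sorted(t for t, c in counts.items() if c == top)

midrange = (min(temps) + max(temps)) / 2      # (iv) midrange

print(f"mean = {temp_mean:.1f}, median = {temp_median}")
print(f"modes = {temp_modes} (each occurs {top} times)")
print(f"midrange = {midrange}")
```

With six values tied as modes, the mode says little about these data, which is one consideration when deciding which measure describes them best.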
Exercise

14. Hourly Compensation for Production Workers The hourly compensation costs (in U.S. dollars) for production workers in selected countries are represented below. Find the (a) mean, and (b) modal class.
Exercise

Solution:

Class            Frequency f     Midpoint Xm    f · Xm
2.48 - 7.48      7               4.98           34.86
7.49 - 12.49     3               9.99           29.97
12.50 - 17.50    1               15.00          15.00
17.51 - 22.51    7               20.01          140.07
22.52 - 27.52    5               25.02          125.10
27.53 - 32.53    5               30.03          150.15
Total            n = Σf = 28                    Σ f · Xm = 495.15

(a) Mean:
\[ \bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{495.15}{28} \approx 17.68 \]
(b) The modal classes are 2.48 - 7.48 and 17.51 - 22.51, each with a frequency of 7.
Extending the concept

36. If the mean of five values is 64, find the sum of the values.
Answer: 320

37. If the mean of five values is 8.2 and four of the values are 6, 10, 7, and 12, find the fifth value.

38. Find the mean of 10, 20, 30, 40, and 50.
a. Add 10 to each value and find the mean. Answer: 40
b. Subtract 10 from each value and find the mean. Answer: 20
c. Multiply each value by 10 and find the mean. Answer: 300
d. Divide each value by 10 and find the mean. Answer: 3
e. Make a general statement about each situation.
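
The answers to Exercise 38 can be checked numerically with the sketch below, which may help with the general statement asked for in part e. The variable names are illustrative. Exercises 36 and 37 both rest on the identity sum = n × mean, which the last lines apply.

```python
from statistics import mean

values = [10, 20, 30, 40, 50]
original = mean(values)                              # mean of the raw values

added = mean([v + 10 for v in values])               # 38a: add 10
subtracted = mean([v - 10 for v in values])          # 38b: subtract 10
multiplied = mean([v * 10 for v in values])          # 38c: multiply by 10
divided = mean([v / 10 for v in values])             # 38d: divide by 10

print(original, added, subtracted, multiplied, divided)

# Exercise 36: the sum of n values equals n times their mean.
sum_of_five = 5 * 64

# Exercise 37: the fifth value is the total minus the four known values.
fifth_value = 5 * 8.2 - (6 + 10 + 7 + 12)
```

In each part the mean changes in exactly the way the individual values were changed.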
3-2 Measures of Variation

To describe a data set accurately, we must know not only its measures of central tendency but also how the data values are spread out.
Consider the test scores of two groups of students.

         Group 1    Group 2
         45         70
         100        75
         80         80
Total    225        225

The mean score for both groups of students is 75, but their performance is not the same: the scores in Group 1 are far more spread out than those in Group 2.

How can we measure variability?
• Range
• Variance
• Standard deviation
• Coefficient of variation
• Chebyshev’s theorem
• Empirical rule (normal)
Measures of Variation: Range

• The range is the difference between the highest and lowest values in a data set.
\[ R = \text{highest value} - \text{lowest value} \]

Example 3-15/16: Outdoor Paint
Two experimental brands of outdoor paint are tested to see how long each will last before fading. Six cans of each brand constitute a small population. The results (in months) are shown. Find the mean and range of each group.

Brand A    Brand B
10         35
60         45
50         30
30         35
40         40
20         25
Solution:

Brand A:
\[ \mu = \frac{\sum X}{N} = \frac{210}{6} = 35, \qquad R = 60 - 10 = 50 \]

Brand B:
\[ \mu = \frac{\sum X}{N} = \frac{210}{6} = 35, \qquad R = 45 - 25 = 20 \]

The average for both brands is the same, but the range for Brand A is much greater than the range for Brand B.

Which brand would you buy?
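
The comparison of the two paint brands can be reproduced in Python. The helper `mean_and_range` is a hypothetical name introduced for this sketch.

```python
def mean_and_range(values):
    """Return (mean, range) for a small population of values."""
    mu = sum(values) / len(values)
    r = max(values) - min(values)     # R = highest - lowest
    return mu, r

# Months before fading, from Example 3-15/16
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

mu_a, range_a = mean_and_range(brand_a)
mu_b, range_b = mean_and_range(brand_b)

print(f"Brand A: mean = {mu_a}, range = {range_a}")
print(f"Brand B: mean = {mu_b}, range = {range_b}")
```

Equal means with unequal ranges show why a measure of center alone does not describe a data set.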

Bluman Chapter 3 36
Measures of Variation: Variance & Standard Deviation

Variance: the average of the squares of the distance each value is from the mean. For a population,
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]

Standard deviation: the square root of the variance. It is a measure of how spread out the data are. For a population,
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]

Uses or purposes:
- To determine the spread of the data.
- To determine the consistency of a variable.
- To determine the number of data values that fall within a specified interval in a distribution (Chebyshev’s Theorem).
- Used in inferential statistics.
Example 3-21: Outdoor Paint

Find the variance and standard deviation for the data set for Brand A paint: 10, 60, 50, 30, 40, 20.

\[ \mu = \frac{\sum X}{N} = \frac{210}{6} = 35 \]

Months, X    X – µ    (X – µ)²
10           –25      625
60           25       625
50           15       225
30           –5       25
40           5        25
20           –15      225
Sum          0        1750

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{1750}{6} \approx 291.7 \]

\[ \sigma = \sqrt{\frac{1750}{6}} \approx 17.1 \]
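
The table-based calculation for Brand A can be sketched in Python as follows. The names are illustrative; `statistics.pvariance` and `statistics.pstdev` are the standard-library population versions and serve as a cross-check.

```python
from math import sqrt
from statistics import pstdev, pvariance

months = [10, 60, 50, 30, 40, 20]        # Brand A, treated as a population

N = len(months)
mu = sum(months) / N                     # population mean
deviations = [x - mu for x in months]    # column X - mu
squared = [d ** 2 for d in deviations]   # column (X - mu)^2

variance = sum(squared) / N              # divide by N for a population
std_dev = sqrt(variance)

print(f"sum of squares = {sum(squared)}")
print(f"variance = {variance:.1f}, standard deviation = {std_dev:.1f}")

# Cross-check with the standard library
lib_variance = pvariance(months)
lib_std_dev = pstdev(months)
```

The deviations always sum to zero, which is why they are squared before averaging.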

Measures of Variation: Variance & Standard Deviation
(Sample Theoretical Model)

• The sample variance is
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]
• The sample standard deviation is
\[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]
Variance & Standard Deviation
(Shortcut, or sample computational, formulas for s² and s)

The computational formula:
• Is mathematically equivalent to the theoretical formula
• Saves time when calculating by hand
• Does not use the mean
• Is more accurate when the mean has been rounded

The sample variance is
\[ s^2 = \frac{n \sum X^2 - \left(\sum X\right)^2}{n(n - 1)} \]

The sample standard deviation is
\[ s = \sqrt{s^2} \]
Example : European Auto Sales

Find the variance and standard deviation for the amount of European auto sales for a sample of 6 years. The data are in millions of dollars.
11.2, 11.9, 12.0, 12.8, 13.4, 14.3

X       X²
11.2    125.44
11.9    141.61
12.0    144.00
12.8    163.84
13.4    179.56
14.3    204.49
ΣX = 75.6    ΣX² = 958.94

\[ s^2 = \frac{n \sum X^2 - \left(\sum X\right)^2}{n(n - 1)} = \frac{6(958.94) - (75.6)^2}{6(5)} \approx 1.28 \]

\[ s = \sqrt{1.28} \approx 1.13 \]

Note that ΣX² is not the same as (ΣX)².
- The notation ΣX² means to square the values first, then sum.
- The notation (ΣX)² means to sum the values first, then square the sum.
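
The sketch below applies the computational formula to the auto sales data and compares it with the definitional formula via `statistics.variance`. The variable names are illustrative.

```python
from math import sqrt
from statistics import variance

sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]   # European auto sales sample

n = len(sales)
sum_x = sum(sales)                      # sum the values
sum_x_sq = sum(x ** 2 for x in sales)   # square each value first, then sum
square_of_sum = sum_x ** 2              # sum first, then square the sum

# Computational (shortcut) formula for the sample variance
s_squared = (n * sum_x_sq - square_of_sum) / (n * (n - 1))
s = sqrt(s_squared)

print(f"sum X = {sum_x:.1f}, sum X^2 = {sum_x_sq:.2f}")
print(f"s^2 = {s_squared:.2f}, s = {s:.2f}")

# The definitional formula (via the standard library) agrees.
definitional = variance(sales)
```

Keeping `sum_x_sq` and `square_of_sum` as separate variables makes the difference between the two notations explicit.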
Shortcut, or sample computational, formulas for s² and s for grouped data
• The steps for finding the variance and standard deviation for grouped data are summarized in this Procedure Table.
Example 3-22: Miles run per week
Find the variance and the standard deviation for the frequency
distribution of the data in Example 2–7. The data represent the
number of miles that 20 runners ran during one week.

Solution of example 3-24

Substitute in the formula and solve for s² to get the variance.

\[ s^2 = \frac{n\left(\sum f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n - 1)} = \frac{20(13{,}310) - (490)^2}{20(20 - 1)} \approx 68.7 \]

Take the square root to get the standard deviation.

\[ s = \sqrt{68.7} \approx 8.3 \]
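
A Python sketch of the grouped-data computation follows. It assumes that the frequency distribution is the miles-run table shown earlier in Example 3-3, whose totals (n = 20 and Σ f · Xm = 490) match the numbers substituted above; the variable names are illustrative.

```python
from math import sqrt

# Class midpoints and frequencies, assumed to be the miles-run distribution
midpoints = [8, 13, 18, 23, 28, 33, 38]
freqs = [1, 2, 3, 5, 4, 3, 2]

n = sum(freqs)
sum_f_xm = sum(f * xm for f, xm in zip(freqs, midpoints))
sum_f_xm_sq = sum(f * xm ** 2 for f, xm in zip(freqs, midpoints))

# Computational formula for the sample variance of grouped data
s_squared = (n * sum_f_xm_sq - sum_f_xm ** 2) / (n * (n - 1))
s = sqrt(s_squared)

print(f"n = {n}, sum f*Xm = {sum_f_xm}, sum f*Xm^2 = {sum_f_xm_sq}")
print(f"s^2 = {s_squared:.1f}, s = {s:.1f}")
```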
Measures of Variation: Coefficient of Variation

Whenever two samples have the same units of measure, the variance and standard deviation for each can be compared directly.

But what if we want to compare the standard deviations of two different variables, such as the number of sales per salesperson over a 3-month period and the commissions made by these salespeople?

The coefficient of variation is a statistic that allows you to compare standard deviations when the units are different.

The coefficient of variation is the standard deviation divided by the mean, expressed as a percentage.

\[ CVar = \frac{s}{\bar{X}} \cdot 100\% \]
Example 3-23: Sales of Automobiles

The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5. The mean of the commissions is $5225, and the standard deviation is $773. Compare the variations of the two.

Sales:
\[ CVar = \frac{5}{87} \cdot 100\% \approx 5.7\% \]

Commissions:
\[ CVar = \frac{773}{5225} \cdot 100\% \approx 14.8\% \]

Commissions are more variable than sales.
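
The comparison can be expressed as a small Python function. The name `coefficient_of_variation` is hypothetical, and the sketch assumes a nonzero mean.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean (mean must be nonzero)."""
    return std_dev / mean * 100

cv_sales = coefficient_of_variation(5, 87)            # number of cars sold
cv_commissions = coefficient_of_variation(773, 5225)  # dollars

print(f"Sales: {cv_sales:.1f}%")
print(f"Commissions: {cv_commissions:.1f}%")
```

Because the units cancel in the ratio, the two percentages can be compared even though one variable is a count and the other is in dollars.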

Range Rule of Thumb

The range can be used to approximate the standard deviation. The approximation is called the range rule of thumb:

\[ s \approx \frac{\text{range}}{4} \]

Note: use it only when the distribution is unimodal and approximately symmetric.
Example : Range Rule of Thumb

• For the data set 5, 8, 8, 9, 10, 12, 13, the standard deviation is 2.7 and the range is 13 – 5 = 8. The range rule of thumb gives s ≈ 8/4 = 2.
• The range rule of thumb in this case underestimates the standard deviation somewhat.
• A note of caution: the range rule of thumb is only an approximation and should be used only when the distribution of data values is unimodal and roughly symmetric.
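
The estimate and the exact sample standard deviation for this data set can be compared in Python. The variable names are illustrative.

```python
from statistics import stdev

data = [5, 8, 8, 9, 10, 12, 13]

s_exact = stdev(data)                        # sample standard deviation
data_range = max(data) - min(data)
s_estimate = data_range / 4                  # range rule of thumb

print(f"exact s = {s_exact:.1f}")
print(f"range = {data_range}, estimated s = {s_estimate}")
```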

Range Rule of Thumb

The range rule of thumb can also be used to estimate the largest and smallest data values of a data set. The smallest data value will be approximately 2 standard deviations below the mean, and the largest data value will be approximately 2 standard deviations above the mean of the data set.

\[ \text{Smallest data value} \approx \bar{X} - 2s \]
\[ \text{Largest data value} \approx \bar{X} + 2s \]

Example: the mean is 10 and the range is 12, so s ≈ 12/4 = 3.

\[ \text{Low} \approx 10 - 2(3) = 4 \]
\[ \text{High} \approx 10 + 2(3) = 16 \]
Measures of Variation: Chebyshev’s Theorem

• Note: The variance and standard deviation of a variable can be used to determine the spread, or dispersion, of a variable. That is, the larger the variance or standard deviation, the more the data values are dispersed.

• For example, if two variables measured in the same units have the same mean, say, 70, and the first variable has a standard deviation of 1.5 while the second variable has a standard deviation of 10, then the data for the second variable will be more spread out than the data for the first variable.
Measures of Variation: Chebyshev’s Theorem

• Chebyshev’s theorem, developed by the Russian mathematician Chebyshev (1821–1894), specifies the proportions of the spread in terms of the standard deviation.
• Chebyshev’s Theorem: The proportion of values from any data set that fall within k standard deviations of the mean will be at least 1 – 1/k², where k is a number greater than 1 (k is not necessarily an integer).
Chebyshev’s Theorem

Number of standard    Minimum proportion within    Minimum percentage within
deviations, k         k standard deviations        k standard deviations
2                     1 – 1/4 = 3/4                75%
3                     1 – 1/9 = 8/9                88.89%
4                     1 – 1/16 = 15/16             93.75%
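
The table can be regenerated from the bound 1 – 1/k² with a few lines of Python. The function name `chebyshev_min_proportion` is hypothetical.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    proportion = chebyshev_min_proportion(k)
    print(f"k = {k}: at least {proportion * 100:.2f}%")

# k does not have to be an integer.
bound_for_1_5 = chebyshev_min_proportion(1.5)
```

The bound holds for any distribution, so it is usually much lower than the share of values actually found in the interval.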
Example : Chebyshev’s Theorem

For a variable that has a mean of 70 and a standard deviation of 1.5, at least three-fourths, or 75%, of the data values fall between 67 and 73. These values are found by adding 2 standard deviations to the mean and subtracting 2 standard deviations from the mean:

70 + 2(1.5) = 70 + 3 = 73
70 – 2(1.5) = 70 – 3 = 67
Example 3-25: Prices of Homes

The mean price of houses in a certain neighborhood is $50,000, and the standard deviation is $10,000. Find the price range for which at least 75% of the houses will sell.

Solution:
Chebyshev’s theorem states that at least 75% of a data set will fall within 2 standard deviations of the mean.

Lower limit = 50,000 – 2(10,000) = 30,000
Upper limit = 50,000 + 2(10,000) = 70,000

Thus, at least 75% of all homes sold in the area will have a price between $30,000 and $70,000.
Example 3-26: Travel Allowances

A survey of local companies found that the mean amount of travel allowance for executives was $0.25 per mile. The standard deviation was $0.02. Using Chebyshev’s theorem, find the minimum percentage of the data values that will fall between $0.20 and $0.30.

Solution:
k = (upper limit – mean)/standard deviation = (0.30 – 0.25)/0.02 = 2.5

Thus,
1 – 1/k² = 1 – 1/(2.5)² = 1 – 1/6.25 = 1 – 0.16 = 0.84

At least 84% of the data values will fall between $0.20 and $0.30.
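
Going from an interval to k and then to the minimum percentage can be sketched in Python. The function `chebyshev_min_percentage` is hypothetical; as a design choice of this sketch, it uses the endpoint closer to the mean, so that the bound stays valid if the interval is not centered on the mean.

```python
def chebyshev_min_percentage(mean, std_dev, lower, upper):
    """Return (k, minimum percentage of values between lower and upper)."""
    # Distance, in standard deviations, from the mean to the nearer endpoint
    k = min(mean - lower, upper - mean) / std_dev
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return k, (1 - 1 / k ** 2) * 100

# Example 3-26: travel allowances
k_travel, pct_travel = chebyshev_min_percentage(0.25, 0.02, 0.20, 0.30)
print(f"k = {k_travel:.1f}, at least {pct_travel:.0f}%")

# Example 3-25: home prices between $30,000 and $70,000
k_homes, pct_homes = chebyshev_min_percentage(50_000, 10_000, 30_000, 70_000)
print(f"k = {k_homes:.1f}, at least {pct_homes:.0f}%")
```

The results are formatted with rounding because decimal inputs such as 0.02 are not represented exactly in floating point.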

Measures of Variation: Empirical Rule (Normal)

Chebyshev’s theorem applies to any distribution regardless of its shape. However, when a distribution is bell-shaped (or what is called normal), the following statements, which make up the empirical rule, are true: the percentage of values that fall within k standard deviations of the mean is approximately as listed below.

Number of standard deviations, k    Approximate percentage within k standard deviations
1                                   68%
2                                   95%
3                                   99.7%
Measures of Variation: Empirical Rule (Normal)

Suppose that the scores on a national achievement exam have a mean of 480 and a standard deviation of 90. If these scores are normally distributed, then approximately 95% of the scores will fall between 300 and 660 (480 – 2 · 90 = 300 and 480 + 2 · 90 = 660).
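
The intervals for k = 1, 2, 3 and the corresponding normal-curve areas can be computed with the standard-library `statistics.NormalDist` class (Python 3.8 or later). This is a sketch under the assumption that the scores are exactly normal; the dictionary names are illustrative.

```python
from statistics import NormalDist

mean, sd = 480, 90
scores = NormalDist(mu=mean, sigma=sd)

intervals = {}
shares = {}
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    intervals[k] = (low, high)
    # Area under the normal curve between low and high
    shares[k] = scores.cdf(high) - scores.cdf(low)
    print(f"k = {k}: {low} to {high}, about {shares[k] * 100:.1f}% of scores")
```

The computed areas round to the 68%, 95%, and 99.7% figures of the empirical rule, which are themselves approximations.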

Linear transformation of the data

Sometimes it is necessary to transform the data values into other data values.

Example:
Temperature values are sometimes collected on the Celsius scale but must be reported on the Fahrenheit scale, so the data have to be converted.

This change is called a linear transformation of the data.

Question:
How does the transformation of the data values affect the mean and the standard deviation?
Linear transformation of the data

Example:
Suppose you own a store with five employees whose hourly salaries are $10, $13, $10, $11, $16.

Mean = $12, s = 2.550

You then decide to give each employee a raise of $1.00 per hour, so the new salaries are $11, $14, $11, $12, $17.

Mean = $13, s = 2.550

The mean increases by the amount added to the data, but the standard deviation does not change.
Linear transformation of the data

Example:
Suppose that the five employees worked the following numbers of hours per week: 15, 12, 18, 20, 10.

Mean = 15, s = 4.123

You next decide to double each employee’s hours for December: 30, 24, 36, 40, 20.

Mean = 30, s = 8.246

Both the mean and the standard deviation double.
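
Both examples can be checked in Python. The helper `summarize` is a hypothetical name. The two cases are instances of the general rule for a linear transformation y = a·x + b: the mean becomes a·(mean of x) + b and the standard deviation becomes |a|·s, so adding a constant shifts the mean only, while multiplying scales both.

```python
from statistics import mean, stdev

def summarize(values):
    """Return (mean, sample standard deviation) of the values."""
    return mean(values), stdev(values)

salaries = [10, 13, 10, 11, 16]            # hourly salaries in dollars
raised = [x + 1 for x in salaries]         # add $1.00 to each value

hours = [15, 12, 18, 20, 10]               # hours per week
doubled = [2 * x for x in hours]           # multiply each value by 2

mean_sal, s_sal = summarize(salaries)
mean_raised, s_raised = summarize(raised)
mean_hours, s_hours = summarize(hours)
mean_doubled, s_doubled = summarize(doubled)

print(f"salaries: mean = {mean_sal}, s = {s_sal:.3f}")
print(f"raised:   mean = {mean_raised}, s = {s_raised:.3f}")
print(f"hours:    mean = {mean_hours}, s = {s_hours:.3f}")
print(f"doubled:  mean = {mean_doubled}, s = {s_doubled:.3f}")
```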