
Chapter 03

toan1

Uploaded by

Huy Tran

Chapter 3

Calculating Descriptive Statistics


CHAPTER 3 MAP
3.1 Measures of Central Tendency

3.2 Measures of Variability

3.3 Using the Mean and Standard Deviation Together

3.4 Working with Grouped Data

3.5 Measures of Relative Position

Copyright © 2020, 2015, 2013 Pearson Education, Inc.


3.1 Measures of Central Tendency

Central tendency is a single value used to describe the center point of a data set.

Measures of Central Tendency: Mean, Median, Mode

The Mean
The mean, or average, is the most common
measure of central tendency.
• Calculate the mean by adding all the values in
a data set and then dividing the result by the
number of observations.

The Mean
Formula for the Sample Mean:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $\bar{x}$ = the sample mean (pronounced "x-bar")
$x_i$ = the values in the sample (the observed values)
$\sum_{i=1}^{n} x_i$ = the sum of all the data values
n = the number of data values in the sample (the sample size)

The Mean
Formula for the Population Mean:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

where $\mu$ = the population mean (the Greek letter "mu")
N = the number of data values in the population

Calculating The Mean
Example: suppose a sample of size n = 5 gives
the following values:
6.2 7.1 4.8 9.0 3.3

The sample mean: $\bar{x} = \frac{6.2 + 7.1 + 4.8 + 9.0 + 3.3}{5} = \frac{30.4}{5} = 6.08$
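
The calculation can be checked with a few lines of Python. This is a minimal sketch; the function name `sample_mean` is only an illustrative choice and does not come from the slides.

```python
# Sample from the example: n = 5 observed values.
sample = [6.2, 7.1, 4.8, 9.0, 3.3]

def sample_mean(values):
    """Add all the values, then divide by the number of observations."""
    return sum(values) / len(values)

x_bar = sample_mean(sample)
print(round(x_bar, 2))  # 6.08
```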

Advantages and Disadvantages of Using the
Mean to Summarize Data

Advantages:
• Simple to calculate
• Summarizes the data with a single value
Disadvantages:
• With only a summary value you lose information about
the original data.
• Sample 1 with n = 3: 999, 1000, 1001; $\bar{x}$ = 1000
• Sample 2 with n = 3: 0, 1000, 2000; $\bar{x}$ = 1000
• Just knowing the mean does not help you know what the
underlying data looks like.
• The value of the mean is sensitive to outliers (values
that are much higher or lower than most of the data).

The Median

The median is the value in the data set for which half the observations are higher and half the observations are lower.
• First arrange the data in ascending order.
• Use an Index Point to determine the position of
the median in the data set.

The Median

Example with sample of size n = 7:
21 27 27 28 34 45 50

Index point: i = 0.5(n) = 0.5(7) = 3.5

The index point is not a whole number, so round up to i = 4.
The median is, therefore, the value in the fourth position of the sorted data.

median = 28

The Median

The median is not sensitive to outliers.
21 27 27 28 34 45 5000
• The median is still 28.

When there is an odd number of data values, the median is always the middle value in the data set.
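
To see the effect of the outlier on both measures, the two data sets can be compared side by side. The sketch below uses Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

original = [21, 27, 27, 28, 34, 45, 50]
with_outlier = [21, 27, 27, 28, 34, 45, 5000]  # 50 replaced by 5000

mean_before = statistics.mean(original)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(original)
median_after = statistics.median(with_outlier)

# The mean moves a long way; the median does not move at all.
print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```

With these values the mean rises from about 33.1 to about 740.3, while the median stays at 28.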

The Median

When the index point is a whole number, the position of the median is halfway between the index point (i) and the next higher position (the i + 1 position).

When there is an even number of data values, the median is halfway between the two middle values.

The Median
Example with sample of size n = 6:
145 157 170 182 204 209
Index point: i = 0.5(n) = 0.5(6) = 3
The index point is a whole number, so the median value is halfway between the third and fourth values in the sorted data (170 and 182).

median = (170 + 182)/2 = 176
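
The index-point procedure for the median can be sketched in Python as follows. The function name `median_by_index_point` is hypothetical, and the index point is assumed to be i = 0.5(n), which matches both worked examples (n = 7 and n = 6).

```python
import math

def median_by_index_point(values):
    """Median via the index point i = 0.5 * n (positions are 1-based)."""
    data = sorted(values)          # first arrange the data in ascending order
    i = 0.5 * len(data)            # index point
    if i != int(i):
        # Not a whole number: round up and take the value at that position.
        return data[math.ceil(i) - 1]
    # Whole number: average the values at positions i and i + 1.
    i = int(i)
    return (data[i - 1] + data[i]) / 2

print(median_by_index_point([21, 27, 27, 28, 34, 45, 50]))    # 28
print(median_by_index_point([145, 157, 170, 182, 204, 209]))  # 176.0
```

The sketch assumes a non-empty list of numbers; it does no input checking.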

The Mode

The mode is the value that appears most often in a data set.
• If no data value or category appears more than once, then we say that the mode does not exist.
• More than one mode can exist if two or more values tie for most frequent.

The mode is a particularly useful way to describe categorical data.

The Mode
Example with numerical data:
• Number of children per family in a sample of 24 families:
0,0,0,0,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,4,5

Number of children    Frequency
0                     4
1                     5
2                     8
3                     4
4                     2
5                     1

The value that appears most often is 2 (occurs 8 times), so the mode = 2 children.
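
The frequency count behind this example can be reproduced with `collections.Counter` from the Python standard library. This is a sketch only; the helper `modes` is a hypothetical name, and it returns an empty list when no value repeats, to reflect the rule that the mode then does not exist.

```python
from collections import Counter

children = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
            3, 3, 3, 3, 4, 4, 5]

def modes(values):
    """Return every value tied for most frequent, or [] if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value appears more than once: the mode does not exist
    return sorted(v for v, c in counts.items() if c == top)

frequency = Counter(children)
print(sorted(frequency.items()))  # the frequency table as (value, count) pairs
print(modes(children))            # [2]
```

Because the function only counts occurrences, it also works for categorical data such as a list of strings.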

The Shapes of Frequency Distributions

Distribution Shape    Relationship
Symmetric             Mean = Median
Left-Skewed           Mean < Median
Right-Skewed          Median < Mean

Which Measure of Central Tendency Should
You Use?

The mean is generally used, as it is relatively easy to determine and is the measure most widely understood by people with little statistical training.
If outliers are present, the median is often used, since the median is not sensitive to outliers.
• For example, median home prices may be reported for a region, because the median is less sensitive to outliers than the mean.

For categorical data, the mode is the only choice.

Advantages and Disadvantages

3.2 Measures of Variability

Measures of variability show how much spread is present in the data.

Measures of Variability:
• Range
• Variance (for a sample, for a population)
• Standard Deviation (for a sample, for a population)

The Range
Simplest measure of variation
Difference between the highest value and the
lowest value in a data set

Range = Highest value – Lowest Value

Example: 1, 2, 4, 4, 6, 8, 8, 8, 8, 9, 11, 11, 12, 13


Range = 13 - 1 = 12
The Range
Advantages:
• Easy to calculate and understand
Disadvantages:
• Only based on two numbers in the data set
(Ignores the way in which data are distributed)
• Sensitive to outliers
Example:

1, 2, 4, 4, 6, 8, 8, 8, 8, 9, 11, 11, 12, 13 Range = 12

1, 2, 4, 4, 6, 8, 8, 8, 8, 9, 11, 11, 12, 1000 Range = 999

Only one value changed, yet the range jumped from 12 to 999, so the range does not accurately reflect the overall variability of the data.
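
The two ranges in this example can be reproduced with a one-line function. This is a small sketch; `value_range` is an illustrative name.

```python
def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

data = [1, 2, 4, 4, 6, 8, 8, 8, 8, 9, 11, 11, 12, 13]
data_with_outlier = data[:-1] + [1000]  # only the last value changes

print(value_range(data))               # 12
print(value_range(data_with_outlier))  # 999
```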
The Variance and Standard Deviation

• The sample variance is denoted by $s^2$

Sample Variance Formula:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $\bar{x}$ = sample mean
n = sample size
$(x_i - \bar{x})$ = the difference between each data value and the sample mean

Calculating the Sample Variance

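Sample Data ($x_i$): 4 6 8 9 11 12 12 18
n = 8, Mean = $\bar{x}$ = 10

$$s^2 = \frac{(4-10)^2 + (6-10)^2 + (8-10)^2 + (9-10)^2 + (11-10)^2 + (12-10)^2 + (12-10)^2 + (18-10)^2}{8 - 1} = \frac{130}{7} \approx 18.57$$

The variance measures the variability, or spread, of the data points around the mean.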
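
The sample variance for this data set can be worked through step by step in Python. The sketch below follows the definition (squared deviations from the mean, divided by n - 1) and then compares the result with `statistics.variance` and `statistics.stdev` from the standard library, which use the same n - 1 divisor.

```python
import statistics

data = [4, 6, 8, 9, 11, 12, 12, 18]
n = len(data)
x_bar = sum(data) / n                               # sample mean = 10.0

squared_deviations = [(x - x_bar) ** 2 for x in data]
s2 = sum(squared_deviations) / (n - 1)              # sample variance
s = s2 ** 0.5                                       # sample standard deviation

print(round(s2, 2), round(s, 2))                    # 18.57 4.31
print(round(statistics.variance(data), 2),          # same values from the
      round(statistics.stdev(data), 2))             # standard library
```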

The Standard Deviation

The standard deviation is the square root of the variance.
• Has the same units as the original data

Sample standard deviation formula:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

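Calculating the Sample Standard Deviation
Sample Data ($x_i$): 4 6 8 9 11 12 12 18
n = 8, Mean = $\bar{x}$ = 10

$$s = \sqrt{\frac{130}{7}} \approx \sqrt{18.57} \approx 4.31$$

The standard deviation is a measure of how far, on average, each data value is from the mean of the sample.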

Short-Cut Formulas for the Sample
Variance and Standard Deviation
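Equivalent, but easier for hand calculations

Short-cut formula for the sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$

Short-cut formula for the sample standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}}$$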
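
As a quick check that the short-cut form, s^2 = [sum of x^2 - (sum of x)^2 / n] / (n - 1), agrees with the definition, the sketch below applies both to the sample 4, 6, 8, 9, 11, 12, 12, 18 used in the earlier variance example. Variable names are illustrative.

```python
data = [4, 6, 8, 9, 11, 12, 12, 18]
n = len(data)

sum_x = sum(data)                    # sum of the values: 80
sum_x2 = sum(x * x for x in data)    # sum of the squared values: 930

# Short-cut formula: only two running totals are needed.
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Definition: squared deviations from the mean.
x_bar = sum_x / n
s2_definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)

print(round(s2_shortcut, 2), round(s2_definition, 2))  # 18.57 18.57
```

The short-cut is convenient by hand, but with floating-point data whose mean is large relative to its spread it can lose precision, so the definitional form is usually the safer choice in code.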

The Variance and Standard Deviation for a
Population

Used when the data set represents an entire population rather than a sample from a population

Population Variance Formula:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where $\mu$ = population mean
N = population size
$(x_i - \mu)$ = the difference between each data value and the population mean

The Variance and Standard Deviation for a Population

Used when the data set represents an entire population rather than a sample from a population

Population Standard Deviation Formula:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

where $\mu$ = population mean
N = population size
$(x_i - \mu)$ = the difference between each data value and the population mean
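
The only difference from the sample formulas is the divisor (N instead of n - 1). The sketch below treats the eight values 4, 6, 8, 9, 11, 12, 12, 18 as if they were an entire population, which is an assumption made purely for illustration, and compares the hand calculation with `statistics.pvariance` and `statistics.pstdev`.

```python
import statistics

# Assumption for illustration: these eight values are the whole population.
population = [4, 6, 8, 9, 11, 12, 12, 18]
N = len(population)
mu = sum(population) / N                                  # population mean

sigma2 = sum((x - mu) ** 2 for x in population) / N       # divide by N
sigma = sigma2 ** 0.5

print(sigma2, round(sigma, 2))                            # 16.25 4.03
print(statistics.pvariance(population),                   # library equivalents
      round(statistics.pstdev(population), 2))
```

For the same data the population variance (16.25) is smaller than the sample variance (about 18.57), because the sum of squared deviations is divided by 8 rather than 7.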

Short-Cut Formulas for the Population
Variance and Standard Deviation

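Equivalent, but easier for hand calculations

Short-cut formula for the population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$$

Short-cut formula for the population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}}$$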

The Coefficient of Variation

The coefficient of variation, CV, expresses the standard deviation as a percentage of the mean.
• A high CV indicates high variability relative to the size of the mean.
• A low CV indicates low variability relative to the size of the mean.
A smaller coefficient of variation indicates more consistency within a set of data values.

The Coefficient of Variation
Formula for the sample coefficient of variation:

$$CV = \frac{s}{\bar{x}} (100)$$

where s = the sample standard deviation
$\bar{x}$ = the sample mean

Formula for the population coefficient of variation:

$$CV = \frac{\sigma}{\mu} (100)$$

where $\sigma$ = the population standard deviation
$\mu$ = the population mean

Coefficient of Variation Example

Stock Price for Nike:
Average price last year = $59.67
Standard deviation = $6.64

Stock Price for Google:
Average price last year = $1045.85
Standard deviation = $68.70

Coefficient of Variation:

Nike: CV = (6.64 / 59.67)(100) ≈ 11.1%
Google: CV = (68.70 / 1045.85)(100) ≈ 6.6%

Although Google had a larger standard deviation, it had the more consistent price.
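
The comparison can be reproduced from the four figures given in the example. This is a minimal sketch; `coefficient_of_variation` is an illustrative name, and the result is expressed in percent.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

nike_cv = coefficient_of_variation(6.64, 59.67)
google_cv = coefficient_of_variation(68.70, 1045.85)

print(f"Nike:   {nike_cv:.1f}%")    # Nike:   11.1%
print(f"Google: {google_cv:.1f}%")  # Google: 6.6%
```

Google's standard deviation is roughly ten times Nike's in dollar terms, but relative to its much higher average price it is the smaller of the two.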
3.4 Working with Grouped Data

Suppose data has already been summarized by a frequency distribution.
• The individual data values are no longer shown.
• Only grouped data is available.

To estimate the average for the frequency distribution:
• Find the midpoint for each group. (The midpoint is the halfway point in each group.)
• Use the midpoint as a representative value for that group.
The Mean of Grouped Data
The formula for the Sample Mean from Grouped Data:

$$\bar{x} \approx \frac{\sum_{i=1}^{k} f_i m_i}{n}$$

where $f_i$ = the frequency for class i
$m_i$ = the midpoint for class i
$n = \sum_{i=1}^{k} f_i$ = the total number of observations
k = the number of classes

• The mean is only an approximate value since the midpoint is just an estimate of the value in each class.

Formula for the population mean from grouped data:

$$\mu \approx \frac{\sum_{i=1}^{k} f_i m_i}{N}$$
Example: The Mean of Grouped Data

Example: An online merchant has collected the following grouped data for the number of web pages viewed by a sample of its customers:

Number of pages     Frequency
1 to under 5        6
5 to under 9        12
9 to under 13       10
13 to under 17      4

The merchant would like to calculate the average number of viewed pages.

Example: The Mean of Grouped Data
1. Find the midpoint for each class. (The midpoint is the halfway point in each group.)

Number of pages     Midpoint (m_i)     Frequency (f_i)
1 to under 5        3                  6
5 to under 9        7                  12
9 to under 13       11                 10
13 to under 17      15                 4

2. Calculate the mean:

$$\bar{x} \approx \frac{\sum_{i=1}^{k} f_i m_i}{n} = \frac{(6)(3) + (12)(7) + (10)(11) + (4)(15)}{32} = \frac{272}{32} = 8.5$$

The average number of viewed pages is about 8.5.
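
The two steps can be sketched in Python. Each class is stored as its lower and upper limit together with its frequency; the data layout and the name `grouped_mean` are assumptions made for this illustration.

```python
# (lower limit, upper limit, frequency) for each class in the example
classes = [(1, 5, 6), (5, 9, 12), (9, 13, 10), (13, 17, 4)]

def grouped_mean(classes):
    """Approximate the mean using each class midpoint as a representative value."""
    n = sum(f for _, _, f in classes)                       # total observations
    weighted_sum = sum((lo + hi) / 2 * f for lo, hi, f in classes)
    return weighted_sum / n

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
print(midpoints)              # [3.0, 7.0, 11.0, 15.0]
print(grouped_mean(classes))  # 8.5
```

The result is an estimate: the true mean of the 32 underlying observations cannot be recovered from the grouped table alone.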


The Variance and Standard Deviation of
Grouped Data

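Formula for the Sample Variance: Grouped Data

$$s^2 \approx \frac{\sum_{i=1}^{k} (m_i - \bar{x})^2 f_i}{n - 1}$$

where $\bar{x}$ = the approximate sample mean
$f_i$ = the frequency for class i
$m_i$ = the midpoint for class i
$n = \sum_{i=1}^{k} f_i$ = the total number of observations
k = the number of classes

Formula for the Population Variance: Grouped Data

$$\sigma^2 \approx \frac{\sum_{i=1}^{k} (m_i - \mu)^2 f_i}{N}$$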

Example: The Variance and Standard Deviation
of Grouped Data
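Number of pages     Midpoint (m_i)     Frequency (f_i)
1 to under 5        3                  6
5 to under 9        7                  12
9 to under 13       11                 10
13 to under 17      15                 4

Calculate the variance and standard deviation. Recall that $\bar{x}$ = 8.5.

$$s^2 \approx \frac{(3-8.5)^2(6) + (7-8.5)^2(12) + (11-8.5)^2(10) + (15-8.5)^2(4)}{32 - 1} = \frac{440}{31} \approx 14.19$$

So the standard deviation is $s \approx \sqrt{14.19} \approx 3.77$.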
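
The grouped variance and standard deviation for the web-page example can be computed from the midpoints and frequencies. This is a sketch under the same assumptions as the grouped mean (midpoints stand in for the unseen data values); variable names are illustrative.

```python
midpoints = [3, 7, 11, 15]
frequencies = [6, 12, 10, 4]

n = sum(frequencies)                                              # 32
x_bar = sum(m * f for m, f in zip(midpoints, frequencies)) / n    # 8.5

# Each squared deviation is weighted by its class frequency.
weighted_sq_dev = sum((m - x_bar) ** 2 * f
                      for m, f in zip(midpoints, frequencies))    # 440.0
s2 = weighted_sq_dev / (n - 1)                                    # sample variance
s = s2 ** 0.5                                                     # standard deviation

print(round(s2, 2), round(s, 2))  # 14.19 3.77
```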


3.5 Measures of Relative Position

Measures of relative position compare the position of one value in relation to other values in the data set.

Measures of Relative Position: Percentiles, Quartiles

Quartiles

Quartiles split the ranked data into 4 equal groups:
• The first quartile (Q1) is the value that constitutes
the 25th percentile.
• The second quartile (Q2) is the value that
constitutes the 50th percentile.
• Note that the second quartile (the 50th percentile) is
the median.
• The third quartile (Q3) is the value that
constitutes the 75th percentile.

Quartiles

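Example: Find the first quartile

Sample Data (n = 11): 11 12 13 16 16 17 18 21 22 22 25

Index point for Q1 (the 25th percentile): i = (25/100)(11) = 2.75
Rounding up, Q1 is in the third position in the sorted data, so Q1 = 13.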
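
The index-point method used for the median extends to any percentile by taking i = (P/100)(n). The sketch below is one possible implementation, not a definitive one: the name `percentile_by_index_point` is hypothetical, P is assumed to be a whole number between 1 and 99, and the whole-number case (averaging positions i and i + 1) is carried over from the median rule as an assumption.

```python
def percentile_by_index_point(values, p):
    """P-th percentile via the index point i = (p / 100) * n (1-based positions)."""
    data = sorted(values)
    # divmod keeps the arithmetic exact: i = whole + remainder / 100
    whole, remainder = divmod(p * len(data), 100)
    if remainder:
        # i is not a whole number: round up to position whole + 1
        return data[whole]
    # i is a whole number: halfway between positions i and i + 1
    return (data[whole - 1] + data[whole]) / 2

sample = [11, 12, 13, 16, 16, 17, 18, 21, 22, 22, 25]
q1 = percentile_by_index_point(sample, 25)
q2 = percentile_by_index_point(sample, 50)
q3 = percentile_by_index_point(sample, 75)
print(q1, q2, q3)  # 13 17 22
```

Statistical packages use several different percentile conventions, so library functions may return slightly different quartiles for the same data.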

Interquartile Range

The interquartile range, IQR, describes the spread of the middle 50% of the data.
Find the IQR by subtracting the first quartile from the third quartile.
IQR = Q3 – Q1

Box-and-Whisker Plots
A box-and-whisker plot is a graphical display showing
the relative position of the three quartiles as a box on a
number line.
It also shows the minimum and maximum values in the
data set and any outliers.
Example (labels on the plot, from left to right): lowest value that is not an outlier, 1st quartile, median, 3rd quartile, highest value that is not an outlier.
Example: Constructing a Box-and-Whiskers Plot

n = 15

Index point for Q1: i = (25/100)(15) = 3.75

Rounding up, Q1 is in the fourth position in the sorted data, so Q1 = 2.37

Similarly, we find
Q2 = 3.27
Q3 = 4.26

Example: Constructing a Box-and-Whiskers Plot

• Complete the box-and-whisker plot:

Min     Q1      Q2      Q3      Max     Outlier
0.59    2.37    3.27    4.26    5.97    11.31
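
The slides mark 11.31 as an outlier without stating the rule used. A common convention, assumed here, is that a value is an outlier if it lies more than 1.5 × IQR below Q1 or above Q3. The sketch below applies that convention to the quartiles from the example; the names `lower_fence`, `upper_fence`, and `is_outlier` are illustrative.

```python
q1, q3 = 2.37, 4.26               # quartiles from the example

iqr = q3 - q1                     # interquartile range
lower_fence = q1 - 1.5 * iqr      # assumed 1.5 * IQR convention
upper_fence = q3 + 1.5 * iqr

def is_outlier(x):
    """True if x falls outside the 1.5 * IQR fences."""
    return x < lower_fence or x > upper_fence

print(round(iqr, 2), round(lower_fence, 3), round(upper_fence, 3))
for value in (0.59, 5.97, 11.31):
    print(value, is_outlier(value))
```

Under this convention the upper fence is about 7.1, so 5.97 is the highest value that is not an outlier and 11.31 is plotted separately, which agrees with the labels in the example.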
