QBM 101 Business Statistics: Department of Business Studies Faculty of Business, Economics & Accounting HE LP University


SUBJECT OUTLINE:
• Module 1: Introduction; organizing and graphing data; numerical descriptive measures
• Module 2: Probability; discrete random variables; continuous random variables and the normal distribution
• Module 3: Sampling distributions; estimation; hypothesis testing
• Module 4: Simple linear regression
CHAPTER 3: NUMERICAL DESCRIPTIVE MEASURES

• 3.1 Measures of central tendency for ungrouped data
• 3.2 Measures of dispersion for ungrouped data
• 3.3 Mean, variance, and standard deviation for grouped data
• 3.4 Use of standard deviation
• 3.5 Measures of position
• 3.6 Box-and-whisker plot
Mean for population data:

$$\mu = \frac{\sum x}{N}$$

Mean for sample data:

$$\bar{x} = \frac{\sum x}{n}$$


Table 3.1 lists the total cash donations (rounded to millions of
dollars) given by eight U.S. companies during the year 2010.
Find the mean of cash donations made by these eight companies.

$$\sum x = 319 + 199 + 110 + 63 + 21 + 315 + 26 + 63 = 1116$$

$$\bar{x} = \frac{\sum x}{n} = \frac{1116}{8} = 139.5 = \$139.5 \text{ million}$$
Example 3-2: The following are the ages (in years) of all
eight employees of a small company:

53 32 61 27 39 44 49 57

Find the mean age of these employees.

The population mean is

$$\mu = \frac{\sum x}{N} = \frac{362}{8} = 45.25 \text{ years}$$

Thus, the mean age of all eight employees of this company is 45.25 years, or 45 years and 3 months.
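
As a quick check of Example 3-2, the following minimal Python sketch computes the population mean both directly from the formula and with the standard library. The variable names are illustrative only.

```python
from statistics import mean

# Ages (in years) of all eight employees from Example 3-2.
ages = [53, 32, 61, 27, 39, 44, 49, 57]

# Population mean: sum of all values divided by the number of values N.
mu_by_formula = sum(ages) / len(ages)

# statistics.mean computes the same arithmetic mean.
mu_by_library = mean(ages)

print(mu_by_formula, mu_by_library)
```

Both approaches should agree with the hand calculation of 362 / 8.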
Table 3.2 lists the total number of homes lost to foreclosure in
seven states during 2010.

Note that the number of homes foreclosed in California is very large compared to those in the other six states. Hence, it is an outlier.
Mean without the outlier:

$$\frac{49{,}723 + 20{,}352 + 10{,}824 + 40{,}911 + 18{,}038 + 61{,}848}{6} = \frac{201{,}696}{6} = 33{,}616$$

Mean with the outlier:

$$\frac{173{,}175 + 49{,}723 + 20{,}352 + 10{,}824 + 40{,}911 + 18{,}038 + 61{,}848}{7} = \frac{374{,}871}{7} = 53{,}553$$
When outliers exist, use median instead
of mean as a measure of central tendency.
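
To illustrate why the median is recommended when outliers exist, here is a small Python sketch that recomputes the foreclosure means above and compares them with the medians. It assumes the first value in the list (173,175) is the outlier, as stated in the text.

```python
from statistics import mean, median

# Homes lost to foreclosure in seven states during 2010 (values from the text).
# The first value, 173,175, is the outlier identified above.
foreclosures = [173_175, 49_723, 20_352, 10_824, 40_911, 18_038, 61_848]
without_outlier = foreclosures[1:]

mean_with = mean(foreclosures)           # 374,871 / 7
mean_without = mean(without_outlier)     # 201,696 / 6

median_with = median(foreclosures)       # middle of the 7 ranked values
median_without = median(without_outlier) # average of the 2 middle values of 6

print(mean_with, mean_without)
print(median_with, median_without)
```

Including the outlier raises the mean from 33,616 to 53,553, while the median moves only from 30,631.5 to 40,911.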
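MEDIAN
• The median is the value of the middle term in a data set that has been ranked in increasing order.

$$\text{Median} = \text{value of the } \left(\frac{n + 1}{2}\right)\text{th term in a ranked data set}$$

173,175 49,723 20,352 10,824 40,911 18,038 61,848

Find the median for these data.

Ranked in increasing order, the seven values are 10,824 18,038 20,352 40,911 49,723 61,848 173,175. The middle term is the (7 + 1)/2 = 4th term, so the median is 40,911.

When a data set has an even number of values, the median is the average of the two middle values. For the 12 compensation values listed in Example 3-22, the two middle values are 28.0 and 28.2:

$$\text{Median} = \frac{28.0 + 28.2}{2} = \frac{56.2}{2} = 28.1 = \$28.1 \text{ million}$$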
MEDIAN

• The median gives the center of a histogram, with half the data values to the left of the median and half to the right of the median.
• The advantage of using the median as a measure of central tendency is that it is not influenced by outliers.
• Consequently, the median is preferred over the mean as a measure of central tendency for data sets that contain outliers.
MODE
• The mode is the value that occurs with the highest frequency in a data set.
• Example: 77 82 74 81 79 84 74 78
  Mode = 74
• Advantage: It can be used for both qualitative (QL) and quantitative (QT) data, whereas the mean and median can be calculated only for quantitative data.
• Disadvantage: Depending on the nature of the data set, there may be no mode or more than one mode.
  - No mode: (1, 2, 3, 5, 7, 9)
  - Unimodal: a data set with only one mode. (1, 2, 2, 3, 4)
  - Bimodal: a data set with two modes. (1, 2, 2, 3, 3)
  - Multimodal: a data set with more than two modes. (1, 2, 2, 3, 3, 4, 4, 5)
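
The cases above can be sketched in Python as follows. This is a minimal illustration that assumes a non-empty data set; the function name `modes` is hypothetical, and it returns an empty list when every value occurs only once (the "no mode" case).

```python
from collections import Counter

def modes(data):
    """Return the sorted list of modes, or [] if no value repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs exactly once: the data set has no mode.
        return []
    # Keep every value that reaches the highest frequency.
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([77, 82, 74, 81, 79, 84, 74, 78]))  # unimodal example
print(modes([1, 2, 3, 5, 7, 9]))                # no mode
print(modes([1, 2, 2, 3, 3]))                   # bimodal
print(modes([1, 2, 2, 3, 3, 4, 4, 5]))          # multimodal
```

Because the function only counts occurrences, it also works for qualitative data such as a list of class standings.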
EXAMPLE 3-10

The statuses of five students who are members of the student senate at a college are senior, sophomore, senior, junior, and senior, respectively. Find the mode.

Because senior occurs more frequently than the other categories, it is the mode for this data set.

We cannot calculate the mean and median for this qualitative data set. (Not applicable for QL data)
RELATIONSHIPS AMONG THE MEAN, MEDIAN, AND MODE

• Symmetric (with one peak): mean = median = mode
• Skewed to the right / positively skewed: mode < median < mean
• Skewed to the left / negatively skewed: mean < median < mode
EXERCISE 1

3.10 The following data set belongs to a sample:

12, 4, -10, 8, 8, -13

Calculate the mean, median, and mode.

Range = Largest value – smallest value
= 267,277 – 49,651
= 217,626 square miles

Disadvantage: The range is influenced by outliers, and it takes into consideration only two values; all the other values are ignored.
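
The sensitivity of the range to outliers can be sketched with the foreclosure data used earlier in this chapter. This is an illustrative Python snippet; the helper name `value_range` is hypothetical.

```python
def value_range(data):
    """Range = largest value - smallest value."""
    return max(data) - min(data)

# Foreclosure data from earlier in the chapter; 173,175 is the outlier.
foreclosures = [173_175, 49_723, 20_352, 10_824, 40_911, 18_038, 61_848]

range_with_outlier = value_range(foreclosures)
range_without_outlier = value_range(foreclosures[1:])

print(range_with_outlier, range_without_outlier)
```

Dropping the single outlier shrinks the range from 162,351 to 51,024, which shows how strongly one extreme value drives this measure.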
VARIANCE AND STANDARD DEVIATION
• The standard deviation tells how closely the values of a data set are clustered around the mean.
• A lower value of the standard deviation means that the values of the data set are spread over a relatively smaller range around the mean.
• In contrast, a larger value of the standard deviation means that the values of the data set are spread over a relatively larger range around the mean.

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} \quad \text{and} \quad s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \quad \text{and} \quad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
The marks of 4 students from a class of 1000 students are given as follows:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{(-2)^2 + 11^2 + (-17)^2 + 8^2}{4 - 1}} = \sqrt{\frac{478}{3}} = 12.62$$
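SHORTCUT FORMULA

$$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} \quad \text{and} \quad s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

$$\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}} \quad \text{and} \quad s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{1{,}746{,}098 - \frac{(2{,}854)^2}{6}}{6 - 1} = \frac{1{,}746{,}098 - 1{,}357{,}552.667}{5} = 77{,}709.06666$$

$$s = \sqrt{77{,}709.06666} = 278.7634601$$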
EXAMPLE 3-13
Following are the 2011 earnings (in thousands of
dollars) before taxes for all six employees of a
small company.

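88.50 108.40 65.50 52.50 79.80 54.60

Calculate the variance and standard deviation for these data.

$$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \frac{35{,}978.51 - \frac{(449.30)^2}{6}}{6} = 388.90$$

Thus, the variance is 388.90 (thousand dollars)² = 388.90 × 1000² = 388,900,000 dollars squared.

$$\sigma = \sqrt{388.90} = \$19.721 \text{ thousand} = \$19{,}721$$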
EXERCISE 2

3.50 The following data give the number of hot dogs consumed by 10 participants in a hot-dog-eating contest.

21 17 32 5 25 15 17 21 9 24

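Calculate the range, variance, and standard deviation for these data.

Mean for grouped data:

$$\mu = \frac{\sum mf}{N} \quad \text{and} \quad \bar{x} = \frac{\sum mf}{n}$$

where m is the midpoint and f is the frequency of a class.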
Example 3-14: The table shows the daily commuting times (in minutes) from home to work for all 25 employees of a company.

$$\mu = \frac{\sum mf}{N} = \frac{535}{25} = 21.40 \text{ minutes}$$
EXAMPLE 3-15
The table gives the frequency distribution of the number
of orders received each day during the past 50 days at the
office of a mail-order company.
Calculate the mean.

$$\bar{x} = \frac{\sum mf}{n} = \frac{832}{50} = 16.64 \text{ orders}$$
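VARIANCE AND STANDARD DEVIATION

$$\sigma^2 = \frac{\sum f(m - \mu)^2}{N} \quad \text{and} \quad s^2 = \frac{\sum f(m - \bar{x})^2}{n - 1}$$

$$\sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N} \quad \text{and} \quad s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1}$$

$$\sigma = \sqrt{\frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N}} \quad \text{and} \quad s = \sqrt{\frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1}}$$
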

Example 3-16: The table gives the frequency distribution of the daily commuting times (in minutes) from home to work for all 25 employees of a company. Calculate the variance and standard deviation.

$$\sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N} = \frac{14{,}825 - \frac{(535)^2}{25}}{25} = \frac{3376}{25} = 135.04$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{135.04} = 11.62 \text{ minutes}$$


Example 3-17: The table gives the frequency distribution
of the number of orders received each day during the
past 50 days at the office of a mail-order company.
Calculate the variance and standard deviation.

$$s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1} = \frac{14{,}216 - \frac{(832)^2}{50}}{50 - 1} = 7.5820$$

$$s = \sqrt{s^2} = \sqrt{7.5820} = 2.75 \text{ orders}$$
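
The grouped-data formulas can be sketched in Python as below. The frequency tables for these examples are not reproduced in the text, so the class table in the code is an assumption: it was chosen only because it reproduces the totals quoted in Examples 3-14 and 3-16 (N = 25, Σmf = 535, Σm²f = 14,825). The function names are hypothetical.

```python
from math import sqrt

def variance_from_sums(sum_m2f, sum_mf, n, population=True):
    """Shortcut formula for grouped data; divisor is N (population) or n - 1 (sample)."""
    divisor = n if population else n - 1
    return (sum_m2f - sum_mf ** 2 / n) / divisor

def grouped_summary(classes, population=True):
    """classes: list of (lower limit, upper limit, frequency) tuples."""
    n = sum(f for _, _, f in classes)
    sum_mf = sum((lo + hi) / 2 * f for lo, hi, f in classes)          # sum of m*f
    sum_m2f = sum(((lo + hi) / 2) ** 2 * f for lo, hi, f in classes)  # sum of m^2*f
    variance = variance_from_sums(sum_m2f, sum_mf, n, population)
    return {"n": n, "sum_mf": sum_mf, "sum_m2f": sum_m2f,
            "mean": sum_mf / n, "variance": variance, "sd": sqrt(variance)}

# ASSUMED table (not shown in the text): commuting-time classes in minutes,
# picked to match the totals quoted in Examples 3-14 and 3-16.
commuting = [(0, 10, 4), (10, 20, 9), (20, 30, 6), (30, 40, 4), (40, 50, 2)]
summary = grouped_summary(commuting, population=True)
print(summary["mean"], summary["variance"], round(summary["sd"], 2))

# Example 3-17 uses sample data; only its quoted totals are needed here.
s2_orders = variance_from_sums(14_216, 832, 50, population=False)
print(round(s2_orders, 4), round(sqrt(s2_orders), 2))
```

Passing `population=False` switches the divisor to n - 1, which is the only difference between the population and sample versions of the shortcut formula.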
EXERCISE 3
3.68 The following table gives information on the
amounts (in dollars) of electric bills for August
2012 for a sample of 50 families. Find the
mean, variance, and standard deviation.

Amount (dollars)          Number of families
0 to less than 60         3
60 to less than 120       17
120 to less than 180      13
180 to less than 240      11
240 to less than 300      6
Chebyshev’s Theorem:
For any number k greater than 1, at least (1 – 1/k²) of
the data values lie within k standard deviations of the
mean. (applicable to any distribution)
EXAMPLE 3-18
The average systolic blood pressure for 4000 women who
were screened for high blood pressure was found to be 187
mm Hg with a standard deviation of 22. Using
Chebyshev’s theorem, find at least what percentage of
women in this group have a systolic blood pressure
between 143 and 231 mm Hg.
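Given μ = 187 and σ = 22. Each of the two points, 143 and 231, is 44 units away from the mean.

$$k = \frac{44}{22} = 2, \qquad 1 - \frac{1}{k^2} = 1 - \frac{1}{(2)^2} = 1 - \frac{1}{4} = 1 - .25 = .75 \text{ or } 75\%$$

Hence, at least 75% of the women in this group have a systolic blood pressure between 143 and 231 mm Hg.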
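
The steps of Example 3-18 can be sketched as a small Python function. This is a minimal illustration that assumes the interval is symmetric about the mean; the function name `chebyshev_lower_bound` is hypothetical.

```python
def chebyshev_lower_bound(mean, sd, low, high):
    """Minimum fraction of values in [low, high] by Chebyshev's theorem.

    Assumes the interval is symmetric about the mean and that k > 1.
    """
    distance = mean - low
    if abs((high - mean) - distance) > 1e-9:
        raise ValueError("interval must be symmetric about the mean")
    k = distance / sd            # number of standard deviations
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

# Example 3-18: mean 187, standard deviation 22, interval 143 to 231.
bound = chebyshev_lower_bound(187, 22, 143, 231)
print(f"at least {bound:.0%}")
```

The result is a lower bound that holds for any distribution, so the true percentage may be higher.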
EMPIRICAL RULE
For a bell-shaped distribution, approximately
1. 68% of the observations lie within one standard deviation of the mean
2. 95% of the observations lie within two standard deviations of the mean
3. 99.7% of the observations lie within three standard deviations of the mean
EXAMPLE 3-19

The age distribution of a sample of 5000 people is bell-shaped with a mean of 40 years and a standard deviation of 12 years. Determine the approximate percentage of people who are 16 to 64 years old.

Given x̄ = 40 and s = 12:

$$64 - 40 = 40 - 16 = 24$$

Each of the two points, 16 and 64, is 24 units away from the mean. Because the area within two standard deviations of the mean is approximately 95% for a bell-shaped curve, approximately 95% of the people in the sample are 16 to 64 years old.
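
Example 3-19 can be sketched in Python as follows. The lookup table simply encodes the empirical rule; the cross-check with `statistics.NormalDist` rests on the additional assumption that the ages are exactly normally distributed, which the example does not claim.

```python
from statistics import NormalDist

# Empirical rule: approximate percentage within k standard deviations.
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}

def sds_from_mean(value, mean, sd):
    """How many standard deviations a value lies from the mean."""
    return (value - mean) / sd

k_low = sds_from_mean(16, 40, 12)
k_high = sds_from_mean(64, 40, 12)
approx_pct = EMPIRICAL_RULE[int(abs(k_high))]

# Cross-check, ASSUMING an exactly normal distribution of ages.
age_model = NormalDist(mu=40, sigma=12)
normal_pct = 100 * (age_model.cdf(64) - age_model.cdf(16))

print(k_low, k_high, approx_pct, round(normal_pct, 2))
```

The normal-model figure is slightly above 95%, which is why the empirical rule is stated as an approximation.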
RANGE APPROXIMATION OF S.D.
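Range = largest value – smallest value

$$\text{Range} \approx 4\sigma \text{ (for population)} \;\Rightarrow\; \sigma \approx \frac{\text{Range}}{4}$$

$$\text{Range} \approx 4s \text{ (for sample)} \;\Rightarrow\; s \approx \frac{\text{Range}}{4}$$

Assume that the maximum weight of students in a class is 102 kg and the minimum weight is 32 kg. What is the approximate value of the standard deviation?

$$s \approx \frac{\text{Range}}{4} = \frac{102 - 32}{4} = \frac{70}{4} = 17.5 \text{ kg}$$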
COEFFICIENT OF VARIATION (CV)
The coefficient of variation (CV) is used to compare the variability of two different data sets that have different units of measurement. CV expresses standard deviation as a percentage of the mean. A low CV indicates low variation in the data set and, hence, higher consistency.

$$CV = \frac{\sigma}{\mu} \times 100\% \text{ (population)}$$

$$CV = \frac{s}{\bar{x}} \times 100\% \text{ (sample)}$$
3.57 The yearly salaries of all employees who work for a company have a mean of $62,350 and a standard deviation of $6820. The years of experience for the same employees have a mean of 15 years and a standard deviation of 2 years. Is the relative variation in the salaries larger or smaller than that in years of experience for these employees?

$$CV_{\text{salary}} = \frac{\sigma}{\mu} \times 100 = \frac{6820}{62350} \times 100 = 10.94\%$$

$$CV_{\text{experience}} = \frac{\sigma}{\mu} \times 100 = \frac{2}{15} \times 100 = 13.33\%$$

The relative variation in salaries is lower than that in years of experience.
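
The comparison in Exercise 3.57 can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """CV: standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv_salary = coefficient_of_variation(6820, 62350)   # dollars
cv_experience = coefficient_of_variation(2, 15)     # years

print(round(cv_salary, 2), round(cv_experience, 2))
print("salaries vary less" if cv_salary < cv_experience else "experience varies less")
```

Because the units cancel in the ratio, the two percentages can be compared directly even though one data set is in dollars and the other in years.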
EXERCISE 4
3.81 The mean monthly mortgage paid by all homeowners in a town is $2365 with a standard deviation of $340.

(a) Using Chebyshev’s theorem, find at least what percentage of all homeowners pay a monthly mortgage of:
(i) $1685 to $3045
(ii) $1345 to $3385

(b) Using Chebyshev’s theorem, find the interval that contains the monthly mortgage payments of at least 84% of all homeowners.
EXERCISE 4
3.84 The prices of all college textbooks follow a bell-shaped distribution with a mean of $180 and a standard deviation of $30.

(a) Using the empirical rule, find the percentage of all college textbooks with their prices between:
(i) $150 and $210
(ii) $120 and $240

(b) Using the empirical rule, find the interval that contains the prices of 99.7% of college textbooks.

(c) Using Chebyshev’s theorem, find the interval that contains the prices of at least 36% of college textbooks.
EXERCISE 4
3.58 The SAT scores of 100 students have a mean of 1020 and a standard deviation of 115. The GPAs of the same 100 students have a mean of 3.21 and a standard deviation of .26. Is the relative variation in SAT scores larger or smaller than that in GPAs? Which measurement has a higher consistency?
• Quartiles

IQR = Interquartile range = Q3 – Q1

EXAMPLE 3-20
Find the three quartiles and the IQR.
IQR = Q3 – Q1 = 51.5 – 24.05 = $27.45 million
EXAMPLE 3-21

The following are the ages (in years) of nine employees of an insurance company:

47 28 39 51 33 37 59 24 33

(a) Find the values of the three quartiles. Where does the age of 28 years fall in relation to the ages of the employees?
(b) Find the interquartile range.

Ranked data: 24 28 33 33 37 39 47 51 59

Q1 = (28 + 33) / 2 = 30.5
Q2 = 37
Q3 = (47 + 51) / 2 = 49

The age of 28 falls in the lowest 25% of the ages.

IQR = Q3 – Q1
= 49 – 30.5
= 18.5 years
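
The quartile method used in Example 3-21 (median of the whole ranked data set, then medians of the lower and upper halves, excluding the middle value when n is odd) can be sketched in Python as follows. The function name `quartiles` is hypothetical. Note that library routines such as `statistics.quantiles` use interpolation rules that can give different values for some data sets.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]
    # For odd n the middle value belongs to neither half.
    upper = ranked[half + 1:] if n % 2 else ranked[half:]
    return median(lower), median(ranked), median(upper)

# Ages of the nine employees from Example 3-21.
ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]
q1, q2, q3 = quartiles(ages)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Applied to the 12 compensation values of Example 3-20, the same function should reproduce Q1 = 24.05 and Q3 = 51.5, up to floating-point rounding.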
PERCENTILES AND PERCENTILE RANK

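$$P_k = \text{value of the } \left(\frac{kn}{100}\right)\text{th term in a ranked data set}$$

where k is the number of the percentile and n is the number of values in the data set.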
EXAMPLE 3-22

The data arranged in increasing order are as follows:

21.6 21.7 22.9 25.2 26.5 28.0
28.2 32.6 32.9 70.1 76.1 84.5

The position of the 60th percentile is

$$\frac{kn}{100} = \frac{(60)(12)}{100} = 7.20\text{th term} \approx 7\text{th term}$$

- Here n = 12 because there are 12 values.
- The value at the 7th position is 28.2.

P60 = 60th percentile = 28.2 = $28.2 million
$$\text{Percentile rank of } x_i = \frac{\text{Number of values less than } x_i}{\text{Total number of values in the data set}} \times 100$$

Find the percentile rank for $26.5 million. Give a brief interpretation of this percentile rank.

21.6 21.7 22.9 25.2 26.5 28.0
28.2 32.6 32.9 70.1 76.1 84.5

In this data set, 4 of the 12 values are less than $26.5 million. Hence,

$$\text{Percentile rank of } 26.5 = \frac{4}{12} \times 100 = 33.33\%$$

Approximately 33.33% of these 12 CEOs had 2010 total compensations of less than $26.5 million. Hence, 66.67% of these 12 CEOs had $26.5 million or higher total compensations in 2010.
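
Both calculations can be sketched in Python. The function names are hypothetical, and rounding the position kn/100 to the nearest whole term (as in 7.20 ≈ 7) is an assumption about how the approximation is meant to work in general.

```python
def percentile(data, k):
    """Approximate k-th percentile: value of the (kn/100)th ranked term."""
    ranked = sorted(data)
    n = len(ranked)
    position = k * n / 100
    # Round to the nearest whole term and keep it inside 1..n.
    index = min(max(int(position + 0.5), 1), n)
    return ranked[index - 1]

def percentile_rank(data, x):
    """Percentage of values in the data set that are less than x."""
    below = sum(1 for value in data if value < x)
    return below / len(data) * 100

# 2010 total compensations (millions of dollars) of the 12 CEOs.
compensations = [21.6, 21.7, 22.9, 25.2, 26.5, 28.0,
                 28.2, 32.6, 32.9, 70.1, 76.1, 84.5]

p60 = percentile(compensations, 60)
rank_26_5 = percentile_rank(compensations, 26.5)
print(p60, round(rank_26_5, 2))
```

The percentile rank counts only values strictly less than the given value, matching the definition above.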
EXERCISE 5
3.90 The following data give the weights (in pounds)
lost by 15 members of a health club at the end of 2
months after joining the club.

4 10 8 7 24
12 5 13 11 10
20 9 8 9 18
(a) Compute the values of the three quartiles and the interquartile range.
(b) Calculate the (approximate) value of the 82nd percentile.
(c) Find the percentile rank of 10.
A box-and-whisker plot is a plot that shows the center, spread, and skewness of a data set. It is constructed by drawing a box and two whiskers that use the median, the first quartile, the third quartile, and the smallest and the largest values in the data set between the lower and the upper inner fences.
BOX-AND-WHISKER PLOT: STEPS
(i) Rank the data in increasing order and calculate the three quartiles and the IQR.
(ii) Calculate the lower inner fence (L.I.F.) = Q1 – 1.5 × IQR and the upper inner fence (U.I.F.) = Q3 + 1.5 × IQR.
(iii) Determine the smallest and largest values in the given data set within the two inner fences.
(iv) Draw a horizontal line and mark the data on it such that all values are covered. Above the horizontal line, draw a box with three vertical lines indicating the three quartiles.
(v) Draw two lines (whiskers) that join the box with the smallest and largest values within the two inner fences found in step (iii). A value that falls outside the two inner fences is an outlier, marked by an asterisk.
* Five-number summary: minimum, first quartile, median, third quartile, maximum
EXAMPLE 3-24

The following data are the incomes (in thousands of dollars) for a sample of 12 households.

75 69 84 112 74 104 81 90 94 144 79 98

Construct a box-and-whisker plot for these data.

Ranked data:
69 74 75 79 81 84 90 94 98 104 112 144
Median = (84 + 90) / 2 = 87
Q1 = (75 + 79) / 2 = 77
Q3 = (98 + 104) / 2 = 101
IQR = Q3 – Q1 = 101 – 77 = 24
1.5 x IQR = 1.5 x 24 = 36
Lower inner fence = Q1 – 36 = 77 – 36 = 41
Upper inner fence = Q3 + 36 = 101 + 36 = 137
Smallest value within the two inner fences = 69
Largest value within the two inner fences = 112
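
The numbers needed for the box-and-whisker plot in Example 3-24 can be computed with the following Python sketch. It covers steps (i) to (iii) only and does not draw the plot; the function name `box_plot_summary` is hypothetical, and quartiles are taken as medians of the lower and upper halves of the ranked data.

```python
from statistics import median

def box_plot_summary(data):
    """Quartiles, inner fences, whisker ends, and outliers for a box plot."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    q1 = median(ranked[:half])
    q2 = median(ranked)
    q3 = median(ranked[half + 1:] if n % 2 else ranked[half:])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the extreme values that lie within the inner fences.
    inside = [x for x in ranked if lower_fence <= x <= upper_fence]
    outliers = [x for x in ranked if x < lower_fence or x > upper_fence]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr,
            "lower_fence": lower_fence, "upper_fence": upper_fence,
            "whisker_low": inside[0], "whisker_high": inside[-1],
            "outliers": outliers}

# Incomes (in thousands of dollars) of the 12 households in Example 3-24.
incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
summary = box_plot_summary(incomes)
print(summary)
```

For these data the value 144 lies above the upper inner fence of 137, so it would be marked as an outlier rather than joined to the box by a whisker.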
EXERCISE 6
3.108 The following data give the numbers of new cars sold at a dealership during a 20-day period.

3 3 4 5 5 6 7 7 8 8
8 9 9 10 10 11 11 12 12 16

Make a box-and-whisker plot. Comment on the skewness of these data.
EXCEL
SUMMARY
• Measures of central tendency (mean, mode, median) and dispersion (range, standard deviation, variance) for grouped and ungrouped data
• Skewness of a data set
• Chebyshev’s theorem and empirical rule
• Coefficient of variation (CV) and range approximation of SD
• Measures of position (quartiles and percentiles)
• Box-and-whisker plot
