Business Statistics: Measures of Central Tendency


Business Statistics

By

Dr. Yousaf Ali Khan

Measures of Central Tendency


Shape – Center – Spread
• When we gather data, we want to uncover the
“information” in it. One easy way to do that is
to think of: “Shape – Center – Spread”
• Shape – What is the shape of the histogram?
• Center – What is the mean or median?
• Spread – What is the range or standard
deviation?
Measures of Central Tendency
• Measures of central tendency are statistical measures
which describe the position of a distribution.
• They are also called statistics of location, and are the
complement of statistics of dispersion, which provide
information concerning the variability or spread of
observations.
• In the univariate context, the mean, median and mode are
the most commonly used measures of central tendency.
• They are computable values that describe the
behavior of the center of a distribution.
Three types of measures of central
tendency
❖There are three common measures of central
tendency:

• Mean (average)
• Median (middle)
• Mode (most)
The Mean
The mean (arithmetic mean or average) of a set of data is
found by adding up all the items and then dividing by the
number of items.
The mean of a sample is denoted by x̄ (read “x bar”).
The mean of a complete population is denoted by μ (the
lower case Greek letter mu).
The mean of n data items x1, x2,…, xn, is given by the formula

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

or

$$\bar{x} = \frac{\sum x}{n}$$
Example
Ten students were polled as to the number of siblings in their
individual families.
The raw data is the following set: {3, 2, 2, 1, 3, 6, 3, 3, 4, 2}.
Find the mean number of siblings for the ten students.

Mean = (3 + 2 + 2 + 1 + 3 + 6 + 3 + 3 + 4 + 2)/10 = 29/10 = 2.9 siblings
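The calculation can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the final line cross-checks the hand computation against the standard library's `statistics.mean`.

```python
from statistics import mean

# Sibling counts for the ten students in the example.
siblings = [3, 2, 2, 1, 3, 6, 3, 3, 4, 2]

total = sum(siblings)   # add up all the items
n = len(siblings)       # number of items
x_bar = total / n       # sample mean

print(f"sum = {total}, n = {n}, mean = {x_bar}")

# Cross-check against the standard library implementation.
print(mean(siblings))
```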
Weighted Mean
• The weighted mean of n numbers x1, x2,…, xn, that are
weighted by the respective factors f1, f2,…, fn is given by the
formula:

$$w = \frac{\sum (x \cdot f)}{\sum f}$$

• Example
Listed below are the grades for a student's semester courses. Calculate the
Grade Point Average (GPA).

Course    Grade   Points (x)   Credits (f)   x·f
Math      A       4            5             20
History   B       3            3             9
Health    A       4            2             8
Art       C       2            2             4
Total                          12            41

GPA = 41/12 ≈ 3.42
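A short Python sketch of the weighted-mean formula, applied to the GPA table. The helper name `weighted_mean` and the tuple layout are assumptions made for illustration; the points and credits come from the table.

```python
# (course, grade points x, credits f) taken from the GPA table.
courses = [
    ("Math", 4, 5),
    ("History", 3, 3),
    ("Health", 4, 2),
    ("Art", 2, 2),
]


def weighted_mean(pairs):
    """Return sum(x * f) / sum(f) for an iterable of (x, f) pairs."""
    pairs = list(pairs)
    total_weight = sum(f for _, f in pairs)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * f for x, f in pairs) / total_weight


gpa = weighted_mean((x, f) for _, x, f in courses)
print(f"GPA = {gpa:.2f}")
```

Each grade counts in proportion to its credits, which is why the 5-credit A in Math pulls the GPA above the plain average of the four grade points (3.25).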
Median
• Another measure of central tendency is the median.
• This measure divides a group of numbers into two parts, with
half the numbers below the median and half above it.
• The median is not as sensitive to extreme values as the mean.
• To find the median of a group of items:
• 1. Rank the items.
• 2. If the number of items is odd, the median is the
middle item in the list.
• 3. If the number of items is even, the median is the mean of the
two middle numbers.
Example
• Ten students in a math class were polled as to the number of siblings in
their individual families and the results were:
3, 2, 2, 1, 1, 6, 3, 3, 4, 2.
Find the median number of siblings for the ten students.

Position of the median: 10/2 = 5


Between the 5th and 6th values

Data in order: 1, 1, 2, 2, 2, 3, 3, 3, 4, 6

Median = (2+3)/2 = 2.5 siblings


Median
Example:
Nine students in a math class were polled as to the number of
siblings in their individual families and the results were:
3, 2, 2, 1, 6, 3, 3, 4, 2.
Find the median number of siblings for the nine students.
Position of the median: 9/2 = 4.5, rounded up to
the 5th value

In order: 1, 2, 2, 2, 3, 3, 3, 4, 6

Median = 3 siblings
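The ranking procedure for odd and even sample sizes can be sketched in Python as follows. The function name `median_by_ranking` is illustrative; the standard library's `statistics.median` is used only as a cross-check.

```python
from statistics import median


def median_by_ranking(values):
    """Rank the items, then take the middle item (odd n)
    or the mean of the two middle items (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # middle item
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of two middle items


ten_students = [3, 2, 2, 1, 1, 6, 3, 3, 4, 2]
nine_students = [3, 2, 2, 1, 6, 3, 3, 4, 2]

median_ten = median_by_ranking(ten_students)
median_nine = median_by_ranking(nine_students)
print(median_ten, median_nine)

# Cross-check with the standard library.
print(median(ten_students), median(nine_students))
```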
Median in a Frequency Distribution
Example:
Find the median for the distribution.
Value (x) 1 2 3 4 5
Frequency (f) 4 3 2 6 8

Position of the median is the sum of the frequencies divided by 2.

$$\text{Position of the median} = \frac{\sum f}{2} = \frac{23}{2} = 11.5 \rightarrow 12\text{th term}$$

Accumulate the frequencies, starting from either end, until the running total reaches 12.
The 12th term is the median and its value is 4.
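The cumulative-frequency search can be sketched in Python. The helper names `term_at` and `median_from_frequencies` are assumptions for illustration, and the sketch assumes the values are listed in ascending order. For an even total it averages the two middle terms, which the example does not need because its total (23) is odd.

```python
from itertools import accumulate

values = [1, 2, 3, 4, 5]
frequencies = [4, 3, 2, 6, 8]


def term_at(position, values, frequencies):
    """Value of the position-th term (1-based) when the data are listed in order."""
    for value, cumulative in zip(values, accumulate(frequencies)):
        if position <= cumulative:
            return value
    raise ValueError("position exceeds the total frequency")


def median_from_frequencies(values, frequencies):
    """Median of a frequency table whose values are in ascending order."""
    total = sum(frequencies)
    if total % 2 == 1:
        return term_at((total + 1) // 2, values, frequencies)
    lower = term_at(total // 2, values, frequencies)
    upper = term_at(total // 2 + 1, values, frequencies)
    return (lower + upper) / 2


print(list(accumulate(frequencies)))  # running totals of the frequencies
median_value = median_from_frequencies(values, frequencies)
print(median_value)
```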
Mode
The mode of a data set is the value that occurs the most often.
If a distribution has two modes, then it is called bimodal.
In a large distribution, this term is commonly applied even
when the two modes do not have exactly the same frequency.
Example:
Ten students in a math class were polled as to the number of
siblings in their individual families and the results were: 3, 2,
2, 1, 3, 6, 3, 3, 4, 2. Find the mode for the number of siblings.

The value 3 occurs four times, more often than any other value.
The mode for the number of siblings is 3.
Mode in a Frequency Distribution
Example:
Find the mode for the distribution.
Value (x) 1 2 3 4 5
Frequency (f) 4 3 2 6 8

The mode in a frequency distribution is the value that has the largest frequency.
The mode for this frequency distribution is 5, as it occurs eight times.
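Both mode examples can be reproduced with a small Python sketch. The variable names are illustrative. The raw data are tallied with `collections.Counter`, while the frequency table is already a tally and only needs its largest frequency located.

```python
from collections import Counter

# Raw data: tally the values, then keep every value with the highest count.
siblings = [3, 2, 2, 1, 3, 6, 3, 3, 4, 2]
counts = Counter(siblings)
highest = max(counts.values())
modes = sorted(value for value, count in counts.items() if count == highest)
print(modes, highest)

# Frequency distribution: value -> frequency, as in the table.
frequency_table = {1: 4, 2: 3, 3: 2, 4: 6, 5: 8}
table_mode = max(frequency_table, key=frequency_table.get)
print(table_mode, frequency_table[table_mode])
```

Returning a list of modes rather than a single value means a bimodal data set would simply yield two entries.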
Geometric Mean
• The geometric mean is calculated by finding the nth root
of the product of n values.
• It is often used in analyzing growth rates in financial data
(where using the arithmetic mean will provide misleading
results).
• It should be applied anytime you want to determine the
mean rate of change over several successive periods (be
it years, quarters, weeks, . . .).
• Other common applications include changes in
populations of species, crop yields, pollution levels, and
birth and death rates.
Geometric Mean

$$\bar{x}_g = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} = [(x_1)(x_2)\cdots(x_n)]^{1/n}$$

• Example: Rate of Return


Period Return (%) Growth Factor
1 -6.0 0.940
2 -8.0 0.920
3 -4.0 0.960
4 2.0 1.020
5 5.4 1.054

$$\bar{x}_g = \sqrt[5]{(.94)(.92)(.96)(1.02)(1.054)} = [.89254]^{1/5} = .97752$$

Average growth rate per period = (.97752 − 1)(100%) = −2.248%
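The rate-of-return example can be checked with a short Python sketch. Variable names are illustrative, and `math.prod` requires Python 3.8 or later. The arithmetic mean of the returns is printed for comparison only.

```python
import math

# Period returns in percent, from the table.
returns_pct = [-6.0, -8.0, -4.0, 2.0, 5.4]

# Growth factor = 1 + return expressed as a fraction.
growth_factors = [1 + r / 100 for r in returns_pct]

product = math.prod(growth_factors)
geo_mean = product ** (1 / len(growth_factors))   # nth root of the product
avg_rate_pct = (geo_mean - 1) * 100               # mean rate of change per period

print(f"product = {product:.5f}")
print(f"geometric mean = {geo_mean:.5f}")
print(f"average growth rate = {avg_rate_pct:.3f}%")

# For comparison: the arithmetic mean of the returns.
print(f"arithmetic mean of returns = {sum(returns_pct) / len(returns_pct):.3f}%")
```

The arithmetic mean of the returns (−2.12%) understates the compounded loss, which is the kind of misleading result the geometric mean avoids.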
Shape: The “shape” of the data is
called its “distribution”
• If mean = median = mode, the shape of the distribution is symmetric.
• If mode < median < mean, the distribution trails to the right, i.e. it is
positively skewed.
• If mean < median < mode, the distribution trails to the left, i.e. it is
negatively skewed.
• Distributions of various “shapes” have different properties and names such
as the “normal” distribution, which is also known as the “bell curve”
(among mathematicians it is called the Gaussian Distribution).

[Figure: three distribution shapes. Left-skewed: Mean < Median < Mode. Symmetric: Mean = Median = Mode. Right-skewed: Mode < Median < Mean.]
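The ordering rules for mean, median and mode can be turned into a small Python check. This is a rough sketch only: the function name `describe_shape` and the three data sets are hypothetical, chosen so that each rule applies cleanly. Real data often fit none of the three orderings exactly, in which case the sketch reports that the rule is inconclusive.

```python
from statistics import mean, median, mode


def describe_shape(data):
    """Apply the mean/median/mode ordering rules to one data set."""
    m, med, mo = mean(data), median(data), mode(data)
    if m == med == mo:
        return "symmetric"
    if mo < med < m:
        return "positively skewed (trails to the right)"
    if m < med < mo:
        return "negatively skewed (trails to the left)"
    return "inconclusive under this rule"


# Hypothetical data sets, made up for illustration.
symmetric_data = [1, 2, 3, 3, 3, 4, 5]     # mean 3, median 3, mode 3
right_tailed = [1, 2, 2, 3, 4, 5, 11]      # mode 2 < median 3 < mean 4
left_tailed = [1, 7, 8, 9, 10, 10, 11]     # mean 8 < median 9 < mode 10

for data in (symmetric_data, right_tailed, left_tailed):
    print(data, "->", describe_shape(data))
```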
Measures of Dispersion
• Measures of dispersion characterise how spread out the
distribution is, i.e., how variable the data are.
• Commonly used measures of dispersion include:
1. Range
2. Variance & Standard deviation
3. Coefficient of Variation (or relative standard deviation)
4. Inter-quartile range
Range
• The sample range is the difference between the largest and
smallest observations in the sample.
• Advantage: easy to calculate
– Blood pressure example: min=113 and max=170, so
the range=57 mmHg
• Advantage: useful for “best” or “worst” case scenarios
• Disadvantage: sensitive to extreme values
• Sometimes range is reported as an interval, anchored between
the smallest and largest data value, rather than the actual width
of that interval
Percentiles
• A percentile provides information about how the data are spread
over the interval from the smallest value to the largest value.
• The pth percentile of a data set is a value such that at least p percent of the items
take on this value or less and at least (100 - p) percent of the items take on this
value or more.

➢ Colleges and universities frequently report admission test scores in terms of
percentiles.
For instance, suppose an applicant obtains a raw score of 54 on the verbal portion of
an admission test. How this student performed in relation to other students taking the
same test may not be readily apparent. However, if the raw score of 54 corresponds to
the 70th percentile, we know that approximately 70% of the students scored lower
than this individual and approximately 30% of the students scored higher than this
individual.
Percentiles
• Arrange the data in ascending order.
• Compute Lp, the location of the pth percentile.

Lp = (p/100)(n + 1)
80th Percentile
• Example: Apartment Rents
Lp = (p/100)(n + 1) = (80/100)(70 + 1) = 56.8
(the 56th value plus .8 times the
difference between the 57th and 56th values)
80th Percentile = 635 + .8(649 – 635) = 646.2
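The location rule and the interpolation step can be sketched in Python. The 70 apartment rents are not listed in the example, so the sketch uses a small hypothetical data set. The function name `percentile` and the clamping at the ends of the data are assumptions made for illustration. The last lines repeat the arithmetic of the 80th-percentile example using the two rents it quotes.

```python
def percentile(data, p):
    """pth percentile using the location rule Lp = (p/100)(n + 1),
    interpolating linearly between the two neighbouring ranked values."""
    ordered = sorted(data)
    n = len(ordered)
    location = (p / 100) * (n + 1)
    lower_rank = int(location)          # integer part of the location
    fraction = location - lower_rank    # fractional part of the location
    # Assumption: clamp locations that fall outside ranks 1..n.
    if lower_rank < 1:
        return ordered[0]
    if lower_rank >= n:
        return ordered[-1]
    lower = ordered[lower_rank - 1]     # ranks are 1-based, lists 0-based
    upper = ordered[lower_rank]
    return lower + fraction * (upper - lower)


# Hypothetical rents, made up for illustration (not the 70 rents in the example).
hypothetical_rents = [425, 430, 435, 440, 445, 450, 465, 480, 500]
print(percentile(hypothetical_rents, 25))
print(percentile(hypothetical_rents, 80))

# The example's own interpolation: 56th value 635, 57th value 649, fraction .8.
slide_80th = 635 + 0.8 * (649 - 635)
print(round(slide_80th, 1))
```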
Quartiles
• Quartiles are specific percentiles.
• First Quartile = 25th Percentile
• Second Quartile = 50th Percentile = Median
• Third Quartile = 75th Percentile
Inter-quartile range
• The Median divides a distribution into two halves.

• The first and third quartiles (denoted Q1 and Q3) are
defined as follows:
– 25% of the data lie below Q1 (and 75% is above Q1),
– 25% of the data lie above Q3 (and 75% is below Q3)
• The inter-quartile range (IQR) is the difference between
the third and first quartiles, i.e.
IQR = Q3 − Q1
Inter-quartile range
An alternative definition of Q1 and Q3 is based on Q1
having a rank position = 0.25(n+1) and Q3 having
rank position = 0.75(n+1), where n is the sample
size.

If n=10, then Q1 would have rank position =
0.25 × 11 = 2.75 and Q3 has rank position = 0.75 × 11 = 8.25.
Therefore Q1 is found by interpolating between the
second and third observations and Q3 is found by
interpolating between observations 8 and 9.
Example
The ordered blood pressure data is:
113 124 124 132 146 151 170

Q1 = 124, Q3 = 151

Inter Quartile Range (IQR) is 151-124 = 27
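A Python sketch of the rank-position definition, applied to the blood pressure data. The helper name `value_at_rank` is an assumption for illustration. With n = 7 the rank positions 0.25(n+1) = 2 and 0.75(n+1) = 6 are whole numbers, so no interpolation is needed here, but the helper interpolates when the rank is fractional.

```python
def value_at_rank(ordered, rank):
    """Value at a (possibly fractional) 1-based rank in ordered data."""
    n = len(ordered)
    if not 1 <= rank <= n:
        raise ValueError("rank must lie between 1 and n")
    lower = int(rank)
    fraction = rank - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Interpolate between the neighbouring observations.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


blood_pressure = sorted([113, 124, 124, 132, 146, 151, 170])
n = len(blood_pressure)

q1 = value_at_rank(blood_pressure, 0.25 * (n + 1))   # rank position 2
q3 = value_at_rank(blood_pressure, 0.75 * (n + 1))   # rank position 6
iqr = q3 - q1
print(q1, q3, iqr)
```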


Box-plots
• A box-plot is a visual description of the
distribution based on
– Minimum
– Q1
– Median
– Q3
– Maximum
• Useful for comparing large sets of data
Example
The pulse rates of 12 individuals arranged in
increasing order are:
62, 64, 68, 70, 70, 74, 74, 76, 76, 78, 78, 80

Q1 = (68+70)/2 = 69, Q3 = (76+78)/2 = 77

IQR = (77 – 69) = 8


[Figure: box-plot of the pulse rates]
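The five values behind a box-plot can be computed with a short Python sketch. The pulse-rate example takes Q1 and Q3 as the medians of the lower and upper halves of the data, and the sketch follows that approach. The function name `five_number_summary` is illustrative, and leaving the middle observation out of both halves when n is odd is an assumption, since the example only covers an even n.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum), with Q1 and Q3 taken
    as the medians of the lower and upper halves of the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = ordered[:half]
    # Assumption: for odd n, the middle observation belongs to neither half.
    upper_half = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


pulse = [62, 64, 68, 70, 70, 74, 74, 76, 76, 78, 78, 80]
minimum, q1, med, q3, maximum = five_number_summary(pulse)
print(minimum, q1, med, q3, maximum)
print("IQR =", q3 - q1)
```

The 0.25(n+1) rank rule would place Q1 at rank 3.25 for these data, so the two quartile definitions need not give identical values.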
Sample and Population Variance
• The sample variance, s², is essentially the average of
the squared deviations from the sample mean (the sum is
divided by n − 1 rather than n).
• Variation About the Mean:
• For the Population:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

• For Sample:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

• Variance is one of the most frequently used
measures of spread.
Standard Deviation
• The sample standard deviation, s, is the
square-root of the variance

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

for a population $\sigma = \sqrt{\sigma^2}$

◼ s has the advantage of being in the same units as the
original variable x
Example
Data          Deviation     Deviation²
151           13.86         192.02
124           -13.14        172.73
132           -5.14         26.45
170           32.86         1079.59
146           8.86          78.45
124           -13.14        172.73
113           -24.14        582.88
Sum = 960.0   Sum = 0.00    Sum = 2304.86
Example (contd.)

$$\sum_{i=1}^{7} (x_i - \bar{x})^2 = 2304.86$$

Therefore,

$$s = \sqrt{\frac{2304.86}{7 - 1}} = 19.6$$
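The deviation table and the resulting standard deviation can be reproduced with a short Python sketch. Variable names are illustrative, and the standard library's `statistics.stdev` and `statistics.variance` serve only as a cross-check.

```python
import math
from statistics import stdev, variance

blood_pressure = [151, 124, 132, 170, 146, 124, 113]

n = len(blood_pressure)
x_bar = sum(blood_pressure) / n

# Deviation and squared deviation of each observation from the sample mean.
deviations = [x - x_bar for x in blood_pressure]
squared_deviations = [d ** 2 for d in deviations]

sum_of_squares = sum(squared_deviations)
s_squared = sum_of_squares / (n - 1)   # sample variance
s = math.sqrt(s_squared)               # sample standard deviation

print(f"mean = {x_bar:.2f}")
print(f"sum of squared deviations = {sum_of_squares:.2f}")
print(f"s^2 = {s_squared:.2f}, s = {s:.1f}")

# Cross-check with the standard library (both use the n - 1 divisor).
print(variance(blood_pressure), stdev(blood_pressure))
```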
Coefficient of Variation
• The coefficient of variation (CV) or relative standard
deviation (RSD) is the sample standard deviation
expressed as a percentage of the mean, i.e.

$$CV = \left(\frac{s}{\bar{x}}\right) \times 100\%$$

• The CV is not affected by multiplicative changes in scale
• Consequently, it is a useful way of comparing the dispersion
of variables measured on different scales
Example
The CV of the last example is:

$$CV = 100 \times \left(\frac{19.6}{137.1}\right)\% = 14.3\%$$

i.e., the standard deviation is 14.3% as large as
the mean.
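A Python sketch of the CV for the blood pressure data, together with a check of its scale invariance. The function name `coefficient_of_variation` and the factor of 10 used to rescale the data are hypothetical choices for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(data) / mean(data) * 100


blood_pressure = [151, 124, 132, 170, 146, 124, 113]
cv = coefficient_of_variation(blood_pressure)
print(f"CV = {cv:.1f}%")

# Hypothetical change of units: multiply every observation by 10.
rescaled = [10 * x for x in blood_pressure]
cv_rescaled = coefficient_of_variation(rescaled)
print(f"CV after rescaling = {cv_rescaled:.1f}%")
```

Multiplying the data by a constant scales the standard deviation and the mean by the same factor, so their ratio is unchanged.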
Measures of Association Between Two Variables
• Thus far we have examined numerical methods used to summarize
the data for one variable at a time.
• Often a manager or decision maker is interested in the relationship
between two variables.
• Two descriptive measures of the relationship between two
variables are covariance and correlation coefficient.
Covariance
• The covariance is a measure of the linear association between
two variables.
• Positive values indicate a positive relationship.
• Negative values indicate a negative relationship.

• The covariance is computed as follows:

For samples:

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

For populations:

$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$
Correlation Coefficient
• Correlation is a measure of linear association and not
necessarily causation.
• Just because two variables are highly correlated, it does not
mean that one variable is the cause of the other.

• The correlation coefficient is computed as follows:

For samples:

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

For populations:

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
Correlation Coefficient
• The coefficient can take on values between -1 and +1.
• Values near -1 indicate a strong negative linear relationship.
• Values near +1 indicate a strong positive linear relationship.
• The closer the correlation is to zero, the weaker the relationship.
Covariance and Correlation Coefficient
• Example: Golfing Study
A golfer is interested in investigating the relationship, if any,
between driving distance and 18-hole score.
Average Driving Distance (yds.)   Average 18-Hole Score
277.6                             69
259.5                             71
269.1                             70
267.0                             70
255.6                             71
272.9                             69
Covariance and Correlation Coefficient
• Example: Golfing Study
x          y       (x_i − x̄)   (y_i − ȳ)   (x_i − x̄)(y_i − ȳ)
277.6      69      10.65        -1.0         -10.65
259.5      71      -7.45        1.0          -7.45
269.1      70      2.15         0            0
267.0      70      0.05         0            0
255.6      71      -11.35       1.0          -11.35
272.9      69      5.95         -1.0         -5.95
Average    267.0   70.0                      Total -35.40
Std. Dev.  8.2192  .8944
Covariance and Correlation Coefficient
• Example: Golfing Study
• Sample Covariance

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$$

• Sample Correlation Coefficient

$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631$$
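The golfing study figures can be reproduced with a small Python sketch that follows the sample formulas directly. The function names are illustrative assumptions, and the data are the six pairs from the table.

```python
import math

distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]   # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                         # average 18-hole score


def sample_covariance(x, y):
    """Sum of products of deviations, divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def sample_std(x):
    """Sample standard deviation: the covariance of x with itself, square-rooted."""
    return math.sqrt(sample_covariance(x, x))


def sample_correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


s_xy = sample_covariance(distance, score)
r_xy = sample_correlation(distance, score)
print(f"s_x = {sample_std(distance):.4f}, s_y = {sample_std(score):.4f}")
print(f"s_xy = {s_xy:.2f}, r_xy = {r_xy:.4f}")
```

The covariance depends on the units of both variables, while dividing by the two standard deviations gives a unit-free value between −1 and +1.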
