
Business Statistics: Shalabh Singh Room No: 231 Shalabhsingh@iim Raipur - Ac.in

The mean net worth of the 20 richest individuals in 2007 is $26.9 billion.

Shalabh Singh

Room No: 231


shalabhsingh@iimraipur.ac.in

Business Statistics
Quantitative Analysis for Management - I

• Probability and Statistics


– Introduction
– Descriptive statistics
– Analyzing real-world datasets
– Probability, Random variable, Conditional probability
– Probability distributions
– Correlation
– Introduction to Regression Analysis
– Introduction to Sampling, Sampling distributions
– Hypothesis testing
– Time series and forecasting

• Evaluation Scheme
Class Participation: 20%
Quiz/Assignment: 20%
Mid-Term Exam: 30%
End-Term Exam: 30%
Finance

Examples of statistics in practice

• Financial analysts use a variety of statistical information to guide their investment recommendations.
• For example, Barron’s (February 18, 2008) reported that the average dividend yield for the 30 stocks in the Dow Jones Industrial Average was 2.45%. Altria Group showed a dividend yield of 3.05%. In this case, the statistical information on dividend yield indicates a higher dividend yield for Altria Group than the average for the Dow Jones stocks.
Marketing
Examples of statistics in practice

Electronic scanners at retail checkout counters collect data for a variety of marketing research applications. For example, data suppliers such as ACNielsen and Information Resources, Inc., purchase point-of-sale scanner data from grocery stores, process the data, and then sell statistical summaries of the data to manufacturers. Manufacturers spend hundreds of thousands of dollars per product category to obtain this type of scanner data. Manufacturers also purchase data and statistical summaries on promotional activities such as special pricing and the use of in-store displays.
Production
Examples of statistics in practice

Today’s emphasis on quality makes quality control an important application of statistics in production. A variety of statistical quality control charts are used to monitor the output of a production process. In particular, an x-bar chart can be used to monitor the average output. Suppose, for example, that a machine fills containers with 12 ounces of a soft drink. Periodically, a production worker selects a sample of containers and computes the average number of ounces in the sample. This average, or x-bar value, is plotted on an x-bar chart. A plotted value above the chart’s upper control limit indicates overfilling, and a plotted value below the chart’s lower control limit indicates underfilling. The process is termed “in control” and allowed to continue as long as the plotted x-bar values fall between the chart’s upper and lower control limits. Properly interpreted, an x-bar chart can help determine when adjustments are necessary to correct a production process.
Economics
Examples of statistics in practice

Economists frequently provide forecasts about the future of the economy or some aspect of it. They use a variety of statistical information in making such forecasts. For instance, in forecasting inflation rates, economists use statistical information on such indicators as the Producer Price Index, the unemployment rate, and manufacturing capacity utilization. Often these statistical indicators are entered into computerized forecasting models that predict inflation rates.
Information Systems
Examples of statistics in practice

Information system administrators are responsible for the day-to-day operation of an organization’s computer networks. A variety of statistical information helps administrators assess the performance of computer networks, including local area networks (LANs), wide area networks (WANs), network segments, intranets, and other data communication systems.
Introduction and Descriptive
Statistics

⚫ Using Statistics
⚫ Measures of Central Tendency
⚫ Percentiles and Quartiles
⚫ Measures of Variability
⚫ Skewness and Kurtosis
⚫ Relations between the Mean and Standard Deviation
⚫ Methods of Displaying Data
⚫ Exploratory Data Analysis
⚫ Using the Computer
Learning Objectives
After studying this chapter, you should be able to:

⚫ Distinguish between qualitative data and quantitative data.
⚫ Describe nominal, ordinal, interval, and ratio scales of
measurements.
⚫ Describe the difference between population and
sample.
⚫ Calculate and interpret percentiles and quartiles.
⚫ Explain measures of central tendency and how to
compute them.
⚫ Create different types of charts that describe data sets.
⚫ Use Excel templates to compute various measures and
create charts.
WHAT IS STATISTICS?

⚫ Statistics is a science that helps us make better decisions in business and economics as well as in other fields.
⚫ Statistics teaches us how to summarize, analyze, and draw meaningful inferences from data that then lead to improved decisions.
Using Statistics (Two Categories)

⚫ Descriptive Statistics
✓ Collect
✓ Organize
✓ Summarize
✓ Display
✓ Analyze
⚫ Inferential Statistics
✓ Predict and forecast values of population parameters
✓ Test hypotheses about values of population parameters
✓ Make decisions
Types of Data - Two Types

⚫ Qualitative - Categorical or Nominal. Examples are:
✓ Color
✓ Gender
✓ Nationality
⚫ Quantitative - Measurable or Countable. Examples are:
✓ Temperatures
✓ Salaries
✓ Number of points scored on a 100-point exam
Scales of Measurement

• Nominal Scale - groups or classes
✓ Gender, color, professional classification, etc.
• Ordinal Scale - order matters
✓ Ranks (top ten videos, products, etc.)
• Interval Scale - difference or distance matters
✓ Temperatures (°F, °C)
• Ratio Scale - ratio matters, “True Zero Point”
✓ Salaries, weight, volume, area, length, etc.
• Population
– Collection of all the items or individuals about which you want
to draw a conclusion.
• Sample
– A portion of a population selected for analysis.
• Parameter
– A numerical measure that describes a characteristic of a
population.
• Statistic
– A numerical measure that describes a characteristic of a
sample.
Simple Random Sample

⚫ Sampling from the population is often done randomly, such that every possible sample of equal size (n) will have an equal chance of being selected.
⚫ A sample selected in this way is called a simple random sample or just a random sample.
⚫ A random sample allows chance to determine its elements.
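As a rough illustration of the definition above, the following sketch draws a simple random sample with Python's standard library. The population of six labelled items, the sample size and the seed are all hypothetical values chosen for the illustration; they do not come from the slides.

```python
import random
from itertools import combinations

# Hypothetical population of N = 6 labelled items (not from the slides).
population = [1, 2, 3, 4, 5, 6]
n = 3  # sample size

# random.sample draws n distinct items without replacement, so every
# possible subset of size n has the same chance of being selected.
rng = random.Random(42)  # fixed seed only so that reruns give the same draw
sample = rng.sample(population, n)

# Enumerate every possible sample of size n to see how many there are.
all_samples = list(combinations(population, n))

print("one simple random sample:", sample)
print("number of equally likely samples:", len(all_samples))
```

With N = 6 and n = 3 there are 20 possible samples, and each one is equally likely to be the one drawn.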
Samples and Populations

Population (N) Sample (n)


Why Sample?

⚫ Data collection of a population may be:
✓ Impossible
✓ Impractical
✓ Too costly
Measures of location
Measures of Central Tendency or Location

• Median
➢ Middle value when sorted in order of magnitude
➢ 50th percentile
• Mode
➢ Most frequently occurring value
• Mean
➢ Average
Measures of Central Tendency
• Data is clustered around a central point.
• Let the central point represent the data set.
• What is the most appropriate value for that central
point?
• First thing that comes to mind is : AVERAGE or MEAN

The Arithmetic Mean


• Sum of values of all observations (data points) divided by the number of elements in the data set
• Let $x_i$ = value of the $i$th data point, and let $n$ be the number of data points
• Mean: $\bar{x} = \sum x_i / n$
Arithmetic Mean or Average

Population Mean: $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
Sample Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Example

The magazine Forbes annually publishes a list of the world’s wealthiest individuals. For 2007, the net worth of the 20 richest individuals, in $ billions, is as follows (the data is given on the next slide, both in its original order and sorted by magnitude).
Example – Mean

Billions Sorted Billions
33 18
26 18
24 18
21 18
19 19
20 20
18 20
18 20
52 21
56 22
27 22
22 23
18 24
49 26
22 27
20 32
23 33
32 49
20 52
18 56
Sum = 538

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{538}{20} = 26.9$
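The calculation on this slide can be checked with a few lines of Python. This is only a sketch using the 20 net-worth values listed in the slides; the variable names are chosen for the illustration.

```python
# Net worth of the 20 richest individuals in 2007, in $ billions (from the slide).
net_worth = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56,
             27, 22, 18, 49, 22, 20, 23, 32, 20, 18]

n = len(net_worth)    # number of observations
total = sum(net_worth)  # sum of all observations
mean = total / n      # arithmetic mean

print("n =", n)
print("sum =", total)
print("mean =", mean)
```

The sum is 538 and the mean is 26.9, matching the slide.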
Properties of Arithmetic Mean
• Every data set has a mean and it is unique
• Total value property: Mean times n equals the sum
of all observations.
• (Avg. daily production) × (No. of days) = Total Production
• Marginal addition
– Add a new item with $x_i > \bar{x}$. How will it affect the mean?
– It pulls the average up.
– What if $x_i < \bar{x}$? The average is pushed down.
Disadvantages of the arithmetic mean
• Outliers ??
– 3, 6, 4, 7, 5, 30 or 33, 42, 39, 45, 37, 28, 8
– Outliers may significantly affect the value of arithmetic
mean.
• Cannot be computed for qualitative data
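To see how strongly a single outlier can move the mean, the sketch below recomputes the mean of the two data sets from this slide with and without their extreme value. Treating 30 and 8 as the outliers is an assumption based on how the lists are written.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

# The two data sets from the slide; 30 and 8 are assumed to be the outliers.
high_outlier = [3, 6, 4, 7, 5, 30]
low_outlier = [33, 42, 39, 45, 37, 28, 8]

# Mean with and without the extreme value in each data set.
mean_high_all = mean(high_outlier)
mean_high_trimmed = mean([x for x in high_outlier if x != 30])
mean_low_all = mean(low_outlier)
mean_low_trimmed = mean([x for x in low_outlier if x != 8])

print(round(mean_high_trimmed, 2), "->", round(mean_high_all, 2))
print(round(mean_low_trimmed, 2), "->", round(mean_low_all, 2))
```

In the first set the single large value pulls the mean up from 5 to about 9.17; in the second the single small value pulls it down from about 37.33 to about 33.14.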
Weighted Mean
• In the formula for mean, each xi is given equal
importance
• But in some instances the mean is computed by
assigning each observation a relative weight. In such
cases mean is referred as weighted mean.
$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$
where $w_i$ is the weight for observation $i$.
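A minimal sketch of the weighted mean formula follows. The weights are the percentages from the Evaluation Scheme slide of this course; the four scores are hypothetical numbers invented for the illustration.

```python
# Weights from the Evaluation Scheme slide (in percent).
weights = {
    "Class Participation": 20,
    "Quiz/Assignment": 20,
    "Mid-Term Exam": 30,
    "End-Term Exam": 30,
}

# Hypothetical scores out of 100 for one student (not from the slides).
scores = {
    "Class Participation": 80,
    "Quiz/Assignment": 70,
    "Mid-Term Exam": 60,
    "End-Term Exam": 90,
}

# Weighted mean = sum(w_i * x_i) / sum(w_i)
weighted_sum = sum(weights[k] * scores[k] for k in weights)
weighted_mean = weighted_sum / sum(weights.values())

# Ordinary (equal-weight) mean for comparison.
simple_mean = sum(scores.values()) / len(scores)

print("weighted mean:", weighted_mean)
print("simple mean:", simple_mean)
```

For these hypothetical scores both means happen to equal 75, because the two 20% components average 75 and so do the two 30% components. Changing any single score breaks the tie and shows the effect of the weights.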
A small case
Percentage annual returns and growth factors for the mutual
fund data.
Year Return (%) Growth Factor
1 -22.1 0.779
2 28.7 1.287
3 10.9 1.109
4 4.9 1.049
5 15.8 1.158
6 5.5 1.055
7 -37.0 0.630
8 26.5 1.265
9 15.1 1.151
10 2.1 1.021

Is the mean annual percentage return 5.04% ?


Geometric Mean
• Often used for analyzing growth rates in financial
data
• It is a measure of location that is calculated by
finding the nth root of the product of n values
Geometric Mean:
$\bar{x}_g = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$
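The geometric mean can be applied to the ten growth factors of the mutual fund case. The sketch below uses the growth factors exactly as listed in the slides and compares the result with compounding at the 5.04% arithmetic mean that the slide questions. `math.prod` needs Python 3.8 or later.

```python
import math

# Growth factors for years 1-10 of the mutual fund (from the slide).
growth_factors = [0.779, 1.287, 1.109, 1.049, 1.158,
                  1.055, 0.630, 1.265, 1.151, 1.021]

n = len(growth_factors)
product = math.prod(growth_factors)   # total growth over the n years
geometric_mean = product ** (1 / n)   # nth root of the product

# What ten years of compounding at the 5.04% arithmetic mean would imply.
arithmetic_compounded = 1.0504 ** n

print("total growth over 10 years:", round(product, 4))
print("geometric mean growth factor:", round(geometric_mean, 4))
print("growth implied by 5.04% per year:", round(arithmetic_compounded, 4))
```

The geometric mean comes out at roughly 1.029, an average growth of about 2.9% per year. Compounding at 5.04% would overstate the growth actually achieved, which is why the arithmetic mean is misleading for this data.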
Median
• Middle value, middlemost or most central item
• Half the data items have value below the median,
and half above.
– Arrange n observations in increasing order.
– If n is odd, the ((n+1)/2)th observation is the median.
– If n is even, median = average of the (n/2)th and (n/2+1)th observations.
• Example (n = 7): 1, 3, 4, 9, 12, 17, 22. (n+1)/2 = 4, so Median = x4 = 9
• Example (n = 6): 1, 3, 4, 9, 12, 17. n/2 = 3, so Median = (x3+x4)/2 = (4+9)/2 = 6.5
Example – Median

Billions Sorted Billions
33 18
26 18
24 18
21 18
19 19
20 20
18 20
18 20
52 21
56 22
27 22
22 23
18 24
49 26
22 27
20 32
23 33
32 49
20 52
18 56

Median = 50th Percentile
Position: (20+1)(50/100) = 10.5
Value: 22 + (0.5)(0) = 22

The median is the middle value of data sorted in order of magnitude. It is the 50th percentile.
Advantages and disadvantages of the median
• The median, unlike the mean, is not affected by outliers.
• It can also be found for qualitative data that can be ranked.
• Data must be arrayed (sorted) first to calculate the median.
• Example: Suppose there are 7 people who graduate from some university with degrees in management. They all get jobs, and their monthly salaries are
45,000 48,000 50,000 51,000 53,000 55,000 5,000,000
• The last graduate got a job playing cricket.
• Average Salary?
• Arithmetic Mean = 757,429
• Median = 51,000
• So, if you were trying to tell prospective management students what they could expect to earn after graduation, which number would you give: the median or the mean?
• On the other hand, if you were just trying to get people to come to your university, which number would attract more students?
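The two figures quoted for the salary example can be reproduced with Python's `statistics` module. This is a sketch on the seven salaries from the slide.

```python
from statistics import mean, median

# Monthly salaries of the seven graduates (from the slide).
salaries = [45_000, 48_000, 50_000, 51_000, 53_000, 55_000, 5_000_000]

mean_salary = mean(salaries)      # pulled up by the one very large salary
median_salary = median(salaries)  # middle value of the sorted list

print("mean:", round(mean_salary))
print("median:", median_salary)
```

The mean rounds to 757,429 while the median stays at 51,000, which is much closer to what six of the seven graduates actually earn.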
The Mode
• French word meaning fashion.
• Here it means most frequent: the value that is
repeated most often.
• Consider the following sorted data set.
– 26, 26, 27, 28, 28, 28, 28, 29, 29, 29, 30, 30, 31
– Which value occurs most frequently?
• Mode is not affected by the outliers.
• There may be no mode at all.
• There may be more than one mode.
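A short sketch of finding the mode with `statistics.multimode` (Python 3.8 or later), which returns every most-frequent value. The first list is the sorted data set from this slide; the other two lists are hypothetical and only illustrate the multi-mode and no-clear-mode cases.

```python
from statistics import multimode

# Sorted data set from the slide.
data = [26, 26, 27, 28, 28, 28, 28, 29, 29, 29, 30, 30, 31]
modes = multimode(data)

# Hypothetical data with two values tied for most frequent.
two_modes = multimode([1, 1, 2, 2, 3])

# Hypothetical data where every value occurs once: multimode returns all
# of them, which corresponds to the "no mode at all" case on the slide.
no_clear_mode = multimode([4, 5, 6])

print(modes, two_modes, no_clear_mode)
```

For the slide's data set the most frequent value is 28, which occurs four times.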
Example - Mode

Mode = 18
The mode is the most frequently occurring value. It
is the value with the highest frequency.
Percentiles and Quartiles

⚫ Given any set of numerical observations, order them according to magnitude.
⚫ The Pth percentile in the ordered set is that value below which lie P% (P percent) of the observations in the set.
⚫ The position of the Pth percentile is given by (n + 1)P/100, where n is the number of observations in the set.
Example

Billions Sorted Billions


33 18
26 18
24 18
21 18
19 19
20 20
18 20
18 20
52 21
56 22
27 22
22 23
18 24
49 26
22 27
20 32
23 33
32 49
20 52
18 56
Example (Continued)
Percentiles

⚫ Find the 50th, 80th and the 90th percentiles of this data set.
⚫ To find the 50th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(50/100) = 10.5.
⚫ Thus, the percentile is located at the 10.5th position.
⚫ The 10th observation in the ordered set is 22, and the 11th observation is also 22.
Example (Continued)
Percentiles

⚫ The 50th percentile will lie halfway between the 10th and 11th values (which are both 22 in this case) and is thus 22.
Example (Continued)
Percentiles

⚫ To find the 80th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(80/100) = 16.8.
⚫ Thus, the percentile is located at the 16.8th position.
⚫ The 16th observation is 32, and the 17th observation is 33.
⚫ The 80th percentile is a point lying 0.8 of the way from 32 to 33 and is thus 32.8.
Example (Continued)
Percentiles

⚫ To find the 90th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(90/100) = 18.9.
⚫ Thus, the percentile is located at the 18.9th position.
⚫ The 18th observation is 49, and the 19th observation is 52.
⚫ The 90th percentile is a point lying 0.9 of the way from 49 to 52 and is thus 49 + 0.9(52 – 49) = 49 + 0.9 × 3 = 49 + 2.7 = 51.7.
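The (n + 1)P/100 position rule used in these examples can be sketched as a small function. The function name `percentile` and the handling of positions that fall before the first or after the last observation are assumptions made for this illustration; the slides do not cover those edge cases.

```python
def percentile(sorted_values, p):
    """Pth percentile using the (n + 1)P/100 position rule from the slides.

    sorted_values must already be in increasing order.
    """
    n = len(sorted_values)
    position = (n + 1) * p / 100   # 1-based, possibly fractional position
    lower = int(position)          # whole part: index of the lower neighbour
    fraction = position - lower    # how far to move towards the next value

    # Assumption: clamp to the smallest / largest observation when the
    # position falls outside the data (not discussed in the slides).
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]

    below = sorted_values[lower - 1]   # lower-th observation (1-based)
    above = sorted_values[lower]       # (lower + 1)-th observation
    return below + fraction * (above - below)


# Sorted net worth of the 20 richest individuals, in $ billions.
sorted_billions = [18, 18, 18, 18, 19, 20, 20, 20, 21, 22,
                   22, 23, 24, 26, 27, 32, 33, 49, 52, 56]

p50 = percentile(sorted_billions, 50)
p80 = percentile(sorted_billions, 80)
p90 = percentile(sorted_billions, 90)
print(round(p50, 2), round(p80, 2), round(p90, 2))
```

Rounded to two decimals, the three results agree with the worked values of 22, 32.8 and 51.7. Library routines may use a different interpolation rule by default, so their answers can differ slightly from this method.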
Quartiles – Special Percentiles

⚫ Quartiles are the percentage points that break down the ordered data set into quarters.
⚫ The first quartile is the 25th percentile. It is the point below which lie 1/4 of the data.
⚫ The second quartile is the 50th percentile. It is the point below which lie 1/2 of the data. This is also called the median.
⚫ The third quartile is the 75th percentile. It is the point below which lie 3/4 of the data.
Quartiles and Interquartile Range

⚫ The first quartile, Q1, (25th percentile) is often called the lower quartile.
⚫ The second quartile, Q2, (50th percentile) is often called the median or the middle quartile.
⚫ The third quartile, Q3, (75th percentile) is often called the upper quartile.
⚫ The interquartile range is the difference between the first and the third quartiles.
Example: Finding Quartiles
Billions Sorted Billions
33 18
26 18
24 18
21 18
19 19
20 20
18 20
18 20
52 21
56 22
27 22
22 23
18 24
49 26
22 27
20 32
23 33
32 49
20 52
18 56

Quartile: Position (n+1)P/100, Value
First Quartile: (20+1)(25/100) = 5.25, 19 + (0.25)(1) = 19.25
Median: (20+1)(50/100) = 10.5, 22 + (0.5)(0) = 22
Third Quartile: (20+1)(75/100) = 15.75, 27 + (0.75)(5) = 30.75
Placement
Report - ISB
Measures of dispersion
Measures of dispersion
• Is the mean, mode or median an adequate representation of the data?
• Each is a measure of central tendency, which ignores variability entirely.
• Variability is a vital aspect of data.
• It enables us to judge the reliability of our measure of central tendency.
• Recognizing the dispersion helps in tackling the problem
– (quality control use, financial use)
• Comparing dispersions may be quite useful
Dispersion: Distance Measures
• Distance measures
– Range
– Interfractile range
– Quartile deviation
• Range:
– The difference between the highest and the lowest observed
values
– Range = value of highest observation - value of lowest observation
• Disadvantages:
– Easy to find, but its usefulness is limited
– Ignores the nature of variation
– Heavily influenced by extreme values
– Changes drastically from one sample to the next in a given population
– Open-ended distributions have no range
Interfractile Range
• A given fraction of the data lie at or below a fractile
– The median is the 0.5 fractile
• Special fractiles: deciles, percentiles and quartiles
• Interfractile range is the measure of the spread between two
fractiles in a frequency distribution.
• Interquartile range:
– It measures approx. how far from the median we must go on either
side before we can include one-half the values of the data set.
– Interquartile range = Q3 - Q1
Interquartile range

[Figure: ordered data from the lowest observation to the highest observation, divided at the 1st quartile (Q1), the 2nd quartile (Q2, the median) and the 3rd quartile (Q3), with ¼ of the items in each segment]

• One-half of the interquartile range is a measure called the quartile deviation = (Q3-Q1) / 2
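As a worked instance, the quartiles found earlier for the net-worth data (Q1 = 19.25 and Q3 = 30.75, in $ billions) give the following interquartile range and quartile deviation:

```latex
\[
\text{IQR} = Q_3 - Q_1 = 30.75 - 19.25 = 11.5,
\qquad
\text{quartile deviation} = \frac{Q_3 - Q_1}{2} = \frac{11.5}{2} = 5.75
\]
```

So the middle half of the 20 net-worth values spans about $11.5 billion.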
Measures of Variability
Dispersion: average deviation measures
• How much do the data points tend to deviate from
the mean?
• Can we define a measure for it ?
• Average deviation from some measure of central
tendency
– Variance
– Standard deviation
– Both deal with the average distance of any observation
from the mean.
Average Deviation?
• 1,2,3,4,5
• Mean : µ = 3
• Deviations? -2, -1, 0, 1, 2
• What is the average deviation? ZERO!
• Ignore the sign of deviation (absolute deviation)
• Average absolute deviation
= (∑|xi- µ |) / n = 6/5 = 1.2
Advantage:
It is a better measure of dispersion than the ranges
because it takes every observation into account.
Disadvantage:
It is mathematically inconvenient
Variance
• The absolute value function is mathematically inconvenient. Is there an alternative?
• Square the deviations and then take the average.
• Variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
• Consider the data set: 1, 2, 3, 4, 5, with $\mu = 3$
$\sigma^2 = (4+1+0+1+4) / 5 = 2.0$
Alternative Formula:
• $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
• $\sigma^2 = \frac{\sum x_i^2}{N} - \mu^2$
• $\sigma^2 = \frac{\sum x_i^2}{N} - \left( \frac{\sum x_i}{N} \right)^2$
Standard deviation
• The variance needs one further adjustment to give a useful measure of variation.
• Since we squared the deviations before taking the average, the square root of this value is more meaningful.
• Moreover, the units of the variance are the squares of the units of the data.
• Standard deviation is the positive square root of the variance.
• Standard Deviation: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
• This measure has nice mathematical properties, and is universally accepted.
Variance and Standard Deviation

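Population Variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left( \sum_{i=1}^{N} x_i \right)^2}{N}}{N}$
$\sigma = \sqrt{\sigma^2}$

Sample Variance:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}}{n - 1}$
$s = \sqrt{s^2}$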
x x-μ (x-μ)²
120 -22.5 506.25
125 -17.5 306.25
130 -12.5 156.25
135 -7.5 56.25
140 -2.5 6.25
145 2.5 6.25
150 7.5 56.25
155 12.5 156.25
160 17.5 306.25
165 22.5 506.25
μ = 142.5, σ² = 206.25
σ = 14.3614
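The population figures in this table can be reproduced with the `statistics` module, whose `pvariance` and `pstdev` functions divide by N. This is a sketch on the ten x values from the table.

```python
from statistics import fmean, pvariance, pstdev

# The ten x values from the table: 120, 125, ..., 165.
x = list(range(120, 170, 5))

mu = fmean(x)               # population mean
sigma_squared = pvariance(x)  # population variance (divides by N)
sigma = pstdev(x)           # population standard deviation

print("mean:", mu)
print("variance:", sigma_squared)
print("standard deviation:", round(sigma, 4))
```

The results match the table: a mean of 142.5, a variance of 206.25 and a standard deviation of about 14.3614.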
Example

Billions Sorted Billions


33 18
26 18
24 18
21 18
19 19
20 20
18 20
18 20
52 21
56 22
27 22
22 23
18 24
49 26
22 27
20 32
23 33
32 49
20 52
18 56
Calculation of Sample Variance

x x-x̄ (x-x̄)² x²
18 -8.9 79.21 324
18 -8.9 79.21 324
18 -8.9 79.21 324
18 -8.9 79.21 324
19 -7.9 62.41 361
20 -6.9 47.61 400
20 -6.9 47.61 400
20 -6.9 47.61 400
21 -5.9 34.81 441
22 -4.9 24.01 484
22 -4.9 24.01 484
23 -3.9 15.21 529
24 -2.9 8.41 576
26 -0.9 0.81 676
27 0.1 0.01 729
32 5.1 26.01 1024
33 6.1 37.21 1089
49 22.1 488.41 2401
52 25.1 630.01 2704
56 29.1 846.81 3136
Sum: 538 0 2657.8 17130

Using the definition:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{2657.8}{20 - 1} = \frac{2657.8}{19} = 139.88421$

Using the alternative formula:
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}}{n - 1} = \frac{17130 - \frac{538^2}{20}}{20 - 1} = \frac{17130 - \frac{289444}{20}}{19} = \frac{17130 - 14472.2}{19} = \frac{2657.8}{19} = 139.88421$

$s = \sqrt{s^2} = \sqrt{139.88421} \approx 11.83$
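The sample variance calculation can be checked in Python. The sketch below computes it three ways on the 20 net-worth values: from the definition, from the alternative sum-of-squares formula, and with `statistics.variance`, which divides by n - 1.

```python
from statistics import variance, stdev

# Sorted net worth of the 20 richest individuals, in $ billions.
x = [18, 18, 18, 18, 19, 20, 20, 20, 21, 22,
     22, 23, 24, 26, 27, 32, 33, 49, 52, 56]

n = len(x)
x_bar = sum(x) / n

# Definition: sum of squared deviations divided by (n - 1).
s2_definition = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

# Alternative formula: (sum of squares - (sum)^2 / n) / (n - 1).
sum_of_squares = sum(xi ** 2 for xi in x)
s2_shortcut = (sum_of_squares - sum(x) ** 2 / n) / (n - 1)

# Library versions (both use the n - 1 divisor).
s2_library = variance(x)
s_library = stdev(x)

print("sum of squares:", sum_of_squares)
print("sample variance:", round(s2_definition, 5),
      round(s2_shortcut, 5), round(s2_library, 5))
print("sample standard deviation:", round(s_library, 2))
```

All three variance values agree at about 139.884, and the standard deviation is about 11.83.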
