Business Statistics: Prof. Lancelot JAMES
Prerequisites: Good STAMINA
Class Participation is Encouraged
Grading: Homeworks/Projects and Final Exam
Textbook: Bowerman, O'Connell, Orris (2004). Essentials of Business Statistics. McGraw-Hill.
Use the online tutorials
(http://highered.mcgraw-hill.com/sites/0072827823/student_view0/electronic_tutorials.html)
Salutations
Prof/Dr. James
Introduction
Descriptive Statistics
What is statistics?
DATA IS EVERYWHERE
Making DECISIONS
Key Definitions
Populations and samples
Types of Data
1. Population:
Some Examples of a Population
The Moon
Qualitative (Categorical)
Quantitative (Numerical)
Questions
1. For each of the following random variables, determine
whether the variable is categorical or numerical (quantitative).
a) Number of telephones per household
b) Type of telephone primarily used
c) Number of long-distance calls made per month
d) Length (in minutes) of the longest long-distance call made per month
e) Color of telephone primarily used
f) Monthly charge (in dollars and cents) for long-distance calls made
g) Ownership of a cellular phone
Ordered Array
Stems: leading digits
Note that numbers are often rounded off, so there can be many different stem-and-leaf plots for the same data. Stem-and-leaf plots display how values are clustered or grouped together.
[Figure: example stem-and-leaf display]
Stem-and-leaf display
Building a stem-and-leaf display
1. Data in raw form: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
2. Order the data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
3. Choose the stem unit and the leaf unit: the tens digit for the stem, the ones digit for the leaf.
4. For each stem, list the leaves of the measurements that belong to it.
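The four steps can be sketched in a few lines of Python for the ten values in the example. This is a minimal illustration, assuming non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf; the helper names `stem_and_leaf` and `format_display` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (tens digit) to its leaves (ones digits).

    Assumes non-negative integers; leaves come out in order because
    the data are sorted first (step 2).
    """
    stems = defaultdict(list)
    for x in sorted(data):
        stems[x // 10].append(x % 10)  # tens digit -> stem, ones digit -> leaf
    return dict(stems)

def format_display(stems):
    """Render one 'stem | leaves' line per stem, including empty stems."""
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return "\n".join(lines)

raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
display = stem_and_leaf(raw)
print(format_display(display))
# 2 | 1 4 4 6 7 7
# 3 | 0 2 8
# 4 | 1
```

Empty stems are printed deliberately so that gaps in the data remain visible in the display.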
[Figure: the same data grouped into 5, n, and 20 categories]
$$\text{Class width} = \frac{\text{range}}{\text{number of classes}}$$
$$\text{Relative frequency} = \frac{f_i}{n}$$
96, 171, 202, 178, 147, 102, 153, 197, 127, 82, ..., 158
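$$\text{Class width} = \frac{213 - 82}{5} = 26.2, \text{ rounded up to } 30$$
$$\text{Class width} = \frac{213 - 82}{6} = 21.83, \text{ rounded up to } 25$$
$$\text{Class width} = \frac{213 - 82}{7} = 18.71, \text{ rounded up to } 20$$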
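The class-width arithmetic for 5, 6 and 7 classes can be reproduced with a short sketch. The rule "round up to the next multiple of 5" is an assumption that happens to match the widths 30, 25 and 20 used here; other choices of a convenient width are equally valid. The function name `class_width` is hypothetical.

```python
import math

def class_width(largest, smallest, num_classes, round_to=5):
    """Return (raw width, convenient width).

    raw width = range / number of classes; the convenient width rounds
    the raw value UP to a multiple of `round_to` (an assumed rule).
    """
    raw = (largest - smallest) / num_classes
    return raw, round_to * math.ceil(raw / round_to)

for k in (5, 6, 7):
    raw, width = class_width(213, 82, k)
    print(f"{k} classes: raw width {raw:.2f}, use {width}")
# 5 classes: raw width 26.20, use 30
# 6 classes: raw width 21.83, use 25
# 7 classes: raw width 18.71, use 20
```

Rounding up rather than to the nearest value ensures that the chosen classes still cover the whole range from 82 to 213.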
A chart for 7 intervals

EC           Midpoint   Freq   Per
80 < 100        90        4     8%
100 < 120      110        7    14%
120 < 140      130        9    18%
140 < 160      150       13    26%
160 < 180      170        9    18%
180 < 200      190        5    10%
200 < 220      210        3     6%
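The midpoint and percentage columns follow directly from the class limits and the frequencies. The sketch below rebuilds them, assuming each class "a < b" runs from a up to, but not including, b, with a width of 20. Because the raw data list is elided above, the sketch starts from the frequency column.

```python
lower_limits = list(range(80, 220, 20))   # 80, 100, ..., 200 (7 classes)
width = 20
freqs = [4, 7, 9, 13, 9, 5, 3]            # Freq column of the table

n = sum(freqs)                            # total number of observations
rows = []
for lower, f in zip(lower_limits, freqs):
    upper = lower + width
    midpoint = (lower + upper) / 2        # class midpoint
    percent = 100 * f / n                 # relative frequency f_i / n, in %
    rows.append((f"{lower} < {upper}", midpoint, f, percent))

for label, mid, f, pct in rows:
    print(f"{label:>10} {mid:6.0f} {f:4d} {pct:5.0f}%")
```

The frequencies add up to n = 50, so each percentage is simply twice the class frequency.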
Histograms/Polygon graphs
Scatter Diagram
Tables and charts are often used for categorical data. Many of the methods for numerical data carry over to categorical data. The main distinction is that classes, or class intervals, which are based on ranges of numerical values, are replaced by types of objects, or categories.
Frequencies and percentages are then computed with respect to these categories.
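To illustrate, here is a summary table for a categorical variable: it counts the frequency of each category and converts the counts to percentages. The ten responses below are hypothetical and were made up for this sketch. `collections.Counter.most_common()` returns the categories from most to least frequent, which is also the ordering a Pareto diagram uses.

```python
from collections import Counter

# Hypothetical answers to "type of telephone primarily used"
responses = ["cordless", "cellular", "corded", "cellular", "cordless",
             "cellular", "cellular", "corded", "cordless", "cellular"]

counts = Counter(responses)          # frequency of each category
n = len(responses)
for category, freq in counts.most_common():   # largest category first
    print(f"{category:<10} {freq:3d} {100 * freq / n:5.1f}%")
```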
Summary Table
Bar Chart
Pie Chart
Pareto Diagram
[Table: two-way table of var 1 (V1, V2) by var 2, with row and column totals]
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$$
Median
The median is the middle value in an ordered array of data. The median is not affected by extreme values, so it may be preferable to the mean when such values are present.
There are two methods of computing the median of a data set, depending on whether the sample size n is odd or even. First, remember to order the data from the minimum to the maximum value.
1. When n is odd: Median = the $\frac{n+1}{2}$th ranked observation.
2. When n is even: Median = the average of the two middle ranked observations (the $\frac{n}{2}$th and the $\left(\frac{n}{2}+1\right)$th).
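The two cases translate almost directly into code. This is a minimal sketch with a hypothetical function name. Note that Python lists are 0-based, so the $\frac{n+1}{2}$th ranked value is at index `n // 2`.

```python
def median(data):
    """Median of a list of numbers, following the odd/even rule."""
    x = sorted(data)              # order from minimum to maximum first
    n = len(x)
    mid = n // 2
    if n % 2 == 1:
        return x[mid]             # the (n+1)/2-th ranked value
    return (x[mid - 1] + x[mid]) / 2   # n even: average the two middle values

print(median([5, 7, 1, 2, 4]))    # 4
print(median([1, 2, 3, 4]))       # 2.5
```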
Geometric Mean: Investments
The geometric mean and the geometric rate of return are used to measure the status of an investment over time; they measure the rate of change of a variable over time.
$$R_G = \left[(1 + R_1)(1 + R_2)\cdots(1 + R_n)\right]^{1/n} - 1$$
where $R_i$ is the rate of return in time period $i$. The rate of return is defined as the gain or loss in period $i$ divided by the value at the start of the period, multiplied by 100% (in the formula, $R_i$ is used in decimal form).
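The sketch below computes the geometric rate of return. It uses a hypothetical investment that falls from 100 to 50 and then recovers to 100. The arithmetic mean of the two period returns suggests a 25% average gain, but the geometric rate correctly reports 0%. Returns are handled as decimals; multiply by 100 to express them as percentages.

```python
import math

def geometric_rate_of_return(returns):
    """R_G = [(1+R_1)(1+R_2)...(1+R_n)]^(1/n) - 1, each R_i a decimal."""
    growth = math.prod(1 + r for r in returns)   # total growth factor
    return growth ** (1 / len(returns)) - 1

# Hypothetical investment: 100 -> 50 -> 100
r1 = (50 - 100) / 100    # -0.50 in period 1
r2 = (100 - 50) / 50     # +1.00 in period 2
print(geometric_rate_of_return([r1, r2]))   # 0.0
print((r1 + r2) / 2)                        # 0.25 (arithmetic mean, misleading here)
```

`math.prod` requires Python 3.8 or later.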
Quartiles
The quartiles divide the ranked data into four quarters.
1. The first quartile, denoted $Q_1$, is the value such that 25% of the ordered data lie below it and 75% lie above it. Its position is given by
$$Q_1 = \text{the } \frac{n+1}{4}\text{th ordered observation.}$$
2. The third quartile, denoted $Q_3$, is the value such that 75% of the ordered data lie below it and 25% lie above it. Its position is given by
$$Q_3 = \text{the } \frac{3(n+1)}{4}\text{th ordered observation.}$$
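The position formulas often produce a fractional position such as 1.5. The sketch below resolves a fractional position by linear interpolation between the two neighbouring ordered values. This is an assumed convention: textbooks and software packages differ on exactly this step, so results may not match every table. The helper names are hypothetical.

```python
def ordered_observation(data, position):
    """Value at a 1-based, possibly fractional, position in the ordered data.

    Fractional positions are linearly interpolated between neighbours
    (an assumed convention); positions are clamped to [1, n].
    """
    x = sorted(data)
    position = min(max(position, 1), len(x))
    lower = int(position)          # whole part of the position
    frac = position - lower
    if frac == 0:
        return x[lower - 1]
    return x[lower - 1] + frac * (x[lower] - x[lower - 1])

def quartiles(data):
    """(Q1, Q3) using the (n+1)/4 and 3(n+1)/4 position rules."""
    n = len(data)
    return (ordered_observation(data, (n + 1) / 4),
            ordered_observation(data, 3 * (n + 1) / 4))

print(quartiles([5, 7, 1, 2, 4]))   # (1.5, 6.0)
```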
Measures of Variation
However, the data sets are quite different: for the first data set every measure of variation equals 0, while for the second data set this is not the case.
Range
Interquartile Range
The formula is
$$\text{Interquartile range} = Q_3 - Q_1$$
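Note that, with $n = 5$, $\dfrac{n+1}{4} = \dfrac{6}{4} = 1.5$, so $Q_1$ lies halfway between the first and second ordered observations.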
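The sketch below computes both the range and the interquartile range with the $(n+1)/4$ position rule. Fractional positions such as 1.5 are interpolated between neighbours, which is an assumed convention. It also illustrates a design point: because the interquartile range ignores the lowest and highest quarters of the data, replacing the largest value by an extreme one changes the range but not the IQR. The data sets and function names are hypothetical.

```python
def value_at_position(sorted_x, pos):
    """Value at a 1-based position in sorted data; a fractional position is
    interpolated between its two neighbours (assumed convention)."""
    lower = int(pos)
    frac = pos - lower
    if frac == 0:
        return sorted_x[lower - 1]
    return sorted_x[lower - 1] + frac * (sorted_x[lower] - sorted_x[lower - 1])

def range_and_iqr(data):
    """Return (range, interquartile range) with Q1, Q3 from (n+1)/4 positions."""
    x = sorted(data)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations for these positions")
    q1 = value_at_position(x, (n + 1) / 4)
    q3 = value_at_position(x, 3 * (n + 1) / 4)
    return x[-1] - x[0], q3 - q1

print(range_and_iqr([5, 7, 1, 2, 4]))                # (6, 4.5): Q1 at position 1.5
print(range_and_iqr(list(range(1, 12))))             # (10, 6)
print(range_and_iqr(list(range(1, 11)) + [110]))     # (109, 6): range jumps, IQR unchanged
```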
$$S^2 = \frac{(X_1 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$$
$$S^2 = \frac{4 + 1 + 0 + 1 + 4}{4} = 2.5$$
$$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1}$$
In other words, this suggests that most values are quite near the mean.
Standard Deviation
The standard deviation is the square root of the variance: $S = \sqrt{S^2}$.
The more spread out, or dispersed, the data are, the larger the range, the interquartile range, the variance, and the standard deviation will be.
Coefficient of Variation
The coefficient of variation measures the scatter in the data relative to the mean.
$$CV = \frac{S}{\bar{X}} \times 100\%$$
Shape of Data
The third property of a data set is its shape, that is, the way the data are distributed. All descriptions of shape are made relative to symmetry: a data set that is not symmetric is said to be asymmetrical, or skewed.
Question 1
Question 2
A sociologist recently conducted a survey of citizens over 60 years of age whose net worth is too high to qualify for subsidized medical care and who have no private health insurance. The ages of the 25 uninsured senior citizens sampled are summarized as follows:
the average age is 74.04, the median age is 73, the first quartile is 65, and the third quartile is 81.
Identify which of the following statements is correct.
1. One fourth of the senior citizens sampled are below 64
years of age
2. The middle 50% of the senior citizens sampled are
between 65 and 73 years of age
3. 25% of the senior citizens sampled are older than 81 years
of age
4. All of the above are correct
Xi      (Xi - X̄)   (Xi - X̄)²   Xi²
5         1.2        1.44       25
7         3.2       10.24       49
1        -2.8        7.84        1
2        -1.8        3.24        4
4         0.2        0.04       16
sum 19    0         22.8        95

$$\bar{X} = \frac{19}{5} = 3.8, \qquad S^2 = \frac{22.8}{4} = 5.7 \quad\text{and}\quad S = 2.387$$
$$CV = \frac{S}{\bar{X}} \times 100\% = 62.8\%$$
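The worked table can be checked in a few lines. The sketch computes the sample variance both from the squared deviations and with the computational shortcut $\sum X_i^2 - n\bar{X}^2$; both give 22.8 for the numerator. It then derives $S$ and the coefficient of variation. The values come from the table above; the variable names are illustrative.

```python
import math

x = [5, 7, 1, 2, 4]
n = len(x)

mean = sum(x) / n                                      # 19 / 5 = 3.8
ss = sum((xi - mean) ** 2 for xi in x)                 # sum of squared deviations: 22.8
ss_shortcut = sum(xi ** 2 for xi in x) - n * mean ** 2 # 95 - 5(3.8)^2 = 22.8
var = ss / (n - 1)                                     # sample variance S^2 = 5.7
sd = math.sqrt(var)                                    # S, about 2.387
cv = sd / mean * 100                                   # CV in percent, about 62.8

print(f"mean={mean}, S^2={var:.1f}, S={sd:.3f}, CV={cv:.1f}%")
# mean=3.8, S^2=5.7, S=2.387, CV=62.8%
```

In floating point the two numerators can differ in the last few digits, so the comparisons below allow a small tolerance.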
[Figure: box-and-whisker plots marking Xsmallest, Q1, median, Q3 and Xlargest for symmetrical and skewed data]
Skewness
[Figure: Symmetry/Skewness, with panels for data skewed to the right, symmetrical data, and data skewed to the left]
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2 \quad\text{and}\quad \sigma = \sqrt{\sigma^2}$$
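The population variance divides by $N$, while the sample variance divides by $n - 1$. The sketch below puts the two side by side. For illustration only, it reuses the five values from the variance example and pretends they form a whole population in the first call. The function names are hypothetical.

```python
import math

def population_variance(values):
    """sigma^2 = (1/N) * sum (X_i - mu)^2 : divide by N."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

def sample_variance(values):
    """S^2 = sum (X_i - X_bar)^2 / (n - 1) : divide by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

data = [5, 7, 1, 2, 4]
print(population_variance(data), math.sqrt(population_variance(data)))  # about 4.56 and 2.135
print(sample_variance(data))                                            # about 5.7
```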
(1) The Empirical Rule:

k    % of data within k SD each way from the mean
1    68%
2    95%
3    99%
4    ALL
(2) The Chebyshev Rule states that at least $1 - \dfrac{1}{k^2}$ of the data lie within k standard deviations of their mean (regardless of how skewed the data are).

k    1 - 1/k² (in %)
1    Not calculable (NA)
2    3/4 (75%)
3    8/9 (89%)
4    15/16 (94%)
Example
The mean is 28.2 and the standard deviation is 6.75.
a. Within 1 standard deviation: between 21.45 and 34.95. Ans: Empirical Rule, about 68%
b. Within 2 standard deviations: between 14.7 and 41.7. Ans: Empirical Rule, about 95%
c. Between 21.45 and 34.95 using the Chebyshev Rule. Ans: NA
d. Between 14.7 and 41.7 using the Chebyshev Rule. Ans: at least 75%
e. Between 7.95 and 48.45 using the Chebyshev Rule. Ans: at least 89%
f. According to the Chebyshev Rule, at least 94% of the values should lie within 4 standard deviations of the mean, that is, between 1.2 and 55.2.
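The intervals in this example are the mean plus or minus k standard deviations. The sketch computes them for k = 1 to 4 and attaches both rules. The Empirical Rule percentages are copied from the Empirical Rule table; that rule is meant for bell-shaped data. The Chebyshev bound $1 - 1/k^2$ holds for any data set but says nothing useful for k ≤ 1, which is why that case is marked NA. Function and variable names are hypothetical.

```python
def k_sd_interval(mean, sd, k):
    """Interval mean +/- k standard deviations."""
    return mean - k * sd, mean + k * sd

def chebyshev_min_fraction(k):
    """At least 1 - 1/k^2 of any data set lies within k SDs of the mean."""
    if k <= 1:
        return None                  # not informative: the "NA" case
    return 1 - 1 / k ** 2

# Percentages as given in the Empirical Rule table (bell-shaped data)
EMPIRICAL = {1: "68%", 2: "95%", 3: "99%", 4: "ALL"}

mean, sd = 28.2, 6.75
for k in (1, 2, 3, 4):
    lo, hi = k_sd_interval(mean, sd, k)
    cheb = chebyshev_min_fraction(k)
    cheb_txt = "NA" if cheb is None else f"at least {cheb:.0%}"
    print(f"k={k}: {lo:.2f} to {hi:.2f} | Empirical {EMPIRICAL[k]} | Chebyshev {cheb_txt}")
```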
Coefficient of Correlation
The coefficient of correlation measures the strength of the linear relationship between two variables X and Y.
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
where $-1 \le r \le 1$.
Yi       (Xi - X̄)   (Yi - Ȳ)   (Xi - X̄)(Yi - Ȳ)
850       -19          50          -950
760       -13         -40           520
900        -9         100          -900
870        -1          70           -70
1100       10         300          3000
800        -1           0             0
650         3        -150          -450
750        14         -50          -700
750         5         -50          -250
570        11        -230         -2530
sum 8000    0           0         -2330
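$$\bar{X} = 67 \quad\text{and}\quad \bar{Y} = 800$$
$$\sum_{i=1}^{10} (X_i - \bar{X})^2 = 1064, \qquad \sum_{i=1}^{10} (Y_i - \bar{Y})^2 = 189400$$
$$r = \frac{-2330}{\sqrt{1064}\,\sqrt{189400}} = -0.1641$$
The value of r indicates a very weak negative linear relationship between price and energy cost.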
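The correlation can be recomputed from the table values. The raw $X_i$ values are not listed, so the sketch rebuilds them as $\bar{X} + (X_i - \bar{X})$ with $\bar{X} = 67$. Note that the last observation gives $570 - 800 = -230$, so the cross-product sum is $-2330$. With these values the sketch reproduces $r \approx -0.1641$. The function name `pearson_r` is hypothetical.

```python
import math

x_dev = [-19, -13, -9, -1, 10, -1, 3, 14, 5, 11]   # X_i - X_bar from the table
x = [67 + d for d in x_dev]                        # rebuilt X_i, since X_bar = 67
y = [850, 760, 900, 870, 1100, 800, 650, 750, 750, 570]

def pearson_r(x, y):
    """Coefficient of correlation from the deviation-based formula."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # cross products
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

print(round(pearson_r(x, y), 4))   # -0.1641
```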