Module 3 - Statistics
LEARNING OBJECTIVES
A Variable
A quantity that may assume a succession of values
A symbol standing for any one of a class of things [Webster’s]
Fundamental building block used in the construction of algebraic
equations
Multiplication
x^n * x^m = x^{n+m}
e.g., x * x = x^{1+1} = x^2
x^4 * x^2 = x^{4+2} = x^6
2 * 2^2 = 2^{1+2} = 2^3 = 8
Division
x^n / x^m = x^{n-m}
e.g., x^2 / x = x^{2-1} = x
x^6 / x^2 = x^{6-2} = x^4
2 / 2^2 = 2^{1-2} = 2^{-1} = ½
Exponential
(x^n)^m = x^{n*m}
e.g., (x)^2 = x^{1*2} = x^2
(x^4)^2 = x^{4*2} = x^8
(2x)^2 = 2^2 x^2 = 4x^2
(x^2 / y^3)^2 = x^{2*2} / y^{3*2} = x^4 / y^6
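The three exponent rules above can be spot-checked numerically. The sketch below is only an illustration: the sample values x = 2, n = 4, m = 2 are arbitrary choices, not part of the notes.

```python
import math

# Arbitrary sample values chosen for illustration.
x, n, m = 2.0, 4, 2

product_rule = math.isclose(x**n * x**m, x**(n + m))    # x^n * x^m = x^(n+m)
quotient_rule = math.isclose(x**n / x**m, x**(n - m))   # x^n / x^m = x^(n-m)
power_rule = math.isclose((x**n)**m, x**(n * m))        # (x^n)^m = x^(n*m)

print(product_rule, quotient_rule, power_rule)
print(2 * 2**2, 2 / 2**2)  # the two numeric examples: 8 and 0.5
```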
EQUATIONS
Solving Equations
4y = 3x + 12
4y/4 = (3x +12)/4
y = (3/4)x + 12/4
then
y = (3/4)x + 3
Can consolidate variables into a more manageable form:
y - x = 2x + 3x - 7
y - x + x = 2x + 3x + x - 7
y = x(2 + 3 + 1) - 7
then
y = 6x - 7
Can expand the equation:
y = x(1 + x) - 6
y = x + x^2 - 6
then
y = x^2 + x - 6
Further, y = x^2 + x - 6 can be factored:
y = (x + 3)(x - 2)
example: For what values of x does y = 0?
solution: y = 0 when (x + 3) = 0 or when (x - 2) = 0
x+3=0
x+3-3=0-3
then
x = -3
or
x-2=0
x -2+2=0+2
then
x=2
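As an independent check on the factoring, the roots can also be computed with the quadratic formula. Note that the quadratic formula is not covered in these notes; it is swapped in here purely as a cross-check, and the helper name quadratic_roots is hypothetical.

```python
import math

def quadratic_roots(a, b, c):
    """Return the real roots of a*x^2 + b*x + c = 0, smallest first."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots
    root = math.sqrt(disc)
    # A set removes the duplicate when the two roots coincide.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})

roots = quadratic_roots(1, 1, -6)   # y = x^2 + x - 6
print(roots)

# Each root should also make one of the factors (x + 3) or (x - 2) zero.
print([(x + 3) * (x - 2) for x in roots])
```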
Can substitute equations into each other:
example #1:
y = 4x + 2
x=4
then
y = 4 (4) + 2
y = 16 + 2
y = 18
Substitution -- continued
example #2:
y = 5x + 12
y = 7x - 2
Since both equations are equal to ‘y’, we can set the two equations
equal to each other:
7x -2 = 5x + 12
7x - 5x - 2 + 2 = 5x - 5x + 12 + 2
7x - 5x = 12 + 2
2x = 14
then
x=7
Further, what is the value of ‘y’ when x=7?
y = 5(7) + 12
y = 35 + 12
then
y = 47
or
y = 7(7) - 2
y = 49 -2
then
y = 47
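The substitution steps in example #2 can be sketched as a small function. This is a minimal illustration assuming both equations are already in the form y = mx + b; the name intersect is hypothetical.

```python
def intersect(m1, b1, m2, b2):
    """Solve m1*x + b1 = m2*x + b2 for x, then substitute back to get y."""
    if m1 == m2:
        raise ValueError("equal slopes: no single solution")
    x = (b2 - b1) / (m1 - m2)   # collect the x terms on one side
    y = m1 * x + b1             # substitute x into the first equation
    return x, y

x, y = intersect(5, 12, 7, -2)   # y = 5x + 12 and y = 7x - 2
print(x, y)
print(7 * x - 2)                 # the second equation gives the same y
```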
Interpreting Equations
y = mx + b
m ≡ slope = Δy / Δx
and represents the change in the value of ‘y’ for a ‘unit’ change in ‘x’.
b ≡ intercept term
and represents the value of ‘y’ when x = 0. Thus, the value of ‘y’ as a
function of ‘x’ is shifted by the value of ‘b’.
example:
y=x
y=x+2
y = 3x + 4
Variable Values
x            0   1   2   3   4
y = x        0   1   2   3   4
y = x + 2    2   3   4   5   6
y = 3x + 4   4   7  10  13  16
for y = x
the slope = 1
the intercept = 0
for y = x + 2
the slope = 1
the intercept = 2
for y = 3x + 4
the slope = 3
the intercept = 4
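The slope and intercept can be read straight off a table of values. The sketch below rebuilds the row for y = 3x + 4 and recovers both numbers; the variable names are illustrative only.

```python
xs = [0, 1, 2, 3, 4]
ys = [3 * x + 4 for x in xs]   # the table row for y = 3x + 4

# Slope: change in y divided by change in x between neighbouring points.
slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

# Intercept: the value of y when x = 0.
intercept = ys[xs.index(0)]

print(ys)
print(slopes, intercept)
```

For a linear function every entry of slopes is the same number, which is what makes a single slope m meaningful.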
Functions
Given all this, will I ever use this in my life (other than in math class)?
Answer: YES! Here are a few examples.
MARTINI:
Add to Ice
Stir or Shake
or
M = V + 5G
Non-linear Functions
numerical example:
y = 2x^2 + 3x + 1
y = x^3 + 2x^2 + 4x
y = 1/x
Variable Values
x                      0   1   2     3       4
y = 2x^2 + 3x + 1      1   6   15    28      45
y = x^3 + 2x^2 + 4x    0   7   24    57      112
y = 1/x                *   1   0.5   0.333   0.25
(* = undefined)
Comments
1. The quadratic function and the cubic function each have an intercept
(y takes a value when x=0) but the inverse (reciprocal) function y = 1/x
does not [y is undefined or is defined to be infinity (y=∞)].
2. x=0 is not in the domain of the inverse function: it is not a permissible
input.
3. Note that the change in y for a given change in x is no longer
constant as in the linear examples. Non-linear functions have
an instantaneous slope, which is a slope only for a given value of x.
The instantaneous slope is obtained by evaluating a function’s
derivative for a particular value of x.
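The idea of an instantaneous slope can be illustrated without calculus by measuring the slope over a very small interval around x. The sketch below uses a central difference, which is an approximation technique assumed here rather than taken from the notes; the step size h and the name slope_at are illustrative choices.

```python
def f(x):
    return 2 * x**2 + 3 * x + 1   # the quadratic example

def slope_at(func, x, h=1e-6):
    """Approximate the instantaneous slope of func at x (central difference)."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in [0, 1, 2]:
    print(x, round(slope_at(f, x), 4))
```

The slope changes with x, unlike the linear examples. For this quadratic the exact derivative is 4x + 3, so the approximations should be close to 3, 7 and 11.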
Concavity and Convexity
y = x^3
y = 0.5x^{0.5}
Variable Values
x          0    1   2   3    4
y = x^3    0    1   8   27   64
Δy         NA   1   7   19   37
Variable Values
             f(1)   f(2)   f[(1+2)/2]   [f(1)+f(2)]/2
y = x^3      1      8      3.375        4.5
y = 3x + 1   4      7      5.5          5.5
Comments
1. y = x^3 is strictly convex for x ≥ 0; for example, f[(1+2)/2] < [f(1) + f(2)]/2.
2. y = 0.5x^{0.5} is strictly concave; for example, f[(1+2)/2] > [f(1) + f(2)]/2.
3. y = 3x + 1 is concave since f[(1+2)/2] ≥ [f(1) + f(2)]/2, but it is not
strictly concave. y = 3x + 1 is convex since f[(1+2)/2] ≤ [f(1) + f(2)]/2,
but it is not strictly convex. Functions that are both
concave and convex, f[(1+2)/2] = [f(1) + f(2)]/2, are said to be
affine.
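The midpoint comparison used in the comments can be sketched as follows. It tests a single pair of points (1 and 2, as in the notes), so it illustrates the definition rather than proving convexity or concavity over a whole domain. The function name midpoint_check is hypothetical.

```python
def midpoint_check(f, a, b):
    """Compare f at the midpoint of a and b with the average of f(a) and f(b)."""
    mid = f((a + b) / 2)
    avg = (f(a) + f(b)) / 2
    if mid < avg:
        return "convex at this pair"
    if mid > avg:
        return "concave at this pair"
    return "affine at this pair"

print(midpoint_check(lambda x: x**3, 1, 2))
print(midpoint_check(lambda x: 0.5 * x**0.5, 1, 2))
print(midpoint_check(lambda x: 3 * x + 1, 1, 2))
```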
REVIEW QUESTIONS
y - x = 3x + 4x -9
(1) y = 6x + 3
(2) y = -3x +9
(3) y = 4x
y - x = 3x + 4x -9
y - x + x = 3x + 4x + x - 9
y = x(3 + 4 + 1) - 9
then
y = 8x - 9
LEARNING OBJECTIVES
Arithmetic Mean
Weighted Average
Median
Mode
Geometric Mean
Range
Mean Absolute Deviation
Standard Deviation
Variance
Coefficient of variation
MEASURES OF CENTRAL TENDENCY
Notation
Index Notation
We will let X_j represent any of the N values X_1, X_2, X_3, …, X_N for some
variable X. ‘j’ is the subscript, or index, over which X is represented and
is usually written j = 1, 2, 3, …, N.
Summation Notation
The symbol Σ is the Greek letter sigma and is used in mathematics to
indicate summing over certain values.
example: Σ X_j represents the summation of X_j over j = 1, 2, 3, …, N
Σ X_j = X_1 + X_2 + X_3 + … + X_N
Probability Distribution
A set of possible outcomes of an event, with a probability (likelihood) attached to
the occurrence (also called realization) of each outcome
example:
Economy    Return on Stocks   Probability
Good       26%                0.25
Average    12%                0.50
Poor       -2%                0.25
Probability distributions may be discrete, such as the above example, or
continuous, such as the normal distribution (the ‘bell’ curve)
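A discrete distribution like the stock-return example can be held as a list of (outcome, return, probability) rows. The sketch below checks that the probabilities sum to 1 and then forms the probability-weighted average of the returns. The weighted-average step is an assumption added for illustration; the notes list the distribution but do not carry out this calculation.

```python
# (economy, return on stocks, probability) from the example table
outcomes = [
    ("Good", 0.26, 0.25),
    ("Average", 0.12, 0.50),
    ("Poor", -0.02, 0.25),
]

# A valid probability distribution has probabilities that sum to 1.
total_probability = sum(p for _, _, p in outcomes)

# Probability-weighted average of the returns (illustrative assumption).
weighted_return = sum(r * p for _, r, p in outcomes)

print(total_probability, round(weighted_return, 4))
```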
Arithmetic Mean
The mean of a sample of n observations (not all the observations in the world) is called x-bar (X̄).
X̄ = (620 + 550 + 590 + 520 + 480 + 660 + 510 + 515) / 8 = 555.625
Median
The median is the value in a population (or sample) that has just as many
values above it as below it. If there are an even number of values, the
median will be the average of the two ‘middle’ values.
example: Consider our sample of students’ GMAT scores
620, 550, 590, 520, 480, 660, 510, 515
Arranging in descending order, we have
660, 620, 590, 550, 520, 515, 510, 480
The median is (550 + 520)/2 = 535.
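The mean and median of the GMAT sample can be reproduced with Python's standard statistics module. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

scores = [620, 550, 590, 520, 480, 660, 510, 515]

mean_score = statistics.mean(scores)      # sum of the values divided by n
median_score = statistics.median(scores)  # averages the two middle values when n is even

print(mean_score, median_score)
```

The median function sorts the data internally, so the list does not need to be arranged by hand first.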
Mode
The mode of a sample is the value that occurs most frequently, that is, the greatest number
of times. The mode does not have to be unique. For example, the sample 1, 2,
3, 3, 4, 4, 6 has two modes, 3 and 4. A sample with two modes is called bimodal.
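Because the mode need not be unique, a function that returns every most-frequent value is the safer choice. The sketch below uses statistics.multimode from the Python standard library (available in Python 3.8 and later).

```python
import statistics

sample = [1, 2, 3, 3, 4, 4, 6]

# multimode returns all values tied for the highest frequency.
modes = statistics.multimode(sample)

print(modes, "bimodal" if len(modes) == 2 else "")
```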
Geometric Mean
The geometric mean, G, is calculated by taking the Nth root of the product of a set of
N values or observations. For example, the geometric mean for a set of N values,
X_1, X_2, X_3, …, X_N is
G = (X_1 * X_2 * X_3 * … * X_N)^{1/N}.
We often use geometric means when we are calculating average returns
since the geometric return takes ‘compounding’ into account.
example: An investor earns the following returns: 12%, 15%, and 8% on an
investment over a 3 year period. The investor reinvests his gains each year.
Thus, the investor’s wealth increases by a factor of 1.12 in year 1, another 1.15 in
year 2, and 1.08 in year 3.
Geometric Mean = (1.12 * 1.15 * 1.08)^{1/3} = 1.1163. We subtract 1 to remove the
original investment, leaving us with a geometric return of 0.1163 or
11.63%. Note that the arithmetic average return is (12% + 15% + 8%)/3 =
11.67%.
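The investor example can be reproduced in a few lines. The sketch below converts each return to a growth factor, takes the Nth root of the product, and subtracts 1; the variable names are illustrative.

```python
import math

returns = [0.12, 0.15, 0.08]
growth_factors = [1 + r for r in returns]   # 1.12, 1.15, 1.08

# Nth root of the product of the growth factors, minus the original 1.
geometric_return = math.prod(growth_factors) ** (1 / len(returns)) - 1
arithmetic_return = sum(returns) / len(returns)

print(round(geometric_return, 4), round(arithmetic_return, 4))
```

The geometric return comes out slightly below the arithmetic average because it accounts for compounding.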
MEASURES OF DISPERSION
Range
Mean Absolute Deviation
MAD = [Σ_{i=1}^{N} |X_i - μ|] / N for a population
MAD = [Σ_{i=1}^{N} |X_i - X̄|] / N for a sample
example: 1, 3, 5, 7, 9; μ = 5.
MAD = (|1-5| + |3-5| + |5-5| + |7-5| + |9-5|) / 5 = 12/5 = 2.4
A MAD of 2.4 tells us that on average the absolute deviation from the mean
(ignoring algebraic signs) is 2.4.
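The MAD calculation can be sketched directly from its definition: average the absolute deviations from the mean. The function name mean_absolute_deviation is a hypothetical helper, not a library call.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

mad = mean_absolute_deviation([1, 3, 5, 7, 9])
print(mad)

# Without the absolute value, the deviations always cancel out to zero.
print(sum(v - 5 for v in [1, 3, 5, 7, 9]))
```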
Variance
σ^2 = [Σ_{i=1}^{N} (X_i - μ)^2] / N (Population)
s^2 = [Σ_{i=1}^{n} (X_i - X̄)^2] / (n-1) (Sample)
Comment: Using n-1 in the divisor for the sample variance provides a better
estimate of the population variance from a sample. For larger
samples (e.g., n>30) the adjustment makes little difference.
example:
X̄ = (10 + 15 + 12 + 18) / 4 = 13.75
s^2 = [(10-13.75)^2 + (15-13.75)^2 + (12-13.75)^2 + (18-13.75)^2] / (4-1) = 12.25
Comment: Note that squaring the deviations makes all deviations positive.
Mathematical definition of standard deviation
Standard Deviation is the square root of Variance
σ = √(σ^2)     s = √(s^2)
Comment: Note that the standard deviation is in the same units as the data (e.g.,
‘percent return’) while the variance is in the less intuitive ‘units squared’ (e.g., ‘percent
return squared’).
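The sample and population versions of variance differ only in the divisor. The sketch below uses the standard statistics module on the values 10, 15, 12, 18 from the variance example; treating the same four values as a whole population is done here only to show the contrast.

```python
import statistics

data = [10, 15, 12, 18]

sample_variance = statistics.variance(data)       # divides by n - 1
population_variance = statistics.pvariance(data)  # divides by N
sample_std_dev = statistics.stdev(data)           # square root of the sample variance

print(sample_variance, population_variance, sample_std_dev)
```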
Coefficient of Variation
One problem with all of the preceding measures of dispersion is that they do
not control for scale effects. For instance, the standard deviation of A
={1000,1100,987} is 61.83 and the standard deviation of B ={1,20,50} is
24.70. Comparing 61.83 to 24.70 might lead one to conclude that A is
more dispersed than B. In absolute terms, this conclusion is true. In
percentage terms, however, it is not. Note that A has a mean of 1,029 and
B has a mean of 23.67. Relative to the mean, B deviates more than A!
We use the coefficient of variation to measure dispersion relative to the mean.
CV = σ / μ (Population)     CV = s / X̄ (Sample)
example:
CV_A = 61.83 / 1,029 = 0.0601 or 6.01%
CV_B = 24.70 / 23.67 = 1.0439 or 104.39%
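The comparison of A and B can be reproduced as follows. This is a minimal sketch using the sample standard deviation; coefficient_of_variation is a hypothetical helper name.

```python
import statistics

def coefficient_of_variation(sample):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(sample) / statistics.mean(sample)

cv_a = coefficient_of_variation([1000, 1100, 987])
cv_b = coefficient_of_variation([1, 20, 50])

print(round(cv_a, 4), round(cv_b, 4))
```

A has the larger standard deviation, but B has by far the larger coefficient of variation.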
Arithmetic Mean
Median
Range
Variance
Standard Deviation
Coefficient of Variation
Arithmetic Mean
Mean = (24 + 67 + 32 + 45)/4 = 42.00
Median
Ranking in descending order we get 67,45,32,24.
The median is (45 + 32)/2 = 38.5
Range
The range = 67 - 24 = 43
Variance
Variance = [(24-42)^2 + (67-42)^2 + (32-42)^2 + (45-42)^2]/3 = 352.6667
Note: divide by n-1 since this is a sample.
Standard Deviation
Standard Deviation = √352.6667 = 18.78
Coefficient of Variation
CV = 18.78/42 = 0.4471 or 44.71%
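All six review answers can be checked in one pass. The sketch below assumes the sample is 24, 67, 32, 45, as read from the worked answers above.

```python
import statistics

sample = [24, 67, 32, 45]   # values taken from the worked answers

mean = statistics.mean(sample)
median = statistics.median(sample)
value_range = max(sample) - min(sample)
variance = statistics.variance(sample)   # sample variance, divides by n - 1
std_dev = statistics.stdev(sample)
cv = std_dev / mean

print(mean, median, value_range)
print(round(variance, 4), round(std_dev, 2), round(cv, 4))
```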