QM Formula Class
Measurement scales:
Types of Variables:
Measuring variation:
1. Range = Maximum-Minimum
2. Variance, population = σ^2 = 1/N * ∑ (xi-Mean)^2
3. Standard deviation, population = σ
4. Coefficient of Variation, population = σ/Mean
5. Variance, sample = s^2 = 1/(N-1) * ∑ (xi-Mean)^2
6. Standard deviation, sample =s
7. Coefficient of Variation, sample = s/Mean
8. Mean absolute deviation = 1/N * ∑ |xi-Mean|
9. Z score (how many standard deviations xi is
away from the mean) = [xi-Mean]/σ
10. Quartiles (Q1, Q2, Q3): the values below which the smallest 25%, 50%, and 75% of
observations lie.
11. Inter-quartile range = Q3 – Q1
12. 5-number summary (Min., Q1, Q2, Q3, Max.): Minimum, 3 Quartiles,
Maximum
13. Boxplot (called Box and Whisker chart in MS Excel): plot of the 5-number
summary
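The measures of variation listed above can be sketched in Python with the standard `statistics` module. The data set is hypothetical, and the quartile values depend on the convention used (the module's default method is assumed here; Excel or a textbook may give slightly different quartiles).

```python
import statistics

# Hypothetical data set, used only for illustration.
data = [12, 15, 11, 18, 20, 14, 16, 22]

n = len(data)
mean = statistics.fmean(data)

data_range = max(data) - min(data)              # 1. Range
pop_var = statistics.pvariance(data)            # 2. divides by N
pop_sd = statistics.pstdev(data)                # 3.
pop_cv = pop_sd / mean                          # 4.
sample_var = statistics.variance(data)          # 5. divides by N-1
sample_sd = statistics.stdev(data)              # 6.
sample_cv = sample_sd / mean                    # 7.
mad = sum(abs(x - mean) for x in data) / n      # 8. Mean absolute deviation
z_scores = [(x - mean) / pop_sd for x in data]  # 9. one Z score per data point

# 10-12. Quartiles, inter-quartile range and 5-number summary.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
five_number = (min(data), q1, q2, q3, max(data))
```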
Z score:-
How far is the observation from mean, in terms of standard deviations.
Z = (ObservedValue - Mean)/Standard deviation
= Error/Standard deviation
Note- Unlike Mean and Variance, the Z score is not a summary measure of the dataset; it can
be computed for each data point.
Probability:-
Probability measures uncertainty.
Probability is the chance of an event happening.
1. A priori: -
Classical/Equi-likely.
Textbook examples of Coin tossing, Playing cards, etc, or when you know nothing.
2. Empirical
From historical data, observations, or experiments.
Life tables in insurance, earthquakes, rainfall, twins, quality, stock market …
3. Subjective:-
Personal judgement.
Outcome of an India vs Brazil cricket match. Covid-19 will be over in 2022…
Probability- a priori:
2. Joint probability- both events occur. P(Red and King) means the probability that the
card is of Red color and it is also a King.
Example –
a) P(Red and King) = 2/52.
b) P(Diamond and Red) = 13/52.
c) P(Picture and Red) = 6/52.
d) P(Black and Red) = 0/52.
e) P( <3 and Red) = 4/52.
3. Conditional probability- P(Red | King) means the probability that the card is Red given that the
card is known to be a King.
Example –
a) P(Red | King) = 2/4 = 1/2.
b) P(Red | 7) = 2/4.
c) P(Picture | Black) = 6/26.
d) P(Picture | Diamond) = 3/13.
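The joint and conditional card probabilities above can be checked by listing all 52 cards and counting. This is a minimal sketch; the helper names (`joint`, `conditional`, `is_red`, and so on) are made up for illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["Hearts", "Diamonds", "Clubs", "Spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def is_red(card):
    return card[1] in ("Hearts", "Diamonds")

def is_king(card):
    return card[0] == "K"

def is_picture(card):
    return card[0] in ("J", "Q", "K")

def joint(a, b):
    """P(A and B): share of all cards satisfying both conditions."""
    return Fraction(sum(1 for c in DECK if a(c) and b(c)), len(DECK))

def conditional(a, given):
    """P(A given B): share of the cards satisfying B that also satisfy A."""
    subset = [c for c in DECK if given(c)]
    return Fraction(sum(1 for c in subset if a(c)), len(subset))

print(joint(is_red, is_king))                            # 1/26, i.e. 2/52
print(conditional(is_red, is_king))                      # 1/2
print(conditional(is_picture, lambda c: not is_red(c)))  # 3/13, i.e. 6/26
```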
Probability distribution:
Probability of all outcomes.
Discrete means the outcome is-
A categorical variable (H/T, M/F, Won/Lost/Draw) or
An integer (1, 2, 3, 4…).
But not a fraction (0.666, 1.414, 1.618, 2.718, 3.1415, 9.11, etc.). Fractions are
considered in Continuous Probability distributions, like the Normal distribution.
Binomial distribution
1. Empirical
From historical data
2. Theoretical
o Binomial
o Poisson
o Normal
3. Numerous other theoretical distributions
o F, t, Chi-square (later in the course)
o Uniform, Geometric, Hypergeometric, Beta, Gamma, Maxwell-Boltzmann,
Cauchy, Rayleigh, Erlang, ….. (not in the course)
1. Binomial Probability
If an experiment is done n times, the probability of success (head) is p, and the
probability of failure is q or (1-p), then the chance of getting r successes in the n
trials = (Ways in which we can choose those r trials out of n trials) * p*p*p… up to r
times * q*q*q… up to (n-r) times
Probability = nCr * p^r * q^(n-r) = n!/[r! (n-r)!] * p^r * q^(n-r)
Probability of getting all successes in n trials = nCn * p^n * q^0 = n!/[n! 0!]
* p^n * q^0 = p^n
Prob(exactly 2 heads in 3 trials, with p = 0.6) = 3C2 * 0.6^2 * 0.4^1 = 0.432
What is the chance that, out of 5 consecutive days, it rains on at least 2 days?
P(2) + P(3) + P(4) + P(5) = 1 - [P(0) + P(1)]
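The binomial formula and the "at least 2" shortcut can be sketched in Python as follows. The rain probability `p_rain = 0.3` is an assumed value, since the notes do not give one.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Example from the notes: exactly 2 heads in 3 tosses with p = 0.6.
p_two_heads = binomial_pmf(2, 3, 0.6)
print(round(p_two_heads, 3))  # 0.432

# At least 2 rainy days out of 5; p_rain = 0.3 is an assumed value.
p_rain = 0.3
at_least_two = 1 - (binomial_pmf(0, 5, p_rain) + binomial_pmf(1, 5, p_rain))
direct_sum = sum(binomial_pmf(r, 5, p_rain) for r in range(2, 6))
```

Both `at_least_two` and `direct_sum` give the same probability; the complement form needs only two terms instead of four.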
2.Poisson distribution:
What is the Probability distribution?
Example-
Avg no. of accidents = 4/day (arrival (occurrence) rate of accidents per day).
Probability of 0, 1, 2, 3, 4, 5… accidents?
Avg no. of potholes = 6/km (occurrence rate of potholes per km).
Probability of 0, 1, 2, 3, 4, 5… potholes?
Avg no. of goals= 3.2/match. Probability of 0, 1, 2, 3, 4, 5… goals?
Avg no. of typos= 2.7/page. Probability of 0, 1, 2, 3, 4, 5… typos?
Avg no. of shooting stars= 0.3/hour. Probability of 0, 1, 2, 3, 4, 5… shooting
stars?
Avg no. of teeth cavities= 3.28/patient. Probability of 0, 1, 2, 3, 4, 5… teeth
cavities?
Formula:
P(x) = e^(-λ) * λ^x / x!
e = 2.718… a constant.
λ = average number of events, is known,
𝑥! = 𝑥 ⋅ (𝑥 − 1) ⋅ (𝑥 − 2) … 2 ⋅ 1.
𝑥 = 0, 1, 2, 3…. number of events (goals, pot holes, accidents…)
P(𝑥) = probability of 𝑥 events happening.
Probability of x arrivals per unit time= P(x)
Where lambda (𝜆) is the average arrival rate per unit time or per unit
distance
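A minimal Python sketch of the Poisson formula, using the goals example (λ = 3.2 per match) from the list above. The function name `poisson_pmf` is chosen here for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x events when the average rate is lam."""
    return exp(-lam) * lam**x / factorial(x)

# Example rate from the notes: 3.2 goals per match on average.
lam = 3.2
for goals in range(6):
    print(goals, round(poisson_pmf(goals, lam), 4))
```

The probabilities over x = 0, 1, 2, 3… add up to 1, which is a quick check that the formula was typed correctly.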
9am to 10am 3 2
10am to 11am 3 4
11am to 12pm 3 5
12pm to 1pm 3 1
Mean and Variance in Binomial and Poisson distributions
Normal Distribution
Distributions of data (the vertical bars in the graphs) from several phenomena, like the distribution
of weight, height, IQ, measurement errors, etc., have been found to be similar-
Symmetrical about the mean value,
Concentration towards the mean value, and
Very few extremely small or large values.
Formula:
Z = (X - mu)/sigma
X = mu + Z*sigma
mu = Mean
sigma = Standard deviation
The Z-value measures how many standard deviations an observation is away from the mean.
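The two conversions above can be sketched in Python. The mean of 100 and standard deviation of 15 are assumed values for illustration; `NormalDist` is from the standard `statistics` module.

```python
from statistics import NormalDist

# Assumed values for illustration: mean 100, standard deviation 15.
mu, sigma = 100.0, 15.0

def z_value(x):
    """Z = (X - mu) / sigma"""
    return (x - mu) / sigma

def x_value(z):
    """X = mu + Z * sigma"""
    return mu + z * sigma

z = z_value(130.0)                             # 2 standard deviations above the mean
area_below = NormalDist(mu, sigma).cdf(130.0)  # area to the left of X = 130
same_area = NormalDist().cdf(z)                # the Z table gives the same area
```

This is why a single Z table is enough: any Normal distribution can be converted to Z first.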
Census
Entire population (population, tiger, agriculture, health facilities).
Sampling
A portion of the population.
Quality, voting, blood, soil, customer surveys, voice, interviews, ….
Sampling methods
1.Non-probability sampling
Judgment sampling
Convenience sampling
2.Probability sampling
A. Simple random sampling
Each item has equal probability of getting chosen (=1/N, N is population size).
B. Systematic sampling
Every nth customer/item/bottle on filling line.
C. Stratified sampling
Samples from each stratum: Men/Women (20%, 80%); Rural/Urban (30%, 70%);
Steel/Chemicals/Telecom stocks (10%, 20%, 70%); Customers/Non-Customers (10%,
90%); Tourist/Non-Tourist (60%, 40%).
D. Clustered sampling
Samples from geographical district.
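The first three probability sampling methods can be sketched in Python with the standard `random` module. The population of 100 item IDs, the sample size of 10, and the Rural/Urban split are hypothetical.

```python
import random

rng = random.Random(42)                   # fixed seed so the run is repeatable
population = list(range(1, 101))          # hypothetical item IDs 1..100

# A. Simple random sampling: every item is equally likely to be chosen.
simple = rng.sample(population, 10)

# B. Systematic sampling: every k-th item after a random start.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# C. Stratified sampling: sample each stratum in proportion to its share.
strata = {"Rural": population[:30], "Urban": population[30:]}   # 30% / 70%
stratified = {
    name: rng.sample(items, round(10 * len(items) / len(population)))
    for name, items in strata.items()
}
```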
Estimation
Estimation: Proportion and Mean
Note –
1 What was the sample size? (The larger the sample size, the narrower the
range.)
2 What is the variance in the overall population? (The higher the population variance, the
wider the range.)
3 What is the confidence level at which you are supposed to answer?
Types of estimates
1. Point estimate
Population mean= Sample mean
Population proportion= Sample proportion
2. Interval estimates (Range estimate)
Population mean = Sample mean ± Sampling error
Population proportion = Sample proportion ± Sampling error
t distribution table
1. Area under t distribution=1 and it is Bell-shaped, like Normal distribution.
2. t distribution is defined only by one parameter, degree of freedom- df. Normal
distribution is defined by two parameters, Mean and Stdev.
3. Degree of freedom, df= Sample size-1.
4. t value for Area= 0.25 in right tail and df=10, is 0.700.
5. 90% middle area for 4 degrees of freedom (look for 5% in right tail)-
t= -2.132 to t= +2.132.
6. Excel function (It requires area in left tail, not in right tail)-
=t.inv(area in the left tail,df).
Examples-
=t.inv(0.95,4) = +2.132. 95% area in left tail, df=4
=t.inv(0.975,99)= +1.98. 97.5% area in left tail, df=99.
=t.inv(0.975,26)= +2.056. 97.5% area in left tail, df=26.
7. Not for everyone: For a sample from a Normal population, the ratio (X̅ − μ)/(S/√n) has a t distribution with
n-1 degrees of freedom.
Z distribution table
Since it is symmetric, 50% area is on each side of Z=0.
This is Normal distribution table with Mean=0 and Standard deviation=1. Hence also
called Standard Normal distribution table.
1. 90% area in the middle will be between-
Z= -1.65 to Z= +1.65.
2. 95% area in the middle will be between-
Z= -1.96 to Z= +1.96.
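The Z limits for a given middle area can also be computed instead of read from the table. A minimal sketch using `NormalDist` from Python's standard library; the function name `middle_area_limits` is chosen here for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # Mean=0, Standard deviation=1

def middle_area_limits(middle_area):
    """Return (-z, +z) enclosing the given middle area of the Z distribution."""
    tail = (1 - middle_area) / 2
    z = standard_normal.inv_cdf(1 - tail)
    return -z, z

low90, high90 = middle_area_limits(0.90)
low95, high95 = middle_area_limits(0.95)
print(round(high90, 3), round(high95, 3))
```

The 90% limit comes out near 1.645, which tables usually round to 1.65 or 1.64.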
Note:-
When the population variance is known, we can use the z statistic.
When the population variance is not known, check the sample size:
If sample size >=30, you can use the z statistic.
If sample size <30, you must use the t statistic.
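The rule above, together with an interval estimate for the mean, can be sketched in Python. The sketch assumes the usual form Sampling error = critical value * Standard deviation/√n, which the notes do not spell out. The sample figures are hypothetical, and only the z case is computed because the standard library has no t table.

```python
from math import sqrt
from statistics import NormalDist

def choose_statistic(population_variance_known, sample_size):
    """Apply the rule of thumb from the note above."""
    if population_variance_known or sample_size >= 30:
        return "z"
    return "t"

def z_interval(sample_mean, std_dev, n, confidence=0.95):
    """Interval estimate: sample mean +/- z * std_dev / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    sampling_error = z * std_dev / sqrt(n)
    return sample_mean - sampling_error, sample_mean + sampling_error

# Hypothetical sample: mean 50, standard deviation 8, size 64.
low, high = z_interval(50.0, 8.0, 64)
```

With these numbers the standard error is 8/√64 = 1, so the 95% interval is roughly 50 ± 1.96.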
Sampling Distributions
Relationship between characteristics of population and samples
A sample is drawn from a population with known characteristics. What are the
characteristics of the sample?
Characteristics of the sample are known. What are the characteristics of its population?
Characteristics means-
Mean, Standard deviation, Proportion, Median, Mode, Skewness, Kurtosis, etc
Sampling distribution
A sample is drawn from a population with known properties. What are the properties of the
sample?
Chapter-7: Sampling Distributions.
Characteristics of the sample will depend on the items that get drawn. That is, sample
characteristics will have a distribution, called- sampling distribution.
It has been shown that the distribution of sample characteristics has the following relationship
with the characteristics of the population-
1. Mean of the population= Mean of the sampling distribution.
2. Standard deviation of sampling distribution (distribution of sample means) = σ/√𝒏;
narrower than that of the population distribution.
3. The sampling distribution will tend to be Normal distribution as sample size
increases.
Similarly for proportions, it can be shown that
1. Proportion in the population= Mean proportion of the sampling distribution.
2. Standard deviation of sampling distribution (distribution of sample proportions) =
√[p * (1 − p)/n]
                     Population   Sampling distribution, of sample size
                     (known)      n=2         n=3         n=10        n=20
Mean                 µ (58.7)     X' (59.7)   X' (59.0)   X' (59.0)   X' (58.5)
Standard deviation   σ (16.3)     S (11.9)    S (8.5)     S (4.9)     S (3.4)
Standard Error                    11.5        9.4         5.2         3.6
Standard Error, SE = σ/√n
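The Standard Error row can be reproduced from σ = 16.3 with a few lines of Python. The function names are chosen here for illustration; the second function covers the proportion formula given earlier.

```python
from math import sqrt

sigma = 16.3  # population standard deviation from the table

def standard_error(sigma, n):
    """SE = sigma / sqrt(n)"""
    return sigma / sqrt(n)

for n in (2, 3, 10, 20):
    print(n, round(standard_error(sigma, n), 1))

def proportion_standard_error(p, n):
    """Standard deviation of sample proportions: sqrt(p * (1 - p) / n)."""
    return sqrt(p * (1 - p) / n)
```

The computed values are close to the observed S of each sampling distribution, and both shrink as n grows.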
Note-
The Binomial distribution is analogous to the Normal distribution:
The Normal distribution is continuous.
The Binomial distribution is discrete. As the number of trials of a Binomial distribution
becomes very large, it approaches the Normal distribution.
If the population does not follow a Normal distribution, the sample means will still tend to
follow a Normal distribution for large sample sizes.
Similarly, the sample proportions will tend to follow a Normal distribution for large sample sizes.
Hypothesis- intuitive
If the sample mean is close to the stated population mean, the null hypothesis is not
rejected.
If the sample mean is far from the stated population mean, the null hypothesis is
rejected.
How far is “far enough” to reject the claim, H0?
The critical value of a test statistic creates a “line in the sand” for decision making- it
answers the question of how far is far enough.
A.100% inspection
Inspect every part in each batch.
Used when the consequence of failure is severe- medical, aeronautics or the item is
expensive.
Expensive, and time consuming.
B.Sampling
Take a sample; accept or reject the entire batch.
Takes less time and costs less, but is risky.
Two types of errors- rejecting good items (α) and accepting bad items (β).
𝛼 is also called producer’s risk, because the loss is borne by the producer, and β is called
consumer’s risk, because the loss is borne by the consumer.
Hypothesis of means
A.tStat value for hypothesis testing
i)tStat value = ( X̅ − μ) / (S/√n )
tStat value = (Sample mean – Claimed Population mean) / (Sample Std.Dev /Sqrt(Sample
size))
Standard error, SE = S/√n
ii) tCritical
tCritical: The value is taken from the t table.
Degree of freedom, df= n-1= sample size-1.
iii) ZCritical
ZCritical: The value is taken from the Z table for the chosen significance level.
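A minimal Python sketch of the one-sample t test described above. The 27 measurements and the claimed mean of 50 are hypothetical; the critical value 2.056 is the t table value for 97.5% area in the left tail with df=26 (a two-tailed test at alpha = 0.05).

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of 27 measurements; the claimed population mean is 50.
sample = [52, 48, 55, 51, 49, 53, 50, 54, 47, 56, 52, 51, 49,
          53, 55, 50, 48, 54, 52, 51, 53, 49, 50, 55, 52, 51, 54]
claimed_mean = 50

n = len(sample)
standard_error = stdev(sample) / sqrt(n)      # SE = S / sqrt(n)
t_stat = (mean(sample) - claimed_mean) / standard_error

# Two-tailed test at alpha = 0.05 with df = n - 1 = 26.
t_critical = 2.056
reject_h0 = abs(t_stat) > t_critical
```

H0 is rejected only when tStat falls beyond ±tCritical, that is, when the sample mean is "far enough" from the claimed mean.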
X̅1 and X̅2 - Means of the two samples; S1 and S2 - StdDev. of the two samples; n1
and n2 - sizes of the two samples; µ1 and µ2 - Hypothesized means of two populations.
o Sp^2 = [(n1-1)*S1^2 + (n2-1)*S2^2] / [(n1-1) + (n2-1)], where Sp^2 is the pooled variance.
Use t test and Degrees of freedom= n1-1 + n2-1 = n1+n2-2.
H0: µ1 = µ2
H1: µ1 ≠ µ2
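The pooled variance can be computed as below. The notes do not show the test statistic itself, so the sketch assumes the standard pooled-variance form tStat = [(X̅1 − X̅2) − (µ1 − µ2)] / √[Sp^2 * (1/n1 + 1/n2)]. All sample figures are hypothetical.

```python
from math import sqrt

def pooled_variance(s1, n1, s2, n2):
    """Sp^2 = [(n1-1)*S1^2 + (n2-1)*S2^2] / [(n1-1) + (n2-1)]."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))

def pooled_t_stat(x1, s1, n1, x2, s2, n2, hypothesized_diff=0.0):
    """Standard pooled-variance t statistic (formula assumed, not from the notes)."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    return ((x1 - x2) - hypothesized_diff) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical summary statistics for two samples.
sp2 = pooled_variance(s1=4.0, n1=10, s2=6.0, n2=12)
t_stat = pooled_t_stat(52.0, 4.0, 10, 48.0, 6.0, 12)
df = 10 + 12 - 2
```

Under H0: µ1 = µ2 the hypothesized difference is 0, which is the default used here.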
Chi-square tests
Chi-Square Test for differences among more than two proportions
1. Compute from data: χ^2Stat = ∑ [ (fo_i - fe_i)^2 / fe_i ], where fo_i are the observed and fe_i the expected frequencies.
2. Get χ^2Critical from the χ^2 table
for df = (No. of rows-1)*(No. of columns-1), and alpha (usually 0.05).
3. If χ^2Stat is larger than χ^2Critical, reject the null hypothesis; else do not reject the null hypothesis.
5. H0: πA = πB = πC.
H1: πA, πB, and πC are not all equal.
7. In a Chi-square test, do not divide the area into two tails. That is, entire alpha is in
the right tail.
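The Chi-square steps can be sketched in Python for a 2x3 table. The observed counts are hypothetical. The expected frequency of each cell is taken as row total * column total / grand total, the usual rule under H0, which the notes do not state.

```python
# Hypothetical 2x3 table of observed counts (rows: Yes/No, columns: groups A, B, C).
observed = [
    [30, 40, 50],
    [70, 60, 50],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under H0 = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2_stat = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
chi2_critical = 5.991  # chi-square table value for df = 2, alpha = 0.05
reject_h0 = chi2_stat > chi2_critical
```

Here the statistic is about 8.33, which is larger than 5.991, so H0 is rejected for this made-up table.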
Covariance between x and y = corr coeff between x and y * std dev of x * std dev of y
Cov = ∑ (xi-xbar)(yi-ybar) / (n-1)
corr coeff between x and y = [∑ (xi-xbar)(yi-ybar) / (n-1)] / (std dev of x * std dev of y)
Typical relationships
1. Direction of relationship, + or - ?
2. Strength of relationship… in number
Computing Covariance
Correlation coefficient, r
Correlation coefficient, r = Covariance(X, Y) / [Stdev(X) * Stdev(Y)]
1. Correlation coefficient, r, is dimensionless.
2. Its value ranges between -1 and +1.
Formulas
If Y=X, then Cov(X,Y) = 1/(n-1) * ∑ (X − X')^2 = Variance.
If X'=0 and Y'=0, then Cov(X,Y) = 1/(n-1) * ∑ (X * Y).
B. Correlation Coefficient, r:
r = Cov(X, Y) / [Stdev(X) * Stdev(Y)]
If r is closer to -1:
o Risk can be reduced.
Buy several stocks instead of a single stock in the portfolio- Mutual Fund.
If r is closer to 0:
o The two variables are not related; both add to our knowledge.
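Covariance and the correlation coefficient can be sketched in plain Python as follows. The two return series are hypothetical and were chosen to move in exactly opposite directions, so r comes out at -1.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance: sum of (xi - xbar)(yi - ybar) / (n - 1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """r = Cov(X, Y) / (Stdev(X) * Stdev(Y)); Cov(X, X) is the variance."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical returns of two stocks that move in opposite directions.
stock_a = [1.0, 2.0, 3.0, 4.0, 5.0]
stock_b = [10.0, 8.0, 6.0, 4.0, 2.0]
r = correlation(stock_a, stock_b)
```

The standard deviations are taken from `covariance(xs, xs)`, which uses the rule that Cov(X, X) equals the variance.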
b = SSXY / SSX
SSXY = ∑ (X − X') * (Y − Y')
SSX = ∑ (X − X') * (X − X')
a = Y’ – b X’
Y’- average of Y
X’- average of X
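The slope and intercept formulas can be sketched in Python, assuming the fitted line is written as Y = a + b*X (the notes give a and b but not the line itself). The data are hypothetical and lie exactly on Y = 3 + 2X.

```python
def fit_line(xs, ys):
    """Return (a, b) for the fitted line Y = a + b*X."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ssx = sum((x - x_bar) ** 2 for x in xs)
    b = ssxy / ssx            # slope
    a = y_bar - b * x_bar     # intercept
    return a, b

# Hypothetical data lying exactly on Y = 3 + 2X.
xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]
a, b = fit_line(xs, ys)
```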
Measures of variations
R^2- Coefficient of determination
It measures how close the data points are to the fitted straight line.
Its value ranges between 0 and +1.
If the fit is poor, then R^2 is close to zero.
If all points are on the straight line, then R^2=1.
Generally computed using software.
MS Excel function, =RSQ(Y_Range,X_Range)
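For readers without Excel, the coefficient of determination can be sketched in Python. The sketch assumes the usual definition, 1 - (unexplained variation / total variation) for a least-squares straight line, which the notes do not state. Both data sets are hypothetical.

```python
def r_squared(xs, ys):
    """1 - SSE/SST for a least-squares straight line fitted to (xs, ys)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                     # total variation
    return 1 - sse / sst

perfect = r_squared([1, 2, 3, 4], [2, 4, 6, 8])      # all points on a line
noisy = r_squared([1, 2, 3, 4], [2, 1, 4, 3])        # hypothetical scattered data
```

The first call gives 1 because every point is on the line; the second gives 0.36, a much poorer fit.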
The equations
1.Maximize Revenue, R
R = 5 Butter + 10 Cheese ….1
2.Constraint
1 Butter + 4 Cheese ≤ 100 ….2
Butter ≥ 0, Cheese ≥ 0. ….3, 4.
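Because the best value of a linear program with a bounded feasible region lies at a corner point, this small problem can be solved by checking the corners. A minimal Python sketch; the function names are chosen here for illustration.

```python
def revenue(butter, cheese):
    return 5 * butter + 10 * cheese          # equation 1

def feasible(butter, cheese):
    # equations 2, 3 and 4
    return butter + 4 * cheese <= 100 and butter >= 0 and cheese >= 0

# Corner points of the feasible region: the origin and the two axis intercepts
# of the line 1 Butter + 4 Cheese = 100.
corners = [(0, 0), (100, 0), (0, 25)]
assert all(feasible(b, c) for b, c in corners)

best = max(corners, key=lambda point: revenue(*point))
print(best, revenue(*best))   # (100, 0) 500
```

Revenue is 500 with Butter = 100 and Cheese = 0, against 250 with Butter = 0 and Cheese = 25.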