QM Consolidated Formulae

The document provides formulas for measures of central tendency (mean, median, mode), variance, standard deviation, z-scores, probability, probability distributions (binomial, Poisson, normal), confidence intervals, and hypothesis testing (one sample test, two sample test for independent and related populations). It defines key terms and notations used in the formulas.
Copyright © All Rights Reserved

Formula

Measures of central tendency


Population Mean μ = ∑x / N, where ∑x is the sum of all data points and N is the number of data points
Sample Mean x̄ = ∑x / n, where ∑x is the sum of all data points and n is the number of data points
Median = the value at position (n + 1) / 2 of the sorted data, where n is the number of data points
Mode = most frequent value
Sample Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
where x_i is a data point, x̄ is the sample mean and n is the number of data points

Population Variance
$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
where X_i is a data point, μ is the population mean and N is the number of data points

Population Standard Deviation
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
The term Σ(x_i - μ)² is the sum of the squared deviations of the scores from the population mean.

Sample Standard Deviation
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
The term Σ(x_i - x̄)² is the sum of the squared deviations of the scores from the sample mean.

Equivalently, each standard deviation is the square root of the corresponding variance; the sample version divides by n - 1 instead of N.

Coefficient of Variation
Standard Deviation / Mean

Range
Maximum - minimum

Quartile 1 (lower quartile)
Q1 = value at position (n + 1)/4 of the sorted data

Quartile 2 (the median)
Q2 = value at position (n + 1)/2 of the sorted data

Quartile 3 (upper quartile)
Q3 = value at position 3(n + 1)/4 of the sorted data

Interquartile Range
Upper Quartile - Lower Quartile = Q3 - Q1

Z score
z = (x-μ)/σ, where x is the raw score, μ is the population mean, and σ is the population standard deviation

Probability
Marginal Probability
P(A) = (number of desired outcomes) / (total number of possible outcomes)
P(not A) = 1-P(A)

Joint Probability
If A and B are dependent on each other
P(A and B) = P(A given B) * P(B) = P(B given A) * P(A)
If A and B are independent of each other
P(A and B) = P(A) * P(B)

Addition Rule
If A and B are mutually exclusive
P(A or B) = P(A) + P(B)
If A and B are not mutually exclusive
P(A or B) = P(A) + P(B) - P(A and B)

Conditional Probability
P(A given B) = P(A and B) / P(B)
If A and B are independent of each other
P(A given B) = P(A)
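The rules above can be checked with exact fractions. This is a minimal Python sketch; the card-deck events (A = "King", B = "Heart", one card drawn from a 52-card deck) are a hypothetical example, not taken from the sheet.

```python
from fractions import Fraction

# Hypothetical example: draw one card from a standard 52-card deck.
# A = "card is a King", B = "card is a Heart".
p_a = Fraction(4, 52)        # marginal P(A)
p_b = Fraction(13, 52)       # marginal P(B)
p_a_and_b = Fraction(1, 52)  # joint P(A and B): the King of Hearts

p_not_a = 1 - p_a                  # complement: P(not A) = 1 - P(A)
p_a_or_b = p_a + p_b - p_a_and_b   # addition rule (not mutually exclusive)
p_a_given_b = p_a_and_b / p_b      # conditional: P(A and B) / P(B)

# Independence check: P(A and B) == P(A) * P(B)
independent = p_a_and_b == p_a * p_b

print(p_not_a, p_a_or_b, p_a_given_b, independent)  # 12/13 4/13 1/13 True
```

Because A and B are independent here, P(A given B) comes out equal to P(A) = 1/13.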

Probability Distributions
Binomial Distribution
$b(x; n, P) = \binom{n}{x} P^{x} (1 - P)^{n - x}$

Where:
b = binomial probability
x = total number of "successes" (pass or fail, heads or tails, etc.)
P = probability of a success on an individual trial
n = number of trials
Because $\binom{n}{x} = {}_nC_x = \frac{n!}{x!(n - x)!}$, the binomial distribution formula can also be written as
$b(x; n, P) = \frac{n!}{x!(n - x)!} P^{x} (1 - P)^{n - x}$

Finding Mean and SD for Binomial Distribution
Mean = n * P (number of trials × probability of success)
Standard Deviation = sqrt(n * P * (1 - P)) (number of trials × probability of success × probability of failure)
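A minimal Python sketch of the binomial formula and its mean and SD, using `math.comb` (Python 3.8+). The inputs are the fair-coin examples from the worked Probability Distributions sheet (4 tosses, and 40 tosses); the function names are illustrative.

```python
from math import comb, sqrt

def binomial_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, P) = nCx * P^x * (1 - P)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Mean = n * P, SD = sqrt(n * P * (1 - P))."""
    return n * p, sqrt(n * p * (1 - p))

# Fair coin tossed 4 times: probability (in %) of 0..4 heads
for x in range(5):
    print(x, round(binomial_pmf(x, 4, 0.5) * 100, 2))

# Fair coin tossed 40 times: mean and SD of the number of heads
mean, sd = binomial_mean_sd(40, 0.5)
print(mean, round(sd, 2))  # 20.0 3.16
```

The printed percentages (6.25, 25, 37.5, 25, 6.25) match the sheet's table once rounded to one decimal.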

Poisson Distribution
$P(x; \mu) = \frac{e^{-\mu} \mu^{x}}{x!}$
Where:
μ is the average number of successes in the given interval
x is the actual number of successes that result from the experiment
e is approximately equal to 2.71828.
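A minimal Python sketch of the Poisson formula. As an assumed test case it uses the nurses-on-leave example from later in this file (mean of 4 per day); the function name is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """P(x; mu) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)

# Average of 4 nurses on leave per day: P(X = x) in %
for x in range(8):
    print(x, round(poisson_pmf(x, 4) * 100, 1))
```

The output (1.8, 7.3, 14.7, 19.5, 19.5, 15.6, 10.4, 6.0) agrees with the worked Poisson table.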

Normal Distribution
Finding the area under the normal distribution curve

where:
μ is the mean
σ is the standard deviation
x is the data point

I was not able to find a formula for the inverse normal distribution.
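A minimal Python sketch of normal areas using `statistics.NormalDist` (Python 3.8+), with the mean 7 and SD 2 used in the worked normal-distribution tables. `cdf` gives the area to the left of x; `inv_cdf` computes the inverse normal numerically (Excel's NORM.DIST and NORM.INV play the same roles).

```python
from statistics import NormalDist

dist = NormalDist(mu=7, sigma=2)  # mean 7, SD 2

left = dist.cdf(5)                   # area to the left of x = 5
right = 1 - dist.cdf(5)              # area to the right of x = 5
between = dist.cdf(9) - dist.cdf(5)  # area between x = 5 and x = 9
z = (5 - dist.mean) / dist.stdev     # z-score of x = 5

# Inverse normal: the x value that has 95% of the area to its left
x_95 = dist.inv_cdf(0.95)

print(round(left * 100, 2), round(right * 100, 2), round(between * 100, 2), z, round(x_95, 3))
```

Note that the area between two values is the difference of two left-tail areas, not the left-tail area of the upper value alone.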

Refer to the link below for more problems


https://drive.google.com/file/d/1ls5YExLs40szoiwc0vLvZWA9zisGCxJt/view?usp=sharing

Confidence Interval Estimation


Confidence Interval for the mean = Sample mean ± t90% * Standard Error
Standard Error = Standard Deviation / sqrt(sample size)
t90% is the t value for 90% confidence with (n - 1) degrees of freedom, read from the t-table

Confidence Interval for a proportion = sample proportion (p) ± z90% * Standard Error

Standard Error = sqrt(p(1 - p) / sample size)
z90% is the z value for 90% confidence, read from the z-table
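A minimal Python sketch of both intervals. Python's standard library has no t distribution, so the t value is passed in as read from a t-table (if SciPy is available, `scipy.stats.t.ppf` can supply it instead); the z value comes from `statistics.NormalDist`. The inputs are assumed from the hypothesis-testing examples below (invoice mean at 95%, and 681 "Yes" out of 792).

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(x_bar: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """x_bar ± t * s / sqrt(n); t_crit is read from a t-table with n - 1 df."""
    se = s / sqrt(n)
    return x_bar - t_crit * se, x_bar + t_crit * se

def proportion_ci(p: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """p ± z * sqrt(p * (1 - p) / n), with z from the standard normal."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# Invoice example: 95% CI, t for 11 df = 2.20098516
print(mean_ci(112, 20.80, 12, 2.20098516))
# Survey example: 681 "Yes" out of 792
print(proportion_ci(681 / 792, 792))
```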

Refer to the link below for more problems


https://drive.google.com/file/d/1NRlMgqiDLfKmHBEWoyR_00uSXJrb5tX2/view?usp=sharing

Hypothesis Testing
One-Sample Test
Two-Tail Test
Problem 1: The mean amount per invoice is 120.
Has the mean changed?
A sample gives a mean of 112
Sample standard deviation is 20.80
n = 12

We test H0: μ = 120 against H1: μ ≠ 120.

We are going to solve this problem with the confidence interval approach,
so we find the 95% confidence interval around the sample mean of 112.

CI = Sample mean ± t95% * Standard Error

Standard Error = standard deviation / sqrt(n)

Standard Error 6.0044428

t95% ±2.20098516 (11 degrees of freedom)

CI 98.7843105 to 125.2156895
Since 120 lies between 98.78 and 125.22, do not reject the null hypothesis.

Now we solve the same problem with the critical value approach

Formula for the t-stat
t = (X̄ - μ)/(S/sqrt(n)) = (X̄ - μ)/SE
t-stat = (112 - 120)/6.0044428 = -1.3323

t-critical ±2.20098516
Since -1.33 lies between -2.20 and +2.20, do not reject the null hypothesis.
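A minimal Python sketch of the critical value approach for this invoice problem. The critical value is taken from the t-table above rather than computed, since the standard library has no t distribution; the function name is illustrative.

```python
from math import sqrt

def one_sample_t(x_bar: float, mu0: float, s: float, n: int) -> float:
    """t = (x_bar - mu0) / (s / sqrt(n))"""
    return (x_bar - mu0) / (s / sqrt(n))

t_stat = one_sample_t(x_bar=112, mu0=120, s=20.80, n=12)
t_crit = 2.20098516  # two-tailed, alpha = 0.05, 11 df (from the t-table)

# Two-tailed test: reject H0 only if |t| exceeds the critical value
reject = abs(t_stat) > t_crit
print(round(t_stat, 4), reject)  # -1.3323 False
```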

Problem 2: The given (hypothesized) proportion is 80%, so H0: π = 0.80 and H1: π ≠ 0.80

Sample size 792
Yes 681
No 111

Yes proportion 0.8598485
No proportion 0.1401515

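We are going to solve this problem with the confidence interval approach

CI = Sample proportion ± z-stat * SE
SE = sqrt(p(1 - p)/n)
Here p is rounded to 0.85: p(1 - p) = 0.85 × 0.15 = 0.1275, 0.1275/792 = 0.00016098, SE = 0.01268798047
z-stat ±1.959963986 (95% confidence)
CI 0.8349805001 to 0.8847164696
(With the unrounded p, SE = 0.01234 and the CI is 0.8357 to 0.8840; the conclusion is the same.)
Since 0.80 is not in the range 0.83 to 0.88, reject the null hypothesis.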
Now we solve the same problem with the critical value approach
z-stat = (p - π)/SE, where π = 0.80 is the hypothesized proportion
SE = sqrt(π(1 - π)/n)
p - π = 0.0598 (about 6%)
z-stat 4.210714148

z-critical ±1.959963986
Since 4.21 is outside the range ±1.96, reject the null hypothesis.
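A minimal Python sketch of the one-sample proportion z test above, with the standard error built from the hypothesized proportion π = 0.80. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z(x: int, n: int, pi0: float) -> float:
    """Z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n), where p = x / n."""
    p = x / n
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)

z_stat = one_sample_proportion_z(681, 792, 0.80)
z_crit = NormalDist().inv_cdf(0.975)  # two-tailed, alpha = 0.05 -> about 1.96
print(round(z_stat, 4), abs(z_stat) > z_crit)  # 4.2107 True
```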

One tail test


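The mean service time is 188.83.
Has it decreased?
H0: μ ≥ 188.83
H1: μ < 188.83
Given n = 25
Sample mean = 170
S = 21.3
Formula t = (X̄ - μ)/SE
SE = S/sqrt(n) = 4.26
t-stat -4.420187793

t-critical -1.71088208 (lower tail, α = 0.05, 24 degrees of freedom)
Since -4.42 < -1.71, reject the null hypothesis: the mean service time has decreased.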
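A minimal Python sketch of this lower-tailed test. The critical value -1.71088208 is taken from the sheet rather than computed.

```python
from math import sqrt

x_bar, mu0, s, n = 170, 188.83, 21.3, 25
se = s / sqrt(n)               # standard error = S / sqrt(n)
t_stat = (x_bar - mu0) / se
t_crit = -1.71088208           # lower-tail critical value, alpha = 0.05, 24 df

# Lower-tailed test (H1: mu < 188.83): reject H0 when t_stat < t_crit
print(round(se, 2), round(t_stat, 4), t_stat < t_crit)  # 4.26 -4.4202 True
```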

Covid vaccine test


Two-Sample Test
Compare the means of two independent populations
We already know the one-sample test formula, (X̄ - μ)/SE where SE = S/sqrt(n).
For the two-sample test where the populations are independent of each other:
$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{SE}$
$SE = \sqrt{S_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)}$
Here we compare two sample means:
H0: μ1 = μ2
H1: μ1 ≠ μ2
Prob 1 Given X̄1 = 50.3
X̄2 = 72
Since we are testing μ1 = μ2, μ1 - μ2 = 0
n1 = n2 = 10
S1 18.73
S2 12.54
S1² 350.7
S2² 157.3

Sp² 254
SE 7.127411872
t-stat -3.044583418
t-crit ±2.10092204 (18 degrees of freedom)
Since -3.04 lies outside the range -2.10 to +2.10, the null hypothesis is rejected.

Prob 2 Given X̄1 = 16.7
X̄2 = 18.88
Since we are testing μ1 = μ2, μ1 - μ2 = 0
n1 = n2 = 10
S1 3.0955
S2 2.8662
S1² 9.582
S2² 8.215

Sp² 8.8985
SE 1.334053972
t-stat -1.634116794
t-crit ±2.10092204 (18 degrees of freedom)
Since -1.63 lies within the range -2.10 to +2.10, the null hypothesis is not rejected.
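A minimal Python sketch of the pooled-variance t statistic, applied to both problems above. It squares S1 and S2 directly, so the results differ from the sheet only in the fourth decimal (the sheet rounds S1² and S2²). The critical value is taken from the sheet.

```python
from math import sqrt

def pooled_t(x1: float, x2: float, s1: float, s2: float, n1: int, n2: int) -> float:
    """Pooled-variance t statistic for H0: mu1 = mu2."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (x1 - x2) / se

t_crit = 2.10092204  # two-tailed, alpha = 0.05, df = n1 + n2 - 2 = 18
for x1, x2, s1, s2 in [(50.3, 72, 18.73, 12.54), (16.7, 18.88, 3.0955, 2.8662)]:
    t = pooled_t(x1, x2, s1, s2, 10, 10)
    print(round(t, 3), abs(t) > t_crit)  # reject H0 when True
```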

Compare the means of two related populations


Two Related Samples Test
In the two related samples (paired) test, we test whether the population mean difference μD equals 0:
H0: μD = 0
H1: μD ≠ 0

Problem 1

The formula is the same as the one-sample test,
but the mean and standard deviation are taken from the paired differences:
t-stat = (D̄ - μD)/SE, where μD is the hypothesized population mean difference (0 under H0)
SE = S_D/sqrt(n)

Given D̄ = 42.6013
S_D = 43.797
n = 16

SE 10.94925
t-stat 3.890796173
t-crit 1.753050356 as used in the sheet (one tail, α = 0.05, 15 degrees of freedom; the two-tail value is 2.131)
Since 3.89 is greater than either critical value, the null hypothesis is rejected.
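A minimal Python sketch of the paired t statistic. The raw differences for Problem 1 are not in this file, so the problem is computed from its summary values; `paired_t` shows the same calculation on a list of differences, and the list `[2, 4, 6]` in the check is a made-up illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(differences: list[float]) -> float:
    """t = (D_bar - 0) / (S_D / sqrt(n)), computed on the paired differences."""
    n = len(differences)
    return mean(differences) / (stdev(differences) / sqrt(n))

def paired_t_from_summary(d_bar: float, s_d: float, n: int) -> float:
    """Same statistic from the mean and SD of the differences."""
    return d_bar / (s_d / sqrt(n))

t_stat = paired_t_from_summary(42.6013, 43.797, 16)
print(round(t_stat, 4))  # 3.8908
```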

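Compare the proportions of two independent populations

For the one-sample test we already know the formula (p - π)/SE where SE = sqrt(π(1 - π)/n), π being the hypothesized proportion.
For the two-sample test:
$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{SE}$
$SE = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
Here p̄ is the overall (pooled) proportion, (X1 + X2)/(n1 + n2),
and π1, π2 are the population proportions (π1 - π2 = 0 under H0).

Here the hypotheses are
H0: π1 = π2
H1: π1 ≠ π2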

Problem 1
Is the proportion of guests likely to return equal in two hotels of a chain?
In the survey for hotel 1
Total 227
Yes 163
Yes proportion 0.718061674

In the survey for hotel 2
Total 262
Yes 154
Yes proportion 0.5877862595

Overall (pooled) Yes proportion
(163 + 154)/(227 + 262) = 0.6482617587

Two-sample proportion formula
((p1 - p2) - (π1 - π2))/SE
SE = sqrt(p̄(1 - p̄)(1/n1 + 1/n2)) = 0.04329879905

z-stat 3.00875353
z-crit ±1.959963986
Since 3.01 is not between -1.96 and +1.96, the null hypothesis is rejected.
Problem 2
One-tail hypothesis test for proportions
H0: π1 ≥ π2
H1: π1 < π2

In the survey
Total 459
Yes 193
Yes proportion 0.4204793028

Total 501
Yes 250
Yes proportion 0.499001996

Overall (pooled) Yes proportion
(193 + 250)/(459 + 501) = 0.4614583333
Two-sample proportion formula
((p1 - p2) - (π1 - π2))/SE
SE = sqrt(p̄(1 - p̄)(1/n1 + 1/n2)) = 0.03220967275

z-stat -2.437860632
z-crit -1.644853625 (lower tail, α = 0.05)
Since -2.44 is less than -1.645, the null hypothesis is rejected.
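A minimal Python sketch of the two-proportion z statistic with the pooled proportion, applied to both hotel surveys above. The function name is illustrative.

```python
from math import sqrt

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Z for H0: pi1 = pi2, using the pooled proportion in the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    se = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z1 = two_proportion_z(163, 227, 154, 262)  # two-tailed: compare |Z| with 1.96
z2 = two_proportion_z(193, 459, 250, 501)  # lower-tailed: compare Z with -1.645
print(round(z1, 4), round(z2, 4))
```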

F test for the difference between two variances

First find the variance for both samples
S1² 164182 15
S2² 280951 16
F-stat = S2²/S1² = 1.71121682 (larger variance over smaller)
F-crit 2.86209253
Since 1.71 is less than the upper critical value 2.86, the null hypothesis is not rejected.
Hence there is no significant difference in variance before and after the US elections.

Problem 2
S1² 350
S2² 157
F-stat = S1²/S2² = 2.229299363
F-crit 4.025994158
Since 2.23 is less than 4.03, the null hypothesis is not rejected.
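A minimal Python sketch of the F statistic, assuming the convention both problems above follow (larger sample variance in the numerator, so only the upper critical value is needed). The critical values are taken from the sheet.

```python
def f_stat(var_a: float, var_b: float) -> float:
    """F = larger sample variance / smaller sample variance."""
    return max(var_a, var_b) / min(var_a, var_b)

# (variance 1, variance 2, upper critical value from the F-table)
for v1, v2, f_crit in [(164182, 280951, 2.86209253), (350, 157, 4.025994158)]:
    f = f_stat(v1, v2)
    print(round(f, 4), f > f_crit)  # reject H0 when True
```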

Analysis of Variance (ANOVA)

In ANOVA we test whether the means of all the populations are equal
H0: μ1 = μ2 = μ3 = μ4
H1: Not all means are equal

Problem 1

In-aisle Front Kiosk Expert


30.06 32.22 30.78 30.33
29.96 31.47 30.91 30.29
30.19 32.13 30.79 30.25
29.96 31.86 30.95 30.25
29.74 32.29 31.13 30.55
Mean 29.982 31.994 30.912 30.334

Are the mean sales the same for all 4 groups?

Squared deviation from own group mean

In-aisle Front Kiosk Expert
0.006084 0.051076 0.017424 0.000016
0.000484 0.274576 0.000004 0.001936
0.043264 0.018496 0.014884 0.007056
0.000484 0.017956 0.001444 0.007056
0.058564 0.087616 0.047524 0.046656

We need to find the variance among the groups (σA²) and the variance within the groups (σW²);
under H0 both estimate the same variance: σA² = σW²

First we need to find the sum of squares among the groups (SSA).
It is the sum of the squared differences of each group mean from the grand mean, weighted by the number of observations in each group:
SSA = Σ nj(x̄j - x̄)²

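In our example
x̄1 29.98 (x̄j is the mean of each group)
x̄2 31.99
x̄3 30.91
x̄4 30.33

Grand mean x̄ = 30.8055 (the mean of all 20 observations; see the Excel formula)
n 5 (observations per group)
number of groups 4

SSA 11.607555 (computed from the rounded group means above; the unrounded means give 11.6217)

dof = number of groups - 1 = 3

σA² = SSA/dof
σA² 3.869185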

Now compute the sum of squares within the groups (SSW)

SSW = Σ (xij - x̄j)², the sum of the squared deviations in the table above

SSW 0.7026
dof = number of observations - number of groups
σW² = SSW/dof

number of observations 20
dof 16

σW² 0.0439125

ANOVA using the F-stat

F-stat = σA²/σW²
88.11124395

F-critical 4.076823062

The F-stat is greater than F-critical, hence the null hypothesis is rejected.
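A minimal Python sketch of the one-way ANOVA calculation on the store data above. Note that it uses the unrounded group means, so it gives SSA of about 11.6217 and F of about 88.22 rather than the 11.6076 and 88.11 obtained from the rounded means; the conclusion does not change.

```python
from statistics import mean

groups = {
    "In-aisle": [30.06, 29.96, 30.19, 29.96, 29.74],
    "Front":    [32.22, 31.47, 32.13, 31.86, 32.29],
    "Kiosk":    [30.78, 30.91, 30.79, 30.95, 31.13],
    "Expert":   [30.33, 30.29, 30.25, 30.25, 30.55],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)
c, n_total = len(groups), len(all_values)

# Among-group (SSA) and within-group (SSW) sums of squares
ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ssw = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

msa = ssa / (c - 1)         # variance among groups
msw = ssw / (n_total - c)   # variance within groups
f_stat = msa / msw
print(round(grand_mean, 4), round(ssa, 4), round(ssw, 4), round(f_stat, 2))
```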

Chi-square test
A customer satisfaction survey was conducted in three hotels and the results are in the table below
Results Golden Palm Princess Total
Yes 128 199 186 513
No 88 33 66 187
Total 216 232 252 700
Is the proportion of Yes answers the same for the three hotels?

Solution

Observed frequencies: as in the table above

The overall Yes proportion is 513/700 = 0.732857
The overall No proportion is 187/700 = 0.267143

Expected frequencies (row total × column total / overall total)
Results Golden Palm Princess Total
Yes 158.2971 170.0229 184.68 513
No 57.7029 61.9771 67.32 187
Total 216 232 252 700

The row and column totals of the expected frequencies match the observed totals.

Now we perform the chi-square test (see the Excel formula).

Chi-square formula: χ² = Σ (observed frequency - expected frequency)² / expected frequency

Chi-square table
Results Golden Palm Princess Total
Yes 5.7987 4.9386 0.009435 10.74673
No 15.9077 13.5481 0.025882 29.48167
Total 21.7063 18.4867 0.035317 40.2284

We got chi-square = 40.2284

dof = (number of rows - 1) × (number of columns - 1) = (2 - 1)(3 - 1) = 2
χ²-crit 5.991464547 (upper tail, α = 0.05)

Since the chi-square value is greater than χ²-crit, the null hypothesis is rejected.
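A minimal Python sketch of the chi-square test of independence, computing each expected frequency as row total × column total / overall total. The function name is illustrative; both hotel problems are used as inputs.

```python
def chi_square(observed: list[list[int]]) -> tuple[float, int]:
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / grand  # expected frequency
            stat += (fo - fe) ** 2 / fe
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof

# Problem 1: Yes/No by hotel (Golden, Palm, Princess)
print(chi_square([[128, 199, 186], [88, 33, 66]]))
# Problem 2: reason for not returning (Price, Location, Room, Other) by hotel
print(chi_square([[23, 7, 37], [39, 13, 8], [13, 5, 13], [13, 8, 8]]))
```

The statistics (about 40.23 with 2 df, and 27.41 with 6 df) are then compared with the upper-tail critical values 5.991 and 12.592.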

Problem 2

Observed frequencies
Reason Golden Palm Princess Total
Price 23 7 37 67
Location 39 13 8 60
Room 13 5 13 31
Other 13 8 8 29
Total 88 33 66 187

Overall proportion for each reason


Price 0.3582887701
Location 0.320855614973262
Room 0.165775401069519
Other 0.155080213903743

Expected frequencies
Reason Golden Palm Princess Total
Price 31.5294 11.8235 23.6471 67
Location 28.2353 10.5882 21.1765 60
Room 14.5882 5.4706 10.9412 31
Other 13.6471 5.1176 10.2353 29
Total 88 33 66 187

Chi-square formula: χ² = Σ (observed frequency - expected frequency)² / expected frequency


Chi-square table
Reason Golden Palm Princess Total
Price 2.3074 1.9678 7.5401 11.8153
Location 4.1040 0.5493 8.1987 12.8521
Room 0.1729 0.0405 0.3874 0.6008
Other 0.0307 1.6234 0.4882 2.1422
Total 6.6150 4.1810 16.6144 27.4104

Chi-square 27.41

number of rows (reasons) 4
number of columns (hotels) 3
dof = (4 - 1)(3 - 1) = 6

χ²-crit 12.59158724

27.41 is far greater than 12.59, hence the null hypothesis is rejected.

That is, there is a relationship between the reason for not returning and the hotel.

Covariance and Correlation

Problem 1
Person Height Weight
A 58 70
B 66 72
C 63 48
D 69 85
E 69 80

We need to find the covariance and the correlation.

Covariance formula
cov(x, y) = Σ(x - x̄)(y - ȳ)/(n - 1)

Person Height Weight x-x̄ y-ȳ (x-x̄)(y-ȳ)
A 58 70 -7 -1 7
B 66 72 1 1 1
C 63 48 -2 -23 46
D 69 85 4 14 56
E 69 80 4 9 36
Average (x̄, ȳ) 65 71 Sum 146
n 5
covariance 146/4 = 36.5

The same result can be obtained in Excel (sample covariance, e.g. COVARIANCE.S): 36.5

Problem 2
Person Experience Salary x-x̄ y-ȳ (x-x̄)(y-ȳ)
A 14 5 -1 1 -1
B 16 6 1 2 2
C 10 3 -5 -1 5
D 20 4 5 0 0
E 15 2 0 -2 0
Average (x̄, ȳ) 15 4 Sum 6
n 5
covariance 6/4 = 1.5

The same result can be obtained in Excel: 1.5

Calculate the correlation coefficient

Problem 1 covariance 36.5
Std of x 4.6368092
Std of y 14.21267

Correlation formula
r = covariance(x, y)/(std.x * std.y)

correlation 0.5538573836
Problem 2 covariance 1.5
Std of x 3.6055513
Std of y 1.5811388

Correlation formula
r = covariance(x, y)/(std.x * std.y)

correlation 0.2631174058
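A minimal Python sketch of the sample covariance and the correlation coefficient, using sample standard deviations as in the tables above. Python 3.10+ also ships `statistics.covariance` and `statistics.correlation`; the hand-written versions here just make the formulas visible.

```python
from statistics import stdev

def sample_covariance(x: list[float], y: list[float]) -> float:
    """sum((x - x_bar) * (y - y_bar)) / (n - 1)"""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x: list[float], y: list[float]) -> float:
    """r = cov(x, y) / (s_x * s_y), using sample standard deviations."""
    return sample_covariance(x, y) / (stdev(x) * stdev(y))

height, weight = [58, 66, 63, 69, 69], [70, 72, 48, 85, 80]
experience, salary = [14, 16, 10, 20, 15], [5, 6, 3, 4, 2]
print(sample_covariance(height, weight), round(correlation(height, weight), 4))
print(sample_covariance(experience, salary), round(correlation(experience, salary), 4))
```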

Simple Linear Regression

Linear regression formula y = bx + a

where
b = SSXY/SSX
a = ȳ - b x̄

SSXY = Σ(x - x̄)(y - ȳ)/(n - 1)
SSX = Σ(x - x̄)²/(n - 1)
(The (n - 1) divisor cancels in b = SSXY/SSX, so it may be omitted.)

Person Height Weight x-x̄ y-ȳ (x-x̄)(y-ȳ) (x-x̄)²
A 58 70 -7 -1 7 49
B 66 72 1 1 1 1
C 63 48 -2 -23 46 4
D 69 85 4 14 56 16
E 69 80 4 9 36 16
Average (x̄, ȳ) 65 71 Sum 146 86
n 5
SSXY 36.5
SSX 21.5

b = 36.5/21.5 = 1.69767442
a = 71 - 1.69767442 × 65 = -39.348837
The same result can be obtained in Excel.
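A minimal Python sketch of the least-squares slope and intercept for the height and weight data. It uses the plain sums (no n - 1 divisor), which gives the same b because the divisor cancels. Python 3.10+ also offers `statistics.linear_regression`.

```python
def least_squares(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (b, a) for y = b*x + a, with b = SSXY / SSX and a = y_bar - b*x_bar."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b = ssxy / ssx
    return b, y_bar - b * x_bar

b, a = least_squares([58, 66, 63, 69, 69], [70, 72, 48, 85, 80])
print(round(b, 6), round(a, 6))  # 1.697674 -39.348837
```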
Bill gates
Measures of central tendency
Population Mean = 20.96
Sample Mean = 20.96
Median = 20
Mode = 20
Population Variance = 10.9584
Sample Variance = 11.18204
Population Standard Deviation = 3.310347
Sample Standard Deviation = 3.343956
Min = 18
Max = 36
Coefficient of Variation = 0.157936
Range = 18
Quartile 1 = 19
Quartile 2 = 20
Quartile 3 = 21
Interquartile Range = 2
Z score = -0.59208
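These values can be reproduced with Python's `statistics` module. Assumption: the example appears to use the Age column of the student survey table later in this file (its minimum 18, maximum 36, mean 20.96 and population variance 10.9584 all match), so the list below is copied from that column. `quantiles(..., method="exclusive")` uses the (n + 1)p positions from the quartile formulas, and the z score is evaluated at x = 19, the value that reproduces -0.59208.

```python
from statistics import mean, median, mode, pvariance, variance, pstdev, stdev, quantiles

# Assumed data: the Age column of the survey table in this file
ages = [19, 21, 20, 18, 19, 21, 20, 21, 20, 19, 36, 19, 20, 21, 19, 20, 19, 20, 23, 20,
        20, 19, 19, 20, 22, 21, 21, 30, 20, 24, 19, 33, 19, 20, 22, 21, 19, 21, 20, 20,
        19, 21, 20, 19, 22, 21, 22, 19, 20, 20]

q1, q2, q3 = quantiles(ages, n=4, method="exclusive")  # (n + 1)p positions
summary = {
    "mean": mean(ages),                 # same formula for a sample or a population
    "median": median(ages),
    "mode": mode(ages),
    "population variance": pvariance(ages),
    "sample variance": variance(ages),
    "population SD": pstdev(ages),
    "sample SD": stdev(ages),
    "CV": pstdev(ages) / mean(ages),
    "range": max(ages) - min(ages),
    "Q1": q1, "Q2": q2, "Q3": q3, "IQR": q3 - q1,
    "z(19)": (19 - mean(ages)) / pstdev(ages),
}
for name, value in summary.items():
    print(f"{name}: {value:.6g}")
```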


Refer to the link below for more problems


https://drive.google.com/file/d/13uQOOv4qyOJaiHx94lJ9yvdx6xmFT3e_/view?usp=sharing

Probability
Refer to the Excel file below for probability problems

https://drive.google.com/file/d/1ehSUB7O5EYq2TdttMVeH7NPeWL8iOlEU/view?usp=sharing

Probability Distributions
Bill gates sample

1. Toss a fair coin (4 tosses)
A fair coin is tossed 4 times. What is the probability of 0, 1, 2, 3 and 4 heads?

P(Head) 0.5
P(Tail) 0.5
Total 1.0
Trials 4

X P(X) %
0 6.3
1 25.0
2 37.5
3 25.0
4 6.3
Total 100.0

2. Toss a fair coin (40 tosses)
A fair coin is tossed 40 times. What are the expected number (mean) and the standard deviation (SD) of heads?

P(Head) 0.5
P(Tail) 0.5
Total 1.0
Tosses 40

Mean 20
SD 3.2

Refer to the link for more problems: https://drive.google.com/file/d/1aKnzQcKx6voI0fv6o3D6tSJC1ILfWgxQ/view?usp=sharing

Normal Distribution
Find the area to the left and to the right of X. The mean and Sd of the normal distribution are given below.
Mean = 7
Sd = 2

X Area left of X (%) Area right of X (%)
0 0.02 99.98
1 0.13 99.87
2 0.62 99.38
3 2.28 97.72
4 6.68 93.32
5 15.87 84.13
6 30.85 69.15
7 50.00 50.00
8 69.15 30.85
9 84.13 15.87
10 93.32 6.68
11 97.72 2.28
12 99.38 0.62

To test whether the proportions of a variable are the same across groups, use the chi-square test.



[Figure: scatter plot of Weight vs Height (Covariance, Problem 1)]
[Figure: scatter plot of Salary vs Experience (Covariance, Problem 2)]
[Figure: Weight vs Height with linear trendline f(x) = 1.69767441860465x - 39.3488372093023, R² = 0.306758001381533]
ID Num Gender
ID01 m
ID02 m
ID03 m
ID04 m
ID05 m
ID06 m
ID07 m
ID08 m
ID09 f
ID10 m
ID11 m
ID12 f
ID13 f
ID14 f
ID15 f
ID16 m
ID17 f
ID18 m
ID19 m
ID20 m
ID21 f
ID22 f
ID23 f
ID24 f
ID25 f
ID26 f
ID27 m
ID28 m
ID29 f
ID30 m
ID31 f
ID32 f
ID33 f
ID34 m
ID35 f
ID36 m
ID37 f
ID38 f
ID39 m
ID40 f
ID41 f
ID42 f
ID43 m
ID44 m
ID45 m
ID46 m
ID47 f
ID48 m
ID49 m
ID50 f


Poisson Distribution
In a hospital the average number of nurses on leave per day is 4. What is the probability that 0, 1, 2, 3, … nurses are on leave?

Mean 4

X P(X) %
0 1.8
1 7.3
2 14.7
3 19.5
4 19.5
5 15.6
6 10.4
7 6.0
8 3.0
9 1.3
10 0.5
11 0.2
12 0.1
12+
Total 100.0

KnzQcKx6voI0fv6o3D6tSJC1ILfWgxQ/view?usp=sharing

Find the area between X1 and X2. The mean and Sd of the normal distribution are given below.

Mean = 7
Sd = 2

X1 X2 Area %
- infinity 7 50.00
- infinity 5 15.87
- infinity 9 84.13
- infinity 10 93.32
2 3 1.65
2 5 15.24
2 8 68.53
7 9 34.13
7 11 47.72
8 11 28.58
2 8 68.53
6 10 62.47
8 + infinity 30.85

MS Excel formula for the area from -infinity to X: =NORM.DIST(X,Mean,Sd,TRUE)*100
Area between X1 and X2: =(NORM.DIST(X2,Mean,Sd,TRUE)-NORM.DIST(X1,Mean,Sd,TRUE))*100
Age | Height | Class | Major | Grad School | GPA | Expected Salary | Annual Salary in 5 Years | Employment Status | Number of Affiliations
19 69 so mr y 3.19 40 70 un 0

21 67 sr m un 3.11 50 60 pt 0
20 68 jr ef n 3.02 50 60 pt 0
18 79 fr ef y 4.00 50 57 pt 0
19 67 so m y 2.75 40 100 pt 1
21 70 jr a y 3.24 60 100 pt 2
20 68 jr ef y 2.93 50 75 un 0
21 71 jr m y 3.26 40 60 pt 0
20 62 so mr n 3.21 45 65 pt 0
19 70 so a y 3.23 50 70 pt 0
36 67 so a un 3.77 60 120 pt 1
19 65 so a un 3.71 40 60 un 0
20 65 jr a un 3.20 45 65 pt 3
21 65 jr mr y 2.94 40 60 pt 0
19 66 so mr y 3.22 40 80 pt 0
20 69 jr un un 3.34 60 90 pt 0
19 64 fr ib un 3.09 40 65 un 1
20 67 jr mr n 3.72 50 80 pt 2
23 70 jr ef un 2.50 50 75 un 0
20 70 so ef y 2.74 60 75 un 0
20 63 so mr y 3.55 60 100 pt 2
19 67 so m un 3.00 45 65 pt 0
19 65 so mr y 3.62 40 90 pt 0
20 63 jr m un 2.60 40 60 pt 1
22 63 sr ef y 3.63 50 150 pt 3
21 65 sr o n 2.38 40 60 pt 2
21 73 jr m y 2.45 40 65 pt 0
30 71 jr m n 3.28 50 75 pt 0
20 66 so ib un 3.18 50 75 un 1
24 62 so a n 3.33 55 85 pt 0
19 69 so mr un 2.87 30 50 pt 0
33 67 sr a n 3.14 45 75 ft 0
19 64 fr ib n 3.44 45 90 pt 1
20 72 so m y 3.85 60 100 pt 1
22 61 jr o y 3.50 45 60 un 0
21 69 so ef n 2.92 55 85 pt 0
19 60 fr a un 2.80 55 80 pt 0
21 66 jr mr un 2.67 40 65 pt 0
20 69 so a un 2.65 45 80 un 0
20 63 so is n 2.88 50 80 un 1
19 65 so ef un 3.43 50 100 pt 0
21 63 jr ef y 3.48 60 110 pt 0
20 68 so a un 2.91 45 90 pt 1
19 72 so a n 2.75 50 80 pt 0
22 69 jr is y 3.62 55 85 un 2
21 68 jr m n 2.42 35 60 pt 1
22 66 jr mr n 2.76 40 65 pt 0
19 69 fr un un 3.10 45 70 pt 0
20 68 so is n 2.61 40 65 pt 1
20 66 so is n 3.13 45 80 pt 0
Satisfaction with Advisement | Spending
2 550
2 400
5 450
5 360
1 500
5 650
4 500
1 500
4 350
6 300
4 200
5 550
5 425
4 600
3 600
5 400
4 250
4 350
2 400
4 400
5 500
3 600
3 400
3 500
6 1000
4 300
2 450
5 550
5 600
4 400
3 700
5 500
6 350
1 450
7 600
5 400
3 450
3 800
3 400
4 375
3 400
5 500
4 350
5 525
4 400
3 450
3 500
4 400
3 450
2 500
