QM Formula Class

Statistics:

Measurement scales:

1. Nominal :- Data can be categorized and counted; cannot be measured or ranked.


Examples:- Engineer?- Yes/No.
Gender- Male/Female.
Material- Wood/Metal/Plastic/Steel/Bronze.
Industry- Oil/Mining/Automobile/Media/IT/Food processing.
TV Channels- Entertainment/Music/News/Kids/Others.
Marital status- Married/Single/Unmarried.
Political parties- CPI/BJP/Cong/TDP/TMC.
IPL teams- KKR/CSK/MI/SRH…

2. Ranked (ordinal) :- Data can be ranked, but cannot be measured.


Example - Tall, taller, tallest.
Big, bigger, biggest.
Olympics: First, second, third, fourth….
Thickness: very thick, thick, thin.
Taste: Good, average, below average, bad.
Temperature: freezing, cool, warm, hot.
Customer satisfaction: not satisfied, somewhat satisfied, satisfied, highly satisfied.

3. Interval :- Numerical data, where zero is arbitrarily chosen.


Example - Temperature measured in Centigrade.
Temperature measured in Fahrenheit.
Employee satisfaction measured on a 1 to 7 scale (1 = Not satisfied, 7 = Highly satisfied).
Customer satisfaction measured on a 1 to 5 scale.
Note- Zero is arbitrarily chosen-
Zero degrees Centigrade/Fahrenheit is not zero temperature; zero Kelvin is zero
temperature.
Zero customer satisfaction is arbitrary.

4. Ratio :- Numerical data, where zero means no value (absence of the quantity).


Example - Height in mm/cm/m
Weight in g/kg/tons
Time in sec/min/hr
Temperature in Kelvin
Humidity in %
Sales (nos, tons, or Rs)

Types of Variables:

Formulas- Average (Arithmetic mean):


1. Discrete sequence:
Average = (1/N) * Σ xi, summed over i = 1 to N
Variance = (1/N) * Σ (xi − mean)²
Standard deviation = √Variance
2. Continuous function:
Average = 1/(b−a) * ∫ f(x) dx, integrated over x from a to b

Measuring variation:
1. Range = Maximum-Minimum
2. Variance, population = σ² = (1/N) * Σ (xi − Mean)²
3. Standard deviation, population = σ
4. Coefficient of Variation, population = σ/Mean
5. Variance, sample = s² = (1/(N−1)) * Σ (xi − Mean)²
6. Standard deviation, sample = s
7. Coefficient of Variation, sample = s/Mean
8. Mean absolute deviation = (1/N) * Σ |xi − Mean|
9. Z score (how many std devns xi is away from the mean) = (xi − Mean)/σ
10. Quartiles (Q1, Q2, Q3): smallest 25%, 50%, 75% of observations.
11. Inter-quartile range = Q3 − Q1
12. 5-number summary (Min., Q1, Q2, Q3, Max.): Minimum, 3 Quartiles, Maximum.
13. Boxplot (called Box and Whisker chart in MS Excel): plot of the 5-number summary.
Z score:-
How far is the observation from mean, in terms of standard deviations.
Z = (ObservedValue - Mean)/Standard deviation
= Error/Standard deviation
Note- Unlike Mean and Variance, the Z score is not a summary measure of the dataset; it can
be computed for each data point.
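A minimal sketch of these measures in plain Python (standard library only, with made-up illustrative numbers):

data = [12, 15, 9, 20, 14, 18, 11, 16]

n = len(data)
mean = sum(data) / n
var_pop = sum((x - mean) ** 2 for x in data) / n            # population variance
var_sample = sum((x - mean) ** 2 for x in data) / (n - 1)   # sample variance
sd_pop = var_pop ** 0.5                                     # population std deviation
cv = sd_pop / mean                                          # coefficient of variation
mad = sum(abs(x - mean) for x in data) / n                  # mean absolute deviation
z_scores = [(x - mean) / sd_pop for x in data]              # one Z score per data point

print(mean, var_pop, sd_pop, cv, mad)
print(z_scores)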

Shape of the data- Skewness

Typical frequency distributions- MMM relationships:

Frequency distribution MMM relationship


Negatively skewed Mean < Median < Mode
Symmetric Mean = Median = Mode
Positively skewed Mode < Median < Mean

Exploring numerical data - (Quartiles, 5-number summary, The Box plot)


Quartiles
 Divide the sorted data into 4 quarters- 25% observations in each quarter.
Dataset
40
50
60
80    Q1 = (80+100)/2 = 90
100
110
120
180   Q2 = (180+220)/2 = 200
220
300
600
700   Q3 = (700+900)/2 = 800
900
910
930
950

 Q1, First Quartile- Lowest 25% observations.


 Q2, Second Quartile- Lowest 50% observations
 Q3, Third Quartile- Lowest 75% observations.
 Inter-Quartile Range (IQR) = Q3-Q1.
 Notice that Q2 = Median.
 Quartiles are used to study variation in the data, and to spot whether distribution of
data is symmetric.
 Used in 5-number summary and Boxplots.
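A minimal sketch of the quartile calculation on the dataset above, using the same median-of-halves convention as the example (other software may use slightly different interpolation rules):

def median(values):
    v = sorted(values)
    n = len(v)
    mid = n // 2
    return v[mid] if n % 2 else (v[mid - 1] + v[mid]) / 2

data = [40, 50, 60, 80, 100, 110, 120, 180,
        220, 300, 600, 700, 900, 910, 930, 950]
half = len(data) // 2
q1 = median(data[:half])    # 90  (median of the lower half)
q2 = median(data)           # 200 (overall median)
q3 = median(data[half:])    # 800 (median of the upper half)
iqr = q3 - q1               # 710
print(q1, q2, q3, iqr)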

Few properties of Mean, Median, Mode:


Probability

Probability:-
 Probability measures uncertainty.
 Probability is the chance of happening of an event.


Getting values of probability

1. A priori: -
 Classical/Equi-likely.
 Textbook examples of Coin tossing, Playing cards, etc, or when you know nothing.

2. Empirical
 From historical data, observations, or experiments.
 Life tables in insurance, earthquakes, rainfall, twins, quality, stock market …

3. Subjective:-
 Personal judgement.
 Outcome of an India vs Brazil cricket match. Covid-19 will be over in 2022…

Probability- a priori:

Probability = (Number of outcomes in which the event occurs) / (Total number of possible outcomes)

Types of probability- for two or more events


1. Marginal probability- only one event occurs. P(Red) means the probability that the
card is of Red color.
Example –
a) P(Red)= 26/52 = 1/2.
b) P(King)= 4/52.
c) P(7) = 4/52.
d) P(Picture) = 12/52.
e) P(Diamond) = 13/52.

2. Joint probability- both events occur. P(Red and King) means the probability that the
card is of Red color and it is also a King.
Example –
a) P(Red and King) = 2/52.
b) P(Diamond and Red) = 13/52.
c) P(Picture and Red) = 6/52.
d) P(Black and Red) = 0/52.
e) P( <3 and Red) = 4/52.

3. Conditional probability- P(Red/King) means the probability that the card is Red if the
card is known to be a King.
Example –
a) P(Red/King) = 2/4 = 1/2.
b) P(Red/7) = 2/4.
c) P(Picture/Black) = 6/26.
d) P(Picture/Diamond) = 3/13.
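These card probabilities can be checked by brute force. A minimal sketch (standard library only) that enumerates a 52-card deck and counts outcomes:

from itertools import product
from fractions import Fraction

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['Hearts', 'Diamonds', 'Clubs', 'Spades']
deck = list(product(ranks, suits))

def prob(event):
    # fraction of cards in the deck satisfying the event
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

is_red = lambda c: c[1] in ('Hearts', 'Diamonds')
is_king = lambda c: c[0] == 'K'

print(prob(is_red))                                  # 1/2, marginal
print(prob(lambda c: is_red(c) and is_king(c)))      # 1/26, i.e. 2/52, joint
# Conditional: P(Red/King) = P(Red and King) / P(King)
print(prob(lambda c: is_red(c) and is_king(c)) / prob(is_king))   # 1/2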

Discrete probability distribution

Probability distribution:
 Probability of all outcomes.
Discrete means the outcome is a-
 Categorical variable (H/T, M/F, Won/Lost/Draw) or
 An integer (1, 2, 3, 4…).
 But not a fraction (0.666, 1.414, 1.618, 2.718, 3.1415, 9.11, etc.). Fractions are
considered in Continuous Probability distributions, like Normal distribution.

Distribution of interruptions per day in a computer network.


What is the average number of interruptions/day?
What is the variation in the number of interruptions/day?

Average interruptions per day


 Mean here is called Expected Value (EV).
 EV= µ= E(X)= ∑ xi * P(xi),
 where xi is a random variable and P(xi) is its probability.
 In the table, As are xi and Bs are P(xi).

Standard deviation of interruptions


 Variance, σ² = Σ (xi − µ)² * P(xi)
 where xi is a random variable and P(xi) is its probability.
 In the table, As are xi and Bs are P(xi). EV= µ= ∑ xi * P(xi). EV is
computed in the previous slide

Binomial distribution
1. Empirical
 From historical data
2. Theoretical
o Binomial
o Poisson
o Normal
3. Numerous other theoretical distributions
o F, t, Chi-square (later in the course)
o Uniform, Geometric, Hypergeometric, Beta, Gamma, Maxwell-Boltzman,
Cauchy, Rayleigh, Erlang, ….. (not in the course)
1. Binomial Probability
 If an experiment is done n times, the probability of success (head) is p, and the
probability of failure is q = (1−p), then the chance of getting r successes in the n
trials = (ways in which we can choose those r trials out of n trials) * p*p*p… up to r
times * q*q*q… up to (n−r) failures.
 Probability = nCr * p^r * q^(n-r) = n!/[r! (n-r)!] * p^r * q^(n-r)
 Probability of getting all successes out of n trials = nCn * p^n * q^0 = n!/[n! 0!] * p^n * q^0
 Prob(exactly 2 heads in 3 trials) = 3C2 * 0.6^2 * 0.4^1
 What is the chance that out of 5 consecutive days, rain happens on at least 2 days?
P(2) + P(3) + P(4) + P(5) = 1 − [P(0) + P(1)] (see the sketch below).
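A minimal sketch of the binomial formula using math.comb (Python 3.8+); the per-day rain probability is an assumed illustrative value, not from the notes:

from math import comb

def binomial_pmf(r, n, p):
    # P(exactly r successes in n trials) = nCr * p^r * (1-p)^(n-r)
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

print(binomial_pmf(2, 3, 0.6))      # 3C2 * 0.6^2 * 0.4^1 = 0.432

# P(rain on at least 2 of 5 days), assuming p = 0.6 per day for illustration
p_at_least_2 = 1 - binomial_pmf(0, 5, 0.6) - binomial_pmf(1, 5, 0.6)
print(p_at_least_2)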

2.Poisson distribution:
What is the Probability distribution?
Example-
 Avg no. of accidents = 4/day (arrival/occurrence rate of accidents per day).
Probability of 0, 1, 2, 3, 4, 5… accidents?
 Avg no. of potholes = 6/km (occurrence rate of potholes per km).
Probability of 0, 1, 2, 3, 4, 5… potholes?
 Avg no. of goals = 3.2/match. Probability of 0, 1, 2, 3, 4, 5… goals?
 Avg no. of typos = 2.7/page. Probability of 0, 1, 2, 3, 4, 5… typos?
 Avg no. of shooting stars = 0.3/hour. Probability of 0, 1, 2, 3, 4, 5… shooting stars?
 Avg no. of teeth cavities = 3.28/patient. Probability of 0, 1, 2, 3, 4, 5… teeth cavities?
Formula:
P(x) = (e^(−λ) * λ^x) / x!
e = 2.718…, a constant.
λ = average number of events, is known.
x! = x · (x−1) · (x−2) … 2 · 1.
x = 0, 1, 2, 3…. number of events (goals, potholes, accidents…).
P(x) = probability of x events happening.
Probability of x arrivals per unit time = P(x),
where lambda (λ) is the average arrival rate per unit time or per unit distance.
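A minimal sketch of the Poisson formula for the "4 accidents/day" example above (standard library only):

from math import exp, factorial

def poisson_pmf(x, lam):
    # P(x) = e^(-lambda) * lambda^x / x!
    return exp(-lam) * lam ** x / factorial(x)

lam = 4   # average accidents per day
for x in range(6):
    print(x, round(poisson_pmf(x, lam), 4))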

Poisson distribution requirements:


 Event of interest is the number of events in a given interval
(time/length/area).
 The probability that an event occurs is same in all intervals of equal size.
 The number of events that occur in one interval is independent of
number of events in another interval.
 The probability that two or more events will occur in an interval
approaches zero as the interval becomes smaller.
 A tip: Use Poisson distribution if mean= variance.

Time interval    Lambda (average arrivals)    Actual arrivals
9am to 10am      3                            2
10am to 11am     3                            4
11am to 12pm     3                            5
12pm to 1pm      3                            1
Mean and Variance in Binomial and Poisson distributions

Normal Distribution

 Often only the mean and standard deviation of the data are available.


 Distribution of the data is not available.
 But, distribution of the data is essential to make decisions under uncertain conditions.

Distribution of data….
Distribution of data- the vertical bars in the graphs- from several phenomena like distribution
of weight, height, IQ, measurement errors, etc. has been found to be similar-
 Symmetrical about the mean value,
 Concentration towards the mean value, and
 Very few extremely small or large values.

Formula:
Z = (X − μ)/σ
X = μ + Z·σ
μ = Mean
σ = Standard Deviation
The Z-value is a measure of how many standard deviations an observation is away from the mean.

Normal distribution- introduction


2. Other names- Gaussian, Bell-shaped curve, and Law of error.
3. It is a continuous distribution- fractional values like 1.23, 6.66, etc. (distance,
temperature) are allowed on x-axis.
4. The curve ranges from –infinity to +infinity on x-axis.
5. Area under the curve represents probability.
6. Total area under the curve is 1, that is, probability of all the events=1.
7. Normal distribution curve is symmetric around Mean, and for a Normal distribution,
Mean= Median= Mode.
8. Normal distribution is described by only two parameters- mean (μ) and standard
deviation (σ). For each mean and standard deviation, there is a separate curve.
9. Normal curve with mean, μ= 0 and standard deviation, σ = 1 is called Z curve, or
Cumulative Standardized Normal distribution (p-540-541, TB-1).
10. Equation of the Normal distribution curve: f(x) = [1/√(2πσ²)] * e^(−(x−μ)²/(2σ²))
11. Since above equation is not easy, tables have been made with area (probability)
under the curve.
12. Distribution of z-value is standard normal distribution, a normal distribution whose
mean is zero, and standard deviation is 1
13. Textbooks have only one Normal Distribution table, for Mean=0, Stdev=1.
14. The Cumulative Standardized Normal Distribution table can be used even if Mean≠0
and Stdev≠1!!!
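A minimal sketch showing how the Z transformation plus the standard normal CDF (written here with math.erf) replaces a printed table; the mean and standard deviation used are illustrative, not from the notes:

from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    # P(X < x) for a Normal(mu, sigma) variable, via the standard normal CDF
    z = (x - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))

print(normal_cdf(110, mu=100, sigma=10))        # ~0.8413, P(X < 110)
print(normal_cdf(1.96) - normal_cdf(-1.96))     # ~0.95, middle area between Z = -1.96 and +1.96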

Confidence Interval Estimation

Census
 Entire population (population, tiger, agriculture, health facilities).
Sampling
 A portion of the population.
 Quality, voting, blood, soil, customer surveys, voice, interviews, ….

Sampling methods
1.Non-probability sampling
 Judgment sampling
 Convenience sampling

2.Probability sampling
A. Simple random sampling
Each item has equal probability of getting chosen (=1/N, N is population size).
B. Systematic sampling
Every nth customer/item/bottle on filling line.
C. Stratified sampling
Samples from each strata: Men/Women (20%/80%); Rural/Urban (30%, 70%);
Steel/Chemicals/Telecom stocks (10%, 20%, 70%); Customers/Non Customers (10%,
90%); Tourist/NonTourist (60%, 40%).
D. Clustered sampling
Samples from geographical district.

Probability sampling

Estimation
Estimation: Proportion and Mean
Note –
1. What was the sample size? (The larger the sample size, the narrower the range.)
2. What is the variance in the overall population? (The higher the population variance, the
wider the range.)
3. What is the confidence level at which you are supposed to answer?

Estimation: Proportion and Mean- Examples

Types of estimates
1. Point estimate
 Population mean= Sample mean
 Population proportion= Sample proportion
2. Interval estimates (Range estimate)
 Population mean = Sample mean ± Sampling error
 Population proportion = Sample proportion ± Sampling error

Confidence Interval estimation


Confidence level
Since samples are used to estimate the population mean or proportion, generally point
estimate alone would be wrong.
 Therefore, a confidence interval estimate is constructed around the point estimate.
 The confidence interval is constructed such that the probability that the interval includes
the population parameter is known.
 Mostly 90%, 95% or 99% confidence interval is used.
 t value is used for estimating means.
 Z value is used for estimating proportions.

Estimating population parameters


(Mean and Proportion)
A. Population mean, µ = Sample mean ± Sampling error
1. = Sample mean ± t-Critical value * Std error of mean
2. = Sample mean ± t-Critical value * S/√n
3. = X̄ ± t-value * S/√n
B. Population proportion, π = Sample proportion ± Sampling error
1. = Sample proportion ± Z-Critical value * Std error of proportion
   = Sample proportion ± Z-Critical value * √(p(1−p)/n)
2. = p ± Z-value * √(p(1−p)/n)
n- sample size; S- standard deviation of the sample (divide by n−1 and not n);
p- proportion found in the sample. Standard error is the standard deviation of the sampling
distribution.
Critical value- from the t table for a mean, from the Z table for a proportion
(Quick, dirty trick- the critical value is generally between 1.5 and 3).
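A minimal sketch of a 95% confidence interval for a mean and for a proportion, assuming scipy is available for the critical values; the sample numbers are illustrative:

from math import sqrt
from scipy.stats import t, norm

# Mean: sample of n = 25, sample mean 102, sample std dev (n-1 divisor) 8
n, xbar, s = 25, 102.0, 8.0
t_crit = t.ppf(0.975, df=n - 1)              # two-tailed 95% critical value
half_width = t_crit * s / sqrt(n)
print(xbar - half_width, xbar + half_width)

# Proportion: 60 of 200 respondents said Yes
n2, p = 200, 60 / 200
z_crit = norm.ppf(0.975)                     # ~1.96
half_width2 = z_crit * sqrt(p * (1 - p) / n2)
print(p - half_width2, p + half_width2)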

t distribution table
1. Area under t distribution=1 and it is Bell-shaped, like Normal distribution.
2. t distribution is defined only by one parameter, degree of freedom- df. Normal
distribution is defined by two parameters, Mean and Stdev.
3. Degree of freedom, df= Sample size-1.
4. t value for Area= 0.25 in right tail and df=10, is 0.700.
5. 90% middle area for 4 degrees of freedom (look for 5% in right tail)-
t= -2.132 to t= +2.132.
6. Excel function (It requires area in left tail, not in right tail)-
=t.inv(area in the left tail,df).
Examples-
=t.inv(0.95,4) = +2.132. 95% area in left tail, df=4
=t.inv(0.975,99)= +1.98. 97.5% area in left tail, df=99.
=t.inv(0.975,26)= +2.056. 97.5% area in left tail, df=26.
7. Not for everyone: The t statistic is the ratio of a standard Normal variable to the square root
of an independent Chi-square variable divided by its degrees of freedom; with a sample of size n,
it has n-1 degrees of freedom.

Z distribution table
 Since it is symmetric, 50% area is on each side of Z=0.
 This is Normal distribution table with Mean=0 and Standard deviation=1. Hence also
called Standard Normal distribution table.
 90% area in the middle will be between-
Z= -1.65 to Z= +1.65.
 95% area in the middle will be between-
Z= -1.96 to Z= +1.96.

Note:--
 When the population variance is known, we can use z-statistic
 When the population variance is not known, check for the sample size
 If sample size >=30, you can use z statistic
 If sample size <30, you must use t statistic

Determining Sample Size

1. Sample size for estimating population mean


Formula for Sample size, n
n = Z² * σ² / e²
Z value = from the Z table,
for the specified confidence level (90%, 95%, 99%, etc.). Middle area.
σ = standard deviation of the population.
e = sampling error specified.

2. Sample size for estimating population proportion


Formula for Sample size, n
n = Z² * π * (1−π) / e²
Z value = from the Z table (P-450-541, TB-1). For the specified confidence level (90%, 95%, 99%, etc.).
Middle area.
π = estimate of the proportion in the population.
e = sampling error specified.
Generally π is unknown. Use π = 0.5; it will give the highest possible sample size.
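A minimal sketch of both sample-size formulas (the inputs are illustrative):

from math import ceil

def n_for_mean(z, sigma, e):
    # n = Z^2 * sigma^2 / e^2, rounded up
    return ceil(z ** 2 * sigma ** 2 / e ** 2)

def n_for_proportion(z, pi, e):
    # n = Z^2 * pi * (1 - pi) / e^2, rounded up
    return ceil(z ** 2 * pi * (1 - pi) / e ** 2)

print(n_for_mean(1.96, sigma=15, e=3))           # 95% confidence, error of 3 units -> 97
print(n_for_proportion(1.96, pi=0.5, e=0.05))    # worst case pi = 0.5 -> 385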

Sampling Distributions
Relationship between characteristics of population and samples
 A sample is drawn from a population with known characteristics. What are the
characteristics of the sample?
 Characteristics of the sample are known. What are the characteristics of its population?
 Characteristics means-
Mean, Standard deviation, Proportion, Median, Mode, Skewness, Kurtosis, etc

Example (from the slide figure): a population with Mean µ = 36.3 and Sd σ = 29.8; three samples
drawn from it gave Mean X1' = 45.8 with Sd S1 = 24.9, Mean X2' = 37.8 with Sd S2 = 32.6, and
Mean X3' = 35.5 with Sd S3 = 18.9.

Sampling distribution
A sample is drawn from a population with known properties. What are the properties of the
sample?
 Chapter-7: Sampling Distributions.
Characteristics of the sample will depend on the items that get drawn. That is, sample
characteristics will have a distribution, called- sampling distribution.
It has been shown that the distributions of sample characteristics have the following relationships
with the characteristics of the population-
1. Mean of the population= Mean of the sampling distribution.
2. Standard deviation of sampling distribution (distribution of sample means) = σ/√𝒏;
narrower than that of the population distribution.
3. The sampling distribution will tend to be Normal distribution as sample size
increases.
Similarly for proportions, it can be shown that
1. Proportion in the population= Mean proportion of the sampling distribution.
2. Standard deviation of sampling distribution (distribution of sample proportions) =
√(p(1−p)/n)
                          Population (known)   Sampling distribution, of sample size
                                               n=2         n=3         n=10        n=20
Mean                      µ (58.7)             X' (59.7)   X' (59.0)   X' (59.0)   X' (58.5)
Standard deviation        σ (16.3)             S (11.9)    S (8.5)     S (4.9)     S (3.4)
Standard Error, SE = σ/√n                      11.5        9.4         5.2         3.6
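A minimal simulation sketch of the idea in the table above: the spread of sample means shrinks roughly as σ/√n as the sample size grows (illustrative Normal population with µ = 58.7 and σ = 16.3, standard library only):

import random
from statistics import mean, pstdev

random.seed(1)
population = [random.gauss(58.7, 16.3) for _ in range(100_000)]

for n in (2, 3, 10, 20):
    sample_means = [mean(random.sample(population, n)) for _ in range(5_000)]
    # observed spread of sample means vs. theoretical standard error sigma/sqrt(n)
    print(n, round(pstdev(sample_means), 2), round(16.3 / n ** 0.5, 2))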

Note—
 Binomial distribution is analogous to normal distribution:
Normal distribution is continuous,
 Binomial distribution is discrete. When a binomial distribution is considered for an
infinite number of trials, it approaches the normal distribution.
 If the population does not follow a normal distribution, sample means will still tend to
follow a normal distribution for large sample sizes.
 Similarly, sample proportions will also tend to follow a normal distribution for large
sample sizes.

Fundamentals of Hypothesis Testing: One-Sample Tests

Hypothesis- intuitive
 If the sample mean is close to the stated population mean, the null hypothesis is not
rejected.
 If the sample mean is far from the stated population mean, the null hypothesis is
rejected.
 How far is “far enough” to reject the claim, H0?
 The critical value of a test statistic creates a “line in the sand” for decision making- it
answers the question of how far is far enough.

Null hypothesis, Ho: Avg. tyre life, µ = 100,000 kms.


Alternative hypothesis, H1: Avg. tyre life, µ ≠ 100,000 kms.

Null and Alternate Hypothesis


Two-tail and One-tail test
= and ≠ Two-tail test
>= and < One-tail test
<= and > One-tail-test
Ho: Mean size of components = 200 mm Two-tail test
H1: Mean size of components ≠ 200 mm.

Ho: Mean size of components ≥ 200 mm One tail test


H1: Mean size of components < 200 mm.

Ho: Mean size of components ≤ 200 mm One-tail test


H1: Mean size of components > 200 mm.

Ho: Proportion of defective components ≥ 4% One-tail test


H1: Proportion of defective components < 4%.

Type 1 and Type 2 error


Type 1 error: Rejecting the null hypothesis when it is actually true
Type 2 error: Accepting the null hypothesis when it is actually false

A.100% inspection
 Inspect every part in each batch.
 Used when the consequence of failure is severe- medical, aeronautics or the item is
expensive.
 Expensive, and time consuming.
B.Sampling
 Take a sample; accept or reject the entire batch.
 Takes less time and low cost, but risky.
 Two types of errors- rejecting good items (α) and accepting bad items (β).
 𝛼 is also called producer’s risk, because the loss is borne by the producer, and β is called
consumer’s risk, because the loss is borne by the consumer.

3 equivalent methods for hypothesis testing


1. Confidence Interval method
 Take a sample and make the confidence interval. Then check whether the
claim is within or outside the interval.
 This simple approach is mentioned briefly on p-307 in Textbook-1, and not
pursued further.

2. Critical value (used extensively in the textbook)


 For the claim made, get critical value from t or Z table and make the interval.
Then take a sample, compute t or Z value, and check whether this computed
test value is within or outside the interval.

3. p value (in the textbook, but requires computer)


 Computer gives p value. Compare the p value with the level of significance (α,
usually 5%). If p value is smaller, reject null hypothesis.

Hypothesis of means
A.tStat value for hypothesis testing
i)tStat value = ( X̅ − μ) / (S/√n )
tStat value = (Sample mean – Claimed Population mean) / (Sample Std.Dev /Sqrt(Sample
size))
Standard error, SE = S/√n
ii)tCritical
tCritical: The value is taken from t table
Degree of freedom, df= n-1= sample size-1.

B.ZStat value for hypothesis testing


i) ZStat value
ZStat value = (p − π) / √(π * (1 − π)/n)
ZStat value = (Sample proportion – Claimed proportion) / Standard error
Standard Error, SE = √(π * (1 − π)/n)

ii) ZCritical
ZCritical: the value is taken from the Z table for the chosen significance level
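A minimal sketch of both one-sample test statistics (the sample figures are illustrative):

from math import sqrt

# A. Mean: H0 says mu = 100,000 km; sample of n = 30 tyres
xbar, s, n, mu0 = 98_200.0, 4_500.0, 30, 100_000.0
t_stat = (xbar - mu0) / (s / sqrt(n))
print(t_stat)      # compare with tCritical from the t table, df = n - 1

# B. Proportion: H0 says pi = 15%; 40 defectives in a sample of 200
n2 = 200
p, pi0 = 40 / n2, 0.15
z_stat = (p - pi0) / sqrt(pi0 * (1 - pi0) / n2)
print(z_stat)      # compare with ZCritical from the Z table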

Two-Sample Tests and ANOVA


Two Sample Tests

Two/Three sample tests- examples


Two-Sample Tests and ANOVA Examples

Comparing the means of two independent populations:
 Talk time: Prepaid subscribers = Postpaid subscribers.
 Yield: Fertilizer A = Fertilizer B.
 Days to recover: Paracetamol = Crocin.
 Life: Crompton bulb = Philips bulb.

Comparing the means of two related populations:
 Productivity: Before training = After training.
 Runs: First innings = Second innings.
 Weight loss: Before VLCC = After VLCC - 4kg.

Comparing the proportions of two independent populations:
 Customer satisfaction: % Ola = % Uber.
 ExitPollPreferOpposition: %Urban = %Rural.
 DefaultRate: %HomeLoan = %PersonalLoan.

F test for the difference between two variances:
 BSE Index Volatility: Before Budget = After Budget.
 Production: Has variation increased after machine adjustment?
 Marks variation: Math exam > Language exam.

One way ANOVA (ANOVA compares the means of more than two populations):
 Life: Battery A = Battery B = Battery C.
 StockReturns: IT = Automobile = Banking.
 Yield: Fertilizer A = Fertilizer B = Fertilizer C.

1. Comparing the means of two independent populations


A. tStat value: One-Sample and Two-Sample tests
a) For One-sample test
tStat = (X̄ - µ) / SE = (X̄ - µ) / (S/√n) = (X̄ - µ) / √(S²/n)

 𝑋̅ - Sample mean; S- standard deviation of sample; n- sample size, µ- Hypothesized


population mean.
 Use t test and Degrees of freedom= sample size-1.
 H0: µ = 100K kms
H1: µ ≠ 100K

b)For Two-sample tests


tStat = [ (X̄1 - X̄2) - (µ1 - µ2) ] / SE = [ (X̄1 - X̄2) - (µ1 - µ2) ] / √(Sp² * (1/n1 + 1/n2))

 X̄1 and X̄2 - Means of the two samples; S1 and S2 - Std.Dev. of the two samples; n1
and n2 - sizes of the two samples; µ1 and µ2 - Hypothesized means of the two populations.
o Sp² = [(n1-1)*S1² + (n2-1)*S2²] / [(n1-1) + (n2-1)]; Sp² is the pooled variance.
 Use t test and Degrees of freedom = (n1-1) + (n2-1) = n1+n2-2.
 H0: µ1 = µ2
H1: µ1 ≠ µ2
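A minimal sketch of the pooled two-sample t statistic above, applied to two illustrative samples (standard library only; statistics.stdev uses the n-1 divisor):

from math import sqrt
from statistics import mean, stdev

sample1 = [102, 98, 105, 110, 99, 101]
sample2 = [95, 97, 100, 94, 96, 99, 98]

n1, n2 = len(sample1), len(sample2)
x1, x2 = mean(sample1), mean(sample2)
s1, s2 = stdev(sample1), stdev(sample2)

sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)   # pooled variance
t_stat = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))                # H0: mu1 = mu2
df = n1 + n2 - 2
print(t_stat, df)   # compare with tCritical for n1 + n2 - 2 degrees of freedom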

2. Comparing the means of two related populations


A. tStat value: One-Sample and Two related Sample tests
a) For One-sample test
tStat = (𝑋̅ - µ) / SE = (𝑋̅ - µ) / (S/√𝑛)
 𝑋̅- Sample mean; S- standard deviation of sample; n- sample size.
 µ- Hypothesized population mean.
 Use t test. Degrees of freedom= sample size-1.
 H0: µ = 100K kms
H1: µ ≠ 100K kms
b) For Two related samples test
tStat = (D̄ - µD) / SE = (D̄ - µD) / (SD/√n)
 D̄ - Mean difference of the sample; SD - Standard deviation of the differences; n - sample
size
 µD- Hypothesized population mean difference.
 For t test, Degrees of freedom= sample size-1

3. Comparing the proportions of two independent populations


A. Computing ZStat value: One-Sample and Two-Sample tests
a) For One-sample test
ZStat = (p - π) / SE = (p - π) / √(π * (1 − π)/n)
 p- Sample proportion; n- sample size.
 π- Hypothesized proportion.
 Use Z test.
 H0: π = 15%
H1: π ≠ 15%
Say 15%.

b) For Two-sample tests


ZStat = [ (p1-p2) - (π1-π2) ] / SE = [ (p1-p2) - (π1-π2) ] / √(p̄ * (1 − p̄) * (1/n1 + 1/n2))
 p1 and p2 - Sample proportions; n1 and n2 - sizes of the two samples;
 π1 and π2 - Hypothesized population proportions;
 p̄ - pooled proportion, p̄ = (X1+X2)/(n1+n2). Pooled proportion from the samples.
 X1, X2 - number who said Yes in sample 1 and sample 2 respectively.
 Use Z test.
 H0: π1 = π2
H1: π1 ≠ π2
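A minimal sketch of the two-proportion Z statistic above, with illustrative counts:

from math import sqrt

x1, n1 = 120, 300   # number who said Yes in sample 1
x2, n2 = 90, 280    # number who said Yes in sample 2

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                                         # pooled proportion
z_stat = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))   # H0: pi1 = pi2
print(z_stat)   # compare with ZCritical from the Z table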

4. One way ANOVA (Analysis of Variance)


ANOVA procedure
Compute variance from the samples in two ways-
 Among groups, σA². There are 4 groups here- I, F, K, and E.
 Within groups, σW². There are 20 observations within the groups here (5 in each of the 4 groups).
If the means of all 4 populations are equal, then both the variances should be (nearly)
equal, σA2 = σW2, or their ratio, σA2/ σW2 should be (nearly) equal to 1.
 If the ratio is (nearly) equal, then null hypothesis is not rejected.
 If the ratio is large or small, then null hypothesis is rejected.
 The acceptance/rejection is given by F table .
 Ho: Variances between the groups <= Var within the groups
H1: Variances between the groups > Var within the groups
 Ho: µI = µF = µK = µE
H1: All 4 population means are not equal.
Mobile sales in 60-days, $
In-aisle, I Front, F Kiosk, K Expert, E
30.06 32.22 30.78 30.33
29.96 31.47 30.91 30.29
30.19 32.13 30.79 30.25
29.96 31.86 30.95 30.25
29.74 32.29 31.13 30.55
Sample average, 𝑋̅ 29.98 31.99 30.91 30.33

a) Computing Variance among groups, σA2


Variance among groups-
 First compute the Sum of Squares Among (SSA) groups- the square of the difference of each
Sample Mean from the Grand Mean, weighted by the number of observations.
 SSA = 5*(29.98-30.81)² + 5*(31.99-30.81)² + 5*(30.91-30.81)² + 5*(30.33-30.81)²
= 11.62.
 Now divide SSA by df (= no. of groups - 1) to get the variance among the groups-
 σA² = MSA = SSA/df = 11.62/(4-1) = 3.87.
Mobile sales in 60-days, $
In-aisle, I Front, F Kiosk, K Expert, E
30.06 32.22 30.78 30.33
29.96 31.47 30.91 30.29
30.19 32.13 30.79 30.25
29.96 31.86 30.95 30.25
29.74 32.29 31.13 30.55
Sample Mean, 𝑋̅ 29.98 31.99 30.91 30.33
Grand mean, 𝑋 30.81 Mean of all 20 observations

b) Computing Variance within groups, σW²


Variance within groups-
 First compute the Sum of Squares Within (SSW) groups- the square of the difference of each
observation from its Sample Mean.
SSW = (30.06-29.98)² + (29.96-29.98)² + (30.19-29.98)² + (29.96-29.98)² + (29.74-29.98)² +
(32.22-31.99)² + … + (32.29-31.99)² +
(30.78-30.91)² + … + (31.13-30.91)² +
(30.33-30.33)² + … + (30.55-30.33)²
= 0.7026
 Now divide SSW by df (= no. of observations - no. of groups) to get the Variance within
the groups-
 σW² = MSW = SSW/df = 0.7026/(20-4) = 0.044.
c) ANOVA- Use F test
FStat = σA² / σW² = MSA/MSW = 3.8739/0.0439 = 88.2.
FCritical = 3.24, for Alpha = 0.05 (area in right tail 5%),
df numerator = 3, df denominator = 16. p-545, TB-1.
Since FStat > FCritical, the null hypothesis is rejected, with 5% risk.
Better: Mean sales of mobiles at the four locations are not all equal. (A computational
sketch follows the data table below.)
Mobile sales in 60-days, $
In-aisle, I Front, F Kiosk, K Expert, E
30.06 32.22 30.78 30.33
29.96 31.47 30.91 30.29
30.19 32.13 30.79 30.25
29.96 31.86 30.95 30.25
29.74 32.29 31.13 30.55
Sample Mean, 𝑋̅ 29.98 31.99 30.91 30.33
Grand mean, 𝑋 30.81 Mean of all 20 observations
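A minimal sketch of the ANOVA computation on the same mobile-sales data (standard library only); it reproduces SSA ≈ 11.62, SSW ≈ 0.70 and FStat ≈ 88:

from statistics import mean

groups = {
    'In-aisle': [30.06, 29.96, 30.19, 29.96, 29.74],
    'Front':    [32.22, 31.47, 32.13, 31.86, 32.29],
    'Kiosk':    [30.78, 30.91, 30.79, 30.95, 31.13],
    'Expert':   [30.33, 30.29, 30.25, 30.25, 30.55],
}

all_obs = [x for g in groups.values() for x in g]
grand_mean = mean(all_obs)

ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())   # among groups
ssw = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)          # within groups

df_a = len(groups) - 1                # 3
df_w = len(all_obs) - len(groups)     # 16
f_stat = (ssa / df_a) / (ssw / df_w)
print(round(ssa, 4), round(ssw, 4), round(f_stat, 1))   # compare FStat with FCritical = 3.24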

5. F test for the difference between two variances


Computing FStat and FCritical value
1. FStat = SA² / SB² [Keep the larger Variance on Top, in the Numerator]
 SA² and SB² - variances of Sample A and Sample B.

2. Critical value, from F distribution.


 Degrees of freedom, df, in the numerator (top) and
degrees of freedom in the denominator (bottom).
 df of numerator, sample size for numerator-1.
 df of denominator, sample size of denominator-1.

Chi-square tests
Chi-Square Test for differences among more than two proportions
1. Compute from data: χ²Stat = Σ [ (foi - fei)²/fei ]
2. Get χ²Critical from the χ² table
for df = (No. of rows-1)*(No. of columns-1), and alpha (usually 0.05).

3. If χ2Stat is larger than χ2Critical, Reject null hypothesis, else do not reject null hypothesis.

4. foi – Observed frequency


fei – Expected frequency

5. Ho: πA = πB = πC.
H1: πA, πB, and πC are not equal.

6. Use Chi-square only if every expected frequency >5.

7. In a Chi-square test, do not divide the area into two tails. That is, entire alpha is in
the right tail.

For Expected Frequency


 Expected frequency if P(A)=P(B)=P(C)=P(D)

Correct option   Observed frequency, fo   Expected frequency, fe
A                15                       20
B                19                       20
C                25                       20
D                21                       20
Total            80                       80
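A minimal sketch of the chi-square calculation on the observed/expected table above (the critical value for df = 3 and alpha = 0.05 is about 7.815):

observed = {'A': 15, 'B': 19, 'C': 25, 'D': 21}
expected = {k: 80 / 4 for k in observed}    # equal probabilities -> 20 for each option

chi2_stat = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1                      # number of rows - 1 = 3
print(round(chi2_stat, 2), df)              # 2.6 < 7.815, so do not reject H0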
 For More Than One sample
Test of independence
Dependent and Independent events
Two events A and B are-
Independent when,
 P(A) = P(A/B)
Ex- P(King) = 4/52 = 1/13. P(King/Red) = 2/26 = 1/13. P(King) = P(King/Red), hence independent.
 Or, P(A and B) = P(A) * P(B)
Ex- P(King and Red) = 2/52. P(King) * P(Red) = 4/52 * 26/52 = 2/52. P(King and Red) =
P(King) * P(Red), hence independent.
Dependent when,
 P(A) ≠ P(A/B).
Ex- P(King) = 4/52 = 1/13. P(King/Picture) = 4/12 = 1/3. P(King) ≠ P(King/Picture), hence dependent.
 Or, P(A and B) ≠ P(A) * P(B)
Ex- P(King and Picture) = 4/52. P(King) * P(Picture) = 4/52 * 12/52. P(King and Picture) ≠
P(King)*P(Picture), hence dependent.

Degree of freedom, df- in statistics


1. Variance = (Sum of squared errors from the mean)/df
 For population variance, df = number of observations.
 For sample variance, df = number of observations-1. (1 df is lost since the mean is
computed from the sample)
computed from the sample)
2. t distribution
 df= sample size-1 (1 df is lost since standard
deviation is computed from the sample)
3. F distribution
 dfN= sample sizeN-1, dfD=sample sizeD-1 (1 df is lost each in Numerator
and in Denominator since means are computed from the samples).
4. Chi square distribution: df depends on the problem
 df = (No. of columns-1)*(No. of rows-1), in independence/dependence problems
(because the sum of each row and the sum of each column are known).
 df= Number of rows-1. in the KBC example. (Because 1 df,
Sum of the rows (=80), was known).

Simple Linear Regression


The covariance and the coefficient of correlation
Relationship between two variables…

Covariance between X and Y = (corr. coeff. between X and Y) * (std dev of X) * (std dev of Y)
Cov(X,Y) = Σ (xi - x̄)(yi - ȳ) / (n-1)
Corr. coeff. between X and Y = [Σ (xi - x̄)(yi - ȳ) / (n-1)] / (std dev of X * std dev of Y)

Typical relationships
1. Direction of relationship, + or - ?
2. Strength of relationship… in number

Measuring the strength of relationship


1. Covariance
2. Correlation coefficient

Computing Covariance

 Covariance can be negative, positive or zero.


 If X=Y, then Covariance = Variance.

Correlation coefficient, r
Correlation coefficient, r = Covariance(X,Y) / [Stdev(X) * Stdev(Y)]
1. Correlation coefficient, r, is dimensionless.
2. Its value ranges between -1 to +1.
Formulas

A. Sample Covariance, Cov(X,Y):


Cov(X, Y) = (1/(n-1)) * Σ (X - X′) * (Y - Y′)

 If Y = X, then Cov(X,Y) = (1/(n-1)) * Σ (X - X′)² = Variance.
 If X′ = 0 and Y′ = 0, then Cov(X,Y) = (1/(n-1)) * Σ (X * Y).

B. Correlation Coefficient, r:

r = Cov(X, Y) / [Stdev(X) * Stdev(Y)]
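A minimal sketch of sample covariance and correlation with illustrative height-weight data (standard library only; statistics.stdev uses the n-1 divisor, matching the formula above):

from statistics import mean, stdev

x = [150, 160, 165, 170, 180]   # e.g. height, cm
y = [52, 58, 63, 68, 77]        # e.g. weight, kg

n = len(x)
xbar, ybar = mean(x), mean(y)
cov_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
r = cov_xy / (stdev(x) * stdev(y))
print(cov_xy, round(r, 3))      # r close to +1: strong positive relationship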

Uses of correlation coefficient, r


 If r is closer to +1 or -1:
o Predicting/forecasting
 One variable can be used to predict another variable- Rainfall-production,
Rainfall-stock market, Image recognition and matching, Aircraft detection
by Radar.
o Redundancy
 Interview relevant after written test? Dimension reduction-Principal
Component Analysis (PCA), Data compression (JPEG).
o There may be a cause-and-effect relationship
 Smoking causes cancer? Social distancing/masks reduce Covid?

 If r is closer to -1:
o Risk can be reduced
 Buy several stocks instead of single stock in the portfolio- Mutual Fund.

 if r is closer to 0:
o Two variables are not related; both add to our knowledge.

Determining Simple Linear Regression equation


Y=a+bX
Y- Weight
X- Height
a- intercept
b- slope

b = SSXY / SSX
 SSXY = Σ (X - X′) * (Y - Y′)
 SSX = Σ (X - X′) * (X - X′)

a = Y’ – b X’
Y’- average of Y
X’- average of X

Measures of variations
R2- Coefficient of determination
R2
 It measures how close are data points to the fitted straight line.
 Its value ranges between 0 and +1.
 If the fit is poor, then R2 is close to zero.
 If all points are on the straight line, then R2=1.
 Generally computed from software.
 MS Excel function, =RSQ(Y_Range,X_Range)
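A minimal sketch of the slope, intercept and R² formulas on the same illustrative height-weight data used above for correlation:

from statistics import mean

x = [150, 160, 165, 170, 180]   # height, cm
y = [52, 58, 63, 68, 77]        # weight, kg

xbar, ybar = mean(x), mean(y)
ssxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
ssx = sum((xi - xbar) ** 2 for xi in x)

b = ssxy / ssx            # slope
a = ybar - b * xbar       # intercept

y_hat = [a + b * xi for xi in x]
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ss_tot = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot  # same value Excel's =RSQ(Y_Range, X_Range) would give
print(round(b, 3), round(a, 2), round(r2, 3))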

Introduction to Linear Programming


Which products to produce and how much?

Which media to choose for promotion?


Which models to produce and how many?

The nature of preceding problems


Multiple and simultaneous decisions are to be taken:
 Which models and how many to produce?
 Which stocks and how many to buy?
 Which media and how much to spend?
 Possible feasible decisions are numerous.
 Each decision yields a different benefit or cost.
 The manager is interested in the best combination of decisions.
Obtaining the best combination of decisions-
 The benefit (or cost) is expressed as a simple linear equation.
 Limited resources, usage, demand, etc. are expressed in simple linear equations.
 Usually computer is used to solve the equations- except for very small, classroom
problems.
Types of equations-
 One objective function with multiple decision variables, and
 Multiple constraints.

The equations
1.Maximize Revenue, R
R = 5 Butter + 10 Cheese ….1

2.Constraint
1 Butter + 4 Cheese ≤ 100 ….2
Butter ≥ 0, Cheese ≥ 0. ….3, 4.

3.Highest revenue is achieved when,


 Butter, x= 100 kg and Cheese, y= 0 kg.
 Highest Revenue is Rs 500.
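A minimal sketch of the butter/cheese problem above, assuming scipy is available (linprog minimizes, so the revenue coefficients are negated to maximize):

from scipy.optimize import linprog

c = [-5, -10]                     # maximize 5*Butter + 10*Cheese
A_ub = [[1, 4]]                   # 1*Butter + 4*Cheese <= 100
b_ub = [100]
bounds = [(0, None), (0, None)]   # Butter >= 0, Cheese >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x, -res.fun)            # expected: Butter = 100, Cheese = 0, Revenue = 500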

Production planning problem
