QUANTITATIVE TECHNIQUES - II
Dr. Pritha Guha
MORE THAN TWO POPULATIONS
Comparing the means of more than two populations
• Annual savings using public transportation in 4 large American cities (in $):

• Suppose we believe that the mean annual savings are the same for all 4 cities. We would then test
𝐻0 : mean savings for all cities are same
𝐻1 : mean savings vary across the cities
boxplot(savings~city, col = c("red", "blue", "yellow", "plum"),
        main = "Boxplot of Annual savings using public transportation in 4 large American cities")
Comparing the means of more than two populations: Set up
• Suppose we have k (≥ 3) samples.
• We would like to know whether they are from the same distribution.
• Assumption: all samples are from normal distributions with the same (unknown) variance.
• Samples:
• Sample 1 : $X_{11}, X_{12}, \cdots, X_{1n_1}$ IID $N(\mu_1, \sigma^2)$
• Sample 2 : $X_{21}, X_{22}, \cdots, X_{2n_2}$ IID $N(\mu_2, \sigma^2)$
  ⋮
• Sample k : $X_{k1}, X_{k2}, \cdots, X_{kn_k}$ IID $N(\mu_k, \sigma^2)$

• $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, $H_1$: at least one $\mu_i$ is different


Comparing the means of more than two populations:
Analysis Of Variance (ANOVA)
• Analysis of variance is about comparing the means of multiple populations.
• A one-way layout is an experimental set-up in which independent measurements are made under several treatments (the levels of a single factor, i.e. the groups).
• The sample sizes of the groups may or may not be the same.
Comparing the means of more than two populations: Set up
Set Up
• $X_{ij}$: jth observation from the ith group, $j = 1, 2, \cdots, n_i$ and $i = 1, 2, \cdots, k$
• Thus $X_{ij} \sim N(\mu_i, \sigma^2)$, $1 \le j \le n_i$, $1 \le i \le k$.
• Model: $X_{ij} = \mu_i + \epsilon_{ij}$, $j = 1, 2, \cdots, n_i$, $i = 1, 2, \cdots, k$. So,
$\epsilon_{ij} \sim N(0, \sigma^2)$, IID, $j = 1, 2, \cdots, n_i$, $i = 1, 2, \cdots, k$.

Assumptions
• Normality: all samples have to be from normal distributions.
• Independence: samples need to be independent.
• Equal variance: all populations must have equal variance
• Difference, if any, is therefore only through the means.
Alternative Representation
• Let μ: overall mean,
𝛼𝑖 : Differential effect of the i-th factor/ treatment/ group, then,
𝜇𝑖 = 𝜇 + 𝛼𝑖
• The model becomes: 𝑋𝑖𝑗 = 𝜇 + 𝛼𝑖 + 𝜖𝑖𝑗 , 𝑗 = 1, 2, ⋯ , 𝑛𝑖 , 𝑖 = 1, 2, ⋯ , 𝑘
• We now test for
𝐻0 : 𝛼1 = 𝛼2 = ⋯ = 𝛼𝑘 = 0, 𝐻1 : at least one 𝛼𝑖 is not 0

• A Model Restriction: The $\alpha_i$'s are the differential effects from the mean level, thus,
• $\sum_{i=1}^{k} n_i \alpha_i = 0$, if model is unbalanced.
• $\sum_{i=1}^{k} \alpha_i = 0$, if model is balanced.
Some Estimates
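• The grand sample mean, $\bar{X}_{00} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij}$ with $n = n_1 + n_2 + \cdots + n_k$, is unbiased for the overall mean μ.
Thus $\hat{\mu} = \bar{X}_{00}$
• Sample mean for each group: $\bar{X}_{i0} = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij}$. $\bar{X}_{i0}$ is unbiased for $\mu_i = \mu + \alpha_i$.
Thus, $\hat{\mu} + \hat{\alpha}_i = \bar{X}_{i0}$
• Hence $\hat{\alpha}_i = \bar{X}_{i0} - \bar{X}_{00}$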
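The estimates above can be sketched in R. This is a minimal illustration on made-up numbers (not the lecture's data); the names x, g, n_i, mu_hat, xbar_i and alpha_hat are chosen here for illustration only. It also checks the model restriction: the size-weighted sum of the estimated effects is zero, while the unweighted sum need not be when the groups are unbalanced.

```r
# Made-up toy data (NOT the lecture's data): three groups of unequal size
x <- c(2, 4, 6, 5, 7, 8, 10, 12, 14)
g <- factor(rep(c("A", "B", "C"), times = c(3, 2, 4)))

n_i       <- tapply(x, g, length)  # group sizes n_i
mu_hat    <- mean(x)               # grand mean, estimates mu
xbar_i    <- tapply(x, g, mean)    # group means, estimate mu_i = mu + alpha_i
alpha_hat <- xbar_i - mu_hat       # estimated differential effects

weighted_sum   <- sum(n_i * alpha_hat)  # zero up to rounding error
unweighted_sum <- sum(alpha_hat)        # generally nonzero for unbalanced groups
print(c(weighted = weighted_sum, unweighted = unweighted_sum))
```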
The Sum of Squares
• Sum of squared variation between the groups/treatments (SSB/SSTR):
$\sum_{i=1}^{k} n_i (\bar{X}_{i0} - \bar{X}_{00})^2 = \sum_{i=1}^{k} n_i \hat{\alpha}_i^2$
• Sum of squared variation within the groups/treatments (SSE/SSW):
$\sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i0})^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \hat{\mu} - \hat{\alpha}_i)^2$
• Total Sum of Squares (SST): $\sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{00})^2$

• A Result: SST can be partitioned as follows: SST = SSTR + SSE
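A short sketch of why the partition holds, using only the definitions above: write each deviation from the grand mean as a within-group part plus a between-group part, then expand the square.

```latex
\begin{aligned}
SST &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}
       \bigl[(X_{ij}-\bar{X}_{i0}) + (\bar{X}_{i0}-\bar{X}_{00})\bigr]^2 \\
    &= \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_{i0})^2}_{SSE}
     + \underbrace{\sum_{i=1}^{k} n_i(\bar{X}_{i0}-\bar{X}_{00})^2}_{SSTR}
     + 2\sum_{i=1}^{k}(\bar{X}_{i0}-\bar{X}_{00})
        \sum_{j=1}^{n_i}(X_{ij}-\bar{X}_{i0})
\end{aligned}
```

The cross term vanishes because, within each group, the deviations from the group mean sum to zero: $\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_{i0}) = 0$.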


PublicT = read.csv(file.choose(), header=TRUE)
attach(PublicT)
names(PublicT)

In R
# Calculation by hand
unique(city)

xGT = mean(savings)                      # grand mean
xB = mean(savings[city == "Boston"])     # group means, one per city
xNY = mean(savings[city == "NY"])
xSF = mean(savings[city == "SF"])
xC = mean(savings[city == "Chicago"])

In R
• SST = $\sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{00})^2$ =

• SSB/SSTR = $\sum_{i=1}^{k} n_i (\bar{X}_{i0} - \bar{X}_{00})^2$ =
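One way the sums of squares could be computed by hand in R is sketched below. The data are made-up numbers (not the lecture's savings data), and the names x, g, grand and group_mean are illustrative assumptions. The same pattern should carry over to savings and city once the lecture's file is loaded.

```r
# Made-up toy data (NOT the lecture's data): three groups of unequal size
x <- c(2, 4, 6, 5, 7, 8, 10, 12, 14)
g <- factor(rep(c("A", "B", "C"), times = c(3, 2, 4)))

grand      <- mean(x)     # grand mean
group_mean <- ave(x, g)   # for each observation, the mean of its own group

SST  <- sum((x - grand)^2)            # total sum of squares
SSTR <- sum((group_mean - grand)^2)   # between groups: each group's term counted n_i times
SSE  <- sum((x - group_mean)^2)       # within groups

print(c(SST = SST, SSTR = SSTR, SSE = SSE))
print(all.equal(SST, SSTR + SSE))     # the partition SST = SSTR + SSE
```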


A Test Statistic
Under $H_0$
• $SSTR / \sigma^2 \sim \chi^2_{k-1}$
• $SSE / \sigma^2 \sim \chi^2_{n-k}$, where $n = n_1 + n_2 + \cdots + n_k$
• SSTR and SSE are independent.

Test Statistic
• Under $H_0$, $F = \dfrac{SSTR/(k-1)}{SSE/(n-k)} \sim F_{k-1,\,n-k}$

Note: We define
• $MSTR = SSTR/(k-1)$
• $MSE = SSE/(n-k)$
Analysis Of Variance (ANOVA) Table

• Under 𝐻0 , 𝐹~𝐹𝑘−1,𝑛−𝑘
• Reject 𝐻0 if observed 𝐹 > 𝐹𝑘−1,𝑛−𝑘;𝛼 at level α.
• Rejection of 𝐻0 means that not all group means are equal.
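The decision rule can be sketched in R as follows, again on made-up numbers rather than the lecture's data. The variable names are illustrative. qf() gives the cut-off, pf() the p-value, and the hand-computed F is cross-checked against aov().

```r
# Made-up toy data (NOT the lecture's data): k = 3 groups, n = 9 observations
x <- c(2, 4, 6, 5, 7, 8, 10, 12, 14)
g <- factor(rep(c("A", "B", "C"), times = c(3, 2, 4)))
k <- nlevels(g)
n <- length(x)

group_mean <- ave(x, g)                  # each observation's own group mean
SSTR <- sum((group_mean - mean(x))^2)    # between-group sum of squares
SSE  <- sum((x - group_mean)^2)          # within-group sum of squares

MSTR  <- SSTR / (k - 1)
MSE   <- SSE / (n - k)
F_obs <- MSTR / MSE

F_crit  <- qf(0.95, df1 = k - 1, df2 = n - k)   # cut-off at the 5% level
p_value <- pf(F_obs, df1 = k - 1, df2 = n - k, lower.tail = FALSE)
reject  <- F_obs > F_crit                       # same decision as p_value < 0.05

# Cross-check against R's built-in one-way ANOVA
F_aov <- summary(aov(x ~ g))[[1]][["F value"]][1]
print(all.equal(F_obs, F_aov))

# Cut-off for 3 and 20 degrees of freedom at the 5% level (approximately 3.098)
F_crit_3_20 <- qf(0.95, df1 = 3, df2 = 20)
```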
In R
PublicT.Anova = aov(savings ~ city)
summary(PublicT.Anova)
Analysis Of Variance (ANOVA) Table

• Cut-off at 5% level: 𝐹3,20;0.05 = 3.098


• Observed F-stat > cut-off
• Reject 𝐻0, meaning that the mean savings are not the same for all the cities
Why do we reject 𝐻0 for large F?
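• Under 𝐻0 there is no true variation between the group means, so any observed between-group variation is due to random error alone.
• Under 𝐻0, MSTR and MSE both estimate σ², so F = MSTR/MSE should be close to 1.
• If 𝐻0 is false, MSTR tends to be inflated by the differences in group means, while MSE remains unaffected, so F tends to be large.
• Reject 𝐻0 for large values of observed F.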
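A small simulation can illustrate this argument. It is only a sketch: the group means, group size, and number of replications are assumptions made here, with 4 groups of 6 so that the degrees of freedom are 3 and 20. The helper sim_F() is hypothetical and simply returns the F statistic from aov() for one simulated data set.

```r
set.seed(1)

# Hypothetical helper: simulate one data set with the given group means
# (equal variance, normal errors) and return the one-way ANOVA F statistic
sim_F <- function(mu, n_per_group = 6, sigma = 1) {
  g <- factor(rep(seq_along(mu), each = n_per_group))
  x <- rnorm(length(g), mean = rep(mu, each = n_per_group), sd = sigma)
  summary(aov(x ~ g))[[1]][["F value"]][1]
}

F_null <- replicate(1000, sim_F(c(0, 0, 0, 0)))  # H0 true: all means equal
F_alt  <- replicate(1000, sim_F(c(0, 0, 0, 2)))  # H0 false: one mean differs

cutoff <- qf(0.95, df1 = 3, df2 = 20)

print(mean(F_null))           # stays near 1 under H0
print(mean(F_alt))            # much larger when H0 is false
print(mean(F_null > cutoff))  # rejection rate under H0, near 0.05
print(mean(F_alt > cutoff))   # rejection rate when H0 is false (power)
```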
Bartlett's Test for Homogeneity of Variance

• To test $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$ and $H_1$: not all $\sigma_i^2$'s are equal.
• Assumption: Data is from a normal distribution.
• Under $H_0$, the test statistic approximately follows $\chi^2_{k-1}$.
• Reject $H_0$ at significance level α, if
• observed test statistic value > $\chi^2_{k-1;\alpha}$
• or if p-value < α
In R

bartlett.test(savings ~ city)
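The two equivalent rejection rules can be read off the object returned by bartlett.test(). The sketch below uses made-up numbers (not the lecture's data), and the names bt, crit, reject_by_stat and reject_by_p are illustrative.

```r
# Made-up toy data (NOT the lecture's data): k = 3 groups
x <- c(2, 4, 6, 5, 7, 8, 10, 12, 14)
g <- factor(rep(c("A", "B", "C"), times = c(3, 2, 4)))
k <- nlevels(g)

bt <- bartlett.test(x ~ g)

print(bt$statistic)   # observed test statistic (Bartlett's K-squared)
print(bt$parameter)   # degrees of freedom, k - 1
print(bt$p.value)

crit <- qchisq(0.95, df = k - 1)                 # cut-off at the 5% level
reject_by_stat <- unname(bt$statistic) > crit    # rule 1: statistic above cut-off
reject_by_p    <- bt$p.value < 0.05              # rule 2: p-value below alpha
```

If the test does not reject, the equal-variance assumption of the ANOVA is not contradicted by the data; that is not the same as having shown the variances to be equal.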
A Problem
• Does salary depend on gender?
• A data set of observations on three variables for 52 tenure-track professors in a
small college was collected to examine this question (see DisSalary.csv).
• The variables are:
• Gender: Male/Female
• JobRank: Full Professor (full), Associate Professor (associate), Assistant Professor
(assistant)
• Salary: Salary of the faculty member (in '000 Rs.)

• Assume that the data are normally and independently distributed.


• Suppose that we want to test at a 5% level of significance whether there is a difference
in salary with respect to gender.
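# DSal is assumed to hold the DisSalary.csv data; the slides do not show this step
DSal <- read.csv("DisSalary.csv", header = TRUE)
boxplot(Salary~Gender, data = DSal, col = c("red", "blue"), main = "Boxplot of salary based on gender")
library(ggplot2)   # provides ggplot(), aes(), geom_boxplot()
library(plotly)    # provides ggplotly()
p2 <- ggplot(DSal, aes(x=Gender, y=Salary, fill=Gender)) +
  geom_boxplot()
ggplotly(p2)       # interactive version of the boxplot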
A Problem: Continued
• $H_0: \mu_{Male} = \mu_{Female}$, $H_1: \mu_{Male} \neq \mu_{Female}$

• Cut-off value:
• p-value:
• Conclusion:
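One way to approach this problem in R is sketched below. Because DisSalary.csv is not reproduced here, the code runs on simulated stand-in data: the values, the 14/38 split and the name DSal_sim are invented for illustration, so the printed p-value says nothing about the real data. The sketch assumes equal variances in the two groups, which gives a pooled two-sample t-test with 52 - 2 = 50 degrees of freedom. With only two groups, that test is equivalent to the one-way ANOVA, since F = t².

```r
set.seed(42)

# Simulated stand-in for DisSalary.csv (all values and group sizes are invented)
DSal_sim <- data.frame(
  Gender = factor(rep(c("Female", "Male"), times = c(14, 38))),
  Salary = c(rnorm(14, mean = 22, sd = 5), rnorm(38, mean = 24, sd = 5))
)

# Pooled (equal-variance) two-sample t-test, two-sided
tt <- t.test(Salary ~ Gender, data = DSal_sim, var.equal = TRUE)
print(tt$parameter)                 # degrees of freedom: 50

cutoff <- qt(0.975, df = 50)        # two-sided cut-off at the 5% level, about 2.009
reject <- abs(unname(tt$statistic)) > cutoff   # same decision as tt$p.value < 0.05

# With two groups, the one-way ANOVA F statistic equals the squared t statistic
F_obs <- summary(aov(Salary ~ Gender, data = DSal_sim))[[1]][["F value"]][1]
print(all.equal(unname(tt$statistic)^2, F_obs))
```

With the real data, DSal_sim would be replaced by the data frame read from DisSalary.csv.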
