TUTORIAL 5
Download the t5e1, t5e2, t5e3 and t5e4 Excel data files from the subject website and save
them to your computer or USB flash drive. Read this handout and try to complete the tutorial
exercises before your tutorial class, so that you can ask your tutor for help during the Zoom
session if necessary.
After you have completed the tutorial exercises attempt the “Exercises for assessment”. You
must submit your answers to these exercises in the Tutorial 5 Homework Canvas
Assignment Quiz by the next tutorial in order to get the tutorial mark. For each assessment
exercise type your answer in the relevant box available in the Quiz or type your answer
separately in Microsoft Word and upload it in PDF format as an attachment. In either case,
if the exercise requires you to use R, save the relevant R/RStudio script and printout in a
Word document and upload it together with your written answer in PDF format.
Let’s start with a single population variance. Suppose we take all possible random samples of the same size (n) from a normal population, X: N(μ, σ²), calculate the sample variance (s²) from each, and develop the relative frequency distribution of the sample variances, i.e. the sampling distribution of the sample variance estimator.

It can be shown that the sample variance is an unbiased estimator of the population variance, i.e. E(s²) = σ², and that the sum of squared deviations from the sample mean divided by the population variance follows a chi-square distribution with n-1 degrees of freedom, i.e.
$$\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$
From this result, (i) the (1-α)100% confidence interval estimate of σ² is

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \leq \sigma^2 \leq \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$$

where χ²_{α/2,n-1} and χ²_{1-α/2,n-1} are the (1-α/2)100% and (α/2)100% percentiles of the chi-square distribution with n − 1 degrees of freedom, and (ii) the test statistic for H0: σ² = σ0² is

$$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma_0^2}$$
The chi-square test and the corresponding confidence interval for a population variance are based on the following assumptions:
1. The sample is a random sample drawn from the population of interest.
2. The sampled population is normally distributed, at least approximately.
Exercise 1

A company manufactures steel shafts for use in engines. One method for judging inconsistencies in the production process is to determine the variance of the lengths of the shafts. A random sample of 10 shafts was taken and their lengths measured in centimetres. These measurements are saved in the t5e1 file. Do the calculations first manually and then with R.
a) Find a 90% confidence interval for the population variance σ², assuming that the lengths of the steel shafts are normally distributed, at least approximately.
Use your calculator to find the sample standard deviation of length and square it to obtain s² = 0.493.

The sample size is 10 and the confidence level is (1-α)100% = 90%, so the required chi-square percentiles from Table 5 of Selvanathan (Appendix B, p. 1077) are1

$$\chi^2_{\alpha/2,\,9} = \chi^2_{0.05,\,9} = 16.9 \qquad \text{and} \qquad \chi^2_{1-\alpha/2,\,9} = \chi^2_{0.95,\,9} = 3.33$$

Substituting these values into the confidence interval formula, the lower and upper confidence limits are

$$\frac{(n-1)s^2}{\chi^2_{0.05,\,9}} = \frac{9 \times 0.493}{16.9} \approx 0.263 \qquad \text{and} \qquad \frac{(n-1)s^2}{\chi^2_{0.95,\,9}} = \frac{9 \times 0.493}{3.33} \approx 1.332$$

Hence, with 90% confidence, the population variance of the lengths of the shafts is somewhere between 0.263 and 1.332 squared centimetres.2
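This manual calculation can be checked in R with the exact chi-square percentiles. A minimal sketch, assuming the imported data frame is called t5e1 and has a column named length (the variable used by VarTest below):

n  <- length(t5e1$length)                          # sample size, 10
s2 <- var(t5e1$length)                             # sample variance, about 0.493
(n - 1) * s2 / qchisq(c(0.95, 0.05), df = n - 1)   # 90% CI limits, roughly (0.263, 1.332)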
b) In order to meet quality requirements, the production process of steel shafts has to be
suspended and the machines adjusted as soon as possible when the variance of the
lengths of the shafts is larger than 0.4 squared centimetres. At the 5% level of
significance, can we conclude that the population variance σ² is greater than 0.4
squared centimetres and thus some urgent adjustment is required? Assume again that
the lengths of the steel shafts are normally distributed.
1 These chi-square percentiles can also be obtained by applying the R quantile function qchisq(alpha, df = ), i.e. by executing the qchisq(0.05, df = 9) and qchisq(0.95, df = 9) commands.
2 The question is about the population variance. If it were about the population standard deviation, we would take the square roots of these confidence interval limits.
The question implies the following hypotheses:
$$H_0: \sigma^2 = 0.4 \qquad H_A: \sigma^2 > 0.4$$

This is a right-tail test and the 5% critical value with 9 degrees of freedom is the same as the upper chi-square table value in part (a), i.e. 16.9, and H0 is rejected if the observed test statistic exceeds this critical value.

$$\chi^2_{obs} = \frac{(n-1)s^2}{\sigma_0^2} = \frac{9 \times 0.493}{0.4} = 11.0925$$
Since it is smaller than the critical value, at the 5% significance level we cannot reject
H0 and cannot conclude that urgent adjustment is needed.
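The p-value implied by this manual test statistic can also be checked in R (a quick sketch using the rounded s² = 0.493):

pchisq(9 * 0.493 / 0.4, df = 9, lower.tail = FALSE)   # right-tail p-value, about 0.27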
To complete parts (a) and (b) with R, launch RStudio, create a new project and script,
name them t5e1, and import the data saved in the t5e1 Excel data file to RStudio. In
Tutorial 3 you already installed the DescTools package and used its SignTest function.
Another function in this package is VarTest, which can perform the chi-square test of a single population variance and report the corresponding confidence interval.3 Execute the

library(DescTools)
VarTest(length, sigma.squared = 0.4, alternative = "greater")

commands to obtain the test printout.4
The variable of interest is length and the alternative hypothesis is that the true variance is greater than 0.4. The test statistic value is 11.103, about the same as the one calculated manually. The p-value is 0.2687, so H0 is maintained at the 5% significance level.
3 Like other tests, by default this test is also performed at the 5% significance level. Use the optional conf.level argument if your significance level is different.
4 If you use R 4.0.2 and get an error message, try VarTest(length, sigma.squared = 0.4, alternative = "great").
The printout also shows the 95% confidence interval, but since we performed a right-tail test, it is unbounded from above. To reproduce the confidence interval obtained in part (a), you need to request a two-tail test at the 90% confidence level.
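For example, a call along the following lines returns the two-sided 90% interval, which should match the manual result apart from rounding (a sketch, assuming the length variable is accessible as in the VarTest call above; the handout does not spell out this particular command):

VarTest(length, sigma.squared = 0.4, conf.level = 0.90)   # alternative defaults to "two.sided"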
The confidence interval estimator and the chi-square test of σ² require that the sampled population be normally distributed. Although these techniques remain valid for some moderate deviations from normality, they are less robust than the confidence interval estimator and the t-test of μ. Unfortunately, given the small sample size, this time we cannot rely on the standard checks of normality.
Suppose now that we are interested in the comparison of the variances of two normally distributed populations, X1: N(μ1, σ1²) and X2: N(μ2, σ2²). Assuming that we draw independent random samples of sizes n1 and n2 from these populations and calculate the sample variances (s1², s2²) from each, the quantities (n1 − 1)s1²/σ1² and (n2 − 1)s2²/σ2² are independent chi-square random variables with n1 − 1 and n2 − 1 degrees of freedom, respectively.

The ratio of two independent chi-square random variables divided by their respective degrees of freedom has an F distribution, so

$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} \sim F_{n_1-1,\,n_2-1}$$

From this result, (i) the (1-α)100% confidence interval estimate of σ1²/σ2² is

$$\frac{s_1^2}{s_2^2}\,\frac{1}{F_{\alpha/2,\,n_1-1,\,n_2-1}} \leq \frac{\sigma_1^2}{\sigma_2^2} \leq \frac{s_1^2}{s_2^2}\,F_{\alpha/2,\,n_2-1,\,n_1-1}$$

and (ii) the test statistic for H0: σ1²/σ2² = 1 is F = s1²/s2², which follows the F distribution with n1 − 1 and n2 − 1 degrees of freedom when H0 is true.
The F-test and the corresponding confidence interval for the ratio of two population variances are based on the following assumptions:
1. The samples are independent random samples drawn from the two populations of interest.
2. Both sampled populations are normally distributed, at least approximately.
Exercise 2

a) The sample variances are s1² = 3.346 and s2² = 10.950. Estimate the ratio of the two population variances with 95% confidence.
The variable of interest is service time at a bank teller. This is a quantitative and continuous variable measured on a ratio scale. Assuming that the two service times, X1 and X2, are independent and normally distributed and that the samples are random, we can use the confidence interval estimator introduced above.
Both sample sizes are 100 and the confidence level is (1-α)100% = 95%, so the required F percentiles from Table 6(b) of Selvanathan (Appendix B, pp. 1080-81)5 are

$$F_{\alpha/2,\,n_1-1,\,n_2-1} = F_{0.025,\,99,\,99} \approx F_{0.025,\,100,\,100} = 1.48$$

and
5 Tables 6(a)-6(d) provide F values only under the right tail of the various F distributions. F values under the left tail can be determined from right-tail F values using the following formula: F_{1-α, df1, df2} = 1 / F_{α, df2, df1}. Note also that we need to round df1 and df2 up to 100 because the F-tables provide the percentiles for selected degrees of freedom only. The exact F percentiles could be obtained by applying the R quantile function qf(alpha, df1 = , df2 = ), i.e. by executing the qf(0.025, df1 = 99, df2 = 99) and qf(0.975, df1 = 99, df2 = 99) commands.
$$F_{1-\alpha/2,\,n_1-1,\,n_2-1} = F_{0.975,\,99,\,99} \approx F_{0.975,\,100,\,100} = \frac{1}{F_{0.025,\,100,\,100}} = \frac{1}{1.48} \approx 0.676$$

Substituting these values and the sample variances into the confidence interval formula, the lower and upper confidence limits are

$$\frac{s_1^2}{s_2^2} \times 0.676 = \frac{3.346}{10.950} \times 0.676 \approx 0.206 \qquad \text{and} \qquad \frac{s_1^2}{s_2^2} \times 1.48 = \frac{3.346}{10.950} \times 1.48 \approx 0.452$$
With 95% confidence, the ratio of the variances of the populations of the service times at the two tellers is somewhere between 0.206 and 0.452 (the ratio of two variances is a unitless quantity).
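The same interval can be reproduced in R with exact F percentiles instead of the rounded table values (a sketch using only the two sample variances quoted above):

ratio <- 3.346 / 10.950                  # ratio of the sample variances, about 0.306
ratio / qf(0.975, df1 = 99, df2 = 99)    # lower limit, about 0.206
ratio * qf(0.975, df1 = 99, df2 = 99)    # upper limit, about 0.454 (same F value because the sample sizes are equal)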
b) Do the data allow us to infer at the 10% significance level that the variances in service
times differ between the two tellers?
$$H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1 \qquad H_A: \frac{\sigma_1^2}{\sigma_2^2} \neq 1$$
Assume again that both service times are at least approximately normally distributed, so that we can perform the F-test described above.
This is a two-tail test, so at the 10% significance level the required critical values are

$$F_{\alpha/2,\,n_1-1,\,n_2-1} = F_{0.05,\,99,\,99} \approx F_{0.05,\,100,\,100} = 1.39$$

and

$$F_{1-\alpha/2,\,n_1-1,\,n_2-1} = F_{0.95,\,99,\,99} \approx F_{0.95,\,100,\,100} = \frac{1}{F_{0.05,\,100,\,100}} = \frac{1}{1.39} \approx 0.719$$
H0 is rejected if the observed test statistic is smaller than the lower critical value (0.719)
or larger than the upper critical value (1.39).
$$F_{obs} = \frac{s_1^2}{s_2^2} = \frac{3.346}{10.950} \approx 0.306$$
Since it is smaller than the lower critical value, we reject H0 at the 10% significance level
and conclude that the variances in service times differ between the two tellers.
Performing this test manually, it is possible to simplify the procedure a bit by labelling the population with the larger sample variance as population 1. By doing so we ensure
that the sample statistic is not smaller than one and hence we need only the upper
critical value straight from the F tables. The observed test statistic is
$$F_{obs} = \frac{10.950}{3.346} \approx 3.273$$
and since it is larger than the upper critical value, we would again reject H0.6
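If you prefer to avoid the F tables, the exact critical values and the p-value of this two-tail test can be obtained in R (a sketch based only on the sample variances quoted above):

qf(c(0.05, 0.95), df1 = 99, df2 = 99)   # lower and upper 10% critical values, about 0.72 and 1.39
Fobs <- 3.346 / 10.950                  # observed test statistic
2 * pf(Fobs, df1 = 99, df2 = 99)        # two-tail p-value (twice the smaller tail), practically zero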
To complete parts (a) and (b) with R, you can use the same VarTest command as in Exercise 1, but with a slightly different set of arguments. This time the command takes the general form sketched below, where x and y are the variables to be compared, ratio0 is the hypothesized ratio of the two population variances (by default, it is 1), and alternative is like before.
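Based on this description, and on the var.test-style interface that the DescTools documentation describes for VarTest, the general form is presumably along these lines (treat the argument names as an assumption rather than a quote from the handout):

VarTest(x, y, ratio = ratio0, alternative = "two.sided")   # ratio defaults to 1, alternative to "two.sided"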
Launch RStudio, create a new project and script, name them t5e2, import the data
saved in the t5e2 Excel data file to RStudio, and execute the following commands:
library(DescTools)
VarTest(Teller1, Teller2)
The quantity of interest is the variance of x (Teller1) divided by the variance of y (Teller2), and the alternative hypothesis is that the true ratio of variances is not equal to 1. The ratio of the sample variances, and hence the test statistic value, is about 0.306. The numerator and denominator degrees of freedom are both 99, and the p-value is practically zero, so H0 can be rejected at any reasonable significance level.
c) What assumption did you have to make in order to answer parts (a) and (b)? Try to
verify whether that assumption is reasonable this time.
6 A word of caution is in order: if the test were a one-tail test, swapping the sample variances would imply that the alternative hypothesis must be altered as well, for example, from HA: σ1²/σ2² < 1 to HA: σ2²/σ1² > 1.
In parts (a) and (b) we assumed that the samples are random and independent and that both sampled populations are normally distributed, at least approximately. Take the first two requirements for granted and perform the usual diagnostics for normality.
The histograms of Teller1 and Teller2 shown below have superimposed normal curves that seem to fit the relative frequency distributions. The normal QQ plots displayed below the histograms show that on both plots most of the points fall close to the reference line. (A sketch of commands that produce such plots is given below.)
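A minimal sketch of commands that produce such plots in base R, assuming Teller1 and Teller2 are available after importing the t5e2 data (the exact commands behind the figures are not shown in the handout):

# Histogram on the density scale with a superimposed normal curve
hist(Teller1, freq = FALSE, main = "Histogram of Teller1", xlab = "Teller1")
curve(dnorm(x, mean = mean(Teller1), sd = sd(Teller1)), add = TRUE)

# Normal Q-Q plot with a reference line
qqnorm(Teller1, main = "Normal Q-Q Plot for Teller1")
qqline(Teller1)

# Repeat both blocks with Teller2 for the second pair of plots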
The
library(pastecs)
round(stat.desc(Teller1, basic = FALSE, desc = TRUE, norm = TRUE), 3)
round(stat.desc(Teller2, basic = FALSE, desc = TRUE, norm = TRUE), 3)
commands provide the following statistics for Teller1 and for Teller2, respectively:
For both samples, the mean and the median are fairly similar, skewness and excess kurtosis are not significantly different from zero, and the p-values of the Shapiro-Wilk test are above 0.5.
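If you only need the Shapiro-Wilk results, they can also be obtained directly with the shapiro.test function from base R, for example:

shapiro.test(Teller1)   # Shapiro-Wilk normality test for Teller1
shapiro.test(Teller2)   # and for Teller2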
[Figure: Histogram of Teller1 — Density vs. Teller1, with a superimposed normal curve]

[Figure: Histogram of Teller2 — Density vs. Teller2, with a superimposed normal curve]
[Figure: Normal Q-Q Plot for Teller1 — Sample Quantiles vs. Theoretical Quantiles]

[Figure: Normal Q-Q Plot for Teller2 — Sample Quantiles vs. Theoretical Quantiles]
All things considered, the normality assumption is supported by all diagnostic checks.7
7 When the normality assumption is unreasonable, one can use the nonparametric Siegel-Tukey test for equality in variability. This test is also available in the DescTools package (SiegelTukeyTest), but we do not learn about it in this course.
Inferences about Population Proportions
Consider a binary population X, which has only two possible elements, “success” coded as 1 and “failure” coded as 0. Suppose we are interested in the proportion (relative frequency) of successes in this population, denoted as p. It is given by the usual population mean formula,

$$p = \frac{1}{N}\sum_{i=1}^{N} X_i$$

It can be estimated from a random sample of n observations with the sample proportion,

$$\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Depending on whether sampling is with or without replacement, the sample proportion (p-hat) is a binomial or hypergeometric random variable. When the sample size is large (np ≥ 5 and nq ≥ 5, where q = 1 − p, and in the case of sampling without replacement, n < 0.05N as well), the sampling distribution of p-hat can be approximated with a normal distribution,

$$\hat{p} \sim N(\mu_{\hat{p}},\,\sigma_{\hat{p}}), \qquad \mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$

From this result, (i) the approximate (1-α)100% confidence interval estimate of p is

$$\hat{p} \pm z_{\alpha/2}\,s_{\hat{p}}, \qquad s_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}}$$

and (ii) the test statistic for H0: p = p0 against a one-sided or two-sided alternative hypothesis is

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} \sim N(0, 1)$$
These large-sample procedures are only approximations to the exact procedures based on
the binomial distribution. Although we discuss neither the details of the exact procedures
nor how to do them manually, we shall perform them with R.
The exact binomial test and the corresponding confidence interval for a population proportion assume the following:
1. The sample is a random sample drawn from the population of interest.
2. The variable of interest is binary, i.e. each observation is either a “success” or a “failure”.
Exercise 3
Suppose that in a survey 600 employers are asked whether they have used a recruitment service within the past two months to find new staff. The responses are saved in the t5e3 Excel file.
a) Construct a 99% confidence interval for the population proportion of employers who
have used a recruitment service within the past two months to find new staff.
Launch RStudio, create a new project and script, name them t5e3, import the data
saved in the t5e3 Excel data file to RStudio, and execute the following commands:
attach(t5e3)
table(used)
used
no yes
474 126
In general, the table() function returns the frequency distribution of a variable. To turn these frequencies into relative frequencies, we also need the sample size, which can be obtained with the length()
function, which returns the length of vectors and other R objects. In this case, the length
of the used vector is the sample size, so execute
table(used) / length(used)
used
no yes
0.79 0.21
The same relative frequencies can also be obtained with the prop.table() function, by executing

prop.table(table(used))
These frequency and relative frequency distributions show that in the given sample 126
employers out of 600 (i.e. 21%) used a recruitment service within the past two months
to find new staff, so the sample proportion is 0.21. Using this sample proportion,
$$s_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{0.21 \times 0.79}{600}} \approx 0.0166$$
Since n·p̂ = 126 and n·q̂ = 474 are both much larger than 5, the normal approximation can be relied on.8 The confidence level is 99% and from the Standard Normal table z_{α/2} = z_{0.005} ≈ 2.575,9 so the approximate 99% confidence interval is

$$\hat{p} \pm z_{\alpha/2}\,s_{\hat{p}} = 0.21 \pm 2.575 \times 0.0166 \approx (0.167,\ 0.253)$$
8 n(p-hat) and n(q-hat) are the expected numbers of successes and failures in the sample, granted that the probability of success is equal to p-hat and the probability of failure is equal to q-hat, and they are the same as the frequencies of yes and no.
9 The sample proportion is always a small number between zero and one, and the estimate of its standard error is even smaller. In order to avoid an unreasonable loss of precision, it is recommended to do the manual calculations to a precision of 4 or even more decimal places. Once you have determined the required confidence interval, you can round its limits if you wish.
It implies that, with 99% confidence, the proportion of employers who have used a
recruitment service within the past two months to find new staff is between 16.7% and
25.3%.
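The manual normal-approximation calculation can be verified quickly in R (a sketch using only the counts reported above):

phat <- 126 / 600                        # sample proportion, 0.21
se   <- sqrt(phat * (1 - phat) / 600)    # estimated standard error, about 0.0166
phat + c(-1, 1) * qnorm(0.995) * se      # 99% normal-approximation CI, about (0.167, 0.253)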
With R, this confidence interval can be generated by the binom.test command that you already used in Tutorial 3. Execute

binom.test(126, 600, conf.level = 0.99)

to obtain the printout.
The reported 99% confidence interval is (0.169, 0.256). It is slightly wider than the one
we got manually because it is based on the correct binomial distribution while ours was
based on the normal approximation of the binomial distribution.
b) Based on the survey data, is there sufficient evidence at the 0.05 level of significance that more than 20% of all employers have used a recruitment service within the past two months to find new staff?
$$H_0: p = 0.2 \qquad H_A: p > 0.2$$
The hypothesized value of the population proportion is 0.20. Using this hypothesized value, np0 = 600 × 0.2 = 120 and nq0 = 600 × 0.8 = 480, which are both much larger than 5, and it is reasonable to assume that np and nq are also large enough to rely on the normal approximation. The standard error of the sample proportion under H0 is

$$\sigma_{\hat{p}} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{0.2 \times 0.8}{600}} \approx 0.0163$$

and the critical value is

$$z_{\alpha} = z_{0.05} = 1.645$$
The observed test statistic value is

$$z_{obs} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{0.21 - 0.20}{0.0163} \approx 0.613$$
Since it is smaller than the critical value, we cannot reject H0 at the 5% significance
level. Hence, there is not enough evidence to conclude that more than 20% of all
employers have used a recruitment service within the past two months to find new staff.
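The normal-approximation p-value behind this conclusion can be checked in R (a sketch, not part of the original printouts):

zobs <- (0.21 - 0.20) / sqrt(0.2 * 0.8 / 600)   # observed z statistic, about 0.61
pnorm(zobs, lower.tail = FALSE)                 # right-tail p-value, about 0.27

This is well above 0.05, in line with the exact tests reported below.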
By default, the binom.test function assumes that the hypothesized population proportion
is 0.5. In part (a) this was fine because we were interested only in the confidence interval
estimate, which does not depend on the hypothesized population proportion. This time,
however, we are interested in a right-tail test at the 5% significance level with p0 = 0.2,
so we need to execute the following command:10

binom.test(126, 600, p = 0.2, alternative = "greater", conf.level = 0.95)

It returns a printout on which the p-value is 0.2849, implying that H0 cannot be rejected at the 5% significance level.

In this case, the same test can also be performed with the prop.test function.11 Execute the following command:12

prop.test(126, 600, p = 0.2, alternative = "greater", conf.level = 0.95)

It returns
10 Since the default confidence level is 0.95, this time the last argument could be dropped from the command.
11 This command performs a chi-square test. The chi-square distribution was introduced in the Week 4 lecture and you will learn about the chi-square test in the lectures next week.
12 If you use R 4.0.2 and get an error message, try prop.test(126, 600, p = 0.2, alternative = "great", conf.level = 0.95).
1-sample proportions test with continuity correction
data: 126 out of 600, null probability 0.2
X-squared = 0.3151, df = 1, p-value = 0.2873
alternative hypothesis: true p is greater than 0.2
95 percent confidence interval:
0.1831912 1.0000000
sample estimates:
p
0.21
As you can see, the reported p-value is almost the same as on the previous printout.
Let’s now turn our attention to the two binary populations case. Suppose that the population
proportions are p1 and p2, that we draw independently a random sample from each
population, and that the sample sizes, n1 and n2, are large enough so that n1p1, n1q1, n2p2
and n2q2 are all at least 5 (and ni < 0.05Ni if sampling is without replacement).
Under these conditions, (i) an approximate (1-α)100% confidence interval of the difference between the two population proportions, p1 − p2, is

$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\,s_{\hat{p}_1-\hat{p}_2}, \qquad s_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
and (ii) the test statistic for H0: p1 − p2 = D0 against a one-sided or two-sided alternative hypothesis follows the standard normal distribution reasonably well, but its actual formula depends on D0.
Namely, on the one hand, if D0 = 0 and hence under the null hypothesis p1 = p2, the common population proportion is best estimated from the pooled sample by

$$\hat{p} = \frac{f_1 + f_2}{n_1 + n_2}$$

where f1 and f2 are the numbers of successes in the two samples, and the test statistic is

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{s_{\hat{p}_1-\hat{p}_2}} \sim N(0, 1), \qquad s_{\hat{p}_1-\hat{p}_2} = \sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
On the other hand, if D0 ≠ 0 and hence under the null hypothesis p1 ≠ p2, the two population proportions must be estimated separately by

$$\hat{p}_1 = \frac{f_1}{n_1}, \qquad \hat{p}_2 = \frac{f_2}{n_2}, \qquad s_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

and the test statistic is

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{s_{\hat{p}_1-\hat{p}_2}} \sim N(0, 1)$$
These are again large-sample procedures, approximations to the exact procedures based
on the binomial distribution.
The exact binomial test and the corresponding confidence interval for the difference between two population proportions assume the following:
1. The two samples are independent random samples drawn from the populations of interest.
2. The variable of interest is binary in both populations, i.e. each observation is either a “success” or a “failure”.

Note that when the normal approximation is used, on top of these requirements, n1p1 ≥ 5, n1q1 ≥ 5, n2p2 ≥ 5 and n2q2 ≥ 5 (and n1 < 0.05N1, n2 < 0.05N2, if sampling is without replacement) must also be satisfied.
Exercise 4

The impact of the accumulation of carbon dioxide in the atmosphere caused by burning of
fossil fuels such as oil, coal and natural gas has been hotly debated for more than a decade.
Some environmentalists and scientists have predicted that the excess carbon dioxide will
increase the Earth’s temperature over the next 50 to 100 years with disastrous
consequences.
To gauge the public’s opinion on the subject, a random sample of 400 people was asked
two years ago whether they believed in the greenhouse effect. This year, 500 people were
asked the same question. The results are recorded as 1 = believe in greenhouse effect and
0 = do not believe in greenhouse effect, the variables concerning belief are denoted as X1
(first sample, i.e. two years ago) and X2 (second sample, i.e. this year) and the data are
saved in the t5e4 Excel file.
a) Estimate the real change in the public’s opinion about the subject using a 90%
confidence level.
The sample proportions are p̂1 = 248/400 = 0.62 and p̂2 = 260/500 = 0.52,13 and the corresponding counts are n1p̂1 = 248, n1(1 − p̂1) = 152, n2p̂2 = 260 and n2(1 − p̂2) = 240.

They are all much bigger than 5, so the normal approximation is a reasonable option. The critical value is z_{α/2} = z_{0.05} = 1.645, the estimated standard error of the difference is

$$s_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} = \sqrt{\frac{0.62 \times 0.38}{400} + \frac{0.52 \times 0.48}{500}} \approx 0.033$$

and the confidence interval is

$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\,s_{\hat{p}_1-\hat{p}_2} = (0.62 - 0.52) \pm 1.645 \times 0.033 \approx (0.0457,\ 0.1543)$$
It implies that, with 90% confidence, the proportion of believers in the greenhouse effect has decreased by somewhere between 4.6 and 15.4 percentage points.14
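A quick manual check of this interval in R (a sketch based only on the counts above):

p1 <- 248 / 400                                          # first sample proportion, 0.62
p2 <- 260 / 500                                          # second sample proportion, 0.52
se <- sqrt(p1 * (1 - p1) / 400 + p2 * (1 - p2) / 500)    # unpooled standard error, about 0.033
(p1 - p2) + c(-1, 1) * qnorm(0.95) * se                  # 90% CI, about (0.0457, 0.1543)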
In R, this confidence interval can be generated like the one in Exercise 3. The

table(X1)
table(X2)

commands return the following frequency distributions:
13 To save time, these sample proportions were obtained by R, like in Exercise 3.
14 Recall that the confidence interval has been developed for the change in the proportion of believers between the first and second surveys, so a positive p1 − p2 value indicates that the true proportion of believers was larger two years ago than it is today.
X1
0 1
152 248
and
X2
0 1
240 260
There are 248 successes in the first sample and 260 in the second.
The confidence interval for the difference between the two population proportions is provided by the prop.test function, but this time its x and n arguments (the numbers of successes and the sample sizes) have two elements each and they need to be specified by the c() function as c(x1, x2) and c(n1, n2), respectively.
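The call itself is not spelled out in the handout, but based on the description above it was presumably of the form:

prop.test(c(248, 260), c(400, 500), conf.level = 0.90)   # two-sided by default, with continuity correction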
The reported 90% confidence interval, (0.0435, 0.1565), is almost identical to the one
we obtained manually. The difference between them is due to the continuity correction
that is referred to in the heading of the printout. By default, the prop.test function applies
this correction, called Yates’ correction for continuity, though it only makes a practical difference when the sample sizes are small (i.e. when some of n1p1, n1q1, n2p2 and n2q2 are smaller than 5). To see its impact, add the correct = FALSE argument to the previous command. The 90% confidence interval is then (0.0457, 0.1543), the same as the one we obtained manually.
b) Can we infer at the 10% significance level that there has been a decrease in belief in
the greenhouse effect?
Since X1 denotes the belief in the greenhouse effect in the first sample (drawn two years
ago) and X2 denotes the belief in the greenhouse effect in the second sample (drawn
this year), the hypotheses are
$$H_0: p_1 - p_2 = 0 \qquad H_A: p_1 - p_2 > 0$$
In this case D0 = 0, so the common population proportion is estimated from the pooled
sample,
$$\hat{p} = \frac{f_1 + f_2}{n_1 + n_2} = \frac{248 + 260}{400 + 500} \approx 0.564$$

and the standard error of the difference between the sample proportions is

$$s_{\hat{p}_1-\hat{p}_2} = \sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.564 \times 0.436 \times \left(\frac{1}{400} + \frac{1}{500}\right)} \approx 0.0333$$
The critical value is z_α = z_{0.1} = 1.282 and the observed test statistic is

$$z_{obs} = \frac{\hat{p}_1 - \hat{p}_2}{s_{\hat{p}_1-\hat{p}_2}} = \frac{0.62 - 0.52}{0.0333} \approx 3.003$$

Since the critical value is smaller than the test statistic, we reject the null hypothesis and conclude at the 10% significance level that there has been a decrease in belief in the greenhouse effect.
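The one-tail p-value implied by this z statistic can be checked in R (a sketch):

pnorm(3.003, lower.tail = FALSE)   # about 0.0013, essentially the same as the prop.test p-value below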
To perform the test in R without the continuity correction, execute the

prop.test(c(248, 260), c(400, 500), alternative = "greater", conf.level = 0.90, correct = FALSE)

command. It returns
2-sample test for equality of proportions without continuity correction
data: c(248, 260) out of c(400, 500)
X-squared = 9.039, df = 1, p-value = 0.001321
alternative hypothesis: greater
90 percent confidence interval:
0.05772434 1.00000000
sample estimates:
prop 1 prop 2
0.62 0.52
How does it compare to the standard normal test statistic we calculated manually, z = 3.003? As you can see on the printout, the degrees of freedom of this chi-square random variable is one, and you learnt in the Week 4 lectures that the square of a standard normal random variable is a chi-square random variable with df = 1. And indeed, as you can check easily, apart from some rounding error,

$$z_{obs}^2 = 3.003^2 \approx 9.018 \approx \chi^2_{1,\,obs}$$
For the sake of comparison, we applied the prop.test function without continuity correction. Note, however, that in general there is no reason to override the default option. If you rerun the command with the default continuity correction, i.e. without the correct = FALSE argument, the new test statistic is smaller and hence its p-value is larger, but since the sample sizes are fairly large, the differences are negligible.
Exercises for Assessment
Exercise 5
a) Estimate the ratio of the two population variances with 95% confidence.
b) Can we conclude at the 5% significance level that the population variances differ? What
do you conclude if the significance level is increased to 10%?
In parts (a) and (b) alike, do the calculations both manually and with R.
Exercise 6

In a public opinion survey, 60 out of a sample of 100 high-income voters and 40 out of a
sample of 75 low-income voters supported the introduction of a new national security tax.
Can we conclude at the 5% level of significance that there is a difference in the proportion
of high- and low-income voters favouring a new national security tax? Do the calculations
both manually and with R.