QM2 Notes


Quantitative Methods 2

Lecture 2
What You Need to Know
What are random variables, the normal and the t distributions.
Random Variable (RV): a function or rule that assigns a number to each outcome of an
experiment.
Discrete RV: has a finite or countably infinite number of possible values (e.g. number of heads
when flipping a coin twice)
-> includes the Binomial and Poisson distributions
Continuous RV: an uncountably infinite number of possible values, i.e. can take on any value in one
or more intervals (e.g. students' heights or weights: it could be 1.421m, 1.41222229m,
1.4129999999m)
-> includes the uniform, normal, exponential, Student-t, Chi-squared and F distributions
Some specific continuous RVs are defined by their probability density functions f(x), which must satisfy:
1. f(x) is non-negative, and
2. the total area under the f(x) curve equals one

Sampling distributions and the Central Limit Theorem.


The normal distribution density function ranges over −∞ < x < ∞.
The Standard Normal distribution, denoted Z, has μ = 0 and σ = 1.
We can transform X into a Z by Z = (X − μx)/σx.

The Central Limit Theorem: The sampling distribution of the sample mean x-bar is
approximately normal for a sufficiently large sample size n. As n gets larger, the sampling
distribution more closely resembles a Normal.
We also know that:
1. The mean of the sampling distribution of x̄ equals the mean of the population from which the sample is
drawn, i.e. E(x̄) = μx (unbiased estimator).
2. The variance of the sampling distribution of x̄ equals the variance of the population divided by the
sample size n, i.e. Var(x̄) = σx²/n (consistent).
3. The power of the test (the probability of rejecting the Null when it is false) equals 1 − β (it depends on α,
n, and the population parameter).
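A minimal simulation sketch of these facts (assuming NumPy is available; the exponential population below is chosen purely for illustration): as n grows, the mean of x̄ stays at μ and the variance of x̄ tracks σ²/n.

```python
import numpy as np

rng = np.random.default_rng(0)
pop_var = 1.0                              # exponential(scale=1) has mean 1 and variance 1

for n in (5, 30, 200):                     # increasing sample sizes
    # draw 10,000 samples of size n and compute each sample mean
    xbars = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n={n:4d}  mean(x-bar)={xbars.mean():.3f}  "
          f"var(x-bar)={xbars.var():.4f}  sigma^2/n={pop_var / n:.4f}")
```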

How to conduct an hypothesis test generally - the six steps.


1. Set up null and alternative hypothesis
2. Determine the appropriate test statistic and its sampling distribution
3. Specify the value of α (the significance level), the probability of a Type I error
4. Define the decision rule, i.e. for what values of the test statistic will we reject the null.
5. Calculate the test statistic.
6. Reject or do not reject null and interpret the result in light of the question in words.

p-value: the smallest value of α that would lead to rejection of the null hypothesis.
It is also a probability statement about the test statistic. The p-value of a calculated test statistic is
the probability that we would observe a test statistic at least as extreme if the Null hypothesis
were true.
p < 0.01: overwhelming evidence to reject the null; 0.01 < p < 0.05: strong evidence to
reject the null; 0.05 < p < 0.10: weak evidence; p > 0.10: no evidence.
How to conduct a test of the mean of a population.
Null: μ = 100
Alternative: μ > 100

Test statistic: t = (x̄ − μ0)/(s/√n)
The t-statistic has a Student t distribution.
The t-statistic is only t distributed if the underlying population for X is normally distributed, or
close to it, which can be checked using a histogram. The t distribution has (n − 1) degrees of freedom.
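A minimal sketch of this test using SciPy (the data values are hypothetical; scipy.stats.ttest_1samp reports a two-sided p-value, which is halved here for the one-sided alternative μ > 100):

```python
import numpy as np
from scipy import stats

x = np.array([104, 98, 112, 101, 95, 107, 110, 99, 103, 108])  # hypothetical sample

# Steps 1-3: H0: mu = 100 vs HA: mu > 100; t-statistic with n-1 df; alpha = 0.05
t_stat, p_two_sided = stats.ttest_1samp(x, popmean=100)
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2

# Steps 4-6: reject H0 when the one-sided p-value is below alpha
print(f"t = {t_stat:.3f}, one-sided p-value = {p_one_sided:.3f}")
print("Reject H0" if p_one_sided < 0.05 else "Do not reject H0")
```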
Lecture 3
What You Need to Know
How to construct the Classical Means test of the difference in population means.
Best or most appropriate test depends on:
1. Type of data: quantitative/interval, ordinal/ranked, or nominal/categorical
2. Distribution of the data: is it normally distributed or not?
3. The type of samples: are they independent or matched pairs?
The classical means test requires that the populations the samples are drawn from are normally
distributed or nearly so.
Check normality by plotting a histogram of the sample data.
Conditions under which this test is valid: quantitative (interval) data, normally distributed,
independent samples.
Research example: the population of people who talk on the phone while driving, and the
population of people who don't talk while driving.
Two populations with two means.
One independent sample was drawn from each population.
The null hypothesis is μ1 − μ2 = D, where D is a chosen value (D for difference).
The three cases for this test depend on knowledge of the σ²'s.
The difference x̄1 − x̄2 is normally distributed if the two populations are normally
distributed, or approximately normal if the samples are large enough.
In terms of testing the difference between population means, there are three cases that
determine the appropriate test statistic:

known variances (this is unlikely), unequal unknown variances, and equal unknown variances.


Case 1: known population variances; if both populations are normally distributed or the samples (n1
and n2) are very large, then we use the z-statistic, which is distributed as a Standard Normal.
Case 2: unequal unknown population variances; if both populations are normally distributed (or
nearly so), then we use the t-statistic, which has a Student t-distribution.
Case 3: equal unknown population variances; if both populations are normally distributed (or nearly
so), we use a t-statistic which is also t-distributed with v = n1 + n2 − 2 degrees of freedom and uses a
pooled estimate of the common population variance. Pooling under Case 3 often leads to more precise
estimates of the variance of the difference of the means, so we prefer Case 3 if it is appropriate.
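A minimal sketch of Cases 2 and 3 using SciPy's ttest_ind (the samples below are hypothetical; the equal_var flag switches between the pooled and Welch versions):

```python
import numpy as np
from scipy import stats

# Hypothetical independent samples from the two populations
x1 = np.array([63.2, 58.4, 61.0, 65.7, 59.9, 62.3, 60.1])
x2 = np.array([57.5, 55.2, 59.8, 56.1, 58.3, 54.9, 57.0])

# Case 3: equal unknown variances -> pooled t-test with v = n1 + n2 - 2
t_pooled, p_pooled = stats.ttest_ind(x1, x2, equal_var=True)

# Case 2: unequal unknown variances -> Welch's t-test
t_welch, p_welch = stats.ttest_ind(x1, x2, equal_var=False)

print(f"pooled: t = {t_pooled:.3f}, p = {p_pooled:.3f}")
print(f"Welch : t = {t_welch:.3f}, p = {p_welch:.3f}")
```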
We can assume σ1² = σ2² if we check that s1 and s2 are close (e.g. from EViews).
The sample variance s² is an unbiased and consistent estimator of the population variance.
We divide by (n − 1) rather than n to ensure the estimator is unbiased. If we repeatedly drew random
samples of size n from a Normal population with variance σ², we would get a distribution for
s² that was positive and skewed to the right.

How to construct a test of equality of population variances.


Testing σ1² = σ2²
To construct this test we use the ratio of our sample estimates of the two population variances
(s1² and s2²).
The ratio of two independent Chi-square random variables, each divided by its respective degrees of
freedom, is distributed as an F distribution with v1 and v2 degrees of freedom.

The Chi-square statistic has (n − 1) degrees of freedom. Because the F-distribution
only takes positive values, we define the critical value so that the area to the left under the curve
is A, using the relationship F1−A,v1,v2 = 1/FA,v2,v1. If we are doing a right-tail test then we just use the critical
value FA,v1,v2.
For a two-tail test, where:
H0: σ1²/σ2² = 1
H1: σ1²/σ2² ≠ 1

To determine the rejection region:


The upper critical value is Fα/2,v1,v2
The lower critical value is F1−α/2,v1,v2 = 1/Fα/2,v2,v1
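SciPy has no single built-in variance-ratio F-test, so a minimal by-hand sketch (with hypothetical data assumed to come from normal populations) is to compute F = s1²/s2² and compare it with critical values from scipy.stats.f:

```python
import numpy as np
from scipy import stats

# Hypothetical independent samples
x1 = np.array([12.1, 14.3, 11.8, 13.5, 15.0, 12.9, 14.1])
x2 = np.array([10.2, 10.9, 11.5, 10.4, 11.1, 10.7, 11.3])

s1_sq, s2_sq = x1.var(ddof=1), x2.var(ddof=1)     # sample variances
v1, v2 = len(x1) - 1, len(x2) - 1                 # degrees of freedom

F = s1_sq / s2_sq                                 # test statistic
alpha = 0.05
upper = stats.f.ppf(1 - alpha / 2, v1, v2)        # F(alpha/2) upper critical value
lower = stats.f.ppf(alpha / 2, v1, v2)            # equals 1 / F(alpha/2, v2, v1)

print(f"F = {F:.3f}; reject H0 if F > {upper:.3f} or F < {lower:.3f}")
```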
NB: the sampling distribution of the sample mean is the result of drawing repeated samples from the population.
Lecture 4
What You Need to Know
How to test for a difference in population locations using sample data from a matched pairs
experiment.

A matched pairs experiment is a feature of the samples taken to conduct tests. We sometimes get
to choose a matched pairs sample instead of independent samples.
By using matched pairs we can remove a source of variation if we are not interested in it.
We may also be able to construct a more powerful test. The source of variation is reduced
because instead of using one sample of people to measure their right hand size, and another to
measure their left hand size, we can measure both hands of one person and construct a test of the
mean of differences.
With matched pairs we test the mean of the population of differences, μD, rather than the
difference in the means of two populations, μ1 − μ2.

The conditions under which this test is valid.


1. The sample of differences must be normally distributed for the matched pairs t-test
to be valid, so check with a histogram.
If the differences are non-normal, we must use non-parametric techniques.
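A minimal sketch of the matched pairs t-test using SciPy's ttest_rel (the before/after values below are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical matched pairs: the same subjects measured under two conditions
before = np.array([7.1, 6.8, 7.4, 6.5, 7.0, 7.3, 6.9, 7.2])
after  = np.array([6.8, 6.9, 7.0, 6.3, 6.7, 7.1, 6.6, 7.0])

diffs = before - after                           # check these with a histogram for normality
t_stat, p_val = stats.ttest_rel(before, after)   # tests whether the mean of differences is zero
print(f"t = {t_stat:.3f}, two-sided p = {p_val:.3f}")
```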
The benefit of matched pairs samples over independent samples.
The good thing about the matched pairs experiment is that there is less variability, because
person-to-person variation (e.g. in how long it takes to eat) is removed instead of adding to the
noise. Less variable outcomes often let us extract the underlying differences more precisely.
How to construct a test for the difference between population proportions with independent
samples.
Proportions involve categorical (qualitative, nominal) data. E.g. are students or retired people
(two different categories) more likely to watch Oprah.
With population proportions we test the
Null hypothesis: H0: p1-p2=D
Alternative hypothesis:
-> HA: p1-p2 ≠ D
-> HA: p1-p2>D
-> HA: p1-p2<D

We need to determine the appropriate test statistic. There are two cases.
Case 1: D=0
Case 2: D ≠ 0
Z is distributed as a Standard Normal, so long as the sample sizes are large enough:

n1p1 ≥ 5, n1(1 − p1) ≥ 5, n2p2 ≥ 5, n2(1 − p2) ≥ 5 (all four np and nq greater than or equal to 5).

The null hypothesis of a difference in the proportions (Case 2: D ≠ 0, the difference does not
equal zero) tells us that the variances must also be different.

Thus we cannot construct a test statistic using a pooled variance assumption like in Case 1
where our null hypothesis is that the proportions are equal (D=0).
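A minimal sketch of the Case 1 (D = 0) test with a pooled proportion, using hypothetical counts and SciPy only for the Normal CDF:

```python
import math
from scipy import stats

# Hypothetical counts: x "successes" out of n trials in each independent sample
x1, n1 = 120, 300
x2, n2 = 90, 280

p1_hat, p2_hat = x1 / n1, x2 / n2

# Case 1 (D = 0): use the pooled proportion for the standard error
p_pool = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se

p_val = 2 * (1 - stats.norm.cdf(abs(z)))       # two-sided p-value
print(f"z = {z:.3f}, two-sided p-value = {p_val:.3f}")
```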
Lecture 5
What You Need to Know
How to construct the Wilcoxon Rank Sum test.
1. We rank the observations in both samples from smallest to largest.
2. The null hypothesis is that the population locations are the same.
3. We take independent samples of size n from each population.
E.g n=3
Sample from POP A: 3, 11, 13
Sample from POP B: 5, 10, 8
4. We rank all six observations from smallest to largest (we average the ranks if observation
values are equal), then construct the sum of ranks for each sample.
5. We will always choose T=TA (Rank sum of the sample from first population)
If the null hypothesis is true and the population locations are the same, then every
possible ranking of the data is equally likely (random samples).
E.g. if there are 20 possibilities in total (combinations, e.g. ranks 1, 2, 4 going to the sample from
the first population), then the probability of each ranking occurring is one out of twenty, or 5%.
This test is for both ranked (ordinal) data and highly non-normal
quantitative data - from independent samples only.
The test is valid when using independent samples.
Exact critical values provided for sample sizes 10 or less.
If either sample is larger than 10, use standard normal approximation.
The critical value tables telling us TL and TU only cover sample sizes n up to 10.
For larger sample sizes (n > 10) we use an approximation for the sampling distribution of the T
statistic: the z statistic is distributed as a Standard Normal.
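A minimal sketch using scipy.stats.ranksums, which applies the standard normal approximation to the rank sum (so for samples as small as the example above, the exact tables should be preferred):

```python
from scipy import stats

# The small samples from the example above (POP A and POP B)
a = [3, 11, 13]
b = [5, 10, 8]

# ranksums uses the z approximation to the rank sum of the first sample
z_stat, p_val = stats.ranksums(a, b)
print(f"z = {z_stat:.3f}, two-sided p = {p_val:.3f}")
```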
Wilcoxon Rank Sum Test: if we reject the null, it may not just mean a different population
location; it may mean a different shape or spread for the population distributions.
We must check that the shapes and spreads of the populations examined are identical (using
histograms of the data).
Non-parametric techniques are distribution-free. They do not rely on any assumed distribution
for the data such as the normality requirement, and do not involve testing a particular parameter
of the distribution such as the mean.

Non-parametric techniques are used when the data is non-normal, as they are robust to (strong
against) non-normality.
Ranked or ordinal data is not normally distributed as it is not quantitative and the intervals for the
ranks are arbitrary.
In EViews the test is under medians
EViews will always give the positive value of the statistic even if it should be negative; be careful
when reporting.
The EViews probability value is for a two-tailed test.
Rank sum = mean × count (the sample size, or n)
Wilcoxon Rank Sum -> test for equality of medians between series
Lecture Six: Matched Pairs of Ranked (Ordinal) and Non-normal Data
How to construct and interpret the Sign Test and the Wilcoxon Signed Rank Sum Test.
Sign Test- for ranked (ordinal) data
Wilcoxon Signed Rank Sum Test- for non-normal quantitative data

Sign Test
Requirements:
- matched pairs sample
- It is valid for ranked (ordinal) and quantitative data, but we have more powerful tests for
quantitative data
- It is a test of population location like the Wilcoxon Rank Sum Test, so we need the shape
and spread of the data to be equal in the two populations (check using histograms)
1. Calculate the difference for each matched pair between the rating of the new cola (sample 1)
and the competitor's cola (sample 2)
2. Count the number of positive differences, and the number of negative differences.
3. Eliminate (disregard) the zero differences, so the sample size n is the number of non-zero differences. E.g. if there are 14 positives, 5 negatives and 6 zero differences, then n or the
sample size equals (14 + 5) = 19
4. We use the number of positives as our test statistic x.
Under the null hypothesis, x is distributed as a Binomial with probability of success equal
to a half (p = 0.5: an equal chance of a positive or a negative difference).
So the expected number of positive differences x in n draws is 0.5 × n, if H0 is true.
For n ≥ 10 the Binomial approximates a Normal distribution.
The z statistic is distributed approximately as a Standard Normal.
If our level of significance is 0.05 and our p-value is just above 0.05 (say 0.056), then we
have insufficient evidence to reject the null.
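A minimal sketch of the sign test using the counts from the example above; scipy.stats.binomtest (SciPy 1.7+) gives the exact binomial p-value, and the normal approximation z is shown for comparison:

```python
import math
from scipy import stats

# From the example above: 14 positive differences, 5 negative, 6 zeros dropped
positives, negatives = 14, 5
n = positives + negatives                      # n = 19 non-zero differences

# Under H0 the number of positives is Binomial(n, 0.5); exact two-sided test
result = stats.binomtest(positives, n, p=0.5)

# Normal approximation used for n >= 10: z = (x - 0.5n) / (0.5 * sqrt(n))
z = (positives - 0.5 * n) / (0.5 * math.sqrt(n))

print(f"x = {positives}, exact two-sided p = {result.pvalue:.3f}, z = {z:.2f}")
```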

EViews' constructed test statistic for the sign test is very different from the way we do it by hand.
However, we use the positive and negative counts at the bottom of the output, e.g. (obs > 0.00000)
and (obs < 0.00000), to construct our test.

The conditions under which these tests are appropriate.
Wilcoxon Signed Rank Sum Test
Is for matched pairs samples of quantitative data where the matched pair differences are
highly non-Normal.
A good thing is that it uses the rankings of the differences in matched pairs, plus the sign of the
difference; we lose less information this way when constructing the test.
1. Calculate the differences for each matched pair, and eliminate zero differences.
2. Rank the absolute values of these differences, where n equals the number of non-zero
differences. (As with the earlier ranking, if there are equal absolute differences then use the
average of the ranks.)
3. Sum the ranks of positive differences and ranks of negative differences.
4. We use the rank sum of positive differences as our test statistic T=T+.
The null is that the population locations are the same; we use the Wilcoxon Signed Rank
Sum test statistic T.
For data that is roughly normal-shaped in the histogram, it is better to
use parametric techniques unless the histogram shows clear non-normality, because we
lose information by ranking the data and not considering the original values.
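A minimal sketch using scipy.stats.wilcoxon on hypothetical matched samples (note that SciPy reports the smaller of the positive and negative rank sums as its statistic, rather than always T+):

```python
import numpy as np
from scipy import stats

# Hypothetical matched-pair measurements
sample1 = np.array([8.2, 5.1, 9.4, 3.0, 7.7, 6.5, 4.8, 12.1])
sample2 = np.array([7.9, 5.6, 6.2, 3.4, 7.1, 5.8, 5.7, 8.3])

# wilcoxon ranks the absolute non-zero differences and sums the signed ranks
stat, p_val = stats.wilcoxon(sample1, sample2)
print(f"statistic = {stat:.1f}, two-sided p = {p_val:.3f}")
```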

How to choose the correct testing procedure when analysing the location of two
populations.

Wilcoxon Rank Sum
- Test statistic: T, the rank sum of the sample drawn from the first population
- n: independent samples of size n
- Ranking: rank all observations from both samples, then add them up to construct the rank sum

Sign Test
- Test statistic: x, the number of positive differences
- n: the number of non-zero differences
- Ranking: none (only the signs of the differences are used)

Wilcoxon Signed Rank Sum
- Test statistic: T = T+, the rank sum of the positive differences
- n: the number of non-zero differences
- Ranking: rank the absolute value of the differences

Lecture 7
How to construct the ANOVA test for any difference in means between two or more
populations.
General test intuition: compare differences between samples with differences within samples.
If the variation between samples is big relative to the variation of responses within samples, then we
suspect the means of the populations are not the same. That is, if the within variation is small relative
to the between variation, we are more confident in saying the means are different.
Null and alternative hypothesis for ANOVA is always:
H0: μ1 = μ2 = ... = μk (all population means are equal)
HA: At least two population means differ
Test statistic: (4 steps)
1. Calculate the Sum of Squares for Treatments (SST): this is the variation between samples. It
will equal zero if all the sample means are the same.
2. Calculate the Sum of Squares for Error (SSE): this is the overall variation within samples. It is
the combined or pooled variation of the k samples, where k is the number of populations or
treatments.
3. Calculate the mean square for treatments (MST) and mean square for errors (MSE).
4. Calculate the test statistic F =MST/MSE
It is the ratio of the between sample variance to the within sample variance.
The test statistic F is F-distributed with (k-1) and (n-k) degrees of freedom.

We reject the null if the test statistic F > Fα,k−1,n−k
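A minimal sketch of the single-factor ANOVA using scipy.stats.f_oneway on hypothetical samples from k = 3 treatments:

```python
from scipy import stats

# Hypothetical samples from k = 3 treatments
t1 = [20.1, 22.4, 19.8, 21.5, 20.9]
t2 = [23.0, 24.1, 22.8, 25.2, 23.7]
t3 = [21.2, 20.5, 22.0, 21.8, 20.9]

# f_oneway computes F = MST/MSE with (k-1, n-k) degrees of freedom
F, p_val = stats.f_oneway(t1, t2, t3)
print(f"F = {F:.3f}, p-value = {p_val:.4f}")
```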


For testing two or more population locations:
1.Single factor (one way) ANOVA for independent samples
2. Kruskal-Wallis Test- non-parametric
3. other tests
Requirements of single factor one way version:
1. normally distributed random variables
2. equal population variances
3. samples drawn independently from each population.
Why it can be useful instead of doing multiple t tests of all pairs.
The ANOVA test tells us if any of the population means differ, not which ones.
We could instead do all individual pairwise tests of population mean equality using the
classical means t-test,
i.e. test μ1 = μ2, then test μ1 = μ3, then μ2 = μ3.
If there are three treatments, the probability that we wrongly reject the null at least once when doing
the 3 separate t-tests is 1 − (0.95)³ ≈ 0.14 = 14%. With four treatments (six pairwise tests) this goes up to about 26%.
