Week 1 To 3 Lectures Q A
SHAFIQ UR REHMAN
LECTURER (STATISTICS)
DEPARTMENT OF MATHEMATICS & STATISTICS
UNIVERSITY OF CENTRAL PUNJAB, LAHORE.
INTRO. TO TESTING OF HYPOTHESIS
Hypothesis testing is a statistical method used to evaluate the validity of a hypothesis, which is a statement or a claim about a population
parameter or a relationship between variables. The process of hypothesis testing involves comparing the results of a sample to the expected
results, assuming the null hypothesis is true, in order to determine whether or not there is enough evidence to reject the null hypothesis in favor of
an alternative hypothesis.
Some Pros and Cons (Prospects and Contradictions):
Pros:
1. Objectivity: Hypothesis testing provides a systematic and objective way of analyzing data, which helps to minimize personal bias and
subjectivity.
2. Scientific validity: Hypothesis testing allows researchers to determine the statistical significance of their findings and to draw conclusions
based on empirical evidence, rather than intuition or speculation.
3. Replicability: Hypothesis testing provides a framework for designing studies that can be replicated by other researchers, which helps to
establish the validity and reliability of the findings.
Cons:
1. Simplification: Hypothesis testing often requires making assumptions and simplifications about complex phenomena, which can
oversimplify reality and lead to misleading conclusions.
2. Limitations: Hypothesis testing has limitations in terms of the types of questions it can answer and the extent to which it can capture the
complexity of real-world phenomena.
3. Misinterpretation: Hypothesis testing can be misinterpreted or misused, particularly when results are overgeneralized or when statistical
significance is equated with practical significance.
WHAT IS HYPOTHESIS?
A Hypothesis is a proposed explanation or prediction for a phenomenon that can be tested through scientific investigation. It is an educated guess
or a tentative statement that is based on existing knowledge or observations and is used to guide further research.
A Statistical Hypothesis is a formal statement about a population parameter, such as a mean, a proportion, or a variance (or about the
difference between two such parameters). It is tested using sample data and statistical methods.
TYPES
There are two types of statistical hypotheses: the null hypothesis and the alternative hypothesis.
1. Null Hypothesis (H0): The null hypothesis states that there is no difference or no effect, for example that a population parameter equals a
specified value or that two population parameters are equal.
2. Alternative Hypothesis (H1 or Ha): The alternative hypothesis, on the other hand, states that there is a difference or relationship, that is,
that the parameter differs from what the null hypothesis specifies.
The process of hypothesis testing involves formulating a null and alternative hypothesis, collecting data, and then using statistical methods to
determine whether the observed data supports the null hypothesis or provides evidence in favor of the alternative hypothesis. The results of
hypothesis testing can provide insight into the validity of a theory or explanation for a phenomenon, and can guide further research and
investigation.
TESTING OF HYPOTHESIS
Hypothesis testing is a statistical technique that is used to make decisions about a population based on a sample. It involves testing a hypothesis or
claim about a population parameter using sample data. The steps are as follows:
1. Formulating a null hypothesis (H0) and an alternative hypothesis (Ha): The null hypothesis is the assumption that there is no significant
difference between the population parameter and the value specified in the hypothesis. The alternative hypothesis is the opposite of the null
hypothesis and represents the alternative claim that we are trying to support with the data.
2. Setting a significance level (α): The significance level is the maximum probability of making a Type I error that we are willing to accept. A
Type I error is the rejection of the null hypothesis when it is actually true. The most commonly used significance level is 0.05, which
corresponds to a 5% chance of making a Type I error.
3. Determining the appropriate test statistic: The test statistic is a quantity that measures the deviation of the sample data from the null
hypothesis. The choice of the test statistic depends on the type of hypothesis being tested, and the sample size.
4. Collecting the data and calculating the test statistic: The test statistic is calculated using the sample data and the appropriate formula.
5. Finding the Critical Value (Table value) or p-value: The critical value is the cut-off against which the test statistic is compared to decide
whether to reject or fail to reject the null hypothesis. The p-value is the probability of obtaining a test statistic as extreme or more extreme
than the one observed, assuming that the null hypothesis is true.
6. Drawing conclusions: Based on the results of the hypothesis test, we can draw conclusions about the population parameter and the validity
of the original claim.
Hypothesis testing is an important tool in statistics and is used in many fields to make informed decisions based on the data.
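Steps 5 and 6 above can be sketched in a few lines of Python. This is only an illustration: it assumes a two-tailed z-test, the function name `z_test_decision` is hypothetical, and the value z = 2.3 is a made-up input rather than a result from these lectures.

```python
from statistics import NormalDist


def z_test_decision(z, alpha=0.05):
    """Two-tailed decision for an already computed z statistic."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # Step 5: critical value that leaves alpha/2 in each tail
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Step 5: p-value, the probability of a statistic at least this extreme under H0
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Step 6: reject H0 when the statistic falls in the rejection region
    reject = abs(z) > critical
    return critical, p_value, reject


# Hypothetical test statistic, used only to exercise the function
critical, p_value, reject = z_test_decision(2.3)
print(round(critical, 2), round(p_value, 4), reject)
```

Comparing |z| with the critical value and comparing the p-value with α always lead to the same decision; the two are alternative ways of reporting step 5.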
TWO TYPES OF ERRORS IN HYPOTHESIS TESTING
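It is worthwhile to note that when a hypothesis is tested, there are four possibilities:
1. The hypothesis is true but our test rejects it.
2. The hypothesis is false but our test accepts it.
3. The hypothesis is true and our test accepts it.
4. The hypothesis is false and our test rejects it.
Of these four possibilities, the first two lead to an erroneous decision. The first possibility leads to a Type I error and the second possibility leads
to a Type II error. This can be shown as follows:
Decision        H0 is true             H0 is false
Reject H0       Type I error (𝛼)       Correct decision
Accept H0       Correct decision       Type II error (β)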
In any hypothesis testing, there is a risk of committing Type I and Type II errors. In case we are interested in reducing the risk of committing a
Type I error, we should reduce the size of the rejection region or level of significance, indicated in Table by 𝛼. When 𝛼 = 0.10, it means that a true
hypothesis will be accepted in 90 out of every 100 occasions. Thus, there is a risk of rejecting a true hypothesis in 10 out of every 100 occasions.
To reduce this risk, we may choose 𝛼 = 0.01, which implies that we are prepared to take 1 per cent risk. That is, the probability of rejecting a true
hypothesis is merely 1 per cent instead of 10 per cent as in the previous case.
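To see how a smaller 𝛼 shrinks the rejection region, the sketch below prints the critical values for three significance levels. It assumes a two-tailed z-test (the paragraph above does not fix the test), and the helper name `two_tailed_critical` is hypothetical.

```python
from statistics import NormalDist


def two_tailed_critical(alpha):
    """Critical z value that leaves alpha/2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# A smaller alpha pushes the critical value outward, so the rejection region shrinks
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(two_tailed_critical(alpha), 3))
# 0.1 1.645
# 0.05 1.96
# 0.01 2.576
```

The price of a smaller 𝛼 is that, for a fixed sample size, a false null hypothesis becomes harder to reject, so the risk of a Type II error rises.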
ASSUMPTIONS (Z-TEST or t-TEST)
The following conditions and assumptions help to decide whether to use a z-test or a t-test for hypothesis testing.
For Z-Test:
1. The population standard deviation is known, and the sample size is large (typically greater than 30).
2. The population standard deviation is unknown, and the sample size is large (typically greater than 30).
   i. Random sampling: The sample should be selected randomly from the population.
   ii. Independence: The observations in the sample should be independent of each other.
   iii. Normality: The population from which the sample is taken should be normally distributed.
   iv. Population standard deviation: The population standard deviation is unknown and the sample standard deviation is used as its estimate.
3. The population is non-normal, the population standard deviation is unknown, and the sample size is large (typically greater than 30).
For t-Test:
On the other hand, when the population standard deviation is unknown and the sample size is small (typically less than 30), a t-test is used.
i. Random sampling: The sample should be selected randomly from the population.
ii. Independence: The observations in the sample should be independent of each other.
iii. Normality: The population from which the sample is taken should be normally distributed.
iv. Population standard deviation: The population standard deviation is not known.
v. Sample size: The sample size is small (typically less than 30).
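The rule of thumb above can be written as a small decision helper. This is a sketch only: the function name `choose_test` is hypothetical, a sample of exactly 30 is treated as small by assumption, and the case of a known standard deviation with a small sample is not covered by these slides, so the code assumes a normal population and returns a z-test there.

```python
def choose_test(sigma_known, n, large_n=30):
    """Return 'z' or 't' following the rule of thumb in the lecture."""
    if n > large_n:
        # Large sample: z-test whether or not sigma is known
        return "z"
    if sigma_known:
        # Assumption (not stated in the slides): normal population with known sigma
        return "z"
    # Small sample with unknown sigma: t-test
    return "t"


print(choose_test(sigma_known=False, n=100))  # z
print(choose_test(sigma_known=False, n=20))   # t
```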
TESTING OF HYPOTHESIS: DIFFERENCE BETWEEN TWO
POPULATION MEANS AND PROPORTIONS
Z-test (For Unequal Variances):
The formulas for hypothesis testing of the difference between two population means and proportions are
1. Hypothesis testing of the difference between two population means:
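Assuming that the population standard deviations are known, the test statistic is given by:
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
or, when the two populations share a common standard deviation $\sigma$,
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $\sigma_1$ and $\sigma_2$ are the population standard deviations, $\sigma$ is the pooled standard deviation of the populations,
and $n_1$ and $n_2$ are the sample sizes.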
2. Hypothesis testing of the difference between two population proportions:
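Assuming that the sample sizes are large enough (typically greater than 30) and the populations are independent, the test statistic is given by:
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}}$$
or, using the pooled variance,
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions, $P_1 Q_1$ and $P_2 Q_2$ are the population variances (with $Q = 1 - P$), $PQ$ or its estimate $\hat{p}\hat{q}$ is the pooled
variance of the populations, and $n_1$ and $n_2$ are the sample sizes.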
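As an illustration of the pooled two-proportion statistic, here is a minimal Python sketch. It assumes the null hypothesis $P_1 = P_2$, so the term $(P_1 - P_2)$ is zero; the function name `two_proportion_z` and the counts passed to it are hypothetical and do not come from these lectures.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: P1 = P2, given success counts x1, x2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled proportion: all successes over all observations
    p_pool = (x1 + x2) / (n1 + n2)
    q_pool = 1 - p_pool
    std_error = math.sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_error


# Hypothetical counts: 30 of 200 versus 18 of 200
z = two_proportion_z(30, 200, 18, 200)
print(round(z, 2))
```

For a two-tailed test at α = 0.05 the resulting value would be compared with the critical value 1.96.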
Example 1:
A potential buyer wants to decide which of two brands of electric bulbs he should buy, as he has to buy them in bulk. As a specimen, he buys 100
bulbs of each of the two brands A and B. On using these bulbs, he finds that brand A has a mean life of 1,200 hours with a standard deviation of
50 hours and brand B has a mean life of 1,150 hours with a standard deviation of 40 hours. Do the two brands differ significantly in quality? Use
α = 0.05.
Solution
Let us set up the null hypothesis that the two brands do not differ significantly in quality:
H0 : 𝜇1 = 𝜇2 and an alternative hypothesis:
H1 :𝜇1 ≠ 𝜇2
where 𝜇1 = mean life of brand A bulbs, and 𝜇2 = mean life of brand B bulbs.
We now construct the Z statistic.
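The Z statistic for Example 1 can be checked numerically. The sketch below is one way to do it, assuming that the sample standard deviations stand in for the population values because both samples are large (n = 100); the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Figures from Example 1
mean_a, sd_a, n_a = 1200, 50, 100  # brand A
mean_b, sd_b, n_b = 1150, 40, 100  # brand B
alpha = 0.05

# Standard error of the difference between the two sample means
std_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)  # sqrt(25 + 16)
# Under H0 the hypothesised difference (mu1 - mu2) is zero
z = (mean_a - mean_b) / std_error
critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
reject = abs(z) > critical

print(round(z, 2), round(critical, 2), reject)  # 7.81 1.96 True
```

Since 7.81 lies far beyond 1.96, the null hypothesis is rejected: on these figures the two brands differ significantly in mean life.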
Solution
Let us set up the null hypothesis that the proportions of defective components from the two locations do not differ significantly:
H0 : 𝑃1 = 𝑃2 and an alternative hypothesis:
H1 : 𝑃1 ≠ 𝑃2
where 𝑃1 and 𝑃2 are the proportions of defective components from Pune and Bangalore, respectively. This is a two-tail test. The level of
significance is α = 0.05, and both sample sizes are large.
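For t-test (Equal Variance):
1. Hypothesis testing of the difference between two population means:
Assuming that the population standard deviations are unknown but equal, the test statistic is given by:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad \text{with df } v = n_1 + n_2 - 2$$
where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $S_1^2$ and $S_2^2$ are the sample variances, $S_p$ is the pooled standard deviation of the samples, and $n_1$ and $n_2$ are
the sample sizes.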
2. Hypothesis testing of the difference between two population proportions:
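Assuming that the sample sizes are large enough (typically greater than 30) and the populations are independent, the test statistic is given by:
$$t = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \text{with df } v = n_1 + n_2 - 2$$
where $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions, $\hat{p}_1\hat{q}_1$ and $\hat{p}_2\hat{q}_2$ are the sample variances, $\hat{p}\hat{q}$ is the pooled variance of the samples, with
$\hat{p} = (n_1\hat{p}_1 + n_2\hat{p}_2)/(n_1 + n_2)$ and $\hat{q} = 1 - \hat{p}$, and $n_1$ and $n_2$ are the sample sizes.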
For t-test (Unequal Variance):
The formulas for hypothesis testing of the difference between two population means and proportions are
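1. Hypothesis testing of the difference between two population means:
Assuming that the population standard deviations are unknown and unequal, the test statistic is given by:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}, \quad \text{with df } v$$
where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $S_1^2$ and $S_2^2$ are the sample variances, and $n_1$ and $n_2$ are the sample sizes.
2. Hypothesis testing of the difference between two population proportions:
$$t = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}}, \quad \text{with df } v$$
where $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions, $\hat{p}_1\hat{q}_1$ and $\hat{p}_2\hat{q}_2$ are the sample variances, and $n_1$ and $n_2$ are the sample sizes.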
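The slides do not say how the degrees of freedom v are obtained in the unequal-variance case. One common choice, shown here purely as an assumption and not as the lecturer's method, is the Welch-Satterthwaite approximation. The function name `welch_df` is hypothetical, and the inputs reuse the bulb figures from Example 1 only as sample numbers.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    a = s1**2 / n1  # variance of the first sample mean
    b = s2**2 / n2  # variance of the second sample mean
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


# Standard deviations 50 and 40 with 100 observations each
v = welch_df(50, 100, 40, 100)
print(round(v, 1))
```

The result is generally not an integer; when a printed t-table is used it is usually rounded down to the nearest whole number.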
Example: Two salesmen, A and B, are employed by a company. Recently, it conducted a sample survey yielding the
following data:
                                  Salesman A    Salesman B
Number of sales                   20            22
Average weekly sales (Rs lakh)    30            25
Standard deviation (Rs lakh)      10            7
Is there any significant difference between the average sales of the two salesmen?
Solution
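Let us set up the null hypothesis that the average sales of the two salesmen do not differ significantly:
H0 : 𝜇1 = 𝜇2 and an alternative hypothesis:
H1 : 𝜇1 ≠ 𝜇2
The pooled standard deviation is
$$S_p = \sqrt{\frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{20 \times 10^2 + 22 \times 7^2}{20 + 22 - 2}} = \sqrt{76.95} = 8.77$$
and the test statistic is
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{30 - 25}{S_p\sqrt{\frac{1}{20} + \frac{1}{22}}} \approx 1.84$$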
At 𝛼 = 0.05 level of significance, the critical value of t for (20 + 22 - 2) that is, 40 degrees of freedom for a two-tail test is 2.021. As our
calculated t-value, being 1.84, is less than the critical value, it falls in the acceptance region. Thus, the null hypothesis is accepted. In other
words, the average sales by the two salesmen are not significantly different.
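The arithmetic of this example can be verified with a short script. It is a sketch that follows the pooled formula used in this solution (with $n S^2$ in the numerator) and takes the critical value 2.021 from the text rather than computing it; the variable names are illustrative.

```python
import math

# Figures from the salesmen example
n_a, mean_a, sd_a = 20, 30, 10  # Salesman A
n_b, mean_b, sd_b = 22, 25, 7   # Salesman B

# Pooled standard deviation, as in the solution above
sp = math.sqrt((n_a * sd_a**2 + n_b * sd_b**2) / (n_a + n_b - 2))
# t statistic under H0: mu1 = mu2
t = (mean_a - mean_b) / (sp * math.sqrt(1 / n_a + 1 / n_b))

t_critical = 2.021  # table value quoted in the text for 40 df, two-tailed, alpha = 0.05
reject = abs(t) > t_critical

print(round(sp, 2), round(t, 2), reject)  # 8.77 1.84 False
```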
Practice Questions