The document discusses steps in hypothesis testing including formulating hypotheses, selecting tests, calculating test statistics, determining critical values, and drawing conclusions. It also discusses parametric vs non-parametric tests and provides examples of t-tests including one-sample, two-sample, and worked examples showing calculations and decisions.
BBA 4 RM Unit 5b
UNIT-5
PART-II
STEPS IN HYPOTHESIS TESTING
Steps for Hypothesis Testing
• Formulate H0 and H1
• Select an appropriate test
• Choose the level of significance, α
• Calculate the test statistic, TS_CAL
• Either determine the probability (p-value) associated with the test statistic and compare it with the level of significance α, or determine the critical value of the test statistic, TS_CR, and check whether TS_CAL falls into the rejection (or non-rejection) region
• Reject or do not reject H0
• Draw the research conclusion
PARAMETRIC AND NON-PARAMETRIC TESTS
Parametric Tests
• Parametric tests are statistical tests that assume, among other things, that the data approximately follow a normal distribution.
• Examples include the z-test, the t-test and ANOVA.
• Important note: the assumption is that the whole population follows a normal distribution, not necessarily the sample data.
Assumptions in Parametric Tests
Parametric tests have a few assumptions that the data need to meet:
• Normality: the sample data come from a population that approximately follows a normal distribution.
• Homogeneity of variance: when groups are compared, the samples come from populations with the same variance.
• Independence: the sample data consist of independent observations that are sampled randomly.
• Outliers: the sample data do not contain any extreme outliers.
Non-Parametric Tests
• Non-parametric tests are statistical tests that do not assume a particular distribution (such as the normal) for the data.
• Hence they are also known as distribution-free tests.
• Examples include the chi-square test, the Mann-Whitney U test, etc.
• Many non-parametric tests are based on the ranks held by the different data points rather than on their raw values.
• Every statistical test has a test statistic that helps us decide whether or not to reject the null hypothesis.
• In the case of the t-test, the test statistic is known as the t-statistic.
• In the case of the z-test, the test statistic is known as the z-statistic, and so on.
T-TESTS
T-tests (Student's t-test)
• A t-test (also known as Student's t-test) is a tool for evaluating the means of one or two populations using hypothesis testing.
• A t-test may be used to evaluate whether a single group differs from a known value (a one-sample t-test), whether two groups differ from each other (an independent two-sample t-test), or whether there is a significant difference in paired measurements (a paired, or dependent samples, t-test).
How do we choose between a z-test and a t-test?
• By looking at the sample size and at whether the population variance is known.
• If the population variance is known and the sample size is large (30 or more): choose a z-test.
• If the population variance is known and the sample size is small (less than 30): either a z-test or a t-test can be performed.
• If the population variance is not known and the sample size is small: choose a t-test.
• If the population variance is not known and the sample size is large: choose a t-test.
One-sample T-test
• The one-sample t-test is a statistical hypothesis test used to determine whether an unknown population mean is different from a specific value.
• For the one-sample t-test, we need one variable.
• We also have an idea, or hypothesis, that the population mean has some specific value.
Examples:
• A hospital has a random sample of cholesterol measurements for men who were not taking any medication for high cholesterol. The hospital wants to know whether the unknown mean cholesterol for patients is different from a goal level of 200 mg.
• We measure the grams of protein for a sample of energy bars. The label claims that the bars contain 20 grams of protein. We want to know whether the labels are correct.
• The average height of women in India was recorded to be 158.5 cm. Is the average height of women in India today greater than 158.5 cm?
Energy Bar - Grams of Protein (n = 31)
20.70 27.46 22.15 19.85 21.29 24.75
20.75 22.91 25.34 20.33 21.54 21.08
22.14 19.56 21.10 18.04 24.12 19.95
19.72 18.28 16.26 17.46 20.53 22.12
25.06 22.44 19.08 19.88 21.39 22.33 25.79
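The slides do not carry out the energy bar test itself. As a minimal illustrative sketch, assuming a two-tailed test of H0: μ = 20 g against H1: μ ≠ 20 g (the question is whether the labels are correct or not), the one-sample t-statistic for the table above can be computed with Python's standard `statistics` module. The helper name `one_sample_t` is hypothetical, and the critical value 2.042 (α = 0.05, two-tailed, df = 30) is copied from a t-table rather than computed.

```python
import math
import statistics

# Grams of protein for the 31 sampled energy bars (table above)
protein = [
    20.70, 27.46, 22.15, 19.85, 21.29, 24.75,
    20.75, 22.91, 25.34, 20.33, 21.54, 21.08,
    22.14, 19.56, 21.10, 18.04, 24.12, 19.95,
    19.72, 18.28, 16.26, 17.46, 20.53, 22.12,
    25.06, 22.44, 19.08, 19.88, 21.39, 22.33, 25.79,
]


def one_sample_t(data, mu0):
    """Hypothetical helper: return (mean, sample sd, t) for H0: population mean == mu0."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return mean, sd, t


mean, sd, t = one_sample_t(protein, mu0=20.0)
df = len(protein) - 1  # 31 - 1 = 30
t_crit = 2.042         # assumed: t-table value, alpha = 0.05, two-tailed, df = 30

print(f"mean={mean:.2f} sd={sd:.2f} t={t:.2f} df={df}")  # prints: mean=21.40 sd=2.54 t=3.07 df=30
print("Reject H0" if abs(t) > t_crit else "Do not reject H0")  # prints: Reject H0
```

Under these assumptions |t| ≈ 3.07 exceeds 2.042, so the sketch would reject the claim that the population mean is exactly 20 g; a one-tailed alternative would need a different critical value.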
• Some bars have less than 20 grams of protein; other bars have more. Some might think that the data support the idea that the labels are correct; others might disagree. A statistical test provides a sound method for making the decision, so that everyone reaches the same decision on the same set of data values.
Checking the data
Is the t-test an appropriate method to test whether the energy bars contain 20 grams of protein? The list below checks the requirements for the test.
• The data values are independent. The grams of protein in one energy bar do not depend on the grams in any other energy bar. An example of dependent values would be bars collected from a single production lot: a sample from a single lot is representative of that lot, not of energy bars in general.
• The data values are grams of protein, so the measurements are continuous.
• We assume that the population from which we are collecting our sample is normally distributed.
• We therefore decide that the t-test is an appropriate method.
Value of the Test Statistic
• $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, where $\bar{x}$ is the sample mean, $\mu_0$ the hypothesized population mean, $s$ the sample standard deviation and $n$ the sample size.
Worked Example
• The average height of women in India was recorded to be 158.5 cm. To verify this claim, 25 Indian women were studied. Their mean height was 162 cm and the sample standard deviation was 2.4 cm. Is the average height of women in India today greater than 158.5 cm?
Formulate Hypothesis
• H0: μ = 158.5 cm
• H1: μ > 158.5 cm
Supporting Data/Test Data
• The significance level is 0.05.
• The sample mean is 162 cm and the sample standard deviation is 2.4 cm.
• Since the sample size is 25, the degrees of freedom are 24 (25 − 1).
Decide the type of test
• Since we are comparing a single sample mean with a single population mean (a standard value), the sample size is 25 (< 30) and the population variance is unknown, this will be a one-sample t-test.
• Since the hypothesis has a direction (the population mean height is claimed to be greater than 158.5 cm), this will be a one-tailed test; because H1 uses ">", it is right-tailed.
Calculate the test statistic
• So the t-statistic in our case will be: $t = \frac{162 - 158.5}{2.4/\sqrt{25}} = \frac{3.5}{0.48} \approx 7.29$
Compare with Critical Value & Make Decision
• Next, we look up the critical value of the t-distribution for α = 0.05 (one-tailed) and 24 degrees of freedom in a table of t-statistic values.
• The critical value for our scenario is 1.711.
• Our t-statistic (7.29) is greater than the critical value, so we reject the null hypothesis and conclude that the mean height of women in India is greater than 158.5 cm.
Looking up a T-table
• https://www.ttable.org/
Two-Sample T Test
• The two-sample t-test (also known as the independent samples t-test) is a method used to test whether the unknown population means of two groups are equal or not.
• For the two-sample t-test, we need two variables: one variable defines the two groups, and the second variable is the measurement of interest.
• We also have an idea, or hypothesis, that the means of the underlying populations for the two groups are different.
• The two-sample t-test is used when two small samples (n < 30) are taken from two different populations and compared.
Example-1
• We have students who speak English as their first language and students who do not. All students take a reading test.
• Our two groups are the native English speakers and the non-native speakers.
• Our measurements are the test scores.
• Our idea is that the mean test scores for the underlying populations of native and non-native English speakers are not the same.
• We want to know whether the mean score for the population of native English speakers is different from that of people who learned English as a second language.
Example-2
• We measure the grams of protein in two different brands of energy bars.
• Our two groups are the two brands. Our measurement is the grams of protein in each energy bar.
• Our idea is that the mean grams of protein for the underlying populations of the two brands may be different.
• We want to know whether the mean grams of protein for the two brands of energy bars are different.
Two-sample t-test assumptions
• Data values must be independent: measurements for one observation do not affect measurements for any other observation.
• Data in each group must be obtained via a random sample from the population.
• Data in each group are normally distributed.
• Data values are continuous.
• The variances of the two independent groups are equal.
Hypothesis
• Null hypothesis H0: µ1 = µ2
• Alternative hypothesis H1:
• µ1 ≠ µ2 (two-tailed test)
• µ1 < µ2 (left-tailed test)
• µ1 > µ2 (right-tailed test)
The Value of the T Statistic
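• The denominator of the t-statistic is the standard error of the difference between the two means, which is built from the pooled variance $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.
• If the sample sizes of the two groups are different, the formula is: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
• If the sample sizes of the two groups are equal ($n_1 = n_2 = n$), the t-statistic formula is: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2 + s_2^2}{n}}}$
Example-1
• 50 women were asked about their age and height: 25 women are between 27 and 30 years of age (group A) and 25 women are between 37 and 40 years of age (group B). The sample mean and standard deviation for group A are 162 cm and 2.4 cm respectively; for group B they are 158.6 cm and 3.4 cm respectively. Is there a relationship between age and height of women in India, i.e. do the mean heights of the two age groups differ?
Solution
• Hypotheses are:
• H0: µA = µB (the mean heights of the two age groups are equal)
• H1: µA ≠ µB (the mean heights of the two age groups are not equal)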
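• Since we are comparing the means of two samples, this will be a two-sample test.
• Since the hypothesis is non-directional, this will be a two-tailed test.
• With equal sample sizes (n = 25), the t-statistic in our case will be: $t = \frac{162 - 158.6}{\sqrt{\frac{2.4^2 + 3.4^2}{25}}} = \frac{3.4}{\sqrt{0.6928}} \approx 4.08$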
• Degrees of freedom: n1 + n2 − 2 = 25 + 25 − 2 = 48. The critical value of t at 48 degrees of freedom and 0.05 level of significance for a two-tailed test is ±2.011.
• So we observe that |TS_CAL| > TS_CR.
• So we reject the null hypothesis: the mean heights of the two age groups differ significantly.
Example-2
• An apple orchard owner wants to compare his two farms to see whether there is any difference in the weight of their apples. From farm A, he randomly collected 15 apples with an average weight of 86 g and a standard deviation of 7 g. From farm B, he collected 10 apples with an average weight of 80 g and a standard deviation of 8 g. At a 95% confidence level, is there any difference between the farms?
Solution
• Null hypothesis (H0): the mean apple weight of farm A is equal to that of farm B.
• Alternative hypothesis (H1): the mean apple weight of farm A is not equal to that of farm B.
• Since we are comparing the means of two samples, this will be a two-sample test.
• Since the hypothesis is non-directional, this will be a two-tailed test.
• n1 = 15, n2 = 10
• s1² = 49, s2² = 64
• x̄1 = 86, x̄2 = 80
• Significance level: α = 0.05
• Degrees of freedom: df = 15 + 10 − 2 = 23
• T-statistic (the sample sizes differ, so the pooled variance is used): $s_p^2 = \frac{14(49) + 9(64)}{23} = \frac{1262}{23} \approx 54.87$ and $t = \frac{86 - 80}{\sqrt{54.87\left(\frac{1}{15} + \frac{1}{10}\right)}} \approx \frac{6}{3.024} \approx 1.98$
• Critical value: referring to the two-tailed t-table for 23 degrees of freedom and α = 0.05 gives ±2.069.
• The calculated t-statistic (1.98) is less than the critical value (2.069), hence we fail to reject the null hypothesis (H0).
• So there is no significant difference between the mean weights of apples from farm A and farm B.
Paired Samples T Test
• The dependent samples t-test (or paired samples t-test) is a statistical test that determines whether there is a difference between two dependent groups or samples.
• The dependent samples t-test, also known as the dependent t-test, tests whether the mean values of two dependent groups differ significantly from each other.
When do we need the dependent t-test?
• We need the paired t-test whenever we survey the same group or sample at two points in time.
• For example, we might be interested in whether a rehabilitation program has a positive effect on physical fitness.
• Since we cannot test all the people who go to rehab, we use a random sample.
• We can then use the paired t-test to draw inferences about the population from the sample.
What are dependent or paired samples?
• In dependent samples, the measured values are available in pairs.
• The pairs result from repeated measurements on the same subjects.
• An example of dependent sampling is measuring the weight of a group of people at two points in time.
• If more than two measurement times are available, repeated-measures ANOVA is used instead.
Examples
• Medical example:
• We want to check whether a new drug increases memory performance.
• We test the memory performance of 40 people before and after they take the medicine.
• Technical example:
• A screw factory complains about very high downtimes at its 5 production plants.
• We must now find out whether a newly introduced power plant has an influence on the downtimes.
• To do this, we compare the downtimes of the 5 plants before and after the introduction of the new power plant.
• Social science example:
• We want to find out whether the health consciousness of the Indian population changed between 2019 and 2021.
• The survey always asks the same people about the same topics at regular intervals.
• We compare the health consciousness of these persons in 2019 and in 2021.
Research Question
In a t-test for dependent samples, the general question is: Is there a statistically significant difference between the mean values of two dependent groups?
The questions for the above examples are as follows:
• Does the new drug help to increase memory performance?
• Does the newly introduced power plant have an influence on downtimes?
• Has the health consciousness of the Indian population changed between 2019 and 2021?
Hypotheses
In the case of a t-test for dependent samples, the hypotheses are:
• Null hypothesis H0: the population means of the two dependent groups are equal (μ1 = μ2), i.e. the mean difference is zero.
• Alternative hypothesis H1:
• H1 (two-tailed): μ1 ≠ μ2 (the two population means are not equal)
• H1 (left-tailed): μ1 < μ2 (population 1 mean is less than population 2 mean)
• H1 (right-tailed): μ1 > μ2 (population 1 mean is greater than population 2 mean)
Test Statistic
• $t = \frac{\bar{x}_{diff}}{s_{diff}/\sqrt{n}}$
• $\bar{x}_{diff}$: sample mean of the differences
• $s_{diff}$: sample standard deviation of the differences
• n: sample size (i.e. number of pairs)
Paired Samples t-test: Assumptions
For the results of a paired samples t-test to be valid, the following assumptions should be met:
• The participants should be selected randomly from the population (independent drawing of samples).
• The differences between the pairs should be approximately normally distributed.
• There should be no extreme outliers in the differences.
Example
• Suppose we want to know whether or not a certain training program is able to increase the maximum vertical jump (in inches) of college basketball players.
• To test this, we may recruit a simple random sample of 20 college basketball players and measure each player's max vertical jump.
• Then we may have each player use the training program for one month and measure their max vertical jump again at the end of the month.
• Step 1: Define the hypotheses.
• We will perform the paired samples t-test with the following hypotheses:
• H0: μ1 = μ2 (the two population means are equal)
• H1: μ1 ≠ μ2 (the two population means are not equal)
• Step 2: Calculate the summary data for the differences.
• $\bar{x}_{diff}$: sample mean of the differences = −0.95
• $s_{diff}$: sample standard deviation of the differences = 1.317
• n: sample size (i.e. number of pairs) = 20
• Step 3: Calculate the test statistic t.
• $t = \frac{-0.95}{1.317/\sqrt{20}} = \frac{-0.95}{0.2945} \approx -3.226$
• Step 4: Determine the critical value of the test statistic t using a standard t-table (α = 0.05, two-tailed).
• df = n − 1 = 20 − 1 = 19
• Critical value of t = ±2.093
• Step 5: Draw a conclusion.
• The calculated t-statistic is negative and its absolute value is greater than the critical value 2.093, so it falls in the rejection region and we reject the null hypothesis.
• We have sufficient evidence to say that the mean max vertical jump of players is different before and after participating in the training program.
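The calculations in the worked examples can be reproduced from summary statistics alone. The sketch below is illustrative: it uses only the Python standard library, the function names (`one_sample_t`, `pooled_two_sample_t`, `paired_t`, `decide`) are hypothetical helpers, and the critical values are hardcoded from a standard t-table at α = 0.05 rather than computed. The pooled two-sample test uses df = n1 + n2 − 2.

```python
import math


def one_sample_t(mean, mu0, sd, n):
    """One-sample t: (x̄ - μ0) / (s / √n), with df = n - 1."""
    return (mean - mu0) / (sd / math.sqrt(n)), n - 1


def pooled_two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent two-sample t using the pooled variance, with df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))                # standard error of x̄1 - x̄2
    return (mean1 - mean2) / se, df


def paired_t(mean_diff, sd_diff, n_pairs):
    """Paired t: x̄_diff / (s_diff / √n), with df = n - 1."""
    return mean_diff / (sd_diff / math.sqrt(n_pairs)), n_pairs - 1


def decide(t, t_crit, tail):
    """Compare t with a positive critical value for a 'two', 'right' or 'left' tailed test."""
    if tail == "two":
        reject = abs(t) > t_crit
    elif tail == "right":
        reject = t > t_crit
    else:  # "left"
        reject = t < -t_crit
    return "Reject H0" if reject else "Do not reject H0"


# (name, (t, df), critical value from a t-table at alpha = 0.05, tail)
cases = [
    ("height vs 158.5 cm", one_sample_t(162, 158.5, 2.4, 25), 1.711, "right"),
    ("height by age group", pooled_two_sample_t(162, 2.4, 25, 158.6, 3.4, 25), 2.011, "two"),
    ("apple farms A vs B", pooled_two_sample_t(86, 7, 15, 80, 8, 10), 2.069, "two"),
    ("vertical jump", paired_t(-0.95, 1.317, 20), 2.093, "two"),
]

results = {}
for name, (t, df), t_crit, tail in cases:
    decision = decide(t, t_crit, tail)
    results[name] = (t, df, decision)
    print(f"{name}: t = {t:.3f}, df = {df}, critical = {t_crit} ({tail}-tailed) -> {decision}")
```

Running it should reproduce the decisions reached in the slides: reject H0 for the one-sample height, age-group and vertical-jump examples, and fail to reject H0 for the apple farms. Because the pooled formula reduces to the equal-sample-size formula when n1 = n2, one function covers both two-sample cases.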