Hypothesis Testing: Dr Neeraj Kaushik
Dr Neeraj Kaushik
The Technological Institute of Textile & Sciences Bhiwani
Basic Concepts
Null Hypothesis & Alternative Hypothesis
Type I & Type II Errors
Level of Significance
Degrees of Freedom
One-tailed & Two-tailed Testing
Split File
Split File splits the data file into separate groups for analysis based on the values of one or more grouping variables. If we select multiple grouping variables, cases are grouped by each variable within categories of the prior variable on the Groups Based On list. We can specify up to eight grouping variables. For example, if we select gender as the first grouping variable and minority as the second grouping variable, cases will be grouped by minority classification within each gender category.
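Outside SPSS, the nesting that Split File produces can be imitated by grouping cases on a tuple of grouping-variable values. The sketch below is a rough analogue, not SPSS itself: the function name `split_file`, the dictionary layout of cases and the sample values are all hypothetical.

```python
from collections import defaultdict


def split_file(cases, *group_vars):
    """Group cases (dicts) by one or more grouping variables.

    Each key is a tuple ordered like group_vars, so sorting the keys nests
    later variables within categories of earlier ones.
    """
    groups = defaultdict(list)
    for case in cases:
        key = tuple(case[var] for var in group_vars)
        groups[key].append(case)
    return dict(sorted(groups.items()))


# Hypothetical cases: minority classification grouped within each gender.
cases = [
    {"gender": "m", "minority": 0, "salary": 31000},
    {"gender": "f", "minority": 1, "salary": 27000},
    {"gender": "f", "minority": 0, "salary": 29000},
    {"gender": "m", "minority": 1, "salary": 30000},
]
for key, members in split_file(cases, "gender", "minority").items():
    print(key, [case["salary"] for case in members])
```

Each group can then be passed separately to any of the tests below, which is what SPSS does when a split is active.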
Hypothesis Testing: Dr Neeraj Kaushik, TIT&S Bhiwani
Hypothesis Testing
Hypothesis Tests
1 sample test
Independent samples tests
Related samples tests: two related samples test, several related samples test
Parametric Tests: Conditions
Knowledge of the population
Data normally distributed
Not applicable to nominal & ordinal data
Parametric Tests
t-test: one-sample t-test, independent-samples t-test, paired-samples t-test
Z-test
F-test
Analysis of Variance
Multivariate Techniques: Dependence Techniques
One dependent variable: ANOVA on Regression, Discriminant Analysis
More than one dependent variable: MANOVA
Multivariate Techniques: Interdependence Techniques
Focus on variables: ANOVA on Factor Analysis
Focus on objects: ANOVA on Cluster Analysis
Decision Rule:
If the calculated value < the table value (the maximum permissible difference for the given degrees of freedom and level of significance), the difference is attributable to chance (H0 not rejected).
If the p-value < 0.05, a significant relationship exists between the dependent and independent variables, i.e. the difference is due to some assignable cause (H0 rejected).
If the p-value ≥ 0.05, the relationship between the dependent and independent variables is not significant, i.e. the difference can be attributed to chance alone (H0 not rejected).
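The two forms of the decision rule always agree. A minimal sketch, assuming a two-tailed z test whose statistic has already been calculated: |z| exceeds the table value (about 1.96 at the 5% level) exactly when the p-value falls below 0.05. The function name `z_test_decision` is hypothetical; only Python's standard `statistics.NormalDist` is used.

```python
from statistics import NormalDist


def z_test_decision(z_calc, alpha=0.05):
    """Apply both decision rules to a calculated two-tailed z statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_table = std_normal.inv_cdf(1 - alpha / 2)  # critical ("table") value
    p_value = 2 * (1 - std_normal.cdf(abs(z_calc)))
    reject_by_table = abs(z_calc) > z_table  # calculated value exceeds table value
    reject_by_p = p_value < alpha            # p-value below level of significance
    return z_table, p_value, reject_by_table, reject_by_p


print(z_test_decision(2.5))  # H0 rejected by both rules
print(z_test_decision(1.0))  # H0 not rejected by either rule
```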
Nominal Measurement
Two categories: 1 sample: Binomial test; 2 independent samples: Chi-square; 2 related samples: McNemar test; several independent samples: Chi-square; several related samples: Cochran's Q test
More than two categories: Chi-square
Ordinal Measurement
Central tendency
2 independent samples: Mann-Whitney U test, Median test
2 related samples: McNemar test, Sign test
Several independent samples: Kruskal-Wallis H test
Several related samples: Friedman test
Metric Measurement
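Parametric: 1 sample: t / z test; 2 independent samples: t / z test; 2 related samples: t / z test; several independent samples: ANOVA; several related samples: ANOVA
Non-parametric: 1 sample: Chi-square test; 2 independent samples: Mann-Whitney U test, Kolmogorov-Smirnov Z test; 2 related samples: Wilcoxon matched-pairs test; several independent samples: Kruskal-Wallis one-way ANOVA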
The One-Sample Kolmogorov-Smirnov procedure is used to test the null hypothesis that a sample comes from a particular distribution. It does this by finding the largest difference (in absolute value) between two cumulative distribution functions (CDFs): one computed directly from the data, the other from mathematical theory.
Four theoretical distribution functions are available: normal, uniform, Poisson, and exponential.
If the significance (p-value) of the Kolmogorov-Smirnov Z statistic is below 0.05, the hypothesized distribution is not a good fit for the data.
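The "largest difference between two CDFs" can be computed directly. A minimal sketch in standard-library Python, assuming the Kolmogorov-Smirnov Z is sqrt(n) times D and using the asymptotic Kolmogorov tail formula for the p-value. When the distribution's parameters are estimated from the same sample, this asymptotic p-value tends to be conservative. The function name and sample are hypothetical.

```python
import math
from statistics import NormalDist


def ks_one_sample(data, cdf):
    """One-sample Kolmogorov-Smirnov: D, Z = sqrt(n) * D and asymptotic p."""
    x = sorted(data)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = cdf(xi)
        # The empirical CDF steps from (i - 1)/n to i/n at xi; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    z = math.sqrt(n) * d
    if z == 0:
        return d, z, 1.0
    # Asymptotic Kolmogorov tail: P(K > z) = 2 * sum (-1)^(k-1) exp(-2 k^2 z^2)
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * z * z) for k in range(1, 101))
    return d, z, min(max(p, 0.0), 1.0)


# Hypothetical data tested against a standard normal distribution.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
print(ks_one_sample(sample, NormalDist(0, 1).cdf))
```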
The Runs Test procedure tests whether the order of occurrence of two values of a variable is random. A run is a sequence of like observations. A sample with too many or too few runs suggests that the sample is not random.
The procedure first classifies each value of the variable as falling above or below a cut point and then tests whether the resulting sequence shows any order. The cut point is based either on a measure of central tendency (mean, median, or mode) or on a custom value.
An e-commerce firm enlisted beta testers to browse and then rate their new Web site. Ratings were recorded as soon as each tester finished browsing. The team is concerned that ratings may be related to the amount of time spent browsing. Use the Runs Test to test the hypothesis that time spent browsing is correlated with site rating. Data File. Large significance values (>.05) indicate that the order of the data is consistent with randomness.
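The runs test described above can be sketched with the usual normal approximation for the number of runs. Assumptions: values below the cut point form one group and values at or above it form the other, and no small-sample continuity correction is applied (some packages apply one). The function name and ratings are hypothetical.

```python
import math
from statistics import NormalDist, median


def runs_test(values, cut=None):
    """Runs test for randomness using the normal approximation."""
    if cut is None:
        cut = median(values)
    above = [v >= cut for v in values]
    n1 = sum(above)              # cases at or above the cut point
    n0 = len(above) - n1         # cases below the cut point
    n = n0 + n1
    runs = 1 + sum(a != b for a, b in zip(above, above[1:]))
    expected = 2 * n0 * n1 / n + 1
    variance = 2 * n0 * n1 * (2 * n0 * n1 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return runs, expected, z, p


# Hypothetical site ratings in the order the beta testers finished browsing.
ratings = [3, 4, 2, 5, 4, 1, 3, 5, 2, 4, 3, 5]
print(runs_test(ratings))
```

Too few runs give a large negative z, and too many runs give a large positive z. Either case points away from random ordering.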
Hypothesis Tests
Concepts & Examples
One sample: Binomial test
Two independent samples: Chi-square test
Two related samples: McNemar test
Several independent samples: Chi-square test
Several related samples: Cochran's Q test
1. Binomial test
The Binomial Test procedure compares the observed frequencies of the two categories of a dichotomous variable to the frequencies expected under a binomial distribution with a specified probability parameter. By default, the probability parameter for both groups is 0.5.
A telecommunications firm loses about 27% of its customers to churn each month. To focus churn-reduction efforts properly, management wants to know whether this percentage varies across predefined customer groups. Use the binomial test to determine whether a single churn rate adequately describes the four major customer types. Data File (split the file by customer type)
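For the churn example the test proportion would be 0.27 rather than the default 0.5. A minimal sketch of an exact binomial test in standard-library Python. It returns both one-sided tail probabilities and a two-sided p-value formed by adding the probabilities of all outcomes no more likely than the observed count. That two-sided convention is one common choice and is assumed here. The function name and counts are hypothetical.

```python
from math import comb


def binomial_test(k, n, p=0.5):
    """Exact binomial test of H0: probability of the first category = p."""
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    p_lower = sum(pmf[: k + 1])   # P(X <= k)
    p_upper = sum(pmf[k:])        # P(X >= k)
    # Small relative tolerance so equally likely outcomes are not lost to rounding.
    p_two = sum(q for q in pmf if q <= pmf[k] * (1 + 1e-7))
    return p_lower, p_upper, min(1.0, p_two)


# Hypothetical: 40 of 120 customers in one group churned; H0 churn rate = 0.27.
print(binomial_test(40, 120, p=0.27))
```

With Split File active on customer type, the same calculation runs once per group.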
2. Chi-square test
The Chi-Square Test procedure tabulates a variable into categories and computes a chi-square statistic. This goodness-of-fit test compares the observed and expected frequencies in each category to test either that all categories contain the same proportion of values or that each category contains a user-specified proportion of values.
A large hospital schedules discharge support staff assuming that patients leave the hospital at a fairly constant rate throughout the week. However, because of increasing complaints of staff shortages, the hospital administration wants to determine whether the number of discharges varies by the day of the week. Use the Chi-Square Test to test the assumption that patients leave the hospital at a constant rate. Data file 1 (weight cases by average daily discharges) Data file 2 (two independent samples)
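The goodness-of-fit statistic is the sum of (observed - expected)^2 / expected over the categories, with k - 1 degrees of freedom. A minimal sketch in standard-library Python follows; the function name and discharge counts are hypothetical. The statistic is compared with the chi-square table value, as in the decision rule. If SciPy is available, `scipy.stats.chisquare` returns the same statistic together with a p-value.

```python
def chi_square_gof(observed, expected_props=None):
    """Chi-square goodness-of-fit statistic and degrees of freedom.

    Without expected_props, every category is expected to hold the same
    share of cases (e.g. a constant discharge rate across weekdays).
    """
    total = sum(observed)
    k = len(observed)
    if expected_props is None:
        expected_props = [1 / k] * k
    expected = [total * prop for prop in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, k - 1


# Hypothetical discharges, Sunday to Saturday.
discharges = [44, 78, 90, 94, 112, 107, 55]
stat, df = chi_square_gof(discharges)
print(stat, df)  # compare stat with the table value for df = 6
```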
3. McNemar test
The McNemar method tests the null hypothesis that binary responses are unchanged.
The McNemar test focuses on change from one condition or one sample response to another.
The McNemar test is designed to test the null hypothesis of no change over time or between two related samples. Unlike the Wilcoxon test, the McNemar test is designed for use with nominal or ordinal test variables.
A grocery store manager wants to increase sales of the store-brand detergent. She puts together an in-store promotion and talks with customers at checkout. Use the McNemar test to determine whether the in-store advertisement changed her customers' buying behavior. Data file. A small significance level (<.05) indicates that buying behavior changed, i.e. the proportions of buyers before and after the promotion are not equal.
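The McNemar statistic depends only on the customers who changed. It uses b (bought before, not after) and c (not before, bought after). A minimal sketch in standard-library Python using the chi-square form with an optional continuity correction and 1 degree of freedom. Statistical packages may instead report an exact binomial significance for small counts. The function name and counts are hypothetical.

```python
import math


def mcnemar(b, c, correction=True):
    """McNemar chi-square from the two discordant cells of a 2 x 2 table.

    b: cases that changed from "yes" to "no"; c: cases that changed from
    "no" to "yes". Cases that did not change carry no information here.
    """
    diff = abs(b - c) - (1 if correction else 0)   # optional continuity correction
    stat = max(diff, 0) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(chi2 > s) = erfc(sqrt(s / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical: 12 customers stopped buying the store brand, 30 started.
print(mcnemar(12, 30))
```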
4. Cochran's Q test
The Cochran Q procedure tests the null hypothesis that multiple related proportions are the same. It extends the McNemar test, which assesses change over two times or between two matched samples, to three or more related samples. Unlike the Friedman test, the Cochran test is designed for use with binary variables.
An online retailer has created a new Web store. Five users are invited, and each is asked to perform six tasks on the site, all of which are designed to be equally easy. Use the Cochran Q procedure to test the null hypothesis that the proportion of users completing each task is the same. Data File1. Small significance values (<.05) indicate that at least one of the variables differs from the others.
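With C_j the number of successes on task j, R_i the number of successes for user i, N the grand total and k tasks, the statistic is Q = (k - 1)(k ΣC_j² - N²) / (kN - ΣR_i²). Under H0 it is approximately chi-square with k - 1 degrees of freedom. A minimal sketch in standard-library Python; the function name and the 0/1 table are hypothetical.

```python
def cochran_q(table):
    """table[i][j] is 1 if subject i succeeded on task j, otherwise 0."""
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]  # per task
    row_totals = [sum(row) for row in table]                       # per user
    n = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    return numerator / denominator, k - 1


# Hypothetical results: 5 users (rows) x 6 tasks (columns).
results = [
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 0],
]
print(cochran_q(results))  # compare Q with the chi-square table value for df = 5
```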
1. Mann-Whitney U test
The Mann-Whitney test is used when we want to test for differences between two groups but the test variable is ordinal, or is a scale variable that in some other way does not conform to the assumptions of the independent-samples t test. Its advantage over the independent-samples t test is that the Mann-Whitney test (equivalent to the Wilcoxon rank-sum test) does not assume normality and can be used to test ordinal variables.
Since the test variable is assumed to be ordinal, the Mann-Whitney test is based on the ranks of the original values and not on the values themselves. The test assumes that the variable being tested is at least ordinal and that its distribution is similar in both groups.
Physicians randomly assigned female stroke patients to receive either physical therapy alone or physical therapy combined with emotional therapy. Three months after treatment, each patient's ability to perform common activities of daily life is assessed. Use the Mann-Whitney test to determine whether the two groups' abilities differ. Data File
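Because the test works on ranks, the core steps are to rank the pooled values (ties share their average rank), sum the ranks of one group, and convert that sum to U. A minimal sketch in standard-library Python with a normal approximation and no tie correction to the variance. The function names and scores are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def mann_whitney_u(x, y):
    """Mann-Whitney U with a normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])                      # rank sum of the first group
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)                 # report the smaller U
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = min(1.0, 2 * NormalDist().cdf(z))     # z <= 0, so this is two-tailed
    return u, z, p


# Hypothetical daily-living scores for the two therapy groups.
physical_only = [42, 55, 38, 60, 47, 51]
combined = [58, 66, 49, 71, 63, 57]
print(mann_whitney_u(physical_only, combined))
```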
2. Median test
The median method tests the null hypothesis that two or more independent samples have the same median. It assumes nothing about the distribution of the test variable, making it a good choice when we suspect that the distribution varies by group. The null hypothesis for the median test is that the pooled median is a good approximation of the center for each of the groups.
To test this hypothesis, the median of all groups combined is computed, and each group is divided into two subgroups: cases whose scores fall at or below this pooled median, and cases whose scores are above it. The median test is a chi-square test of independence between group membership and the proportion of cases above and below the median.
A sales manager evaluates two new training courses. Sixty employees, divided into three groups, all receive standard training. In addition, group 2 receives technical training, and group 3 receives a hands-on tutorial. Each employee was tested at the end of the training course and their score recorded. Use the median test to assess the difference in performance between the three groups, if any. Data File Significance levels less than .05 indicate that the groups differ.
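The steps just described (pooled median, split each group at it, chi-square on the resulting 2 x k table) can be sketched as follows. This is standard-library Python without any continuity correction; the function name and scores are hypothetical.

```python
from statistics import median


def median_test(*groups):
    """Median test: chi-square on counts above / at-or-below the pooled median."""
    pooled = median([v for g in groups for v in g])
    above = [sum(v > pooled for v in g) for g in groups]
    below = [len(g) - a for g, a in zip(groups, above)]
    n = sum(len(g) for g in groups)
    tot_above, tot_below = sum(above), sum(below)
    stat = 0.0
    for g, a, b in zip(groups, above, below):
        for observed, row_total in ((a, tot_above), (b, tot_below)):
            expected = len(g) * row_total / n   # row total x column total / n
            stat += (observed - expected) ** 2 / expected
    return pooled, stat, len(groups) - 1


# Hypothetical test scores for the three training groups.
standard = [62, 70, 58, 65, 71, 60]
technical = [68, 75, 72, 66, 80, 74]
hands_on = [77, 82, 69, 85, 79, 73]
print(median_test(standard, technical, hands_on))  # df = 2
```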
3. Kruskal-Wallis test
The Kruskal-Wallis test is a popular nonparametric alternative to the standard one-way analysis of variance. It is appropriate when the test variable is ordinal or its distribution does not meet the assumptions of standard ANOVA. The only assumptions made by the test are that the test variable is at least ordinal and that its distribution is similar in all groups.
When the test variable is ordinal, the mean is not a valid estimate because the distances between the values are arbitrary. Even if the mean is valid, the distribution of the test variable may be so non-normal that it makes you suspicious of any test that assumes normality. The Kruskal-Wallis test is a one-way analysis of variance by ranks. It tests the null hypothesis that multiple independent samples come from the same population.
Agricultural researchers are studying the effect of color on the taste of crops. Strawberries grown in red, blue, and black color were rated by taste-testers on an ordinal scale of one to five (far below to far above average). Use the Kruskal-Wallis test to determine whether taste varies by color. Data File. Significance levels below .05 indicate that the group locations differ.
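"Analysis of variance by ranks" means replacing the values with their pooled ranks and measuring how far each group's rank sum is from what chance would give. A minimal sketch in standard-library Python uses H = 12/(N(N+1)) ΣR_j²/n_j - 3(N+1), divided by the usual tie correction 1 - Σ(t³ - t)/(N³ - N). Under H0, H is approximately chi-square with k - 1 degrees of freedom. The function names and ratings are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def kruskal_wallis(*groups):
    """Kruskal-Wallis H (tie-corrected) and its degrees of freedom."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)    # correction for tied ranks
    return h, len(groups) - 1


# Hypothetical taste ratings (1 to 5) for the three colors.
red, blue, black = [3, 4, 2, 4, 5], [2, 3, 1, 2, 3], [4, 5, 5, 3, 4]
print(kruskal_wallis(red, blue, black))
```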
4. Friedman test
The Friedman test is a nonparametric alternative to the repeated measures analysis of variance. The only assumptions made by the Friedman test are that the test variables are at least ordinal and that their distributions are reasonably similar.
The Friedman procedure tests the null hypothesis that multiple ordinal responses come from the same population. The data may come from repeated measures of a single sample or from the same measure from multiple matched samples.
The Friedman test ranks the scores in each row of the data file independently of every other row.
An insurance group is evaluating four health care plans for small employers. Twelve employers are recruited to rank the plans by how much they would prefer to offer them to their employees. Use the Friedman test to determine whether the plans are equally preferred. Datafile2 (Finance)
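The key point above is that ranking is done within each row, never across rows. A minimal sketch in standard-library Python computes the Friedman statistic 12/(nk(k+1)) ΣR_j² - 3n(k+1) from the column rank sums R_j, with the usual correction for ties within rows. Under H0 it is approximately chi-square with k - 1 degrees of freedom. The function names and rankings are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def friedman(rows):
    """rows[i][j]: score of subject (block) i under condition j."""
    n, k = len(rows), len(rows[0])
    col_rank_sums = [0.0] * k
    ties = 0
    for row in rows:
        for j, r in enumerate(average_ranks(row)):  # rank within this row only
            col_rank_sums[j] += r
        ties += sum(t ** 3 - t for t in Counter(row).values())
    stat = 12 / (n * k * (k + 1)) * sum(r * r for r in col_rank_sums) - 3 * n * (k + 1)
    stat /= 1 - ties / (n * (k ** 3 - k))   # correction for ties within rows
    return stat, k - 1


# Hypothetical rankings of four plans (1 = most preferred) by four employers.
rankings = [[1, 3, 2, 4], [2, 3, 1, 4], [1, 4, 2, 3], [1, 3, 2, 4]]
print(friedman(rankings))  # compare with the chi-square table value for df = 3
```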
Two independent samples: Mann-Whitney U test
Two independent samples: Kolmogorov-Smirnov test
Two related samples: Wilcoxon matched-pairs test
Several independent samples: Kruskal-Wallis one-way ANOVA
Several related samples: Friedman two-way ANOVA
Kolmogorov-Smirnov test checks the null hypothesis that two samples have the same distribution. The test variable is assumed to be continuous; however, its cumulative distribution function (CDF) can assume any shape at all.
It's a very flexible test because no specific shape is assumed for the underlying distribution.
A grain processor has two corn yields whose aflatoxin levels are below 20 parts per billion, so both are safe for human consumption. However, because aflatoxin varies widely across yields, he wants to compare the levels of aflatoxin in each yield. Use the two-sample Kolmogorov-Smirnov test to determine whether the distribution of aflatoxin differs significantly between the two safe yields. Data File. Small significance values (<.05) indicate that the two groups differ.
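Since no shape is assumed, the two-sample statistic is simply the largest vertical gap between the two empirical CDFs. A minimal sketch in standard-library Python follows. It also assumes the Kolmogorov-Smirnov Z is D times sqrt(nm/(n + m)), which is then referred to the Kolmogorov distribution. The function name and aflatoxin values are hypothetical.

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov D and Z = D * sqrt(n * m / (n + m))."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # Both empirical CDFs are step functions, so the largest gap occurs at a data point.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m) for v in xs + ys)
    z = d * math.sqrt(n * m / (n + m))
    return d, z


# Hypothetical aflatoxin levels (ppb) for the two yields.
yield_a = [4.1, 6.3, 9.8, 12.0, 15.5, 18.2]
yield_b = [2.2, 3.0, 5.1, 7.4, 8.8, 11.9]
print(ks_two_sample(yield_a, yield_b))
```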
The Wilcoxon signed-ranks method tests the null hypothesis that two related medians are the same.
This test allows us to compare a single median against a known value or paired medians from the same (or matched) sample. The Wilcoxon signed-rank test is a nonparametric alternative to the paired-samples t test.
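The only assumption made by the Wilcoxon test is that the test variable is continuous (when medians are compared, the distribution of the paired differences is also taken to be roughly symmetric).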
An investment firm estimates that the median gain for S&P 500 stocks was about 0.078% per day last year and is 0.123% so far this year. Use the Wilcoxon signed-rank test (two related samples) to see whether the median gain for technology stocks differs from the known median for all stocks. Data File. Small significance values (<.05) indicate that the two variables differ in distribution.
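The signed-rank procedure works on paired differences. Zero differences are dropped, the absolute differences are ranked (ties share their average rank), and the ranks of positive and negative differences are summed separately. A minimal sketch in standard-library Python with a normal approximation and no tie correction to the variance follows. Comparing one variable against a known median, as in the example, is done by pairing it with a constant. The function names and daily gains are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank test for paired data (normal approximation)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # zero differences dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    n = len(diffs)
    z = (min(w_plus, w_minus) - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    p = min(1.0, 2 * NormalDist().cdf(z))
    return w_plus, w_minus, z, p


# Hypothetical daily gains (%) of technology stocks versus the known median 0.078.
tech_gains = [0.12, 0.05, 0.31, 0.09, -0.02, 0.18, 0.22, 0.07, 0.15, 0.11]
print(wilcoxon_signed_rank(tech_gains, [0.078] * len(tech_gains)))
```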
3. Friedman test
Various ratios for five consecutive years are given. Test whether there is a significant improvement in the ratios over the years. Datafile
Small significance levels (<.05) indicate that at least one of the variables differs from the others.
Parametric Tests
An analyst at a department store wants to evaluate a recent credit card promotion. To this end, 500 cardholders were randomly selected. Half received an ad promoting a reduced interest rate on purchases made over the next three months, and the other half received a standard seasonal ad. Use the independent-samples t test to compare the spending of the two groups.
Datafile
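The independent-samples t statistic is the difference in group means divided by its standard error. A minimal sketch in standard-library Python computes both common variants. One uses the pooled variance (equal variances assumed, df = n1 + n2 - 2). The other is Welch's version with Welch-Satterthwaite degrees of freedom (equal variances not assumed). SPSS output, as far as we know, reports both rows alongside Levene's test. The function name and spending figures are hypothetical.

```python
import math
from statistics import mean, variance


def independent_t(x, y):
    """Return (t, df) for equal variances assumed and not assumed."""
    n1, n2 = len(x), len(y)
    m1, m2 = mean(x), mean(y)
    v1, v2 = variance(x), variance(y)   # sample variances (n - 1 denominator)
    # Equal variances assumed: pooled variance.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Equal variances not assumed: Welch t and Welch-Satterthwaite df.
    se2 = v1 / n1 + v2 / n2
    t_welch = (m1 - m2) / math.sqrt(se2)
    df_welch = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return (t_pooled, n1 + n2 - 2), (t_welch, df_welch)


# Hypothetical three-month spending for a few cardholders in each group.
promo_ad = [540, 610, 580, 720, 650, 590]
standard_ad = [500, 560, 530, 610, 520, 575]
print(independent_t(promo_ad, standard_ad))
```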
Parametric Tests
Use ANOVA to check whether the dependent variables (factors obtained from factor analysis) are related to demographic variables. Dependent variables: factors (Tangibility, Empathy, etc.). Independent variables: age, income, gender, etc.
Datafile2
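For one factor and one demographic variable, one-way ANOVA compares the between-group mean square with the within-group mean square. A minimal sketch in standard-library Python returns F with (k - 1, N - k) degrees of freedom for comparison with the F table value. The function name and factor scores grouped by age band are hypothetical.

```python
from statistics import mean


def one_way_anova(*groups):
    """One-way ANOVA: F = MS_between / MS_within, with its degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical "Tangibility" factor scores for three age bands.
under_30 = [0.4, 0.1, 0.6, 0.3]
age_30_45 = [-0.2, 0.0, 0.2, -0.1]
over_45 = [-0.5, -0.3, -0.6, -0.2]
print(one_way_anova(under_30, age_30_45, over_45))
```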
Essential Concepts
Sort Cases
Transpose
Restructure (Cases to Variables or Variables to Cases)
Merge Files
Weight Cases
Split File
Weight Cases
Weight Cases gives cases different weights (by simulated replication) for statistical analysis.
The values of the weighting variable should indicate the number of observations represented by single cases in your data file.
Cases with zero, negative, or missing values for the weighting variable are excluded from analysis. Fractional values are valid; they are used exactly where this is meaningful, and most likely where cases are tabulated.
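As an illustration of "simulated replication", a case with weight w is counted as if it appeared w times. Zero, negative and missing weights are skipped. A minimal sketch in standard-library Python tabulates weighted counts, for example one row per weekday weighted by average daily discharges as in the chi-square example. The function name and the dictionary layout of cases are hypothetical.

```python
def weighted_counts(cases, weight_var, category_var):
    """Tabulate categories, counting each case as many times as its weight."""
    counts = {}
    for case in cases:
        w = case.get(weight_var)
        if w is None or w <= 0:
            continue  # zero, negative and missing weights are excluded
        category = case[category_var]
        counts[category] = counts.get(category, 0) + w  # fractional weights allowed
    return counts


# Hypothetical rows: one per weekday, weighted by average daily discharges.
rows = [
    {"day": "Mon", "discharges": 44},
    {"day": "Tue", "discharges": 0},
    {"day": "Wed", "discharges": None},
    {"day": "Mon", "discharges": 1.5},
]
print(weighted_counts(rows, "discharges", "day"))
```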