8 Biostat

Uploaded by

data2020t1

Chapter 8: Comparing More Than Two Means (ANOVA)

Introduction
Performing a One-Way Analysis of Variance
Performing a Nonparametric One-Way Test
Conclusions
Problems

Introduction

When you want to compare means in a study where there are three or more groups, you cannot use multiple t tests. In the old days (even before my time!), if you had three groups (let's call them A, B, and C), you might perform t tests between each pair of means (A versus B, A versus C, and B versus C). With four groups, the situation gets more complicated; you would need six t tests (A versus B, A versus C, A versus D, B versus C, B versus D, and C versus D).

Even though no one does multiple t tests anymore, it is important to understand the underlying reason why this is not statistically sound. Suppose you are comparing four groups and performing six t tests. Also, suppose that the null hypothesis is true, and all the samples come from populations with equal means. If you perform each t test with α set at .05, there is a probability of .95 that you will make the correct decision (that is, fail to reject the null hypothesis) in each of the six tests. However, what is the probability that you will reject at least one of the six null hypotheses? To spare you the math, the answer is about .26 (or 26% if that is easier to think about). This is called an "experiment-wise" type I error. Remember, a type I error is when you reject the null hypothesis (claim the samples come from populations with different means, that is, "the drug works") when you shouldn't. So, instead of your chance of reporting a false positive result being .05, it is really .26.

To prevent this problem, statisticians came up with a single test, called analysis of variance (abbreviated ANOVA). The null hypothesis is that all the samples come from populations with the same mean; the alternative is that there is at least one pair of means that are different.
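
As an aside, the .26 figure quoted in the introduction can be checked with a short calculation. This is a sketch under a simplifying assumption that the chapter does not state: the six tests are treated as independent. Pairwise t tests that share groups are not strictly independent, so the result is best read as an approximation.

```latex
\binom{4}{2} = 6 \ \text{pairwise tests}, \qquad
P(\text{at least one false rejection})
  = 1 - (1 - \alpha)^{6}
  = 1 - 0.95^{6}
  \approx 1 - 0.7351
  = 0.2649
```

Rounded to two decimals, this matches the value of about .26 given in the text.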
You either reject or fail to reject the null hypothesis, and there is one p-value associated with the test. If you reject the null hypothesis, you can then investigate pairwise differences using methods that control the experiment-wise type I error.

Performing a One-Way Analysis of Variance

Once again, let's start by using data from the SASHELP data set called Heart. This time you want to see if there are differences in weight among the three levels of cholesterol (High, Borderline, and Desirable). You start by choosing the task One-Way ANOVA from the statistics task list. This brings up the following screen:

Figure 1: Data Tab for One-Way ANOVA
[Screen: DATA tab with SASHELP.HEART as the data set, Weight as the dependent variable, and Chol_Status as the categorical variable]

The data set SASHELP.Heart was selected by clicking the icon to the right of the Data rectangle. The dependent and categorical variables (Weight and Chol_Status, respectively) have also been selected. You may be more familiar with the term independent variable instead of categorical variable. In this context, they mean the same thing. Once you have completed the Data screen, click the Options tab to see the following:

Figure 2: Options for One-Way ANOVA (Top Portion)
[Screen: OPTIONS tab. HOMOGENEITY OF VARIANCE: Test set to Levene, with a check box for Welch's variance-weighted ANOVA. COMPARISONS: Comparisons method set to Tukey; Significance level 0.05]

One of the assumptions for performing an analysis of variance is that the variances in each of the groups are equal. The Levene test is one test that is used to determine if this assumption is reasonable. If this test is significant, you may choose to ignore it if the differences are not too large. (ANOVA is said to be robust to violations of the equal-variance assumption, especially if the sample sizes are similar.) If you want to account for unequal variances, click the box for Welch's variance-weighted ANOVA.
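
For readers who prefer code to the point-and-click screens, the settings described so far can be approximated with PROC GLM. This is a minimal sketch, not the code the task itself generates (which may differ in options and ordering). It assumes that the Levene test, Welch's ANOVA, and Tukey comparisons map to the statement options shown.

```sas
/* Sketch only: an approximate code equivalent of the task settings. */
/* The statements the One-Way ANOVA task actually writes may differ. */
proc glm data=sashelp.heart;
   class Chol_Status;                                    /* categorical (independent) variable */
   model Weight = Chol_Status;                           /* one-way ANOVA of Weight            */
   means Chol_Status / hovtest=levene welch;             /* Levene test and Welch's ANOVA      */
   lsmeans Chol_Status / pdiff adjust=tukey alpha=0.05;  /* Tukey-adjusted pairwise p-values   */
run;
quit;
```

Omitting the WELCH option corresponds to leaving the Welch check box clear.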
Multiple comparisons are methods used to determine which pairs of means differ. There are several choices for these tests. The default is Tukey, a popular choice. Later in this chapter, you will see another multiple comparison test called SNK (Student-Newman-Keuls). You probably want to leave the significance level at .05.

Further down on the Options tab are plot options (Figure 3). You can accept the default plots or request all the plots as shown here. You also have a choice to display the diagnostic plots as a panel (several smaller graphs displayed in a grid) or as individual plots (the selection here). Finally, because the SASHELP.Heart data set has over 5,000 rows, you need to remove the 5,000-point default limit on plots to have them display correctly.

Figure 3: Options for One-Way ANOVA (Bottom Portion)
[Screen: PLOTS section. Display plots: Selected plots (Box plot, Means plot, LS-mean difference plot, Diagnostics plot); Display as: Individual plots; Maximum number of plot points: No limit]

It's time to run the procedure. Click the Run icon to produce the tables and graphs.

The first section of output displays class-level information. Don't ignore this! Make sure that the number of levels is what you expected (data errors can cause the program to believe there are more levels). Also, pay attention to the number of observations read and used. This is important because any missing value for either the dependent (Weight) or categorical (Chol_Status) variable will result in that observation being omitted from the analysis. A large proportion of missing values may lead to bias: subjects with missing values may be different in some way from subjects without missing values (i.e., missing values may not be random).

Figure 4: Class-Level Information
Class: Chol_Status   Levels: 3   Values: Borderline Desirable High
Number of Observations Read: 5209
Number of Observations Used: 5051

You see three levels for Chol_Status (as expected) and a relatively small number of subjects with missing values. It's time to look at your ANOVA table (Figure 5 below):

Figure 5: ANOVA Table
[Table: Dependent Variable: Weight. Model and Chol_Status each have 2 DF, an F value of 25.90, and Pr > F of <.0001]

You can look at the F test and p-values in the ANOVA table, but you must remember that you also need to look at several other parts of the output to determine if the assumptions for the test are satisfied. You will see in the diagnostic tests that follow that the ANOVA assumptions were satisfied, so let's go ahead and see what conclusions you can draw from the ANOVA table and the tables that follow.

Notice that the model has 2 degrees of freedom (because there were 3 levels of the independent variable). The mean squares for the model and error terms tell you the between-group variance and the within-group variance. The ratio of these two variances, the F value, is 25.90 with a corresponding p-value of less than .0001. A result such as this is often referred to as "highly significant." Remember, the term "significant" means that differences as large as those observed would be unlikely to occur by chance alone if the null hypothesis were true. It doesn't necessarily mean that the differences are significant in the common usage of the word, that is, important.

The next several plots are intended to help you decide if the ANOVA assumptions were satisfied and to graphically show you information about the 3 means and the distribution of scores in each of the 3 groups.

Note: The figures shown below were selected from a larger set of plots produced by the one-way ANOVA task.
Figure 6 shows the residuals (the differences between each individual score and the mean of its group). There are actually two residual plots produced by the one-way task. One (not shown) displays the residuals as actual scores (weights in this example). The one selected here displays the residuals as t scores (the number of standard deviations above or below the mean of the group). Both plots look very similar. You also see the predicted values (means of each group) shown on the x-axis.

Figure 6: Residual Plot
[Plot: Studentized Residuals by Predicted for Weight; x-axis: Predicted Value]

One of the assumptions for running a one-way ANOVA is that the errors (the residuals are estimates of these errors) are normally distributed. You have seen Q-Q plots earlier in this book, so you remember that data values that are normally distributed appear as a straight line on a Q-Q plot. The plot shown in Figure 7 shows small deviations from a straight line, but not enough to invalidate the analysis.

Figure 7: Q-Q Plot for Residuals
[Plot: Q-Q Plot of Residuals for Weight]

The residuals are also displayed as a histogram (see Figure 8):

Figure 8: Histogram for Residuals
[Plot: Distribution of Residuals for Weight, with normal and kernel density curves; x-axis: Residual; y-axis: Percent]

[Plot: Residual-Fit Spread Plot for Weight]

To graphically display the distribution of weights in the 3 groups, the one-way ANOVA task produces a box plot (Figure 9). The line in the center of the box represents the median, and the small diamond represents the mean. Notice that the means, as well as the medians, of the three groups are not very different. Why then were the results so highly significant? The reason is the large (over 5,000) sample size. Large sample sizes give you high power to detect even small differences.
Figure 9: Box Plot for Weight by Cholesterol Level
[Plot: distribution of Weight for the Borderline, Desirable, and High groups; Prob > F <.0001]

Figure 10 shows the results for Levene's test of homogeneity of variance. Here, the null hypothesis is that the variances are equal. Because the p-value is .2194, you do not reject the null hypothesis of equal variance.

Figure 10: Levene's Test for Homogeneity of Variance
[Table: Levene's Test for Homogeneity of Weight Variance, ANOVA of Squared Deviations from Group Means; Chol_Status has 2 DF and Error has 5048 DF]

Figure 11 shows the means and standard deviations for the three groups.

Figure 11: Group Means and Standard Deviations
[Table: N, Mean, and Std Dev of Weight for each level of Chol_Status (Borderline, Desirable, High)]

Because this is a one-way model, the least squares means shown in Figure 12 are equal to the means in the previous figure. In unbalanced models with more than one factor, this may not be the case. Below the table showing the three means, you see p-values for all of the pairwise differences. Each of the three cholesterol groups in the top table in the figure has what is labeled as the LSMEAN Number. In the table of p-values, the LSMEAN number is used to identify the groups. The intersection of any two groups displays the p-value for the difference. For example, group 1 (Borderline) and group 2 (Desirable) show a p-value of less than .0001. The p-value for the difference of Borderline (1) and High (3) is .4869 (not significant).

Figure 12: Least Squares Means
[Tables: Least Squares Means with Adjustment for Multiple Comparisons: Tukey-Kramer. The top table lists Weight LSMEAN and LSMEAN Number for Borderline (1), Desirable (2), and High (3); the bottom table lists Pr > |t| for H0: LSMean(i)=LSMean(j), Dependent Variable: Weight]

[...] Filter Data.
This brings up the following:

Figure 15: Creating a Filter with a Data Task
[Screen: FILTER section with Species as the variable, Equal as the comparison, and a Value type list offering Enter a value and Select distinct value]

You selected Species as the first variable, Equal as the comparison, and Select distinct value as the Value type. This brings up a list of all the species in the Fish data set. It looks like this:

Figure 16: Selecting a Distinct Value for Species
[Screen: Variable 1: Species; Comparison: Equal; Value type: Select distinct value; Value: Bream; Logical list offering (none), AND, and OR]

Because you want to add Roach and Pike to this list, select OR as your logical operator. This enables you to repeat the filtering process, adding the other two species to the data set. Finally, on the tab labeled Output, select a name for your output data set (Three_Fish was used in this example), and select which variables you want in the output data set (Species and Weight were selected here). Now, run the task.

This is certainly more tedious than simply writing a WHERE clause as you did in Chapter 7, but, by presenting you with lists of species, it helps you avoid spelling or syntax errors.

It's time to run the Nonparametric One-Way Statistic task. The opening screen looks like this:

Figure 17: Opening Screen of the Nonparametric One-Way Task
[Screen: DATA tab with WORK.THREE_FISH as the data set, Weight as the dependent variable, and Species as the classification variable]

The data set Three_Fish is selected, along with Weight as the Dependent variable and Species as the Classification variable.
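
The filter and the nonparametric task can also be sketched in code. The following is a hedged illustration rather than the code the tasks generate. It assumes the Fish data set is SASHELP.Fish, and it assumes that the task's pairwise multiple comparison check box corresponds to the DSCF option of PROC NPAR1WAY.

```sas
/* Sketch: keep three species, assuming the Fish data set is SASHELP.Fish. */
data Three_Fish;
   set sashelp.fish(keep=Species Weight);
   where Species in ('Bream', 'Roach', 'Pike');   /* same effect as the OR filter */
run;

/* Wilcoxon scores give the Kruskal-Wallis test for three groups.     */
/* DSCF (an assumed mapping for the check box) adds pairwise results. */
proc npar1way data=Three_Fish wilcoxon dscf;
   class Species;
   var Weight;
run;
```

The WHERE statement with the IN operator plays the role of the three filter conditions joined by OR in the Filter Data task.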
For this example, you are using all the default values except for a request for multiple comparisons that you decided to check (see Figure 18 below):

Figure 18: Requesting a Multiple Comparison Test
[Screen: Additional Tests section. The option for empirical distribution function tests (including the Kolmogorov-Smirnov and Cramer-von Mises tests, or the Kuiper test for two-sample data) is not selected; Pairwise multiple comparison analysis (asymptotic only) is selected]

You are ready to run the analysis. Below are selected portions of the output:

Figure 19: Wilcoxon Rank Sums and Kruskal-Wallis ANOVA Table
[Tables: Wilcoxon Scores (Rank Sums) for Variable Weight Classified by Variable Species (average scores were used for ties); Kruskal-Wallis Test, Chi-Square 40.2701]

Looking at the results of the Kruskal-Wallis test, you decide that the fish weights are not all equal (p < .0001). Box plots are shown next:

Figure 20: Box Plots for Fish Weights
[Plot: Distribution of Wilcoxon Scores for Weight by Species]

[Table: pairwise comparisons (DSCF) for Bream vs. Roach, Bream vs. Pike, and Roach vs. Pike]

Conclusions

You have seen how to conduct a one-way analysis of variance as well as a Kruskal-Wallis nonparametric test. You have also seen ways to determine if the two assumptions for a one-way ANOVA (normally distributed data and homogeneity of variance) are met. Finally, you saw an alternative way to filter data using the Filter Data task.

Problems

8-1: Starting with the workbook Blood_Pressure.xls, create a temporary SAS data set called BP. Use this data set to perform a one-way ANOVA, testing the three drugs' effects on SBP (systolic blood pressure). What is the overall p-value for the test? Using the Tukey (default) method of multiple comparisons, what do you conclude about the three drug levels (Placebo, Drug A, and Drug B)?
8-2: Repeat problem 8-1, except start with the SAS data set Blood_Pressure.sas7bdat, which is located in the folder c:\SASUniversityEdition\myfolders\Problems. You may need to review the instructions describing the problem sets to see how to create a library.

8-3: Starting with the Diabetes.xls workbook, create a SAS data set called Diabetes. Test if there is a relationship between how often a person drinks diet drinks (variable Diet_Drinks) and the glucose level. What is the overall p-value for the ANOVA? Test if there are any pairwise differences. If so, what are they, and what are the p-values?

8-4: Repeat problem 8-3, except request the SNK (Student-Newman-Keuls) multiple comparison test. Because this test has slightly higher power to detect group differences, is the difference between the levels Rarely and Sometimes significant (at the .05 level)?

8-5: Using the SASHELP data set BMT, test if the T values are different for each of the three groups. What is the overall p-value, and which groups, if any, are significantly different at the .05 level?

8-6: You have measured the left ventricular ejection fraction (LVEF) on three groups of subjects with congestive heart failure (CHF). LVEF is the percentage of blood volume that is pumped from the left ventricle with each contraction. The three groups represent 1) Placebo, 2) Calcium channel blocker, and 3) Lasix. The experiment resulted in the following:

Placebo: 55 58 62 48 57 57 80 40 55 52
Calcium: 57 65 55 78 57 84 72 80 78 81
Lasix: 60 60 65 67 48 62 64 70 57 40

Run the program below to create the CHF data set. The variables in this data set are Subj, Group (Placebo, Calcium, or Lasix), and LVEF. There will be a short explanation following the program:

1. data CHF;
2.    do Group = 'Placebo','Calcium','Lasix';
3.       do Subj = 1 to 10;
4.          input LVEF @@;
5.          output;
6.       end;
7.    end;
8. datalines;
55 58 62 48 57 57 80 40 55 52
57 65 55 78 57 84 72 80 78 81
60 60 65 67 48 62 64 70 57 40
;

The program starts with a DATA statement (line 1). Line 2 demonstrates a DO loop with character values. Group is first set to 'Placebo'. Then another DO loop creates a Subj variable with values from 1 to 10 (line 3). For each combination of Group and Subj, you read in a value for LVEF. The @@ on line 4 enables you to place several observations on a single line of data. Without the @@ on the INPUT statement, the program would go to a new line of data for each input. You finish each DO loop with an END statement. Finally, in line 8, you see a DATALINES statement. This enables you to enter the data values directly in the SAS program, avoiding the effort of first creating a text file and then using an INFILE statement to tell the program where to read the data values.

Run a one-way ANOVA comparing LVEF for each of the three groups. Include a test for Tukey multiple comparisons.
