Threats To Internal and External Validity
The results of a research study are only useful to the extent that they can be accurately and confidently interpreted. The issue of accurate and confident interpretation of results is at the center of any discussion of validity. Validity, which is derived from the Latin word validus, meaning strong, refers to the degree to which correct inferences can be made from the results of a research study. The idea of validity in a research study involves two concepts at the same time. A researcher wants to have confidence that the outcomes observed in a research study are a function of the conditions observed, measured, and/or manipulated in the study and not due to some other factors that were not addressed in the study. Such confidence reflects the internal validity of a study. Usually researchers want to use the results of a research study to make a claim not just about the participants in the study but also about a larger population of which the participants are a sample. The ability to make such claims, or generalizations, depends on the external validity of the study. These two aspects of research validity, as well as factors that threaten research validity, will be discussed separately in the following sections.

Internal Validity

As just described, internal validity refers to the extent to which the results obtained in a research study are a function of the variables that were systematically manipulated, measured, and/or observed in the study. Suppose, for example, that a researcher is interested in determining which of two instructional methods is superior for teaching a history concept. Suppose further that the researcher asks two teachers to each use one of the methods of instruction and then compares the mean test scores of each class following the instruction. It is apparent that the researcher is interested in test score differences that are attributable to the different instructional methods.
However, because there are so many other ways in which the classes differed, the researcher cannot confidently conclude that any test score differences that were observed between the groups are due to the methods of instruction. The teachers may have been different in terms of their teaching effectiveness or enthusiasm, the classes may not be equivalent with respect to interest or preparation, or there may have been interruptions (such as fire drills or assemblies) in one class and not the other. The list of possible conditions that could have produced test score differences between the two classes is almost endless. Each of those possible conditions constitutes a potential threat to the internal validity of the study. The most common threats to internal validity are explained in the following sections.

Potential Threats To Internal Validity

History refers to the occurrence of events that could alter the outcome or the results of the study. These events could occur before the study, in which case we refer to previous history, or during the study, in which case we refer to concurrent history. For example, in a study of the effectiveness of a new method for teaching a unit on the biology of a cell, suppose we realize that many of the students had recently watched a television documentary entitled The Cell. This would be an example of previous history influencing the results of a study. In another situation, suppose we are interested in examining the effectiveness of using musical activities to teach mathematics concepts. If we have one teacher use the musical activities and another teacher use the standard curriculum, it is impossible for us to determine whether the outcome we observe is attributable to differences in the curricula or differences between the teachers. The different effects of the teachers are an example of concurrent history as a possible threat to the internal validity of the study.

Maturation pertains to any changes that occur in the subjects during the course
www.coe.iup.edu/grbieger/Classes/GSR615/Module 6/Internal and External Validity.htm 1/6
08/05/2012
of the study that are not part of the study and that might affect the results of the study. Such changes could be biological, that is, growth processes during the study that may affect the results, or they may be psychological, that is, learning or development that occurs during the study and that may affect the results. If we were to examine the weight gain and increase in height of second graders from September to May as a function of the school breakfast and lunch program, we would have to consider that normal growth would account for some of the change in those variables during that period. Biological maturation is a possible source of invalidity in this case. On the other hand, if we were examining the effects of certain instructional techniques on concept learning of sixth graders from September to May, we would have to consider the attainment of formal operational thought during that period by some of the students as a possible reason for what we observe. In this example, psychological maturation is a potential threat to internal validity.

Testing relates to the possible effects of a pretest on the posttest performance of participants in a study. For example, the effect may be sensitizing in that a pretest may alert subjects to the fact that they are being studied, leading them to react in a manner that may affect the results. Another possibility is that of multiple testing effects, where performance on a pretest may affect performance on later administrations of the test or other tests. In either case, the posttest may not be measuring just the influence of the treatment but also the effects of the earlier pretesting.

Instrumentation is concerned with the effects on the outcome of a study of the inconsistent use of a measurement instrument. In other words, what the instrument is measuring changes over the duration of the study. For example, suppose a researcher is trying to ascertain the effect of a new instructional technique on achievement.
The achievement test may be initially valid, but if the students become fatigued during the period of data collection because of the length of the achievement test, the test may end up measuring fatigue as well as achievement. As a result, the results of the study may be due to the deterioration of the testing instrument rather than to the variables being studied.

Statistical regression is the term applied to the tendency of extreme scores to move (or regress) toward the mean score on subsequent retesting. For example, a group of students is given an IQ test, and those scoring in the lowest 25 percent are selected to participate in the study. After the treatment, the students are given another IQ test. Since the students were in the lowest extreme to begin with, we would expect the scores on the posttest to be higher, simply due to statistical regression. We could not confidently attribute the results of the study to the treatment, since statistical regression is a possible threat to the internal validity of our study.

Mortality refers to the loss of subjects from the study due to their initial nonavailability or subsequent withdrawal from the study. Mortality can occur when potential participants agree to take part in a study in a nonrandom way. In other words, the participants are different from those who chose not to participate. Mortality can also affect the outcome of a study when participants drop out in a nonrandom fashion from different groups being compared in a study. For example, if more high-scoring people drop out from the experimental group than from the control group, the outcome of the study may be invalid due to mortality.

Selection pertains to the possibility that groups in a study may possess different characteristics and that those differences may affect the results. For example, one group might differ from another in age, ability, gender, or racial/ethnic composition, or in any of an almost unlimited number of other ways.
To the extent that such differences in group characteristics could affect the outcome of the study, they constitute a potential threat to internal validity due to selection.

In the following section, techniques are suggested for minimizing these threats and thereby maximizing the internal validity of a research study.
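
The statistical regression threat described earlier can be made concrete with a small simulation. The sketch below is only an illustration under assumed conditions: each observed score is modeled as a stable ability plus independent test-day noise, and the function name, sample size, means, and standard deviations are all hypothetical choices that do not come from any real study. No treatment is applied between the two tests, yet the lowest-scoring quartile still improves on retest.

```python
import random
import statistics


def simulate_regression_to_mean(n_students=2000, seed=0):
    """Simulate a pretest and a posttest with NO treatment in between."""
    rng = random.Random(seed)

    # Assumed model: observed score = stable ability + independent noise.
    ability = [rng.gauss(100, 12) for _ in range(n_students)]
    pretest = [a + rng.gauss(0, 9) for a in ability]
    posttest = [a + rng.gauss(0, 9) for a in ability]

    # Select the students scoring in the lowest 25 percent on the pretest.
    cutoff = sorted(pretest)[n_students // 4]
    selected = [i for i in range(n_students) if pretest[i] < cutoff]

    pre_mean = statistics.mean(pretest[i] for i in selected)
    post_mean = statistics.mean(posttest[i] for i in selected)
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_to_mean()
print(f"lowest quartile, pretest mean:  {pre_mean:.1f}")
print(f"lowest quartile, posttest mean: {post_mean:.1f}")
```

Because the selected students were chosen partly for unlucky pretest noise, their posttest mean moves back toward the overall mean even though nothing was done to them. A study that selected only this group and applied a treatment could mistake that movement for a treatment effect.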
Procedures for Maximizing Internal Validity

A researcher can maximize internal validity by taking steps to minimize the potential threats to internal validity. Fraenkel and Wallen (1993) suggest four general ways in which these threats can be minimized:

1. Standardization of the conditions under which the research study is carried out will help minimize threats to internal validity from history and instrumentation.
2. Obtaining as much information as possible about the participants in the research study aids in minimizing threats to internal validity from mortality and selection.
3. Obtaining as much information as possible about the procedural details of the research study, for example, where and when the study occurs, minimizes threats to internal validity from history and instrumentation.
4. Choosing an appropriate research design can help control most other threats to internal validity.

The following are some specific suggestions for minimizing the potential threat to internal validity from each of the sources mentioned earlier.

History: The use of a control group that is selected from the same population as the experimental group(s) and that experiences the same concurrent history as the experimental group(s) can help eliminate most of the effects of history. Also, the shorter the duration of an experiment, the less likely history will be a threat.

Maturation: The effects of maturation, like the effects of history, can be minimized by the use of a control group selected from the same population as the experimental group(s). Also, like the effects of history, the effects of maturation tend to be minimized in studies of short duration.

Testing: The use of a research design that does not include a pretest can eliminate testing as a potential threat to internal validity. If baseline or pretreatment data are needed, the use of unobtrusive measures (data collection techniques of which the experimental participant is unaware) may minimize the effects of testing.
It may also help for a researcher to use different but equivalent forms of a test for pretesting and posttesting.

Instrumentation: Careful specification and control of the measurement procedures can eliminate most of the instrumentation threat. Standardized instruments, standardized administration or data collection procedures, and the training of observers are among the procedures that help control the instrumentation threat.

Statistical regression: Avoiding samples made up only of extreme scorers, with average scorers excluded, will minimize the threat due to statistical regression.

Mortality: Choosing large groups and ensuring that they are representative of the population from which they were selected can minimize mortality threats. The use of follow-up procedures with a portion of those who leave the study or who were initially unavailable can further minimize mortality as a threat.
Selection: Random selection and random assignment of subjects minimize selection as a threat to internal validity. If random selection and assignment are not possible, the use of certain statistical techniques, applied as part of a careful quasi-experimental design, can adjust for group differences and thereby minimize selection as a threat.

External Validity

Rarely is a researcher interested in drawing conclusions only about the participants in a study. Usually, the researcher would like to claim that the results that were obtained for the participants are also applicable, or generalizable, to a larger population (or a larger set of settings and contexts). External validity, as described earlier, refers to the extent to which the results of a research study can be generalized confidently to a group larger than the group that participated in the study (Bracht & Glass, 1968). Using the example from the discussion of internal validity, suppose that the researcher is interested in generalizing the results of the comparison of the two instructional methods to a larger population of students, teachers, and settings. In order for the results to be generalized with confidence, the researcher must have reason to believe that the students, teachers, and settings (and other aspects of the study) in the study are similar to those aspects as they exist in the larger population.

Threats to the external validity of research findings may be related to the population, that is, the extent to which a sample is representative (or not representative) of the population from which it was selected, or to the ecology, that is, the extent to which characteristics of the setting or context of the research study are representative (or not representative) of the setting and context to which the results are to be generalized. The following section describes some of the common threats to external validity.
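
Random selection and random assignment, both mentioned above, are separate steps: the first bears on how well the sample represents the population, and the second on how comparable the groups are. A minimal sketch of the two steps follows, assuming a hypothetical roster of student identifiers; the function name, roster, sample size, and seed are illustrative only.

```python
import random


def select_and_assign(population, sample_size, seed=None):
    """Randomly select a sample, then randomly assign it to two groups."""
    rng = random.Random(seed)

    # Random selection: every member of the population is equally
    # likely to end up in the sample.
    sample = rng.sample(population, sample_size)

    # Random assignment: shuffle the sample, then split it in half so
    # that chance alone decides who lands in which group.
    rng.shuffle(sample)
    half = sample_size // 2
    return sample[:half], sample[half:]


# Hypothetical roster of 500 student IDs standing in for the population.
roster = [f"student_{i:03d}" for i in range(500)]
experimental_group, control_group = select_and_assign(
    roster, sample_size=60, seed=42
)
print(len(experimental_group), len(control_group))
```

Fixing the seed only makes the illustration reproducible. In an actual study the roster would have to be a complete list of the population to which the results are meant to generalize, which is often the hard part in practice.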
Potential Threats to External Validity

Effect of testing refers to the fact that the administration of a test (for example, a pretest) may affect the responses or the performance of the participants in a research study. If this happens, it means that the performance of the people being studied may be different from what it might have been if they had not been pretested. Therefore, the results may not be generalizable to situations where pretesting will not occur.

Multiple-treatment interference pertains to the situation in which participants in a study receive more than one treatment. In such a case, the effects of the multiple treatments may interact. For example, suppose a study is using students to test the effectiveness of a new method of instruction in mathematics. These students are also receiving many other treatments during the normal course of the school program, and those other treatments may have some impact on the effect of the new mathematics technique. The results of this study can be validly generalized only to similar situations.

Selection-treatment interaction is concerned with the possibility that some characteristic of the participants selected for the study interacts with some aspect of the treatment. Examples of such characteristics could include prior experiences, learning, personality factors, or any traits that might interact with the effect of the treatment. For the results to be validly generalized to a larger population, that population must possess the same traits, characteristics, experiences, and so on as the sample.

Effects of experimental arrangements pertain to situations where participants become aware that they are involved in a study, and, as a result of that awareness, their response or performance is different from what it would have been otherwise. The effect on performance may be due to the newness of the experimental treatment (sometimes called the novelty effect), to the belief on the part of participants
that they are receiving some special treatment (sometimes referred to as the Hawthorne effect; Roethlisberger & Dickson, 1939), or to the participants' belief in the effectiveness of the treatment (sometimes called the placebo effect).

Experimenter effects refer to the possibility that an experimenter may sometimes unintentionally influence the performance of participants in a study. Rosenthal (1996, p. 40) classified these effects as passive (e.g., the gender, race, or personal attributes of the researcher or observer affect participants' performance) or active (e.g., the expectations of the researcher or observer are communicated to the participant in a manner that affects performance).

Specificity of variables is concerned with the extent to which the variables in a study are adequately described and operationally defined. Variables can be defined too specifically. For example, if a researcher defines intelligence as the IQ score obtained from the Stanford-Binet test, any results may not be validly generalizable to other definitions of intelligence. Minimally, all variables must be described in sufficient detail to allow another researcher to replicate the study. In addition, the description and definition of variables must employ measurement instruments or observational devices that are themselves reliable and valid. To the extent that the variables included in a study are not adequately described and carefully defined, the ability to generalize the results of the study validly is threatened.

Procedures for Maximizing External Validity

In general, threats to the external validity of a study can be minimized when the researcher has taken steps to ensure that the sample, the setting, and the context are representative of the population, setting, and context to which the results are intended to be generalized.
Effect of testing: The use of research designs that do not include pretests (see Chapter 3) can help eliminate this potential threat. Research designs such as the Solomon four-group design are especially useful in determining the extent to which pretesting may have influenced the results of a study.

Multiple-treatment interference: When there is reason to believe that there will be interference of multiple treatments, the researcher should try to choose a design in which only one treatment is assigned to each subject. If such a design is not practical, the researcher should try to control and/or measure the effects of all relevant treatments and incorporate them into a multiple-treatment design.

Selection-treatment interaction: This threat is similar to the internal validity threat of selection, and the remedy is also similar. Random selection and assignment of participants can minimize much of the threat to external validity due to selection-treatment interaction. When random selection or random assignment is not practical, statistical techniques such as analysis of covariance, used in conjunction with a careful quasi-experimental design, can take into account differences due to measurable attributes of the individual, thus minimizing selection-treatment interaction as a threat.

Effects of experimental arrangements: The most effective way to minimize the reactive effects of various experimental arrangements is to have a control group (i.e., a group that receives no treatment whatsoever) and a placebo group (i.e., one that receives a placebo or non-experimental treatment). In educational settings, it is often impossible to have true control groups, but we can usually arrange for a placebo group. An example would be a case in which an experimental group receives the new method of instruction and the
placebo

Experimenter effects: The use of blind data collection procedures can be an effective means of minimizing threats to external validity due to experimenter effects. This means that the researcher does not collect data or make observations but instead trains a naive observer to do so. The person collecting the data or making the observations should be unaware of the purpose of the study and of which participants are receiving the experimental treatment.

Specificity of variables: Careful definition of variables is the key to minimizing this threat to external validity. In order to ensure generalizability, the researcher must operationally define variables in a way that is meaningful in settings beyond that in which the study is being conducted. The use of widely agreed upon definitions or multiple competing definitions should be considered.

Learning Outcomes

The exercises in this module will provide practice in identifying potential threats to the internal validity and external validity of research studies, and in describing specific procedures for minimizing those potential threats to validity. After completing the exercises in this module, you will be able to:
Define external validity
List the major potential threats to the internal validity of research studies
List the major potential threats to the external validity of research studies
Identify the potential threats to internal validity in a specific research study
Identify the potential threats to external validity in a specific research study
Describe specific procedures that will minimize the potential threat to internal validity in a particular research study
Describe specific procedures that will minimize the potential threat to external validity in a particular research study

Summary

This module has described the most common threats to the internal validity and external validity of research studies and has described specific procedures that could be followed to minimize these threats to validity. In the next 2 modules you will learn about methods and techniques for analyzing data from quantitative research studies.

______________________________________________________________________________

NOTE: At this point, you are ready to test your understanding of what you have learned so far. You should now turn to the Exercises for Module 6.