
The Nonequivalent Dependent Variables (NEDV) Design

The Nonequivalent Dependent Variables (NEDV) Design is a deceptive one. In its simple form, it is an extremely weak design with respect to internal validity. But in its pattern matching variations, it opens the door to an entirely different approach to causal assessment that is extremely powerful. The design notation shown here is for the simple two-variable case. Notice that this design has only a single group of participants! The two lines in the notation indicate separate variables, not separate groups.

The idea in this design is that you have a program designed to change a specific outcome. For instance, let's assume you are doing training in algebra for first-year high-school students. Your training program is designed to affect algebra scores. But it is not designed to affect geometry scores. And, pre-post geometry performance might reasonably be expected to be affected by other internal validity factors like history or maturation. In this case, the pre-post geometry performance acts like a control group -- it models what would likely have happened to the algebra pre-post scores if the program hadn't been given. The key is that the "control" variable has to be similar enough to the target variable to be affected in the same way by history, maturation, and the other single group internal validity threats, but not so similar that it is affected by the program. The figure shows the results we might get for our two-variable algebra-geometry example. Note that this design only works if the geometry variable is a reasonable proxy for what would have happened on the algebra scores in the absence of the program. The real allure of this design is the possibility that we don't need a control group -- we can give the program to all of our sample! The problem is that in its two-variable simple version, the assumption of the control variable is a difficult one to meet. (Note that a double-pretest version of this design would be considerably stronger.)

The Pattern Matching NEDV Design. Although the two-variable NEDV design is quite weak, we can make it considerably stronger by adding multiple outcome variables. In this variation, we need many outcome variables and a theory that tells us how much each variable will be affected by the program (from most to least). Let's reconsider the example of our algebra program above. Now, instead of having only an algebra and geometry score, we have ten measures that we collect pre and post. We expect that the algebra measure would be most affected by the program (because that's what the program was most designed to affect). But here, we recognize that geometry might also be affected because training in algebra might be relevant, at least tangentially, to geometry skills. On the other hand, we might theorize that creativity would be much less affected, even indirectly, by training in algebra, and so our creativity measure is predicted to be the least affected of the ten measures.

Now, let's line up our theoretical expectations against our pre-post gains for each variable. The graph we'll use is called a "ladder graph" because if there is a correspondence between expectations and observed results we'll get horizontal lines and a figure that looks a bit like a ladder. You can see in the figure that the expected order of outcomes (on the left) is mirrored well in the actual outcomes (on the right).
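
A ladder graph of this sort can be drawn with standard plotting tools. Here is a minimal sketch using matplotlib; the measure names, expected ranks, and pre-post gains are hypothetical values chosen only to illustrate the idea, not data from the example:

    import matplotlib.pyplot as plt

    # Hypothetical outcome measures, ordered by how strongly the theory
    # predicts the program will affect them (1 = most affected).
    measures = ["algebra", "geometry", "word problems", "arithmetic", "estimation",
                "reading", "vocabulary", "spelling", "attitude", "creativity"]
    expected_rank = list(range(1, 11))
    observed_gain = [12.1, 9.0, 9.8, 7.5, 5.2, 4.9, 4.1, 3.0, 2.2, 1.5]  # illustrative pre-post gains

    # Convert the observed gains to ranks (1 = largest gain) so both sides share a scale.
    order = sorted(range(len(observed_gain)), key=lambda i: -observed_gain[i])
    observed_rank = [0] * len(observed_gain)
    for rank, i in enumerate(order, start=1):
        observed_rank[i] = rank

    fig, ax = plt.subplots(figsize=(6, 6))
    for name, e, o in zip(measures, expected_rank, observed_rank):
        ax.plot([0, 1], [e, o], marker="o")  # one "rung" per measure
        ax.annotate(name, (0, e), xytext=(-8, 0), textcoords="offset points",
                    ha="right", va="center")

    ax.set_xticks([0, 1])
    ax.set_xticklabels(["Expected rank", "Observed rank"])
    ax.set_xlim(-0.6, 1.2)
    ax.invert_yaxis()  # rank 1 (most affected) at the top
    ax.set_title("Ladder graph: expected vs. observed ordering")
    plt.tight_layout()
    plt.show()

The closer the rungs are to horizontal, the better the match between theory and observation.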

Depending on the circumstances, the Pattern Matching NEDV design can be quite strong with respect to internal validity. In general, the design is stronger if you have a larger set of variables and you find that your expectation pattern matches well with the observed results. What are the threats to internal validity in this design? Only a factor (e.g., a historical event or maturational pattern) that would yield the same outcome pattern can act as an alternative explanation. And, the more complex the predicted pattern, the less likely it is that some other factor would yield it. The problem is that the more complex the predicted pattern, the harder it is to find that it matches your observed data well.

The Pattern Matching NEDV design is especially attractive for several reasons. It requires that the researcher specify expectations prior to institution of the program. Doing so can be a sobering experience. Often we make naive assumptions about how our programs or interventions will work. When we're forced to look at them in detail, we begin to see that our assumptions may be unrealistic. The design also requires a detailed measurement net -- a large set of outcome variables and a detailed sense of how they are related to each other. Developing this level of detail about your measurement constructs is likely to improve the construct validity of your study. Increasingly, we have methodologies that can help researchers empirically develop construct networks that describe the expected interrelationships among outcome variables (see Concept Mapping for more information about how to do this). Finally, the Pattern Matching NEDV design is especially intriguing because it suggests that it is possible to assess the effects of programs even if you only have a treated group. Assuming the other conditions for the design are met, control groups are not necessarily needed for causal assessment. Of course, you can also couple the Pattern Matching NEDV design with standard experimental or quasi-experimental control group designs for even more enhanced validity. And, if your experimental or quasi-experimental design already has many outcome measures as part of the measurement protocol, the design might be considerably enriched by generating variable-level expectations about program outcomes and testing the match statistically.

One of my favorite questions to my statistician friends goes to the heart of the potential of the Pattern Matching NEDV design. "Suppose," I ask them, "that you have ten outcome variables in a study and that you find that all ten show no statistically significant treatment effects when tested individually (or even when tested as a multivariate set). And suppose that, like the desperate graduate student who finds in the initial analysis that nothing is significant, you decide to look at the direction of the effects across the ten variables. You line up the variables in terms of which should be most to least affected by your program. And, miracle of miracles, you find that there is a strong and statistically significant correlation between the expected and observed order of effects even though no individual effect was statistically significant. Is this finding interpretable as a treatment effect?" My answer is "yes." I think the graduate student's desperation-driven intuition to look at the order of effects is a sensible one. I would conclude that the reason you did not find statistically significant effects on the individual variables is that you didn't have sufficient statistical power. Of course, the results will only be interpretable as a treatment effect if you can rule out any other plausible factor that could have caused the ordering of outcomes. But the more detailed the predicted pattern and the stronger the correlation to observed results, the more likely the treatment effect becomes the most plausible explanation. In such cases, the expected pattern of results is like a unique fingerprint -- and the observed pattern that matches it can only be due to that unique source pattern.
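
One simple way to test such an ordering is a rank-order correlation between the predicted order and the observed effects. A minimal sketch, assuming SciPy is available and using made-up effect estimates rather than real results:

    from scipy.stats import spearmanr

    # Predicted ordering of the ten outcomes (1 = predicted to be most affected).
    expected_rank = list(range(1, 11))

    # Hypothetical observed effects (e.g., standardized pre-post gains), none of
    # which is individually significant.
    observed_effect = [0.21, 0.15, 0.17, 0.12, 0.10, 0.09, 0.05, 0.06, 0.02, 0.01]

    # Rank 1 should pair with the largest effect, so reverse the sign of the ranks
    # before correlating; a strongly positive rho then indicates a good match.
    rho, p_value = spearmanr([-r for r in expected_rank], observed_effect)
    print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")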

I believe that the pattern matching notion implicit in the NEDV design opens the way to an entirely different approach to causal assessment, one that is closely linked to detailed prior explication of the program and to detailed mapping of constructs. It suggests a much richer model for causal assessment than one that relies only on a simplistic dichotomous treatment-control model. In fact, I'm so convinced of the importance of this idea that I've staked a major part of my career on developing pattern matching models for conducting research!

Single Group Threats


The Single Group Case

What is meant by a "single group" threat? Let's consider two single group designs and then consider the threats that are most relevant with respect to internal validity. The top design in the figure shows a "posttest-only" single group design. Here, a group of people receives your program and afterwards is given a posttest. In the bottom part of the figure we see a "pretest-posttest" single group design. In this case, we give the participants a pretest or baseline measure, give them the program or treatment, and then give them a posttest.

To help make this a bit more concrete, let's imagine that we are studying the effects of a compensatory education program in mathematics for first grade students on a measure of math performance such as a standardized math achievement test. In the post-only design, we would give the first graders the program and then give a math achievement posttest. We might choose not to give them a baseline measure because we have reason to believe they have no prior knowledge of the math skills we are teaching. It wouldn't make sense to pretest them if we expect they would all get a score of zero. In the pre-post design we are not willing to assume that they have no prior knowledge. We measure the baseline in order to determine where the students start out in math achievement. We might hypothesize that the change or gain from pretest to posttest is due to our special math tutoring program. This is a compensatory program because it is only given to students who are identified as potentially low in math ability on the basis of some screening mechanism.

The Single Group Threats

With either of these scenarios in mind, consider what would happen if you observe a certain level of posttest math achievement or a change or gain from pretest to posttest. You want to conclude that the outcome is due to your math program. How could you be wrong? Here are some of the ways, some of the threats to internal validity that your critics might raise, some of the plausible alternative explanations for your observed effect:

History Threat

It's not your math program that caused the outcome, it's something else, some historical event that occurred. For instance, we know that lots of first graders watch the public TV program Sesame Street. And, we know that in every Sesame Street show they present some very elementary math concepts. Perhaps these shows cause the outcome and not your math program.

Maturation Threat

The children would have had the exact same outcome even if they had never had your special math training program. All you are doing is measuring normal maturation or growth in understanding that occurs as part of growing up -- your math program has no effect. How is this maturation explanation different from a history threat? In general, if we're talking about a specific event or chain of events that could cause the outcome, we call it a history threat. If we're talking about all of the events that typically transpire in your life over a period of time (without being specific as to which ones are the active causal agents) we call it a maturation threat.

Testing Threat

This threat only occurs in the pre-post design. What if taking the pretest made some of the children more aware of that kind of math problem -- it "primed" them for the program so that when you began the math training they were ready for it in a way that they wouldn't have been without the pretest. This is what is meant by a testing threat -- taking the pretest (not getting your program) affects how participants do on the posttest.

Instrumentation Threat

Like the testing threat, this one only operates in the pretest-posttest situation. What if the change from pretest to posttest is due not to your math program but rather to a change in the test that was used? This is what's meant by an instrumentation threat. In many schools when they have to administer repeated testing they don't use the exact same test (in part because they're worried about a testing threat!) but rather give out "alternate forms" of the same tests. These alternate forms were designed to be "equivalent" in the types of questions and level of difficulty, but what if they aren't? Perhaps part or all of any pre-post gain is attributable to the change in instrument, not to your program. Instrumentation threats are especially likely when the "instrument" is a human observer. The observers may get tired over time or bored with the observations. Conversely, they might get better at making the observations as they practice more. In either event, it's the change in instrumentation, not the program, that leads to the outcome.

Mortality Threat

Mortality doesn't mean that people in your study are dying (although if they are, it would be considered a mortality threat!). Mortality is used metaphorically here. It means that people are "dying" with respect to your study. Usually, it means that they are dropping out of the study. What's wrong with that? Let's assume that in our compensatory math tutoring program we have a nontrivial dropout rate between pretest and posttest. And, assume that the kids who are dropping out are the low pretest math achievement test scorers. If you look at the average gain from pretest to posttest using all of the scores available to you at each occasion, you would include these low-scoring subsequent dropouts in the pretest and not in the posttest. You'd be dropping the potential low scorers from the posttest; that is, you'd be artificially inflating the posttest average over what it would have been if no students had dropped out. And, you won't necessarily solve this problem by comparing pre-post averages for only those kids who stayed in the study. This subsample would certainly not be representative even of the original entire sample. Furthermore, we know that because of regression threats (see below) these students may appear to actually do worse on the posttest, simply as an artifact of the non-random dropout or mortality in your study. When mortality is a threat, the researcher can often gauge the degree of the threat by comparing the dropout group against the nondropout group on pretest measures. If there are no major differences, it may be more reasonable to assume that mortality was happening across the entire sample and is not biasing results greatly. But if the pretest differences are large, one must be concerned about the potential biasing effects of mortality.
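
That pretest comparison can be done with a simple two-sample test. A minimal sketch, assuming SciPy is available; the scores and group sizes here are hypothetical:

    from scipy.stats import ttest_ind

    # Hypothetical pretest scores, split by whether the student later dropped out.
    pretest_dropouts = [12, 9, 14, 8, 11, 10]
    pretest_completers = [18, 22, 15, 19, 21, 17, 20, 16]

    # Welch's t-test (unequal variances) comparing the two groups at pretest.
    t_stat, p_value = ttest_ind(pretest_dropouts, pretest_completers, equal_var=False)
    print(f"Dropout pretest mean:   {sum(pretest_dropouts) / len(pretest_dropouts):.1f}")
    print(f"Completer pretest mean: {sum(pretest_completers) / len(pretest_completers):.1f}")
    print(f"Welch t = {t_stat:.2f}, p = {p_value:.3f}")
    # A large pretest difference (here, the dropouts score much lower) signals
    # that mortality could bias the observed pre-post gain.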

Regression Threat

A regression threat, also known as a "regression artifact" or "regression to the mean," is a statistical phenomenon that occurs whenever you have a nonrandom sample from a population and two measures that are imperfectly correlated. OK, I know that's gibberish. Let me try again. Assume that your two measures are a pretest and posttest (and you can certainly bet these aren't perfectly correlated with each other). Furthermore, assume that your sample consists of low pretest scorers. The regression threat means that the pretest average for the group in your study will appear to increase or improve (relative to the overall population) even if you don't do anything to them -- even if you never give them a treatment. Regression is a confusing threat to understand at first. I like to think about it as the "you can only go up from here" phenomenon. If you include in your program only the kids who constituted the lowest ten percent of the class on the pretest, what are the chances that they would constitute exactly the lowest ten percent on the posttest? Not likely. Most of them would score low on the posttest, but they aren't likely to be the lowest ten percent twice. For instance, maybe there were a few kids on the pretest who got lucky on a few guesses and scored at the eleventh percentile who won't get so lucky next time. No, if you choose the lowest ten percent on the pretest, they can't get any lower than being the lowest -- they can only go up from there, relative to the larger population from which they were selected. This purely statistical phenomenon is what we mean by a regression threat. To see a more detailed discussion of why regression threats occur and how to estimate them, click here.
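
The phenomenon is easy to reproduce in a small simulation. This sketch uses NumPy with made-up population parameters: it selects the lowest ten percent of simulated pretest scorers and shows that their posttest mean drifts back toward the population mean even though no treatment is given:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # True ability plus independent measurement error on each occasion,
    # so pretest and posttest are imperfectly correlated.
    ability = rng.normal(50, 10, n)
    pretest = ability + rng.normal(0, 5, n)
    posttest = ability + rng.normal(0, 5, n)  # note: no treatment is applied

    # Select the lowest 10% of pretest scorers, as a compensatory program might.
    selected = pretest <= np.percentile(pretest, 10)

    print(f"Population pretest mean: {pretest.mean():.1f}")
    print(f"Selected pretest mean:   {pretest[selected].mean():.1f}")
    print(f"Selected posttest mean:  {posttest[selected].mean():.1f}")
    # The selected group's posttest mean is noticeably higher than its pretest
    # mean purely because of regression to the mean.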

How do we deal with these single group threats to internal validity? While there are several ways to rule out threats, one of the most common approaches to ruling out the ones listed above is through your research design. For instance, instead of doing a single group study, you could incorporate a control group. In this scenario, you would have two groups: one receives your program and the other one doesn't. In fact, the only difference between these groups should be the program. If that's true, then the control group would experience all the same history and maturation threats, would have the same testing and instrumentation issues, and would have similar rates of mortality and regression to the mean. In other words, a good control group is one of the most effective ways to rule out the single-group threats to internal validity. Of course, when you add a control group, you no longer have a single group design. And, you will still have to deal with two major types of threats to internal validity: the multiple-group threats to internal validity and the social threats to internal validity.

Regression Point Displacement Analysis


Statistical Requirements

The notation for the Regression Point Displacement (RPD) design shows that the statistical analysis requires:

a posttest score
a pretest score
a variable to represent the treatment group (where 0 = comparison and 1 = program)

These requirements are identical to the requirements for the Analysis of Covariance model. The only difference is that the RPD design has a single treated group score.

The figure shows a bivariate (pre-post) distribution for a hypothetical RPD design of a community-based AIDS education program. The new AIDS education program is piloted in one particular county in a state, with the remaining counties acting as controls. The state routinely publishes annual HIV-positive rates by county for the entire state. The x-values show the HIV-positive rates per 1000 people for the year preceding the program, while the y-values show the rates for the year following it. Our goal is to estimate the size of the vertical displacement of the treated unit from the regression line of all of the control units, indicated on the graph by the dashed arrow. The model we'll use is the Analysis of Covariance (ANCOVA) model stated in regression model form:
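
yi = β0 + β1xi + β2Zi + ei

Here yi is the posttest (post-program) HIV-positive rate for county i, xi is its pretest rate, Zi is the treatment indicator (0 = comparison county, 1 = program county), ei is the residual, and β2 -- the coefficient on the treatment indicator -- is the vertical displacement we want to estimate. (This is the standard regression statement of the ANCOVA model; the symbols here are generic rather than taken from the figure.)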

When we fit the model to our simulated data, we obtain the regression table shown below:

The coefficient associated with the dichotomous treatment variable is the estimate of the vertical displacement from the line. In this example, the results show that the program lowers HIV-positive rates by .019 and that this amount is statistically significant. This displacement is shown in the results graph:
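
An analysis like this can be run in any regression package. Below is a minimal sketch in Python using statsmodels; the county rates are simulated, and the built-in displacement of .019 is only meant to mimic the example, not reproduce its actual data:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n_counties = 30

    # Simulated HIV-positive rates per 1000 for the year before (pre) and after (post).
    pre = rng.normal(0.06, 0.01, n_counties)
    post = 0.01 + 0.9 * pre + rng.normal(0, 0.003, n_counties)

    df = pd.DataFrame({"pre": pre, "post": post, "z": 0})
    # One county receives the pilot program; displace its posttest rate downward.
    df.loc[0, "z"] = 1
    df.loc[0, "post"] -= 0.019

    # ANCOVA in regression form: posttest on pretest plus the treatment indicator.
    model = smf.ols("post ~ pre + z", data=df).fit()
    print(model.params["z"])   # estimated vertical displacement (roughly -.019)
    print(model.pvalues["z"])  # its p-value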

For more details on the statistical analysis of the RPD design, you can view an entire paper on the subject entitled "The Regression Point Displacement Design for Evaluating Community-Based Pilot Programs and Demonstration Projects."
