Models and Methods for Evaluation
Ron Owston
CONTENTS

Introduction
General Program Evaluation Models
    Evolution of Program Evaluation
    Decision-Making Evaluation Approaches
    Naturalistic Evaluation Approaches
    Kirkpatrick’s Four Levels
Technology Evaluation Approaches
Implications for the Evaluation of Technology
    Design of Study
    Data Sources and Analysis
    Dissemination
Conclusions
References
Suchman (1967) argued that evaluating the attainment of a program’s goals is still essential, but more critical is to understand the intervening processes that led to those outcomes. He suggested that an evaluation should test a hypothesis such as: “Activity A will attain objective B because it is able to influence process C, which affects the occurrence of this objective” (p. 177). Following this reasoning, Weiss (1972) showed how a model could be developed and tested to explain how a chain of events in a teacher home visit program could lead to the ultimate objective of improving children’s reading achievement. This early work led to the development of an approach known today as theory-based evaluation, theory-driven evaluation, or program theory evaluation (PTE). PTE consists of two basic elements: an explicit theory or model of how the program causes the intended or observed outcomes and an actual evaluation that is at least guided by the model (Rogers et al., 2000). The theory component is not a grand theory in the traditional social science sense, but rather it is a theory of change or plausible model of how a program is supposed to work (Bickman, 1987). The program model, often called a logic model, is typically developed by the evaluator in collaboration with the program developers, either before the evaluation takes place or afterwards. Evaluators then collect evidence to test the validity of the model. PTE does not suggest a methodology for testing the model, although it is often associated with qualitative methodology. Cook (2000) argues that program theory evaluators who use qualitative methods cannot establish that the observed program outcomes were caused by the program itself, as causality can only be established through experimental design. Generally speaking, the contribution of PTE is that it forces evaluators to move beyond treating the program as a black box and leads them to examining why observed changes arising from a program occurred.
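To make the idea of a program theory concrete, the short Python sketch below writes out the kind of activity-to-outcome chain described above for a hypothetical home-visit program. The links and indicators are illustrative assumptions made for the sake of the example; they are not taken from Weiss (1972) or any actual logic model.

    # A minimal way to record a program theory (logic model) so that each link
    # in the chain can be tested with evidence, following the pattern
    # "Activity A attains objective B because it influences process C."
    logic_model = [
        {
            "link": "Teacher home visits -> parents learn ways to support reading at home",
            "evidence_to_collect": ["visit logs", "parent interviews"],
        },
        {
            "link": "Parental support -> children spend more time reading with feedback",
            "evidence_to_collect": ["parent diaries", "student reading logs"],
        },
        {
            "link": "More supported reading time -> improved reading achievement",
            "evidence_to_collect": ["standardized reading scores"],
        },
    ]

    for step in logic_model:
        print(step["link"])
        print("  evidence:", ", ".join(step["evidence_to_collect"]))

Writing the model down this way keeps every assumed link paired with the evidence that could confirm or disconfirm it, which is what it means for the evaluation to be guided by the model.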
Decision-Making Evaluation Approaches

During the same period, other evaluators focused on how they could help educational decision makers. Best known is Stufflebeam (1973), who viewed evaluation as a process of providing meaningful and useful information for decision alternatives. Stufflebeam proposed his context, input, process, and product (CIPP) model, which describes four kinds of evaluative activities. Context evaluation assesses the problems, needs, and opportunities present in the educational program’s setting. Input evaluation assesses competing strategies and the work plans and budgets. Process evaluation monitors, documents, and assesses program activities. Product evaluation examines the impact of the program on the target audience, the quality and significance of outcomes, and the extent to which the program is sustainable and transferable. In essence, the CIPP model asks of a program: What needs to be done? How should it be done? Is it being done? Did it succeed? Stufflebeam also reconciled his model with Scriven’s formative and summative evaluation by stating that formative evaluation focuses on decision making and summative evaluation on accountability.

Another popular approach that emerged was Patton’s (1978) utilization-focused evaluation. Patton addressed the concern that evaluation findings are often ignored by decision makers. He probed evaluation program sponsors to attempt to understand why this is so and how the situation could be improved. From this study, he developed not so much an evaluation model as a general approach to evaluation that has only two fundamental requirements. First, he stated that relevant decision makers and evaluation report audiences must be clearly identified. Second, he maintained that evaluators must work actively with the decision makers to decide upon all other aspects of the evaluation, including such matters as the evaluation questions, research design, data analysis, interpretation, and dissemination. Patton admitted that the challenge of producing evaluation studies that are actually used is enormous but remained optimistic that it is possible and worth attempting.

Cronbach (1980), a student of Tyler, also focused on the decision-making process. His contribution was to emphasize the political context of decision making, saying that it is seldom a lone person who makes decisions about a program; rather, decisions are more likely to be made in a lively political setting by a policy-shaping community. Cronbach advocated that the evaluator should be a teacher, educating the client group throughout the evaluation process by helping them refine their evaluation questions and determine what technical and political actions are best for them. During this educative process, the evaluator is constantly giving feedback to the clients, and the final evaluation report is only one more vehicle for communicating with them. Unlike the other evaluation theorists mentioned above, Cronbach did not believe that the evaluator should determine the worthiness of a program nor provide recommended courses of action.

Naturalistic Evaluation Approaches

At the same time these researchers were developing approaches that focused on how evaluation results are used, others concentrated their efforts on developing methods that place few, if any, constraints on the evaluator. Known as naturalistic or qualitative, these approaches give the evaluator freedom to choose the methods used to collect, analyze, and interpret their data.
Stake’s (1975) responsive evaluation is one such model. Stake was concerned that conventional approaches were not sufficiently receptive to the needs of the evaluation client. He advocated that evaluators must attend to actual program activities rather than intents, respond to the audience’s needs for information, and present different value perspectives when reporting on the success and failure of a program. Stake believed that evaluators should use whatever data-gathering schemes seem appropriate; however, he did emphasize that they will likely rely heavily on human observers and judges. Rather than relying on methodologies of experimental psychology, as is often done in conventional evaluations, Stake saw evaluators drawing more from the traditions of anthropology and journalism in carrying out their studies.

Two other approaches are of interest in this discussion of naturalistic methods. The first is Eisner’s (1979) connoisseurship model, which is rooted in the field of art criticism. His model relies on the evaluator’s judgment to assess the quality of an educational program, just as the art critic appraises the complexity of a work of art. Two concepts are key to Eisner’s model: educational connoisseurship and educational criticism. Educational connoisseurship involves the appreciation of the finer points of an educational program, a talent that derives from the evaluator’s experience and background in the domain. Educational criticism relies on the evaluator’s ability to verbalize the features of the program, so those who do not have the level of appreciation that the connoisseur has can fully understand the program’s features.

The second approach is ethnographic evaluation, which its proponents believe can yield a more meaningful picture of an educational program than would be possible using traditional scientific methods (Guba, 1978). Ethnographic evaluators immerse themselves in the program they are studying by taking part in the day-to-day activities of the individuals being studied. Their data-gathering tools include field notes, key informant interviews, case histories, and surveys. Their goal is to produce a rich description of the program and to convey their appraisal of the program to the program stakeholders.

Kirkpatrick’s Four Levels

Although it is well established in the human resource development community, Kirkpatrick’s (2001) four-level model is less known in educational evaluation circles because it focuses on the evaluation of corporate training programs. I have placed it in a category by itself because it has little in common with the other models discussed, as Kirkpatrick does not emphasize negotiation with the decision makers nor does he favor a naturalistic approach. Kirkpatrick’s first writing on the model dates back over 40 years, but it was not until more recently that he provided a detailed elaboration of its features. Even though it focuses on training program evaluation, the model is still relevant to general educational settings; for example, Guskey (2000) adapted it for the evaluation of teacher professional development programs.

Kirkpatrick proposed four levels that the evaluator must attend to: reaction, learning, behavior, and results. Reaction refers to the program participants’ satisfaction with the program; the typical course evaluation survey measures reaction. Learning is the extent to which participants change attitudes, improve their knowledge, or increase their skills as a result of attending the program; course exams, tests, or surveys measure this kind of change. The next two levels are new to most educational evaluators and are increasingly more difficult to assess. Behavior refers to the extent to which participants’ behavior changes as a result of attending the course; to assess this level, the evaluator must determine whether participants’ new knowledge, skills, or attitudes transfer to the job or another situation such as a subsequent course. The fourth evaluation level, results, focuses on the lasting changes to the organization that occurred as a consequence of the course, such as increased productivity, improved management, or improved quality. In a formal educational setting, the fourth evaluation level could refer to assessing how students perform on the job after graduation. Kirkpatrick has recommended the use of control group comparisons to assess a program’s effectiveness at these two higher levels, if at all possible.

TECHNOLOGY EVALUATION APPROACHES

So far I have concentrated on models that are applicable to a wide range of educational programs, whether or not they might involve technology. Several frameworks have been proposed specifically to assess technology-based learning, although none has been employed much by researchers other than their developers. These frameworks tend to recommend areas in which evaluators should focus their data collection, provide criteria against which technology-based learning could be judged, or provide questions for the evaluator to ask. For example, Riel and Harasim (1994) proposed three areas on which data collection might focus for the evaluation of online discussion groups: the structure of the network environment, the social interaction that occurs during the course or project, and the effects of the experience on individuals.
TABLE 45.1 CIAO! Framework

Context
    Rationale: To evaluate technology, we need to know about its aims and the context of its use.
    Data: Designers’ and course teams’ aims; policy documents and meeting records.
    Methods: Interviews with technology program designers and course team members; analysis of policy documents.

Interactions
    Rationale: Observing students and obtaining process data help us to understand why and how some element works, in addition to whether or not it works.
    Data: Records of student interactions; student diaries; online logs.
    Methods: Observation; diaries; video/audio and computer recording.

Outcomes
    Rationale: Being able to attribute learning outcomes to technology when it is one part of a multifaceted course is very difficult. It is important to try to assess both cognitive and affective learning outcomes (e.g., changes in perceptions and attitudes).
    Data: Measures of learning; changes in students’ attitudes and perceptions.
    Methods: Interviews; questionnaires; tests.

Source: Adapted from Scanlon, E. et al., Educ. Technol. Soc., 3(4), 101–107, 2000.
Bates and Poole’s (2003) SECTION model calls for the comparison of two or more online instructional delivery modes on the basis of the appropriateness of the technology for the targeted students, its ease of use and reliability, costs, teaching and learning factors, interactivity fostered by the technology, organizational issues, novelty of the technology, and how quickly courses can be mounted and updated. Ravitz (1998) suggested a framework that encourages the assessment of a project’s evolution through interactive discussion, continual recordkeeping, and documentation. Mandinach (2005) has given evaluators a set of key questions to ask about an e-learning program in three general areas: student learning, pedagogical and institutional issues, and broader policy issues. Finally, Baker and Herman (2003) have proposed an approach, which they call distributed evaluation, to deal with large-scale, longitudinal evaluation of technology. They emphasize clarifying evaluation goals across all stakeholders, using a variety of quantitative and qualitative measures ranging from questionnaires and informal classroom tests to standardized tests, designing lengthier studies so changes can be assessed over time, collecting data at the local level and entering them into a systemwide repository, and providing feedback targeted at various audiences.

Of particular note because of its origins and comprehensiveness is the context, interactions, attitudes, and outcomes (CIAO!) framework developed by Scanlon et al. (2000). The CIAO! framework represents a culmination of some 25 years of technology evaluation experience of the authors at the Open University in the United Kingdom. As shown in Table 45.1, the columns in the framework represent three dimensions of the technology-based learning program that must be evaluated: the context dimension concerns how the technology fits within the course and where and how it is used; interactions refers to how students interact with the technology and with each other; and outcomes deals with how students change as a result of using the technology. The first row of the framework provides a brief rationale for the need to evaluate each of the three dimensions. The second and third rows, respectively, highlight the kinds of data that should be collected for each dimension and the methods that should be employed for each. The authors point out that, while the framework has proven to be very valuable in highlighting areas in which evaluative data should be collected, caution should be exercised in not applying the framework in an overly prescriptive manner.

Perhaps the most widely used criteria for evaluating teaching with technology in higher education are the Seven Principles for Good Practice in Undergraduate Education, described in a seminal article by Chickering and Gamson (1987). Almost 10 years after this article was published, Chickering and Ehrmann (1996) illustrated how the criteria, which were distilled from decades of research on the undergraduate education experience, could be adapted for information and communication technologies. Briefly, the criteria suggest that faculty should:

• Encourage contact between students and the faculty.
• Develop reciprocity and cooperation among students.
• Encourage active learning.
• Give prompt feedback.
• Emphasize time on task.
• Communicate high expectations.
• Respect diverse talents and ways of learning.
IMPLICATIONS FOR THE EVALUATION OF TECHNOLOGY

[Figure: decision points in planning an evaluation — purpose of the evaluation; needs of the audience; evaluation design (experimental, with random or non-random assignment, or qualitative); data sources; dissemination strategies.]
TABLE 45.2 Evaluation Models Best Suited for Particular Evaluation Purposes

Primary purposes of evaluation (table columns): attainment of the program’s goals and objectives; program improvement; accreditation of the program; development of theory about the intervention; meeting the information needs of diverse audiences; overall impact of the program.

Evaluation models rated against these purposes: goal-based (Tyler, 1942); goal-free evaluation (Scriven, 1972); theory-based (Weiss, 1972); context, input, process, and product (CIPP) (Stufflebeam, 1973); utilization-focused (Patton, 1978); responsive (Stake, 1975); connoisseurship (Eisner, 1979); ethnographic (Guba, 1978); multilevel (Guskey, 2000; Kirkpatrick, 2001); CIAO! framework (Scanlon et al., 2000); seven principles of good practice in undergraduate education (Chickering and Ehrmann, 1996).
… kinds of data needed to take appropriate action. Recall Stufflebeam’s statement that the purpose of evaluation is to present options to decision makers. In a university setting, the decision makers or stakeholders might be a faculty member who is teaching an online course, a curriculum committee, a technology roundtable, a faculty council, or senior academic administrators. The stakeholders in a school setting could be a combination of parents, teachers, a school council, and the district superintendent. The challenge to the evaluator, therefore, is to identify these audiences and then find out what their expectations are for the evaluation and the kind of information they seek about the program. Patton, Cronbach, and Stake all emphasized the critical importance of this stage. The process may involve face-to-face meetings with the different stakeholders, telephone interviews, or brief surveys. Because consensus in expectations is unlikely to be found, the evaluator will have to make judgments about the relative importance of each stakeholder and whose information should be given priority.

With the expectations and information needs in hand, the study now must be planned. We saw from Scriven’s perspective that all program outcomes should be examined whether or not they are stated as objectives. My experience has taught me not only to assess the accomplishment of program objectives, as this is typically what stakeholders want done, but also to seek data on unintended outcomes, whether positive or negative, as they can lead to insights one might otherwise have missed.

Design of Study

Next the evaluator must decide upon the actual design of the study. A major decision has to be made about whether to embark on an experimental design involving a comparison group or a non-experimental design. The information needs of the stakeholders should determine the path to follow (Patton, 1978; Stake, 1975). If the stakeholders seek proof that a technology-based program works, then an experimental design is likely required. The What Works Clearinghouse established by the U.S. Department of Education’s Institute of Education Sciences holds experimental designs as the epitome of “scientific evidence” for determining the effectiveness of educational interventions (http://www.w-w-c.org). On the other hand, if the stakeholders seek information on how to improve a program, then non-experimental or qualitative approaches may be appropriate. Some even argue that defining a placebo and treatment does not make sense given the nature of education; hence, accumulation of evidence over time and qualitative studies are a more meaningful means of determining what works (Olson, 2004).

If a decision is made to conduct a randomized experimental study, Cook et al. (2003b) offer some helpful advice. They suggest that, rather than asking a broad question such as “Do computers enhance learning?” (p. 18), the evaluator should formulate a more precise question that will address the incremental impact of technology within a more global experience of technology use. The authors illustrate, for example, how a study could be designed around a narrower question: “What effect does Internet research have on student learning?” (p. 19). Rather than simply comparing students who do research on the Internet with those who do not, they created a factorial design in which the presence or absence of Internet research is crossed with whether teachers do or do not instruct students on best practices for Internet research. The result is four experimental conditions: best practice with the Internet, best practice without the Internet, typical Internet practice, and a control group whose teacher neither encourages nor discourages students from doing Internet research. The authors’ recommendation echoes one offered by Carol Weiss some time ago when she made the point that the control group does not necessarily have to receive no treatment at all; it can receive a lesser version of the treatment program (Weiss, 1972). This advice is particularly relevant when speaking of technology, as it is commonly used by students today either in classrooms or outside of school, so to expect that the control group contains students who do not use technology would be unrealistic.

A problem that Cook et al. (2003b) mention only in passing is that of sample size and units of analysis, which are key considerations in an experimental study. In a report commissioned by the U.S. Institute of Education Sciences, Agodini et al. (2003) analyzed these issues when developing specifications for a national study of the effectiveness of technology applications on student achievement in mathematics and reading. The authors concluded that an effect size of 0.35 would be a reasonable minimum goal for such a study because previous studies of technology have detected effects of this size, and it was judged to be sufficiently large to close the achievement gaps between various segments of the student population. An effect size of 0.35 means that the difference between the treatment and control group means is 35% of the standard deviation of the outcome measure being considered (the short sketch following the list below illustrates the calculation). Achieving this effect size would require the following number of students under the given conditions of random assignment:
• Students randomly assigned to treatments would require 10 classrooms with 20 students in each (total of 200 students).
• Classrooms randomly assigned to treatments would require 30 classrooms with 20 students in each (total of 600 students) for a study of the effects of technology on reading achievement; however, 40 classrooms with 20 students (total of 800 students) would be required for mathematics because of statistical considerations on the way mathematics scores cluster.
• Schools randomly assigned to treatments would require 29 schools with 20 students in each (total of 1160 students).
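To make the effect size target concrete, here is a minimal Python sketch that computes a standardized effect size (the difference in group means divided by the pooled standard deviation) from two sets of outcome scores. The scores and variable names are invented for illustration; they are not data from Agodini et al. (2003).

    import statistics

    def effect_size(treatment, control):
        """Standardized mean difference between two groups of scores."""
        mean_diff = statistics.mean(treatment) - statistics.mean(control)
        n_t, n_c = len(treatment), len(control)
        # Pool the two sample variances, weighting each by its degrees of freedom.
        pooled_var = ((n_t - 1) * statistics.variance(treatment) +
                      (n_c - 1) * statistics.variance(control)) / (n_t + n_c - 2)
        return mean_diff / pooled_var ** 0.5

    # Hypothetical reading scores for a treatment and a comparison classroom.
    treatment_scores = [74, 81, 69, 77, 85, 72, 79, 83]
    control_scores = [72, 79, 68, 75, 83, 70, 78, 81]

    print(round(effect_size(treatment_scores, control_scores), 2))
    # An effect size of 0.35 would mean the treatment-control difference
    # equals 0.35 of a standard deviation on the outcome measure.

With these illustrative numbers the estimate works out to roughly 0.3, just under the minimum that Agodini et al. (2003) judged worth detecting.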
The first condition, random assignment of students to treatment, is not likely a very feasible option in most schools, so the evaluator is left with the choice of random assignment to classrooms or to schools, both of which would require many more students. The result is that an evaluation of technology using an experimental design would likely be a fairly costly undertaking if these guidelines are followed.

Unfortunately, even random assignment to classrooms or schools may be problematic; therefore, the evaluator is left with having to compare intact classes, a design that is weak (Campbell et al., 1966). Finding teachers or students from an intact class to act as a comparison group is difficult. Even if their cooperation is obtained, so many possible competing hypotheses could explain any differences found between experimental and comparison groups (e.g., the comparison group may have an exceptional teacher, or the students in the experimental group may be more motivated) that they undermine the validity of the findings.

When the goal of the study is program improvement rather than proving the program works, qualitative approaches such as those of Stake and of Guba described earlier in this chapter are particularly appropriate. Owston (2000) argued that the mixing of both qualitative and quantitative methods shows stronger potential for capturing and understanding the richness and complexity of e-learning environments than if either approach is used solely. Although some methodologists may argue against mixing research paradigms, I take a more pragmatic stance that stresses the importance and predominance of the research questions over the paradigm. This approach frees the evaluator to choose whatever methods are most appropriate to answer the questions once they are articulated. Ultimately, as Feuer et al. (2002) pointed out, “No method is good, bad, scientific, or unscientific in itself; rather, it is the appropriate application of method to a particular problem that enables judgments about scientific quality.”

Data Sources and Analysis

When the basic design of the study is developed, the next decision will be to determine the evaluation data sources. Generally, the best strategy is to use as many different sources as practical, such as test scores or scores on other dependent measures, individual and focus group interviews of students and teachers, Web-based survey data, relevant program documents, and classroom observation. The use of multiple data sources is standard practice in qualitative evaluation, as the need to triangulate observations is essential (Patton, 2002). In experimental studies, other qualitative and quantitative data sources may be used to help explain and interpret observed differences on dependent measures.

Log files generated by Web servers are a relatively new source of data that can be used to triangulate findings from surveys and interviews when the technology being evaluated is Web based. These files contain a record of communication between a Web browser and a Web server in text-based form. The files vary slightly depending on the type of server, but most Web servers record the following information:

• Address of the computer requesting a file
• Date and time of the request
• Web address of the file requested
• Method used for the requested file
• Return code from the Web server that specifies if the request was successful or failed, and why
• Size of the file requested

Web server log files do not reveal or record the content of a Web browser request, only the fact that a request was made. Because each Web page has a distinct address, it is possible to determine that a user viewed a particular page. Log files grow to be exceedingly large and are often discarded by system administrators; however, evaluators can analyze the files using commercial tools such as WebTrends Log Analyzer (http://www.webtrends.com) or freeware tools such as AWStats (http://awstats.sourceforge.net). Output from the tools can be in tabular or graphical format (see Figure 45.2 for sample output). The tools can be used by the evaluator to answer questions such as what time of day or week users were accessing the system, how long they were logged into the system, what pages they viewed, and what paths they followed through the website. Figure 45.2 is typical of the graphical output that may be obtained on the average number of users visiting a website per day of the week.

The author and his colleagues have used log file analysis successfully in several technology evaluation studies. In one study, Wideman et al. (1998) found that students in a focus group said they made frequent use of a simulation routine in an online course, but the log files revealed that the routine was seldom used.
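As a rough illustration of the kind of processing that log-analysis tools automate, the Python sketch below reads a server access log in the widely used common/combined format and tallies successful requests by day of the week. The file name, the regular expression, and the sample entry shown in the comment are assumptions made for the example, not a description of any particular server or of the tools named above.

    import re
    from collections import Counter
    from datetime import datetime

    # Assumed entry format, e.g.:
    # 192.0.2.10 - - [12/Mar/2006:14:32:07 -0500] "GET /unit3/simulation.html HTTP/1.1" 200 5123
    LOG_LINE = re.compile(
        r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
    )

    def requests_by_weekday(log_path):
        """Count successful requests per weekday in a Web server access log."""
        counts = Counter()
        with open(log_path) as log:
            for line in log:
                match = LOG_LINE.match(line)
                if not match or not match.group("status").startswith("2"):
                    continue  # skip malformed lines and unsuccessful requests
                stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
                counts[stamp.strftime("%A")] += 1
        return counts

    if __name__ == "__main__":
        for day, total in requests_by_weekday("access.log").items():  # hypothetical file
            print(day, total)

Grouping the same tallies by page or by user instead gives the kinds of answers described above about which pages were viewed and what paths users followed through the site.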
[Figure 45.2 Sample log analyzer output: average user sessions by day of the week (Sunday through Saturday).]
In another study, Cook et al. (2003a) were able to correlate student access to a university course website with final course grades to obtain an indicator of how helpful the site was to students. The researchers were able to obtain these data because the website required students to log in, and a record of each log-in appeared in the log file, which could be matched to the student grades. Log-file analysis has some limitations (Haigh and Megarity, 1998), but we found that it provided more and better quality data than are generated by, for example, the course management system WebCT (http://www.webct.com).
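A simplified sketch of that kind of analysis appears below: per-student login counts extracted from a log are matched to final grades and a Pearson correlation is computed. The student identifiers and values are invented for illustration and are not data from Cook et al. (2003a).

    from statistics import mean

    def pearson(xs, ys):
        """Pearson correlation between two equal-length lists of numbers."""
        mx, my = mean(xs), mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        var_x = sum((x - mx) ** 2 for x in xs)
        var_y = sum((y - my) ** 2 for y in ys)
        return cov / (var_x * var_y) ** 0.5

    # Hypothetical login counts (tallied from the log file) and final grades.
    logins = {"s01": 42, "s02": 7, "s03": 25, "s04": 60, "s05": 18, "s06": 33}
    grades = {"s01": 78, "s02": 55, "s03": 70, "s04": 85, "s05": 62, "s06": 74}

    students = sorted(logins)  # align the two records by student ID
    r = pearson([logins[s] for s in students], [grades[s] for s in students])
    print(round(r, 2))

On its own such a correlation says nothing about causation, which is one reason the chapter stresses triangulating log data with interviews and other sources.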
Another tool developed by the author and his colleagues to aid in the evaluation of technology-based learning is the Virtual Usability Lab (VULab) (Owston et al., 2005). VULab was originally developed for educational game research, but it is applicable to any Web-based learning research where the learner’s computer is connected to the Internet. The tool allows for the automated integration of a wide range of data sources, ranging from user activity logs, online demographic questionnaire responses, and automatically triggered pop-up questions (see the example in Figure 45.3) to the results of queries designed to appear automatically at key points when users interact with the application. Another feature of VULab is its capability to record the screens and voice conversations of remote users and store the files on the VULab server without the need to install special software on the users’ computers. The data that are collected are stored in an integrated database system, allowing for subsequent data mining and ad hoc querying of the data by researchers. VULab also makes it easy for researchers to set up the parameters for studies and to monitor users automatically, whether they are interacting with computers locally or are scattered across the Internet. Owston et al. (2005) reported on how VULab was used to record student discussions when they were filling out an online questionnaire after playing an online game.
Figure 45.4 Essential (E) and contributing (C) factors to the sustainability of innovative use of technology in the classroom. (Adapted from Owston, R.D., J. Educ. Change, 8(1), 61–77, 2007.)
The students were asked on the questionnaire whether or not they enjoyed playing the game, and a rich discussion of several minutes’ duration ensued among a small group of students playing the game at one computer. When it came time to enter their responses into the questionnaire form, they simply entered “yes”; thus, valuable user feedback would have been lost if it had not been for the VULab recording. The tool also proved useful for identifying role playing among the groups of students playing the game, intra-group competition and collaboration, and pinpointing technical problems within the game itself.

Frequently, evaluations involve collecting large quantities of qualitative data, such as interview transcripts, open-ended responses to questionnaires, diaries, field notes, program documents, and minutes of meetings. Managing and analyzing these files can be simplified using qualitative data analysis (QDA) software tools. Two of the most popular QDA tools are Atlas.ti (http://atlasti.com/) and NVivo (http://www.qsrinternational.com/). These tools do not perform the analysis, but they help in the coding and interpretation of the data. Both of these tools also have a feature that allows researchers to visually map relationships between codes that may lead to theory development; for example, Owston (2007) studied factors that contribute to the sustainability of innovative classroom use of technology. Using Atlas.ti, he mapped the relationships among codes and developed a model (see Figure 45.4) that helps explain why teachers are likely to sustain innovative pedagogical practices using technology. Atlas.ti allows the importing of audio and video files as well as textual files, whereas NVivo does not. In Atlas.ti, these files are coded the same way as textual files; in NVivo, the files cannot be directly imported, but coding of external video and audio files can be done. If a project involves only audio or video, the best strategy may be to use Transana (http://transana.org), which is a free, open-source tool designed for the analysis of these kinds of files. A helpful feature of Transana is that, while audio or video files are being played, a typist can transcribe the voices directly into a separate window within the application.

An excellent website maintained by the Computer-Assisted Qualitative Data Analysis (CAQDAS) Networking Project (see http://caqdas.soc.surrey.ac.uk/) in the United Kingdom provides independent academic comparisons of popular qualitative data analysis tools as well as other helpful resources and announcements. Those new to computerized analysis of qualitative data are well advised to visit this website for guidance in selecting the most appropriate tool to use in an evaluation.

Dissemination

A final issue that needs addressing is the dissemination of evaluation findings. The American Evaluation Association’s Guiding Principles for Evaluators (see http://www.eval.org/Publications/GuidingPrinciples.asp) provides valuable advice to evaluators who are disseminating their results. Evaluators should communicate their methods and approaches accurately and in sufficient detail to allow others to understand, interpret, and critique their work. They should make clear the limitations of an evaluation and its results.
Evaluators should discuss, in a contextually appropriate way, those values, assumptions, theories, methods, results, and analyses that significantly affect the interpretation of the evaluative findings. These statements apply to all aspects of the evaluation, from its initial conceptualization to the eventual use of findings.

Beyond this, the final report should contain no surprises for the stakeholders if evaluators are doing their job properly. That means that there should be an ongoing dialog between the evaluators and stakeholders, including formal and informal progress reports. This allows the stakeholders to make adjustments to the program while it is in progress. At the same time, it is a way of gradually breaking news to the stakeholders if it looks as though serious problems are occurring with the program. Surprising stakeholders at the end of a project with bad news is one way to ensure that the evaluation report will be buried and never seen again! All the evaluation models reviewed in this chapter encourage, to varying degrees, continuous dialog between evaluators and stakeholders for these reasons. The end result should be that the evaluation report is used and its recommendations or implications are given due consideration.

CONCLUSIONS

The challenge facing evaluators of technology-based programs is to design studies that can provide the feedback needed to enhance their design or to provide evidence on their effectiveness. Evaluators need to look broadly across the field of program evaluation theory to help discern the critical elements required for a successful evaluation undertaking. These include attention to aspects such as the audience of the report and their information needs, deciding to what extent the study will be influenced by stated objectives, whether a comparative design will be used, and whether quantitative, qualitative, or a combination of methods will be brought into play. The study should also be guided by the criteria and approaches developed for, or applicable to, the evaluation of e-learning. When these steps are taken, evaluators will be well on their way to devising studies that can address some of the pressing issues facing teaching and learning with technology.

REFERENCES

Agodini, R., Dynarski, M., Honey, M., and Levin, D. (2003). The Effectiveness of Educational Technology: Issues and Recommendations for the National Study, Draft. Washington, D.C.: U.S. Department of Education.
Baker, E. L. and Herman, J. L. (2003). Technology and evaluation. In Evaluating Educational Technology: Effective Research Designs for Improving Learning, edited by G. Haertel and B. Means, pp. 133–168. New York: Teachers College Press.*
Bates, A. and Poole, G. (2003). Effective Teaching with Technology in Higher Education. San Francisco, CA: Jossey-Bass.
Bickman, L. (1987). The functions of program theory. In Using Program Theory in Evaluation: New Directions for Program Evaluation, Vol. 33, edited by L. Bickman, pp. 5–18. San Francisco, CA: Jossey-Bass.*
Bonk, C. J. and Cummings, J. A. (1998). A dozen recommendations for placing the student at the centre of Web-based learning. Educ. Media Int., 35(2), 82–89.
Bonk, C. J., Wisher, R. A., and Lee, J. (2003). Moderating learner-centered e-learning: problems and solutions, benefits and implications. In Online Collaborative Learning: Theory and Practice, edited by T. S. Roberts, pp. 54–85. Hershey, PA: Idea Group Publishing.
Campbell, D. T., Stanley, J. C., and Gage, N. L. (1966). Experimental and Quasi-Experimental Designs for Research. Chicago, IL: Rand McNally.*
Chickering, A. and Ehrmann, S. C. (1996). Implementing the Seven Principles: Technology As Lever, http://www.tltgroup.org/Seven/Home.htm.
Chickering, A. and Gamson, Z. (1987). Seven principles of good practice in undergraduate education. AAHE Bull., 39, 3–7 (http://www.tltgroup.org/Seven/Home.htm).
Cook, K., Cohen, A. J., and Owston, R. D. (2003a). If You Build It, Will They Come? Students’ Use of and Attitudes towards Distributed Learning Enhancements in an Introductory Lecture Course, Institute for Research on Learning Technologies Technical Report 2003-1. Toronto: York University (http://www.yorku.ca/irlt/reports.html).
Cook, T. D. (2000). The false choice between theory-based evaluation and experimentation. New Direct. Eval. Challenges Oppor. Program Theory Eval., 87, 27–34.
Cook, T. D., Means, B., Haertel, G., and Michalchik, V. (2003b). The case for using randomized experiments in research on newer educational technologies: a critique of the objections raised and alternatives. In Evaluating Educational Technology: Effective Research Designs for Improving Learning, edited by G. Haertel and B. Means. New York: Teachers College Press.
Cronbach, L. J. (1980). Toward Reform of Program Evaluation. San Francisco, CA: Jossey-Bass.*
Eisner, E. W. (1979). The Educational Imagination: On the Design and Evaluation of School Programs. New York: Macmillan.*
Feuer, M. J., Towne, L., and Shavelson, R. J. (2002). Scientific culture and educational research. Educ. Res., 31, 4–14.
Graham, C., Cagiltay, K., Craner, J., Lim, B., and Duffy, T. M. (2000). Teaching in a Web-Based Distance Learning Environment: An Evaluation Summary Based on Four Courses, Center for Research on Learning and Technology Technical Report No. 13-00. Bloomington: Indiana University (http://crlt.indiana.edu/publications/crlt00-13.pdf).
Guba, E. G. (1978). Toward a Method of Naturalistic Inquiry in Educational Evaluation, Center for the Study of Evaluation Monograph Series No. 8. Los Angeles: University of California at Los Angeles.*
Guskey, T. R. (2000). Evaluating Professional Development. Thousand Oaks, CA: Corwin Press.
Haigh, S. and Megarity, J. (1998). Measuring Web Site Usage: Log File Analysis. Ottawa, ON: National Library of Canada (http://www.collectionscanada.ca/9/1/p1-256-e.html).
Kirkpatrick, D. L. (2001). Evaluating Training Programs: The Four Levels, 2nd ed. San Francisco, CA: Berrett-Koehler.*
Mandinach, E. B. (2005). The development of effective evaluation methods for e-learning: a concept paper and action plan. Teachers Coll. Rec., 107(8), 1814–1835.
Olson, D. R. (2004). The triumph of hope over experience in the search for ‘what works’: a response to Slavin. Educ. Res., 33(1), 24–26.
Owston, R. D. (2000). Evaluating Web-based learning environments: strategies and insights. CyberPsychol. Behav., 3(1), 79–87.*
Owston, R. D. (2007). Contextual factors that sustain innovative pedagogical practice using technology: an international study. J. Educ. Change, 8(1), 61–77.
Owston, R. D. and Wideman, H. H. (1999). Internet-Based Courses at Atkinson College: An Initial Assessment, Centre for the Study of Computers in Education Technical Report No. 99-1. Toronto: York University (http://www.yorku.ca/irlt/reports.html).
Owston, R. D., Kushniruk, A., Ho, F., Pitts, K., and Wideman, H. (2005). Improving the design of Web-based games and simulations through usability research. In Proceedings of ED-MEDIA 2005: World Conference on Educational Multimedia, Hypermedia, and Telecommunications, June 29–July 1, Montreal, Canada, pp. 1162–1167.
Patton, M. Q. (1978). Utilization-Focused Evaluation. Beverly Hills, CA: SAGE.*
Patton, M. Q. (2002). Qualitative Evaluation and Research Methods, 3rd ed. Thousand Oaks, CA: SAGE.
Ravitz, J. (1998). Evaluating learning networks: a special challenge for Web-based instruction. In Web-Based Instruction, edited by B. Khan, pp. 361–368. Englewood Cliffs, NJ: Educational Technology Publications.
Riel, M. and Harasim, L. (1994). Research perspectives on network learning. Machine-Mediated Learning, 4(2/3), 91–113.
Rogers, P. J., Hacsi, T. A., Petrosino, A., and Huebner, T. A., Eds. (2000). Program Theory in Evaluation: Challenges and Opportunities, New Directions for Evaluation, No. 87. San Francisco, CA: Jossey-Bass.
Scanlon, E., Jones, A., Barnard, J., Thompson, J., and Calder, J. (2000). Evaluating information and communication technologies for learning. Educ. Technol. Soc., 3(4), 101–107.
Scriven, M. (1972). Pros and cons about goal free evaluation. Eval. Comm., 3(4), 1–7.*
Stake, R. E. (1975). Evaluating the Arts in Education: A Responsive Approach. Columbus, OH: Merrill.*
Stufflebeam, D. L. (1973). An introduction to the PDK book: educational evaluation and decision-making. In Educational Evaluation: Theory and Practice, edited by B. L. Worthen and J. R. Sanders, pp. 128–142. Belmont, CA: Wadsworth.*
Suchman, E. (1967). Evaluative Research: Principles and Practice in Public Service and Social Action Programs. New York: Russell Sage Foundation.
Tyler, R. W. (1942). General statement on evaluation. J. Educ. Res., 35, 492–501.
Weiss, C. H. (1972). Evaluation Research: Methods for Assessing Program Effectiveness. Englewood Cliffs, NJ: Prentice Hall.*
Wideman, H. H., Owston, R. D., and Quann, V. (1998). A Formative Evaluation of the VITAL Tutorial ‘Introduction to Computer Science,’ Centre for the Study of Computers in Education Technical Report No. 98-1. Toronto: York University (http://www.yorku.ca/irlt/reports.html).
Worthen, B. L. and Sanders, J. R. (1987). Educational Evaluation: Alternative Approaches and Practical Guidelines. New York: Longman.*

* Indicates a core reference.