
45

Models and Methods for Evaluation


Ron Owston
York University, Toronto, Canada

CONTENTS
Introduction .....................................................................................................................................................................606
General Program Evaluation Models..............................................................................................................................606
Evolution of Program Evaluation..........................................................................................................................606
Decision-Making Evaluation Approaches .............................................................................................................607
Naturalistic Evaluation Approaches ......................................................................................................................607
Kirkpatrick’s Four Levels ......................................................................................................................................608
Technology Evaluation Approaches................................................................................................................................608
Implications for the Evaluation of Technology ..............................................................................................................610
Design of Study .....................................................................................................................................................612
Data Sources and Analysis ....................................................................................................................................613
Dissemination ........................................................................................................................................................615
Conclusions .....................................................................................................................................................................616
References .......................................................................................................................................................................616

ABSTRACT

This chapter situates the evaluation of technology-based programs in the context of the field of general educational program evaluation. It begins with an overview of the main evaluation approaches developed for general educational programs, including Tyler's early conception of assessing attainment of program objectives, decision-making approaches, naturalistic evaluation, and Kirkpatrick's four levels for evaluating program effectiveness. Following this is an overview of commonly used technology-specific program evaluation criteria and frameworks. Strategies distilled from these two fields are then suggested for evaluating technology-based learning programs. These strategies emphasize clarifying the goal or purpose of the evaluation and determining the information needs of the intended audiences of the evaluation at the beginning of the project. This, in turn, suggests the most appropriate evaluation methodology to be used. The chapter concludes with a description of tools that can be used for analysis of evaluative data, followed by a brief discussion of the dissemination of evaluation results.
KEYWORDS

Effect size: A statistical measure of the difference between the mean of the control group and the mean of the experimental group in a quantitative research study.

Evaluation: The process of gathering information about the merit or worth of a program for the purpose of making decisions about its effectiveness or for program improvement.

Naturalistic evaluation: An evaluation approach that relies on qualitative methodology but gives evaluators freedom to choose the precise method used to collect, analyze, and interpret their data.

Web log file: A data file residing on a Web server that contains a record of all visitors to the site hosted by the server, where they came from, what links they clicked on, as well as other information.

INTRODUCTION

New technologies that have potential implications for learning are being developed almost daily: blogs, wikis, podcasting, response clickers, interactive pads and whiteboards, advanced educational games and simulations, and social websites, to name a few. Although individual teachers are always willing to pioneer the use of these technologies in their classrooms, system administrators often face the challenge of having to make informed decisions on whether these technologies should be adopted on a wider scale or integrated into curricula. The main criterion for their adoption frequently is how effective they are at improving learning. Because of the newness of the technologies, seldom do we have any compelling evidence of their effectiveness apart from anecdotal accounts of early adopters. This inevitably leads to a call for a formal evaluation of programs that employ the technology.

The goal of this chapter is to provide guidance to those charged with the evaluation of technology-based programs on how to approach the task. What is very apparent from an examination of the literature on technology program evaluation is the large gap between it and the literature on the general field of program evaluation. As will be seen from the discussion that follows, program evaluation has become a mature field of study that offers a variety of approaches and perspectives from which the evaluator can draw. Those writing about technology evaluation tend either to ignore the field or to give it only cursory attention on the way to developing their own approaches, so another goal of this chapter is to bridge the gap between these two fields. I take the position that technology-based program evaluation is a particular case of general program evaluation; therefore, the methods and tools in the program evaluation literature are equally applicable to technology evaluation. At the same time, the criteria that technology program evaluators offer can inform the more general evaluation approaches.

This chapter begins with a discussion of the field of general program evaluation and outlines some of the more influential evaluation approaches that have emerged. Following this is an overview of common technology program evaluation criteria and frameworks. Drawing from these two areas, I then suggest strategies that can be used to evaluate technology-based learning programs and describe several new data collection and analysis software tools that can help evaluators.

GENERAL PROGRAM EVALUATION MODELS

Evolution of Program Evaluation

Prior to the 1970s, educational program evaluators tended to concentrate on determining the extent to which a program met its stated objectives, a model first advocated by Tyler (1942) in a longitudinal study of schools in the 1930s. That model seemed sensible enough and served a generation or two of educators well, but during the 1960s and 1970s researchers began developing new evaluation models that went far beyond Tyler's original conception of evaluation.

The models that emerged were developed in response to the need to provide accountability for large U.S. government program expenditures in health, education, and welfare during this period. Scriven (1972) argued that evaluators must not be blinded by examining only the stated goals of a project as other program outcomes may be equally important. By implication, Scriven urged evaluators to cast a wide net in evaluating the results of a program by looking at both the intended and unintended outcomes. In fact, he went as far as advising evaluators to avoid the rhetoric around the program by not reading program brochures, proposals, or descriptions and to focus only on the actual outcomes. Scriven also popularized the terms formative and summative evaluation as a way of distinguishing two kinds of roles evaluators play: They can assess the merits of a program while it is still under development, or they can assess the outcomes of an already completed program. In practice, these two roles are not always as clearly demarcated as Scriven suggests; nonetheless, this distinction between the two purposes of evaluation is still widely drawn on today.
Suchman (1967) argued that evaluating the attainment of a program's goals is still essential, but more critical is to understand the intervening processes that led to those outcomes. He suggested that an evaluation should test a hypothesis such as: "Activity A will attain objective B because it is able to influence process C, which affects the occurrence of this objective" (p. 177). Following this reasoning, Weiss (1972) showed how a model could be developed and tested to explain how a chain of events in a teacher home visit program could lead to the ultimate objective of improving children's reading achievement. This early work led to the development of an approach known today as theory-based evaluation, theory-driven evaluation, or program theory evaluation (PTE). PTE consists of two basic elements: an explicit theory or model of how the program causes the intended or observed outcomes and an actual evaluation that is at least guided by the model (Rogers et al., 2000). The theory component is not a grand theory in the traditional social science sense, but rather it is a theory of change or plausible model of how a program is supposed to work (Bickman, 1987). The program model, often called a logic model, is typically developed by the evaluator in collaboration with the program developers, either before the evaluation takes place or afterwards. Evaluators then collect evidence to test the validity of the model. PTE does not suggest a methodology for testing the model, although it is often associated with qualitative methodology. Cook (2000) argues that program theory evaluators who use qualitative methods cannot establish that the observed program outcomes were caused by the program itself, as causality can only be established through experimental design. Generally speaking, the contribution of PTE is that it forces evaluators to move beyond treating the program as a black box and leads them to examining why observed changes arising from a program occurred.

Decision-Making Evaluation Approaches

During the same period, other evaluators focused on how they could help educational decision makers. Best known is Stufflebeam (1973), who viewed evaluation as a process of providing meaningful and useful information for decision alternatives. Stufflebeam proposed his context, input, process, and product (CIPP) model, which describes four kinds of evaluative activities. Context evaluation assesses the problems, needs, and opportunities present in the educational program's setting. Input evaluation assesses competing strategies and the work plans and budgets. Process evaluation monitors, documents, and assesses program activities. Product evaluation examines the impact of the program on the target audience, the quality and significance of outcomes, and the extent to which the program is sustainable and transferable. In essence, the CIPP model asks of a program: What needs to be done? How should it be done? Is it being done? Did it succeed? Stufflebeam also reconciled his model with Scriven's formative and summative evaluation by stating that formative evaluation focuses on decision making and summative evaluation on accountability.

Another popular approach that emerged was Patton's (1978) utilization-focused evaluation. Patton addressed the concern that evaluation findings are often ignored by decision makers. He probed evaluation program sponsors to attempt to understand why this is so and how the situation could be improved. From this study, he developed not so much an evaluation model as a general approach to evaluation that has only two fundamental requirements. First, he stated that relevant decision makers and evaluation report audiences must be clearly identified. Second, he maintained that evaluators must work actively with the decision makers to decide upon all other aspects of the evaluation, including such matters as the evaluation questions, research design, data analysis, interpretation, and dissemination. Patton admitted that the challenge of producing evaluation studies that are actually used is enormous but remained optimistic that it is possible and worth attempting.

Cronbach (1980), a student of Tyler, also focused on the decision-making process. His contribution was to emphasize the political context of decision making, saying that it is seldom a lone person who makes decisions about a program; rather, decisions are more likely to be made in a lively political setting by a policy-shaping community. Cronbach advocated that the evaluator should be a teacher, educating the client group throughout the evaluation process by helping them refine their evaluation questions and determine what technical and political actions are best for them. During this educative process, the evaluator is constantly giving feedback to the clients, and the final evaluation report is only one more vehicle for communicating with them. Unlike the other evaluation theorists mentioned above, Cronbach did not believe that the evaluator should determine the worthiness of a program nor provide recommended courses of action.

Naturalistic Evaluation Approaches

At the same time these researchers were developing approaches that focused on how evaluation results are used, others concentrated their efforts on developing methods that place few, if any, constraints on the evaluator. Known as naturalistic or qualitative, these approaches give the evaluator freedom to choose the methods used to collect, analyze, and interpret their data.
Stake's (1975) responsive evaluation is one such model. Stake was concerned that conventional approaches were not sufficiently receptive to the needs of the evaluation client. He advocated that evaluators must attend to actual program activities rather than intents, respond to the audience's needs for information, and present different value perspectives when reporting on the success and failure of a program. Stake believed that evaluators should use whatever data-gathering schemes seem appropriate; however, he did emphasize that they will likely rely heavily on human observers and judges. Rather than relying on methodologies of experimental psychology, as is often done in conventional evaluations, Stake saw evaluators drawing more from the traditions of anthropology and journalism in carrying out their studies.

Two other approaches are of interest in this discussion of naturalistic methods. First is Eisner's (1979) connoisseurship model, which is rooted in the field of art criticism. His model relies on the evaluator's judgment to assess the quality of an educational program, just as the art critic appraises the complexity of a work of art. Two concepts are key to Eisner's model: educational connoisseurship and educational criticism. Educational connoisseurship involves the appreciation of the finer points of an educational program, a talent that derives from the evaluator's experience and background in the domain. Educational criticism relies on the evaluator's ability to verbalize the features of the program, so those who do not have the level of appreciation that the connoisseur has can fully understand the program's features.

The second approach is ethnographic evaluation, which its proponents believe can yield a more meaningful picture of an educational program than would be possible using traditional scientific methods (Guba, 1978). Ethnographic evaluators immerse themselves in the program they are studying by taking part in the day-to-day activities of the individuals being studied. Their data-gathering tools include field notes, key informant interviews, case histories, and surveys. Their goal is to produce a rich description of the program and to convey their appraisal of the program to the program stakeholders.

Kirkpatrick's Four Levels

Although it is well established in the human resource development community, Kirkpatrick's (2001) four-level model is less known in educational evaluation circles because it focuses on the evaluation of corporate training programs. I have placed it in a category by itself because it has little in common with the other models discussed, as Kirkpatrick does not emphasize negotiation with the decision makers nor does he favor a naturalistic approach. Kirkpatrick's first writing on the model dates back over 40 years, but it was not until more recently that he provided a detailed elaboration of its features. Even though it focuses on training program evaluation, the model is still relevant to general educational settings; for example, Guskey (2000) adapted it for the evaluation of teacher professional development programs.

Kirkpatrick proposed four levels that the evaluator must attend to: reaction, learning, behavior, and results. Reaction refers to the program participants' satisfaction with the program; the typical course evaluation survey measures reaction. Learning is the extent to which participants change attitudes, improve their knowledge, or increase their skills as a result of attending the program; course exams, tests, or surveys measure this kind of change. The next two levels are new to most educational evaluators and are increasingly more difficult to assess. Behavior refers to the extent to which participants' behavior changes as a result of attending the course; to assess this level, the evaluator must determine whether participants' new knowledge, skills, or attitudes transfer to the job or another situation such as a subsequent course. The fourth evaluation level, results, focuses on the lasting changes to the organization that occurred as a consequence of the course, such as increased productivity, improved management, or improved quality. In a formal educational setting, the fourth evaluation level could refer to assessing how students perform on the job after graduation. Kirkpatrick has recommended the use of control group comparisons to assess a program's effectiveness at these two higher levels, if at all possible.

TECHNOLOGY EVALUATION APPROACHES

So far I have concentrated on models that are applicable to a wide range of educational programs, whether or not they might involve technology. Several frameworks have been proposed specifically to assess technology-based learning, although none has been employed much by researchers other than their developers. These frameworks tend to recommend areas in which evaluators should focus their data collection, provide criteria against which technology-based learning could be judged, or provide questions for the evaluator to ask. For example, Riel and Harasim (1994) proposed three areas on which data collection might focus for the evaluation of online discussion groups: the structure of the network environment, social interaction that occurs during the course or project, and the effects of the experience on individuals.
TABLE 45.1
CIAO! Framework

Rationale
  Context: To evaluate technology, we need to know about its aims and the context of its use.
  Interactions: Observing students and obtaining process data help us to understand why and how some element works, in addition to whether or not it works.
  Outcomes: Being able to attribute learning outcomes to technology when it is one part of a multifaceted course is very difficult. It is important to try to assess both cognitive and affective learning outcomes (e.g., changes in perceptions and attitudes).

Data
  Context: Designers' and course teams' aims; policy documents and meeting records
  Interactions: Records of student interactions; student diaries; online logs
  Outcomes: Measures of learning; changes in students' attitudes and perceptions

Methods
  Context: Interviews with technology program designers and course team members; analysis of policy documents
  Interactions: Observation; diaries; video/audio and computer recording
  Outcomes: Interviews; questionnaires; tests

Source: Adapted from Scanlon, E. et al., Educ. Technol. Soc., 3(4), 101–107, 2000.

Bates and Poole's (2003) SECTIONS model calls for the comparison of two or more online instructional delivery modes on the basis of the appropriateness of the technology for the targeted students, its ease of use and reliability, costs, teaching and learning factors, interactivity fostered by the technology, organizational issues, novelty of the technology, and how quickly courses can be mounted and updated. Ravitz (1998) suggested a framework that encourages the assessment of a project's evolution through interactive discussion, continual recordkeeping, and documentation. Mandinach (2005) has given evaluators a set of key questions to ask about an e-learning program in three general areas: student learning, pedagogical and institutional issues, and broader policy issues. Finally, Baker and Herman (2003) have proposed an approach, which they call distributed evaluation, to deal with large-scale, longitudinal evaluation of technology. They emphasize clarifying evaluation goals across all stakeholders, using a variety of quantitative and qualitative measures ranging from questionnaires and informal classroom tests to standardized tests, designing lengthier studies so changes can be assessed over time, collecting data at the local level and entering them into a systemwide repository, and providing feedback targeted at various audiences.

Of particular note because of its origins and comprehensiveness is the context, interactions, attitudes, and outcomes (CIAO!) framework developed by Scanlon et al. (2000). The CIAO! framework represents a culmination of some 25 years of technology evaluation experience of the authors at the Open University in the United Kingdom. As shown in Table 45.1, the columns in the framework represent three dimensions of the technology-based learning program that must be evaluated: the context dimension concerns how the technology fits within the course and where and how it is used; interactions refers to how students interact with the technology and with each other; and outcomes deals with how students change as a result of using the technology. The first row of the framework provides a brief rationale for the need to evaluate each of the three dimensions. The second and third rows, respectively, highlight the kinds of data that should be collected for each dimension and the methods that should be employed for each. The authors point out that, while the framework has proven to be very valuable in highlighting areas in which evaluative data should be collected, caution should be exercised in not applying the framework in an overly prescriptive manner.

Perhaps the most widely used criteria for evaluating teaching with technology in higher education are the Seven Principles for Good Practice in Undergraduate Education, described in a seminal article by Chickering and Gamson (1987). Almost 10 years after this article was published, Chickering and Ehrmann (1996) illustrated how the criteria, which were distilled from decades of research on the undergraduate education experience, could be adapted for information and communication technologies. Briefly, the criteria suggest that faculty should:

• Encourage contact between students and the faculty.
• Develop reciprocity and cooperation among students.
• Encourage active learning.
• Give prompt feedback.
• Emphasize time on task.
• Communicate high expectations.
• Respect diverse talents and ways of learning.
Graham and colleagues applied the criteria to the evaluation of four online courses in a professional school of a large midwestern American university (Graham et al., 2000). The evaluation team developed a list of "lessons learned" for online instruction, aimed at improving the courses and corresponding to the seven principles. Similarly, Cook et al. (2003a) applied the criteria to the evaluation of a technology-enhanced undergraduate economics course. They used the principles as the basis of codes for the qualitative analysis of open-ended student survey responses and assessed the extent to which the criteria were exemplified in the course.

Although the Seven Principles describe effective teaching from the faculty member's perspective, the American Psychological Association has produced an often-cited list of 14 principles that pertain to the learner and the learning process (see http://www.apa.org/ed/lcp2/lcp14.html). The learner-centered principles are intended to deal holistically with learners in the context of real-world learning situations; thus, they are best understood as an organized set of principles that influence the learner and learning, with no principle viewed in isolation. The 14 principles, which are grouped into four main categories, are as follows:

• Cognitive and metacognitive (six principles): Nature of the learning process; goals of the learning process; construction of knowledge; strategic thinking; thinking about thinking; context of learning
• Motivational and affective (three principles): Motivational and emotional influences on learning; intrinsic motivation to learn; effects of motivation on effort
• Developmental and social (two principles): Developmental influences on learning; social influences on learning
• Individual difference factors (three principles): Individual differences in learning; learning and diversity; standards and assessment

Bonk and Cummings (1998) discussed how these principles are relevant for the design of online courses from a learner-centered perspective and for providing a framework for the benefits, implications, problems, and solutions of online instruction. By implication, the APA principles could serve as criteria to guide the evaluation of the effectiveness of technology-based learning environments.

IMPLICATIONS FOR THE EVALUATION OF TECHNOLOGY

What should be abundantly clear at this point is the surfeit of evaluation approaches, criteria, and models. Few experienced evaluators, however, pick one model and adhere to it for all of their work; they are more likely to draw upon different aspects of several models. Worthen and Sanders (1987, p. 151) expressed this well:

  The value of alternative approaches lies in their capacity to help us think, to present and provoke new ideas and techniques, and to serve as mental checklists of things we ought to consider, remember, or worry about. Their heuristic value is very high; their prescriptive value seems much less.

Several implications can be drawn from this discussion of models so far that will help in making decisions about the design of technology-based program evaluations. These are summarized in Figure 45.1. First, we must clarify why we are proposing an evaluation: Is it to assess a blended learning course developed by a faculty member who was given a course development grant? Is it to evaluate an elementary school laptop computer initiative? Is it being conducted because students are expressing dissatisfaction with an online course? Is it to see how an online professional learning community facilitates pedagogical change? The purpose of the evaluation will lead us to favor one approach over another; for example, in the case of the faculty member developing a course, the Seven Principles and/or the APA's learner-centered principles may be good criteria to judge the course. The Seven Principles may also be appropriate to guide the evaluation of the course where there is student dissatisfaction. On the other hand, in the case of the professional learning community, Kirkpatrick's model (or Guskey's extension of it) would direct us not only to examining teachers' perceptions of and learnings in the community but also to studying the impact of the program on classroom practice. Table 45.2 provides additional guidance on selecting a model from among the most widely used ones for six common program evaluation purposes. Readers should exercise caution when interpreting the table, as there are no hard and fast rules about what model to use for a given purpose. Rarely is one model the only appropriate one to use in an evaluation; however, more often than not some models are better than others for a particular study.

We next have to give careful thought to who the intended audiences of the evaluation report are and should plan on providing those individuals with the kinds of data needed to take appropriate action.
Figure 45.1 Decisions for designing an evaluation study: purpose of evaluation → needs of audience → evaluation design (experimental, with random or non-random assignment, or qualitative) → data sources → dissemination strategies.

TABLE 45.2
Evaluation Models Best Suited for Particular Evaluation Purposes

The table matches evaluation models against six primary evaluation purposes: attainment of the program's goals and objectives, program improvement, accreditation of the program, development of theory about the intervention, meeting the information needs of diverse audiences, and overall impact of the program. The models compared are goal-based (Tyler, 1942), goal-free evaluation (Scriven, 1972), theory-based (Weiss, 1972), context, input, process, and product (CIPP) (Stufflebeam, 1973), utilization-focused (Patton, 1978), responsive (Stake, 1975), connoisseurship (Eisner, 1979), ethnographic (Guba, 1978), multilevel (Guskey, 2000; Kirkpatrick, 2001), the CIAO! framework (Scanlon et al., 2000), and the seven principles of good practice in undergraduate education (Chickering and Ehrmann, 1996).

Recall Stufflebeam's statement that the purpose of evaluation is to present options to decision makers. In a university setting, the decision makers or stakeholders might be a faculty member who is teaching an online course, a curriculum committee, a technology roundtable, a faculty council, or senior academic administrators.
The stakeholders in a school setting could be a combination of parents, teachers, a school council, and the district superintendent. The challenge to the evaluator, therefore, is to identify these audiences and then find out what their expectations are for the evaluation and the kind of information they seek about the program. Patton, Cronbach, and Stake all emphasized the critical importance of this stage. The process may involve face-to-face meetings with the different stakeholders, telephone interviews, or brief surveys. Because consensus in expectations is unlikely to be found, the evaluator will have to make judgments about the relative importance of each stakeholder and whose information should be given priority.

With the expectations and information needs in hand, the study now must be planned. We saw from Scriven's perspective that all program outcomes should be examined whether or not they are stated as objectives. My experience has taught me not only to assess the accomplishment of program objectives, as this is typically what stakeholders want done, but also to seek data on unintended outcomes, whether positive or negative, as they can lead to insights one might otherwise have missed.

Design of Study

Next the evaluator must decide upon the actual design of the study. A major decision has to be made about whether to embark on an experimental design involving a comparison group or a non-experimental design. The information needs of the stakeholders should determine the path to follow (Patton, 1978; Stake, 1975). If the stakeholders seek proof that a technology-based program works, then an experimental design is what is likely required. The What Works Clearinghouse established by the U.S. Department of Education's Institute of Education Sciences holds experimental designs as the epitome of "scientific evidence" for determining the effectiveness of educational interventions (http://www.w-w-c.org). On the other hand, if the stakeholders seek information on how to improve a program, then non-experimental or qualitative approaches may be appropriate. Some even argue that defining a placebo and treatment does not make sense given the nature of education; hence, accumulation of evidence over time and qualitative studies are a more meaningful means of determining what works (Olson, 2004).

If a decision is made to conduct a randomized experimental study, Cook et al. (2003b) offer some helpful advice. They suggest that, rather than asking a broad question such as, "Do computers enhance learning?" (p. 18), the evaluator should formulate a more precise question that will address the incremental impact of technology within a more global experience of technology use. The authors illustrate, for example, how a study could be designed around a narrower question: "What effect does Internet research have on student learning?" (p. 19). Rather than simply comparing students who do research on the Internet with those who do not, they created a factorial design in which the presence or absence of Internet research is linked to whether teachers do or do not instruct students on best practices for Internet research. The result is four experimental conditions: best practice with Internet, best practice without Internet, typical Internet practice, and a control group whose teacher neither encourages nor discourages students from doing Internet research. The authors' recommendation echoes that offered by Carol Weiss some time ago when she made the point that the control group does not necessarily have to receive no treatment at all; it can receive a lesser version of the treatment program (Weiss, 1972). This advice is particularly relevant when speaking of technology, as it is commonly used by students today either in classrooms or outside of school, so to expect that the control group contains students who do not use technology would be unrealistic.

A problem that Cook et al. (2003b) mention only in passing is that of sample size and units of analysis, which are key considerations in an experimental study. In a report commissioned by the U.S. Institute of Education Sciences, Agodini et al. (2003) analyzed these issues when developing specifications for a national study on the effectiveness of technology applications on student achievement in mathematics and reading. The authors concluded that an effect size of 0.35 would be a reasonable minimum goal for such a study because previous studies of technology have detected effects of this size, and it was judged to be sufficiently large to close the achievement gaps between various segments of the student population. An effect size of 0.35 means that the difference between the treatment and control group means is 35% of the standard deviation of the outcome measure being considered. To achieve this effect size would require the following number of students under the given conditions of random assignment:
• Students randomly assigned to treatments would require 10 classrooms with 20 students in each (total of 200 students).
• Classrooms randomly assigned to treatments would require 30 classrooms with 20 students in each (total of 600 students) for a study of the effects of technology on reading achievement; however, 40 classrooms with 20 students (total of 800 students) would be required for mathematics because of statistical considerations on the way mathematics scores cluster.
• Schools randomly assigned to treatments would require 29 schools with 20 students in each (total of 1160 students).
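To make the effect-size arithmetic concrete, the short Python sketch below computes the standardized mean difference described in the keywords: the difference between the experimental and control group means divided by the pooled standard deviation of the outcome measure. The scores and group sizes are hypothetical, chosen only so that the result lands at the 0.35 benchmark discussed above; they are not drawn from Agodini et al. (2003).

    import math

    def pooled_sd(sd1, n1, sd2, n2):
        # Pooled standard deviation of two independent groups
        return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

    def effect_size(mean_exp, sd_exp, n_exp, mean_ctrl, sd_ctrl, n_ctrl):
        # Standardized mean difference: experimental minus control mean,
        # divided by the pooled standard deviation of the outcome measure
        return (mean_exp - mean_ctrl) / pooled_sd(sd_exp, n_exp, sd_ctrl, n_ctrl)

    # Hypothetical reading scores, 100 students per condition
    d = effect_size(mean_exp=78.5, sd_exp=10.0, n_exp=100,
                    mean_ctrl=75.0, sd_ctrl=10.0, n_ctrl=100)
    print(round(d, 2))  # 0.35 -- the minimum detectable effect targeted above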
The first condition of random assignment of students to treatment is not likely a very feasible option in most schools, so the evaluator is left with the choice of random assignment to classrooms or to schools, both of which would require many more students. The result is that an evaluation of technology using an experimental design would likely be a fairly costly undertaking if these guidelines are followed.

Unfortunately, even random assignment to classrooms or schools may be problematic; therefore, the evaluator is left with having to compare intact classes, a design that is weak (Campbell et al., 1966). Finding teachers or students from an intact class to act as a comparison group is difficult. Even if their cooperation is obtained, so many possible competing hypotheses could explain any differences found between experimental and comparison groups (e.g., the comparison group may have an exceptional teacher or the students in the experimental group may be more motivated) that they undermine the validity of the findings.

When the goal of the study is program improvement rather than proving the program works, qualitative approaches such as those of Stake and of Guba described earlier in this chapter are particularly appropriate. Owston (2000) argued that the mixing of both qualitative and quantitative methods shows stronger potential for capturing and understanding the richness and complexity of e-learning environments than if either approach is used solely. Although some methodologists may argue against mixing research paradigms, I take a more pragmatic stance that stresses the importance and predominance of the research questions over the paradigm. This approach frees the evaluator to choose whatever methods are most appropriate to answer the questions once they are articulated. Ultimately, as Feuer et al. (2002) pointed out, "No method is good, bad, scientific, or unscientific in itself; rather, it is the appropriate application of method to a particular problem that enables judgments about scientific quality."

Data Sources and Analysis

When the basic design of the study is developed, the next decision will be to determine the evaluation data sources. Generally, the best strategy is to use as many different sources as practical, such as test scores or scores on other dependent measures, individual and focus group interviews of students and teachers, Web-based survey data, relevant program documents, and classroom observation. The use of multiple data sources is standard practice in qualitative evaluation, as the need to triangulate observations is essential (Patton, 2002). In experimental studies, other qualitative and quantitative data sources may be used to help explain and interpret observed differences on dependent measures.

Log files generated by Web servers are a relatively new source of data that can be used to triangulate findings from surveys and interviews when the technology being evaluated is Web based. These files contain a record of communication between a Web browser and a Web server in text-based form. The files vary slightly depending on the type of server, but most Web servers record the following information:

• Address of the computer requesting a file
• Date and time of the request
• Web address of the file requested
• Method used for the requested file
• Return code from the Web server that specifies if the request was successful or failed and why
• Size of the file requested

Web server log files do not reveal or record the content of a Web browser request, only the fact that a request was made. Because each Web page has a distinct address, it is possible to determine that a user viewed a particular page. Log files grow to be exceedingly large and are often discarded by system administrators; however, evaluators can analyze the files using commercial tools such as WebTrends Log Analyzer (http://www.webtrends.com) or freeware tools such as AWStats (http://awstats.sourceforge.net). Output from the tools can be in tabular or graphical format (see Figure 45.2 for sample output). The tools can be used by the evaluator to answer questions such as what time of day or week users were accessing the system, how long they were logged into the system, what pages they viewed, and what paths they followed through the website. Figure 45.2 is typical of the graphical output that may be obtained on the average number of users visiting a website per day of the week.

Figure 45.2 Sample output from log file analysis: a bar chart of user sessions by day of the week (Sunday through Saturday).
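To show how such questions can also be answered directly from a raw log, the Python sketch below tallies successful requests per day of the week, the kind of day-of-week summary charted in Figure 45.2 (which reports user sessions rather than raw requests). The sketch is illustrative only: it assumes an NCSA-style common log format and a file named access.log, both hypothetical here, and it does none of the sessionization, path, or duration analysis that dedicated tools such as AWStats or WebTrends provide.

    import re
    from collections import Counter
    from datetime import datetime

    # Matches a typical common-log-format line, e.g.:
    # 192.168.1.10 - - [12/Mar/2006:14:31:08 -0500] "GET /unit3/sim.html HTTP/1.1" 200 5124
    LINE = re.compile(r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
                      r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)')

    def requests_by_weekday(path):
        # Count successful requests per day of the week
        counts = Counter()
        with open(path) as log:
            for line in log:
                match = LINE.match(line)
                if not match or not match.group('status').startswith('2'):
                    continue  # skip malformed lines and unsuccessful requests
                when = datetime.strptime(match.group('time'), '%d/%b/%Y:%H:%M:%S %z')
                counts[when.strftime('%a')] += 1
        return counts

    print(requests_by_weekday('access.log'))  # e.g., Counter({'Tue': 54, 'Wed': 41, ...})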
The author and his colleagues have used log file analysis successfully in several technology evaluation studies. In one study, Wideman et al. (1998) found that students in a focus group said they made frequent use of a simulation routine in an online course, but the log files revealed that the routine was seldom used. In another study, Cook et al. (2003a) were able to correlate student access to a university course website to final course grades to obtain an indicator of how helpful the site was to students. The researchers were able to obtain these data because the website required students to log in, and a record of each log-in appeared in the log file, which could be matched to the student grades. Log-file analysis has some limitations (Haigh and Megarity, 1998), but we found that it provided more and better quality data than are generated by, for example, the course management system WebCT (http://www.webct.com).
Another tool developed by the author and his colleagues to aid in the evaluation of technology-based learning is the Virtual Usability Lab (VULab) (Owston et al., 2005). VULab was originally developed for educational game research, but it is applicable to any Web-based learning research where the learner's computer is connected to the Internet. The tool allows for the automated integration of a wide range of sources of data, ranging from user activity logs, online demographic questionnaire responses, and data from automatically triggered pop-up questions (see example in Figure 45.3) to the results of queries designed to automatically appear at key points when users interact with the application. Another feature of VULab is its capability to record the screens and voice conversations of remote users and store the files on the VULab server without the need to install special software on the users' computers. The data that are collected are stored in an integrated database system, allowing for subsequent data mining and ad hoc querying of the data by researchers. VULab also allows for ease of use for researchers in setting up the parameters for studies and automatically monitoring users, whether they are interacting with computers locally or are scattered across the Internet. Owston et al. (2005) reported on how VULab was used to record student discussions when they were filling out an online questionnaire after playing an online game.

Figure 45.3 Screen shot of VULab.
Figure 45.4 Essential (E) and contributing (C) factors to the sustainability of innovative use of technology in the classroom. The model relates sustainability of innovation to supportive plans and policies, support from outside the school, support within the school, funding, innovation champions, administrative support, teacher support, teacher professional development, student support, and perceived value of the innovation. (Adapted from Owston, R. D., J. Educ. Change, 8(1), 61–77, 2007.)
The students were asked on the questionnaire whether or not they enjoyed playing the game, and a rich discussion of several minutes' duration ensued among a small group of students playing the game at one computer. When it came time to enter their responses into the questionnaire form, they simply entered "yes"; thus, valuable user feedback would have been lost if it had not been for the VULab recording. The tool also proved useful for identifying role playing among the groups of students playing the game, for observing intra-group competition and collaboration, and for pinpointing technical problems within the game itself.

Frequently, evaluations involve collecting large quantities of qualitative data, such as interview transcripts, open-ended responses to questionnaires, diaries, field notes, program documents, and minutes of meetings. Managing and analyzing these files can be simplified using qualitative data analysis (QDA) software tools. Two of the most popular QDA tools are Atlas.ti (http://atlasti.com/) and NVivo (http://www.qsrinternational.com/). These tools do not perform the analysis, but they help in the coding and interpretation of the data. Both of these tools also have a feature that allows researchers to visually map relationships between codes that may lead to theory development; for example, Owston (2007) studied factors that contribute to the sustainability of innovative classroom use of technology. Using Atlas.ti, he mapped the relationships among codes and developed a model (see Figure 45.4) that helps explain why teachers are likely to sustain innovative pedagogical practices using technology. Atlas.ti allows the importing of audio and video files as well as textual files, whereas NVivo does not. In Atlas.ti, these files are coded the same way as textual files; in NVivo, the files cannot be directly imported, but coding of external video and audio files can be done. If a project involves only audio or video, the best strategy may be to use Transana (http://transana.org), which is a free, open-source tool designed for the analysis of these kinds of files. A helpful feature of Transana is that while audio or video files are being played, a typist can transcribe the voices directly into a separate window within the application.
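To illustrate what mapping relationships between codes can involve computationally, the Python sketch below counts how often pairs of codes are applied to the same coded segment, a simple co-occurrence measure an evaluator might examine before drawing a model such as the one in Figure 45.4. The segments and code labels are hypothetical, and the sketch is not how Atlas.ti or NVivo work internally; it only illustrates the underlying idea.

    from collections import Counter
    from itertools import combinations

    # Hypothetical coded interview segments: each set lists the codes applied to one segment
    coded_segments = [
        {"teacher_support", "professional_development"},
        {"teacher_support", "perceived_value"},
        {"funding", "administrative_support"},
        {"teacher_support", "perceived_value", "student_support"},
    ]

    def code_cooccurrence(segments):
        # Count how often each pair of codes appears in the same segment;
        # frequent pairs suggest relationships worth mapping in a model
        pairs = Counter()
        for codes in segments:
            for a, b in combinations(sorted(codes), 2):
                pairs[(a, b)] += 1
        return pairs

    for (a, b), n in code_cooccurrence(coded_segments).most_common(3):
        print(a, "<->", b, ":", n)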
An excellent website maintained by the Computer-Assisted Qualitative Data Analysis (CAQDAS) Networking Project (see http://caqdas.soc.surrey.ac.uk/) in the United Kingdom provides independent academic comparisons of popular qualitative data analysis tools as well as other helpful resources and announcements. Those new to computerized analysis of qualitative data are well advised to visit this website for guidance in selecting the most appropriate tool to use in an evaluation.

Dissemination

A final issue that needs addressing is the dissemination of evaluation findings. The American Evaluation Association's Guiding Principles for Evaluators (see http://www.eval.org/Publications/GuidingPrinciples.asp) provides valuable advice to evaluators who are disseminating their results. Evaluators should communicate their methods and approaches accurately and in sufficient detail to allow others to understand, interpret, and critique their work. They should make clear the limitations of an evaluation and its results.
Evaluators should discuss in a contextually appropriate way those values, assumptions, theories, methods, results, and analyses significantly affecting the interpretation of the evaluative findings. These statements apply to all aspects of the evaluation, from its initial conceptualization to the eventual use of findings.

Beyond this, the final report should contain no surprises for the stakeholders if evaluators are doing their job properly. That means that there should be an ongoing dialog between the evaluators and stakeholders, including formal and informal progress reports. This allows the stakeholders to make adjustments to the program while it is in progress. At the same time, it is a way of gradually breaking news to the stakeholders if it looks as though serious problems are occurring with the program. Surprising stakeholders at the end of a project with bad news is one way to ensure that the evaluation report will be buried and never seen again! All the evaluation models reviewed in this chapter encourage, to varying degrees, continuous dialog between evaluators and stakeholders for these reasons. The end result should be that the evaluation report is used and its recommendations or implications are given due consideration.

CONCLUSIONS

The challenge facing evaluators of technology-based programs is to design studies that can provide the feedback needed to enhance their design or to provide evidence on their effectiveness. Evaluators need to look broadly across the field of program evaluation theory to help discern the critical elements required for a successful evaluation undertaking. These include attention to aspects such as the audience of the report and their information needs, deciding to what extent the study will be influenced by stated objectives, whether a comparative design will be used, and whether quantitative, qualitative, or a combination of methods will be brought into play. The study should also be guided by the criteria and approaches developed for or applicable to the evaluation of e-learning. When these steps are taken, evaluators will be well on their way to devising studies that will be able to answer some of the pressing issues facing teaching and learning with technology.

REFERENCES

Agodini, R., Dynarski, M., Honey, M., and Levin, D. (2003). The Effectiveness of Educational Technology: Issues and Recommendations for the National Study, Draft. Washington, D.C.: U.S. Department of Education.

Baker, E. L. and Herman, J. L. (2003). Technology and evaluation. In Evaluating Educational Technology: Effective Research Designs for Improving Learning, edited by G. Haertel and B. Means, pp. 133–168. New York: Teachers College Press.*

Bates, A. and Poole, G. (2003). Effective Teaching with Technology in Higher Education. San Francisco, CA: Jossey-Bass.

Bickman, L. (1987). The functions of program theory. In Using Program Theory in Evaluation: New Directions for Program Evaluation, Vol. 33, edited by L. Bickman, pp. 5–18. San Francisco, CA: Jossey-Bass.*

Bonk, C. J. and Cummings, J. A. (1998). A dozen recommendations for placing the student at the centre of Web-based learning. Educ. Media Int., 35(2), 82–89.

Bonk, C. J., Wisher, R. A., and Lee, J. (2003). Moderating learner-centered e-learning: problems and solutions, benefits and implications. In Online Collaborative Learning: Theory and Practice, edited by T. S. Roberts, pp. 54–85. Hershey, PA: Idea Group Publishing.

Campbell, D. T., Stanley, J. C., and Gage, N. L. (1966). Experimental and Quasi-Experimental Designs for Research. Chicago, IL: Rand McNally.*

Chickering, A. and Ehrmann, S. C. (1996). Implementing the Seven Principles: Technology As Lever, http://www.tltgroup.org/Seven/Home.htm.

Chickering, A. and Gamson, Z. (1987). Seven principles of good practice in undergraduate education. AAHE Bull., 39, 3–7 (http://www.tltgroup.org/Seven/Home.htm).

Cook, K., Cohen, A. J., and Owston, R. D. (2003a). If You Build It, Will They Come? Students' Use of and Attitudes towards Distributed Learning Enhancements in an Introductory Lecture Course, Institute for Research on Learning Technologies Technical Report 2003-1. Toronto: York University (http://www.yorku.ca/irlt/reports.html).

Cook, T. D. (2000). The false choice between theory-based evaluation and experimentation. New Direct. Eval. Challenges Oppor. Program Theory Eval., 87, 27–34.

Cook, T. D., Means, B., Haertel, G., and Michalchik, V. (2003b). The case for using randomized experiments in research on newer educational technologies: a critique of the objections raised and alternatives. In Evaluating Educational Technology: Effective Research Designs for Improving Learning, edited by G. Haertel and B. Means. New York: Teachers College Press.

Cronbach, L. J. (1980). Toward Reform of Program Evaluation. San Francisco, CA: Jossey-Bass.*

Eisner, E. W. (1979). The Educational Imagination: On the Design and Evaluation of School Programs. New York: Macmillan.*

Feuer, M. J., Towne, L., and Shavelson, R. J. (2002). Scientific culture and educational research. Educ. Res., 31, 4–14.

Graham, C., Cagiltay, K., Craner, J., Lim, B., and Duffy, T. M. (2000). Teaching in a Web-Based Distance Learning Environment: An Evaluation Summary Based on Four Courses, Center for Research on Learning and Technology Technical Report No. 13-00. Bloomington: Indiana University (http://crlt.indiana.edu/publications/crlt00-13.pdf).

Guba, E. G. (1978). Toward a Method of Naturalistic Inquiry in Educational Evaluation, Center for the Study of Evaluation Monograph Series No. 8. Los Angeles: University of California at Los Angeles.*

Guskey, T. R. (2000). Evaluating Professional Development. Thousand Oaks, CA: Corwin Press.
Haigh, S. and Megarity, J. (1998). Measuring Web Site Usage: Log File Analysis. Ottawa, ON: National Library of Canada (http://www.collectionscanada.ca/9/1/p1-256-e.html).

Kirkpatrick, D. L. (2001). Evaluating Training Programs: The Four Levels, 2nd ed. San Francisco, CA: Berrett-Koehler.*

Mandinach, E. B. (2005). The development of effective evaluation methods for e-learning: a concept paper and action plan. Teachers Coll. Rec., 107(8), 1814–1835.

Olson, D. R. (2004). The triumph of hope over experience in the search for 'what works': a response to Slavin. Educ. Res., 33(1), 24–26.

Owston, R. D. (2000). Evaluating Web-based learning environments: strategies and insights. CyberPsychol. Behav., 3(1), 79–87.*

Owston, R. D. (2007). Contextual factors that sustain innovative pedagogical practice using technology: an international study. J. Educ. Change, 8(1), 61–77.

Owston, R. D. and Wideman, H. H. (1999). Internet-Based Courses at Atkinson College: An Initial Assessment, Centre for the Study of Computers in Education Technical Report No. 99-1. Toronto: York University (http://www.yorku.ca/irlt/reports.html).

Owston, R. D., Kushniruk, A., Ho, F., Pitts, K., and Wideman, H. (2005). Improving the design of Web-based games and simulations through usability research. In Proceedings of ED-MEDIA 2005: World Conference on Educational Multimedia, Hypermedia, and Telecommunications, June 29–July 1, Montreal, Canada, pp. 1162–1167.

Patton, M. Q. (1978). Utilization-Focused Evaluation. Beverly Hills, CA: SAGE.*

Patton, M. Q. (2002). Qualitative Evaluation and Research Methods, 3rd ed. Thousand Oaks, CA: SAGE.

Ravitz, J. (1998). Evaluating learning networks: a special challenge for Web-based instruction. In Web-Based Instruction, edited by B. Khan, pp. 361–368. Englewood Cliffs, NJ: Educational Technology Publications.

Riel, M. and Harasim, L. (1994). Research perspectives on network learning. Machine-Mediated Learning, 4(2/3), 91–113.

Rogers, P. J., Hacsi, T. A., Petrosino, A., and Huebner, T. A., Eds. (2000). Program Theory in Evaluation Challenges and Opportunities: New Directions for Evaluation, No. 87. San Francisco, CA: Jossey-Bass.

Scanlon, E., Jones, A., Barnard, J., Thompson, J., and Calder, J. (2000). Evaluating information and communication technologies for learning. Educ. Technol. Soc., 3(4), 101–107.

Scriven, M. (1972). Pros and cons about goal free evaluation. Eval. Comm., 3(4), 1–7.*

Stake, R. E. (1975). Evaluating the Arts in Education: A Responsive Approach. Columbus, OH: Merrill.*

Stufflebeam, D. L. (1973). An introduction to the PDK book: educational evaluation and decision-making. In Educational Evaluation: Theory and Practice, edited by B. L. Worthen and J. R. Sanders, pp. 128–142. Belmont, CA: Wadsworth.*

Suchman, E. (1967). Evaluative Research: Principles and Practice in Public Service and Social Action Programs. New York: Russell Sage Foundation.

Tyler, R. W. (1942). General statement on evaluation. J. Educ. Res., 35, 492–501.

Weiss, C. H. (1972). Evaluation Research: Methods for Assessing Program Effectiveness. Englewood Cliffs, NJ: Prentice Hall.*

Wideman, H. H., Owston, R. D., and Quann, V. (1998). A Formative Evaluation of the VITAL Tutorial 'Introduction to Computer Science,' Centre for the Study of Computers in Education Technical Report No. 98-1. Toronto: York University (http://www.yorku.ca/irlt/reports.html).

Worthen, B. L. and Sanders, J. R. (1987). Educational Evaluation: Alternative Approaches and Practical Guidelines. New York: Longman.*

* Indicates a core reference.
