Mixed Methods Evaluation NSF
The Foundation provides awards for research in the sciences and engineering. The awardee is wholly responsible for the conduct of such research and the preparation of the results for publication. The Foundation, therefore, does not assume responsibility for the research findings or their interpretation.

The Foundation welcomes proposals from all qualified scientists and engineers, and strongly encourages women, minorities, and persons with disabilities to compete fully in any of the research and related programs described here. In accordance with federal statutes, regulations, and NSF policies, no person on grounds of race, color, age, sex, national origin, or disability shall be excluded from participation in, denied the benefits of, or be subject to discrimination under any program or activity receiving financial assistance from the National Science Foundation.

Facilitation Awards for Scientists and Engineers with Disabilities (FASED) provide funding for special assistance or equipment to enable persons with disabilities (investigators and other staff, including student research assistants) to work on an NSF project. See the program announcement or contact the program coordinator at (703) 306-1636.

Privacy Act and Public Burden

The information requested on proposal forms is solicited under the authority of the National Science Foundation Act of 1950, as amended. It will be used in connection with the selection of qualified proposals and may be disclosed to qualified reviewers and staff assistants as part of the review process; to applicant institutions/grantees; to provide or obtain data regarding the application review process, award decisions, or the administration of awards; to government contractors, experts, volunteers, and researchers as necessary to complete assigned work; and to other government agencies in order to coordinate programs. See Systems of Records, NSF-50, Principal Investigators/Proposal File and Associated Records, 60 Federal Register 4449 (January 23, 1995), and NSF-51, Reviewer/Proposal File and Associated Records, 59 Federal Register 8031 (February 17, 1994). Submission of the information is voluntary. Failure to provide full and complete information, however, may reduce the possibility of your receiving an award.

Public reporting burden for this collection of information is estimated to average 120 hours per response, including the time for reviewing instructions. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Herman G. Fleming, Reports Clearance Officer, Contracts, Policy, and Oversight, National Science Foundation, 4201 Wilson Boulevard, Arlington, VA 22230.

The National Science Foundation has TDD (Telephonic Device for the Deaf) capability, which enables individuals with hearing impairments to communicate with the Foundation about NSF programs, employment, or general information. The TDD number is (703) 306-0090.
ACKNOWLEDGMENTS
Appreciation is expressed to our external advisory panel (Dr. Frances Lawrenz, Dr. Jennifer Greene, Dr. Mary Ann Millsap, and Dr. Steve Dietz) for their comprehensive reviews of this document and their helpful suggestions. We also appreciate the direction provided by Dr. Conrad Katzenmeyer and Mr. James Dietz of the Division of Research, Evaluation and Communication.
August 1997
This handbook was developed with support from the National Science Foundation RED 94-52965.
Table of Contents
Part I. Introduction to Mixed Method Evaluations

1. Introducing This Handbook (Laure Sharp and Joy Frechtling)
   The Need for a Handbook on Designing and Conducting Mixed Method Evaluations
   Key Concepts and Assumptions

2. Illustration: A Hypothetical Project (Laure Sharp)
   Project Title
   Project Description
   Project Goals as Stated in the Grant Application to NSF
   Overview of the Evaluation Plan

Part II. Overview of Qualitative Methods and Analytic Techniques

3. Common Qualitative Methods (Colleen Mahoney)
   Observations
   Interviews
   Focus Groups
   Other Qualitative Methods
   Appendix A: Sample Observation Instrument
   Appendix B: Sample Indepth Interview Guide
   Appendix C: Sample Focus Group Topic Guide

4. Analyzing Qualitative Data (Suzanne Berkowitz)
   What Is Qualitative Analysis?
   Processes in Qualitative Analysis
   Summary: Judging the Quality of Qualitative Analysis
   Practical Advice in Conducting Qualitative Analyses

Part III. Designing and Reporting Mixed Method Evaluations

5. Overview of the Design Process for Mixed Method Evaluation (Laure Sharp and Joy Frechtling)
   Developing Evaluation Questions
   Selecting Methods for Gathering the Data: The Case for Mixed Method Designs
   Other Considerations in Designing Mixed Method Evaluations

6. Evaluation Design for the Hypothetical Project (Laure Sharp)
   Step 1. Develop Evaluation Questions
   Step 2. Determine Appropriate Data Sources and Data Collection Approaches to Obtain Answers to the Final Set of Evaluation Questions
   Step 3. Reality Testing and Design Modifications: Staff Needs, Costs, Time Frame Within Which All Tasks (Data Collection, Data Analysis, and Report Writing) Must Be Completed

7. Reporting the Results of Mixed Method Evaluations (Gary Silverstein and Laure Sharp)
   Ascertaining the Interests and Needs of the Audience
   Organizing and Consolidating the Final Report
   Formulating Sound Conclusions and Recommendations
   Maintaining Confidentiality
   Tips for Writing Good Evaluation Reports

Part IV. Supplementary Materials

8. Annotated Bibliography
9. Glossary

List of Exhibits

Exhibit 1. Common techniques (1-4)
Exhibit 2. Example of a mixed method design (1-9)
Exhibit 3. Advantages and disadvantages of observations (3-2)
Exhibit 4. Types of data for which observations are a good source (3-3)
Exhibit 5. Advantages and disadvantages of indepth interviews (3-7)
Exhibit 6. Considerations in conducting indepth interviews and focus groups (3-8)
Exhibit 7. Which to use: Focus groups or indepth interviews? (3-11)
Exhibit 8. Advantages and disadvantages of document studies (3-14)
Exhibit 9. Advantages and disadvantages of using key informants (3-15)
Exhibit 10. Data matrix for Campus A: What was done to share knowledge (4-7)
Exhibit 11. Participants' views of information sharing at eight campuses (4-9)
Exhibit 12. Matrix of cross-case analysis linking implementation and outcome factors (4-17)
Exhibit 13. Goals, stakeholders, and evaluation questions for a formative evaluation (6-2)
Exhibit 14. Goals, stakeholders, and evaluation questions for a summative evaluation (6-3)
Exhibit 15. Evaluation questions, data sources, and data collection methods for a formative evaluation (6-5)
Exhibit 16. Evaluation questions, data sources, and data collection methods for a summative evaluation (6-6)
Exhibit 17. First data collection plan (6-7)
Exhibit 18. Final data collection plan (6-9)
Exhibit 19. Matrix of stakeholders (7-3)
Exhibit 20. Example of an evaluation/methodology matrix (7-6)
The Need for a Handbook on Designing and Conducting Mixed Method Evaluations
Evaluation of the progress and effectiveness of projects funded by the National Science Foundation's (NSF) Directorate for Education and Human Resources (EHR) has become increasingly important. Project staff, participants, local stakeholders, and decisionmakers need to know how funded projects are contributing to knowledge and understanding of mathematics, science, and technology. To provide that information, some simple but critical questions must be addressed: What are we finding out about teaching and learning? How can we apply our new knowledge? Where are the dead ends? What are the next steps?
Although there are many excellent textbooks, manuals, and guides dealing with evaluation, few are geared to the needs of the EHR grantee who may be an experienced researcher but a novice evaluator. One of the ways that EHR seeks to fill this gap is by the publication of what have been called user-friendly handbooks for project evaluation. The first publication, User-Friendly Handbook for Project Evaluation: Science, Mathematics, Engineering and Technology Education, issued in 1993, describes the types of evaluations principal investigators/project directors (PIs/PDs) may be called upon to perform over the lifetime of a project. It also describes in some detail the evaluation process, which includes the development of evaluation questions and the collection and analysis of appropriate data to provide answers to these questions. Although this first handbook discussed both qualitative and quantitative methods, it
covered techniques that produce numbers (quantitative data) in greater detail. This approach was chosen because decisionmakers usually demand quantitative (statistically documented) evidence of results. Indicators that are often selected to document outcomes include percentage of targeted populations participating in mathematics and science courses, test scores, and percentage of targeted populations selecting careers in the mathematics and science fields.

The current handbook, User-Friendly Guide to Mixed Method Evaluations, builds on the first but seeks to introduce a broader perspective. It was initiated because of the recognition that by focusing primarily on quantitative techniques, evaluators may miss important parts of a story. Experienced evaluators have found that most often the best results are achieved through the use of mixed method evaluations, which combine quantitative and qualitative techniques. Because the earlier handbook did not include an indepth discussion of the collection and analysis of qualitative data, this handbook was initiated to provide more information on qualitative techniques and discuss how they can be combined effectively with quantitative measures.

Like the earlier publication, this handbook is aimed at users who need practical rather than technically sophisticated advice about evaluation methodology. The main objective is to make PIs and PDs "evaluation smart" and to provide the knowledge needed for planning and managing useful evaluations.
Formative evaluations (which include implementation and process evaluations) address the first set of issues. They examine the development of the project and may lead to changes in the way the project is structured and carried out. Questions typically asked include:
To what extent do the activities and strategies match those described in the plan? If they do not match, are the changes in the activities justified and described?
To what extent were the activities conducted according to the proposed timeline? By the appropriate personnel?
To what extent are the actual costs of project implementation in line with initial budget expectations?
To what extent are the participants moving toward the anticipated goals of the project?
Which of the activities or strategies are aiding the participants to move toward the goals?
What barriers were encountered? How and to what extent were they overcome?
Summative evaluations (also called outcome or impact evaluations) address the second set of issues. They look at what a project has actually accomplished in terms of its stated goals. Summative evaluation questions include:

To what extent did the project meet its overall goals?
Was the project equally effective for all participants?
What components were the most effective?
What significant unintended impacts did the project have?
Is the project replicable and transportable?
For each of these questions, both quantitative data (data expressed in numbers) and qualitative data (data expressed in narratives or words) can be useful in a variety of ways. The remainder of this chapter provides some background on the differing and complementary nature of quantitative and qualitative evaluation methodologies. The aim is to provide an overview of the advantages and disadvantages of each, as well as an idea of some of the more controversial issues concerning their use. Before doing so, however, it is important to stress that there are many ways of performing project evaluations, and that there is no recipe or formula that is best for every case. Quantitative and qualitative methods each have advantages and drawbacks when it comes to an evaluation's design, implementation, findings, conclusions, and
utilization. The challenge is to find a judicious balance in any particular situation. According to Cronbach (1982), "There is no single best plan for an evaluation, not even for an inquiry into a particular program at a particular time, with a particular budget."
What Are the Major Differences Between Quantitative and Qualitative Techniques?

As shown in Exhibit 1, quantitative and qualitative measures are characterized by different techniques for data collection.
Exhibit 1. Common techniques: Quantitative vs. Qualitative
Aside from the most obvious distinction between numbers and words, the conventional wisdom among evaluators is that qualitative and quantitative methods have different strengths, weaknesses, and requirements that will affect evaluators' decisions about which methodologies are best suited for their purposes. The issues to be considered can be classified as being primarily theoretical or practical.

Theoretical issues. Most often, these center on one of three topics:
The value of the types of data;
The relative scientific rigor of the data; or
Basic, underlying philosophies of evaluation.
Value of the data. Quantitative and qualitative techniques provide a tradeoff between breadth and depth and between generalizability and targeting to specific (sometimes very limited) populations. For example, a sample survey of high school students who participated in a special science enrichment program (a quantitative technique) can yield representative and broadly generalizable information about the proportion of participants who plan to major in science when they get to college and how this proportion differs by gender. But at best, the survey can elicit only a few, often superficial reasons for this gender difference. On the other hand, separate focus groups (a qualitative
technique) conducted with small groups of male and female students will provide many more clues about gender differences in the choice of science majors and the extent to which the special science program changed or reinforced attitudes. But this technique may be limited in the extent to which findings apply beyond the specific individuals included in the focus groups.

Scientific rigor. Data collected through quantitative methods are often believed to yield more objective and accurate information because they were collected using standardized methods, can be replicated, and, unlike qualitative data, can be analyzed using sophisticated statistical techniques. In line with these arguments, traditional wisdom has held that qualitative methods are most suitable for formative evaluations, whereas summative evaluations require "hard" (quantitative) measures to judge the ultimate value of the project. This distinction is too simplistic. Both approaches may or may not satisfy the canons of scientific rigor. Quantitative researchers are becoming increasingly aware that some of their data may not be accurate and valid, because some survey respondents may not understand the meaning of questions to which they respond, and because people's recall of even recent events is often faulty. On the other hand, qualitative researchers have developed better techniques for classifying and analyzing large bodies of descriptive data. It is also increasingly recognized that all data collection, quantitative and qualitative, operates within a cultural context and is affected to some extent by the perceptions and beliefs of investigators and data collectors.

Philosophical distinction. Some researchers and scholars differ about the respective merits of the two approaches largely because of different views about the nature of knowledge and how knowledge is best acquired. Many qualitative researchers argue that there is no objective social reality, and that all knowledge is "constructed" by observers who are the product of traditions, beliefs, and the social and political environment within which they operate. And while quantitative researchers no longer believe that their research methods yield absolute and objective truth, they continue to adhere to the scientific model and seek to develop increasingly sophisticated techniques and statistical tools to improve the measurement of social phenomena. The qualitative approach emphasizes the importance of understanding the context in which events and outcomes occur, whereas quantitative researchers seek to control the context by using random assignment and multivariate analyses. Similarly, qualitative researchers believe that the study of deviant cases provides important insights for the interpretation of findings; quantitative researchers tend to ignore the small number of deviant and extreme cases.
This distinction affects the nature of research designs. According to its most orthodox practitioners, qualitative research does not start with narrowly specified evaluation questions; instead, specific questions are formulated after open-ended field research has been completed (Lofland and Lofland, 1995). This approach may be difficult for program and project evaluators to adopt, since specific questions about the effectiveness of interventions being evaluated are usually expected to guide the evaluation. Some researchers have suggested that a distinction be made between Qualitative and qualitative work: Qualitative work (large Q) refers to methods that eschew prior evaluation questions and hypothesis testing, whereas qualitative work (small q) refers to open-ended data collection methods, such as indepth interviews, embedded in structured research (Kidder and Fine, 1987). The latter are more likely to meet EHR evaluators' needs.

Practical issues. On the practical level, there are four issues that can affect the choice of method:

Credibility of findings;
Staff skills;
Costs; and
Time constraints.
Credibility of findings. Evaluations are designed for various audiences, including funding agencies, policymakers in governmental and private agencies, project staff and clients, researchers in academic and applied settings, as well as various other "stakeholders" (individuals and organizations with a stake in the outcome of a project). Experienced evaluators know that they often deal with skeptical audiences or stakeholders who seek to discredit findings that are too critical or uncritical of a project's outcomes. For this reason, the evaluation methodology may be rejected as unsound or weak for a specific case. The major stakeholders for EHR projects are policymakers within NSF and the federal government, state and local officials, and decisionmakers in the educational community where the project is located. In most cases, decisionmakers at the national level tend to favor quantitative information because these policymakers are accustomed to basing funding decisions on numbers and statistical indicators. On the other hand, many stakeholders in the educational community are often skeptical about statistics and number crunching and consider the richer data obtained through qualitative research to be more trustworthy and informative. A particular case in point is the use of traditional test results, a favorite outcome criterion
for policymakers, school boards, and parents, but one that teachers and school administrators tend to discount as a poor tool for assessing true student learning.

Staff skills. Qualitative methods, including indepth interviewing, observations, and the use of focus groups, require good staff skills and considerable supervision to yield trustworthy data. Some quantitative research methods can be mastered easily with the help of simple training manuals; this is true of small-scale, self-administered questionnaires, where most questions can be answered by yes/no checkmarks or by selecting numbers on a simple scale. Large-scale, complex surveys, however, usually require more skilled personnel to design the instruments and to manage data collection and analysis.

Costs. It is difficult to generalize about the relative costs of the two methods; much depends on the amount of information needed, the quality standards followed for the data collection, and the number of cases required for reliability and validity. A short survey based on a small number of cases (25-50) and consisting of a few easy questions would be inexpensive, but it also would provide only limited data. Even cheaper would be substituting a focus group session for a subset of the 25-50 respondents; while this method might provide more interesting data, those data would be primarily useful for generating new hypotheses to be tested by more appropriate qualitative or quantitative methods. To obtain robust findings, the cost of data collection is bound to be high regardless of method.

Time constraints. Similarly, data complexity and quality affect the time needed for data collection and analysis. Although technological innovations have shortened the time needed to process quantitative data, a good survey requires considerable time to create and pretest questions and to obtain high response rates. However, qualitative methods may be even more time consuming because data collection and data analysis overlap, and the process encourages the exploration of new evaluation questions (see Chapter 4). If insufficient time is allowed for the evaluation, it may be necessary to curtail the amount of data to be collected or to cut short the analytic process, thereby limiting the value of the findings. For evaluations that operate under severe time constraints (for example, where budgetary decisions depend on the findings), the choice of the best method can present a serious dilemma.

In summary, the debate over the merits of qualitative versus quantitative methods is ongoing in the academic community, but when it comes to the choice of methods for conducting project evaluations, a pragmatic strategy has been gaining increased support. Respected practitioners have argued for integrating the two
approaches, building on their complementary strengths.1 Others have stressed the advantages of linking qualitative and quantitative methods when performing studies and evaluations, showing how the validity and usefulness of findings will benefit (Miles and Huberman, 1994).
Why Use a Mixed Method Approach?

The assumption guiding this handbook is that a strong case can be made for using an approach that combines quantitative and qualitative elements in most evaluations of EHR projects. We offer this assumption because most of the interventions sponsored by EHR are not introduced into a sterile laboratory, but rather into a complex social environment with features that affect the success of the project. To ignore the complexity of the background is to impoverish the evaluation. Similarly, when investigating human behavior and attitudes, it is most fruitful to use a variety of data collection methods (Patton, 1990). By using different sources and methods at various points in the evaluation process, the evaluation team can build on the strength of each type of data collection and minimize the weaknesses of any single approach. A multimethod approach to evaluation can increase both the validity and reliability of evaluation data.

The range of possible benefits that carefully crafted mixed method designs can yield has been conceptualized by a number of evaluators.2 The validity of results can be strengthened by using more than one method to study the same phenomenon. This approach, called triangulation, is most often mentioned as the main advantage of the mixed method approach. Combining the two methods pays off in improved instrumentation for all data collection approaches and in sharpening the evaluator's understanding of findings. A typical design might start out with a qualitative segment such as a focus group discussion, which will alert the evaluator to issues that should be explored in a survey of program participants, followed by the survey, which in turn is followed by indepth interviews to clarify some of the survey findings (Exhibit 2).
1 See especially the article by William R. Shadish in Program Evaluation: A Pluralistic Enterprise, New Directions for Program Evaluation, No. 60 (San Francisco: Jossey-Bass, Winter 1993).

2 For a full discussion of this topic, see Jennifer C. Greene, Valerie J. Caracelli, and Wendy F. Graham, Toward a Conceptual Framework for Mixed-Method Evaluation Designs, Educational Evaluation and Policy Analysis, Vol. 11, No. 3 (Fall 1989), pp. 255-274.
Exhibit 2. Example of a mixed method design: a focus group, followed by a survey (questionnaire), followed by indepth interviews.
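As a purely illustrative aid, not part of the original exhibit, the sequential design described above can be written down as a simple ordered plan; the phase labels and purpose statements below are assumptions drawn from the preceding paragraph, not from the handbook itself.

```python
# Hypothetical sketch of the sequential mixed method design shown in Exhibit 2.
# Each phase records the method, whether it is qualitative or quantitative,
# and what it contributes to the overall evaluation.
mixed_method_plan = [
    {"phase": 1, "method": "focus group discussion", "kind": "qualitative",
     "purpose": "surface issues to explore in the participant survey"},
    {"phase": 2, "method": "survey (questionnaire)", "kind": "quantitative",
     "purpose": "measure how widespread those issues are among program participants"},
    {"phase": 3, "method": "indepth interviews", "kind": "qualitative",
     "purpose": "clarify and interpret selected survey findings"},
]

for step in mixed_method_plan:
    print(f"Phase {step['phase']}: {step['method']} ({step['kind']}) - {step['purpose']}")
```

Laying the phases out this way simply makes explicit where each method hands off to the next, which is the point of the sequential design.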
But this sequential approach is only one of several that evaluators might find useful (Miles and Huberman, 1994). Thus, if an evaluator has identified subgroups of program participants or specific topics for which indepth information is needed, a limited qualitative data collection can be initiated while a more broad-based survey is in progress. A mixed method approach may also lead evaluators to modify or expand the evaluation design and/or the data collection methods. This action can occur when the use of mixed methods uncovers inconsistencies and discrepancies that alert the evaluator to the need for reexamining the evaluation framework and/or the data collection and analysis procedures used.
There is a growing consensus among evaluation experts that both qualitative and quantitative methods have a place in the performance of effective evaluations. Both formative and summative evaluations are enriched by a mixed method approach.
How To Use This Handbook

This handbook covers a lot of ground, and not all readers will want to read it from beginning to end. For those who prefer to sample sections, some organizational features are highlighted below.

To provide practical illustrations throughout the handbook, we have invented a hypothetical project, which is summarized in the next chapter (Part 1, Chapter 2); the various stages of the evaluation design for this project will be found in Part 3, Chapter 6. These two chapters may be especially useful for evaluators who have not been involved in designing evaluations for major, multisite EHR projects. Part 2, Chapter 3 focuses on qualitative methodologies, and Chapter 4 deals with analysis approaches for qualitative data.
These two chapters are intended to supplement the information on quantitative methods in the previous handbook. Part 3 (Chapters 5, 6, and 7) covers the basic steps in developing a mixed method evaluation design and describes ways of reporting findings to NSF and other stakeholders. Part 4 presents supplementary material, including an annotated bibliography and a glossary of common terms.
Before turning to these issues, however, we present the hypothetical NSF project that is used as an anchoring point for discussing the issues presented in the subsequent chapters.
References

Cronbach, L. (1982). Designing Evaluations of Educational and Social Programs. San Francisco: Jossey-Bass.

Kidder, L., and Fine, M. (1987). Qualitative and Quantitative Methods: When Stories Converge. In Multiple Methods in Program Evaluation. New Directions for Program Evaluation, No. 35. San Francisco: Jossey-Bass.

Lofland, J., and Lofland, L.H. (1995). Analyzing Social Settings: A Guide to Qualitative Observation and Analysis. Belmont, CA: Wadsworth Publishing Company.

Miles, M.B., and Huberman, A.M. (1994). Qualitative Data Analysis, 2nd Ed. Newbury Park, CA: Sage, pp. 40-43.

National Science Foundation. (1993). User-Friendly Handbook for Project Evaluation: Science, Mathematics, Engineering and Technology Education. NSF 93-152. Arlington, VA: NSF.

Patton, M.Q. (1990). Qualitative Evaluation and Research Methods, 2nd Ed. Newbury Park, CA: Sage.
2. Illustration: A Hypothetical Project
Project Title
Undergraduate Faculty Enhancement: Introducing faculty in state universities and colleges to new concepts and methods in preservice mathematics instruction.
Project Description
In response to the growing national concern about the quality of American elementary and secondary education, and especially about students' achievement in mathematics and science, considerable efforts have been directed at enhancing the skills of the teachers in the labor force through inservice training. Less attention has been focused on preservice training, especially for elementary school teachers, most of whom are educated in departments and schools of education. In many institutions, faculty members who provide this instruction need to become more conversant with the new standards and instructional techniques for the teaching of mathematics in elementary schools. The proposed pilot project was designed to examine a strategy for meeting this need.

The project attempts to improve preservice education by giving the faculty teaching courses in mathematics to future elementary school teachers new knowledge, skills, and approaches for incorporation into their instruction. In the project, the investigators ascertain the extent of faculty members' knowledge about standards-based instruction; engage them in expanding their understanding of standards-based reform and the instructional approaches that support high-quality teaching; and assess the extent to which the strategies emphasized and demonstrated in the pilot project are transferred to the participants' own classroom practices.
The project is being carried out on the main campus of a major state university under the leadership of the Director of the Center for Educational Innovation. Ten day-long workshops will be offered to two cohorts of faculty members from the main campus and branch campuses. These workshops will be supplemented by opportunities for networking among participating faculty members and the exchange of experiences and recommendations during a summer session following the academic year.

The workshops are based on an integrated plan for reforming undergraduate education for future elementary teachers. The focus of the workshops is to provide carefully articulated information and practice on current approaches to mathematics instruction (content and pedagogy) in elementary grades, consistent with state frameworks and standards of excellence. The program uses and builds on the knowledge of content experts, master practitioners, and teacher educators. The following strategies are being employed in the workshops: presentations, discussions, hands-on experiences with various traditional and innovative tools, coaching, and videotaped demonstrations of model teaching. The summer session is offered for sharing experiences, reflecting on successful and unsuccessful applications, and constructing new approaches. In addition, participants are encouraged to communicate with each other throughout the year via e-mail.

Project activities are funded for 2 years and are expected to support two cohorts of participants; funding for an additional 6-month period to allow performance of the summative evaluation has been included. Participation is limited to faculty members on the main campus and in the seven 4-year branch campuses of the state university where courses in elementary mathematics education are offered. Participants are selected on the basis of a written essay and a commitment to attend all sessions and to try suggested approaches in their classroom. A total of 25 faculty members are to be enrolled in the workshops each year. During the life of the project, roughly 1,000 undergraduate students will be enrolled in classes taught by the participating faculty members.
To enable and encourage faculty members to incorporate these approaches in their own classroom activities and, hopefully, into the curricula of their institutions;
To stimulate their students' interest in teaching mathematics and in using the new techniques when they become elementary school teachers; and
To test a model for achieving these goals.
Did the participants have the opportunity to engage in inquiry-based activities? Was there an appropriate balance of knowledge building and application?
The summative evaluation was intended to document the extent to which participants introduced changes in their classroom teaching and to determine which components of the workshops were especially effective in this respect. The proposal also promised to investigate the impact of the workshops on participating faculty members, especially on their acquisition of knowledge and skills. Furthermore, the impact on other faculty members, on the institution, and on students was to be part of the evaluation. Recommendations for replicating this project in other institutions, and suggestions for changes in the workshop content or administrative arrangements, were to be included in the summative evaluation. Proposed summative evaluation questions included the following:

To what extent did the participants use what they were taught in their own instruction or activities? Which topics and techniques were most often (or least often) incorporated?
To what extent did participants share their recently acquired knowledge and skills with other faculty? Which topics were frequently discussed? Which ones were not?
To what extent was there an impact on the students of these teachers? Had they become more (or less) positive about making the teaching of elementary mathematics an important component of their future career?
Did changes occur in the overall program of instruction offered to potential elementary mathematics teachers? What were the obstacles to the introduction of changes?
The proposal also enumerated possible data sources for conducting the evaluations, including self-administered questionnaires completed after each workshop, indepth interviews with knowledgeable informants, focus groups, observation of workshops, classroom observations, and surveys of students. It was stated that a more complete design for the formative and summative evaluations would be developed after contract award.
Part II. Overview of Qualitative Methods and Analytic Techniques

3. Common Qualitative Methods
In this chapter we describe and compare the most common qualitative methods employed in project evaluations.3 These include observations, indepth interviews, and focus groups. We also cover briefly some other less frequently used qualitative techniques. Advantages and disadvantages are summarized. For those readers interested in learning more about qualitative data collection methods, a list of recommended readings is provided.
Observations
Observational techniques are methods by which an individual or individuals gather firsthand data on programs, processes, or behaviors being studied. They provide evaluators with an opportunity to collect data on a wide range of behaviors, to capture a great variety of interactions, and to openly explore the evaluation topic. By directly observing operations and activities, the evaluator can develop a holistic perspective, i.e., an understanding of the context within which the project operates. This may be especially important where it is not the event that is of interest, but rather how that event may fit into, or be impacted by, a sequence of events. Observational approaches also allow the evaluator to learn about things the participants or staff may be unaware of or that they are unwilling or unable to discuss in an interview or focus group. When to use observations. Observations can be useful during both the formative and summative phases of evaluation. For example, during the formative phase, observations can be useful in determining whether or not the project is being delivered and operated as planned. In the hypothetical project, observations could be used to describe the faculty development sessions, examining the extent to which participants understand the concepts, ask the right questions, and are engaged in appropriate interactions. Such
3 Information on common quantitative methods is provided in the earlier User-Friendly Handbook for Project Evaluation (NSF 93-152).
formative observations could also provide valuable insights into the teaching styles of the presenters and how they are covering the material. Observations during the summative phase of evaluation can be used to determine whether or not the project is successful. The technique would be especially useful in directly examining the teaching methods employed by the faculty in their own classes after program participation.

Exhibits 3 and 4 display the advantages and disadvantages of observations as a data collection tool and some common types of data that are readily collected by observation. Readers familiar with survey techniques may justifiably point out that surveys can address these same questions and do so in a less costly fashion. Critics of surveys find them suspect because of their reliance on self-report, which may not provide an accurate picture of what is happening because of the tendency, intentional or not, to try to give the right answer. Surveys also cannot tap into the contextual element. Proponents of surveys counter that properly constructed surveys with built-in checks and balances can overcome these problems and provide highly credible data. This frequently debated issue is best decided on a case-by-case basis.
Exhibit 3. Advantages and disadvantages of observations

Advantages
Provide direct information about behavior of individuals and groups
Permit evaluator to enter into and understand situation/context
Provide good opportunities for identifying unanticipated outcomes
Exist in natural, unstructured, and flexible setting

Disadvantages
Expensive and time consuming
Need well-qualified, highly trained observers; may need to be content experts
May affect behavior of participants
Selective perception of observer may distort data
Investigator has little control over situation
Behavior or set of behaviors observed may be atypical
Recording Observational Data

Observations are carried out using a carefully developed set of steps and instruments. The observer is more than just an onlooker; he or she comes to the scene with a set of target concepts, definitions, and criteria for describing events. While in some studies observers may simply record and describe, in the majority of evaluations their descriptions are, or eventually will be, judged against a continuum of expectations.

Observations usually are guided by a structured protocol. The protocol can take a variety of forms, ranging from a request for a narrative describing events seen to a checklist or a rating scale of specific behaviors/activities that address the evaluation question of interest. The use of a protocol helps assure that all observers are gathering the pertinent information and, with appropriate training, applying the same criteria in the evaluation. For example, if, as described earlier, an observational approach is selected to gather data on the faculty training sessions, the instrument developed would explicitly guide the observer to examine the kinds of activities in which participants were interacting, the role(s) of the trainers and the participants, the types of materials provided and used, the opportunity for hands-on interaction, etc. (See Appendix A to this chapter for an example of
an observational protocol that could be applied to the hypothetical project.) The protocol goes beyond a recording of events, i.e., use of identified materials, and provides an overall context for the data. The protocol should prompt the observer to:

Describe the setting of program delivery, i.e., where the observation took place and what the physical setting was like;
Identify the people who participated in those activities, i.e., characteristics of those who were present;
Describe the content of the intervention, i.e., actual activities and messages that were delivered;
Document the interactions between implementation staff and project participants;
Describe and assess the quality of the delivery of the intervention; and
Be alert to unanticipated events that might require refocusing one or more evaluation questions.
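As a purely illustrative sketch, not drawn from the handbook's own instruments, the prompts above could be captured as a structured observation record so that entries from different observers are directly comparable; the field names and sample values below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ObservationRecord:
    """One observer's structured notes for a single session (illustrative fields only)."""
    date: str
    observer: str
    setting: str                      # where the observation took place; physical setting
    participants: str                 # characteristics of those who were present
    content: str                      # actual activities and messages delivered
    interactions: str                 # interactions between staff and participants
    delivery_quality: str             # observer's assessment of the quality of delivery
    unanticipated_events: List[str] = field(default_factory=list)

# Hypothetical entry for one faculty workshop in the example project
record = ObservationRecord(
    date="1997-03-14",
    observer="Observer A",
    setting="Main campus seminar room; tables arranged for small-group work",
    participants="18 of 25 cohort faculty members present",
    content="Hands-on session on standards-based approaches to elementary mathematics",
    interactions="Trainer circulated among groups; participants asked clarifying questions",
    delivery_quality="Well paced; two groups finished early and disengaged",
    unanticipated_events=["Video equipment failed during the model-teaching segment"],
)
print(record.setting)
```

A structured record of this kind mirrors the protocol's prompts while leaving the richer narrative detail to the field notes discussed below.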
Exhibit 4. Types of data for which observations are a good source

The setting - The physical environment within which the project takes place.
The human, social environment - The ways in which all actors (staff, participants, others) interact and behave toward each other.
Project implementation activities - What goes on in the life of the project? What do various actors (staff, participants, others) actually do? How are resources allocated?
The native language of the program - Different organizations and agencies have their own language or jargon to describe the problems they deal with in their work; capturing the precise language of all participants is an important way to record how staff and participants understand their experiences.
Nonverbal communication - Nonverbal cues about what is happening in the project: the way all participants dress, express opinions, physically space themselves during discussions, and arrange themselves in their physical setting.
Notable nonoccurrences - Determining what is not occurring although the expectation is that it should be occurring as planned by the project team, or noting the absence of some particular activity/factor that is noteworthy and would serve as added information.

Field notes are frequently used to provide more indepth background or to help the observer remember salient events if a form is not completed at the time of observation. Field notes contain the description of what has been observed. The descriptions must be factual, accurate, and thorough without being judgmental or cluttered by trivia. The date and time of the observation should be recorded, and everything that the observer believes to be worth noting should be included. No information should be trusted to future recall. The use of technological tools, such as a battery-operated tape recorder or dictaphone, laptop computer, camera, and video camera, can make the collection of field notes more efficient and the notes themselves more comprehensive. Informed consent must be obtained from participants before any observational data are gathered.
The Role of the Observer There are various methods for gathering observational data, depending on the nature of a given project. The most fundamental distinction between various observational strategies concerns the extent to which the observer will be a participant in the setting being studied. The extent of participation is a continuum that varies from
complete involvement in the setting as a full participant to complete separation from the setting as an outside observer or spectator. The participant observer is fully engaged in experiencing the project setting while at the same time trying to understand that setting through personal experience, observations, and interactions and discussions with other participants. The outside observer stands apart from the setting, attempts to be nonintrusive, and assumes the role of a "fly on the wall." The extent to which full participation is possible and desirable will depend on the nature of the project and its participants, the political and social context, the nature of the evaluation questions being asked, and the resources available. The ideal is to negotiate and adopt that degree of participation that will yield the most meaningful data about the program given the characteristics of the participants, the nature of staff-participant interactions, and the sociopolitical context of the program (Patton, 1990).

In some cases it may be beneficial to have two people observing at the same time. This can increase the quality of the data by providing a larger volume of data and by decreasing the influence of observer bias. However, in addition to the added cost, the presence of two observers may create an environment threatening to those being observed and cause them to change their behavior.

Studies using observation typically employ intensive training experiences to make sure that the observer or observers know what to look for and can, to the extent possible, operate in an unbiased manner. In long or complicated studies, it is useful to check on an observer's performance periodically to make sure that accuracy is being maintained. The issue of training is a critical one and may make the difference between a defensible study and what can be challenged as one person's perspective.

A special issue with regard to observations relates to the amount of observation needed. While in participant observation this may be a moot point (except with regard to data recording), when an outside observer is used, the question of how much becomes very important. While most people agree that one observation (a single hour of a training session or one class period of instruction) is not enough, there is no hard and fast rule regarding how many samples need to be drawn. General tips to consider are to avoid atypical situations, carry out observations more than one time, and (where possible and relevant) spread the observations out over time.

Participant observation is often difficult to incorporate in evaluations; therefore, the use of outside observers is far more common. In the hypothetical project, observations might be scheduled for all training sessions and for a sample of classrooms, including some where faculty members who participated in training were teaching and some staffed by teachers who had not participated in the training.
Issues of privacy and access. Observational techniques are perhaps the most privacy-threatening data collection technique for staff and, to a lesser extent, participants. Staff fear that the data may be included in their performance evaluations and may have effects on their careers. Participants may also feel uncomfortable, assuming that they are being judged. Evaluators need to assure everyone that evaluations of performance are not the purpose of the effort, and that no such reports will result from the observations. Additionally, because most educational settings are subject to a constant flow of observers from various organizations, there is often great reluctance to grant access to additional observers. Much effort may be needed to assure project staff and participants that they will not be adversely affected by the evaluator's work and to negotiate observer access to specific sites.
Interviews
Interviews provide very different data from observations: they allow the evaluation team to capture the perspectives of project participants, staff, and others associated with the project. In the hypothetical example, interviews with project staff can provide information on the early stages of the implementation and problems encountered. The use of interviews as a data collection method begins with the assumption that the participants' perspectives are meaningful, knowable, and able to be made explicit, and that their perspectives affect the success of the project. An interview, rather than a paper-and-pencil survey, is selected when interpersonal contact is important and when opportunities for followup of interesting comments are desired.

Two types of interviews are used in evaluation research: structured interviews, in which a carefully worded questionnaire is administered, and indepth interviews, in which the interviewer does not follow a rigid form. In the former, the emphasis is on obtaining answers to carefully phrased questions. Interviewers are trained to deviate only minimally from the question wording to ensure uniformity of interview administration. In the latter, however, the interviewers seek to encourage free and open responses, and there may be a tradeoff between comprehensive coverage of topics and indepth exploration of a more limited set of questions. Indepth interviews also encourage capturing of respondents' perceptions in their own words, a very desirable strategy in qualitative data collection. This allows the evaluator to present the meaningfulness of the experience from the respondent's perspective. Indepth interviews are conducted with individuals or with a small group of individuals.4
4 A special case of the group interview is called a focus group. Although we discuss focus groups separately, several of the exhibits in this section will refer to both forms of data collection because of their similarities.
Indepth interviews. An indepth interview is a dialogue between a skilled interviewer and an interviewee. Its goal is to elicit rich, detailed material that can be used in analysis (Lofland and Lofland, 1995). Such interviews are best conducted face to face, although in some situations telephone interviewing can be successful. Indepth interviews are characterized by extensive probing and open-ended questions. Typically, the project evaluator prepares an interview guide that includes a list of questions or issues that are to be explored and suggested probes for following up on key topics. The guide helps the interviewer pace the interview and make interviewing more systematic and comprehensive. Lofland and Lofland (1995) provide guidelines for preparing interview guides, doing the interview with the guide, and writing up the interview. Appendix B to this chapter contains an example of the types of interview questions that could be asked during the hypothetical study.

The dynamics of interviewing are similar to those of a guided conversation. The interviewer becomes an attentive listener who shapes the process into a familiar and comfortable form of social engagement (a conversation), and the quality of the information obtained is largely dependent on the interviewer's skills and personality (Patton, 1990). In contrast to a good conversation, however, an indepth interview is not intended to be a two-way form of communication and sharing. The key to being a good interviewer is being a good listener and questioner. Tempting as it may be, it is not the role of the interviewer to put forth his or her opinions, perceptions, or feelings. Interviewers should be trained individuals who are sensitive, empathetic, and able to establish a nonthreatening environment in which participants feel comfortable. They should be selected during a process that weighs personal characteristics that will make them acceptable to the individuals being interviewed; clearly, age, sex, profession, race/ethnicity, and appearance may be key characteristics. Thorough training, including familiarization with the project and its goals, is important. Poor interviewing skills, poor phrasing of questions, or inadequate knowledge of the subjects' culture or frame of reference may result in the collection of little useful data.

When to use indepth interviews. Indepth interviews can be used at any stage of the evaluation process. They are especially useful in answering questions such as those suggested by Patton (1990):

What does the program look and feel like to the participants? To other stakeholders?
What are the experiences of program participants?
What do stakeholders know about the project?
What thoughts do stakeholders knowledgeable about the program have concerning program operations, processes, and outcomes?
What are participants' and stakeholders' expectations?
What features of the project are most salient to the participants?
What changes do participants perceive in themselves as a result of their involvement in the project?
Usually yield richest data, details, new insights Permit face-to-face contact with respondents Provide opportunity to explore topics in depth Afford ability to experience the affective as well as cognitive aspects of responses Allow interviewer to explain or help clarify questions, increasing the likelihood of useful responses Allow interviewer to be flexible in administering interview to particular individuals or circumstances
Disadvantages
Specific circumstances for which indepth interviews are particularly appropriate include complex subject matter; detailed information sought; busy, high-status respondents; and highly sensitive subject matter.
In the hypothetical project, indepth interviews of the project director, staff, department chairs, branch campus deans, and nonparticipant faculty would be useful. These interviews can address both formative and summative questions and be used in conjunction with other data collection methods. The advantages and disadvantages of indepth interviews are outlined in Exhibit 5. When indepth interviews are being considered as a data collection technique, it is important to keep several potential pitfalls or problems in mind. There may be substantial variation in the interview setting. Interviews generally take place in a wide range of settings. This limits the interviewers control over the environment. The interviewer may have to contend with disruptions and other problems that may inhibit the acquisition of information and limit the comparability of interviews. There may be a large gap between the respondents knowledge and that of the interviewer. Interviews are often conducted with knowledgeable respondents yet administered by less knowledgeable interviewers or by interviewers not completely familiar with the pertinent social, political, or cultural context. Therefore, some of the responses may not be correctly understood or reported. The solution may be not only to employ highly trained and knowledgeable staff, but also to use interviewers with special skills for specific types of
Expensive and time-consuming Need well-qualified, highly trained interviewers Interviewee may distort information through recall error, selective perceptions, desire to please interviewer Flexibility can result in inconsistencies across interviews Volume of information too large; may be difficult to transcribe and reduce data
respondents (for example, same-status interviewers for high-level administrators or community leaders). It may also be most expedient for the project director or senior evaluation staff to conduct such interviews, if this can be done without introducing or appearing to introduce bias.
Exhibit 6. Considerations in conducting indepth interviews and focus groups

Factors to consider in determining the setting for interviews (both individual and group) include the following:

Select a setting that provides privacy for participants.
Select a location where there are no distractions and it is easy to hear respondents speak.
Select a comfortable location.
Select a nonthreatening environment.
Select a location that is easily accessible for respondents.
Select a facility equipped for audio or video recording.
Stop telephone or visitor interruptions to respondents interviewed in their offices or homes.
Provide seating arrangements that encourage involvement and interaction.
Exhibit 6 outlines other considerations in conducting interviews. These considerations are also important in conducting focus groups, the next technique that we will consider. Recording interview data. Interview data can be recorded on tape (with the permission of the participants) and/or summarized in notes. As with observations, detailed recording is a necessary component of interviews since it forms the basis for analyzing the data. All methods, but especially the second and third, require carefully crafted interview guides with ample space available for recording the interviewees responses. Three procedures for recording the data are presented below. In the first approach, the interviewer (or in some cases the transcriber) listens to the tapes and writes a verbatim account of everything that was said. Transcription of the raw data includes word-for-word quotations of the participants responses as well as the interviewers descriptions of participants characteristics, enthusiasm, body language, and overall mood during the interview. Notes from the interview can be used to identify speakers or to recall comments that are garbled or unclear on the tape. This approach is recommended when the necessary financial and human resources are available, when the transcriptions can be produced in a reasonable amount of time, when the focus of the interview is to make detailed comparisons, or when respondents own words and phrasing are needed. The major advantages of this transcription method are its completeness and the opportunity it affords for the interviewer to remain attentive and focused during the interview. The major disadvantages are the amount of time and resources needed to produce complete transcriptions and the inhibitory impact tape recording has on some respondents. If this technique is selected, it is essential that the participants have been informed that their answers are being recorded, that they are assured confidentiality, and that their permission has been obtained. A second possible procedure for recording interviews draws less on the word-by-word record and more on the notes taken by the interviewer or assigned notetaker. This method is called note expansion. As soon as possible after the interview, the interviewer listens to the tape to clarify certain issues and to confirm that all the main points have been included in the notes. This approach is recommended when resources are scarce, when the results must be produced in a short period of time, and when the purpose of the interview is to get rapid feedback from members of the target population. The note expansion approach saves
time and retains all the essential points of the discussion. In addition to the drawbacks pointed out above, a disadvantage is that the interviewer may be more selective or biased in what he or she writes. In the third approach, the interviewer uses no tape recording, but instead takes detailed notes during the interview and draws on memory to expand and clarify the notes immediately after the interview. This approach is useful if time is short, the results are needed quickly, and the evaluation questions are simple. Where more complex questions are involved, effective note-taking can be achieved, but only after much practice. Further, the interviewer must frequently talk and write at the same time, a skill that is hard for some to achieve.
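Evaluation teams that track their interviews electronically may find it useful to log which of the three recording approaches was used for each interview, along with whether taping consent was obtained. The Python sketch below is illustrative only; the field names and sample entries are assumptions, not part of the handbook's procedures.

from dataclasses import dataclass, field
from typing import List

# Illustrative sketch: one way an evaluation team might log interviews and the
# recording approach used for each (verbatim transcription, note expansion, or
# notes only). Field names and sample values are assumptions, not prescribed.

@dataclass
class InterviewRecord:
    interviewee_role: str        # e.g., "department chair"
    campus: str                  # e.g., "Branch campus A"
    recording_method: str        # "verbatim transcription", "note expansion", or "notes only"
    consent_to_tape: bool        # permission obtained before any taping
    notes: List[str] = field(default_factory=list)
    transcript: str = ""         # filled in only when verbatim transcription is used

records = [
    InterviewRecord("department chair", "Branch campus A", "note expansion", True,
                    notes=["Supports workshop goals", "Unaware of informal e-mail exchanges"]),
]

# Flag any taped interview that lacks documented consent.
for r in records:
    if r.recording_method != "notes only" and not r.consent_to_tape:
        print("Missing consent:", r.interviewee_role, "-", r.campus)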
Focus Groups
Focus groups combine elements of both interviewing and participant observation. The focus group session is, indeed, an interview (Patton, 1990), not a discussion group, problem-solving session, or decision-making group. At the same time, focus groups capitalize on group dynamics. The hallmark of focus groups is the explicit use of group interaction to generate data and insights that would be unlikely to emerge without the interaction found in a group. The technique inherently allows observation of group dynamics, discussion, and firsthand insights into the respondents' behaviors, attitudes, language, etc.

A focus group is a gathering of 8 to 12 people who share some characteristics relevant to the evaluation. Originally used as a market research tool to investigate the appeal of various products, the focus group technique has been adopted by other fields, such as education, as a tool for data gathering on a given topic. Focus groups conducted by experts take place in a focus group facility that includes recording apparatus (audio and/or visual) and an attached room with a one-way mirror for observation. There is an official recorder who may or may not be in the room. Participants are paid for attendance and provided with refreshments. As the focus group technique has been adopted by fields outside of marketing, some of these features, such as payment or refreshments, have been eliminated.

When to use focus groups. When conducting evaluations, focus groups are useful in answering the same types of questions as indepth interviews, except in a social context. Specific applications of the focus group method in evaluations include identifying and defining problems in project implementation;
identifying project strengths, weaknesses, and recommendations; assisting with interpretation of quantitative findings;5 obtaining perceptions of project outcomes and impacts; and generating new ideas.

5 Survey developers also frequently use focus groups to pretest topics or ideas that later will be used for quantitative data collection. In such cases, the data obtained are considered part of instrument development rather than findings. Qualitative evaluators feel that this is too limited an application and that the technique has broader utility.
In the hypothetical project, focus groups could be conducted with project participants to collect perceptions of project implementation and operation (e.g., Were the workshops staffed appropriately? Were the presentations suitable for all participants?), as well as progress toward objectives during the formative phase of evaluation (Did participants exchange information by e-mail and other means?). Focus groups could also be used to collect data on project outcomes and impact during the summative phase of evaluation (e.g., Were changes made in the curriculum? Did students taught by participants appear to become more interested in class work? What barriers did the participants face in applying what they had been taught?). Although focus groups and indepth interviews share many characteristics, they should not be used interchangeably. Factors to consider when choosing between focus groups and indepth interviews are included in Exhibit 7.

Exhibit 7. Considerations in choosing between focus groups and indepth interviews (factors include group interaction, group/peer pressure, continuity of information, observation by stakeholders, and logistics)

Use focus groups when:
interaction of respondents may stimulate a richer response or new and valuable thought
group/peer pressure will be valuable in challenging the thinking of respondents and illuminating conflicting opinions
subject matter is not so sensitive that respondents will temper responses or withhold information
the topic is such that most respondents can say all that is relevant or all that they know in less than 10 minutes
it is desirable to have one individual conduct the data collection; a few groups will not create fatigue or boredom for one person
the volume of issues to cover is not extensive
a single subject area is being examined in depth and strings of behaviors are less relevant
enough is known to establish a meaningful topic guide
it is desirable for stakeholders to hear what participants have to say
an acceptable number of target respondents can be assembled in one location
quick turnaround is critical, and funds are limited
focus group facilitators need to be able to control and manage groups

Use indepth interviews when:
group/peer pressure would inhibit responses and cloud the meaning of results
subject matter is so sensitive that respondents would be unwilling to talk openly in a group
the topic is such that a greater depth of response per individual is desirable, as with complex subject matter and very knowledgeable respondents
it is possible to use numerous individuals on the project; one interviewer would become fatigued or bored conducting all interviews
a greater volume of issues must be covered
it is necessary to understand how attitudes and behaviors link together on an individual basis
it may be necessary to develop the interview guide by altering it after each of the initial interviews
stakeholders do not need to hear firsthand the opinions of participants
respondents are dispersed or not easily assembled for other reasons
quick turnaround is not critical, and budget will permit higher cost
interviewers need to be supportive and skilled listeners
Developing a Focus Group

An important aspect of conducting focus groups is the topic guide. (See Appendix C to this chapter for a sample guide applied to the hypothetical project.) The topic guide, a list of topics or question areas, serves as a summary statement of the issues and objectives to be covered by the focus group. The topic guide also serves as a road map and as a memory aid for the focus group leader, called a moderator. It also provides the initial outline for the report of findings. Focus group participants are typically asked to reflect on the questions asked by the moderator. Participants are permitted to hear each other's responses and to make additional comments beyond their own original responses as they hear what other people have to say. It is not necessary for the group to reach any kind of consensus, nor is it necessary for people to disagree. The moderator must keep the
discussion flowing and make sure that one or two persons do not dominate the discussion. As a rule, the focus group session should not last longer than 1 1/2 to 2 hours. When very specific information is required, the session may be as short as 40 minutes. The objective is to get high-quality data in a social context where people can consider their own views in the context of the views of others, and where new ideas and perspectives can be introduced. The participants are usually a relatively homogeneous group of people. Answering the question, Which respondent variables represent relevant similarities among the target population? requires some thoughtful consideration when planning the evaluation. Respondents social class, level of expertise, age, cultural background, and sex should always be considered. There is a sharp division among focus group moderators regarding the effectiveness of mixing sexes within a group, although most moderators agree that it is acceptable to mix the sexes when the discussion topic is not related to or affected by sex stereotypes. Determining how many groups are needed requires balancing cost and information needs. A focus group can be fairly expensive, costing $10,000 to $20,000 depending on the type of physical facilities needed, the effort it takes to recruit participants, and the complexity of the reports required. A good rule of thumb is to conduct at least two groups for every variable considered to be relevant to the outcome (sex, age, educational level, etc.). However, even when several groups are sampled, conclusions typically are limited to the specific individuals participating in the focus group. Unless the study population is extremely small, it is not possible to generalize from focus group data. Recording focus group data. The procedures for recording a focus group session are basically the same as those used for indepth interviews. However, the focus group approach lends itself to more creative and efficient procedures. If the evaluation team does use a focus group room with a one-way mirror, a colleague can take notes and record observations. An advantage of this approach is that the extra individual is not in the view of participants and, therefore, not interfering with the group process. If a one-way mirror is not a possibility, the moderator may have a colleague present in the room to take notes and to record observations. A major advantage of these approaches is that the recorder focuses on observing and taking notes, while the moderator concentrates on asking questions, facilitating the group interaction, following up on ideas, and making smooth transitions from issue to issue. Furthermore, like observations, focus groups can be videotaped. These approaches allow for confirmation of what was seen and heard. Whatever the approach to gathering detailed data, informed consent is necessary and confidentiality should be assured.
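For planning purposes, the rule of thumb given above (at least two groups for every relevant respondent variable) and the quoted per-group cost range can be turned into rough budget arithmetic. The short Python sketch below is illustrative only; the variables listed and the reading of the rule as two groups per variable are assumptions for the hypothetical project.

# Rough planning arithmetic based on the rule of thumb above: at least two focus
# groups per relevant respondent variable, at roughly $10,000 to $20,000 per group.
# The variables listed are assumptions for the hypothetical project.

relevant_variables = ["sex", "campus", "years of teaching experience"]
min_groups = 2 * len(relevant_variables)        # at least 2 groups per variable

low, high = 10_000, 20_000                      # cost per group, from the range cited above
print(f"Plan for at least {min_groups} groups; "
      f"budget roughly ${min_groups * low:,} to ${min_groups * high:,}.")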
Having highlighted the similarities between interviews and focus groups, it is important to also point out one critical difference. In focus groups, group dynamics are especially important. The notes, and resultant report, should include comments on group interaction and dynamics as they inform the questions under study.
Document Studies Existing records often provide insights into a setting and/or group of people that cannot be observed or noted in another way. This information can be found in document form. Lincoln and Guba (1985) defined a document as any written or recorded material not prepared for the purposes of the evaluation or at the request of the inquirer. Documents can be divided into two major categories: public records, and personal documents (Guba and Lincoln, 1981). Public records are materials created and kept for the purpose of attesting to an event or providing an accounting (Lincoln and Guba, 1985). Public records can be collected from outside (external) or within (internal) the setting in which the evaluation is taking place. Examples of external records are census and vital statistics reports, county office records, newspaper archives, and local business records that can assist an evaluator in gathering information about the larger community and relevant trends. Such materials can be helpful in better understanding the project participants and making comparisons between groups/communities. For the evaluation of educational innovations, internal records include documents such as student transcripts and records, historical accounts, institutional mission statements, annual reports, budgets, grade and standardized test reports, minutes of meetings, internal memoranda, policy manuals, institutional histories, college/university catalogs, faculty and student handbooks, official correspondence, demographic material, mass media reports and presentations, and descriptions of program development and evaluation. They are particularly useful in describing institutional characteristics, such as backgrounds and academic performance of students, and in identifying institutional strengths and weaknesses. They can help the evaluator understand the
institutions resources, values, processes, priorities, and concerns. Furthermore, they provide a record or history not subject to recall bias. Personal documents are first-person accounts of events and experiences. These documents of life include diaries, portfolios, photographs, artwork, schedules, scrapbooks, poetry, letters to the paper, etc. Personal documents can help the evaluator understand how the participant sees the world and what she or he wants to communicate to an audience. And unlike other sources of qualitative data, collecting data from documents is relatively invisible to, and requires minimal cooperation from, persons within the setting being studied (Fetterman, 1989). The usefulness of existing sources varies depending on whether they are accessible and accurate. In the hypothetical project, documents can provide the evaluator with useful information about the culture of the institution and participants involved in the project, which in turn can assist in the development of evaluation questions. Information from documents also can be used to generate interview questions or to identify events to be observed. Furthermore, existing records can be useful for making comparisons (e.g., comparing project participants to project applicants, project proposal to implementation records, or documentation of institutional policies and program descriptions prior to and following implementation of project interventions and activities). The advantages and disadvantages of document studies are outlined in Exhibit 8.
Exhibit 8. Advantages and disadvantages of document studies

Advantages
Available locally
Inexpensive
Grounded in the setting and language in which they occur
Useful for determining value, interest, positions, political climate, public attitudes, and historical trends or sequences
Provide opportunity for study of trends over time
Unobtrusive

Disadvantages
May be incomplete
May be inaccurate; questionable authenticity
Locating suitable documents may pose challenges
Analysis may be time consuming
Access may be difficult
Key Informant A key informant is a person (or group of persons) who has unique skills or professional background related to the issue/intervention being evaluated, is knowledgeable about the project participants, or has access to other information of interest to the evaluator. A key informant can also be someone who has a way of communicating that represents or captures the essence of what the participants say and do. Key informants can help the evaluation team better understand the issue being evaluated, as well as the project participants, their backgrounds, behaviors, and attitudes, and any language or ethnic considerations. They can offer expertise beyond the evaluation team. They are also very useful for assisting with the evaluation of curricula and other educational materials. Key informants can be surveyed or interviewed individually or through focus groups. In the hypothetical project, key informants (i.e., expert faculty on main campus, deans, and department chairs) can assist with (1) developing
evaluation questions, and (2) answering formative and summative evaluation questions. The use of advisory committees is another way of gathering information from key informants. Advisory groups are called together for a variety of purposes: to represent the ideas and attitudes of a community, group, or organization; to promote legitimacy for the project; to advise and recommend; or to carry out a specific task.
Members of such a group may be specifically selected or invited to participate because of their unique skills or professional background; they may volunteer; they may be nominated or elected; or they may come together through a combination of these processes. The advantages and disadvantages of using key informants are outlined in Exhibit 9.
Exhibit 9. Advantages and disadvantages of using key informants

Advantages
Information concerning causes, reasons, and/or best approaches from an insider point of view
Advice/feedback increases credibility of study
Pipeline to pivotal groups
May have side benefit of solidifying relationships among evaluators, clients, participants, and other stakeholders

Disadvantages
Time required to select informants and get their commitment may be substantial
Relationship between evaluator and informants may influence the type of data obtained
Informants may interject their own biases and impressions
May result in disagreements among individuals, leading to frustration/conflicts
Performance Assessment

The performance assessment movement is impacting education from preschools to professional schools. At the heart of this upheaval is the belief that for all of their virtues (particularly efficiency and economy), traditional objective, norm-referenced tests may fail to tell us what we most want to know about student achievement. In addition, these same tests exert a powerful and, in the eyes of many educators, detrimental influence on curriculum and instruction. Critics of traditional testing procedures are exploring alternatives to multiple-choice, norm-referenced tests. It is hoped that these alternative means of assessment, ranging from observations to exhibitions, will provide a more authentic picture of achievement. Critics raise three main points against objective, norm-referenced tests: the tests themselves are flawed; the tests are a poor measure of anything except a student's test-taking ability; and the tests corrupt the very process they are supposed to improve (i.e., their structure puts too much emphasis on learning isolated facts).
The search for alternatives to traditional tests has generated a number of new approaches to assessment under such names as alternative assessment, performance assessment, holistic assessment, and authentic assessment. While each label suggests slightly different emphases, they all imply a movement toward assessment that supports exemplary teaching. Performance assessment appears to be the most popular term because it emphasizes the development of assessment tools that involve students in tasks that are worthwhile, significant, and meaningful. Such tasks involve higher order thinking skills and the coordination of a broad range of knowledge. Performance assessment may involve qualitative activities such as oral interviews, group problem-solving tasks, portfolios, or personal documents/creations (poetry, artwork, stories). A performance assessment approach that could be used in the hypothetical project is work sample methodology (Schalock, Schalock, and Girad, in press ). Briefly, work sample methodology challenges teachers to create unit plans and assessment techniques for students at several points during a training experience. The quality of this product is assessed (at least before and after training) in light of the goal of the professional development program. The actual performance of students on the assessment measures provides additional information on impact.
Case Studies

Classical case studies depend on ethnographic and participant observer methods. They are largely descriptive examinations, usually of a small number of sites (small towns, hospitals, schools), where the principal investigator is immersed in the life of the community or institution, combs available documents, holds formal and informal conversations with informants, observes ongoing activities, and develops an analysis of both individual and cross-case findings. In the hypothetical study, for example, case studies of the experiences of participants from different campuses could be carried out. These might involve indepth interviews with the faculty participants, observations of their classes over time, surveys of students, interviews with peers and department chairs, and analyses of student work samples at several points in the program. Selection of participants might be made based on factors such as their experience and training, type of students taught, or differences in institutional climate/supports. Case studies can provide very engaging, rich explorations of a project or application as it develops in a real-world setting. Project evaluators must be aware, however, that doing even relatively modest, illustrative case studies is a complex task that cannot be accomplished through
occasional, brief site visits. Demands with regard to design, data collection, and reporting can be substantial. For those wanting to become thoroughly familiar with this topic, a number of relevant texts are referenced here.
References
Fetterman, D.M. (1989). Ethnography: Step by Step. Applied Social Research Methods Series, Vol. 17. Newbury Park, CA: Sage.
Guba, E.G., and Lincoln, Y.S. (1981). Effective Evaluation. San Francisco: Jossey-Bass.
Lincoln, Y.S., and Guba, E.G. (1985). Naturalistic Inquiry. Beverly Hills, CA: Sage.
Lofland, J., and Lofland, L.H. (1995). Analyzing Social Settings: A Guide to Qualitative Observation and Analysis, 3rd Ed. Belmont, CA: Wadsworth.
Patton, M.Q. (1990). Qualitative Evaluation and Research Methods, 2nd Ed. Newbury Park, CA: Sage.
Schalock, H.D., Schalock, M.D., and Girad, G.R. (In press). Teacher work sample methodology, as used at Western Oregon State College. In J. Millman (Ed.), Assuring Accountability? Using Gains in Student Learning to Evaluate Teachers and Schools. Newbury Park, CA: Corwin.
Debus, M. (1995). Methodological Review: A Handbook for Excellence in Focus Group Research. Washington, DC: Academy for Educational Development.
Denzin, N.K., and Lincoln, Y.S. (Eds.). (1994). Handbook of Qualitative Research. Thousand Oaks, CA: Sage.
Erlandson, D.A., Harris, E.L., Skipper, B.L., and Allen, D. (1993). Doing Naturalistic Inquiry: A Guide to Methods. Newbury Park, CA: Sage.
Greenbaum, T.L. (1993). The Handbook of Focus Group Research. New York: Lexington Books.
Hart, D. (1994). Authentic Assessment: A Handbook for Educators. Menlo Park, CA: Addison-Wesley.
Herman, J.L., and Winters, L. (1992). Tracking Your School's Success: A Guide to Sensible Evaluation. Newbury Park, CA: Corwin Press.
Hymes, D.L., Chafin, A.E., and Gondor, R. (1991). The Changing Face of Testing and Assessment: Problems and Solutions. Arlington, VA: American Association of School Administrators.
Krueger, R.A. (1988). Focus Groups: A Practical Guide for Applied Research. Newbury Park, CA: Sage.
LeCompte, M.D., Millroy, W.L., and Preissle, J. (Eds.). (1992). The Handbook of Qualitative Research in Education. San Diego, CA: Academic Press.
Merton, R.K., Fiske, M., and Kendall, P.L. (1990). The Focused Interview: A Manual of Problems and Procedures, 2nd Ed. New York: The Free Press.
Miles, M.B., and Huberman, A.M. (1994). Qualitative Data Analysis: An Expanded Sourcebook. Thousand Oaks, CA: Sage.
Morgan, D.L. (Ed.). (1993). Successful Focus Groups: Advancing the State of the Art. Newbury Park, CA: Sage.
Morse, J.M. (Ed.). (1994). Critical Issues in Qualitative Research Methods. Thousand Oaks, CA: Sage.
Perrone, V. (Ed.). (1991). Expanding Student Assessment. Alexandria, VA: Association for Supervision and Curriculum Development.
Reich, R.B. (1991). The Work of Nations. New York: Alfred A. Knopf.
Schatzman, L., and Strauss, A.L. (1973). Field Research. Englewood Cliffs, NJ: Prentice-Hall.
Seidman, I.E. (1991). Interviewing as Qualitative Research: A Guide for Researchers in Education and Social Sciences. New York: Teachers College Press.
Stewart, D.W., and Shamdasani, P.N. (1990). Focus Groups: Theory and Practice. Newbury Park, CA: Sage.
United States General Accounting Office (GAO). (1990). Case Study Evaluations, Paper 10.1.9. Washington, DC: GAO.
Weiss, R.S. (1994). Learning from Strangers: The Art and Method of Qualitative Interview Studies. New York: Free Press.
Wiggins, G. (1989). A True Test: Toward More Authentic and Equitable Assessment. Phi Delta Kappan, May, 703-704.
Wiggins, G. (1989). Teaching to the (Authentic) Test. Educational Leadership, 46, 45.
Yin, R.K. (1989). Case Study Research: Design and Methods. Newbury Park, CA: Sage.
Developed from Weiss, Iris (1997), Local Systemic Change Observation Protocol.
II.
Session Focus Indicate the major intended purpose(s) of this session based on the information provided by the project staff. _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________
III.
Faculty Development Activities (Check all activities observed and describe, as relevant) A. Indicate the major instructional resource(s) used in this faculty development session.
Print materials
Hands-on materials
Outdoor resources
Technology/audio-visual resources
Other instructional resources (Please specify.) __________________________________________
B.
C.
Indicate the major activities of presenters and participants in this session. (Check circle to indicate applicability.)
D.
Comments Please provide any additional information you consider necessary to capture the activities or context of this faculty development session. Include comments on any feature of the session that is so salient that you need to get it on the table right away to help explain your ratings.
I.
B.
Synthesis Rating (1 to 5):
1 = Design of the session was not at all reflective of best practice for faculty development
5 = Design of the session extremely reflective of best practice for faculty development
C.
II.
B.
Synthesis Rating (1 to 5):
1 = Implementation of the session not at all reflective of best practice for faculty development
5 = Implementation of the session extremely reflective of best practice for faculty development
C.
III.
Disciplinary Content Not applicable. (Disciplinary content not included in the session.) A. Ratings of Key Indicators
(Rate each indicator from 1 = Not at all to 5 = To a great extent; 6 = Don't know; 7 = N/A.)

1. Disciplinary content was appropriate for the purposes of the faculty development session and the backgrounds of the participants.
2. The content was sound and appropriately presented/explored.
3. Facilitator displayed an understanding of concepts (e.g., in his/her dialogue with participants).
4. Content area was portrayed as a dynamic body of knowledge continually enriched by conjecture, investigation, analysis, and proof/justification.
5. Depth and breadth of attention to disciplinary content was appropriate for the purposes of the session and the needs of adult learners.
6. Appropriate connections were made to other areas of science/mathematics, to other disciplines, and/or to real-world contexts.
7. Degree of closure or resolution of conceptual understanding was appropriate for the purposes of the session and the needs of adult learners.
8. __________________________________
B.
Synthesis Rating (1 to 5):
1 = Disciplinary content of the session not at all reflective of best practice for faculty development
5 = Disciplinary content of the session extremely reflective of best practice for faculty development
C.
IV.
Pedagogical Content Not applicable. (Pedagogical content not included in the session.) A. Ratings of Key Indicators
(Rate each indicator from 1 = Not at all to 5 = To a great extent; 6 = Don't know; 7 = N/A.)

1. Pedagogical content was appropriate for the purposes of the faculty development session and the backgrounds of the participants.
2. Pedagogical content was sound and appropriately presented/explored.
3. Presenter displayed an understanding of pedagogical concepts (e.g., in his/her dialogue with participants).
4. The session included explicit attention to classroom implementation issues.
5. Depth and breadth of attention to pedagogical content was appropriate for the purposes of the session and the needs of adult learners.
6. Degree of closure or resolution of conceptual understanding was appropriate for the purposes of the session and the needs of adult learners.
7. __________________________________
B.
Synthesis Rating (1 to 5):
1 = Pedagogical content of the session not at all reflective of current standards for science/mathematics education
5 = Pedagogical content of the session extremely reflective of current standards for science/mathematics education
C.
V.
Use 1, Not at all, when you have considerable evidence of insensitivity or inequitable behavior; 3, when there are no examples either way; and 5, To a great extent, when there is considerable evidence of proactive efforts to achieve equity.
B.
Synthesis Rating (1 to 5):
1 = Culture of the session interferes with engagement of participants as members of a faculty learning community
5 = Culture of the session facilitates engagement of participants as members of a faculty learning community
C.
VI.
Overall Ratings of the Session

While the impact of a single faculty development session may well be limited in scope, it is important to judge whether it is helping move participants in the desired direction. For ratings in the section below, consider all available information (i.e., your previous ratings of design, implementation, content, and culture/equity; related interviews; and your knowledge of the overall faculty development program) as you assess the likely impact of this session. Feel free to elaborate on ratings with comments in the space provided.

Likely Impact on Participants' Capacity for Exemplary Instruction

Consider the likely impact of this session on the faculty participants' capacity to provide exemplary science/mathematics instruction. Circle the response that best describes your overall assessment of the likely effect of this session in each of the following areas. Not applicable. (The session did not focus on building capacity for classroom instruction.)
(Rate each area from 1 = Not at all to 5 = To a great extent; 6 = Don't know; 7 = N/A.)

1. Participants' ability to identify and understand important ideas of science/mathematics.
2. Participants' understanding of science/mathematics as a dynamic body of knowledge generated and enriched by investigation.
3. Participants' understanding of how students learn.
4. Participants' ability to plan/implement exemplary classroom instruction.
5. Participants' ability to implement exemplary classroom instructional materials.
6. Participants' self-confidence in instruction.
7. Proactiveness of participants in addressing their faculty development needs.
8. Professional networking among participants with regard to science/mathematics instruction.
Name of Interviewer________________ Date_______________________________ Name of Interviewee________________ Staff Position_____________________ Appendix B1 Sample Indepth Interview Guide Interview with Project Staff -------------------------------------------------------------------------------------------------------------------Good morning. I am ________ (introduce self). This interview is being conducted to get your input about the implementation of the Undergraduate Faculty Enhancement workshops which you have been conducting/involved in. I am especially interested in any problems you have faced or are aware of and recommendations you have. If it is okay with you, I will be tape recording our conversation. The purpose of this is so that I can get all the details but at the same time be able to carry on an attentive conversation with you. I assure you that all your comments will remain confidential. I will be compiling a report which will contain all staff comments without any reference to individuals. If you agree to this interview and the tape recording, please sign this consent form. I'd like to start by having you briefly describe your responsibilities and involvement thus far with the Undergraduate Faculty Enhancement Project. (Note to interviewer: You may need to probe to gather the information you need). I'm now going to ask you some questions that I would like you to answer to the best of your ability. If you do not know the answer, please say so. Are you aware of any problems with the scheduling and location(s)? (Note to interviewer: If so, probe - What have the problems been?, Do you know why these problems are occurring?, Do you have any suggestions on how to minimize these problems?) How were decisions made with respect to content and staffing of the first three workshops? (Note to interviewer: You may need to probe to gather the information about input from staff, participant reactions, availability of instructors, etc.)
This guide was designed for interviews to be conducted after the project has been active for 3 months. For later interviews, the guide will need to be modified as appropriate.
What is taking place in the workshops? (Note to interviewer: After giving individual time to respond, probe specific planned activities/strategies he/she may not have addressed - What have the presentations been like?, Have there been demonstrations of model teaching? If so, please describe., Has active participation been encouraged? Please describe for me how.)

What do you think the strongest points of the workshops have been up to this point? Why do you say this? (Note to interviewer: You may need to probe why specific strong elements are mentioned - e.g., if interviewee replies They work, respond How can you tell that they work?)

What types of concerns have you had or heard regarding the availability of materials and equipment? (Note to interviewer: You may need to probe to gather the information you need)

What other problems are you aware of? (Note to interviewer: You may need to probe to gather the information you need)
What do you think about the project/workshops at this point? (Note to interviewer: You may need to probe to gather the information you need - e.g., I'd like to know more about what your thinking is on that issue?) Is there any other information about the workshops or other aspects of the project that you think would be useful for me to know? (Note to interviewer: If so, you may need to probe to gather the information you need)
Name of Moderator________________ Date_______________________________ Attendees________________ Appendix C1 Sample Focus Group Topic Guide Workshop Participants Evaluation Questions: Are the students taught by faculty participants exposed to new standards, materials, practices? Did this vary by faculty member? By students characteristics? Were there obstacles to changes?; What did the participants do to share knowledge with other faculty? Did other faculty adopt new concepts and practices?; Were changes made in curriculum? Examinations and other requirements? Expenditures for library and other resource materials? Did students taught by participants become more interested in class work? More active in class? Did they express interest in teaching math after graduation? Did they plan to use new concepts and techniques? ------------------------------------------------------------------------------------------------------------------Introduction Give an explanation Good afternoon. My name is _______ and this is my colleague ______. Thank you for coming. A focus group is a relaxed discussion..... Present the purpose We are here today to talk about your teaching experiences since you participated in the Undergraduate Faculty Enhancement workshops. The purpose is to get your perceptions of how the workshops have affected your teaching, your students, other faculty, and the curriculum. I am not here to share information, or to give you my opinions. Your perceptions are what matter. There are no right or wrong or desirable or undesirable answers. You can disagree with each other, and you can change your mind. I would like you to feel comfortable saying what you really think and how you really feel.
This guide was designed for year one participants one year after they had participated in training (month 22 of project).
Discuss procedure ______ (colleague) will be taking notes and tape recording the discussion so that I do not miss anything you have to say. I explained these procedures to you when we set up this meeting. As you know everything is confidential. No one will know who said what. I want this to be a group discussion, so feel free to respond to me and to other members in the group without waiting to be called on. However, I would appreciate it if only one person did talk at a time. The discussion will last approximately one hour. There is a lot I want to discuss, so at times I may move us along a bit. Participant introduction Now, let's start by everyone sharing their name, what they teach, and how long they've been teaching. Rapport building I want each of you to think of an adjective that best described your teaching prior to the workshop experience and one that describes it following the experience. If you do not think your teaching has changed, you may select one adjective. We're going to go around the room so you can share your choices. Please briefly explain why you selected the adjective(s) you did. Interview What types of standards-based practice have you exposed students to since your participation in the workshops? Probes: If there were standards not mentioned - Has anyone exposed students to ______? If not Why not? Would you have exposed students to these practices if you had not participated in the workshops? Probes: Where would you have gotten this information? How would the information have been different? How have you exposed students to these practices since completing the workshops? Probes: Tell me more about that. How did that work?
Of the materials introduced to you through the workshops, which ones have you used? Probes: Had you used these prior to the workshops? Tell me more about why you used these. Tell me more about how you used these. Of the materials not mentioned -- Has anyone used _______? Tell me why not. Of these materials, which have you found the most useful? Probes: Tell me more about why you have found this most useful. Of the materials not mentioned - Why haven't you found _______ useful? How could it be more useful? Of the strategies introduced to you through the workshops, which ones have you applied to your teaching? Probes: Tell me about how you have used this strategy. Of the strategies not mentioned Has anyone tried ______? Tell me why not. Of these strategies which ones have been most effective? Probes: Tell me why you think they have been effective. Which have you found to be least effective? Probes: Tell me why you think they have not been effective. It's interesting, ______ found that strategy to be effective, what do you think may account for the difference? What problems/obstacles have you faced in attempting to incorporate into your teaching the knowledge and skills you received through the workshops? Probes: Tell me more about that. How many of you have shared information from the workshops with other faculty? Probes: Tell me about what you shared. Tell me about why you choose to share that aspect of the workshop? How did this happen (through presentations, faculty meetings, informal conversations, etc.) How have the other faculty responded? What concepts and practices have they adopted? C-3
Has your experience in the workshops resulted in efforts, by you, your Chair, and/or Dean to make changes to the curriculum? Probes: Tell me more about that. Tell me why you think this has/ has not happened. What about examinations and other requirements? Similar probes to above Since completing the workshops, describe for me any changes in your use of the library or resource center and purchase of educational materials. Probes: How much more money would you say you've spent? Have you faced any problems with obtaining the resources you've requested? Describe for me any changes you noticed in your students since your participation in the workshops. Probes: Have their interest levels increased? How do you know that? Why do you think that is? How have your changes affected their active participation? What about their knowledge base? Skills? Anything else? Describe for me the most beneficial aspects of the workshops for you as an instructor? Probes: That's interesting, tell me more about that. If you were designing these workshops in the future, how would you improve them? Probes: Any ideas of how to best do that? What areas do you feel you need more training in? Probes: Why do you say that? What would be the best avenue(s) for receiving that training?
Closure Though there were many different opinions about _______, it appears unanimous that _______. Does anyone see it differently? It seems most of you agree ______, but some think that _____. Does anyone want to add or clarify an opinion on this? Is there any other information regarding your experience with or following the workshops that you think would be useful for me to know? Thank you very much for coming this afternoon. Your time is very much appreciated and your comments have been very helpful.
Although distinctly different from quantitative statistical analysis both in procedures and goals, good qualitative analysis is both systematic and intensely disciplined. If not objective in the strict positivist sense, qualitative analysis is arguably replicable insofar as others can be walked through the analyst's thought processes and assumptions. Timing also works quite differently in qualitative evaluation. Quantitative evaluation is more easily divided into discrete stages of instrument development, data collection, data processing, and data analysis. By contrast, in qualitative evaluation, data collection and data analysis are not temporally discrete stages: as soon as the first pieces of data are collected, the evaluator begins the process of making sense of the information. Moreover, the different processes involved in qualitative analysis also overlap in time. Part of what distinguishes qualitative analysis is a loop-like pattern of multiple rounds of revisiting the data as additional questions emerge, new connections are unearthed, and more complex formulations develop along with a deepening understanding of the material. Qualitative analysis is fundamentally an iterative set of processes. At the simplest level, qualitative analysis involves examining the assembled relevant data to determine how they answer the evaluation question(s) at hand. However, the data are apt to be in formats that are unusual for quantitative evaluators, thereby complicating this task. In quantitative analysis of survey results, for example, frequency distributions of responses to specific items on a questionnaire often structure the discussion and analysis of findings. By contrast, qualitative data most often occur in more embedded and less easily reducible or distillable forms than quantitative data. For example, a relevant piece of qualitative data might be interspersed portions of an interview transcript, multiple excerpts from a set of field notes, or a comment or cluster of comments from a focus group. Throughout the course of qualitative analysis, the analyst should be asking and reasking the following questions: What patterns and common themes emerge in responses dealing with specific items? How do these patterns (or lack thereof) help to illuminate the broader study question(s)? Are there any deviations from these patterns? If yes, are there any factors that might explain these atypical responses? What interesting stories emerge from the responses? How can these stories help to illuminate the broader study question(s)? Do any of these patterns or findings suggest that additional data may need to be collected? Do any of the study questions need to be revised?
Do the patterns that emerge corroborate the findings of any corresponding qualitative analyses that have been conducted? If not, what might explain these discrepancies?
Two basic forms of qualitative analysis, essentially the same in their underlying logic, will be discussed: intra-case analysis and cross-case analysis. A case may be defined differently for different analytic purposes. Depending on the situation, a case could be a single individual, a focus group session, or a program site (Berkowitz, 1996). In terms of the hypothetical project described in Chapter 2, a case will be a single campus. Intra-case analysis will examine a single project site, and cross-case analysis will systematically compare and contrast the eight campuses.
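If the evaluation team keeps its qualitative material in electronic form, tagging each excerpt with its case makes the same store usable for both intra-case and cross-case analysis. The Python sketch below is a minimal illustration; the campus names, data sources, and excerpts are invented for the hypothetical project, not taken from any actual evaluation.

from collections import defaultdict

# Minimal illustration: excerpts tagged by case (campus), data source, and the
# evaluation question they bear on. All entries are invented examples.
excerpts = [
    {"case": "Campus A", "source": "focus group", "question": "knowledge sharing",
     "text": "We shared materials by e-mail and at the structured seminars."},
    {"case": "Campus A", "source": "chair interview", "question": "knowledge sharing",
     "text": "Attendance at the seminars was high, and comments were positive."},
    {"case": "Campus B", "source": "focus group", "question": "knowledge sharing",
     "text": "Most of the exchange happened in informal hallway conversations."},
]

by_case = defaultdict(list)
for e in excerpts:
    by_case[e["case"]].append(e)

# Intra-case analysis: all material on one question for a single campus.
campus_a = [e for e in by_case["Campus A"] if e["question"] == "knowledge sharing"]

# Cross-case analysis: the same question compared across campuses.
for case, items in sorted(by_case.items()):
    relevant = [e for e in items if e["question"] == "knowledge sharing"]
    print(case, "-", len(relevant), "relevant excerpts")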
Data Reduction First, the mass of data has to be organized and somehow meaningfully reduced or reconfigured. Miles and Huberman (1994) describe this first of their three elements of qualitative data analysis as data reduction. Data reduction refers to the process of selecting, focusing, simplifying, abstracting, and transforming the data that appear in written up field notes or transcriptions. Not only do the data need to be condensed for the sake of manageability, they also have to be transformed so they can be made intelligible in terms of the issues being addressed. Data reduction often forces choices about which aspects of the assembled data should be emphasized, minimized, or set aside completely for the purposes of the project at hand. Beginners often fail to understand that even at this stage, the data do not speak for themselves. A common mistake many people make in quantitative as well as qualitative analysis, in a vain effort to remain perfectly
objective, is to present a large volume of unassimilated and uncategorized data for the reader's consumption. In qualitative analysis, the analyst decides which data are to be singled out for description according to principles of selectivity. This usually involves some combination of deductive and inductive analysis. While initial categorizations are shaped by preestablished study questions, the qualitative analyst should remain open to inducing new meanings from the data available. In evaluation, such as the hypothetical evaluation project in this handbook, data reduction should be guided primarily by the need to address the salient evaluation question(s). This selective winnowing is difficult, both because qualitative data can be very rich, and because the person who analyzes the data also often played a direct, personal role in collecting them. The words that make up qualitative analysis represent real people, places, and events far more concretely than the numbers in quantitative data sets, a reality that can make cutting any of it quite painful. But the acid test has to be the relevance of the particular data for answering particular questions. For example, a formative evaluation question for the hypothetical study might be whether the presentations were suitable for all participants. Focus group participants may have had a number of interesting things to say about the presentations, but remarks that only tangentially relate to the issue of suitability may have to be bracketed or ignored. Similarly, a participants comments on his department chair that are unrelated to issues of program implementation or impact, however fascinating, should not be incorporated into the final report. The approach to data reduction is the same for intra-case and cross-case analysis. With the hypothetical project of Chapter 2 in mind, it is illustrative to consider ways of reducing data collected to address the question what did participating faculty do to share knowledge with nonparticipating faculty? The first step in an intra-case analysis of the issue is to examine all the relevant data sources to extract a description of what they say about the sharing of knowledge between participating and nonparticipating faculty on the one campus. Included might be information from focus groups, observations, and indepth interviews of key informants, such as the department chair. The most salient portions of the data are likely to be concentrated in certain sections of the focus group transcripts (or write-ups) and indepth interviews with the department chair. However, it is best to also quickly peruse all notes for relevant data that may be scattered throughout. In initiating the process of data reduction, the focus is on distilling what the different respondent groups suggested about the activities used to share knowledge between faculty who participated in the
project and those who did not. How does what the participating faculty say compare to what the nonparticipating faculty and the department chair report about knowledge sharing and adoption of new practices? In setting out these differences and similarities, it is important not to so flatten or reduce the data that they sound like close-ended survey responses. The tendency to treat qualitative data in this manner is not uncommon among analysts trained in quantitative approaches. Not surprisingly, the result is to make qualitative analysis look like watered down survey research with a tiny sample size. Approaching qualitative analysis in this fashion unfairly and unnecessarily dilutes the richness of the data and, thus, inadvertently undermines one of the greatest strengths of the qualitative approach. Answering the question about knowledge sharing in a truly qualitative way should go beyond enumerating a list of knowledgesharing activities to also probe the respondents' assessments of the relative effectiveness of these activities, as well as their reasons for believing some more effective than others. Apart from exploring the specific content of the respondents' views, it is also a good idea to take note of the relative frequency with which different issues are raised, as well as the intensity with which they are expressed.
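Where transcripts and notes have been coded, the selective winnowing described above can be supported (though never replaced) by simple filtering: retain the excerpts coded as relevant to the question at hand, set the rest aside, and tally how often related issues are raised. The sketch below uses invented codes and excerpts; it is one possible bookkeeping aid, not a prescribed procedure.

from collections import Counter

# Selective winnowing with invented codes: keep only excerpts coded as relevant
# to the evaluation question at hand, and tally which related issues come up.
coded_excerpts = [
    {"codes": {"knowledge sharing", "seminars"}, "intensity": "high",
     "text": "The seminars let us cover a lot of ground quickly."},
    {"codes": {"department politics"}, "intensity": "low",
     "text": "The chair and the dean rarely agree on scheduling."},  # tangential; set aside
    {"codes": {"knowledge sharing", "informal exchange"}, "intensity": "medium",
     "text": "I picked up more over coffee than in any formal session."},
]

question_code = "knowledge sharing"
retained = [e for e in coded_excerpts if question_code in e["codes"]]
set_aside = [e for e in coded_excerpts if question_code not in e["codes"]]

issue_counts = Counter(c for e in retained for c in e["codes"] if c != question_code)
print("Retained:", len(retained), "| Set aside:", len(set_aside))
print("Issues raised:", issue_counts.most_common())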
Data Display Data display is the second element or level in Miles and Huberman's (1994) model of qualitative data analysis. Data display goes a step beyond data reduction to provide an organized, compressed assembly of information that permits conclusion drawing... A display can be an extended piece of text or a diagram, chart, or matrix that provides a new way of arranging and thinking about the more textually embedded data. Data displays, whether in word or diagrammatic form, allow the analyst to extrapolate from the data enough to begin to discern systematic patterns and interrelationships. At the display stage, additional, higher order categories or themes may emerge from the data that go beyond those first discovered during the initial process of data reduction. From the perspective of program evaluation, data display can be extremely helpful in identifying why a system (e.g., a given program or project) is or is not working well and what might be done to change it. The overarching issue of why some projects work better or are more successful than others almost always drives the analytic process in any evaluation. In our hypothetical evaluation example, faculty from all eight campuses come together at the central campus to attend workshops. In that respect, all participants are exposed to the identical program. However, implementation of teaching
At the display stage, additional, higher order categories or themes may emerge from the data that go beyond those first discovered during the initial process of data reduction.
techniques presented at the workshop will most likely vary from campus to campus based on factors such as the participants' personal characteristics, the differing demographics of the student bodies, and differences in university and departmental characteristics (e.g., size of the student body, organization of preservice courses, department chair's support of the program goals, departmental receptivity to change and innovation). The qualitative analyst will need to discern patterns of interrelationships to suggest why the project promoted more change on some campuses than on others.

One technique for displaying narrative data is to develop a series of flow charts that map out any critical paths, decision points, and supporting evidence that emerge from the data for a single site. After the first flow chart has been developed, the process can be repeated for all remaining sites. Analysts may (1) use the data from subsequent sites to modify the original flow chart; (2) prepare an independent flow chart for each site; and/or (3) prepare a single flow chart for some events (if most sites adopted a generic approach) and multiple flow charts for others. Examination of the data display across the eight campuses might produce a finding that implementation proceeded more quickly and effectively on those campuses where the department chair was highly supportive of trying new approaches to teaching but was stymied and delayed where department chairs had misgivings about making changes to a tried-and-true system.

Data display for intra-case analysis. Exhibit 10 presents a data display matrix for analyzing patterns of response concerning perceptions and assessments of knowledge-sharing activities for one campus. We have assumed that three respondent units (participating faculty, nonparticipating faculty, and the department chair) have been asked similar questions. Looking at column (a), it is interesting that the three respondent groups were not in total agreement even on which activities they named. Only the participants considered e-mail a means of sharing what they had learned in the program with their colleagues. The nonparticipant colleagues apparently viewed the situation differently, because they did not include e-mail in their list. The department chair, perhaps because she was unaware they were taking place, did not mention e-mail or informal interchanges as knowledge-sharing activities. Column (b) shows which activities each group considered most effective as a way of sharing knowledge, in order of perceived importance; column (c) summarizes the respondents' reasons for regarding those particular activities as most effective. Looking down column (b), we can see that there is some overlap across groups; for example, both the participants and the department chair believed structured seminars were the most effective knowledge-sharing
activity. Nonparticipants saw the structured seminars as better than lunchtime meetings, but not as effective as informal interchanges.
Exhibit 10. Data matrix for Campus A: What was done to share knowledge

Respondent group | (a) Activities named | (b) Which most effective | (c) Why
Participants | Structured seminars; e-mail; informal interchanges; lunchtime meetings | Structured seminars; informal interchanges; lunchtime meetings | Communicates a great deal of information concisely
Nonparticipants | Structured seminars; informal interchanges; lunchtime meetings | Informal interchanges; structured seminars; lunchtime meetings | Easier to assimilate information in less formal settings; smaller bits of information at a time
Department chair | Structured seminars; lunchtime meetings | Structured seminars | Highest attendance by nonparticipants; most (positive) comments to chair
Simply knowing what each set of respondents considered most effective, without knowing why, would leave out an important piece of the analytic puzzle. It would rob the qualitative analyst of the chance to probe potentially meaningful variations in underlying conceptions of what defines effectiveness in an educational exchange. For example, even though both participating faculty and the department chair agreed on the structured seminars as the most effective knowledge-sharing activity, they gave somewhat different reasons for making this claim. The participants saw the seminars as the most effective way of communicating a lot of information concisely. The department chair used indirect indicators (attendance rates of nonparticipants at the seminars, as well as favorable comments on the seminars volunteered to her) to formulate her judgment of effectiveness. It is important to recognize the different bases on which the respondents reached the same conclusions.

Several points concerning qualitative analysis emerge from this relatively straightforward and preliminary exercise. First, a pattern of cross-group differences can be discerned even before we analyze the responses concerning the activities regarded as most effective, and why. The open-ended format of the question allowed each group to give its own definition of knowledge-sharing activities. The point of the analysis is not primarily to determine which activities
were used and how often; if that were the major purpose of asking this question, there would be far more efficient ways (e.g., a checklist or rating scale) to find the answer. From an analytic perspective, it is more important to begin to uncover relevant group differences in perceptions. Differences in reasons for considering one activity more effective than another might also point to different conceptions of the primary goals of the knowledge-sharing activities.

Some of these variations might be attributed to the fact that the respondent groups occupy different structural positions in life and different roles in this specific situation. While both participating and nonparticipating faculty teach in the same department, in this situation the participating faculty are playing a teaching role vis-a-vis their colleagues. The data in column (c) indicate the participants see their main goal as imparting a great deal of information as concisely as possible. By contrast, the nonparticipants, in the role of students, believe they assimilate the material better when presented with smaller quantities of information in informal settings. Their different approaches to the question might reflect different perceptions based on this temporary rearrangement in their roles. The department chair occupies a different structural position in the university than either the participating or nonparticipating faculty. She may be too removed from day-to-day exchanges among the faculty to see much of what is happening on this more informal level. By the same token, her removal from the grassroots might give her a broader perspective on the subject.

Data display in cross-case analysis. The principles applied in analyzing across cases essentially parallel those employed in the intra-case analysis. Exhibit 11 shows an example of a hypothetical data display matrix that might be used for analysis of program participants' responses to the knowledge-sharing question across all eight campuses. Looking down column (a), one sees differences in the number and variety of knowledge-sharing activities named by participating faculty at the eight schools. Brown bag lunches, department newsletters, workshops, and dissemination of written (hard-copy) materials have been added to the list, which for branch campus A included only structured seminars, e-mail, informal interchanges, and lunchtime meetings. This expanded list probably encompasses most, if not all, such activities at the eight campuses. In addition, where applicable, we have indicated whether the nonparticipating faculty's involvement in the activity was compulsory or voluntary. In Exhibit 11, we are comparing the same group on different campuses, rather than different groups on the same campus, as in Exhibit 10. Column (b) reveals some overlap across participants in which activities were considered most effective: structured seminars were named by participants at campuses A and C, brown bag lunches by those at campuses B and H.
Exhibit 11. Participants' views of information sharing at eight campuses

(a) Activities named (varying by campus): structured seminars (voluntary or compulsory); e-mail; informal interchanges; lunchtime meetings; brown bag lunches; department newsletters; workshops (voluntary or compulsory); dissemination of written materials.

(b) Which most effective, and (c) why, by branch campus:
- Campus A: structured seminars (voluntary); best way to communicate a great deal of information
- Campus B: brown bag lunches; most interactive
- Campus C: structured seminars (compulsory); compulsory, and the structured format works well
- Campuses D, E, and F: workshops (voluntary) and dissemination of materials; voluntary, hands-on approach works best; dissemination important but not enough without the personal touch; not everyone regularly uses e-mail; compulsory workshops resented as coercive
- Campus G: lunchtime meetings; best time
- Campus H: brown bag lunches; relaxed environment
However, as in Exhibit 10, the primary reasons for naming these activities were not always the same. Brown bag lunches were deemed most effective because of their interactive nature (campus B) and the relaxed environment in which they took place (campus H), both suggesting a preference for less formal learning situations. However, while campus A participants judged voluntary structured seminars the most effective way to communicate a great deal of information, campus C participants also liked that the structured seminars on their campus were compulsory. Participants at both campuses appear to favor structure, but may part company on whether requiring attendance is a good idea. The voluntary/compulsory distinction was added to illustrate different aspects of effective knowledge sharing that might prove analytically relevant.

It would also be worthwhile to examine the reasons participants gave for deeming one activity more effective than another, regardless of the activity. Data in column (c) show a tendency for participants on campuses B, D, E, F, and H to prefer voluntary, informal, hands-on, personal approaches. By contrast, those from campuses A and C seemed to favor more structure (although they may disagree on voluntary versus compulsory approaches). The answer supplied for campus G (best time) is ambiguous and requires returning to the transcripts to see if more material can be found to clarify this response.

To have included all the knowledge-sharing information from four different respondent groups on all eight campuses in a single matrix would have been quite complicated. Therefore, for clarity's sake, we present only the participating faculty responses. However, to complete the cross-case analysis of this evaluation question, the same procedure should be followed, if not in matrix format then conceptually, for nonparticipating faculty and department chairpersons. For each group, the analysis would be modeled on the above example. It would be aimed at identifying important similarities and differences in what the respondents said or observed and exploring the possible bases for these patterns at different campuses. Much of qualitative analysis, whether intra-case or cross-case, is structured by what Glaser and Strauss (1967) called the method of constant comparison, an intellectually disciplined process of comparing and contrasting across instances to establish significant patterns, then further questioning and refinement of these patterns as part of an ongoing analytic process.
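For readers who prefer to see the mechanics spelled out, the brief sketch below shows one way a cross-case display like Exhibit 11 might be held in simple data structures and then scanned for recurring reasons, in the spirit of constant comparison. It is a minimal, hypothetical illustration written in Python; the campus entries and theme labels are invented and do not reproduce the exhibit.

# Minimal sketch: a cross-case display matrix held as a dictionary keyed by campus.
# Entries and theme labels are illustrative, not data from an actual evaluation.
cross_case = {
    "A": {"most_effective": "structured seminars", "why": ["conveys much information", "structure"]},
    "B": {"most_effective": "brown bag lunches", "why": ["interactive", "informal"]},
    "C": {"most_effective": "structured seminars", "why": ["compulsory", "structure"]},
    "H": {"most_effective": "brown bag lunches", "why": ["relaxed environment", "informal"]},
}

# Constant comparison, crudely: group campuses that share a reason or theme,
# so contrasts (for example, structure versus informality) become visible at a glance.
themes = {}
for campus, row in cross_case.items():
    for reason in row["why"]:
        themes.setdefault(reason, []).append(campus)

for reason, campuses in sorted(themes.items()):
    print(f"{reason}: campuses {', '.join(campuses)}")

Grouping campuses by shared reasons is only a starting point; the analyst would then question and refine the resulting patterns against the full transcripts.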
Conclusion Drawing and Verification

This activity is the third element of qualitative analysis. Conclusion drawing involves stepping back to consider what the analyzed data mean and to assess their implications for the questions at hand.6 Verification, integrally linked to conclusion drawing, entails revisiting the data as many times as necessary to cross-check or verify these emergent conclusions. The meanings emerging from the data have to be tested for their plausibility, their sturdiness, their confirmability, that is, their validity (Miles and Huberman, 1994, p. 11). Validity means something different in this context than in quantitative evaluation, where it is a technical term that refers quite specifically to whether a given construct measures what it purports to measure. Here validity encompasses a much broader concern for whether the conclusions being drawn from the data are credible, defensible, warranted, and able to withstand alternative explanations.

For many qualitative evaluators, it is above all this third phase that gives qualitative analysis its special appeal. At the same time, it is probably also the facet that quantitative evaluators and others steeped in traditional quantitative techniques find most disquieting. Once qualitative analysts begin to move beyond cautious analysis of the factual data, the critics ask, what is to guarantee that they are not engaging in purely speculative flights of fancy? Indeed, their concerns are not entirely unfounded. If the unprocessed data heap is the result of not taking responsibility for shaping the story line of the analysis, the opposite tendency is to take conclusion drawing well beyond what the data reasonably warrant, or to leap prematurely to conclusions and draw implications without giving the data proper scrutiny.

The question about knowledge sharing provides a good example. The underlying expectation, or hope, is for a diffusion effort, wherein participating faculty stimulate innovation in teaching mathematics among their colleagues. A cross-case finding might be that participating faculty at three of the eight campuses made active, ongoing efforts to share their new knowledge with their colleagues in a variety of formal and informal settings. At two other campuses, initial efforts at sharing started strong but soon fizzled out and were not continued. In the remaining three cases, one or two faculty participants shared bits and pieces of what they had learned with a few selected colleagues on an ad hoc basis, but otherwise took no steps to diffuse their new knowledge and skills more broadly.
6 When qualitative data are used as a precursor to the design/development of quantitative instruments, this step may be postponed. Reducing the data and looking for relationships will provide adequate information for developing other instruments.
Taking these findings at face value might lead one to conclude that the project had largely failed in encouraging diffusion of new pedagogical knowledge and skills to nonparticipating faculty. After all, such sharing occurred in the desired fashion at only three of the eight campuses. However, before jumping ahead to conclude that the project was disappointing in this respect, or to generalize beyond this case to other similar efforts at spreading pedagogic innovations among faculty, it is vital to examine more closely the likely reasons why sharing among participating and nonparticipating faculty occurred where and how it did.

The analysts would first look for factors distinguishing the three campuses where ongoing organized efforts at sharing did occur from those where such efforts were either not sustained or occurred in largely piecemeal fashion. However, it will also be important to differentiate among the less successful sites to tease out factors related both to the extent of sharing and to the degree to which activities were sustained.

One possible hypothesis would be that successfully sustaining organized efforts at sharing on an ongoing basis requires structural supports at the departmental level and/or conducive environmental conditions at the home campus. In the absence of these supports, a great burst of energy and enthusiasm at the beginning of the academic year will quickly give way under the pressure of myriad competing demands, as happened for the second group of two campuses. Similarly, under most circumstances, the individual good will of one or two participating faculty on a campus will in itself be insufficient to generate the type and level of exchange that would make a difference to the nonparticipating faculty (the third set of campuses).

At the three "successful" sites, for example, faculty schedules may allow regularly scheduled common periods for colleagues to share ideas and information. In addition, participation in such events might be encouraged by the department chair, and possibly even considered as a factor in making promotion and tenure decisions. The department might also contribute a few dollars for refreshments in order to promote a more informal, relaxed atmosphere at these activities.

In other words, at the campuses where sharing occurred as desired, conditions were conducive in one or more ways: a new time slot did not have to be carved out of already crowded faculty schedules, the department chair did more than simply pay "lip service" to the importance of sharing (faculty are usually quite astute at picking up on what really matters in departmental culture), and efforts were made to create a relaxed ambiance for transfer of knowledge.
At some of the other campuses, structural conditions might not be conducive, in that classes are taught continuously from 8 a.m. through 8 p.m., with faculty coming and going at different times and on alternating days. At another campus, scheduling might not present so great a hurdle. However, the department chair may be so busy that despite philosophic agreement with the importance of diffusing the newly learned skills, she can do little to actively encourage sharing among participating and nonparticipating faculty. In this case, it is not structural conditions or lukewarm support so much as competing priorities and the department chair's failure to act concretely on her commitment that stand in the way.

By contrast, at another campus, the department chairperson may publicly acknowledge the goals of the project but really believe it a waste of time and resources. His failure to support sharing activities among his faculty stems from more deeply rooted misgivings about the value and viability of the project. This distinction might not seem to matter, given that the outcome was the same on both campuses (sharing did not occur as desired). However, from the perspective of an evaluation researcher, whether the department chair believes in the project could make a major difference to what would have to be done to change the outcome.

We have begun to develop a reasonably coherent explanation for the cross-site variations in the degree and nature of sharing taking place between participating and nonparticipating faculty. Arriving at this point required stepping back and systematically examining and reexamining the data, using a variety of what Miles and Huberman (1994, pp. 245-262) call "tactics for generating meaning." They describe 13 such tactics, including noting patterns and themes, clustering cases, making contrasts and comparisons, partitioning variables, and subsuming particulars in the general. Qualitative analysts typically employ some or all of these, simultaneously and iteratively, in drawing conclusions.

One factor that can impede conclusion drawing in evaluation studies is that the theoretical or logical assumptions underlying the research are often left unstated. In this example, as discussed above, these are assumptions or expectations about knowledge sharing and diffusion of innovative practices from participating to nonparticipating faculty, and, by extension, to their students. For the analyst to be in a position to take advantage of conclusion-drawing opportunities, he or she must be able to recognize and address these assumptions, which are often only implicit in the evaluation questions. Toward that end, it may be helpful to explicitly spell out a "logic model" or set of assumptions as to how the program is expected to achieve its desired outcome(s). Recognizing these assumptions becomes even more important when there is a need or
desire to place the findings from a single evaluation into wider comparative context vis-a-vis other program evaluations.

Once having created an apparently credible explanation for variations in the extent and kind of sharing that occurs between participating and nonparticipating faculty across the eight campuses, how can the analyst verify the validity, or truth value, of this interpretation of the data? Miles and Huberman (1994, pp. 262-277) outline 13 tactics for testing or confirming findings, all of which address the need to build systematic "safeguards against self-delusion" (p. 265) into the process of analysis. We will discuss only a few of these, which have particular relevance for the example at hand and emphasize critical contrasts between quantitative and qualitative analytic approaches. However, two points are very important to stress at the outset: several of the most important safeguards on validity, such as using multiple sources and modes of evidence, must be built into the design from the beginning; and the analytic objective is to create a plausible, empirically grounded account that is maximally responsive to the evaluation questions at hand. As the authors note: "You are not looking for one account, forsaking all others, but for the best of several alternative accounts" (p. 274).

One issue of analytic validity that often arises concerns the need to weigh evidence drawn from multiple sources and based on different data collection modes, such as self-reported interview responses and observational data. Triangulation of data sources and modes is critical, but the results may not necessarily corroborate one another, and may even conflict. For example, another of the summative evaluation questions proposed in Chapter 2 concerns the extent to which nonparticipating faculty adopt new concepts and practices in their teaching. Answering this question relies on a combination of observations, self-reported data from participant focus groups, and indepth interviews with department chairs and nonparticipating faculty. In this case, there is a possibility that the observational data might be at odds with the self-reported data from one or more of the respondent groups. For example, when interviewed, the vast majority of nonparticipating faculty might say, and really believe, that they are applying project-related innovative principles in their teaching. However, the observers may see very little behavioral evidence that these principles are actually influencing teaching practices in these faculty members' classrooms. It would be easy to brush off this finding by concluding that the nonparticipants are saving face by parroting what they believe they are expected to say about their teaching. But there are other, more analytically interesting, possibilities. Perhaps the nonparticipants have an incomplete understanding of these principles, or they were not adequately trained in how to translate them effectively into classroom practice.
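Where self-reported and observational data may conflict, it can help to line the two sources up case by case before interpreting either one. The sketch below is a hypothetical illustration (the faculty identifiers and ratings are invented); it merely flags cases where an interview claims adoption of the new practices while observers recorded little evidence of them, marking those cases for follow-up rather than resolving the discrepancy.

# Hypothetical triangulation check: compare self-reported adoption (interviews)
# with observer ratings for the same nonparticipating faculty members.
self_report = {"F01": "adopting", "F02": "adopting", "F03": "not adopting"}
observed = {"F01": "little evidence", "F02": "clear evidence", "F03": "little evidence"}

follow_up = [
    fid for fid in self_report
    if self_report[fid] == "adopting" and observed.get(fid) == "little evidence"
]

# Discrepant cases are topics for further analysis, not errors to be discarded.
print("Cases to revisit in transcripts and observation notes:", follow_up)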
The important point is that analyzing across multiple group perspectives and different types of data is not a simple matter of deciding who is right or which data are most accurate. Weighing the evidence is a more subtle and delicate matter of hearing each group's viewpoint, while still recognizing that any single perspective is partial and relative to the respondent's experiences and social position. Moreover, as noted above, respondents' perceptions are no more or less real than observations. In fact, discrepancies between self-reported and observational data may reveal profitable topics or areas for further analysis. It is the analyst's job to weave the various voices and sources together in a narrative that responds to the relevant evaluation question(s). The more artfully this is done, the simpler and more natural it appears to the reader. To go to the trouble to collect various types of data and listen to different voices, only to pound the information into a flattened picture, is to do a real disservice to qualitative analysis. However, if there is reason to believe that some of the data are stronger than others (some of the respondents are highly knowledgeable on the subject, while others are not), it is appropriate to give these responses greater weight in the analysis.

Qualitative analysts should also be alert to patterns of interconnection in their data that differ from what might have been expected; Miles and Huberman (1994, p. 270) call this "following up surprises." For instance, at one campus, systematically comparing participating and nonparticipating faculty responses to the question about knowledge-sharing activities (see Exhibit 10) might reveal few apparent cross-group differences. However, closer examination of the two sets of transcripts might show meaningful differences in perceptions dividing along other, less expected lines. For purposes of this evaluation, it was tacitly assumed that the relevant distinctions between faculty would most likely be between those who had and had not participated in the project. However, both groups also share a history as faculty in the same department. Therefore, other factors, such as prior personal ties, might have overridden the participant/nonparticipant faculty distinction. One strength of qualitative analysis is its potential to discover and pursue these kinds of unexpected patterns, which can often be very informative. Doing so requires an ability to listen for, and be receptive to, surprises. Unlike quantitative researchers, who need to explain away deviant or exceptional cases, qualitative analysts are usually delighted when they encounter twists in their data that present fresh analytic insights or challenges.

Miles and Huberman (1994, pp. 269-270) also talk about "checking the meaning of outliers" and "using extreme cases." In qualitative analysis, deviant instances or cases that do not appear to fit the pattern or trend are not treated as outliers,
as they would be in statistical, probability-based analysis. Rather, deviant or exceptional cases should be taken as a challenge to further elaboration and verification of an evolving conclusion. For example, if the department chair strongly supports the project's aims and goals at all of the successful sites but one, perhaps another set of factors is fulfilling the same function(s) at the deviant site. Identifying those factors will, in turn, help to clarify more precisely what it is about strong leadership and belief in a project that makes a difference. Or, to elaborate on another extended example, suppose at one campus where structural conditions are not conducive to sharing between participating and nonparticipating faculty, such sharing is occurring nonetheless, spearheaded by one very committed participating faculty member. This example might suggest that a highly committed individual who is a natural leader among his faculty peers is able to overcome the structural constraints to sharing. In a sense, this deviant case analysis would strengthen the general conclusion by showing that it takes exceptional circumstances to override the constraints of the situation.

Elsewhere in this handbook, we noted that summative and formative evaluations are often linked by the premise that variations in project implementation will, in turn, effect differences in project outcomes. In the hypothetical example presented in this handbook, all participants were exposed to the same activities on the central campus, eliminating the possibility of analyzing the effects of differences in implementation features. However, using a different model and comparing implementation and outcomes at three different universities, with three campuses participating per university, would give some idea of what such an analysis might look like.

A display matrix for a cross-site evaluation of this type is given in Exhibit 12. The upper portion of the matrix shows how the three campuses varied in key implementation features. The bottom portion summarizes outcomes at each campus. While we would not necessarily expect a one-to-one relationship, the matrix loosely pairs implementation features with outcomes with which they might be associated. For example, workshop staffing and delivery are paired with knowledge-sharing activities, and accuracy of workshop content with curricular change. However, there is nothing to preclude looking for a relationship between use of appropriate techniques in the workshops (formative) and curricular changes on the campuses (summative). Use of the matrix would essentially guide the analysis along the same lines as in the examples provided earlier.
Exhibit 12. Matrix of cross-case analysis linking implementation and outcome factors

Implementation features
Branch campus | Workshops delivered and staffed as planned? | Content accurate/up to date? | Materials available? | Suitable presentation?
Campus A | Yes | Yes | Mostly |
Campus B | No | Yes | No |
Campus C | Mostly | Yes | Yes |

Outcome features (participating campuses)
Branch campus | Knowledge sharing with nonparticipants? | Curricular changes? | Changes to exams and requirements? | Expenditures? | Students more interested/active in class?
Campus A | Many | Some | No | |
Campus B | Many | Many | Yes | |
Campus C | Moderate level | Only a few | Few | Yes |
In this cross-site analysis, the overarching question would address the similarities and differences across these three sites (in terms of project implementation, outcomes, and the connection between them) and investigate the bases of these differences. Was one of the projects discernibly more successful than the others, either overall or in particular areas, and if so, what factors or configurations of factors seem to have contributed to these successes? The analysis would then continue through multiple iterations until a satisfactory resolution is achieved.
First, although stated in different ways, there is broad consensus concerning the qualitative analyst's need to be self-aware, honest, and reflective about the analytic process. Analysis is not just the end product; it is also the repertoire of processes used to arrive at that particular place. In qualitative analysis, it is not necessary or even desirable that anyone else who did a similar study should find exactly the same thing or interpret his or her findings in precisely the same way. However, once the notion of analysis as a set of uniform, impersonal, universally applicable procedures is set aside, qualitative analysts are obliged to describe and discuss how they did their work in ways that are, at the very least, accessible to other researchers. Open and honest presentation of analytic processes provides an important check on an individual analyst's tendencies to get carried away, allowing others to judge for themselves whether the analysis and interpretation are credible in light of the data.

Second, qualitative analysis, like all of qualitative research, is in some ways craftsmanship (Kvale, 1995). There is such a thing as poorly crafted or bad qualitative analysis, and despite their reluctance to issue universal criteria, seasoned qualitative researchers of different bents can still usually agree when they see an example of it. Analysts should be judged partly in terms of how skillfully, artfully, and persuasively they craft an argument or tell a story. Does the analysis flow well and make sense in relation to the study's objectives and the data that were presented? Is the story line clear and convincing? Is the analysis interesting, informative, provocative? Does the analyst explain how and why she or he drew certain conclusions, or on what bases she or he excluded other possible interpretations? These are the kinds of questions that can and should be asked in judging the quality of qualitative analyses.

In evaluation studies, analysts are often called upon to move from conclusions to recommendations for improving programs and policies. The recommendations should fit with the findings and with the analyst's understanding of the context or milieu of the study. It is often useful to bring in stakeholders at the point of translating analytic conclusions into implications for action.

As should by now be obvious, it is truly a mistake to imagine that qualitative analysis is easy or can be done by untrained novices. As Patton (1990) comments:
Applying guidelines requires judgment and creativity. Because each qualitative study is unique, the analytical approach used will be unique. Because qualitative inquiry depends, at every stage, on the skills, training, insights, and capabilities of the researcher, qualitative analysis ultimately depends on the analytical intellect and style of the analyst. The human factor is the great strength and the fundamental weakness of qualitative inquiry and analysis.
Be selective when using computer software packages in qualitative analysis. A great many software packages that can aid the analysis of qualitative data have been developed in recent years. Most of these packages were reviewed by Weitzman and Miles (1995), who grouped them into six types: word processors, word retrievers, textbase managers, code-and-retrieve programs, code-based theory builders, and conceptual network
builders. All have strengths and weaknesses. Weitzman and Miles suggested that when selecting a given package, researchers should think about the amount, types, and sources of data to be analyzed and the types of analyses that will be performed. Two caveats are in order. First, computer software packages for qualitative data analysis essentially aid in the manipulation of relevant segments of text. While helpful in marking, coding, and moving data segments more quickly and efficiently than can be done manually, the software cannot determine meaningful categories for coding and analysis or define salient themes or factors. In qualitative analysis, as seen above, concepts must take precedence over mechanics: the analytic underpinnings of the procedures must still be supplied by the analyst. Software packages cannot and should not be used as a way of evading the hard intellectual labor of qualitative analysis. Second, since it takes time and resources to become adept in utilizing a given software package and learning its peculiarities, researchers may want to consider whether the scope of their project, or their ongoing needs, truly warrant the investment.
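As a rough, generic illustration of what the code-and-retrieve class of packages does (this sketch does not reproduce the interface of any particular product), the lines below attach analyst-assigned codes to text segments and then pull back every segment carrying a given code. The segments and code names are invented.

# Generic code-and-retrieve sketch: the analyst supplies the codes and their meaning;
# the software only stores and retrieves the tagged segments.
segments = [
    {"site": "Campus A", "text": "The chair set aside a common period for the seminars.",
     "codes": ["chair_support", "scheduling"]},
    {"site": "Campus D", "text": "We shared handouts but never met as a group.",
     "codes": ["ad_hoc_sharing"]},
    {"site": "Campus F", "text": "The chair praised the project but offered no release time.",
     "codes": ["chair_support", "lip_service"]},
]

def retrieve(code):
    # Return (site, text) for every segment tagged with the requested code.
    return [(s["site"], s["text"]) for s in segments if code in s["codes"]]

for site, text in retrieve("chair_support"):
    print(f"{site}: {text}")

The retrieval step is mechanical; deciding that "chair_support" is a meaningful category, and interpreting what the retrieved segments say about it, remains the analyst's work.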
References
Berkowitz, S. (1996). Using Qualitative and Mixed Method Approaches. Chapter 4 in Needs Assessment: A Creative and Practical Guide for Social Scientists, R. Reviere, S. Berkowitz, C.C. Carter, and C. Graves-Ferguson, Eds. Washington, DC: Taylor & Francis.

Glaser, B., and Strauss, A. (1967). The Discovery of Grounded Theory. Chicago: Aldine.

Kvale, S. (1995). The Social Construction of Validity. Qualitative Inquiry, 1(1): 19-40.

Miles, M.B., and Huberman, A.M. (1984). Qualitative Data Analysis (p. 16). Newbury Park, CA: Sage.

Miles, M.B., and Huberman, A.M. (1994). Qualitative Data Analysis, 2nd Ed., pp. 10-12. Newbury Park, CA: Sage.

Patton, M.Q. (1990). Qualitative Evaluation and Research Methods, 2nd Ed. Newbury Park, CA: Sage.

Weitzman, E.A., and Miles, M.B. (1995). A Software Sourcebook: Computer Programs for Qualitative Data Analysis. Thousand Oaks, CA: Sage.
Coffey, A., and Atkinson, P. (1996). Making Sense of Qualitative Data: Complementary Research Strategies. Thousand Oaks, CA: Sage.

Howe, K., and Eisenhart, M. (1990). Standards for Qualitative (and Quantitative) Research: A Prolegomenon. Educational Researcher, 19(4): 2-9.

Wolcott, H.F. (1994). Transforming Qualitative Data: Description, Analysis and Interpretation. Thousand Oaks, CA: Sage.
One size does not fit all. Consequently, when it comes to designing an evaluation, experience has shown that the evaluator must keep in mind that the specific questions being addressed, and the audience for the answers, should influence the selection of an evaluation design and tools for data collection. Chapter 2 of the earlier User-Friendly Handbook for Project Evaluation (National Science Foundation, 1993) deals at length with designing and implementing an evaluation, identifying the following steps for carrying out an evaluation:

- Developing evaluation questions;
- Matching questions with appropriate information-gathering techniques;
- Collecting data;
- Analyzing the data; and
- Providing information to interested audiences.
Readers of this volume who are unfamiliar with the overall process are urged to read that chapter. In this chapter, we briefly review the process of designing an evaluation, including the development of evaluation questions, the selection of data collection methodologies, and related technical issues, with special attention to the advantages of mixed method designs. We stress mixed method designs because such designs frequently provide a more comprehensive and believable set of understandings about a project's accomplishments than studies based on either quantitative or qualitative data alone.
The process is not an easy one. To quote an experienced evaluator (Patton, 1990):

Once a group of intended evaluation users begins to take seriously the notion that they can learn from the collection and analysis of evaluative information, they soon find that there are lots of things they would like to find out. The evaluator's role is to help them move from a rather extensive list of potential questions to a much shorter list of realistically possible questions and finally to a focused list of essential and necessary questions.

We have developed a set of tools intended to help navigate these initial steps of evaluation design. These tools are simple forms or matrices that help to organize the information needed to identify and select among evaluation questions. Since the objectives of the formative and summative evaluations are usually different, separate forms need to be completed for each.

Worksheet 1 provides a form for briefly describing the project, the conceptual framework that led to the initiation of the project, and its proposed activities, and for summarizing its salient features. Information on this form will be used in the design effort. A side benefit of filling out this form and sharing it among project staff is that it can be used to make sure that there is a common understanding of the project's basic characteristics. Sometimes newcomers to a project, and even those who have been with it from the start, begin to develop divergent ideas about emphases and goals.
WORKSHEET 1: DESCRIBE THE INTERVENTION

1. State the problem/question to be addressed by the project:

2.

3. State the conceptual framework which led to the decision to undertake this intervention and its proposed activities.

4.

5.

6.

7. What is the total budget for this project? How are major components budgeted?

8.
Worksheet 2 provides a format for further describing the goals and objectives of the project in measurable terms. This step, essential in developing an evaluation design, can prove surprisingly difficult. A frequent problem is that goals or objectives may initially be stated in such global terms that it is not readily apparent how they might be measured. For example, the statement "improve the education of future mathematics and science educators" needs more refinement before it can be used as the basis for structuring an evaluation.

Worksheets 3 and 4 assist the evaluator in identifying the key stakeholders in the project and clarifying what it is each might want to address in an evaluation. Stakeholder involvement has become an important part of evaluation design, as it has been recognized that an evaluation must address the needs of individuals beyond the funding agency and the project director.

Worksheet 5 provides a tool for organizing and selecting among possible evaluation questions. It points to several criteria that should be considered: Who wants to know? Will the information be new or confirmatory? How important is the information to various stakeholders? Are there sufficient resources to collect and analyze the information needed to answer the questions? Can the question be addressed in the time available for the evaluation? (A brief sketch illustrating this screening step appears below.)

Once the set of evaluation questions is determined, the next step is selecting how each will be addressed and developing an overall evaluation design. It is at this point that decisions regarding the types and mixture of data collection methodologies, sampling, scheduling of data collection, and data analysis need to be made. These decisions are quite interdependent, and the data collection techniques selected will have important implications for both scheduling and analysis plans.
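The Worksheet 5 criteria can also be thought of as fields attached to each candidate question, which makes the screening step easy to see. The sketch below is purely illustrative: the question wording, importance ratings, resource figures, and cutoffs are invented, and the point is only that questions exceeding the available resources or timeframe are set aside before the remainder are ranked by importance to stakeholders.

# Hypothetical Worksheet 5 entries: one record per candidate evaluation question.
candidates = [
    {"question": "Did all campuses participate?", "stakeholders": ["sponsor", "project staff"],
     "importance": 3, "staff_days": 2, "months_needed": 1},
    {"question": "Did nonparticipating faculty change their teaching?", "stakeholders": ["sponsor"],
     "importance": 5, "staff_days": 40, "months_needed": 18},
    {"question": "Were workshop materials available?", "stakeholders": ["project staff"],
     "importance": 2, "staff_days": 1, "months_needed": 1},
]

BUDGET_STAFF_DAYS = 50   # illustrative resource ceiling
EVALUATION_MONTHS = 24   # illustrative timeframe

# Screen out questions that cannot be answered with the available resources and time,
# then rank what remains by its importance to stakeholders.
feasible = [c for c in candidates
            if c["staff_days"] <= BUDGET_STAFF_DAYS and c["months_needed"] <= EVALUATION_MONTHS]
for entry in sorted(feasible, key=lambda c: c["importance"], reverse=True):
    print(entry["importance"], entry["question"])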
2.

3.

4. Can this objective be broken down further? Break it down to the smallest unit. It must be clear what specifically you hope to see documented or changed.

5. Is this objective measurable (can indicators and standards be developed for it)? If not, restate it.

6. Formulate one or more questions that will yield information about the extent to which the objective was addressed.

7. Once you have completed the above steps, go back to #3 and write the next objective. Continue with steps 4, 5, and 6.
WORKSHEETS 3 AND 4: IDENTIFY STAKEHOLDERS (column headings: Audience; Spokesperson)
WORKSHEET 5

Question | Which stakeholder(s)? | Importance to Stakeholders | Resources Required | Timeframe
Selecting Methods for Gathering the Data: The Case for Mixed Method Designs
As discussed in Chapter 1, mixed method designs can yield richer, more valid, and more reliable findings than evaluations based on either the qualitative or quantitative method alone. A further advantage is that a mixed method approach is likely to increase the acceptance of findings and conclusions by the diverse groups that have a stake in the evaluation. When designing a mixed method evaluation, the investigator must consider two factors:

- Which is the most suitable data collection method for the type of data to be collected?
- How can the data collected be most effectively combined or integrated?
To recapitulate the earlier summary of the main differences between the two methods: qualitative methods provide a better understanding of the context in which the intervention is embedded, while quantitative data are usually needed when a major goal of the evaluation is the generalizability of findings. When the answer to an evaluation question calls for understanding the perceptions and reactions of the target population, a qualitative method (indepth interview, focus group) is most appropriate. If a major evaluation question calls for the assessment of the behavior of participants or other individuals involved in the intervention, trained observers will provide the most useful data. In Chapter 1, we also showed some of the many ways in which the quantitative and qualitative techniques can be combined to yield more meaningful findings. Specifically, the two methods have been successfully combined by evaluators to test the validity of results (triangulation), to improve data collection instruments, and to explain findings.

A good design for mixed method evaluations should include specific plans for collecting and analyzing the data through the combined use of both methods. While it may often be difficult to come up with a detailed analysis plan at the outset, it is very useful to have such a plan when designing data collection instruments and when organizing narrative data obtained through qualitative methods. There needs to be considerable up-front thinking regarding probable data
analysis plans and strategies for synthesizing the information from various sources. Initial decisions can be made regarding the extent to which qualitative techniques will be used to provide full-blown stand-alone descriptions versus commentaries or illustrations to give greater meaning to quantitative data. Preliminary strategies for combining information from different data sources need to be formulated. Schedules for initiating the data analysis need to be established. The early findings thus generated should be used to reflect on the evaluation design and initiate any changes that might be warranted. While in any good evaluation data analysis is to some extent an iterative process, it is important to think things through as much as possible at the outset to avoid being left awash in data, or with data focusing more on peripheral questions than on those that are germane to the study's goals and objectives (see Chapter 4; also see Miles and Huberman, 1994, and Greene, Caracelli, and Graham, 1989).
even typical case sampling, may be appropriate (Patton, 1990). When sampling subjects for indepth interviews, the investigator has considerable flexibility with respect to sample size.

In many evaluation studies, the design calls for studying a population at several points in time, e.g., students in the 9th grade and then again in the 12th grade. There are two ways of carrying out such studies that seek to measure trends. In a longitudinal approach, data are collected from the same individuals at designated time intervals; in a cross-sectional approach, new samples are drawn for each successive data collection. While in most cases longitudinal designs that require collecting information from the same students or teachers at several points in time are best, they are often difficult and expensive to carry out because students move and teachers are reassigned. Furthermore, loss of respondents due to failure to locate or to obtain cooperation from some segment of the original sample is often a major problem. Depending on the nature of the evaluation and the size of the population studied, it may be possible to obtain good results with successive cross-sectional designs.

Timing, sequencing, frequency of data collection, and cost. The evaluation questions and the analysis plan will largely determine when data should be collected and how often focus groups, interviews, or observations should be scheduled. In mixed method designs, when the findings of qualitative data collection will affect the structuring of quantitative instruments (or vice versa), proper sequencing is crucial. As a general rule, project evaluations are strongest when data are collected at least at two points in time: before an innovation is first introduced, and after it has been in operation for a sizable period of time. Throughout the design process, it is essential to keep an eye on the budgetary implications of each decision. As was pointed out in Chapter 1, costs depend not on the choice between qualitative and quantitative methods, but on the number of cases required for analysis and the quality of the data collection. Evaluators must resist the temptation to plan for a more extensive data collection than the budget can support, which may result in lower data quality or the accumulation of raw data that cannot be processed and analyzed.

Tradeoffs in the design of evaluations based on mixed methods. All evaluators find that both during the design phase, when plans are carefully crafted according to experts' recommendations, and later when fieldwork gets under way, modifications and tradeoffs become a necessity. Budget limitations, problems in accessing fieldwork sites and administrative records, and difficulties in recruiting staff with appropriate skills are among the recurring problems that should be anticipated as far as possible during the design phase, but that also may require modifying the design at a later time.
What tradeoffs are least likely to impair the integrity and usefulness of mixed method evaluations if the evaluation plan as designed cannot be fully implemented? A good general rule for dealing with budget problems is to sacrifice the number of cases or the number of questions to be explored (this may mean ignoring the needs of some low-priority stakeholders), but to preserve the depth necessary to fully and rigorously address the issues targeted.

When it comes to design modifications, it is of course essential that the evaluator be closely involved in decisionmaking. But close contact among the evaluator, the project director, and other project staff is essential throughout the life of the project. In particular, some project directors tend to see the summative evaluation as an add-on, that is, something to be done, perhaps by a contractor, after the project has been completed. But the quality of the evaluation is dependent on record keeping and data collection during the life of the project, which should be closely monitored by the evaluator.

In the next chapter, we illustrate some of the issues related to designing an evaluation, using the hypothetical example provided in Chapter 2.
References

Greene, J.C., Caracelli, V.J., and Graham, W.F. (1989). Toward a Conceptual Framework for Mixed Method Evaluation Designs. Educational Evaluation and Policy Analysis, 11(3).

Miles, M.B., and Huberman, A.M. (1994). Qualitative Data Analysis, 2nd Ed. Newbury Park, CA: Sage.

National Science Foundation. (1993). User-Friendly Handbook for Project Evaluation: Science, Mathematics and Technology Education. NSF 93-152. Arlington, VA: NSF.

Patton, M.Q. (1990). Qualitative Evaluation and Research Methods, 2nd Ed. Newbury Park, CA: Sage.
6. EVALUATION DESIGN FOR THE HYPOTHETICAL PROJECT

Step 1.
During these campus visits, the evaluator discovered that interest and participation in the project varied considerably, as did the extent to which deans and department chairs encouraged and facilitated faculty participation. Questions to explore these issues systematically were therefore added to the formative evaluation. The questions initially selected by the evaluation team for the formative and summative evaluation are shown in Exhibits 13 and 14.
Exhibit 13. Goals, stakeholders, and evaluation questions for a formative evaluation

Project goals (implementation-related) and evaluation questions:

1. To attract faculty and administrators' interest and support for project participation by eligible faculty members.
   Evaluation questions: Did all campuses participate? If not, what were the reasons? How was the program publicized? In what way did local administrators encourage (or discourage) participation by eligible faculty members? Were there incentives or rewards for participation? Did applicants and nonapplicants, and program completers and dropouts, differ with respect to personal and work-related characteristics (age, highest degree obtained, ethnicity, years of teaching experience, etc.)?

2. To offer a state-of-the-art faculty development program to improve the preparation of future teachers for elementary mathematics instruction.
   Evaluation questions: Were the workshops organized and staffed as planned? Were needed materials available? Were the workshops of high quality (accuracy of information, depth of coverage, etc.)? Was the full range of topics included in the design actually covered?

3. To provide participants with knowledge concerning new concepts, methods, and standards in elementary math education.
   Evaluation questions: Was there evidence of an increase in knowledge as a result of project participation?

4. To provide followup and encourage networking through frequent contact among participants during the academic year.
   Evaluation questions: Did participants exchange information about their use of new instructional approaches? By e-mail or in other ways? Did problems arise?

5. To identify problems in carrying out the project during year 1 for the purpose of making changes during year 2.
   Evaluation questions: Are workshops too few, too many? Should workshop format, content, or staffing be modified? Is communication adequate? Was the summer session useful?

Stakeholders: granting agency, project sponsor (center administrators), other administrators, and project staff; for goal 5, project staff.
Exhibit 14. Goals, stakeholders, and evaluation questions for a summative evaluation

Project goals and evaluation questions:

1.
   Evaluation questions: Did faculty who have experienced the professional development change their instructional practices? Did this vary by teachers' or by students' characteristics? Did faculty members use the information regarding new standards, materials, and practices? What obstacles prevented implementing changes? What factors facilitated change?

2. Acquisition of knowledge and changes in instructional practices by other (nonparticipating) faculty members.
   Evaluation questions: Did participants share knowledge acquired through the project with other faculty? Was it done formally (e.g., at faculty meetings) or informally?

3. Institution-wide change in curriculum and administrative practices.
   Evaluation questions: Were changes made in curriculum? Examinations and other requirements? Expenditures for library and other resource materials (computers)?

4.
   Evaluation questions: Did students become more interested in classwork? More active participants? Did they express interest in teaching math after graduation? Did they plan to use new concepts and techniques?

Stakeholders: granting agency, project sponsor (center), campus administrators, project staff, and campus faculty participants.
Step 2. Determine Appropriate Data Sources and Data Collection Approaches to Obtain Answers to the Final Set of Evaluation Questions
This step consisted of grouping the questions that survived the prioritizing process in step 1, defining measurable objectives, and determining the best source for obtaining the information needed and the best method for collecting that information. For some questions, the choice was simple. If the project reimburses participants for travel and other attendance-related expenses, reimbursement records kept in the project office would yield information about how many participants attended each of the workshops. For most questions,
however, there might be more choices and more opportunity to take advantage of the mixed method approach. To ascertain the extent of participants' learning and skill enhancement, the source might be participants, or workshop observers, or workshop instructors and other staff. If the choice is made to rely on information provided by the participants themselves, data could be obtained in many different ways: through tests (possibly before and after the completion of the workshop series), work samples, narratives supplied by participants, self-administered questionnaires, indepth interviews, or focus group sessions. The choice should be made on the basis of methodological (which method will give us the "best" data?) and pragmatic (which method will strengthen the evaluation's credibility with stakeholders? which method can the budget accommodate?) considerations.

Source and method choices for obtaining the answers to all questions in Exhibits 13 and 14 are shown in Exhibits 15 and 16. Examining these exhibits, it becomes clear that data collection from one source can answer a number of questions. The evaluation design begins to take shape; technical issues, such as sampling decisions, number of times data should be collected, and timing of the data collections, need to be addressed at this point. Exhibit 17 summarizes the data collection plan created by the evaluation specialist and her staff for both evaluations.

The formative evaluation must be completed before the end of the first year to provide useful inputs for the year 2 activities. Data to be collected for this evaluation include:

- Relevant information in existing records;
- Frequent interviews with the project director and staff;
- Short self-administered questionnaires to be completed by participants at the conclusion of each workshop; and
- Reports from the two to four staff observers who observed the 11 workshop sessions.
In addition, the 25 year 1 participants will be assigned to one of three focus groups to be convened twice (during month 5 and after the year 1 summer session) to assess the program experience, suggest program modifications, and discuss interest in instructional innovation on their home campus.
Exhibit 15. Evaluation questions, data sources, and data collection methods for a formative evaluation

Question 1: Did all campuses participate? If not, what were the reasons? How was the program publicized? In what way did local administrators encourage (or discourage) participation by eligible faculty members? Were there incentives or rewards for participation? Did applicants and nonapplicants, and program completers and dropouts, differ with respect to personal and work-related characteristics (age, highest degree obtained, ethnicity, years of teaching experience, etc.)?
Sources of information: project records, project director, roster of eligible applicants on each campus, campus participants.
Data collection methods: record review; interview with project director; rosters of eligible applicants on each campus (including personal characteristics, length of service, etc.); participant focus groups.

Questions 2 and 3: Were the workshops organized and staffed as planned? Were needed materials available? Were the workshops of high quality (accuracy of information, depth of coverage, etc.)? Was the full range of topics included in the design actually covered? Was there evidence of an increase in knowledge as a result of project participation?
Sources and methods: participant questionnaire, observer notes, observer focus group, participant focus group, work samples.

Questions 4 and 5: Did participants exchange information about their use of new instructional approaches? By e-mail or in other ways? Did problems arise? Are workshops too few, too many? Should workshop format, content, or staffing be modified? Is communication adequate? Was the summer session useful?
Sources and methods: interview with project director and staff, focus group interview with observers, focus group with participants.
Exhibit 16. Evaluation questions, data sources, and data collection methods for a summative evaluation

Question 1: Did faculty who have experienced the professional development change their instructional practices? Did this vary by teachers' or by students' characteristics? Do they use the information regarding new standards, materials, and practices? What obstacles prevented implementing changes? What factors facilitated change?
Sources and methods: focus group with participants, reports of classroom observers, interview with department chair.

Question 2: Did participants share knowledge acquired through the project with other faculty? Was it done formally (e.g., at faculty meetings) or informally?
Sources and methods: focus groups with participants, interviews with nonparticipants, reports of classroom observers (nonparticipants' classrooms), interview with department chair.

Question 3: Were changes made in curriculum? Examinations and other requirements? Expenditures for library and other resource materials (computers)?
Sources and methods: focus groups with participants, interview with department chair and dean, document review.

Question 4: Did students become more interested in classwork? More active participants? Did they express interest in teaching math after graduation? Did they plan to use new concepts and techniques?
Sources and methods: students, participants.
Exhibit 17. First data collection plan

Formative evaluation
- Interview with project director; record review. Sampling plan: not applicable. Timing: interview once a month during year 1; record review during month 1, with updates as necessary.
- Interview with other staff. Sampling plan: no sampling proposed. Timing: at the end of months 3, 6, and 10.
- Workshop observations. Sampling plan: no sampling proposed. Timing: two observers at each workshop and the summer session.
- Participant questionnaire. Sampling plan: no sampling proposed. Timing: brief questionnaire to be completed at the end of every workshop.
- Focus group for participants. Sampling plan: no sampling proposed. Timing: the year 1 participants (n=25) will be assigned to one of three focus groups that meet during month 5 of the school year and after the summer session.
- Focus group for workshop observers. Sampling plan: no sampling proposed. Timing: one meeting for all workshop observers during month 11.

Summative evaluation
- Classroom observations. Sampling plan: purposive selection of 1 participant per campus, 2 classrooms for each participant, and 1 classroom for each of 2 nonparticipants in each branch campus. Timing: two observations for participants each year (months 4 and 8) and one observation for nonparticipants; for the 2-year project, a total of 96 observations (two observers at all times).
- Focus group for participants. Sampling plan: no sampling proposed. Timing: the year 2 participants (n=25) will be assigned to one of three focus groups that meet during month 5 of the school year and after the summer session.
- Focus group with classroom observers. Sampling plan: no sampling proposed. Timing: one focus group with all classroom observers (4-8).
- Interview with 2 (nonparticipant) faculty members at all institutions. Sampling plan: random selection if more than 2 faculty members in a department. Timing: one interview during year 2.
- Interview with department chairperson at all campuses. Sampling plan: not applicable. Timing: personal interview during year 2.
- Interview with dean at 8 campuses. Sampling plan: not applicable. Timing: during year 2.
- Interview with all year 1 participants. Sampling plan: not applicable. Timing: towards the end of year 2.
- Student questionnaires. Sampling plan: 25% sample of students in all participants' and nonparticipants' classes. Timing: questionnaires to be completed during years 1 and 2.
- Interview with project director and staff. Sampling plan: no sampling proposed. Timing: one interview towards the end of year 2.
- Record review. Sampling plan: no sampling proposed. Timing: during years 1 and 2.
The summative evaluation will use relevant data from the formative evaluation; in addition, the following data will be collected:
- During years 1 and 2, teams of two classroom observers will visit a sample of participants' and nonparticipants' classrooms. There will be focus group meetings with these observers at the end of school years 1 and 2 (four to eight staff members are likely to be involved in conducting the 48 scheduled observations each year; the scheduled totals are tallied in the sketch following this list).
- During year 2 and after the year 2 summer session, focus group meetings will be held with the 25 year 2 participants.
- At the end of year 2, all year 1 participants will be interviewed.
- Interviews will be conducted with nonparticipant faculty members, department chairs and deans at each campus, the project director, and project staff.
- Student surveys will be conducted.
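The figures of 48 observations per year and 96 in all follow directly from the sampling plan in Exhibit 17. The short Python sketch below is only an illustrative cross-check, not part of the handbook's procedures; it assumes 8 branch campuses, one participant per campus with 2 classrooms each, and 2 nonparticipant classrooms per campus, which is how the sampling plan reads, and the variable names are ours.

```python
# Illustrative tally of the classroom observations in the first data
# collection plan (Exhibit 17); counts are read from the sampling plan.
BRANCH_CAMPUSES = 8                    # one participant selected per branch campus
CLASSROOMS_PER_PARTICIPANT = 2         # 2 classrooms observed for each participant
NONPARTICIPANT_CLASSROOMS_PER_CAMPUS = 2  # 1 classroom for each of 2 nonparticipants

participant_classrooms = BRANCH_CAMPUSES * CLASSROOMS_PER_PARTICIPANT          # 16
nonparticipant_classrooms = BRANCH_CAMPUSES * NONPARTICIPANT_CLASSROOMS_PER_CAMPUS  # 16

# Participants' classrooms are visited twice a year (months 4 and 8);
# nonparticipants' classrooms are visited once a year.
observations_per_year = participant_classrooms * 2 + nonparticipant_classrooms * 1  # 48
observations_total = observations_per_year * 2                                       # 96 over 2 years

print(observations_per_year, observations_total)  # -> 48 96
```

Read this way, the plan's 48 scheduled observations per year and 96 observations for the 2-year project are consistent; each visit is made by a team of two observers.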
Step 3. Reality Testing and Design Modifications: Staff Needs, Costs, Time Frame Within Which All Tasks (Data Collection, Data Analysis, and Report Writing) Must Be Completed
The evaluation specialist converted the data collection plan (Exhibit 17) into a timeline, showing for each month of the 2 1/2-year life of the project the data collection, data analysis, and report-writing activities. Staff requirements and costs for these activities were also computed. She also contacted the chairperson of the department of elementary education at each campus to obtain clearance for the planned classroom observations and data collection from students (undergraduates) during years 1 and 2.

This exercise showed a need to fine-tune data collection during year 2 so that data analysis could begin by month 18; it also suggested that the scheduled data collection activities and associated data reduction and analysis costs would exceed the evaluation budget by $10,000. Conversations with campus administrators had raised questions about the feasibility of on-campus data collection from students, and the administrators also questioned the need for the large number of scheduled classroom observations. The evaluation staff felt that these observations were an essential component of the evaluation, but they decided to survey students only once (at the end of year 2). They plan to incorporate questions about impact on students in the focus group discussions with participating faculty members after the summer session at the end of year 1.

Exhibit 18 shows the final data collection plan for this hypothetical project. It also illustrates how quantitative and qualitative data have been mixed.
Exhibit 18. Final data collection plan

1. Interview with project director. Type of method: Q2. Scheduled: once a month during year 1; twice during year 2 (months 18 and 23). Number of cases: 1.
2. Interview with project staff. Type of method: Q2. Scheduled: at the end of months 3, 6, and 10 (year 1); at the end of month 23 (year 2). Number of cases: 4 interviews with 4 persons = 16 interviews.
3. Record review. Type of method: Q2. Scheduled: month 1, plus updates as needed. Number of cases: not applicable.
4. Workshop observations. Type of method: Q2. Scheduled: each workshop, including summer. Number of cases: 2 observers at 11 observations = 22 observations.
5. Participants' evaluation of each workshop. Type of method: Q1. Scheduled: at the end of each workshop and the summer session. Number of cases: 25 participants in 11 workshops = 275 questionnaires.
6. Participant focus groups. Type of method: Q2. Scheduled: months 5, 10, 17, and 22. Number of cases: 12 focus groups of 7-8 participants.
7. Workshop observer focus groups. Type of method: Q2. Scheduled: month 10. Number of cases: 1 meeting for 2-4 observers.
8. Classroom observations. Type of method: Q2. Scheduled: months 4, 8, 16, and 20. Number of cases: 2 observers 4 times in 8 classrooms = 64 observations.
9. Classroom observations (nonparticipant classrooms). Type of method: Q2. Scheduled: months 8 and 16. Number of cases: 2 observers twice in 8 classrooms = 32 observations.
10. Classroom observers' focus group. Type of method: Q2. Scheduled: months 10 and 22. Number of cases: 2 meetings with all classroom observers (4-8).
11. Interviews with department chairs at 8 branch campuses. Type of method: Q2. Scheduled: months 9 and 21. Number of cases: 16 interviews.
12. Interviews with all year 1 participants. Type of method: Q2.
13. Interviews with deans at 7 branch campuses. Type of method: Q2.
14. Interviews with 2 nonparticipant faculty members at each campus. Type of method: Q2.
15. Student survey. Type of method: Q1.
16. Document review. Type of method: Q2.
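Most of the case counts in Exhibit 18 are simple products of figures given in the plan (participants, workshops, observers, classrooms, and visits). The Python fragment below recomputes them as a cross-check; the dictionary structure is our own illustration, not part of the handbook, and the chair-interview total assumes one chair per branch campus interviewed in each of the two scheduled months.

```python
from math import prod

# Factors stated in the plan; each product should equal the figure in Exhibit 18.
counts = {
    "project staff interviews (4 rounds x 4 persons)": (4, 4),            # 16
    "workshop observations (2 observers x 11 sessions)": (2, 11),         # 22
    "workshop questionnaires (25 participants x 11 workshops)": (25, 11), # 275
    "classroom observations (2 observers x 4 visits x 8 classrooms)": (2, 4, 8),        # 64
    "nonparticipant observations (2 observers x 2 visits x 8 classrooms)": (2, 2, 8),   # 32
    "department chair interviews (8 chairs x 2 rounds)": (8, 2),          # 16
}

for label, factors in counts.items():
    print(f"{label}: {prod(factors)}")
```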
It should be noted that, due chiefly to budgetary constraints and the priorities that emerged, the final evaluation plan did not provide for the systematic collection of some information that might have been of importance for the overall assessment of the project and for recommendations for replication. For example, there is no provision to examine systematically (by using trained workshop observers, as is done during year 1) the extent to which the year 2 workshops were modified as a result of the formative evaluation. This does not mean, however, that an evaluation question that did not survive the prioritization process cannot be explored in conjunction with the data collection tools specified in Exhibit 17. Thus, the question of workshop modifications and their effectiveness can be explored in the interviews scheduled with project staff and in the self-administered questionnaires and focus groups for year 2 participants.

Furthermore, informal interaction between the evaluation staff, the project staff, participants, and others involved in the project can yield valuable information to enrich the evaluation. Experienced evaluators know that, in hindsight, the prioritization process is often imperfect, and during the life of any project it is likely that unanticipated events will affect project outcomes. Given the flexible nature of qualitative data collection tools, the need for additional information can to some extent be accommodated in mixed method designs by including narrative and anecdotal material. Some of the ways in which such material can be incorporated in reaching conclusions and recommendations will be discussed in Chapter 7 of this handbook.
7
The final task the evaluator is required to perform is to summarize what the team has done, what has been learned, and how others might benefit from this project's experience. As a rule, NSF grantees are expected to submit a final report when the evaluation has been completed. For the evaluator, this is the primary reporting task, and it provides the opportunity to depict in detail the rich qualitative and quantitative information obtained from the various study activities.

In addition to the contracting agency, most evaluations have other audiences as well, such as previously identified stakeholders, other policymakers, and researchers. For these audiences, whose interest may be limited to a few of the topics covered in the full report, shorter summaries, oral briefings, conference presentations, or workshops may be more appropriate. Oral briefings allow the sharing of key findings and recommendations with those decisionmakers who lack the time to carefully review a voluminous report. In addition, conference presentations and workshops can be used to focus on special themes or to tailor messages to the interests and background of a specific audience.

In preparing the final report and other products that communicate the results of the evaluation, the evaluator must consider the following questions:
- How can the communication best be tailored to meet the needs and interests of a given audience?
- How should the comprehensive final report be organized?
- How should the findings based on qualitative and quantitative methods be integrated?
- Does the report distinguish between conclusions based on robust data and those that are more speculative?
- Where findings are reported, especially those likely to be considered sensitive, have appropriate steps been taken to make sure that promises of confidentiality are met?

This chapter deals primarily with these questions. More extensive coverage of the general topic of reporting and communicating evaluation results can be found in the earlier User-Friendly Handbook for Project Evaluation (NSF, 1993).
Audiences for the evaluation of the hypothetical project, and whether their stake in it is direct or indirect:
- Top-level administrators at the major state university (direct)
- Staff at the Center for Educational Innovation (direct)
- Undergraduate faculty targeted to participate in the workshops (direct)
- Policymakers at other 4-year institutions interested in developing similar preservice programs (indirect)
- Other researchers (indirect)
In this example, the evaluator would risk having the results ignored by some stakeholders and underutilized by others if only a single dissemination strategy were used. Even if a single report is developed for all stakeholders (which is usually the case), it is advisable to develop a dissemination strategy that recognizes the diverse informational needs of the audience and the limited time some readers might realistically be able to devote to digesting the results of the study. Such a strategy might include (1) preparing a concise executive summary of the evaluation's key findings (for the university's top-level administrators); (2) preparing a detailed report (for the Center for Educational Innovation and the National Science Foundation) that describes the history of the program, the range of activities offered to undergraduate faculty, and the impact of these activities on program participants and their students; and (3) conducting a series of briefings that are tailored to the interests of specific stakeholders (e.g., university administrators might be briefed on the program's tangible benefits and costs).

By referring back to the worksheets that were developed in planning the evaluation (see Chapter 5), the interests of specific stakeholders can be ascertained. However, rigid adherence to the original interests expressed by stakeholders is not always the best approach. This strategy may shortchange the audience if the evaluation, as is often the case, pointed to unanticipated developments. It should also be pointed out that while the final report usually is based largely on answers to summative evaluation questions, it is useful to summarize salient results of the formative evaluation as well, where these results provide important information for project replication.
In addition to the main body of the report, a short abstract and a one- to four-page executive summary should be prepared. The latter is especially important because many people are more likely to read the executive summary than the full document. The executive summary can help focus readers on the most significant aspects of the evaluation. It is desirable to keep the methodology section short and to include a technical appendix containing detailed information about the data collection and other methodological issues. All evaluation instruments and procedures should be contained in the appendix, where they are accessible to interested readers.

Regardless of the audience for which it is written, the final report must engage the reader and stimulate attention and interest. Descriptive narrative, anecdotes, personalized observations, and vignettes make for livelier reading than a long recitation of statistical measures and indicators. One of the major virtues of the mixed method approach is the evaluator's ability to balance narrative and numerical reporting. This can be done in many ways: for example, by alternating descriptive material (obtained through qualitative techniques) and numerical material (obtained through quantitative techniques) when describing project activities, or by using qualitative information to illustrate, personalize, or explicate a statistical finding. But, as discussed in the earlier chapters, the main virtue of using a mixed method approach is that it enlarges the scope of the analysis. And it is important to remember that the purpose of the final report is not only to tell the story of the project, its participants, and its activities, but also to assess in what ways it succeeded or failed in achieving its goals.

In preparing the findings section, which constitutes the heart of the report, it is important to balance and integrate descriptive and evaluative reporting. A well-written report should provide a concise context for understanding the conditions in which results were obtained and identify specific factors (e.g., implementation strategies) that affected the results. According to Patton (1990), "Description is thus balanced by analysis and interpretation. Endless description becomes its own muddle. The purpose of analysis is to organize description so that it is manageable. Description is balanced by analysis and leads into interpretation. An interesting and reasonable report provides sufficient description to allow the reader to understand the basis for an interpretation, and sufficient interpretation to allow the reader to understand the description."

For the hypothetical project, most questions identified for the summative evaluation in Exhibit 16 can be explored through the joint use of qualitative and quantitative data, as shown in Exhibit 20. For example, to answer some of the questions pertaining to the impact of faculty training on their students' attitudes and behaviors, quantitative data (obtained from a student survey) are used together with qualitative information obtained through several techniques (classroom observations, faculty focus groups, and interviews with knowledgeable informants).
Exhibit 20. Example of an evaluation/methodology matrix

The matrix marks, for each summative evaluation study question, the data collection techniques (coded a-f below) that address it; as noted above, most questions are addressed by more than one technique.

Changes in instructional practices of participating faculty
- Did the faculty who experienced the workshop training change their instructional practice?
- Did the faculty who experienced the workshop training use the information regarding new standards, materials, and practices?
- What obstacles prevented the faculty who experienced the workshop training from implementing the changes?

Acquisition of knowledge and changes in instructional practices by other (nonparticipating) faculty members
- Did participants share the knowledge acquired through the workshops with other faculty?
- What methods did participants use to share the knowledge acquired through the workshops?

Institution-wide changes in curriculum and administrative practices
- Were changes made in curriculum?
- Were changes made in examinations and other requirements?
- Were changes made in expenditures for libraries and other resource materials?

Positive effects on career plans of students taught by participating teachers
- Did students become more interested in their classwork?
- Did students become more active participants?
- Did students express interest in teaching math after graduation?
- Did students plan to use new concepts and techniques?

Data collection techniques: a = indepth interviews with knowledgeable informants; b = focus groups; c = observation of workshops; d = classroom observations; e = surveys of students; f = documents.
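When a matrix like Exhibit 20 is kept as a working tool, it can also be checked mechanically, for example to confirm that each question is covered by more than one technique and that quantitative and qualitative sources are mixed. The Python sketch below is purely illustrative: the assignments of questions to technique codes are made-up examples, not the actual entries of Exhibit 20, and the variable and function names are ours.

```python
# Illustrative representation of an evaluation/methodology matrix.
# Technique codes follow Exhibit 20: a = indepth interviews, b = focus groups,
# c = workshop observation, d = classroom observations, e = student surveys,
# f = documents.  The assignments below are hypothetical.
matrix = {
    "Did faculty change their instructional practice?":       {"a", "b", "d"},
    "Were changes made in curriculum?":                        {"a", "f"},
    "Did students become more interested in their classwork?": {"b", "d", "e"},
}

QUANTITATIVE = {"e"}  # treating the student survey as the mainly quantitative source

for question, techniques in matrix.items():
    triangulated = len(techniques) >= 2                         # more than one source per question
    mixed = bool(techniques & QUANTITATIVE) and bool(techniques - QUANTITATIVE)
    print(f"{question}\n  techniques: {sorted(techniques)}  "
          f"triangulated: {triangulated}  mixes quantitative/qualitative: {mixed}")
```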
Response rates, refusal rates for personal interviews and focus group participation, access problems, and similar matters should all be discussed in an appendix. If problems were encountered that may have affected the findings, possible biases and how the evaluator sought to correct them should be discussed.

Use the recommendations section to express views based on the total project experience. Of course, references to data should be included whenever possible. For example, a recommendation in the report for the hypothetical project might include the following phrase: "Future programs should provide career-related incentives for faculty participation, as was suggested by several participants." But the evaluator should also feel free to offer creative suggestions that do not necessarily rely on systematic data collection.
Maintaining Confidentiality
All research involving human subjects entails possible risks for participants and usually requires informed consent on their part. To obtain this consent, researchers usually assure participants that their identity will not be revealed when the research is reported and that all information obtained through surveys, focus groups, personal interviews, and observations will be handled confidentially. Participants are assured that the purpose of the study is not to make judgments about their performance or behavior, but simply to improve knowledge about a project's effectiveness and improve future activities. In quantitative studies, reporting procedures have been developed to minimize the risk that the actions and responses of participants can be associated with a specific individual; usually results are reported for groupings only and, as a rule, only for groupings that include a minimum number of subjects. In studies that use qualitative methods, it may be more difficult to report all findings in ways that make it impossible to identify a participant. The number of respondents is often quite small, especially if one is looking at respondents with characteristics that are of special interest in the analysis (for example, older teachers, or teachers who hold a graduate degree). Thus, even if a finding does not name the respondent, it may be possible for someone (a colleague, an administrator) to identify a respondent who made a critical or disparaging comment in an interview.
Of course, not all persons who are interviewed in the course of an evaluation can be anonymous: the names of those persons who have a unique or high-status role (the project director, a college dean, or a school superintendent) are known, and anonymity should not be promised. The issue is of importance to more vulnerable persons, usually those in subordinate positions (teachers, counselors, or students), who may experience negative consequences if their behavior and opinions become known. It is in the interest of the evaluator to obtain informed consent from participants by assuring them that their participation is risk-free; they will be more willing to participate and will speak more openly. But in the opinion of experienced qualitative researchers, it is often impossible to fulfill promises of anonymity when qualitative methods are used: "Confidentiality and anonymity are usually promised, sometimes very superficially, in initial agreements with respondents. For example, unless the researcher explains very clearly what a fed-back case will look like, people may not realize that they will not be anonymous at all to other people within the setting who read the case" (Miles and Huberman, 1994).

The evaluator may also find it difficult to balance the need to convey contextual information that will provide vivid descriptive detail and the need to protect the identity of informants. But if participants have been promised anonymity, it behooves the evaluator to take every precaution so that informants cannot be linked to any of the information they provided. In practice, the decision of how and when to attribute findings to a site or respondent is generally made on a case-by-case basis. The following example provides a range of options for revealing or disclosing the source of information received during an interview conducted for the hypothetical project:
- Attribute the information to a specific respondent within an individual site: "The dean at Lakewood College indicated that there was no need for curriculum changes at this time."
- Attribute the information to someone within a site: "A respondent at Lakewood College indicated that there was no need for curriculum changes at this time." In this example, the respondent's identity within the site is protected; the reader is only made aware that someone at a site expressed a preference for the status quo. Note that this option would not be used if only one respondent at the school was in a position to make this statement.
- Attribute the information to the respondent type without identifying the site: "The dean at one of the participating colleges indicated that there was no need for curriculum changes at this time." In this example, the reader is only made aware of the type of respondent that expressed a preference for the status quo.
- Do not attribute the information to a specific respondent type or site: "One of the study respondents indicated that there was no need for curriculum changes at this time." In this example, the identity of the respondent is fully protected.
Each of these alternatives has consequences not only for protecting respondent anonymity, but also for the value of the information that is being conveyed. The first formulation discloses the identity of the respondent and should only be used if anonymity was not promised initially, or if the respondent agrees to be identified. The last alternative, while offering the best guarantee of anonymity, is so general that it weakens the impact of the finding. Depending on the direction taken by the analysis (were there important differences by site? by type of respondent?), it appears that either the second or the third alternative represents the best choice. One common practice is to summarize key findings in chapters that provide cross-site analyses of controversial issues. This alternative is directly parallel to the procedure used in surveys, in which the only published report is about the aggregate evidence (Yin, 1990). Contextual information about individual sites can be provided separately, e.g., in other chapters or an appendix.
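The four attribution options can be thought of as increasing levels of masking applied to the same finding. The following sketch is only an illustration of that idea; the function name, the masking levels, and the sample finding are ours, not part of the handbook's procedures.

```python
# Illustrative helper: render one interview finding at decreasing levels of
# attribution, mirroring the four options discussed above.
def attribute(statement: str, role: str, site: str, level: int) -> str:
    """level 1 = role and site, 2 = site only, 3 = role only, 4 = no attribution."""
    sources = {
        1: f"The {role} at {site}",
        2: f"A respondent at {site}",
        3: f"The {role} at one of the participating colleges",
        4: "One of the study respondents",
    }
    return f"{sources[level]} indicated that {statement}."

finding = "there was no need for curriculum changes at this time"
for level in (1, 2, 3, 4):
    print(attribute(finding, role="dean", site="Lakewood College", level=level))
```

In a real report the choice of level would still be made case by case, as noted above, since even the role-only option can identify a respondent when only one person holds that role.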
Evaluation findings, conclusions, and recommendations generally need to wait for the end of the evaluation. Because of the volume of written data that are collected on site, it is generally a good idea to organize study notes as soon after a site visit or interview as possible. These notes will often serve as a starting point for any individual case studies that might be included in the report. In addition, as emphasized in Chapter 4, preparing written text soon after the data collection activity will help to classify and display the data and reduce the overall volume of narrative data that will eventually need to be summarized and reported at the end of the study. Finally, preparing sections of the findings chapter during the data collection phase allows researchers to generate preliminary conclusions or identify potential trends that can be confirmed or refuted by additional data collection activities.

Make the report concise and readable. Because of the volume of material that is generally collected during mixed method evaluations, a challenging aspect of reporting is deciding what information might be omitted from the final report. As a rule, only a fraction of the tabulations prepared for survey analysis need to be displayed and discussed. Qualitative fieldwork and data collection methods yield a large volume of narrative information, and evaluators who try to incorporate all of the qualitative data they collected into their report risk losing their audience. Conversely, by omitting too much, evaluators risk removing the context that helps readers attach meaning to the report's conclusions.

One method for limiting the volume of information is to include only narrative that is tied to the evaluation questions. Regardless of how interesting an anecdote is, if the information does not relate to one of the evaluation questions, it probably does not belong in the report. As discussed previously, another method is to consider the likely information needs of your audience. Thinking about who is most likely to act upon the report's findings may help in the preparation of a useful and illuminating narrative (and in the discarding of anecdotes that are irrelevant to the needs of the reader).

The liberal use of qualitative information will enhance the overall tone of the report. In particular, lively quotes can highlight key points and break up the tedium of a technical summation of study findings. In addition, graphic displays and tables can be used to summarize significant trends that were uncovered during observations or interviews. Photographs are an effective tool to familiarize readers with the conditions (e.g., classroom size) within which a project is being implemented. New desktop publishing and software packages have made it easier to enhance papers and briefings with photographs, colorful graphics, and even cartoons. Quotes can be enlarged and italicized throughout the report to make important points or to personalize study findings. Many of these suggestions hold true for oral presentations as well.

Solicit feedback from project staff and respondents. It is often useful to ask the project director and other staff members to review sections of the report that quote information they have contributed in interviews, focus groups, or informal conversations. This review is useful for correcting omissions and misinterpretations, and it may elicit new details or insights that staff members failed to share during the data collection period. The early review may also avoid angry denials after the report becomes public, although it is no guarantee that controversy and demands for changes will not follow publication. However, the objectivity of the evaluation is best served if overall findings, conclusions, and recommendations are not shared with the project staff before the draft is circulated to all stakeholders.

In general, the same approach is suggested for obtaining feedback from respondents. It is essential to inform them of the inclusion of data with which they can be identified, and to honor requests for anonymity. The extent to which other portions of the write-up should be shared with respondents will depend on the nature of the project and the respondent population, but in general it is probably best to solicit feedback following dissemination of the report to all stakeholders.
References

Miles, M.B., and Huberman, A.M. (1994). Qualitative Data Analysis: An Expanded Sourcebook, 2nd Ed. Thousand Oaks, CA: Sage.

National Science Foundation. (1993). User-Friendly Handbook for Project Evaluation: Science, Mathematics, Engineering, and Technology Education. NSF 93-152. Washington, DC: NSF.

Patton, M.Q. (1990). Qualitative Evaluation and Research Methods, 2nd Ed. Newbury Park, CA: Sage.

Yin, R.K. (1989). Case Study Research: Design and Methods. Newbury Park, CA: Sage.
ANNOTATED BIBLIOGRAPHY
In selecting books and major articles for inclusion in this short bibliography, an effort was made to incorporate those useful for principal investigators (PIs) and project directors (PDs) who want to find information relevant to the tasks they will face and that this brief handbook could not cover in depth. Thus, we have not included all books that experts in qualitative research and mixed method evaluations would consider to be of major importance. Instead, we have included primarily reference materials that NSF/EHR grantees should find most useful, including many of those already listed in the references to Chapters 1 through 7. Some of these publications are heavier on theory; others deal primarily with practice and specific techniques used in qualitative data collection and analysis. However, with few exceptions, all the publications selected for this bibliography contain a great deal of technical information and hands-on advice.
Denzin, Norman K., and Lincoln, Yvonna S. (Eds.). (1994). Handbook of Qualitative Research. Thousand Oaks, CA: Sage. This formidable volume (643 pages set in small type) consists of 36 chapters written by experts on their respective topics, all of whom are passionate advocates of the qualitative method in social and educational research. The volume covers historical and philosophical perspectives, as well as detailed research methods. Extensive coverage is given to data collection and data analysis, and to the art of interpretation of findings obtained through qualitative research. Most of the chapters assume that the qualitative researcher functions in an academic setting and uses qualitative methods exclusively; the use of quantitative methods in conjunction with qualitative approaches, and the constraints that apply to evaluation research, are seldom considered. However, two chapters, "Designing Funded Qualitative Research" by Janice M. Morse and "Qualitative Program Evaluation" by Jennifer C. Greene, contain a great deal of material of interest to PIs and PDs. But PIs and PDs will also benefit from consulting other chapters, in particular "Interviewing" by Andrea Fontana and James H. Frey and "Data Management and Analysis Methods" by A. Michael Huberman and Matthew B. Miles.
The Joint Committee on Standards for Educational Evaluation. (1994). How to Assess Evaluations of Educational Programs, 2nd Ed. Thousand Oaks, CA: Sage. This new edition of the widely accepted Standards for Educational Evaluation is endorsed by professional associations in the field of education. The volume defines 30 standards for program evaluation, with examples of their application, and incorporates standards for quantitative as well as qualitative evaluation methods. The Standards are categorized into four groups: utility, feasibility, propriety, and accuracy. The Standards are intended to assist legislators, funding agencies, educational administrators, and evaluators. They are not a substitute for texts in technical areas such as research design or data collection and analysis. Instead they provide a framework and guidelines for the practice of responsible and high-quality evaluations. For readers of this handbook, the section on Accuracy Standards, which includes discussions of quantitative and qualitative analysis, justified conclusions, and impartial reporting, is especially useful.
Patton, Michael Quinn. (1990). Qualitative Evaluation and Research Methods, 2nd Ed. Newbury Park, CA: Sage. This is a well-written book with many practical suggestions, examples, and illustrations. The first part covers, in jargon-free language, the conceptual and theoretical issues in the use of qualitative methods; for practitioners, the second and third parts, dealing with design, data collection, analysis, and interpretation, are especially useful. Patton consistently emphasizes a pragmatic approach: he stresses the need for flexibility, common sense, and the choice of methods best suited to produce the needed information. The last two chapters, "Analysis, Interpretation and Reporting" and "Enhancing the Quality and Credibility of Qualitative Analysis," are especially useful for PIs and PDs of federally funded research. They stress the need for utilization-focused evaluation and the evaluator's responsibility for providing data and interpretations that specific audiences will find credible and persuasive.
Marshall, Catherine, and Rossman, Gretchen B. (1995). Designing Qualitative Research, 2nd Ed. Thousand Oaks, CA: Sage. This small book (178 pages) does not deal specifically with the performance of evaluations; it is primarily written for graduate students, to provide a practical guide for the writing of research proposals based on qualitative methods. However, most of the material presented is relevant and appropriate for project evaluation. In succinct and clear language, the book discusses the main ingredients of a sound research project: framing evaluation questions; designing the research; data collection methods and strategies; and data management and analysis. The chapter on data collection methods is comprehensive and includes some of the less widely used techniques (such as films and videos, unobtrusive measures, and projective techniques) that may be of interest for the evaluation of some projects. There are also useful tables (e.g., identifying the strengths and weaknesses of various methods for specific purposes; managing time and resources), as well as a series of vignettes throughout the text illustrating specific strategies used by qualitative researchers.
Lofland, John, and Lofland, Lyn H. (1995). Analyzing Social Settings: A Guide to Qualitative Observation and Analysis, 3rd Ed. Belmont, CA: Wadsworth. As the title indicates, this book is designed as a guide to field studies that use participant observation and intensive interviews as their main data collection techniques. The authors' vast experience and knowledge in these areas result in a thoughtful presentation of both technical topics (such as the best approach to compiling field notes) and nontechnical issues, which may be equally important in the conduct of qualitative research. The chapters that discuss gaining access to informants, maintaining access for the duration of the study, and dealing with issues of confidentiality and ethical concerns are especially helpful for PIs and PDs who seek to collect qualitative material. Also useful is Chapter 5, "Logging Data," which deals with all aspects of the interviewing process and includes examples of question formulation, the use of interview guides, and the write-up of data.
Miles, Matthew B., and Huberman, A. Michael. (1994). Qualitative Data Analysis: An Expanded Sourcebook, 2nd Ed. Thousand Oaks, CA: Sage. Although this book is not specifically oriented to evaluation research, it is an excellent tool for evaluators because, in the authors' words, it is a book for practicing researchers in all fields whose work involves the struggle with actual qualitative data analysis issues. It has the further advantage that many examples are drawn from the field of education. Because analysis cannot be separated from research design issues, the book takes the reader through the sequence of steps that lay the groundwork for sound analysis, including a detailed discussion of focusing and bounding the collection of data, as well as management issues bearing on analysis. The subsequent discussion of analysis methods is very systematic, relying heavily on data displays, matrices, and examples to arrive at meaningful descriptions, explanations, and the drawing and verifying of conclusions. An appendix covers the choice of software for qualitative data analysis. Readers will find this a very comprehensive and useful resource for the performance of qualitative data reduction and analysis.
New Directions for Program Evaluation, Vols. 35, 60, 61. A quarterly publication of the American Evaluation Association, published by Jossey-Bass, Inc., San Francisco, CA. Almost every issue of this journal contains material of interest to those who want to learn about evaluation, but the three issues described here are especially relevant to the use of qualitative methods in evaluation research. Vol. 35 (Fall 1987), Multiple Methods in Program Evaluation, edited by Melvin M. Mark and R. Lance Shotland, contains several articles discussing the combined use of quantitative and qualitative methods in evaluation designs. Vol. 60 (Winter 1993), Program Evaluation: A Pluralistic Enterprise, edited by Lee Sechrest, includes the article Critical Multiplism: A Research Strategy and its Attendant Tactics, by William R. Shadish, in which the author provides a clear discussion of the advantages of combining several methods in reaching valid findings. In Vol. 61 (Spring 1994), The Qualitative-Quantitative Debate, edited by Charles S. Reichardt and Sharon F. Rallis, several of the contributors take a historical perspective in discussing the long-standing antagonism between qualitative and quantitative researchers in evaluation. Others look for ways of integrating the two perspectives. The contributions by several experienced nonacademic program and project evaluators (Rossi, Datta, Yin) are especially interesting.
Greene, Jennifer C., Caracelli, Valerie J., and Graham, Wendy F. (1989). Toward a Conceptual Framework for Mixed-Method Evaluation Designs. Educational Evaluation and Policy Analysis, Vol. 11, No. 3. In this article, a framework for the design and implementation of evaluations using mixed methods is presented, based both on the theoretical literature and on a review of 57 mixed method evaluations. The authors identify five purposes for using mixed methods and present the recommended design characteristics for each of these purposes.
Yin, Robert K. (1989). Case Study Research: Design and Methods. Newbury Park, CA: Sage. The author's background in experimental psychology may explain the emphasis in this book on the use of rigorous methods in the conduct and analysis of case studies, thus minimizing what many believe is a spurious distinction between quantitative and qualitative studies. While arguing eloquently that case studies are an important tool when an investigator (or evaluator) has little control over events and when the focus is on a contemporary phenomenon within some real-life context, the author insists that case studies be designed and analyzed so as to provide generalizable findings. Although the focus is on design and analysis, data collection and report writing are also covered.
Krueger, Richard A. (1988). Focus Groups: A Practical Guide for Applied Research. Newbury Park, CA: Sage. Krueger is well known as an expert on focus groups; the bulk of his experience and the examples cited in his book are derived from market research. This is a useful book for the inexperienced evaluator who needs step-by-step advice on selecting focus group participants, the process of conducting focus groups, and analyzing and reporting results. The author writes clearly and avoids social science jargon, while discussing the complex problems that focus group leaders need to be aware of. This book is best used in conjunction with some of the other references cited here, such as the Handbook of Qualitative Research (Ch. 22) and Focus Groups: Theory and Practice.
Stewart, David W., and Shamdasani, Prem N. (1990). Focus Groups: Theory and Practice. Newbury Park, CA: Sage. This book differs from many others published in recent years that address primarily techniques of recruiting participants and the actual conduct of focus group sessions. Instead, these authors pay considerable attention to the fact that focus groups are by definition an exercise in group dynamics. This must be taken into account when interpreting the results and attempting to draw conclusions that might be applicable to a larger population. However, the book also covers very adequately practical issues such as recruitment of participants, the role of the moderator, and appropriate techniques for data analysis.
Weiss, Robert S. (1994). Learning from Strangers: The Art and Method of Qualitative Interview Studies. New York: The Free Press. After explaining the different functions of quantitative and qualitative interviews in the conduct of social science research studies, the author discusses in considerable detail the various steps of the qualitative interview process. Based largely on his own extensive experience in planning and carrying out studies based on qualitative interviews, he discusses respondent selection and recruitment, preparing for the interview (which includes such topics as the pros and cons of taping, the use of interview guides, interview length, etc.), the interviewing relationship, issues in interviewing (including confidentiality and the validity of the information provided by respondents), data analysis, and report writing. There are lengthy excerpts from actual interviews that illustrate the topics under discussion. This is a clearly written, very useful guide, especially for newcomers to this data collection method.
Wolcott, Harry F. (1994). Transforming Qualitative Data: Description, Analysis and Interpretation. Thousand Oaks, CA: Sage. This book is written by an anthropologist who has done fieldwork for studies focused on education issues in a variety of cultural settings; his emphasis throughout is on what one does with data rather than on collecting it. His frank and meticulous description of the ways in which he assembled his data, interacted with informants, and reached new insights based on the gradual accumulation of field experiences makes interesting reading. It also points to the pitfalls in the interpretation of qualitative data, which he sees as the most difficult task for the qualitative researcher.

U.S. General Accounting Office. (1990). Case Study Evaluations. Transfer Paper 10.1.9, issued by the Program Evaluation and Methodology Division. Washington, DC: GAO. This paper presents an evaluation perspective on case studies, defines them, and determines their appropriateness in terms of the type of evaluation question posed. Unlike the traditional, academic definition of the case study, which calls for long-term participation by the evaluator or researcher in the site to be studied, the GAO sees a wide range of shorter term applications for case study methods in evaluation. These include their use in conjunction with other methods for illustrative and exploratory purposes, as well as for the assessment of program implementation and program effects. Appendix 1 includes a very useful discussion dealing with the adaptation of the case study method for evaluation and the modifications and compromises that evaluators, unlike researchers who adopt traditional fieldwork methods, are required to make.
9

GLOSSARY

Accuracy: The extent to which an evaluation is truthful or valid in what it says about a program, project, or material.

Achievement: Performance as determined by some type of assessment or testing.

Affective: Consists of emotions, feelings, and attitudes.

Anonymity (provision for): Evaluator action to ensure that the identity of subjects cannot be ascertained during the course of a study, in study reports, or in any other way.

Assessment: Often used as a synonym for evaluation. The term is sometimes recommended for restriction to processes that are focused on quantitative and/or testing approaches.

Attitude: A person's mental set toward another person, thing, or state.

Attrition: Loss of subjects from the defined sample during the course of a longitudinal study.

Audience(s): Consumers of the evaluation; those who will or should read or hear of the evaluation, either during or at the end of the evaluation process. Includes those persons who will be guided by the evaluation in making decisions and all others who have a stake in the evaluation (see Stakeholder).

Authentic assessment: Alternative to traditional testing, using indicators of student task performance.

Background: The contextual information that describes the reasons for the project, including its goals, objectives, and stakeholders' information needs.

Baseline: Facts about the condition or performance of subjects prior to treatment or intervention.
Behavioral objectives: Specifically stated terms of attainment to be checked by observation, or test/measurement.

Bias: A consistent alignment with one point of view.

Case study: An intensive, detailed description and analysis of a single project, program, or instructional material in the context of its environment.

Checklist approach: Checklists are the principal instrument for practical evaluation, especially for investigating the thoroughness of implementation.

Client: The person or group or agency that commissioned the evaluation.

Coding: To translate a given set of data or items into descriptive or analytic categories to be used for data labeling and retrieval.

Cohort: A term used to designate one group among many in a study. For example, the first cohort may be the first group to have participated in a training program.

Component: A physically or temporally discrete part of a whole. It is any segment that can be combined with others to make a whole.

Conceptual framework: A set of concepts that generate hypotheses and simplify description.

Conclusions (of an evaluation): Final judgments and recommendations.

Content analysis: A process using a parsimonious classification system to determine the characteristics of a body of material or practices.

Context: The combination of factors accompanying the study that may have influenced its results, including geographic location, timing, political and social climate, economic conditions, and other relevant professional activities in progress at the same time.

Criterion, criteria: A criterion (variable) is whatever is used to measure a successful or unsuccessful outcome, e.g., grade point average.

Criterion-referenced test: Tests whose scores are interpreted by referral to well-defined domains of content or behaviors, rather than by referral to the performance of some comparable group of people.

Cross-case analysis: Grouping data from different persons to common questions or analyzing different perspectives on issues under study.
Cross-sectional study: A cross-section is a random sample of a population, and a cross-sectional study examines this sample at one point in time. Successive cross-sectional studies can be used as a substitute for a longitudinal study. For example, examining today's first-year students and today's graduating seniors may enable the evaluator to infer that the college experience has produced or can be expected to accompany the difference between them. The cross-sectional study substitutes today's seniors for a population that cannot be studied until 4 years later.

Data display: A compact form of organizing the available information (for example, graphs, charts, matrices).

Data reduction: Process of selecting, focusing, simplifying, abstracting, and transforming data collected in written field notes or transcriptions.

Delivery system: The link between the product or service and the immediate consumer (the recipient population).

Descriptive data: Information and findings expressed in words, unlike statistical data, which are expressed in numbers.

Design: The process of stipulating the investigatory procedures to be followed in doing a specific evaluation.

Dissemination: The process of communicating information to specific audiences for the purpose of extending knowledge and, in some cases, with a view to modifying policies and practices.

Document: Any written or recorded material not specifically prepared for the evaluation.

Effectiveness: Refers to the conclusion of a goal achievement evaluation. Success is its rough equivalent.

Elite interviewers: Well-qualified and especially trained persons who can successfully interact with high-level interviewees and are knowledgeable about the issues included in the evaluation.

Ethnography: Descriptive anthropology. Ethnographic program evaluation methods often focus on a program's culture.

Executive summary: A nontechnical summary statement designed to provide a quick overview of the full-length report on which it is based.

External evaluation: Evaluation conducted by an evaluator from outside the organization within which the object of the study is housed.
Field notes: Observer's detailed description of what has been observed.

Focus group: A group selected for its relevance to an evaluation that is engaged by a trained facilitator in a series of discussions designed for sharing insights, ideas, and observations on a topic of concern to the evaluation.

Formative evaluation: Evaluation designed and used to improve an intervention, especially when it is still being developed.

Hypothesis testing: The standard model of the classical approach to scientific research in which a hypothesis is formulated before the experiment to test its truth.

Impact evaluation: An evaluation focused on outcomes or payoff.

Implementation evaluation: Assessing program delivery (a subset of formative evaluation).

Indepth interview: A guided conversation between a skilled interviewer and an interviewee that seeks to maximize opportunities for the expression of a respondent's feelings and ideas through the use of open-ended questions and a loosely structured interview guide.

Informed consent: Agreement by the participants in an evaluation to the use, in specified ways for stated purposes, of their names and/or confidential information they supplied.

Instrument: An assessment device (test, questionnaire, protocol, etc.) adopted, adapted, or constructed for the purpose of the evaluation.

Internal evaluator: A staff member or unit from the organization within which the object of the evaluation is housed.

Intervention: Project feature or innovation subject to evaluation.

Writing a case study for each person or unit studied.

Key informant: Person with background, knowledge, or special skills relevant to topics examined by the evaluation.

Longitudinal study: An investigation or study in which a particular individual or group of individuals is followed over a substantial period of time to discover changes that may be attributable to the influence of the treatment, or to maturation, or the environment. (See also Cross-sectional study.)

Matrix: An arrangement of rows and columns used to display multi-dimensional information.
Measurement: Determination of the magnitude of a quantity.

Mixed method evaluation: An evaluation for which the design includes the use of both quantitative and qualitative methods for data collection and data analysis.

Moderator: Focus group leader; often called a facilitator.

Nonparticipant observer: A person whose role is clearly defined to project participants and project personnel as an outside observer or onlooker.

Norm-referenced tests: Tests that measure the relative performance of the individual or group by comparison with the performance of other individuals or groups taking the same test.

Objective: A specific description of an intended outcome.

Observation: The process of direct sensory inspection involving trained observers.

Ordinal data: Non-numeric data in ordered categories (for example, students' performance categorized as excellent, good, adequate, and poor).

Outcome: Post-treatment or post-intervention effects.

Paradigm: A general conception, model, or worldview that may be influential in shaping the development of a discipline or subdiscipline (for example, the classical, positivist social science paradigm in evaluation).

Participant observer: A person who becomes a member of the project (as participant or staff) in order to gain a fuller understanding of the setting and issues.

Performance evaluation: A method of assessing what skills students or other project participants have acquired by examining how they accomplish complex tasks or the products they have created (e.g., poetry, artwork).

Planning evaluation: Evaluation planning is necessary before a program begins, both to get baseline data and to evaluate the program plan, at least for evaluability. Planning avoids designing a program that cannot be evaluated.

Population: All persons in a particular group.

Probes: Reminders used by interviewers to obtain complete answers.

Purposive sampling: Creating samples by selecting information-rich cases from which one can learn a great deal about issues of central importance to the purpose of the evaluation.
Qualitative evaluation: The approach to evaluation that is primarily descriptive and interpretative.

Quantitative evaluation: The approach to evaluation involving the use of numerical measurement and data analysis based on statistical methods.

Random sampling: Drawing a number of items of any sort from a larger group or population so that every individual item has a specified probability of being chosen.

Recommendations: Suggestions for specific actions derived from analytic approaches to the program components.

Sample: A part of a population.

Secondary analysis: A reanalysis of data using the same or other appropriate procedures to verify the accuracy of the results of the initial analysis or for answering different questions.

Self-administered instrument: A questionnaire or report completed by a study participant without the assistance of an interviewer.

Stakeholder: One who has credibility, power, or other capital invested in a project and thus can be held to be to some degree at risk with it.

Standardized tests: Tests that have standardized instructions for administration, use, scoring, and interpretation, with standard printed forms and content. They are usually norm-referenced tests but can also be criterion referenced.

Strategy: A systematic plan of action to reach predefined goals.

Structured interview: An interview in which the interviewer asks questions from a detailed guide that contains the questions to be asked and the specific areas for probing.

Summary: A short restatement of the main points of a report.

Summative evaluation: Evaluation designed to present conclusions about the merit or worth of an intervention and recommendations about whether it should be retained, altered, or eliminated.

Transportable: An intervention that can be replicated in a different site.
Triangulation: In an evaluation, triangulation is an attempt to get a fix on a phenomenon or measurement by approaching it via several (three or more) independent routes. This effort provides redundant measurement.

Utility: The extent to which an evaluation produces and disseminates reports that inform relevant audiences and have beneficial impact on their work.

Utilization of (evaluations): Use and impact are terms used as substitutes for utilization. Sometimes seen as the equivalent of implementation, but this applies only to evaluations that contain recommendations.

Validity: The soundness of the inferences made from the results of a data-gathering process.

Verification: Revisiting the data as many times as necessary to cross-check or confirm the conclusions that were drawn.
Sources:

Jaeger, R.M. (1990). Statistics: A Spectator Sport. Newbury Park, CA: Sage.

Joint Committee on Standards for Educational Evaluation. (1981). Standards for Evaluation of Educational Programs, Projects, and Materials. New York: McGraw-Hill.

Scriven, M. (1991). Evaluation Thesaurus, 4th Ed. Newbury Park, CA: Sage.

Authors of Chapters 1-7.