CPD Evaluation
Centre for Education and Inclusion Research, Sheffield Hallam University, Sheffield, UK
(Received 3 March 2010; final version received 19 May 2010)
Introduction
The evaluation of continuing professional development (CPD) in education provides
major practical challenges to those commissioning such evaluations, those undertak-
ing them and those who use them. Underlying these challenges is a further one: theo-
rising the nature and effects of CPD in ways that both do justice to the complexity of
the CPD world and generate practical possibilities for programme evaluation. Our
judgement is that this issue has been addressed at best unevenly. Many attempts have
been made to theorise CPD, but few of these seem to have influenced evaluators; and
where evaluation of CPD has been theory based, such theories are often implicit, ill-
specified or overly reductive. This is despite the enormous literature that exists on
policy and programme evaluation generally.
The purpose of this paper is to begin to address this issue by focusing on what are
often called ‘level’ models for evaluating development and training. Such models
draw on the hugely influential work of Kirkpatrick and Guskey, and the ideas of these
writers have helped to inform much of our own work of evaluating a range of CPD
(including, especially, leadership development) programmes for a number of govern-
ment agencies in England. Our experience has been an evolutionary one, with a
constant interplay between our theorising and the practicalities of delivering evalua-
tions on time and to budget. This paper tries to reflect this, by locating our thinking
both temporally in terms of our own learning and in relation to evaluation models
developed by others. The aims of the paper are threefold: first, to consider the ways
in which level models have been articulated and critiqued; second, to explain how our
own evaluation work has been influenced by these models and critiques; and third, to
stand back from these models’ use in practice to consider some more fundamental
ontological and epistemological questions to which they give rise but which are not
often discussed in evaluation reports. We conclude that the complexity of CPD
processes and effects and, crucially, of the social world requires a range of approaches,
and that – therefore – an approach based on any single model is not enough.
Approaches to evaluation
Many attempts have been made to categorise different approaches, theories or models
of evaluation. Some go back to early seminal contributors to the field (for example,
House 1978, Stufflebeam and Webster 1980, Stake 1986, Guba and Lincoln 1989);
others are more recent (for example, Alkin 2004, Hansen 2005, Stufflebeam and
Shinkfield 2007). Such classifications vary widely in their focus and underpinning
rationales. At the risk of oversimplifying a complex and ever-growing field, it is
useful to distinguish among three inter-related dimensions of the evaluation ‘prob-
lem’. These concern respectively the ‘what’, the ‘how’ and the ‘who’ of evaluation
processes. In relation to the ‘what’, a core distinction is often made (for example,
Bennett 2003) between the ‘classical’ evaluation tradition deriving from the work of
Tyler (1942), with its emphasis on the specification and measurement of outputs, and
later approaches that present much wider perspectives, such as Stufflebeam's (1983)
CIPP (context–input–process–product) and Cronbach’s (1982) utos (units of focus,
treatments, observations/outcomes, settings) frameworks. In terms of ‘how’, discus-
sion traditionally draws on the wider methodological literature to contrast quantitative
approaches, particularly experimental and quasi-experimental designs (Campbell
1976, Cook et al. 2010) with approaches that seek to explore the subject of the eval-
uation using more qualitative methods such as thick description and case study
(Parlett and Hamilton 1976, Stake 1986) or approaches that draw on the traditions of
connoisseurship and criticism (Eisner 1985). Finally, in terms of ‘who’ should partic-
ipate in evaluation and determine its outcomes, the history of evaluation exhibits a
wide range of perspectives from those that give the key role to the evaluators them-
selves (Scriven 1973), through those who focus on the importance of commissioners
and managers (Stufflebeam 1983), to those who seek to engage a wider range of
stakeholders (Guba and Lincoln 1989, Patton 1997), including some who place a
particular emphasis on participative processes (Cousins and Earl 1995, Torres and
Preskill 2001) or on the engagement of the disempowered (House 1991, Fetterman
1996). We will return to the ‘how’ and ‘who’ questions later. However, the primary
focus of this paper is a particular approach to the ‘what’ question: that of what we call
‘level models’.
For example, if only the four levels of outcome are measured and a weak correlation is
measured between levels two and three, all we really know is that learning from training
was not associated with behaviour change. In the absence of a fully specified model, we
don’t know if the correlation is weak because some aspect of the training effort was not
effective or because the underlying evaluation model is not valid. (Holton 1996, p. 6)
In other words, we do not know whether poor outcomes are the result of a poorly
designed programme or of factors that lie outside the programme itself. Holton goes
on to develop a more complex model that identifies influences beyond the intervention
that are likely to determine: first, whether the intervention will result in learning;
second, whether any learning will be transferred into improved participant perfor-
mance; and third, whether such increased performance will influence organisational
results. In doing so, he considers variables relating to the individual participant (e.g.
motivation), the programme (e.g. whether it enables the individual to try out new ideas
in practice) and the organisation (e.g. whether effective transfer is rewarded).
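Holton's point can be illustrated with a minimal simulation (our own construction, not Holton's; the variable names, the 30 per cent share of 'supportive' workplaces and the effect sizes are purely hypothetical), in which an unmeasured organisational factor governs whether learning is transferred into changed behaviour:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000

    # Level 2 outcome: learning gained from the programme (standardised score)
    learning = rng.normal(0.0, 1.0, n)

    # Unmeasured organisational factor: does the workplace reward transfer of learning?
    supportive = rng.random(n) < 0.3

    # Level 3 outcome: behaviour changes only where transfer is supported
    behaviour = np.where(supportive, learning, 0.0) + rng.normal(0.0, 0.5, n)

    overall = np.corrcoef(learning, behaviour)[0, 1]
    within = np.corrcoef(learning[supportive], behaviour[supportive])[0, 1]
    print(f"learning-behaviour correlation, all participants:      {overall:.2f}")
    print(f"learning-behaviour correlation, supportive workplaces: {within:.2f}")

The aggregate correlation is modest even though the programme 'worked' wherever conditions allowed transfer; without a model that specifies such influences, a weak level-two to level-three association cannot tell the evaluator whether the programme or its surrounding conditions is at fault.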
Using similar ideas, Leithwood and Levin (2005) explore a range of models for
evaluating the impact of both leadership and leadership development that embody
various combinations of variables. In particular, they distinguish between what they
call ‘mediating’ and ‘moderating’ variables. Mediating variables are analogous to the
intermediate levels described above, in that they lie on an assumed causative path
from the ‘independent variable’ (e.g. the leadership development intervention) to the
‘dependent variable’ (i.e. the final outcome). Moderating variables, in contrast, are
described as: ‘features of the organizational or wider context of the leader’s work that
interact with the dependent or mediating variables … [and] potentially change the
strength or nature of the relationships between them’ (Leithwood and Levin 2005,
p. 12). These authors give examples of variables relating to the characteristics of
students, teachers, leaders and the organisation, making the important point that,
depending on how the theory or framework is used to guide the study, the same vari-
able might be defined as a moderator, a mediator or a dependent variable. Thus, for
example, ‘employee trust’ might be a dependent variable (the purpose of a training
programme), a mediator (a step on the assumed causative path from leadership devel-
opment to improved employee motivation or performance) or a moderator (a factor in
the work context that influences whether employees respond positively to leadership
development activities).
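One conventional way of making this distinction concrete (our illustration, expressed in standard mediation and moderation regression notation rather than in Leithwood and Levin's own formulation) is to contrast the two specifications:

    M = \alpha_0 + \alpha_1 X + e_1, \qquad Y = \beta_0 + \beta_1 M + \beta_2 X + e_2
    Y = \gamma_0 + \gamma_1 X + \gamma_2 Z + \gamma_3 (X \times Z) + e_3

In the first (mediation) pair of equations, X is the development intervention, M a mediating variable lying on the assumed causal path (employee trust as a step towards improved performance, say) and Y the final outcome, with the product \alpha_1\beta_1 capturing the indirect effect transmitted through M. In the second (moderation) equation, Z is a feature of the work context (employee trust again, now treated as a moderator), and the interaction coefficient \gamma_3 captures the extent to which Z changes the strength of the relationship between X and Y.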
The various level models described above have been used by their authors in a
variety of ways. Kirkpatrick’s original model, for example, was developed (as was
Guskey’s modification of it) for the pragmatic purpose of enabling training evalua-
tors to carry out their task more systematically. Indeed, in his rejoinder to Holton’s
critique, Kirkpatrick claims the widespread use of his approach (he does not use
the term ‘model’) for that purpose as the main evidence for its validity (Kirkpatrick
1996). Alliger and Janak’s (1994) and Holton’s (1996) critiques of Kirkpatrick, and
Leithwood and Levin’s work, in contrast, are based on a more traditional research-
oriented approach. They seek to model empirically the factors that influence train-
ing and development outcomes through identifying key variables, specifying the
relationships between these, and measuring them. Such approaches lead to more
complex models of relationships than either Kirkpatrick’s or Guskey’s and also
to the likelihood that different patterns of variables may be identified in different
situations.
The frame for the model – shown in outline form in Figure 1 – is constructed
around the following sets of key variables, and their interactions:
These variables interact in often complex ways that are sensitive to the details of
design and implementation of particular CPD activities.
As we noted above, our early studies of school leadership programmes were

Figure 1. A basic model of leadership programme effects.
randomly assigned 40 first-grade teachers to two groups. One group received a brief 4-
hour PD [Professional Development] programme. The other received an extensive 80-
hour program known as cognitively guided instruction (CGI) … The students of the
teachers who received CGI outperformed the [others] on three of the six student achieve-
ment measures. (Wayne et al. 2008, p. 469)
Such findings provide some evidence of the effects of CPD in specific areas of pupil
performance, but the more general learning is less clear. Blamey and Mackenzie
(2007, pp. 440–441) argue that such approaches flatten out ‘variations in context’ by
treating interventions as ‘unified entities through which recipients are processed and
where contextual factors are conceptualised as confounding variables’ rather than
essential ingredients in understanding causal processes at work. In this case, we know
that the intervention worked in some ways to improve pupil learning, but, as Wayne
et al. (2008, p. 469) note, such studies ‘have not yet provided the kind of guidance
needed to steer investments in PD’. Whilst evaluation designs of this kind are rare in
the UK CPD evaluation literature, their underlying ontology and successionist view of
causation (x causes y because, having attempted to rule out confounding factors, x is
associated with and is temporally prior to y) are consistent with level models as used
in the United Kingdom and elsewhere. Our own model, and others we discuss above,
such as Leithwood and Levin's, draw on this tradition in that they rely heavily on
empirical data, from which they are derived and by which they are modified. Just as social
research in this tradition has been critiqued by more recent philosophical traditions,
the next two positions discussed below can be seen, therefore, as different types of
responses to this first position.
The second set of approaches sets out to be explicitly driven by theory rather than
data, and includes the group of post-positivist approaches: ‘realist(ic) evaluation’
(Pawson and Tilley 1997), ‘theory of change’ (Connell and Kubisch 1998) and
‘programme theory’ (Rogers et al. 2000). These evaluation approaches draw on what
is now usually called the ‘critical realist’ social theory of Roy Bhaskar (1998), devel-
oped by others, notably – particularly in relation to the education field – Margaret
Archer (1995). They share the ontological position that there are real, underlying
causal mechanisms that produce regularities observable in the social world. The level
model tradition can be seen to fit into this group, since the application of level models
to programme and other evaluations can be thought of as using a theory-based
approach. However, as we will go on to argue, level models including ours tend to
underplay the complexity of the social world discussed by the theorists working
within the critical realist paradigm in social science and evaluation research.
For realist evaluators and social researchers, the mechanisms that produce regular-
ities are derived through what can be thought of as ‘middle-range’ theories: those ‘that
lie between the minor but necessary working hypotheses … and the all-inclusive
systematic efforts to develop a unified theory’ (Merton 1968, p. 39). This is the sense
in which such approaches are described as theory based. These mechanisms operate
in specific contexts to produce particular sets of outcomes. Hence these approaches
have a generative view of causation, in contrast with the data-driven successionist
view shared by positivist/naïve realist positions (Pawson and Tilley 1997). Viewed
from this perspective, the role of the evaluator is to uncover such combinations of
context, mechanisms and outcomes. These approaches have a strong focus on learning
from evaluation about why and how programmes work, not just ‘what works’.
However, they can be criticised for failing to provide findings with the degree of internal
validity that is claimed for experimental studies. From this perspective, the processes underly-
ing the workings of CPD programmes are complex in a number of ways. In particular,
they are embedded both within wider social structures and in specific contexts; they
tend to lead, in context, to ‘regularities’ (in programmes, these are usually described
as outcomes); they are unstable over time; and, since they underlie what is observable,
observable data are necessarily incomplete.
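By way of illustration only (the programme, contexts and mechanisms sketched below are invented, and Pawson and Tilley prescribe no such notation), a realist analysis might record rival context–mechanism–outcome configurations for a CPD programme and weigh each against the evidence gathered in each setting:

    from dataclasses import dataclass

    @dataclass
    class CMOConfiguration:
        context: str    # for whom, and in what circumstances
        mechanism: str  # the underlying process assumed to generate change
        outcome: str    # the regularity expected when the mechanism fires

    candidates = [
        CMOConfiguration(
            context="departments whose heads model the promoted practices",
            mechanism="participants feel licensed to try out new ideas in class",
            outcome="sustained change in classroom behaviour"),
        CMOConfiguration(
            context="schools treating attendance as a compliance requirement",
            mechanism="participants engage with the content but see no scope to experiment",
            outcome="positive reactions and some learning, little behaviour change"),
    ]

    # The analytic task is to compare the explanatory power of each configuration
    # against the evidence from each setting, not to report one averaged effect.
    for cmo in candidates:
        print(f"IF {cmo.context},\n  THEN {cmo.mechanism},\n  GIVING {cmo.outcome}.\n")

Nothing in this sketch goes beyond what Pawson and Tilley describe in prose; the point is simply that the unit of analysis is the configuration rather than the programme as a whole.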
Turning again to level models, two key issues emerge from this discussion. First,
from these perspectives, level models tend not to provide enough detail of the theory
or mechanisms underlying the levels of the model, and therefore are inadequate in
explaining why particular outcomes occur in particular contexts. The processes indi-
cated by the arrows that link the boxes in such models remain largely opaque.
Secondly, for evaluators working with this post-positivist tradition, any single frame-
work such as a level model cannot deal with all the possible combinations of context,
mechanism and outcomes that may create change in a programme (Blamey and
Mackenzie 2007). The discussion of our own approach in the previous section –
indicating the difficulties we faced in dealing with programmes that support groups as
well as individuals, or that comprise multiple interventions – illustrates this well.
From a realist viewpoint, the evaluator should look at a number of possible mecha-
nisms and compare their explanatory power in any given context in order to learn from
them (Pawson and Tilley 1997). There is no inherent reason why level-type models
cannot at least partly address this point, if they are underpinned by a theory-based
understanding of the nature of learning and development, and are flexible and adapt-
able to the specifics of the programme or experience being examined. This is true of
our model, which as we have shown is essentially a highly adaptable frame for
constructing a variety of specific models to gather and interpret data. It is in fact a
‘meta-model’ to be redefined in each project. Nevertheless, one can still persuasively
argue that any single model or even meta-model is inherently limited and limiting in
its approach to understanding social processes and the complexity of the social world.
Finally, we need to consider a third category of ontological approaches to evalua-
tion, which again can be seen as being in opposition to the first position above. This
is based on an underlying ontological position that the social world is constructed by
the actors engaged within it. Associated with this is the epistemological position that
knowledge of the social world can only be obtained through the perspectives of indi-
viduals and these perspectives may legitimately differ (Berger and Luckmann 1966,
Denzin 2001). Evaluators from this tradition – which we label a constructivist position
– concentrate on the perspectives and constructed meanings of programmes, their
workings and outcomes from the viewpoints of all of those involved. Some of these
positions – particularly Guba and Lincoln’s (1989) ‘fourth generation evaluation’ –
seem to us to be extreme, seeing no possibility of generating knowledge about a
programme beyond that which is subjective, specific to particular instances and nego-
tiated among a wide range of stakeholders. This underplays a more general construc-
tivist position; namely that programme purposes may be contested, that individuals
may experience interventions in different ways, and that understanding these contes-
tations and experiences may provide important information that can contribute to our
understanding of how interventions work (Sullivan and Stewart 2006). This is the
essence of the final point in the previous section about the ways in which participants
in the programmes that we have evaluated impute different personal and organisa-
tional purposes to programmes.
Level models can address this in part by treating their components as subject to
interpretation rather than simply in terms of a priori specification, and we have done
this in many of our evaluations. Nevertheless, many theorists of professional develop-
ment would be unhappy with this, tending to be deeply suspicious of any training and
development model that they feel to be underpinned by reductionist ideas associated
with performativity agendas (Fraser et al. 2007), and level models are easily charac-
terised in this way. The emphasis of such critics would be on the capacity of profes-
sional development to facilitate professional transformation and teacher autonomy
and agency (Cochran-Smith and Lytle 1999, Kennedy 2005). It could be argued that
the enhancement of professional autonomy and the encouragement of genuine critique
are just particular outcomes that can easily be incorporated into a level model.
However, often implicit in these models are instrumentalist assumptions about the role
of training and development programmes in promoting specific outcomes, which are
typically pre-determined and measured in particular ways rather than emergent and
constructed by the participants themselves. The models are concerned with promoting
‘what works’ rather than enabling practitioners to engage with ‘what makes sense’
(Simkins 2005). This leads to a deeper concern about the relationship between level
models and the nature of professional learning itself. Webster-Wright (2009), for
example, argues for a distinction to be made between professional development and
professional learning (PL), and for studies to focus on the latter. This would involve an
approach that ‘views learner, context, and learning as inextricably inter-related, and
investigates the experience of PL as constructed and embedded within authentic profes-
sional practice’ (Webster-Wright 2009, p. 713). This is a very different approach from
that embodied in level models.
Conclusion
It was proposed at the beginning of this paper that evaluators need to address three key
questions: what should be the focus of evaluation; how should these aspects be inves-
tigated; and whose views should count in the evaluation. It was further suggested that
level models focus on the first of these questions – the ‘what’. However, consideration
both of our experience of using level models and of the theoretical perspectives
discussed above makes it clear that things are not so simple.
Firstly, the analysis in this paper suggests that, while level models can be used in
the positivist tradition to structure evaluations of well-defined development
programmes with clearly identifiable target groups and intended outcomes, perhaps
more significant is their potential for exploring heuristically the workings of such
programmes through identifying key variables, the possible relations between them
and the ways in which these variables and relationships can be constructed: an
‘inquiry’ rather and ‘audit review’ approach to evaluation (Edlenbos and van Buuren
2005). However, the models also have limitations. From a realist perspective they do
not typically give enough attention to the real mechanisms through which outcomes
are achieved, either in their specificity or complexity; and from some constructivist
perspectives they are based on reductionist instrumental assumptions that pervert the
complex reality of genuine professional learning.
Secondly, level models need to be implemented and, in doing this, evaluators
make choices about the kinds of data to gather, whom to collect them from and what
weight to give them. Alkin and Ellett (1985) suggest three dimensions against which
models or theories of evaluation should be judged: their methodological approach
(from quantitative to qualitative); the manner in which the data are to be judged or
valued (from unitary – by the commissioner or the evaluator – to plural); and the user
focus of the evaluation effort (from instrumental to enlightenment). Alkin (2004) uses
these broad dimensions to develop an ‘evaluation theory tree’, attempting to place
each key writer on evaluation into one of these areas based on a judgement about
their primary concern while recognising that this inevitably over-simplifies many
writers’ views.
Level models are not easily placed on any of these dimensions. For those evalua-
tors in the first of the traditions we identified above, the aim may be to specify
intended outcomes, measure these and determine whether or not they have been
achieved: a typically quantitative, unitary and instrumental approach. For others who
reject such a position, such models may nevertheless be of value. For realists they
provide one starting point for seeking to understand the complex reality of profes-
sional development and the mechanisms through which learning and other outcomes
occur in a variety of contexts. And for some constructivists, the idea of multiple
models that reflect the differing perspectives of various stakeholders may be of value.
In each of these cases, evaluations are likely to draw on more qualitative, plural and/
or enlightenment-oriented approaches than positivist approaches do.
These complications emphasise the need always to consider when and how level
models are used. In making these decisions, attention needs to be given to the
purposes of evaluation and to the nature of the programme, activity or process being
evaluated. In their comparison of two ‘theory-driven’ approaches to evaluation,
Blamey and Mackenzie (2007) argue that ‘theory of change’ approaches are most apt
for complex, large-scale programme evaluations and examining links between their
different strands, whereas ‘realist evaluation’ approaches suit examinations of learn-
ing from particular aspects of programmes or from less complex programmes. From
our experience of attempting to apply level models to a range of programme evalua-
tions, it appears that the strengths of level models are similar to those of ‘realist eval-
uation’ models in that they can be particularly useful in uncovering the workings of
well-defined development programmes with clearly identifiable participant groups.
Nevertheless, the emphasis on learning programmes is significant here: CPD comprises,
or should comprise, much more than programmes. Two final consequences arise from
this. First, there will be many areas of CPD activity for which level models are inap-
propriate and other evaluation approaches must be sought. These might include
approaches such as biographical studies or rich case studies, which seek to see profes-
sional learning as an emergent personal and social process rather than one simply
embodied in inputs and outputs. They might also include approaches that engage the
learners much more explicitly as partners in the evaluation process than many
commissioned evaluations typically do. Second, the necessary incompleteness of any
one model (including level models as a family) requires us to aim explicitly to develop
our theoretical understanding of the social world and in this way to ‘make evaluations
cumulate’ (Pawson and Tilley 1997).
This leads to a final point. There is an added complexity for evaluators, such as
ourselves, working in the arena of publicly funded evaluation research. On the one
hand, as evaluators commissioned to evaluate government programmes, we normally
work under the expectation that we will generate results that are essentially instrumen-
tal: in Easterby-Smith’s (1994) terms, results that ‘prove’ (or not) programme outcomes
and perhaps also contribute to ‘improving’ programme design. However, the ways in
which evaluation purposes are constructed raise important ethical issues (Elliott and
Kushner 2007), and beyond this as academics our stance has a strong enlightenment
focus, with a major concern for ‘learning’ about the programmes we study, placing
them in context and, in so far as this is possible, generating understanding that can be
extended beyond the case at hand (Torres and Preskill 2001, Coote et al. 2004). The
analysis in this paper, by exploring the ways in which level models have been used to
evaluate CPD programmes while explicitly linking them to underlying ontological
positions, helps to explore this tension. It is all too easy – and sometimes unavoidable
– to succumb to the desire of contractors, whether explicit or not, to take an essentially
positivist stance to evaluation. However, doing so risks leaving much of the potential
for learning unrealised. In most of the work referred to here, we have been able
to avoid this temptation, but the relationship between ‘ownership’, methodology and
integrity is one that requires constant attention.
Acknowledgements
Thanks to John Coldron, Bronwen Maxwell and anonymous reviewers for comments on earlier
drafts.
Notes
1. The Training and Development Agency for Schools is an agency of the UK Government
responsible for the training and development of the school workforce in England, admin-
istering funding, developing policy and monitoring initial teacher education and CPD of
teachers and other school staff.
2. England’s NCSL – now renamed the National College for Leadership of Schools and Chil-
dren’s Services – is one of the largest national leadership development enterprises in the
world. Largely funded by government and with a total budget of about £121 million in 2008/
09, it runs or commissions a very wide range of leadership development programmes
targeted at leaders at all career stages and now covering all children’s services, not just
schools. The titles of the programmes referred to in the text are largely self-explanatory –
except for Leadership Pathways, which is a programme targeted at middle and senior lead-
ers not yet eligible for the National Professional Qualification for Headship. For further
details see http://www.nationalcollege.org.uk.
References
Alkin, M., 2004. Evaluation roots: tracing theorists' views and influences. London: Sage.
Alkin, M. and Ellett, F., 1985. Evaluation models and their development. In: T. Husen and N.
Postlethwaite, eds. International encyclopaedia of education: research and studies.
Oxford: Pergamon, 1760–1766.
Alliger, G. and Janak, E., 1994. Kirkpatrick’s levels of training criteria: thirty years later. In:
C. Schneier, C. Russell, R. Beatty, and C. Baird, eds. The training and development
sourcebook. Amherst, MA: HRD Press, 219–228.
Archer, M., 1995. Realist social theory: the morphogenetic approach. Cambridge: Cambridge
University Press.
Bennett, J., 2003. Evaluation methods in research. London: Continuum.
Berger, P.L. and Luckmann, T., 1966. The social construction of reality: a treatise in the soci-
ology of knowledge. Garden City, NY: Anchor Books.
Bhaskar, R.A., 1998. The possibility of naturalism. 3rd ed. London: Routledge.
Blamey, A. and Mackenzie, M., 2007. Theories of change and realistic evaluation: peas in a
pod or apples and oranges. Evaluation, 13 (4), 439–455.
Campbell, D.T., 1976. Assessing the impact of planned social change. Occasional Paper
Series 8. Hanover, NH: Dartmouth College Public Affairs Centre.
Carpenter, T.P., et al., 1989. Using knowledge of children's mathematics thinking in class-
room teaching: an experimental study. American educational research journal, 26
(4), 499–531.
Centre for Education and Inclusion Research, Sheffield Hallam University, 2008. Evaluation
of the CEL/NCSL 14–19 Leadership and Management Development Programme, final
report. Internal report. Nottingham: National College for School Leadership.
Cochran-Smith, M. and Lytle, S., 1999. Relationships of knowledge and practice: teacher
learning in communities. Review of research in education, 24 (1), 249–306.
Connell, J. and Kubisch, A., 1998. Applying a theory of change approach to the evaluation of
comprehensive community initiatives: progress, prospects and problems. In: K.
Fulbright-Anderson, A. Kubisch, and J. Connell, eds. New approaches to evaluating
community initiatives: Vol 2. Theory, measurement and analysis. Washington DC:
Aspen Institute, 15–44.
Cook, T.D., et al., 2010. Contemporary thinking about causation in evaluation: a dialogue with
Tom Cook and Michael Scriven. American journal of evaluation, 31 (1), 105–117.
Coote, A., Allen, J., and Woodhead, D., 2004. Finding out what works: building knowledge
about complex community-based initiatives. London: King’s Fund.
Cousins, J. and Earl, L., eds., 1995. Participatory evaluation in education: studies in evalua-
tion use and organizational learning. London: Falmer.
Cronbach, L., 1982. Designing evaluations of educational and social programs. San Francisco,
CA: Jossey-Bass.
Denzin, N.K., 2001. Interpretive interactionism. 2nd ed. London: Sage.
Easterby-Smith, M., 1994. Evaluation of management development, education, and training.
London: Gower.
Edelenbos, J. and van Buuren, A., 2005. The learning evaluation: a theoretical and empirical
exploration. Evaluation review, 29 (6), 591–612.
Eisner, E., 1985. The art of educational evaluation: a personal view. London: The Falmer Press.
Elliott, J. and Kushner, S., 2007. The need for a manifesto for educational programme evalua-
tion. Cambridge journal of education, 37 (3), 321–336.
Fetterman, D., 1996. Empowerment evaluation: an introduction to theory and practice. In: D.M.
Fetterman, S.J. Kaftarian, and A. Wandersman, eds. Empowerment evaluation: knowledge
and tools for self-assessment and accountability. Thousand Oaks, CA: Sage, 3–48.
Fraser, C., et al., 2007. Teachers’ continuing professional development: contested concepts,
understandings and models. Journal of in-service education, 33 (2), 153–169.
Guba, E. and Lincoln, Y., 1989. Fourth generation evaluation. London: Sage.
Guskey, T., 2000. Evaluating professional development. Thousand Oaks, CA: Corwin Press.
Hansen, H.F., 2005. Choosing evaluation models: a discussion on evaluation design. Evaluation,
11 (4), 447–462.
Holton, E., 1996. The flawed four-level evaluation model. Human resource development
quarterly, 7 (1), 5–22.
House, E.R., 1978. Assumptions underlying evaluation models. Educational researcher, 7 (3),
4–12.
House, E.R., 1991. Evaluation and social justice: Where are we? In: M.W. McLaughlin and
D.C. Phillips, eds. Evaluation and education: at quarter century. Chicago, IL: University
of Chicago Press, 233–247.
Kennedy, A., 2005. Models of continuing professional development: frameworks for analysis.
Journal of in-service education, 31 (2), 235–250.
Kirkpatrick, D., 1996. Invited reaction: reaction to Holton article. Human resource develop-
ment quarterly, 7 (1), 23–25.
Kirkpatrick, D., 1998. Evaluating training programmes: the four levels. 2nd ed. San Francisco,
CA: Berrett-Koehler.
Leithwood, K. and Levin, B., 2005. Assessing school leader and leadership programme
effects on pupil learning. Nottingham: Department for Education and Skills.
Maxwell, B., Coldwell, M., and Simkins, T., 2009. Possibilities of partnerships as sites for
learning: leadership development in English 14–19 Diploma consortia. Paper presented