Levels of Confidence in System Dynamics Modeling
Abstract
This paper provides an overview of the literature in assessment of system dynamics (SD) models
to substantiate a pragmatic framework intended to guide model testing, refinement and
evaluation. It recaps the predominant philosophy of science embraced in the field, and its
implications for model validation. It reviews tests for building confidence in SD models. In this
literature, SD is presented as a relatively uniform approach to dynamic modeling. However, surveys of the field paint a different picture, revealing surprisingly diverse forms of practice.
We draw upon this breadth of existing practice to develop our framework. We propose five
components of practice: 1) system’s mapping, 2) quantitative modeling, 3) hypothesis testing, 4)
uncertainty analysis, and 5) forecasting/optimization. In light of the proposed framework, we
reclassify tests for assessment of dynamic models across these five practical categories. We
believe this is useful to tailor tests to specific modeling efforts, guide model testing in different
phases of model development, and to help conduct partial assessments of levels of confidence.
Key words: Model testing, model evaluation, model validation, confidence building
†
Sandia is a multi-program laboratory operated by Sandia Corporation, a Lockheed Martin Company, for
the United States Department of Energy’s National Nuclear Security Administration under Contract DE-
AC04-94AL85000. The Department of Homeland Security, Science & Technology Directorate, provided
funding for this work.
Levels of Confidence in System Dynamics Modeling:
A Pragmatic Approach to Assessment of Dynamic Models
Introduction
We sought guidelines for testing and building confidence in the range of models that we use to analyze interdependencies between critical infrastructures. Our intent was to draw upon the field of system dynamics (SD) for existing guidelines developed for model testing and evaluation. Forrester & Senge (1980) proposed 17 tests organized in three broad categories: tests of model structure, of model behavior, and of policy implications. Sterman (2000) revisited this work, clustering some of the tests and eliminating the categories, and produced 12 tests. Overall, there is a large degree of overlap between the two sets, with the exception of the tests of behavior prediction, which Sterman dropped, and a test of integration error, which he added.
However, we realized that our modeling approaches and objectives are diverse enough that not
all of the tests proposed in the literature are applicable to each modeling project. It seemed that
it would be helpful to bin these tests to make it easier for our modelers to identify the best subset
of tests for each project. It is clear that the decision as to whether or not a particular test is
appropriate should consider the assumed basis for model truth, or correctness. We therefore
examined the literatures concerning the scientific basis, and the breadth of SD modeling practice,
to categorize model tests. The result was a regrouping of the tests proposed by Forrester & Senge and by Sterman into five categories that are intended to represent separate components of
modeling. We think that this grouping will help guide the choice of appropriate tests on a case-
by-case basis.
While we focused upon the seminal piece by Forrester & Senge, and on Sterman’s modern and
thorough textbook, we acknowledge that the literature on SD model testing and evaluation is
wide ranging, including many specific contributions (such as Mass & Senge 1980, Peterson
1980, Tank-Nielson 1980), and other efforts to overview and discuss the tests comprehensively
(such as Forrester 1961, Richardson & Pugh 1981, Barlas 1989, Ford 1999). For practical reasons, and because none of these other sources is as comprehensive, we focused primarily upon the first two. In this paper we also indicate some examples of new developments in the
field which are not yet incorporated into any comprehensive overview.
We perceive the organizing structure that we seek to develop here as more practically useful than
the early framework adopted by Forrester & Senge (1980), also discussed in Barlas (1996), and
later apparently dropped by Sterman (2000). Finding some structure to organize and discuss
model testing and evaluation is an important issue, particularly to communicate the “know-how”
to novice modelers, and (at least for our purposes) to develop testing protocols and standards.
This effort seems useful to the SD field in general, as communicated to us by one of the leading
figures in the field:
Others attempted to approach this issue pragmatically. For example, Wakeland and Hoarfrost
(2005) examined the portfolio of tests looking for cost/benefit measures of test usefulness. They
found that, contrary to its popularity, sensitivity analysis yielded relatively low benefit and high cost when compared to structure assessment, dimensional consistency, and boundary adequacy.
However, they acknowledged that their measures were somewhat subjective, and that the results
could vary considerably from model to model. More importantly, we fear that such an approach would induce a "check-list" type of behavior, leading practitioners to focus upon checking off as many boxes as possible with minimum effort. Instead, we would like to
arrive at an organizing framework that is theoretically sound and practically useful, yet explicitly
recognizes that what tests are done, and how much testing is done, is contingent upon the context
of the modeling effort. While the idea of using the portfolio of tests “intelligently” is obviously
not new, we find that none of the existing frameworks convey explicitly how to do this. Without
it, we are left with the quandary: “I cannot give you a recipe on how to test your model, but I’ll
tell you if your model is valid when I look at what you’ve done.”
Method
1. We established an ordinal scale, reflecting the degree of constraint imposed upon the modeling practice, upon which to order the components of modeling;
2. We used expert judgment to identify five components of modeling based upon observed practice. While the literature was informative, we based this step essentially upon the experience of modelers of different backgrounds working within our organization. The authors then articulated a summary description of the essence of each component, and decided the order of the components in the scale established in step 1;
3. We analyzed the content of tests for building confidence in SD models. Two formally trained SD modelers, one within our organization and one external to it, undertook the task of specifying 24 distinct tests, working from Forrester & Senge's 17 and Sterman's 12;
4. The same two modelers discussed and agreed upon how to cluster the 24 tests into the five components, and ranked them within each component, thus also determining which tests could be considered basic, intermediate, or advanced;
A different paradigm emerged, building upon criticism dating back to Hegel (1770-1831) and his coherence theory of truth, and to Dewey (1859-1952) and his pragmatism, which held that knowledge is socially justified belief rather than a product of mirroring nature, and is thus socially, culturally, and historically dependent. But it was only in the second half of the 20th century that this perspective flourished. In "The Structure of Scientific Revolutions" (1962), Kuhn argued that scientific progress is not directed toward an objective and absolute "truth" but simply toward "successful creative work." Also, Sellars (1963) proposed that knowledge acquisition is holistic rather than atomistic. These alternative perspectives led to an opposing philosophy of science denominated relativist/holistic (Barlas & Carpenter 1990, pp. 152, 155-157).
In their assessment, Barlas & Carpenter concluded that the field of system dynamics is more closely associated with the relativist/holistic viewpoint than with the logical/empiricist epistemological tradition. This is reflected both in the field's own stated positions and in the criticism leveled by others more closely associated with a traditionalist perspective. Forrester's early work (1961, 1971) embraced the relativist approach by (i) proposing that the validity of a model cannot be discussed absent its purpose, (ii) accepting "qualitative" model validation, and (iii) interpreting "data" in a broad sense, including non-numerical and verbal information and mental data bases. From an empiricist standpoint, his critics objected strenuously, quick to label the approach as subjective (Ansoff and Slevin 1968) and to dismiss it as "measurement without data" (Nordhaus 1973).
Barlas & Carpenter focused upon the core philosophy of the field, grounded in a feedback-rich, insight-driven, problem-focused approach to system dynamics modeling. They cite the original methodological texts and refer to classic models such as World Dynamics (Forrester 1961, 1971; Legasto et al. 1980; Randers 1980-A; Richardson and Pugh 1981). They concluded that, in principle and in practice, system dynamics is a scientific method according to the standards set by the relativist/holistic philosophy of science (p. 163):

Real-life experience has taught most system dynamics practitioners that models are inherently incomplete, relative, and partly subjective, and that model validity means usefulness with respect to a purpose. But, at the same time, many …
However, a recent survey of system dynamics practice (Lane 2001) demonstrated that the field is
not as uniform as the core philosophical description would characterize it. Rather, Lane would
denominate this characterization as “initial” or “broad” system dynamics, perhaps including the
most recent text embraced by the core of the field (Sterman 2000). But, SD practice has grown
in different directions. For example, “austere” SD places greater emphasis on deterministic,
positivist and objectivist approaches including (p. 106): 1
Lane’s research findings suggest that SD practice is not only growing towards the objective
(logical/empiricist) end as discussed above, but also towards the subjective (relativist/holistic)
viewpoint. In fact, there are more new areas of practice toward the relativist/holistic side. Some
of these new forms of SD practice may be so far off towards the relativist/holistic side as to
antagonize not only positivists but also mainstream system dynamicists.
This tension is clearly observed in the field’s main journal (e.g., Richardson et al. 1994). It is
often translated into a debate between systems “thinking” and system “dynamics” (soft and hard
approaches), or between “qualitative” and “quantitative” modeling. 2 This debate was recently
revisited in articles by Coyle (2000, 2001) and Homer & Oliva (2001). While there is much
agreement amongst practitioners sharing Forrester’s heritage, as indicated by Coyle (2001, p.
357):
1
Lane's analytical framework for mapping SD practice is not the same as Barlas & Carpenter's. Instead, he
used Burrell and Morgan’s (1979) sociological approach, which in one dimension contrasts subjective and objective
views of social science. However, these frameworks are epistemologically similar. They both describe a tension
between two polar points, one which is humanistic and the other which is positivistic (Lane 2001, p. 102). Burrell
and Morgan’s framework deals not only with epistemology, but also with ontology, hermeneutics, and methodology:
Epistemological issues concern the type of knowledge that can be obtained. The
positivist view is that causal laws perceivable by an objective observer may be deduced,
whilst the humanistic stance sees knowledge as being concerned with the significance
and meaning that humans ascribe to their actions, these being drawn out via the textual
interpretation… (Lane 2001, pp. 101-102)
2
While this debate has predominantly involved soft vs. hard, and qualitative vs. quantitative approaches, it also
appears in the discussion of academic vs. professional work (Graham 2002).
There seems also to be an agreement to disagree. One side argues that “simulation [a more objective
approach to practice] nearly always adds value, even in the face of significant uncertainties about
data and the formulation of soft variables” (Homer & Oliva 2001, p. 347); while the other insists
that “quantification may not ‘represent value for money’… even more concerning is… the risks
associated with attempting to quantify multiple and poorly understood soft relationships are
likely to outweigh whatever potential benefit there might be.” (Coyle 2001, p. 357) This may
indicate that there is a range of "accepted" modeling practice, even if not everyone agrees on what those practices are.
While Barlas & Carpenter (1990) argue that the core practice of system dynamics best fits the relativist/holistic philosophy of scientific knowledge, other authors describe a breadth of modeling practice sufficiently large that system dynamics modeling could also be considered to fit into the logical/empiricist tradition. Model testing should, therefore, be able to accommodate a range of beliefs about what constitutes a correct model.
Components of Modeling
We chose five components of models or modeling projects that can be combined to represent the
full range of modeling approaches and objectives: 1) system’s mapping, 2) quantitative
modeling, 3) hypothesis testing, 4) uncertainty analysis, and 5) forecasting/optimization. This is
not to say that these components exist in isolation, although in some cases that may be possible. In general, any modeling effort will consist of a combination of these components centered upon a particular objective. Table 1 contains a summary of the characterizations that follow:
3
To map these categories, we used an axis of Liberty vs. Constraint, which can be perceived as a common factor
extracted from five dimensions: a) adoption of a divide and conquer analytical strategy (reductionism), whereby a
complex set of facts, entities, phenomena, or structures is explained using a simpler set, such that when assembled,
the small pieces explain the whole; b) knowledge is derived from reasoning (rationalism); c) knowledge is derived
from observation and experience (objectivism/empiricism); d) reasoning from the general to the particular,
combining thesis and antithesis in a dialectical process to produce a higher level of truth (synthesism); and e) amount
of agreement upon problem definition and modeling objectives (monolithism). The first three constraining factors
were derived from Barlas & Carpenter (1990), and Lane (2001). The last two were borrowed from Zagonel’s (2002)
distinction of models as "boundary objects" and "micro worlds." The specified "bins" are based upon a classification of observed modeling approaches and practices. The five categories fall in this ordinal scale
according to a subjective assessment of the degree of constraint imposed by a combination of the above-mentioned
factors.
1. System's mapping
This area of practice is qualitative and inductive. 4 It involves drawing influence diagrams,
causal-loop diagrams (CLDs), and stock-and-flow (S&F) diagrams, or any other form of
mapping or organization of the elements forming a system. Normally the map is focused upon
an overriding typology or theme.
Systems mapping in and of itself can be very useful. It serves as a visual summary of a lengthier verbal or written discussion. It organizes information. If built collectively, it reflects a shared perspective, or a sum of perspectives, on the issue at hand. If the framework is rich enough, it may yield preliminary dynamic insights. For example, a stock-and-flow diagram helps in understanding points of accumulation and intervention. Causal-loop diagrams, in turn, begin to explore reinforcing and balancing feedback. Delays can also be graphically displayed. Maps facilitate the surfacing and clarification of assumptions, and thus can help with communication.
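To make this concrete, the sketch below renders in executable form the structure that a simple stock-and-flow diagram conveys: a hypothetical inventory stock that accumulates the difference between production and shipments, with a goal-seeking production rule closing a balancing feedback loop. The model, parameter values, and Python rendering are illustrative assumptions of ours, not part of any model discussed in this paper.

```python
# Minimal stock-and-flow sketch (hypothetical inventory example).
# The stock (inventory) accumulates the difference between its inflow
# (production) and its outflow (shipments); the production decision closes
# a balancing feedback loop that seeks the desired inventory level.

DT = 0.25                  # time step (weeks)
HORIZON = 40               # simulation length (weeks)
DESIRED_INVENTORY = 100.0  # goal of the balancing loop
ADJUSTMENT_TIME = 4.0      # how quickly the gap is closed (weeks)
SHIPMENTS = 10.0           # constant outflow, for illustration only

inventory = 50.0           # initial value of the stock
time, trajectory = 0.0, []
while time <= HORIZON:
    trajectory.append((time, inventory))
    production = SHIPMENTS + (DESIRED_INVENTORY - inventory) / ADJUSTMENT_TIME
    inventory += DT * (production - SHIPMENTS)   # the stock integrates its net flow
    time += DT

print(trajectory[-1])  # the inventory approaches the desired level (goal seeking)
```

Even at this level of detail, the accumulation (the stock) and the balancing loop (the gap-closing production rule) are visible; quantification then adds the numerical trajectories.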
System's mapping has always been a part of system dynamics. From the early days of DYNAMO, S&F diagrams and CLDs were used in model conceptualization, and to communicate diagrammatically the structure of the computational model. Richmond et al. (1987) and Richardson (1997) articulated "principles" behind such diagrams (unit consistency, accumulation, causality, etc.). Morecroft (1982) contributed the policy structure diagram, and Mashayekhi (citation needed) the sector diagram, implemented in iThink™ for the purpose of higher-level mapping (Richmond 1994).
We believe there are numerous examples of modeling practice that focus more directly upon this
facet of modeling. We already mentioned Coyle’s (2000, 2001) regard for this component of
practice. Some new mapping approaches were developed outside of the field (Checkland 1981,
Hodgson 1994, Eden 1994, Ackermann et al. 2004). They are increasingly being incorporated into areas of practice focused upon the other components (Vennix et al. 1990, Lane 1993, Andersen
& Richardson 1997).
2. Quantitative modeling
Quantitative modeling involves formulation and simulation. We differentiate this category from
the next, hypothesis testing, because here the formulation effort is still (counter-intuitively) …
4
The SD method includes both inductive and deductive logic. Beveridge describes these two systems in the
following manner:
Logicians distinguish between inductive reasoning (from particular instances to general
principles, from facts to theories) and deductive reasoning (from the general to the
particular, applying theory to a particular case). In induction one starts from observed
data and develops a generalization that explains the relationships between the objects
observed. On the other hand, in deductive reasoning one starts from some general law
and applies it to a particular instance. (In Babbie 1992, p. 49)
Normally the qualitative steps of the SD method emphasize the induction process of model building. This is a
theory building process; the conceptual model is a representation of a theory regarding the causal relationships in the
system. Alternatively, the deduction process is captured in the quantitative steps. This is a theory testing process;
the formulated model is simulated, tested and evaluated in light of the hypothesized expectations, and tested against
“known” aspects of the “real” system.
Warren's approach (2002, 2004), as well as the gist of group model building work (Richardson & Andersen 1995, Vennix 1996), appears to be centered in this modeling component. These approaches are more akin to hypothesis generating than to hypothesis testing. But, in our view, they serve to bridge an important gap between system's mapping and hypothesis testing, often stepping into both of these other components simultaneously. As stated by Warren (2005):
3. Hypothesis testing
"Initial" and "broad" system dynamics, as described in Lane (2001), focuses upon this category of practice, drawing upon the others to the extent that they may be helpful to build, test, and evaluate a model and the insights derived from the work. Although a system's map or a formulated model may be used, the goal and product are not the map or the model, but the feedback-rich insights that result from understanding dynamic complexity in the system under study. Many of the well-known works in the field of SD emphasize this component of modeling, which is central to system dynamics, offering applied contributions (e.g., Ford 1990, Homer 1992, Repenning 2001) and educating on its philosophy and method (e.g., Richardson & Pugh 1981, Sterman 2000).
5
Of course, the hypotheses evolve as things become clearer during the process of model building and testing. It can
be said that the whole process is geared towards formulating a more educated hypothesis, akin to producing a more
relevant problem statement or asking a better research question.
4. Uncertainty analysis
Other forms of sensitivity are quantitative and policy sensitivity. In system dynamics,
quantitative sensitivity is rarely of concern, unless prediction or forecasting is involved, as
described in the next category. In general, system dynamics is advocated for its explanatory
power and as a learning instrument, rather than a predictive or forecasting tool. Still, tests for
assessment of quantitative sensitivity do have the potential to identify areas for improvement in
the modeling work, whether in terms of parameterization, level of aggregation or model
boundary. More often, discrepancies are explained away as due to model boundary decisions that treated variables exogenously, disregarded seasonality in the behaviors, or assumed away stochastic components in the system. Of course, these explanations are ultimately dependent upon the problem definition and upon the insights and recommendations sought. Policy sensitivity exists when recommendations for system improvement do not hold across the range over which parameters may vary.
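As a minimal illustration of policy sensitivity, the sketch below samples an uncertain parameter of a hypothetical inventory/supply-line model over an assumed plausible range and records which of two candidate policies performs best in each sample. The model, parameter ranges, and cost measure are illustrative assumptions of ours, not prescriptions.

```python
import random

def simulate(adjustment_time, completion_delay, dt=0.25, horizon=60.0,
             shipments=10.0, desired=100.0):
    """Simulate a hypothetical inventory/supply-line model (Euler integration)
    and return the time-integrated squared gap between inventory and its goal."""
    inventory, supply_line = 50.0, shipments * completion_delay
    cost, t = 0.0, 0.0
    while t <= horizon:
        completions = supply_line / completion_delay
        orders = shipments + (desired - inventory) / adjustment_time
        cost += dt * (desired - inventory) ** 2
        supply_line += dt * (orders - completions)
        inventory += dt * (completions - shipments)
        t += dt
    return cost

# Two candidate policies: aggressive vs. gentle inventory correction.
POLICIES = {"aggressive (2 wk)": 2.0, "gentle (8 wk)": 8.0}

random.seed(7)
wins = {name: 0 for name in POLICIES}
for _ in range(500):
    delay = random.uniform(1.0, 10.0)   # uncertain parameter, assumed range
    costs = {name: simulate(adj, delay) for name, adj in POLICIES.items()}
    wins[min(costs, key=costs.get)] += 1  # which policy performs best in this sample?

# If one policy wins in every sample, the recommendation is insensitive to the
# uncertain delay; if the wins are split, the recommendation is policy sensitive.
print(wins)
```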
5. Forecasting/optimization
This last category has to do with predicting future patterns of behavior, changes in those patterns,
and event prediction. Also, it includes research questions that are aimed at finding optimal or
robust solutions. 6 As already stated, this falls outside of the main concern in system dynamics
with learning, understanding, and explaining. Nevertheless, this is an area of practice that is
widely used in other modeling approaches, and it has been applied using system dynamics as
well (Coyle 1985, Moxnes et al. 2001, Graham & Ariza 2003). State-of-the-art software offers capability for both sensitivity analysis and optimization (Eberlein & Peterson 1992). Forecasting …
6
Recently, practitioners focusing upon this component of modeling are looking to find “robust” solutions/policies
rather than “optimal” ones. Robust solutions are preferred when the time horizon is particularly long, when
situations may change significantly over time, and flexibility is needed to adjust to a changing environment, or
where there is deep uncertainty. (citations needed)
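To illustrate the difference in spirit between seeking an optimal policy and seeking a robust one, the sketch below evaluates a handful of candidate policy parameters for the hypothetical inventory/supply-line model used earlier, choosing one policy that is best under a single assumed scenario and another that minimizes the worst-case cost across several scenarios. The candidate values, scenarios, and cost measure are illustrative assumptions of ours.

```python
# Grid search for an "optimal" vs. a "robust" policy parameter in a
# hypothetical inventory/supply-line model. Parameter values, scenarios,
# and the cost measure are assumptions made for illustration only.

def cost(adjustment_time, delay, dt=0.25, horizon=60.0, goal=100.0, ship=10.0):
    """Time-integrated squared gap between inventory and its goal."""
    inv, sl, acc, t = 50.0, ship * delay, 0.0, 0.0
    while t <= horizon:
        completions = sl / delay
        orders = ship + (goal - inv) / adjustment_time
        acc += dt * (goal - inv) ** 2
        sl += dt * (orders - completions)
        inv += dt * (completions - ship)
        t += dt
    return acc

candidate_policies = [1.0, 2.0, 4.0, 8.0]   # inventory adjustment times (weeks)
uncertain_delays = [1.0, 2.0, 4.0, 8.0]     # scenarios for the unknown delay

# "Optimal" policy under a single assumed scenario (delay of 2 weeks):
optimal = min(candidate_policies, key=lambda p: cost(p, delay=2.0))
# "Robust" policy across all scenarios (minimize the worst-case cost):
robust = min(candidate_policies,
             key=lambda p: max(cost(p, d) for d in uncertain_delays))

print(f"best for delay = 2 wk: {optimal}; robust across scenarios: {robust}")
```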
As a starting point, we aimed to classify the traditional tests discussed by Forrester & Senge and
Sterman into one (and only one) of the components outlined above, as if there were some sort of
orderly process to move from one component to another. This limitation is addressed in the
discussion. We also refrained from describing the tests in detail. 7 Table 2 serves as a summary
display of the results. The brief descriptions below were extracted from Sterman (2000), Table
21-4, pp. 859-861. Several of Sterman's original tests were broken down into components to accommodate the framework.
SYSTEM'S MAPPING
1. Face validity (structural assessment through deductive process)
2. Validity of decision rules (structural focus)
QUANTITATIVE MODELING
3. Physical conservation – Q: Does the model conform to basic physical laws such as
conservation laws?
4. Dimensional consistency – Q: Is each equation dimensionally consistent without the use
of parameters having no real world meaning?
5. Integration error – Q: Are the results sensitive to the choice of time step or numerical integration method? (A sketch automating this check, together with the extreme conditions test 16, appears after this list.)
6. Extreme conditions tests (equations focus) – Q: Does each equation make sense even
when its inputs take on extreme conditions?
7. Parameter assessment – Q: Do all parameters have real world counterparts? Are they
consistent with relevant descriptive and numerical knowledge of the system?
8. Basic-behaviors reproduction – Q: Does the model generate the various modes of
behavior observed in the system?
9. Endogenous behavior-reproduction tests – Q: Does the model pass behavioral
reproduction tests without the aid of exogenous inputs driving the model in
predetermined ways?
10. Boundary adequacy tests (modes of behavior) – Q: Does the behavior of the model
change significantly when boundary assumptions are relaxed?
7
More information is available in both Sterman, pp. 861-889, and Forrester & Senge (1980), pp. 212-226.
11. Qualitative problem-behavior test – Q: Does the model qualitatively reproduce the
behavior(s) of interest in the system?
12. Boundary adequacy test (problem endogeneity) – Q: Are the important concepts for
addressing the problem endogenous to the model?
13. Validity of decision rules (policy focus) – Q: Do the decision rules capture the behaviors
of the actors in the system? (policy focus)
14. Assessment of surprise behaviors – Inspection for unusual, novel, unexpected or surprise
behaviors. Q: Does the model generate previously unobserved or unrecognized behavior?
Does the model successfully anticipate the response of the system to novel conditions?
15. Behavior sensitivity analysis – Q: Do the modes of behavior generated by the model
change significantly when assumptions about parameters, boundary, and aggregation are
varied over the plausible range of uncertainty?
16. Extreme conditions tests (model behaviors focus) – Q: Does the model respond plausibly
when subjected to extreme policies, shocks, and parameters?
17. Behavior anomaly tests (changed assumptions tests) – Q: Do anomalous behaviors result
when assumptions of the model are changed or deleted?
18. Family member (generalizability) – Ability to generalize. Q: Can the model generate the
behavior observed in other instances of the same system?
UNCERTAINTY ANALYSIS
19. Quantitative sensitivity analysis – Q: Do the numerical values change significantly when
assumptions about parameters, boundary, and aggregation are varied over the plausible
range of uncertainty?
20. Policy sensitivity analysis – Q: Do the policy implications change significantly when
assumptions about parameters are varied over the plausible range of uncertainty? Is the
level of aggregation appropriate?
21. Boundary adequacy tests (policy implications) – Q: Do the policy recommendations
change when the model boundary is extended?
22. Behavior correspondence – Q: Does the model quantitatively reproduce the behavior(s)
of interest in the system?
23. Behavior prediction – Pattern prediction, event prediction, shifting-mode prediction
24. Changed-behavior prediction (prior to worry about number forecast; behavioral forecast)
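Some of the tests listed above are mechanical enough to automate. The sketch below illustrates two of them on a hypothetical one-stock model: the integration error test (test 5), by halving the time step and comparing end states, and an extreme conditions test (test 16), by verifying that a physical stock never goes negative under an extreme demand shock. The model, inputs, and tolerances are illustrative assumptions of ours, not recommended standards.

```python
# Automated versions of two tests from the list above, applied to a
# hypothetical one-stock inventory model. Thresholds and inputs are
# illustrative assumptions only.

def simulate(dt, demand, horizon=40.0, goal=100.0, adj_time=4.0, expected=10.0):
    """Euler-integrate a single inventory stock; return its trajectory."""
    inv, t, traj = 50.0, 0.0, []
    while t <= horizon:
        production = max(0.0, expected + (goal - inv) / adj_time)  # non-negative flow
        shipments = min(demand, inv / dt)     # cannot ship more than is on hand
        inv += dt * (production - shipments)
        traj.append(inv)
        t += dt
    return traj

# Test 5 - integration error: halve the time step and compare end states.
coarse, fine = simulate(dt=0.5, demand=10.0), simulate(dt=0.25, demand=10.0)
assert abs(coarse[-1] - fine[-1]) < 1.0, "results sensitive to the time step"

# Test 16 - extreme conditions: an extreme demand shock must not drive the
# physical stock negative.
stressed = simulate(dt=0.25, demand=1e6)
assert min(stressed) >= 0.0, "inventory went negative under an extreme shock"

print("integration-error and extreme-conditions checks passed")
```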
We deliberately left out, for now, tests of system improvement. Some novel tests or approaches
are listed in the discussion. The above list and Table 2 are not intended as final, but rather as a
rough cut at implementing our proposed classification scheme.
Table 2. Clustering of the 24 tests into the five components (system's mapping, quantitative modeling, hypothesis testing, uncertainty analysis, and forecasting & optimization), ranked within each component as basic, intermediate, or advanced, with their correspondence to the source tests:

1 - Face validity (structural assessment through deductive process) -- S #2a; F&S Str #1a
2 - Validity of decision rules (structural focus) -- S #2b; F&S Str #1b
3 - Physical conservation -- S #2c; F&S Str #1c
4 - Dimensional consistency -- S #3; F&S Str #5
5 - Integration error -- S #6
6 - Extreme conditions tests (equations focus) -- S #5a; F&S Str #3
7 - Parameter assessment -- S #4; F&S Str #2
8 - Basic-behaviors reproduction -- S #7a; F&S Beh #1a
9 - Endogenous behavior-reproduction tests -- S #7b; F&S Beh #1b
10 - Boundary adequacy tests (modes of behavior) -- S #1a; F&S Beh #7
11 - Qualitative problem-behavior test -- S #7c; F&S Beh #1c
12 - Boundary adequacy (problem endogeneity) -- S #1b; F&S Str #4
13 - Validity of decision rules (policy focus) -- S #2d; F&S Str #1d
14 - Assessment of surprise behaviors -- S #10; F&S Beh #5
15 - Behavior sensitivity analysis -- S #11a; F&S Beh #8
16 - Extreme condition tests (model behaviors focus) -- S #5b; F&S Beh #6
17 - Behavior anomaly tests (changed assumptions tests) -- S #8; F&S Beh #3
18 - Family member (generalizability) -- S #9; F&S Beh #4
19 - Quantitative sensitivity analysis -- S #11b; F&S Beh #8
20 - Policy sensitivity analysis -- S #s 1+11c; F&S Pol #4
21 - Boundary adequacy (policy implications) -- S #1c; F&S Pol #3
22 - Behavior correspondence -- S #7d; F&S Beh #1d
23 - Behavior prediction -- F&S Beh #2
24 - Changed-behavior prediction -- F&S Pol #2

S - Sterman (2000); F&S - Forrester and Senge (1980); Str - Structure; Beh - Behavior; Pol - Policy implications
Discussion
Preliminary reviews of our framework were favorable with respect to our purpose. The specific components of modeling we used, however, did not resonate well with all audiences. Specifically, the
quantitative modeling and forecasting/optimization categories of practice appear to be
particularly troublesome to modelers who embrace a traditionalist view of SD. The reviewers also questioned our decision to cluster each test into a single category and proposed changes, discussed below. They further suggested that we go beyond the two sources studied and include novel tests and approaches. We address these latter two points below.
Our initial attempt to cluster the tests into the components may have produced an artificial result, in which it might be interpreted that the components exist in isolation. While this may sometimes be the case, more often than not a modeling effort or project will encompass more than one component. This raises the question of whether we should (1) discuss how one might move from one component to another, with many possibilities for beginning and end points and steps along the way, or (2) revisit the results to portray the relevance of each test vis-à-vis all of the components.
Under the first option, for example, we could describe a modeling effort driven by a dynamic hypothesis that moves into the mapping, quantitative, and uncertainty analysis components in a particular order, or via iterations in the modeling process, but is centered in the hypothesis testing component, as described by Sterman (2000), Figures 3-1 and 3-2 (pages 87-105), or Richardson and Pugh (1981), Figure 1.11 (pages 15-17), among others. Alternative approaches might deemphasize hypothesis testing, focusing primarily upon the system's mapping component (Powell & Coyle 2002) or the quantitative modeling component (Richardson et al. 2004), to provide a couple of examples.
An alternative path might be to attempt to create a matrix of tests and components that shows the
relative relevance of each test for each component. This solution would have to propose how
important and useful each of the tests is for each component. Thus, if the overall purpose of the modeling effort is forecasting, for example, how important is testing for face validity, compared with its relevance in a modeling effort that is primarily devoted to mapping a system and showing interdependencies amongst key stakeholders?
From our organizational perspective and need, because most of our projects follow a particular
path, the first alternative seems more attractive. However, for the breadth of practice in the field,
the second alternative may prove more useful. We welcome other modelers' feedback and reactions to our framework, preliminary clustering, and alternative future paths, as they consider how helpful this framework could be in their daily practice.
A number of new tests exist that are not included in the early literature (Forrester & Senge 1980) or in the texts (Sterman 2000), and they would be worth including in future iterations of this framework. The list below is illustrative rather than comprehensive. We try to provide at least one example of a new development for each of the components:
• Soft, qualitative deductive procedures used to validate system’s maps (e.g. Vennix 1990,
Luna-Reyes & Andersen 2003)
Group model building raises additional issues with regard to testing and confidence. For some models, correctness can be defined quantitatively; for others, correctness might mean including all stakeholder viewpoints.
Vensim® has built-in features that automate the checking of model conformance against statements of "truth" (Peterson & Eberlein 1994). We think that embedding these solidly held beliefs about the nature of reality in the model is particularly helpful in validating the quantitative model.
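The sketch below conveys the idea in a tool-neutral way: each statement of "truth" is paired with a test input and a constraint that the simulated behavior must satisfy, and the model fails if any constraint is violated. The model and the statements are hypothetical examples of ours; the code does not reproduce Vensim's Reality Check syntax or features.

```python
# A tool-neutral sketch of "reality check"-style testing: each statement of
# truth pairs a test input with a constraint on the simulated behavior.
# The model and the statements below are hypothetical examples only.

def simulate(demand, dt=0.25, horizon=40.0, goal=100.0, adj_time=4.0, expected=10.0):
    inv, t, traj = 50.0, 0.0, []
    while t <= horizon:
        production = max(0.0, expected + (goal - inv) / adj_time)
        shipments = min(demand, inv / dt)
        inv += dt * (production - shipments)
        traj.append(inv)
        t += dt
    return traj

REALITY_CHECKS = [
    # (description, test input, constraint on the resulting trajectory)
    ("the stock is never negative under any demand",
     {"demand": 1e6}, lambda traj: min(traj) >= 0.0),
    ("with zero demand, inventory does not fall",
     {"demand": 0.0}, lambda traj: all(b >= a - 1e-9 for a, b in zip(traj, traj[1:]))),
]

for description, test_input, constraint in REALITY_CHECKS:
    assert constraint(simulate(**test_input)), f"violated: {description}"
print(f"{len(REALITY_CHECKS)} reality-check statements satisfied")
```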
Using the pathway participation metric (PPM), Digest detects and displays which feedback loops are most influential in explaining patterns of behavior in a model. This and other promising approaches (based upon eigenvalue analysis) can help test a modeler's experiential understanding of the link between model structure and behavior.
• New methods designed to conduct sensitivity analysis (e.g. Ford & Flynn 2005)
In the latest issue of the System Dynamics Review, a statistical screening approach is proposed to
learn which of the many uncertain inputs to a model stand out as most influential.
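In the spirit of that approach, the sketch below samples the uncertain inputs of a hypothetical model, simulates each sample, and ranks the inputs by the strength of their correlation with an output of interest. The model, the input ranges, and the choice of output measure are illustrative assumptions of ours; Ford & Flynn (2005) should be consulted for the method itself.

```python
# A minimal sketch of statistical screening: sample the uncertain inputs,
# simulate, and rank inputs by correlation with the output of interest.
# The model, ranges, and output measure are illustrative assumptions.
import random
import statistics

def simulate(adj_time, delay, demand, dt=0.25, horizon=60.0, goal=100.0):
    inv, sl, t = 50.0, demand * delay, 0.0
    while t <= horizon:
        completions = sl / delay
        orders = demand + (goal - inv) / adj_time
        sl += dt * (orders - completions)
        inv += dt * (completions - demand)
        t += dt
    return inv                          # output of interest: end-of-run inventory

random.seed(11)
samples = []
for _ in range(300):
    inputs = {"adj_time": random.uniform(1.0, 8.0),
              "delay": random.uniform(1.0, 8.0),
              "demand": random.uniform(5.0, 15.0)}
    samples.append((inputs, simulate(**inputs)))

outputs = [y for _, y in samples]
for name in ("adj_time", "delay", "demand"):
    xs = [inputs[name] for inputs, _ in samples]
    r = statistics.correlation(xs, outputs)   # Pearson r (Python 3.10+)
    print(f"{name:10s} correlation with end-of-run inventory: {r:+.2f}")
```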
Model structure analysis through graph theory uses partition heuristics and feedback structure decomposition to increase model confidence through careful calibration (Oliva 2003, 2004). This development may be viewed as a stepping stone toward testing dynamic hypotheses. But, in and of itself, it delineates a path toward automation of the calibration process, which is crucial to forecasting and optimization.
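Setting aside the graph-theoretic decomposition, which we do not attempt to reproduce here, the calibration step itself can be sketched as a search for the parameter values that minimize the discrepancy between simulated and reference behavior. The one-stock model, the synthetic "data", and the grid search below are illustrative assumptions of ours.

```python
# A minimal sketch of automated calibration: search for the parameter value
# that minimizes the squared error between the simulated trajectory and a
# reference series. Model and synthetic "data" are illustrative assumptions;
# partition/decomposition heuristics are not reproduced here.

def simulate(adj_time, dt=0.5, horizon=20.0, goal=100.0, demand=10.0):
    inv, t, traj = 50.0, 0.0, []
    while t <= horizon:
        production = demand + (goal - inv) / adj_time
        inv += dt * (production - demand)
        traj.append(inv)
        t += dt
    return traj

# Synthetic reference data generated with a "true" adjustment time of 3.0,
# standing in for observed behavior of the real system.
reference = simulate(adj_time=3.0)

def sse(adj_time):
    return sum((s - r) ** 2 for s, r in zip(simulate(adj_time), reference))

# Simple grid search over a plausible range of the uncertain parameter.
candidates = [1.0 + 0.1 * i for i in range(70)]        # 1.0 .. 7.9
best = min(candidates, key=sse)
print(f"calibrated adjustment time: {best:.1f} (squared error {sse(best):.3f})")
```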
Epilogue
We believe this framework to be theoretically sound and practically useful to guide model testing
in different phases of model development, and to conduct partial assessments of levels of
confidence. It can help negotiate project deliverables, reconciling client and modeler …
Acknowledgements
The authors acknowledge David Andersen, Steve Conrad, Sharon Deland, Andy Ford,
Mohammad Mojtahedzadeh, George Richardson, and Silvia Ulli-Beer for their contributions to
this research effort.
References
Ackermann F, S Howick and DF Andersen. 2004. Stirling revisited: Practical approaches to
merging two systems thinking streams. Proceedings of the 22nd International Conference
of the System Dynamics Society. Oxford, England (July 25-29).
Andersen DF and GP Richardson. 1997. Scripts for group model building. System Dynamics
Review 13(2): 107-129.
Ariza CA and AK Graham. 2002. Quick and rigorous, strategic and participative: 12 ways to
improve on the expected tradeoffs. Proceedings of the 20th International Conference of
the System Dynamics Society. Palermo, Italy (July 28-August 1).
Babbie E. 1992. The Practice of Social Research. Belmont, CA: Wadsworth Publishing
Company.
Barlas Y. 1989. Multiple tests for validation of system dynamics type of simulation models. European Journal of Operational Research 42(1): 59-87.
Barlas Y. 1996. Formal aspects of model validity and validation in system dynamics. System Dynamics Review 12(3): 183-210.
Barlas Y and S Carpenter. 1990. Philosophical roots of model validation: Two paradigms.
System Dynamics Review 6(2): 148-166.
Checkland P. 1981. Systems Thinking, Systems Practice. Chichester, England: John Wiley &
Sons.
Clemson B., Y Tang, J Pyne and R Unal. 1995. Efficient methods for sensitivity analysis.
System Dynamics Review 11(1): 31-49.
Coyle RG. 1985. The use of optimization methods for policy design in a system dynamics model. System Dynamics Review 1(1): 81-91.
Coyle RG. 2000. Qualitative and quantitative modeling in system dynamics: Some research
questions. System Dynamics Review 16(3): 225-244.
Coyle RG. 2001. Rejoinder to Homer and Oliva. System Dynamics Review 17(4): 357-363.
Eden C. 1994. Cognitive mapping and problem structuring for system dynamics model
building. System Dynamics Review 10(2-3): 257-276.
Ford A. 1990. Estimating the impact of efficiency standards on uncertainty of the northwest
electrical system. Operations Research (July-August)
Ford A and H Flynn. 2005. Statistical screening of system dynamics models. System Dynamics
Review 21(4): 273-303.
Forrester JW and PM Senge. 1980. Tests for building confidence in system dynamics models.
In AA Legasto Jr, JW Forrester and JM Lyneis (eds.). System Dynamics. TIMS Studies in
the Management Sciences 14. New York: North-Holland: 209-228.
Hodgson AM. 1994. Hexagons for systems thinking. In JDW Morecroft and JD Sterman (eds.).
Modeling for Learning Organizations. Portland, OR: Productivity Press: 359-374.
Homer JB. 1992. A system dynamics model of national cocaine prevalence. System Dynamics
Review 9(1): 49-78.
Homer J and R Oliva. 2001. Maps and models in system dynamics: A response to Coyle.
System Dynamics Review 17(4): 347-355.
Kuhn T. 1962. The Structure of Scientific Revolutions. Chicago: University of Chicago Press.
Lane DC. 1993. The road not taken: Observing a process of issue selection and model conceptualization. System Dynamics Review 9(3): 239-264.
Lane DC. 2001. Rerum cognoscere causas: Part I – How do the ideas of system dynamics relate
to traditional social theories and the voluntarism/determinism debate? System Dynamics
Review 17(2): 97-118.
Legasto Jr AA, JW Forrester and JM Lyneis (eds.). 1980. System Dynamics. TIMS Studies in
the Management Sciences 14. New York: North-Holland.
Luna-Reyes L and DL Andersen. 2003. Collecting and analyzing qualitative data for system
dynamics: methods and models. System Dynamics Review 19(4): 271-296.
Mass NJ and PM Senge. 1980. Alternative tests for selecting model variables. In J Randers
(ed.). 1980. Elements of the System Dynamics Method. Cambridge, MA: Productivity
Press: 205-225.
Morecroft J. 1982. A critical review of diagramming tools for conceptualizing feedback system
models. Dynamica 8(1): 20-29.
Moxnes E. 2005. Policy sensitivity analysis: Simple versus complex fishery models. System
Dynamics Review 21(2): 123-145.
Moxnes E, O Danell, E Gaare and J Kumpula. 2001. Optimal strategies for the use of reindeer
rangelands. Ecological Modelling 145(2-3): 225-241.
Oliva R. 2003. Model calibration as a testing strategy for system dynamics models. European
Journal of Operational Research 151(3): 525-568.
Oliva R. 2004. Model structure analysis through graph theory: Partition heuristics and feedback
structure decomposition. System Dynamics Review 20(4): 313-336.
Peterson DW. 1980. Statistical tools for system dynamics. In J Randers (ed.). 1980. Elements
of the System Dynamics Method. Cambridge, MA: Productivity Press: 226-242.
Peterson DW and RL Eberlein. 1994. Reality check: A bridge between systems thinking and
system dynamics. System Dynamics Review 10(2-3): 159-174.
Powell JH and RG Coyle. 2002. Setting strategic agendas: The use of qualitative methods in
highly politicized contexts. Proceedings of the 20th International Conference of the
System Dynamics Society. Palermo, Italy (July 28-August 1).
Randers J (Ed.). 1980-A. Elements of the System Dynamics Method. Cambridge, MA:
Productivity Press.
Randers J. 1980-B. Guidelines for model conceptualization. In J Randers (ed.). Elements of the
System Dynamics Method. Cambridge, MA: Productivity Press: 117-139.
Repenning NP. 2001. Understanding fire fighting in new product development. Journal of
Product Innovation Management 18(5): 285-300.
Richardson GP. 1997. Problems in causal loop diagrams revisited. System Dynamics Review
13(3): 247-252.
Richardson GP and DF Andersen. 1995. Teamwork in group model building. System Dynamics
Review 11(2): 113-137.
Richardson GP, DF Andersen and L Luna-Reyes. 2004. Joining minds: Group modeling to link
people, process, analysis and policy design. 26th Annual Association for Public Policy
Analysis and Management Research Conference. Atlanta, GA (October 28-30).
Richardson GP and AL Pugh III. 1981. Introduction to System Dynamics Modeling with
DYNAMO. Cambridge, MA: Productivity Press.
Richardson GP, EF Wolstenholme and JDW Morecroft (eds.). 1994. Systems Thinkers, Systems
Thinking. System Dynamics Review 10(2-3).
Sellars W. 1963. Empiricism and the philosophy of mind. In Science, Perception and Reality.
New York: Humanities Press.
Sterman JD. 2000. Business Dynamics: Systems Thinking and Modeling for a Complex World.
Boston, MA: The McGraw-Hill Companies.
Vennix JAM. 1996. Group Model Building: Facilitating Team Learning Using System
Dynamics. London: John Wiley & Sons.
Vennix JAM, JW Gubbels, D Post and HJ Poppen. 1990. A structured approach to knowledge
elicitation in conceptual model building. System Dynamics Review 6(2): 194-208.
Wakeland W and M Hoarfrost. 2005. The case for thoroughly testing complex system dynamics
models. Proceedings of the 23rd International Conference of the System Dynamics
Society. Boston (July 17-21).
Warren K. 2004. Why has feedback systems thinking struggled to influence strategy and policy
formulation? Systems Research and Behavioral Science 21(4): 331-347.
Warren K. 2005. Improving strategic management with the fundamental principles of system
dynamics. System Dynamics Review 21(4): 329-350.
Wolstenholme EF. 1994. A systematic approach to model creation. In JDW Morecroft and JD
Sterman (eds.). Modeling for Learning Organizations. Portland, Oregon: Productivity
Press: 175-194.
Zagonel AA. 2002. Model conceptualization in Group Model Building: A review of the
literature exploring the tension between representing reality and negotiating a social
order. Proceedings of the 20th International Conference of the System Dynamics Society.
Palermo, Italy (July 28-August 1).