Guidelines For Performing Systematic Literature Reviews in Software Engineering
Source: „Guidelines for performing Systematic Literature Reviews in SE“, Kitchenham et al., 2007
0.4 Executive Summary
The objective of this report is to propose comprehensive guidelines for systematic
literature reviews appropriate for software engineering researchers, including PhD
students. A systematic literature review is a means of evaluating and interpreting all
available research relevant to a particular research question, topic area, or
phenomenon of interest. Systematic reviews aim to present a fair evaluation of a
research topic by using a trustworthy, rigorous, and auditable methodology.
The guidelines presented in this report were derived from three existing guidelines
used by medical researchers, two books produced by researchers with social science
backgrounds and discussions with researchers from other disciplines who are involved
in evidence-based practice. The guidelines have been adapted to reflect the specific
problems of software engineering research.
The guidelines cover three phases of a systematic literature review: planning the
review, conducting the review and reporting the review. They provide a relatively
high level description. They do not consider the impact of the research questions on
the review procedures, nor do they specify in detail the mechanisms needed to
perform meta-analysis.
0.5 Glossary
Examples of SLRs
R.F. Barcelos, G.H. Travassos, Evaluation approaches for software architectural documents: a
systematic review, in: Ibero-American Workshop on Requirements Engineering and
Software Environments (IDEAS), La Plata, Argentina, 2006.
T. Dyba, V.B. Kampenes, D.I.K. Sjøberg, A systematic review of statistical power in software
engineering experiments, Information and Software Technology 48 (8) (2006) 745–755.
D. Galin, M. Avrahami, Do SQA programs work – CMM works. A meta analysis, IEEE
International Conference on Software – Science, Technology and Engineering (2005).
D. Galin, M. Avrahami, Are CMM program investments beneficial? Analyzing past studies,
IEEE Software 23 (6) (2006) 81–87.
J.E. Hannay, D.I.K. Sjøberg, T. Dybå, A systematic review of theory use in software
engineering experiments, IEEE Transactions on SE 33 (2) (2007) 87– 107.
M. Jørgensen, Estimation of software development work effort: evidence on expert judgement
and formal models, International Journal of Forecasting 3 (3) (2007) 449–462.
N. Juristo, A.M. Moreno, S. Vegas, Reviewing 25 years of testing technique experiments,
Empirical Software Engineering Journal (1–2) (2004) 7–44.
Secondary study. A study that reviews all the primary studies relating to a specific
research question with the aim of integrating/synthesising evidence related to a
specific research question.
Glossary 2/2
Systematic literature review (also referred to as a systematic review). A form of
secondary study that uses a well-defined methodology to identify, analyse and
interpret all available evidence related to a specific research question in a way that is
unbiased and (to a degree) repeatable.
Reasons for performing a SLR
A systematic literature review (often referred to as a systematic review) is a means of
identifying, evaluating and interpreting all available research relevant to a particular
research question, or topic area, or phenomenon of interest. Individual studies
contributing to a systematic review are called primary studies; a systematic review is
a form of secondary study.
There are many reasons for undertaking a systematic literature review. The most
common reasons are:
• To summarise the existing evidence concerning a treatment or technology e.g. to
  summarise the empirical evidence of the benefits and limitations of a specific
  agile method.
• To identify any gaps in current research in order to suggest areas for further
  investigation.
• To provide a framework/background in order to appropriately position new
  research activities.
However, systematic literature reviews can also be undertaken to examine the extent
to which empirical evidence supports/contradicts theoretical hypotheses, or even to
assist the generation of new hypotheses (see for example [14]).
Most research starts with a literature review of some sort. However, unless a literature
review is thorough and fair, it is of little scientific value. This is the main rationale for
undertaking systematic reviews. A systematic review synthesises existing work in a
manner that is fair and seen to be fair. For example, systematic reviews must be
undertaken in accordance with a predefined search strategy. The search strategy must
allow the completeness of the search to be assessed. In particular, researchers
performing a systematic review must make every effort to identify and report research
that does not support their preferred research hypothesis as well as identifying and
reporting research that supports it.
Some of the features that differentiate a systematic review from a conventional expert
literature review are:
• Systematic reviews start by defining a review protocol that specifies the research
  question being addressed and the methods that will be used to perform the review.
• Systematic reviews are based on a defined search strategy that aims to detect as
  much of the relevant literature as possible.
• Systematic reviews document their search strategy so that readers can assess their
  rigour and the completeness and repeatability of the process (bearing in mind that
  searches of digital libraries are almost impossible to replicate).
• Systematic reviews require explicit inclusion and exclusion criteria to assess each
  potential primary study.
• Systematic reviews specify the information to be obtained from each primary
  study including quality criteria by which to evaluate each primary study.
• A systematic review is a prerequisite for quantitative meta-analysis.
There are two other types of review that complement systematic literature reviews:
Comparing Software Engineering experimental methodology with that of other disciplines
[…] medicine. This similarity is due to experimental practices, subject types and blinding
procedures. Within Software Engineering it is difficult to conduct randomised
controlled trials or to undertake double blinding. In addition, human expertise and the
human subject all affect the outcome of experiments.
Table 1 Comparing Software Engineering experimental methodology with that of other disciplines

Discipline              Comparison with SE (1 is perfect agreement, 0 is complete disagreement)
Nursing & Midwifery     0.83
Primary Care            0.33
Organic Chemistry       0.83
Empirical Psychology    0.66
Clinical Medicine       0.17
Education               0.83
These factors mean that software engineering is significantly different from the
traditional medical arena in which systematic reviews were first developed.
The review process
• Planning the review
• (Commissioning a review)
• Identification of research
• Data synthesis
The need for a systematic review arises from the requirement of researchers to
Examples
Kitchenham et al. [21] argued that accurate cost estimation is important for the software
industry; that accurate cost estimation models rely on past project data; that many companies
cannot collect enough data to construct their own models. Thus, it is important to know
whether models developed from data repositories can be used to predict costs in a specific
company. They noted that a number of studies have addressed that issue but have come to
different conclusions. They concluded that it is necessary to determine whether, or under
what conditions, models derived from data repositories can support estimation in a specific
company.
Jørgensen [17] pointed out that, in spite of the fact that most software cost estimation research
concentrates on formal cost estimation models and that a large number of IT managers know
about tools that implement formal models, most industrial cost estimation is based on expert
judgement. He argued that researchers need to know whether software professionals are
simply irrational, or whether expert judgement is just as accurate as formal models or has
other advantages that make it more acceptable than formal models.
In both cases the authors had undertaken research in the topic area and had first hand
knowledge of the research issues.
advisory group to ensure the review remains focused and relevant in the context.
The commissioning phase of a systematic review is not required for a research team
Specifying the research questions is the most important part of any systematic review.
The review questions drive the entire systematic review methodology:
• The search process must identify primary studies that address the research
  questions.
• The data extraction process must extract the data items needed to answer the
  questions.
• The data analysis process must synthesise the data in such a way that the
  questions can be answered.
Question types in software engineering
The Australian NHMR Guidelines [1] identify six types of health care questions that
can be addressed by systematic reviews:
1. Assessing the effect of intervention.
2. Assessing the frequency or rate of a condition or disease.
3. Determining the performance of a diagnostic test.
4. Identifying aetiology and risk factors.
5. Identifying whether a condition can be predicted.
6. Assessing the economic value of an intervention or procedure.
In software engineering, it is not clear what the equivalent of a diagnostic test would
be, but the other questions can be adapted to software engineering issues as follows:
• Assessing the effect of a software engineering technology.
• Assessing the frequency or rate of a project development factor such as the
  adoption of a technology, or the frequency or rate of project success or failure.
• Identifying cost and risk factors associated with a technology.
• Identifying the impact of technologies on reliability, performance and cost
  models.
• Cost benefit analysis of employing specific software development technologies or
  software applications.
Medical guidelines often provide different guidelines and procedures for different
types of question. This document does not go to this level of detail.
Asking the right question
The critical issue in any systematic review is to ask the right question. In this context,
the right question is usually one that:
• Is meaningful and important to practitioners as well as researchers. For example,
  researchers might be interested in whether a specific analysis technique leads to a
  significantly more accurate estimate of remaining defects after design inspections.
  However, a practitioner might want to know whether adopting a specific analysis
  technique to predict remaining defects is more effective than expert opinion at
  identifying design documents that require re-inspection.
• Will lead either to changes in current software engineering practice or to
  increased confidence in the value of current practice. For example, researchers
  and practitioners would like to know under what conditions a project can safely
  adopt agile technologies and under what conditions it should not.
• Will identify discrepancies between commonly held beliefs and reality.
Nonetheless, there are systematic reviews that ask questions that are primarily of
interest to researchers. Such reviews ask questions that identify and/or scope future
research activities. For example, a systematic review in a PhD thesis should identify
Examples
In both cases, the authors were aware from previous research that results were mixed, so in
each case they added a question aimed at investigating the conditions under which different
results are obtained.
Question structure
More recently Petticrew and Roberts suggest using the PICOC (Population,
Intervention, Comparison, Outcome, Context) criteria to frame research questions
[25]. These criteria extend the original medical guidelines with:
Comparison: i.e. what the intervention is being compared with.
Context: i.e. what is the context in which the intervention is delivered.
Question structure in SE
In addition, study designs appropriate to answering the review questions may be
identified and used to guide the selection of primary studies.
Population
In software engineering experiments, the populations might be any of the following:
• A specific software engineering role e.g. testers, managers.
• A category of software engineer, e.g. a novice or experienced engineer.
• An application area e.g. IT systems, command and control systems.
• An industry group such as Telecommunications companies, or Small IT
  companies.
A question may refer to very specific population groups e.g. novice testers, or
experienced software architects working on IT systems. In medicine the populations
are defined in order to reduce the number of prospective primary studies. In software
engineering far fewer primary studies are undertaken, thus, we may need to avoid any
restriction on the population until we come to consider the practical implications of
the systematic review.
Intervention
The intervention is the software methodology/tool/technology/procedure that
addresses a specific issue, for example, technologies to perform specific tasks such as
requirements specification, system testing, or software cost estimation.
Comparison
This is the software engineering methodology/tool/technology/procedure with which
the intervention is being compared. When the comparison technology is the
conventional or commonly-used technology, it is often referred to as the “control”
treatment. The control situation must be adequately described. In particular “not using
the intervention” is inadequate as a description of the control treatment. Software
engineering techniques usually require training. If you compare people using a
technique with people not using a technique, the effect of the technique is confounded
with the effect of training. That is, any effect might be due to providing training not
the specific technique. This is a particular problem if the participants are students.
Outcomes
Outcomes should relate to factors of importance to practitioners such as improved
reliability, reduced production costs, and reduced time to market. All relevant
outcomes should be specified. For example, in some cases we require interventions
that improve some aspect of software production without affecting another e.g.
improved reliability with no increase in cost.
[…] measures may be misleading and conclusions based on such studies may be less
robust.
Context
For Software Engineering, this is the context in which the comparison takes place
(e.g. academia or industry), the participants taking part in the study (e.g. practitioners,
academics, consultants, students), and the tasks being performed (e.g. small scale,
large scale). Many software experiments take place in academia using student
participants and small scale tasks. Such experiments are unlikely to be representative
of what might occur with practitioners working in industry. Some systematic reviews
might choose to exclude such experiments although in software engineering, these
may be the only type of studies available.
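The PICOC facets described above can be recorded in a small structure so that blank facets and weak control descriptions are noticed early. The following is only a sketch: the class name, the field names, the list of vague control phrases and the sample question are assumptions made for illustration and are not part of the guidelines.

```python
from dataclasses import dataclass, fields


@dataclass
class PicocQuestion:
    """One review question framed with the PICOC criteria."""
    population: str
    intervention: str
    comparison: str
    outcome: str
    context: str

    def missing_facets(self):
        """Return the names of facets that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def has_adequate_control(self):
        """False when the control is described only as the absence of the intervention."""
        vague = {"", "none", "not using the intervention"}
        return self.comparison.strip().lower() not in vague


# Hypothetical question used only to exercise the checks.
question = PicocQuestion(
    population="experienced software engineers",
    intervention="pair programming",
    comparison="not using the intervention",
    outcome="defect density",
    context="",
)
print(question.missing_facets())        # ['context']
print(question.has_adequate_control())  # False
```

The second check mirrors the point made under Comparison: a control described only as "not using the intervention" leaves the effect of training confounded with the effect of the technique.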
Experimental designs
In medical studies, researchers may be able to restrict systematic reviews to primary
studies of one particular type. For example, Cochrane reviews are usually restricted to
randomised controlled trials (RCTs). In other circumstances, the nature of the
question and the central issue being addressed may suggest that certain study designs
are more appropriate than others. However, this approach can only be taken in a
discipline where the large number of research papers is a major problem. In software
engineering, the paucity of primary studies is more likely to be the problem for
systematic reviews and we are more likely to need protocols for aggregating
information from studies of widely different types.
Examples
The review protocol
• Background. The rationale for the survey.
• The research questions that the review is intended to answer.
• The strategy that will be used to search for primary studies including search terms
  and resources to be searched. Resources include digital libraries, specific journals,
  and conference proceedings. An initial mapping study can help determine an
  appropriate strategy.
• Study selection criteria. Study selection criteria are used to determine which
  studies are included in, or excluded from, a systematic review. It is usually
  helpful to pilot the selection criteria on a subset of primary studies.
• Study selection procedures. The protocol should describe how the selection
  criteria will be applied e.g. how many assessors will evaluate each prospective
  primary study, and how disagreements among assessors will be resolved.
• Study quality assessment checklists and procedures. The researchers should
  develop quality checklists to assess the individual studies. The purpose of the
  quality assessment will guide the development of checklists.
• Data extraction strategy. This defines how the information required from each
  primary study will be obtained. If the data require manipulation or assumptions
  and inferences to be made, the protocol should specify an appropriate validation
  process.
• Synthesis of the extracted data. This defines the synthesis strategy. This should
  clarify whether or not a formal meta-analysis is intended and if so what
  techniques will be used.
• Dissemination strategy (if not already included in a commissioning document).
• Project timetable. This should define the review schedule.
Once the protocol has been agreed, the review proper can start. However, as noted
previously, researchers should expect to try out each of the steps described in this
section when they construct their research protocol.
6.1 Identification of Research
The aim of a systematic review is to find as many primary studies relating to the
research question as possible using an unbiased search strategy. The rigour of the
search process is one factor that distinguishes systematic reviews from traditional
reviews.
6.1.1 Generating a search strategy
It is necessary to determine and follow a search strategy. This should be developed in
consultation with librarians or others with relevant experience. Search strategies are
usually iterative and benefit from:
• Preliminary searches aimed at both identifying existing systematic reviews and
  assessing the volume of potentially relevant studies.
• Trial searches using various combinations of search terms derived from the
  research question.
• Checking trial research strings against lists of already known primary studies.
• Consultations with experts in the field.
A general approach is to break down the question into individual facets i.e.
population, intervention, comparison, outcomes, context, study designs as discussed
in Section 5.3.2. Then draw up a list of synonyms, abbreviations, and alternative
spellings. Other terms can be obtained by considering subject headings used in
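The facet-and-synonym approach can be sketched in a few lines. Two things here are assumptions rather than statements from the text above: that synonyms within a facet are joined with OR and facets are joined with AND, and that recall against already known primary studies is a reasonable way to check a trial search string. The function names, facet terms and study identifiers are hypothetical.

```python
def build_search_string(facets):
    """OR the synonyms inside each facet, then AND the facets together."""
    groups = []
    for terms in facets.values():
        if not terms:
            continue  # skip facets for which no terms have been listed yet
        # Quote multi-word terms so they are searched as phrases.
        quoted = ['"%s"' % t if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


def recall_against_known(found_ids, known_ids):
    """Fraction of already known primary studies that a trial search returned."""
    known = set(known_ids)
    if not known:
        raise ValueError("need at least one known primary study")
    return len(known & set(found_ids)) / len(known)


# Hypothetical facets and synonyms for a cost estimation question.
facets = {
    "intervention": ["cost estimation", "effort estimation"],
    "comparison": ["cross-company", "within-company"],
    "outcome": ["accuracy"],
}
print(build_search_string(facets))
# ("cost estimation" OR "effort estimation") AND (cross-company OR within-company) AND (accuracy)

# A trial search that returns two of four known studies has recall 0.5.
print(recall_against_known(["s1", "s3", "s9"], ["s1", "s2", "s3", "s4"]))
```

A low recall value suggests that the term lists need further synonyms or alternative spellings before the search is run in earnest.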
Jørgensen [16] investigated when we can expect expert estimates to have acceptable
accuracy in comparison with formal models by reviewing relevant human judgement studies
(e.g. time estimation studies) and comparing their results with the results of software
engineering studies.
6.1.2 Publication Bias
Publication bias refers to the problem that positive results are more likely to be
published than negative results. The concept of positive or negative results sometimes
depends on the viewpoint of the researcher. (For example, evidence that full
mastectomies were not always required for breast cancer was actually an extremely
positive result for breast cancer sufferers.)
However, publication bias remains a problem particularly for formal experiments,
where failure to reject the null hypothesis is considered less interesting than an
experiment that is able to reject the null hypothesis. Publication bias is even more of a
problem when methods/techniques are sponsored by influential groups in the software
industry. For example, the US DoD is an extremely important and influential
organisation which sponsored the development of the Capability Maturity Model and
used its influence to encourage industry to adopt the CMM. In such circumstances
few companies would want to publish negative results and there is a strong incentive
to publish papers that support the new method/technique.
Once reference lists have been finalised the full articles of potentially useful studies
will need to be obtained. A logging system is needed to make sure all relevant studies
are obtained.
6.2 Study Selection
Once the potentially relevant primary studies have been obtained, they need to be
assessed for their actual relevance.
6.2.1 Study selection criteria
Study selection criteria are intended to identify those primary studies that provide
direct evidence about the research question. In order to reduce the likelihood of bias,
selection criteria should be decided during the protocol definition, although they may
be refined during the search process.
Inclusion and exclusion criteria should be based on the research question. They
should be piloted to ensure that they can be reliably interpreted and that they classify
studies correctly.
Examples
Kitchenham et al. used the following inclusion criteria:
• any study that compared predictions of cross-company models with within-company
  models based on analysis of single company project data.
They used the following exclusion criteria:
• studies where projects were only collected from a small number of different sources
  (e.g. 2 or 3 companies),
• studies where models derived from a within-company data set were compared with
  predictions from a general cost estimation model.
Jørgensen [17] included papers that compare judgment-based and model-based software
development effort estimation. He also excluded one relevant paper due to “incomplete
information about how the estimates were derived”.
Issues:
• Medical standards make a point that it is important to avoid, as far as possible,
  exclusions based on the language of the primary study. This may not be so
  important for Software Engineering.
• It is possible that inclusion decisions could be affected by knowledge of the
More study selection criteria
[…] obtained. However, Brereton et al. [5] point out that “[…]
engineering abstracts is too poor to rely on when sele[…]
should also review the conclusions.”
Most quality checklists (see Section 6.3.2) include questions aimed at assessing the
extent to which articles have addressed bias and validity.
In most cases, data extraction will define a set of numerical values that should be
extracted for each study (e.g. number of subjects, treatment effect, confidence
intervals, etc.). Numerical data are important for any attempt to summarise the results
of a set of primary studies and are a prerequisite for meta-analysis (i.e. statistical
techniques aimed at integrating the results of the primary studies).
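A minimal sketch of one row of such a data extraction form is shown below, using the numerical values named above (number of subjects, treatment effect, confidence interval). The class, its field names and the two plausibility checks are illustrative assumptions; a real form would be defined in the review protocol.

```python
from dataclasses import dataclass


@dataclass
class ExtractionRecord:
    """Numerical values extracted from one primary study."""
    study_id: str
    n_subjects: int
    treatment_effect: float
    ci_lower: float
    ci_upper: float

    def problems(self):
        """Return a list of plausibility problems found in this record."""
        issues = []
        if self.n_subjects <= 0:
            issues.append("number of subjects must be positive")
        if not self.ci_lower <= self.treatment_effect <= self.ci_upper:
            issues.append("treatment effect lies outside its confidence interval")
        return issues


# Hypothetical record with a transcription error in the interval.
record = ExtractionRecord("S1", n_subjects=24, treatment_effect=0.4,
                          ci_lower=0.5, ci_upper=0.9)
print(record.problems())
# ['treatment effect lies outside its confidence interval']
```

Checks of this kind catch transcription slips only; they do not replace piloting the form or cross-checking between researchers.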
Data extraction forms need to be piloted on a sample of primary studies. If several
researchers will use the forms, they should all take part in the pilot. The pilot studies
are intended to assess both technical issues such as the completeness of the forms and
usability issues such as the clarity of user instructions and the ordering of questions.
Examples
Kitchenham et al. [21] used the extraction form shown in Table 7 (note the actual form also
included the quality questions).

Within-company model
  What technique(s) was used to construct the within-company model?
    A preliminary productivity analysis was used to identify factors for
    inclusion in the effort estimation model.

Data Summary
  Data base summary (all projects) for size and effort metrics.
    Effort min:     7.8 MM
    Effort max:     4361 MM
    Effort mean:    284 MM
    Effort median:  93 MM
    Size min:       2000 KLOC
    Size max:       413000 KLOC
    Size mean:      51010 KLOC
    Size median:    22300 KLOC
    Notes: KLOC: non-blank, non-comment delivered 1000 lines. For reused code
    Boehm’s adjustments were made (Boehm, 1981). Effort was measured in
    man months, with 144 man hours per man month.
  Within-company data summary for size and effort metrics.
    Effort min:     Not specified
    Effort max:
    Effort mean:
    Effort median:
    Size min:
    Size max:
    Size mean:
    Size median:
Jørgensen [17] extracted design factors and primary study results. Design factors included:
Study design
Estimation method selection process
Estimation models
Accuracy
Variance
Other results
Data extraction procedures
Jørgensen’s article includes the completed extraction form for each primary study.
If several researchers each review different primary studies because time or resource
constraints prevent all primary papers being assessed by at least two researchers, it is
important to employ some method of checking that researchers extract data in a
consistent manner. For example, some papers should be reviewed by all researchers
(e.g. a random sample of primary studies), so that inter-researcher consistency can be
assessed.
For single researchers such as PhD students, other checking techniques must be used.
For example supervisors could perform data extraction on a random sample of the
primary studies and their results cross-checked with those of the student.
Alternatively, a test-retest process can be used where the researcher performs a
second extraction from a random selection of primary studies to check data extraction
consistency.
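One way to put a number on extraction consistency, for categorical items, is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. Using kappa here is an assumption of this sketch; the text above only requires that consistency be checked. The same function works for two researchers or for a test-retest pass by one researcher, and the labels below are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed proportion of items on which the two extractions agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each extraction's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both extractions used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical study-design labels extracted twice from the same four papers.
first_pass = ["experiment", "experiment", "case study", "case study"]
second_pass = ["experiment", "case study", "case study", "case study"]
print(cohen_kappa(first_pass, second_pass))  # 0.5
```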
Examples
meta-analysis see [7].)
It is important to identify whether results from studies are consistent with one another
(i.e. homogeneous) or inconsistent (i.e. heterogeneous). Results may be tabulated to
display the impact of potential sources of heterogeneity, e.g. study type, study quality,
and sample size.
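Where each primary study reports an effect size and its variance, consistency can also be summarised numerically. The sketch below uses Cochran's Q and the I² statistic with fixed-effect (inverse-variance) weights; choosing these statistics is an assumption of this example, since the text above only calls for identifying and tabulating heterogeneity. The input values are hypothetical.

```python
def heterogeneity(effects, variances):
    """Return (pooled effect, Cochran's Q, I-squared in percent)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i_squared


# Two hypothetical studies with equal precision but different effects.
pooled, q, i_squared = heterogeneity([0.0, 1.0], [0.25, 0.25])
print(pooled, q, i_squared)  # 0.5 2.0 50.0
```

With very few studies Q has little power to detect heterogeneity, so a table of results by study type, study quality and sample size remains useful alongside the statistic.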
He concluded that models are not systematically better than experts for software cost
estimation, possibly because experts possess more information than models or it may be