Reading 05, Chapter 04, Juristo: Terminology
EXPERIMENTAL DESIGN
4.1. INTRODUCTION
• Experimental unit: The objects on which the experiment is run are called
experimental units or experimental objects. For example, patients are
experimental units in medical experiments (although any part of the human body
or any biological process is equally eligible), as is each piece of land in
agricultural experiments. SE experiments involve subjecting project
development or a particular part of the above development process to certain
conditions and then collecting a particular data set for analysis. Depending on
the goal of the experiment, the experimental unit in an SE experiment can then be
the software project as a whole or any of the intermediate products output during
this process. For example, suppose we want to experiment on the improvement
process followed by our organisation. We could compare the current process
with a process improved according to CMM recommendations. Both processes
would be assessed after application to the development of the same software
system and data would be collected on the productivity of the resources or the
errors detected. Thus, the experimental unit would, in this case, be the full
process, as this is the object to which the methods examined by this
58 Basic Notions of Experimental Design
the variables does not, in principle, necessarily depend on who applies them.
The response variable is sometimes called dependent variable. This term comes
not from the field of experimental design but from another branch of
mathematics. As we discussed in section 3.2, the goal of experimentation is
usually to find a function that relates the response variable to the factors that
influence the variable. Therefore, although the term dependent variable is not
proper to experimental design, it is sometimes used.
Factors are also called predictor variables or just predictors, as they are the
characteristics of the experiment used to predict what would happen with the
response variable. Another term, taken from mathematics and used for the
factors, is independent variables.
The term treatment is often used for this concept of alternatives of a factor in
experimental design. This term dates back to the origins of experimental design,
which was conceived primarily with agricultural experimentation in mind. The
factors in this sort of study used to be insecticides for plants or fertilisers for
land, for which the term treatment is quite appropriate. The term treatment is
also correct in medical and pharmacological experimentation. A similar thing
can be said for the term level, which is very appropriate for referring to the
examination of different concentrations of chemical products, for example. The
terms treatment and level in SE, however, can be appropriate on some occasions
and not on others. So, we prefer to use the term alternative to refer to the values
of a factor in this book. The alternatives of the factors of the experiments
addressed in this book, such as COCOMO or Putnam’s method, for example, are
qualitative, as discussed above. Remember, though, that the response variables
gathered in these experiments are quantitative. The aim of these experiments
then is to determine the quantitative effect of some alternatives. Other
quantitative experiments aim to find relationships between quantitative
variables, such as, for example, the relationship between years of experience and
Basics of Software Engineering Experimentation 61
Figure 4.1 shows the relationships among parameters, factors and response variables
in an experiment.
• Interactions. Two factors A and B are said to interact if the effect of one
depends on the value of the other. The interactions between the factors used in
the experiments should be studied, as these interactions will influence the values of
the response variable. Therefore, the experimental designs that include
experiments with more than one factor (factorial designs discussed in Chapter 5)
examine both the effects of the different alternatives of each factor on the
response variable and the effects of the interactions among factors on the
response variable.
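As a minimal sketch of what an interaction looks like numerically, the snippet below computes the usual interaction contrast for a design with two factors of two alternatives each. The alternative labels (a1, a2, b1, b2), the function name and the mean values are hypothetical, and halving the difference is one common convention for two-level factorial designs, not necessarily the notation used in Chapter 5.

```python
def interaction_effect(means):
    """Interaction contrast for a 2x2 design.

    `means` maps (alternative of A, alternative of B) to the mean value
    of the response variable observed for that combination.
    """
    # Effect of switching factor A from a1 to a2, at each alternative of B
    effect_of_a_at_b1 = means[("a2", "b1")] - means[("a1", "b1")]
    effect_of_a_at_b2 = means[("a2", "b2")] - means[("a1", "b2")]
    # If A and B do not interact, both effects are equal and this is zero
    return (effect_of_a_at_b2 - effect_of_a_at_b1) / 2


# Hypothetical mean responses (for instance, defects detected)
additive = {("a1", "b1"): 10, ("a2", "b1"): 14,
            ("a1", "b2"): 12, ("a2", "b2"): 16}
interacting = {("a1", "b1"): 10, ("a2", "b1"): 14,
               ("a1", "b2"): 12, ("a2", "b2"): 24}

print(interaction_effect(additive))     # 0.0: the effect of A is the same for b1 and b2
print(interaction_effect(interacting))  # 4.0: the effect of A depends on B
```

In the second data set, moving from a1 to a2 adds 4 units under b1 but 12 under b2, which is exactly the situation described as an interaction.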
known as blocking variables and call for a special sort of experimental design,
called block design (examined in Chapter 5).
(1.b) Replications that alter the manner in which the first experiment was run.
For example, suppose we have an experiment that calls for the subjects
to be trained in the techniques to be used, and the subjects are sent a
document describing these techniques beforehand. A second
experiment could be run giving subjects classroom training instead.
(2.a) Replications that alter design issues, such as, for example, the detail
level of the specifications of a problem to be estimated.
(2.b) Replications that alter factors of the setting of the experiment, such as
the type of subjects who participate (students, practitioners, etc.) or the
problem domain addressed.
3. Replications that reformulate the goals and, hence, the hypothesis of the
Since the effect of the subjects who apply the factors to the experimental units in
SE can, as mentioned above, be very significant, there is also the possibility of
running the replication on the subjects. Our example was originally composed of
The fact that they are not trained to deal with situations in which experimental errors
cannot be ignored has been a mighty obstacle for many researchers. Caution is
essential not only with regard to the possible effects of experimental error on data
analysis; the influence of experimental error is also a consideration of the utmost importance in
experimental design. Therefore, an elementary knowledge of experimental error and
associated probability theory is essential for laying a solid foundation on which to
build the design and analysis of experiments. Part III of the book will detail how to
It is evident that the software project depends on more than one factor (for example,
the people involved, the activities performed, the methods used for development,
etc.). A proper study of software development calls for the effects of each factor to
be isolated from the effects of all the other factors so that significant claims can be
made, for example, technique X speeds up the development of Y-type software.
Below, we suggest variables that may have an impact on the outcome of software
development and which, therefore, can be taken into account when experimenting
with software development. These variables can be selected as parameters, blocking
variables or factors, depending on the goal of the experiment.
Another point remains to be made concerning the suggested variables. This point is
related to the selected experimental unit. As mentioned earlier, an SE experiment
can be run on the whole or any part of the project. The same variable may
play different roles (as a factor or response variable, for example) depending on
what the experimental unit is. For example, suppose we want to determine the size
of the code for implementing one and the same algorithm using two different
programming languages. In this case, the algorithm to be developed would be the
experimental unit and code size would be the response variable in question.
However, if we chose to do another experiment to test two testing techniques, the
experimental unit in this case would be the piece of code, and size would be a
possible parameter or factor, as the result of the experiment could vary depending
on its value.
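The two experiments just described can be written down as simple records to make the change of role explicit. This is only an illustrative sketch: the class `ExperimentDesign`, its field names, the placeholder alternatives and the response variable "defects detected" are assumptions introduced here, not part of the original text.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentDesign:
    """Illustrative record of the terminology used in this chapter."""
    experimental_unit: str
    factors: dict                 # factor name -> list of alternatives
    response_variables: list
    parameters: list = field(default_factory=list)          # held fixed
    blocking_variables: list = field(default_factory=list)

    def role_of(self, variable):
        """Return the role a variable plays in this particular experiment."""
        if variable in self.factors:
            return "factor"
        if variable in self.response_variables:
            return "response variable"
        if variable in self.parameters:
            return "parameter"
        if variable in self.blocking_variables:
            return "blocking variable"
        return "not considered"


# Same algorithm implemented in two programming languages
language_experiment = ExperimentDesign(
    experimental_unit="algorithm to be implemented",
    factors={"programming language": ["language 1", "language 2"]},
    response_variables=["code size"],
)

# Two testing techniques applied to a piece of code
testing_experiment = ExperimentDesign(
    experimental_unit="piece of code",
    factors={"testing technique": ["technique 1", "technique 2"]},
    response_variables=["defects detected"],  # hypothetical response variable
    parameters=["code size"],
)

print(language_experiment.role_of("code size"))  # response variable
print(testing_experiment.role_of("code size"))   # parameter
```

Code size could equally be declared a factor in the second experiment if the goal were to study how technique effectiveness varies with size.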
So, having identified the possible sources of influence in the software project, we can
start to analyse and then identify possible parameters for an SE experiment. An
extensive list of possible sources of variables for a software project is given in
Annex I. Although we have sought to be exhaustive so as to aid readers with their
experiments, this should not be taken to mean that the list is comprehensive or that
readers cannot select other variables apart from those listed in this annex. Therefore,
readers who are using this book to prepare a particular SE experiment can make use
of this information to select given parameters. Some of the factors and parameters
used in real SE experiments are referred to below.
Table 4.1 shows some examples of factors and parameters used in real experiments.
With regard to parameters, it has to be said that it is difficult to find an accurate
description of this sort of variable in the experimental SE literature, as many are
not described explicitly in the references. Moreover, this makes it difficult to
replicate the experiments since the conditions of the experiments are not
exhaustively described. Therefore, Table 4.1 describes the parameters that have
been mentioned explicitly in some experiments. This does not, however, mean that
these were the only ones taken into account.
As far as the factors shown in the table are concerned, note that factor selection
depends on the goal of the experiment in question. They are not unique, however.
So, two experiments, which may share the same overall goal, may use different
factors and parameters. This choice also depends on the conditions and possible
constraints (time, subject, development conditions, etc.) subject to which each
experiment is run.
As we have already mentioned, response variables reflect the data that are collected
from experiments. They are, therefore, variables that can only be measured a
posteriori, after the entire experiment (the software project or the respective phase
or activity) has ended. Remember that the response variables with which we are
concerned in this book must provide a quantitative measure that will be studied
during the process of analysis. These variables have to represent the effect of the
different factor alternatives on the experimental units in question. For example,
suppose that we want to evaluate the accuracy of two estimation techniques; the
The possible response variables that we can identify in software experiments can
measure characteristics of the development process, of the methods or tools used, of
the team or of the different products output during the above development process.
Table 4.2 shows some of the response variables related to each component.
Special measures or, alternatively, special metrics have to be used to get the
individual values of these response variables. The relationship between these two
concepts is discussed in the following section.
Fenton and Pfleeger classify these attributes as internal or external. Table
4.3 shows some product-, process- or people-related attributes arranged according
to this classification. The internal attributes of a product, process or resource are
what can be measured purely in terms of the product, process or resource. In other
words, an internal attribute can be measured by examining the product, process or
resource as distinct from its behaviour. On the other hand, the external attributes of
a product, process or resource are what can be measured solely with regard to how
the product, process or resource is related to its environment. In other words, the
behaviour of the process, product or resource is more important than the entity
itself.
Consider code, for example. An internal attribute could be its size (measured, for
example, as the number of lines of code) or we could even measure its quality by
identifying the number of faults found when it is read. However, there are other
attributes that can only be measured when the code is executed, like the number of
faults perceived by the user or the user’s difficulty in navigating from screen to
screen, for example. Table 4.3 shows other internal and external attributes for
products and resources.
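As a small illustration of an internal attribute, code size can be measured by inspecting the code alone, without executing it. The sketch below counts non-blank, non-comment lines. The function name and the rule for recognising comments (a line whose first non-blank characters are the comment prefix) are simplifying assumptions; real size metrics need a precise counting standard.

```python
def count_source_lines(source, comment_prefix="#"):
    """Count lines that are neither blank nor whole-line comments."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


sample = """# compute a sum
total = 0

for i in range(3):
    total += i  # a trailing comment still counts as code
"""

print(count_source_lines(sample))  # 3
```

An external attribute such as the number of faults perceived by the user could not be obtained this way, since it depends on the code being executed in its environment.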
Table 4.3 also shows some metrics that could be applied to evaluate the respective
attributes for response variables in terms of SE experimentation. This table is not
designed as a comprehensive guide to software metrics. It simply provides readers
with some examples that can be used to measure given attributes (or response
variables). Note that the table includes no response variables or metrics related to
the methods or tools for use, for example. However, some response variables used
to evaluate a finished product, like usability, efficiency, etc., can be applied for this
purpose.
The metrics included in Table 4.3 actually depend on the (entity, attribute) pair,
where some products, separate parts of the process or of resources are represented
under the entity column. More than one metric can be applicable to the same (entity,
attribute) pair, such as the (code, reliability) pair, which can be measured by the
number of faults in time t or by means of the mean time between failures, for
example. This table is far from being a full list of metrics for application in software
development; it simply gives examples of some of these measures.
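To illustrate how one (entity, attribute) pair can admit more than one metric, the sketch below computes both reliability measures mentioned above from the same list of failure times. The function names and the data are hypothetical, and mean time between failures is taken here simply as the average gap between consecutive failures, ignoring repair time.

```python
def faults_in_time(failure_times, t):
    """Number of failures observed up to and including time t."""
    return sum(1 for failure_time in failure_times if failure_time <= t)


def mean_time_between_failures(failure_times):
    """Average gap between consecutive failures."""
    ordered = sorted(failure_times)
    if len(ordered) < 2:
        raise ValueError("at least two failures are needed")
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical failure times, in hours of execution
failure_times = [5.0, 12.0, 20.0, 41.0]

print(faults_in_time(failure_times, 24))          # 3
print(mean_time_between_failures(failure_times))  # 12.0
```

Both values describe the reliability of the same code, so the experimenter has to decide which of them is the response variable before the experiment is run.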
When working with metrics, we need to consider the different sorts of measurement
scale. The most common scale types are: nominal, ordinal, interval and ratio
(Fenton, 1997) (Kitchenham, 1996).
Entity type | Entity | Internal attribute | Metric | External attribute | Metric
Processes | Overall process | Time | months from start to finish of the development | Schedule deviation | estimated months/real months
Processes | Constructing specifications | Effort | person-months from start to finish of the activity | Stability of requirements | number of requirements changes
Processes | Testing | Time | months from start to finish of the activity | Cost-effectiveness | number of detected defects/cost of the testing activity
Processes | Testing | Effort | person-months from start to finish of the activity | Quality | number of detected defects/number of existing defects
Resources | Personnel | Cost | $ per month | Productivity | number of function points implemented/person-month
Resources | Personnel | Experience | years of experience | |
Resources | Teams | Size | number of members | Productivity | number of function points implemented/team-month
Table 4.4 shows examples of these scales both inside and outside SE. This table
also shows some constraints on the mathematical operators that can be applied to
each one. As discussed in Chapter 6, the scale type is important insofar as it determines
the method of data analysis that can be applied to draw conclusions.
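As a rough sketch of how the scale type constrains the operations that make sense, the function below picks a measure of central tendency according to the scale. The mapping (mode for nominal, median for ordinal, arithmetic mean for interval and ratio) is the commonly cited convention and is an assumption here, not a reproduction of Table 4.4. The function name and the data are hypothetical.

```python
import statistics


def central_tendency(values, scale):
    """Return a measure of central tendency that is meaningful for the scale."""
    if scale == "nominal":
        # Categories can only be compared for equality
        return statistics.mode(values)
    if scale == "ordinal":
        # Order is meaningful, distances are not; median_low returns a data value
        return statistics.median_low(values)
    if scale in ("interval", "ratio"):
        # Differences are meaningful, so the arithmetic mean can be used
        return statistics.mean(values)
    raise ValueError(f"unknown scale type: {scale}")


languages = ["Java", "C", "Java"]   # nominal
experience_rank = [1, 2, 2, 3]      # ordinal, 1 = low and 3 = high
lines_of_code = [120, 80, 100]      # ratio

print(central_tendency(languages, "nominal"))
print(central_tendency(experience_rank, "ordinal"))
print(central_tendency(lines_of_code, "ratio"))
```

Averaging the language names, by contrast, would be meaningless, which is the kind of constraint the scale type imposes on the analysis.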
mentioned, the metrics that will be used are sometimes directly specified, and the
two terms are thus used as synonyms.
This is the approach proposed by Basili et al. (Basili, 1994), called
Goal-Question-Metric (GQM), which has been successfully used in several
experiments (Shull, 2000) (Basili, 1987) (Lott, 1996) (Kamsties, 1995) for
identifying response variables (which are directly metrics). This approach
involves defining the goal of the
experiment. We then have to generate a set of questions whose responses will help
us to determine the proposed goal and, finally, we have to analyse each question in
terms of which metric we need to know to answer each question.
Let’s take a look at an application of GQM in a real experiment to show how useful
it is for choosing the metrics of an experiment. Kamsties (1995) and Lott (1996)
applied this approach to get the metrics of an experiment that aims to study several
testing techniques. In Table 4.5, we describe the goals defined by the authors, as
well as the questions and response variables considered. Note that one and the same
response variable can be useful for answering different questions, like, for example,
the experience of the subjects, which is used in questions Q.1.2, Q.2.2, Q.3.2 and
Q.4.2. Thus, GQM provides a structured and gradual means of determining the
response variables to be considered in an experiment, where the choice of these
variables is based on the goal to be achieved by the experiment.
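The goal, question and metric chain can be represented as a simple mapping, which also makes it easy to see which metrics serve several questions. In the sketch below, the question identifiers follow the Q.goal.question pattern used above, and the reuse of subject experience in Q.1.2, Q.2.2, Q.3.2 and Q.4.2 is taken from the text. The remaining metric names are placeholders and do not reproduce the contents of Table 4.5.

```python
from collections import defaultdict

# Question id -> metrics needed to answer it (placeholders except for
# "experience of the subjects", which the text assigns to Q.x.2)
questions_to_metrics = {
    "Q.1.1": ["placeholder metric 1"],
    "Q.1.2": ["experience of the subjects"],
    "Q.2.1": ["placeholder metric 2"],
    "Q.2.2": ["experience of the subjects"],
    "Q.3.1": ["placeholder metric 3"],
    "Q.3.2": ["experience of the subjects"],
    "Q.4.1": ["placeholder metric 4"],
    "Q.4.2": ["experience of the subjects"],
}


def metrics_to_questions(mapping):
    """Invert the mapping: metric -> questions that the metric helps to answer."""
    index = defaultdict(list)
    for question, metrics in mapping.items():
        for metric in metrics:
            index[metric].append(question)
    return dict(index)


index = metrics_to_questions(questions_to_metrics)
print(index["experience of the subjects"])
# ['Q.1.2', 'Q.2.2', 'Q.3.2', 'Q.4.2']
```

The keys of the inverted index are the response variables that actually have to be collected during the experiment.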
development) x 100
Goal: Studying the accuracy of the analogy-based estimation compared with the regression model-based estimation
Response variables:
• ((actual effort - estimated effort) / actual effort) × 100
Reference: (Myrtevil, 1999)

Goal: Studying the quality of structured versus object-oriented languages on the development process
Response variables:
• Number of known errors found during execution of test scripts
• Time to fix the known errors
• Number of modifications requested during code reviews, testing and maintenance
• Time to implement modifications
• Development time
• Testing time
Reference: (Samaraweera, 1998)

Goal: Studying the quality of structured versus object-oriented languages on the delivered code
Response variables:
• Number of non-comment, non-blank source lines
• Number of distinct functions called
• Number of domain specific functions called
• Depth of the function call hierarchy chart

Goal: Studying the effect of Cleanroom development on the product developed
Response variables:
• Test cases passed
• Number of source lines
• Number of executable statements
• Number of procedures and functions
• Completeness of the implementation as a function of compliance of certain requirements
Reference: (Selby, 1987)

Goal: Studying the effect of using a predefined process versus letting developers use a self-defined process on the defects in the execution of the process
Response variables:
• Number of activities included in the process and not executed
• Number of deliverables expected and not produced
• Number of activities executed incorrectly
(1) The authors indicate that this response variable can be somewhat subjective.
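As a worked example of one of the response variables listed in the table, the estimation accuracy measure ((actual effort - estimated effort) / actual effort) × 100 can be computed as follows. The function name and the effort figures are hypothetical.

```python
def relative_error_percent(actual_effort, estimated_effort):
    """((actual effort - estimated effort) / actual effort) x 100."""
    if actual_effort == 0:
        raise ValueError("actual effort must be non-zero")
    return (actual_effort - estimated_effort) / actual_effort * 100


# Hypothetical efforts in person-months
print(relative_error_percent(200, 150))  # 25.0, the technique underestimated
print(relative_error_percent(200, 250))  # -25.0, the technique overestimated
```

The sign shows the direction of the error. Taking the absolute value would be needed if only the magnitude of the deviation is of interest.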