Reading 05, Chapter 04 (Juristo): Terminology


4 BASIC NOTIONS OF

EXPERIMENTAL DESIGN
4.1. INTRODUCTION

This chapter focuses on the basic concepts to be handled during experimental design. Before addressing design, we need to study the terminology to be used. This is done in section 4.2. Sections 4.3 and 4.4 focus on the application of this terminology to the particular field of SE. In those sections we suggest possible variables for SE experiments as an aid for novice experimenters. However, the variables proposed here are only a suggestion and SE experimenters can work with an alternative set depending on their particular goals. Additionally, we will also examine some variables used in real SE experiments, going beyond a merely theoretical discussion.

Remember that experimental design has been referred to as a crucial part of experimentation, hence the importance of this and the next chapter, which details different kinds of designs.

4.2. EXPERIMENTAL DESIGN TERMINOLOGY

Before software engineers can experiment, they must be acquainted with experimental design terminology. These are not difficult concepts and are basically related to the provoked variations that distinguish one experiment from another. The most commonly used terms in experimental design are discussed below.

• Experimental unit: The objects on which the experiment is run are called experimental units or experimental objects. For example, patients are experimental units in medical experiments (although any part of the human body or any biological process is equally eligible), as is each piece of land in agricultural experiments. SE experiments involve subjecting a project development, or a particular part of that development process, to certain conditions and then collecting a particular data set for analysis. Depending on the goal of the experiment, the experimental unit in a SE experiment can then be the software project as a whole or any of the intermediate products output during this process. For example, suppose we want to experiment on the improvement process followed by our organisation: we could compare the current process with a process improved according to CMM recommendations. Both processes would be assessed after application to the development of the same software system, and data would be collected on the productivity of the resources or the errors detected. Thus, the experimental unit would, in this case, be the full process, as this is the object to which the methods examined by this experimentation (process improvement) are applied. However, if we wanted to study process improvement in one area only, say requirements, the object and, therefore, the experimental unit would be the requirements phase. Now suppose we aim to compare the accuracy of three estimation techniques; the experimental unit would be the requirements to which the techniques are applied. If we wanted to compare two testing techniques, the experimental unit would be the piece of code to which the techniques are applied. Thus, the experimental unit would be a process or subprocess in the first example, whereas it would be a product in the latter two.

• Experimental subjects: The person who applies the methods or techniques to the experimental units is called the experimental subject. In the above process improvement example, the experimental subject would be the entire team of developers. In the estimation example, the subjects would be the estimators who apply the estimation techniques. And in the testing techniques example, the subjects would be the people applying the testing techniques. Unlike in other disciplines, the experimental subject has a very important effect on the results of SE experiments and, therefore, this variable has to be carefully considered during experiment design. Why? Suppose, for example, that we have an agronomy experiment aimed at determining which fertiliser is best for the growth of a seed. The experimental subjects of this experiment would be the people who apply the different fertilisers (experimental variables) to the same seed sown on a piece of land (experimental unit). The action of different subjects is not expected to affect the growth of the seed much in this experiment, as the manner in which each subject applies the fertiliser is unlikely to differ a lot. Let's now look at the experiment on estimation techniques in SE. The subjects of this experiment would be software engineers who apply the three estimation techniques to particular requirements (experimental unit). As the estimation techniques are not independent of the characteristics of the estimator by whom they are applied (that is, the result of the estimation will depend, for example, on the experience of the estimator in applying the technique and in software development and even, why not, on the emotional state of the estimator at the time of running the experiment), the result can differ a lot depending on who the subjects are. Similarly, the result of applying most of the techniques and procedures used in SE depends on who applies them, as these procedures are not, so to speak, automatic and independent of the software engineer who applies them. Therefore, the role of the subjects in SE experiments must be carefully addressed during the design of the experiment. As we will see in Chapter 5, there are different points related to the subjects that will have an impact on the final design of the experiment. Particularly, if we are running experiments in which we do not intend to study the influence of the subjects, it will be a good idea to select a design that cancels out the variability implicit in the use of different developers. Paying special attention to the subjects is typical of what are known as the social sciences, like psychology or SE, as opposed to other sciences, like physics or chemistry, where the result of the application of the variables does not, in principle, necessarily depend on who applies them.

• Response variable. The outcome of an experiment is referred to as a response variable. This outcome must be quantitative (remember that this book focuses on laboratory experiments during which quantitative data are collected). The response variable of an experiment in SE is the project, phase, product or resource characteristic that is measured to test the effects of the provoked variations from one experiment to another. For example, suppose that a researcher proposes a new project estimation technique and argues that the technique provides a better estimation than existing techniques. The researcher should run an experiment with several projects, some using the new technique and others using existing techniques (experimental design would be an aid for deciding how many projects would be required for each technique). One possible response variable in these experiments would be the accuracy of the estimate. The response variable in this example, accuracy, can be measured using different metrics. For instance, we could decide to measure accuracy in this experiment as the difference between the estimate made and the real value. However, if the researcher claims that the new method cuts development times, the response variable of the experiment would be development time. Therefore, the response variable is the characteristic of the software project that is under analysis and is usually to be improved. Other examples of response variables and metrics will be given in section 4.4. Each response variable value gathered in an experiment is termed an observation, and the analysis of all the observations will decide whether or not the hypothesis to be tested can be validated.
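As a concrete illustration of the metric mentioned above (the function name and the effort values are our own, not the chapter's), the difference between the estimate made and the real value can be computed per project, yielding one observation per elementary experiment:

```python
def estimation_error(estimate, actual):
    """Accuracy observation: difference between the estimate made and the real value."""
    return abs(estimate - actual)

# One observation per project (hypothetical effort values, in person-months)
projects = [(120, 100), (95, 100), (130, 100)]  # (estimate, actual)
observations = [estimation_error(e, a) for e, a in projects]
print(observations)  # [20, 5, 30]
```

Each value in `observations` is one observation of the response variable; the set of all such observations is what the later analysis operates on.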

The response variable is sometimes called the dependent variable. This term comes not from the field of experimental design but from another branch of mathematics. As we discussed in section 3.2, the goal of experimentation is usually to find a function that relates the response variable to the factors that influence it. Therefore, although the term dependent variable does not properly belong to experimental design, it is sometimes used.

• Parameters. Any characteristic (qualitative or quantitative) of the software project that is to be invariable throughout the experimentation is called a parameter. Parameters are, therefore, characteristics that do not influence, or that we do not want to influence, the result of the experiment, that is, the response variable. In the example of the estimation technique, there are other project characteristics that could influence the accuracy of the estimate: the experience of the project manager who makes the estimate, the complexity of the software system under development, etc. If we intend to analyse only the influence of the technique on the accuracy of the estimate, the other characteristics will have to remain unchanged from one experiment to another (the same level of experience, the same complexity of development, etc.). As we discussed in section 2.4, the parameters have to be set by similarity and not by identity. Therefore, the results of the experimentation will be particular to the conditions defined by the parameters. In other words, the facts or knowledge yielded by the experimentation will be true locally for the conditions reflected in the parameters. The knowledge output could only be generalised by considering the parameters as variables in successive experiments and studying their impact on the response variable. Section 4.3.4 lists other examples of parameters used in real experiments.

• Provoked variations or factors. Each software development characteristic to be studied that affects the response variable is called a factor. Each factor has several possible alternatives. Experimentation aims to examine the influence of these alternatives on the value of the response variable. Therefore, the factors of an experiment are any project characteristics that are intentionally varied during experimentation and that affect the result of the experiment. Taking the example of the estimation technique, the technique is actually the factor and its possible alternatives are: new technique, COCOMO, Putnam's method, etc. Other examples of factors used in real experiments will be given in section 4.3.4.

Factors are also called predictor variables or just predictors, as they are the
characteristics of the experiment used to predict what would happen with the
response variable. Another term, taken from mathematics and used for the
factors, is independent variables.

• Alternatives or levels. The possible values of the factors during each elementary experiment are called levels. This means that each level of a factor is an alternative for that factor. In our example, the alternatives would be: the new technique, COCOMO and Putnam's method, that is, the alternatives used for comparison.

The term treatment is often used for this concept of the alternatives of a factor in experimental design. This term dates back to the origins of experimental design, which was conceived primarily with agricultural experimentation in mind. The factors in this sort of study used to be insecticides for plants or fertilisers for land, for which the term treatment is quite appropriate. The term treatment is also correct in medical and pharmacological experimentation. A similar thing can be said for the term level, which is very appropriate for referring to the examination of different concentrations of chemical products, for example. The terms treatment and level in SE, however, can be appropriate on some occasions and not on others. So, we prefer to use the term alternative to refer to the values of a factor in this book. The alternatives of the factors of the experiments addressed in this book, such as COCOMO or Putnam's method, for example, are qualitative, as discussed above. Remember, though, that the response variables gathered in these experiments are quantitative. The aim of these experiments, then, is to determine the quantitative effect of some alternatives. Other quantitative experiments aim to find relationships between quantitative variables, such as, for example, the relationship between years of experience and productivity. As mentioned in Chapter 1, we are not going to address this sort of design, as there are many examples in the SE literature. However, experiments in which the values of the factors are qualitative are less common, and their results can go a long way towards expanding the body of knowledge of a discipline, particularly SE, which explains why they are the focus of this book.
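To make the factor/alternative vocabulary concrete, here is a small sketch (our own illustration; the second factor, project size, is borrowed from the replication discussion later in this chapter) that enumerates the elementary experiments implied by a set of factors and their alternatives:

```python
from itertools import product

# Factors of the inquiry and their alternatives (levels)
factors = {
    "estimation technique": ["new technique", "COCOMO", "Putnam's method"],
    "project size": ["small", "large"],  # illustrative second factor
}

# Each combination of alternatives defines one elementary experiment
design = list(product(*factors.values()))
print(len(design))  # 3 techniques x 2 sizes = 6 elementary experiments
```

Each tuple in `design` is one combination of alternatives to be applied to an experimental unit by a subject.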

Figure 4.1 shows the relationships among parameters, factors and response variables
in an experimentation.

[Figure 4.1 is a diagram summarising the experimental process in four stages: definition of objectives, design, execution and analysis. During the definition of objectives, the parameters of the inquiry are set (p1 = v1, p2 = v2), the factors of the inquiry are determined (f1, f2), the alternatives of each factor are chosen (f1: a, b, c; f2: three alternatives shown as Greek letters in the original figure) and the response variable rv is determined. During execution, each experiment 1 to n applies one combination of alternatives to an experimental unit under the fixed parameters and measures rv (x, y, ..., z). Data analysis of these measurements then yields findings on the influence of f1 and f2 on rv under the conditions p1 = v1, p2 = v2, which in turn serve to add/delete variables, detail goals and generalise the results.]

Figure 4.1. Relationship among Parameters, Factors and Response Variable in an Experimentation

• Interactions. Two factors A and B are said to interact if the effect of one
depends on the value of the other. The interactions between the factors used in
the experiments should be studied, as this interaction will influence the results of
the response variable. Therefore, the experimental designs that include
experiments with more than one factor (factorial designs discussed in Chapter 5)
examine both the effects of the different alternatives of each factor on the
response variable and the effects of the interactions among factors on the
response variable.
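A hedged numerical sketch of what interaction means (the response values are invented): with two factors at two alternatives each, compare the effect of factor A at each alternative of factor B; if the two effects differ, the factors interact.

```python
# Mean response for each combination of two factors (hypothetical data):
# A = estimation technique (A1, A2), B = project size (B1, B2)
y = {("A1", "B1"): 10, ("A2", "B1"): 11,
     ("A1", "B2"): 12, ("A2", "B2"): 20}

# Effect of switching A1 -> A2 at each alternative of B
effect_A_at_B1 = y[("A2", "B1")] - y[("A1", "B1")]  # 1
effect_A_at_B2 = y[("A2", "B2")] - y[("A1", "B2")]  # 8

# If the effect of A depends on the alternative of B, the factors interact
print(effect_A_at_B1 != effect_A_at_B2)  # True: A and B interact
```

Had the two effects been equal, the factors would not interact and each factor's influence could be reported separately.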

• Undesired variations or blocking variables: Although we aim to set the characteristics of an experiment that we do not intend to examine at a constant value, this is not always possible. There are inevitable, albeit undesired, variations from one experiment to another. These variations can affect several elements of the experiment: the subjects who run the experiment (not enough subjects with similar characteristics can be found to apply the different techniques); the experimental unit (it is not possible to get very similar projects on which to apply the different alternatives); the time when the experiment is run (each alternative has to be applied at different points in time), etc. In short, these variations can affect any conditions of the experiment. They are known as blocking variables and call for a special sort of experimental design, called block design (examined in Chapter 5).

• Elementary experiment or unitary experiment: Each experimental run on an experimental unit is called an elementary experiment or unitary experiment. This means that each application of a combination of alternatives of factors by an experimental subject on an experimental unit is an elementary experiment. Thus, in the example of the estimation techniques, the application of each technique to the requirements by a particular subject is an elementary experiment. As we have three techniques applied to the same requirements, this experiment is composed of three elementary experiments.

• External replication: As we said in Chapter 2, external replication is performed by independent researchers. Judd et al. (Judd, 1991) provide the following definition of external replication: “other researchers in other settings with different samples attempt to reproduce the research as closely as possible. If the results of the replication are consistent with the original research, we have increased confidence in the hypothesis that the original study supported”. We also said in Chapter 2 that exact replication is not possible in SE, as it is not possible to find identical subjects, identical units, etc. So, when replicating experiments, it is very important to categorise the differences between the original experiment and the replication. Basili et al. (Basili, 1999) divided the types of external replications into three groups:

1. Replications that do not alter the hypothesis:

(1.a) Replications that repeat the original experiment as closely as possible.

(1.b) Replications that alter the manner in which the first experiment was run.
For example, suppose we have an experiment that calls for the subjects to be trained in the techniques to be used and the subjects are sent a document describing the above techniques beforehand; a second experiment could be run giving the subjects classroom training instead.

2. Replications that alter the hypothesis:

(2.a) Replications that alter design issues, such as, for example, the detail
level of the specifications of a problem to be estimated.

(2.b) Replications that alter factors of the setting of the experiment, such as the type of subjects who participate (students, practitioners, etc.), the problem domain addressed, etc.

3. Replications that reformulate the goals and, hence, the hypothesis of the experiment: for example, suppose we have an experiment finding that a particular testing technique detects more errors of omission than of commission. The goal of a possible replication of the above experiment would be to distinguish which sorts of errors of omission or commission are best detected by the above technique. Thus, we could determine whether the technique is better at detecting errors of omission irrespective of the error type, or whether, for a particular error type, the technique detects omissions no better than commissions, etc.

Of these replications, the aim of group 2 is to generalise the results of the experiments, seeking to extend their applicability. Group 3 replications analyse the study in more detail; that is, they can be used to examine the question in more depth, getting more specific results from the experiments. On the other hand, group 1 replications serve only to reinforce the results of the original experiment, as they neither extend nor modify the original hypotheses.

Examples of the three categories of replicated experiments will be mentioned throughout the book.

• Internal replication. As mentioned in Chapter 2, the repetition of all or some of the unitary experiments in an experimentation is referred to as internal replication. If, for example, all the experiments of a study are repeated three times, it is said to be an experiment with three replications. As discussed in section 2.5, replication increases the reliability of the results of the experimentation. In our example, we may decide that we need six elementary experiments (equal to the number of combinations of factor alternatives): two for each estimation technique, one with a large and one with a small value for project size. This means that the values of the two identified factors are: new, COCOMO and Putnam's method for the estimation technique, and large and small for project size. So, we will test COCOMO on a large and a small project, and we will do the same with the other two techniques. Finally, to increase confidence in the results, we may decide to replicate each experiment three times in order to be surer about the values measured for the response variable. Remember that, as mentioned in Chapter 2, replication is based on similarity in SE. Hence, if we replicate each elementary experiment three times, we would then have to work on three similar small projects, it being practically impossible to find three exactly identical software projects. Similarly, we would have to find some similar large projects to carry out the replication. So, we would have 18 elementary experiments to be run by the experimental subjects. The ideal thing would be to assign a different subject to each of the 18 experiments, by means of which we could avoid undesired effects, as we will see in Chapter 5.

Since the effect of the subjects who apply the factors to the experimental units in SE can, as mentioned above, be very significant, there is also the possibility of running the replication on the subjects. Our example was originally composed of six elementary experiments (two per estimation technique). As discussed in Chapter 5, we should ideally have six subjects with similar characteristics to run one elementary experiment each. Each elementary experiment could be replicated using two similar subjects (that is, two subjects applying the same technique to the same program) to assure that the characteristics of the subjects have as little effect as possible on the experiment. Nonetheless, a better design would be to run the replication using subjects and programs, that is, have each elementary experiment replicated by two people, as in the above case, but adding a second large and a second small project. In this case, we would have 24 elementary experiments, run by 24 subjects, 12 subjects experimenting on one large and one small program and another 12 on another large and small program.
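The counting in this replication scheme can be sketched as follows (the names are illustrative, taken from the running example):

```python
from itertools import product

techniques = ["new technique", "COCOMO", "Putnam's method"]
sizes = ["small", "large"]
projects_per_size = 2   # a second small and a second large project
subjects_per_run = 2    # two similar subjects replicate each combination

# One elementary experiment per (technique, size, project, subject)
runs = [(t, s, p, subj)
        for t, s in product(techniques, sizes)
        for p in range(projects_per_size)
        for subj in range(subjects_per_run)]
print(len(runs))  # 3 x 2 x 2 x 2 = 24 elementary experiments
```

With one distinct subject per run, this reproduces the 24 elementary experiments run by 24 subjects described above.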

The number of replications to be run in each experiment has to be identified during the design process. Certain statistical concepts have to be applied, and knowledge of some characteristics of the population on which the experiment is run is required, to calculate this number. Indeed, Chapter 15 will discuss how to determine a minimum number of replications depending on how sure we need to be about the findings of the experiment.
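As a rough preview of the kind of calculation Chapter 15 formalises (this normal-approximation formula and its inputs are our own sketch, not the book's method), the number of replications needed to compare two alternatives can be estimated from the desired significance, power and expected effect size:

```python
import math
from statistics import NormalDist

def min_replications(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per alternative for a two-sample comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = z.inv_cdf(power)            # desired statistical power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(min_replications(1.0))   # 16 replications per alternative for a large effect
print(min_replications(0.5))   # 63 for a medium effect
```

The smaller the difference we hope to detect, the more replications the experiment needs, which is why this number must be fixed at design time.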

• Experimental error. Even if an experiment is repeated under roughly the same conditions, the observed results are never completely identical. The differences that occur from one repetition to another are called noise, experimental variations, experimental error or simply error. The word error is used not in a pejorative but in a technical sense. It refers to variations that are often inevitable. It is absolutely blame free. There are several possible sources of error, the most self-evident of which are errors in the measurement of the values of the response variable. However, the most interesting cause from the experimental viewpoint is unconsidered variations. This means that, by studying the experimental errors, a decision can be made on whether there is a source of variation in the experiments that has not been considered (either as a factor or as a blocking variable). This is a means of learning about the software development variables and their influence on the project results. Note that if an unknown variation of this sort is detected, it invalidates the results of the experimentation, which has to be repeated considering this new source of variation. This is what we called the stepwise approach to experimentation in section 3.1, that is, the experiments will be run in successive rounds where what has been learnt in one group of experiments will feed the following group.
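A minimal sketch (with invented observations) of how the spread among replications of one elementary experiment quantifies experimental error:

```python
from statistics import mean, stdev

# Three replications of the same elementary experiment under
# nominally identical conditions (hypothetical response values)
replications = [21.0, 19.5, 22.5]

# Differences from the mean are the experimental error;
# their spread estimates its magnitude
residuals = [y - mean(replications) for y in replications]
error_estimate = stdev(replications)
print(residuals)       # [0.0, -1.5, 1.5]
print(error_estimate)  # 1.5
```

If this spread turns out to be larger than the differences between alternatives, the analysis cannot attribute the observed differences to the factor, which is the sense in which error bounds what an experiment can claim.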

The fact that many researchers are not trained to deal with situations in which experimental error cannot be ignored has been a mighty obstacle for them. Caution is not only essential with regard to the possible effects of experimental error on data analysis; its influence is also a consideration of the utmost importance in experimental design. Therefore, an elementary knowledge of experimental error and the associated probability theory is essential for laying a solid foundation on which to build the design and analysis of experiments. Part III of the book will detail how to measure this error and its effects on experiments.

4.3. THE SOFTWARE PROJECT AS AN EXPERIMENT

4.3.1. Types of Variables in a Software Experiment

As we have mentioned, the goal of running experiments in SE is to improve software system development. This improvement will have to be set at some point or under some circumstance within the development project. We can consider that the basic components of the development project are: people (developers, users and others), products (the software system and all the intermediate products), problem (the need raised by the user and point of origin of the project) and process (the set of activities and methods that implement the project from start to finish).

It is evident that the software project depends on more than one factor (for example,
the people involved, the activities performed, the methods used for development,
etc.). A proper study of software development calls for the effects of each factor to
be isolated from the effects of all the other factors so that significant claims can be
made, for example, technique X speeds up the development of Y-type software.

Below, we suggest variables that may have an impact on the outcome of software
development and which, therefore, can be taken into account when experimenting
with software development. These variables can be selected as parameters, blocking
variables or factors, depending on the goal of the experiment.

Another point remains to be made concerning the suggested variables. This point is
related to the selected experimental unit. As mentioned earlier, an experimentation
in SE can be run on the whole or any part of the project. The same variable may
play different roles (as a factor or response variable, for example) depending on
what the experimental unit is. For example, suppose we want to determine the size
of the code for implementing one and the same algorithm using two different
programming languages. In this case, the algorithm to be developed would be the
experimental unit and code size would be the response variable in question.
However, if we chose to do another experiment to test two testing techniques, the
experimental unit in this case would be the piece of code, and size would be a
possible parameter or factor, as the result of the experiment could vary depending
on its value.

Therefore, if we take part of a development project as the experimental unit in our experiment, some of the possible factors and parameters will be the result of earlier phases of development, whereas if we take the entire project, these very same factors and parameters could be considered as response variables.

4.3.2. Sources of Variation in a Software Project

The variables (parameters, factors, blocking and response variables) of a SE experiment may have distinct origins, that is, their sources may differ. It may, therefore, be of interest to study the sources of the variables that can affect the software project in order to identify possible experimental parameters, factors and response variables. For this purpose, we recommend addressing the software project from two different perspectives: internal (inside) and external (outside) to the software project. Different sources of parameters, factors and response variables are identified for each perspective.

• External perspective. The software project is seen as a black box and we examine only the variables affecting it from the outside. These variables cannot be modified or adjusted from within the software project, as they are predefined, so they will have to be considered parameters of the experiment. Figure 4.2 shows the different sources that can influence a software project from the external perspective. User characteristics can affect the development process, as can the characteristics of the problem that we are trying to solve, the sources of information, some characteristics of the organisation at which the software is being developed, and customer constraints. Therefore, these are the sources of possible parameters and response variables in an experiment.

Figure 4.2. External parameters

• Internal perspective. The software project is viewed as a white box and we examine only the variables affecting it from the inside. These variables are configured at the start of or during the project. Depending on the goal of the experiment, these variables could be selected as parameters, factors or even response variables. Figure 4.3 shows the different sources that can influence a software project from the internal perspective. These internal sources are processes (composed of activities), methods, tools, personnel and products.

Figure 4.3. Internal parameters

So, having identified the possible sources of influence in the software project, we can start to analyse and then identify possible parameters for a SE experiment. An extensive list of possible sources of variables for a software project is given in Annex I. Although we have sought to be exhaustive so as to aid readers with their experiments, this should not be taken to mean that the list is comprehensive or that readers cannot select other variables apart from those listed in this annex. Therefore, readers who are using this book to prepare a particular SE experiment can make use of this information to select given parameters. Some of the factors and parameters used in real SE experiments are referred to below.

4.3.3. Parameters and Factors Used in Real SE Experiments

Table 4.1 shows some examples of factors and parameters used in real experiments. With regard to parameters, it has to be said that it is difficult to find an accurate description of this sort of variable in the experimental SE literature, as many are not described explicitly in the references. This also makes it difficult to replicate the experiments, since the conditions of the experiments are not exhaustively described. Therefore, Table 4.1 describes the parameters that have been mentioned explicitly in some experiments. This does not, however, mean that these were the only ones taken into account.

As far as the factors shown in the table are concerned, note that factor selection depends on the goal of the experiment in question. Factors are not unique, however: two experiments that share the same overall goal may use different factors and parameters. This choice also depends on the conditions and possible constraints (time, subjects, development conditions, etc.) under which each experiment is run.

Table 4.1. Examples of factors and parameters in real experiments

Goal: Studying the effect of different testing techniques on the effectiveness of the testing process (Basili, 1987)
- Factors: software testing techniques (code reading, functional testing, structured testing); program types (three different programs); subject level of expertise (advanced, intermediate, junior)
- Parameters: testing process (first training, then three testing sessions and then a follow-up session); program size; familiarity of subjects with editors, terminal machines and the program implementation language (good familiarity); high-level language for implementing programs

Goal: Studying the effect of testing techniques on the effectiveness and efficiency at revealing failures (Kamsties, 1995)
- Factors: inspection technique (code reading, functional testing, structured testing); program types (three different programs); subjects (six groups of similar subjects); order of applying techniques
- Parameters: order in which subjects inspect programs (first program 1, then program 2, and then program 3); implementation language (C); problem complexity (low); subjects from a university lab course

Goal: Studying the ease of creating a program using an aspect-oriented approach and an OO approach (Murphy, 1999)
- Factors: programming approach (AspectJ, Java)
- Parameters: problem complexity (low); application type (program with concurrency); subjects from a university course

Goal: Studying the effectiveness of methods for isolating faulty modules (Porter, 1992)
- Factors: method (classification tree analysis, random sampling, largest module)
- Parameters: modules with a specific kind of fault; module domain (NASA environment); implementation language (Fortran)

Goal: Studying the quality of code produced using a functional language and an OO language (Harrison, 1996)
- Factors: programming language (SML, C++)
- Parameters: problem domain (image analysis); specific development process; subjects experienced in both programming languages
Basics of Software Engineering Experimentation 69

Goal: Studying the effect of cleanroom development on the process and on the product (Selby, 1987)
- Factors: development process (cleanroom, non-cleanroom)
- Parameters: subjects from a university course; similar professional experience, academic performance and implementation language experience; problem description (an electronic message system); implementation language (Simpl-T); development machine (Univac 1100/82)

Goal: Studying the best way of assessing changeability decay (Arisholm, 1999)
- Factors: approaches to assessing changeability decay (benchmarking, structure measurement, change complexity analysis)
- Parameters: problem description (commercial project for an airline); implementation language (Visual Basic 6)

Goal: Studying whether organisational structure has an effect on the amount of effort expended on communication between developers (Seaman, 1998)
- Factors: organisational distance (close: all participants report to the same manager; distant: at least one participant from a different management area); physical distance (same corridor, same building, separate building); present familiarity (degree of interaction among participants); past familiarity (degree to which a set of participants have worked together on past projects)
- Parameters: specific part of the development process (process inspection); implementation language (C++); problem description (a mission planning tool for NASA); number of participants in the inspection process (around 20); use of a specific approach for inspections described in the paper

4.4. RESPONSE VARIABLES IN SE EXPERIMENTATION

As we have already mentioned, response variables reflect the data that are collected
from experiments. They are, therefore, variables that can only be measured a
posteriori, after the entire experiment (the software project or the respective phase
or activity) has ended. Remember that the response variables with which we are
concerned in this book must provide a quantitative measure that will be studied
during the process of analysis. These variables have to represent the effect of the
different factor alternatives on the experimental units in question. For example,
suppose that we want to evaluate the accuracy of two estimation techniques; the
response variable has to measure accuracy. Alternatively, if we want to quantify
the time saving of using as opposed to not using a code generator, the response
variable would be the time taken to program certain specifications with each
alternative. Note, then, that the response variable of an experiment depends
mainly on the goal and hypothesis of the experiment in question, although more
than one response variable can be gathered for one and the same experiment, as
shown in Table 4.6. This will involve running several analyses, one for each
response variable.

The possible response variables that we can identify in software experiments can
measure characteristics of the development process, of the methods or tools used, of
the team or of the different products output during the above development process.
Table 4.2 shows some of the response variables related to each component.

Specific measures or, alternatively, metrics have to be used to get the
individual values of these response variables. The relationship between these two
concepts is discussed in the following section.

Table 4.2. Examples of response variables in SE experiments

Development process: schedule deviation, budget deviation, process compliance
Methods: efficiency, usability, adaptability
Resources: productivity
Products: reliability, portability, usability of the final product, maintainability, design correctness, level of code coverage

4.4.1. Relationship between Response Variables and Metrics

The response variables of an experiment are closely related to the concept of
metric used in software development. Indeed, as mentioned earlier, these
variables are measured by metrics applied to the products or deliverables that
are output during development, to the development process or any of its
activities, and to the resources involved in that development.

Response variables can be likened to what are referred to as product, process or
resource attributes in the literature on software metrics (Fenton, 1997). Fenton
and Pfleeger classify these attributes as internal and external. Table 4.3 shows
some product-, process- and resource-related attributes arranged according to
this classification. The internal attributes of a product, process or resource
are those that can be measured purely in terms of the product, process or
resource itself; in other words, an internal attribute can be measured by
examining the entity as distinct from its behaviour. The external attributes, on
the other hand, are those that can only be measured with regard to how the
product, process or resource relates to its environment; here the behaviour of
the entity is more important than the entity itself.

Consider code, for example. An internal attribute could be its size (measured, for
example, as the number of lines of code) or we could even measure its quality by
identifying the number of faults found when it is read. However, there are other
attributes that can only be measured when the code is executed, like the number of
faults perceived by the user or the user’s difficulty in navigating from screen to
screen, for example. Table 4.3 shows other internal and external attributes for
products, processes and resources.
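For instance, the internal size attribute of code and a derived quality indicator can be computed directly from the source text. The following is a minimal sketch of our own in Python (a simplification: real tools also handle block comments and strings):

```python
def ncloc(source: str) -> int:
    """Non-comment lines of code (NCLOC): lines that are neither
    blank nor pure '#' comments."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count

def fault_density(faults_found: int, source: str) -> float:
    """External quality indicator from Table 4.3: defects per line of code."""
    return faults_found / ncloc(source)

example = """# compute factorial
def fact(n):
    if n == 0:
        return 1
    return n * fact(n - 1)
"""
print(ncloc(example))            # 4 code lines
print(fault_density(1, example)) # 0.25
```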

Table 4.3 also shows some metrics that could be applied to evaluate the
respective attributes as response variables in SE experimentation. This table is
not designed as a comprehensive guide to software metrics; it simply provides
readers with some examples that can be used to measure given attributes (or
response variables). Note that the table includes no response variables or
metrics related to methods or tools, for example. However, some response
variables used to evaluate a finished product, like usability, efficiency, etc.,
can be applied for this purpose.

The metrics included in Table 4.3 actually depend on the (entity, attribute)
pair, where the entity column lists products, separate parts of the process or
resources. More than one metric can be applicable to the same (entity,
attribute) pair: the (code, reliability) pair, for example, can be measured by
the number of faults in a time t or by the mean time between failures. This
table is far from being a full list of metrics for application in software
development; it simply gives examples of some of these measures.
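As an illustration of the two metrics for the (code, reliability) pair, here is a sketch of our own, assuming a hypothetical failure log recorded in hours since release:

```python
def faults_in_interval(failure_times, t):
    """Number of failures observed up to time t (hours)."""
    return sum(1 for ft in failure_times if ft <= t)

def mean_time_between_failures(failure_times):
    """MTBF: average gap between consecutive failures."""
    times = sorted(failure_times)
    gaps = [b - a for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)

failures = [5.0, 20.0, 50.0, 110.0]   # hypothetical failure log
print(faults_in_interval(failures, 60))      # 3 faults in the first 60 hours
print(mean_time_between_failures(failures))  # 35.0 hours
```

Both functions summarise the same failure log, which is precisely why the choice of metric for a given (entity, attribute) pair has to be stated explicitly in an experiment.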

When working with metrics, we need to consider the different sorts of measurement
scale. The most common scale types are: nominal, ordinal, interval and ratio
(Fenton, 1997) (Kitchenham, 1996).

a. Nominal scales are actually mere classifications dressed up as numerical
   assignations. The values assigned to objects have neither a quantitative nor a
   qualitative meaning. They simply act as mere classes of equivalence of the
   classification.
b. Ordinal scales are actually mere relationships of comparison dressed up as
   numerical assignations. In this case, the values assigned to the objects do not
   have a quantitative meaning and act as mere marks that indicate the order of
   the objects.
c. Interval scales represent numerical values, where the difference between each
consecutive pair of numbers is an equivalent amount, but there is no real zero
value. On an interval scale, 2-1 = 4-3, but two units are not twice as much as
one unit.
d. Ratio scales are similar to interval scales, but include the absolute zero. On a
ratio scale, two units are equivalent to twice the amount of one unit.
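These distinctions can be made operational: which summary statistic is legitimate depends on the scale type. A sketch using Python's statistics module (the data sets are our own invented examples):

```python
from statistics import mode, median, mean

# Nominal: testing-method categories; only counts and the mode are meaningful.
methods = ["inspection", "unit", "inspection", "system"]
print(mode(methods))  # 'inspection'

# Ordinal: CMM-like maturity levels; the median is meaningful,
# but arithmetic on the level numbers is not.
cmm_levels = [1, 2, 2, 3, 5]
print(median(cmm_levels))  # 2

# Interval: six-monthly periods since the start of a measurement
# programme; differences are meaningful, ratios are not (no true zero).
periods = [0, 1, 2, 3]
print(mean(periods))  # 1.5

# Ratio: lines of code has a true zero, so "twice as long" makes sense.
loc = [100, 200, 400]
print(loc[2] / loc[0])  # 4.0, a valid ratio comparison
```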
Table 4.3. Examples of software attributes and metrics

Products
- Specifications
  - Internal: size (number of classes; number of atomic processes); reuse (number of classes used without change); functionality (number of function points); syntactic correctness (number of syntactic faults)
  - External: comprehensibility (hours that an external analyst takes to understand the specifications); maintainability (person-months spent in making a change)
- Designs
  - Internal: size (number of modules); reuse (number of modules used without change); coupling (number of interconnections per module); cohesiveness (number of modules with functional cohesion/total number of modules)
  - External: maintainability (number of modules affected by a change in another one)
- Code
  - Internal: size (non-comment lines of code, NCLOC); complexity (number of nodes in a control flow diagram; McCabe's cyclomatic complexity)
  - External: quality (defects/LOC); usability (hours of training before independent use of a program); maintainability (days spent in making a change); efficiency (execution time); reliability (number of faults in a time t; mean time between failures)

Processes
- Overall process
  - Internal: time (months from start to finish of the development)
  - External: schedule deviation (estimated months/real months)
- Constructing specifications
  - Internal: effort (person-months from start to finish of the activity)
  - External: stability of requirements (number of requirements changes)
- Testing
  - Internal: time (months from start to finish of the activity); effort (person-months from start to finish of the activity)
  - External: cost-effectiveness (number of detected defects/cost of the testing activity); quality (number of detected defects/number of existing defects)

Resources
- Personnel
  - Internal: cost ($ per month); experience (years of experience)
  - External: productivity (number of function points implemented/person-month)
- Teams
  - Internal: size (number of members)
  - External: productivity (number of function points implemented/team-month)
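One of the internal code metrics in Table 4.3, McCabe's cyclomatic complexity, can be approximated by counting branching constructs. The sketch below is our own simplification (it ignores boolean operators and comprehension conditions, which full tools also count):

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe's cyclomatic complexity:
    1 plus the number of branching constructs."""
    tree = ast.parse(source)
    branches = sum(
        isinstance(node, (ast.If, ast.For, ast.While, ast.ExceptHandler))
        for node in ast.walk(tree)
    )
    return 1 + branches

func = """
def classify(x):
    if x < 0:
        return "negative"
    for _ in range(3):
        if x > 10:
            return "large"
    return "small"
"""
print(cyclomatic_complexity(func))  # 4: one base path plus three branch points
```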
Table 4.4 shows examples of these scales both inside and outside SE. This table
also shows some constraints on the mathematical operators that can be applied to
each one. As discussed in Chapter 6, the scale type is important insofar as it
determines the sort of data analysis method to be applied to get the respective
conclusions.

Table 4.4. Measurement scale types

Nominal
- Examples outside SE: colours: 1. white, 2. yellow, 3. green, 4. red, 5. blue, 6. black
- Examples inside SE: testing methods: type I (design inspections), type II (unit testing), type III (integration testing), type IV (system testing); fault types: type 1 (interface), type 2 (I/O), type 3 (computation), type 4 (control flow)
- Constraints: categories cannot be used in formulas, even if you map your categories to integers. We can use the mode and percentiles to describe nominal data sets.

Ordinal
- Examples outside SE: the Mohs scale for the hardness of minerals, or scales for measuring intelligence
- Examples inside SE: ordinal scales are often used for adjustment factors in cost models based on a fixed set of scale points, such as very high, high, average, low, very low. The SEI Capability Maturity Model (CMM) classifies development on a five-point ordinal scale.
- Constraints: scale points cannot be used in formulas; for instance, 2.5 on the SEI CMM scale is not meaningful. We can use the median and percentiles to describe ordinal data sets.

Interval
- Examples outside SE: temperature scales: -1 degree centigrade, 0 degrees centigrade, 1 degree centigrade, etc.
- Examples inside SE: if we have been recording resource productivity at six-monthly intervals since 1980, we can measure time since the start of the measurement programme on an interval scale starting with 01/01/1980 as 0, followed by 01/06/1980 as 1, etc.
- Constraints: we can use the mean and standard deviation to describe interval data sets.

Ratio
- Examples outside SE: length, mass
- Examples inside SE: the number of lines of code in a program is a ratio scale measure of code length
- Constraints: we can use the mean, standard deviation and geometric mean to describe ratio data sets.

4.4.2. How to Identify Response Variables and Metrics for a SE Experiment

The identification of response variables and metrics in an experiment is an
essential task if the experiment in question is to be significant. The concept
of response variable is often used interchangeably with the concept of metric in
the literature on SE experiments; that is, when the response variables of an
experiment are mentioned, the metrics that will be used are sometimes directly
specified, and the two terms are thus used as synonyms.

This is the approach proposed by Basili et al. (Basili, 1994), called Goal-Question-
Metric (GQM), which has been successfully used in several experiments (Shull, 2000)
(Basili, 1987) (Lott, 1996) (Kamsties, 1995) for identifying response variables
(which are directly metrics). This approach involves defining the goal of the
experiment, then generating a set of questions whose answers will help us to
determine whether the proposed goal has been met and, finally, analysing each
question to identify which metrics are needed to answer it.

Let’s take a look at an application of GQM in a real experiment to show how useful
it is for choosing the metrics of an experiment. Kamsties (1995) and Lott (1996)
applied this approach to get the metrics of an experiment that aims to study several
testing techniques. In Table 4.5, we describe the goals defined by the authors, as
well as the questions and response variables considered. Note that one and the same
response variable can be useful for answering different questions, like, for example,
the experience of the subjects, which is used in questions Q.1.2, Q.2.2, Q.3.2 and
Q.4.2. Thus, the GQM provides a structured and gradual means of determining the
response variables to be considered in an experiment, where the choice of the above
variables is based on the goal to be achieved by the above experiment.
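A GQM decomposition of this kind can be written down as a simple goal-to-questions-to-metrics mapping. The sketch below is our own encoding, with abbreviated question wording (the full questions are in Table 4.5); it shows how one metric, such as the subject's experience, can serve questions under several goals:

```python
# Hypothetical encoding of a GQM tree: goal -> questions -> metrics.
gqm = {
    "G.1 Effectiveness at revealing failures": {
        "Q.1.1 % of possible failures revealed/recorded": [
            "number of different possible failures",
            "number of revealed deviations recorded",
        ],
        "Q.1.2 effect of subject experience on that %": [
            "experience with the language (0-5)",
            "experience with the language (years)",
        ],
    },
    "G.2 Efficiency at revealing failures": {
        "Q.2.2 effect of subject experience on failure classes/hour": [
            "experience with the language (0-5)",
            "experience with the language (years)",
        ],
    },
}

def metrics_for(gqm_tree):
    """Collect the distinct metrics; metrics shared by several
    questions appear only once."""
    found = set()
    for questions in gqm_tree.values():
        for metric_list in questions.values():
            found.update(metric_list)
    return sorted(found)

for m in metrics_for(gqm):
    print(m)
```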

4.4.3. Response Variables in Real SE Experiments

In this section we present some response variables found in the SE
experimentation literature. As we said before, in the case of a software
experiment the response variables, which will be assessed by means of the
metrics under consideration, depend on the goal of the experiment in question,
the resources available for running the experiment, the conditions under which
the experiment is run, etc.
Thus, for example, Table 4.6 shows some response variables (in this case, metrics
Table 4.5. Example of GQM application to identify response variables in an experiment

Goals:
- G.1. Effectiveness at revealing failures
- G.2. Efficiency at revealing failures
- G.3. Effectiveness at isolating faults
- G.4. Efficiency at isolating faults

Questions:
- Q.1.1. What percentage of total possible failures did each subject reveal and record?
- Q.1.2. What effect did the subject's experience with the language or motivation for the experiment have on the percentage of total possible failures revealed and recorded?
- Q.2.1. How many unique failure classes did the subject reveal and record per hour?
- Q.2.2. What effect did the subject's experience with the language or motivation for the experiment have on the number of unique failure classes revealed and recorded per hour?
- Q.3.1. What percentage of total faults (that manifested themselves in failures) did each subject isolate?
- Q.3.2. What effect did the subject's experience with the language or motivation for the experiment have on the percentage of total faults isolated?
- Q.4.1. How many faults did the subject isolate per hour?
- Q.4.2. What effect did the subject's experience with the language or motivation for the experiment have on the number of faults isolated per hour?

Metrics (the original table cross-references each metric with the questions it helps to answer; for instance, the subject's experience metrics serve Q.1.2, Q.2.2, Q.3.2 and Q.4.2):
- Number of different, possible failures
- Subject's experience with the language (estimated on a scale from 0-5)
- Subject's experience with the language (measured in years of working with it)
- Subject's mastery of the technique (estimated on a scale from 0-5)
- Number of times a test case caused a program's behaviour to deviate from the specified behaviour
- Number of revealed deviations that the subject recorded
- Amount of time the subject required to reveal and record the failures
- Number of faults present in the program
- Number of faults that manifested themselves as failures
- For all faults that manifested themselves as failures, the number of those faults that were isolated
- Amount of time the subject required to isolate faults
directly) employed in real experiments, alongside the goal pursued by each
experiment. This illustrates how the response variables depend on the above
goal. Note how it is possible to measure several response variables for just one
experiment. This will involve an independent analysis for each one, and a joint
interpretation of the separate analyses in order to give some response about the
defined goal (remember that data analysis will be examined in Part III of this book).

Table 4.6. Examples of response variables in real SE experiments

- Goal: Studying the effect of three testing techniques on the effectiveness of the testing process (Basili, 1987)
  Response variables: number of faults detected; percentage of faults detected; total fault detection time; fault detection rate
- Goal: Studying the effectiveness of different capture-recapture models to predict the number of remaining defects in an inspection document (Briand, 1997)
  Response variables: RE = (estimated no. defects - actual no. defects)/actual no. defects
- Goal: Studying the performance of meeting inspections compared to individual inspections (Fusaro, 1997)
  Response variables: meeting gain rate (percentage of defects first identified at the meeting); meeting loss rate (percentage of defects first identified by an individual but not included in the meeting report)
- Goal: Studying the degree of inheritance in friend C++ classes (Counsell, 1999)
  Response variables: depth of inheritance tree (maximum level of the inheritance hierarchy of a class)
- Goal: Studying the performance advantage of interacting groups over average individuals (Land, 1997)
  Response variables: number of true defects (defects that need rework); number of false positive defects (defects that require no repair); net defect score (number of true defects - number of false positives)
- Goal: Studying performance between individuals performing tool-based inspections and those performing paper-based inspections (Macdonald, 1998)
  Response variables: number of defects found after a given time period
- Goal: Studying the effect on the productivity of the development team on projects with accurate cost estimation (Mizuno, 1998)
  Response variables: TP = SLC/EFT, where SLC is the size of the delivered code and EFT is the total amount of effort needed in the development (person-months)
- Goal: Studying the impact on the number of faults for those projects that have correctly applied specific guidelines provided by a software engineering process group (Mizuno, 1999)
  Response variables: Ureview/total = (faults detected during the design phase)/(faults detected during the design phase + faults detected during the debug phase + faults detected during six months after code development) x 100; Utest/total = (faults detected during the debug phase)/(faults detected during the design phase + faults detected during the debug phase + faults detected during six months after code development) x 100
- Goal: Studying the accuracy of analogy-based estimation compared with regression model-based estimation (Myrtveit, 1999)
  Response variables: ((actual effort - estimated effort)/actual effort) x 100
- Goal: Studying the quality of structured versus object-oriented languages on the development process (Samaraweera, 1998)
  Response variables: number of known errors found during execution of test scripts; time to fix the known errors; number of modifications requested during code reviews, testing and maintenance; time to implement modifications; development time; testing time
- Goal: Studying the quality of structured versus object-oriented languages on the delivered code (Samaraweera, 1998)
  Response variables: number of non-comment, non-blank source lines; number of distinct functions called; number of domain-specific functions called; depth of the function call hierarchy chart
- Goal: Studying the effect of Cleanroom development on the product developed (Selby, 1987)
  Response variables: test cases passed; number of source lines; number of executable statements; number of procedures and functions; completeness of the implementation as a function of compliance with certain requirements
- Goal: Studying the effect of Cleanroom development on the development process (Selby, 1987)
  Response variables: efficiency with which subjects think that they applied off-line software review techniques(1); CPU time used by subjects; number of deliveries
- Goal: Studying the effect of using a predefined process versus letting developers use a self-defined process on the size of the systems (Tortorella, 1999)
  Response variables: number of tables in the database; number of modules in the structure chart
- Goal: Studying the effect of using a predefined process versus letting developers use a self-defined process on the defects in the execution of the process (Tortorella, 1999)
  Response variables: number of activities included in the process and not executed; number of deliverables expected and not produced; number of activities executed incorrectly

(1) The authors indicate that this response variable can be somewhat subjective.
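Several of the response variables above are simple ratios, so they are easy to compute once the raw counts are collected. A sketch of three of them in Python (function and variable names are ours):

```python
def relative_error(estimated_defects, actual_defects):
    """RE from (Briand, 1997): (estimated - actual) / actual."""
    return (estimated_defects - actual_defects) / actual_defects

def team_productivity(delivered_loc, effort_person_months):
    """TP = SLC / EFT from (Mizuno, 1998)."""
    return delivered_loc / effort_person_months

def u_review_total(design_faults, debug_faults, six_month_faults):
    """Ureview/total from (Mizuno, 1999), as a percentage."""
    total = design_faults + debug_faults + six_month_faults
    return 100 * design_faults / total

print(relative_error(120, 100))     # 0.2, a 20% over-estimate
print(team_productivity(15000, 30)) # 500.0 LOC per person-month
print(u_review_total(40, 50, 10))   # 40.0
```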

4.5. SUGGESTED EXERCISES


4.5.1. An aeronautics software development laboratory aims to identify the best
two of four possible programming languages (Pascal, C, PL/M and
FORTRAN) in terms of productivity, which are to be selected to implement
two versions of the same flight control application so that if one fails the
other comes into operation. There are 12 programmers and 30 modules
with similar functionalities to flight control applications for the experiment.
The individual productivity of each programmer differs, which could affect
the productivity observed in the experiment. Specify what the factors,
alternatives, blocking variables, experimental subjects and objects, and
parameters of this experiment would be. What would a unitary experiment involve?

Solution: factor: programming language;
alternatives: Pascal, C, PL/M and FORTRAN;
blocking variable: 12 programmers;
subjects: 12 programmers;
experimental objects: 30 modules;
response variable: mean productivity in terms of months/person, for example;
parameters: flight control domain modules, similar complexity;
a unitary experiment would involve the implementation of one of the modules
by one of the subjects in a given language.
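A unitary experiment here is one (language, programmer, module) combination, and the space of such combinations can be enumerated mechanically. The sketch below is our own illustration, not the actual assignment plan (a real design would balance languages across programmers and modules, as the next chapter discusses):

```python
from itertools import product

languages = ["Pascal", "C", "PL/M", "FORTRAN"]   # factor alternatives
programmers = [f"P{i}" for i in range(1, 13)]    # blocking variable / subjects
modules = [f"M{j}" for j in range(1, 31)]        # experimental objects

# A unitary experiment = one (language, programmer, module) combination.
all_unitary = list(product(languages, programmers, modules))
print(len(all_unitary))   # 1440 possible unitary experiments (4 x 12 x 30)
print(all_unitary[0])     # ('Pascal', 'P1', 'M1')
```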

4.5.2. An educational institution is considering whether deploying an
intelligent tutoring system to teach OO would improve the quality of
instruction in this discipline. For this purpose, it decides to compare
the result of a test on this subject taken by students who have used the
intelligent tutor with the result of the same test taken by students who have
used traditional printed material. None of the students will be acquainted
with the domain; the instructors will not interact with the students, which
means that the subject matter will not be explained by the instructors in
question; all the students will be of the same age; they will all be given the
same time to do the test; the test will be the same; and the motivation will
also be the same, that is, none of the students will receive anything in
return. What are the factors and parameters of the experiment, the blocking
variables, the experimental subjects and objects and the response variable?
What would a unitary experiment involve?

Solution: factor: system of instruction;
parameters: students unfamiliar with the domain;
same test; same time; same motivation; same age;
no interaction with instructors;
block: none;
subjects: students;
experimental objects: test;
response variable: test grade;
unitary experiment: a student is taught according to a particular system of
instruction and takes the test in question.
