Gravino 2015
Article info
Article history:
Received 5 February 2014
Received in revised form 28 November 2014
Accepted 17 December 2014
Available online 25 December 2014
Abstract
Objective: The main objective is to investigate whether the comprehension of object-oriented source-code improves when the code is supplemented with UML class and sequence diagrams
produced in the software design phase.
Methods: We conducted a controlled experiment and a differentiated replication with
young software maintainers. In particular, groups of Bachelor and Master students were
involved.
Results: The results show that more experienced participants comprehend
source-code better when it is supplemented with UML design models. An average improvement (or benefit)
of circa 12% was achieved when the participants accomplished the comprehension task
with UML class and sequence diagrams. An analysis of the time to
accomplish comprehension tasks showed that less experienced participants spent
significantly more time when comprehending source-code with UML design models. These
participants spent on average 44.8% of the time to accomplish the same task with source-code alone.
Implications: Giving UML design models to maintainers who are not adequately
experienced with the UML does not help them comprehend source-code. Furthermore, the less
experienced the participants, the more time they need to accomplish a comprehension task with
UML diagrams.
© 2014 Elsevier Ltd. All rights reserved.
Keywords:
Design models
Controlled experiment
Source-code comprehension
1. Introduction
Several issues (e.g., technical and managerial) contribute to the cost to execute comprehension tasks and might
affect the comprehension of source-code [1]. For example,
the absence of software documentation might impact on
This paper has been recommended for acceptance by Shi Kho Chang.
Authors: C. Gravino, G. Scanniello, G. Tortora.
http://dx.doi.org/10.1016/j.jvlc.2014.12.004
1045-926X/© 2014 Elsevier Ltd. All rights reserved.
specifically focus on source-code comprehension tasks supported by UML class and sequence diagrams, while the
authors consider UML based documentation (a use case
diagram, sequence diagrams for each use case, and a class
diagram) on modification tasks performed both on UML
diagrams and source-code. In some sense, our work fills a
gap in that work, explicitly considering those diagrams that
are ignored there. Furthermore, the authors do not focus
their study on the models produced in a given phase of the
development process: models produced in the requirement
engineering process and design phase have been considered
together. Another difference with respect to our study is that
the effect of experience and ability is not analyzed at all.
Dzidek et al. [12] investigate the costs and benefits in
using the UML to maintain and evolve software systems.
The authors conduct a controlled experiment with professional programmers. The results reveal that the use of the
UML significantly impacts the functional correctness of the
maintenance operations. Conversely, the use of the UML
does not significantly affect the time to perform maintenance operations. This result is corroborated in our
study: the effect of UML diagrams is not significant in task
completion time. Hence, the use of the UML is not a cause of
distraction when software maintainers have experience
with that notation. The main difference with respect to our
investigation is that the focus of the controlled experiment
is not the comprehension of source-code.
Staron et al. [13] show the results of a series of
controlled experiments with students and professionals
on UML stereotypes represented by ad hoc icons. The
authors assess the effectiveness of these stereotypes in
UML class diagrams on the tasks of comprehending object-oriented applications in the telecommunication domain.
The use of stereotypes significantly improves the comprehension of the considered applications. As opposed to the
present study, the effect of UML behavioral diagrams (i.e.,
sequence diagrams) is not investigated.
Genero et al. [14] present a controlled experiment with
77 undergraduate students which studies the influence of
stereotypes in the comprehension of UML sequence diagrams. The effect of stereotypes is not statistically significant. However, the results show a slight tendency in favor
of stereotypes. The effect of sequence diagrams both using
and not using stereotypes is not analyzed with respect to
source-code comprehension.
2.2. Effect of the ability and experience on UML
comprehension tasks
As far as the influence of participants' ability and experience on the execution of comprehension tasks is concerned,
a few empirical investigations have been conducted. For
example, Briand et al. [7] establish that training is required to
achieve better results when the UML is coupled with the OCL
(Object Constraint Language). The authors focus on models
produced in the requirements engineering process and
consider in their investigation three typical activities: (i)
understanding the analysis document; (ii) modifying the
analysis document; and (iii) detecting defects in the analysis
document. The authors find that the OCL has the potential to
improve an engineer's ability to understand, inspect, and
3. Controlled experiments
We have conducted a survey on the role of the UML in
the Italian software industry [17]. The results suggest that the
core business of the interviewed companies mostly concerns
the development and the maintenance of software systems
implemented with object-oriented programming languages.
The greater part of these companies uses UML class and
sequence diagrams produced in both the requirements
engineering process and design phase (referred to in what
follows as requirements and design models, respectively).
Another result of this survey is that maintenance operations
are performed by practitioners with a few years of experience. The companies generally employ people with a Bachelor or a Master degree in Computer Science with less than
5 years of experience. To perform maintenance operations
the companies spend from 1 to 5 person-hours for typical
corrective changes,² while the average effort ranges from 10
to 50 person-hours for perfective changes.³
² A reactive modification performed after the delivery of a software
system to correct discovered problems.
³ A modification performed after the delivery of a software system to
improve its performance or maintainability.
3.1. Definition
Applying the Goal Question Metric (GQM) paradigm
[23], the goal of the experiments presented in the paper
can be defined as follows:
Analyze source-code comprehension
⁴ www2.unibas.it/gscanniello/UMLDesignModSourceCode/
Table 1
Experiment design.
Run    Group A    Group B    Group C    Group D
Run 1  S1, Mo     S1, NO_Mo  S2, Mo     S2, NO_Mo
Run 2  S2, NO_Mo  S2, Mo     S1, NO_Mo  S1, Mo

Table 2
Post-experiment survey questionnaire.
Id  Question  Possible answers
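The design in Table 1 can be checked mechanically. The following is a minimal sketch, assuming the table is read so that each group works on both systems and with both methods across the two runs; the function name `is_counterbalanced` is introduced here for illustration only and does not come from the paper.

```python
from itertools import product

# Design as read from Table 1: one (system, method) pair per run.
# Mo = source-code plus UML models, NO_Mo = source-code alone.
DESIGN = {
    "Group A": [("S1", "Mo"), ("S2", "NO_Mo")],
    "Group B": [("S1", "NO_Mo"), ("S2", "Mo")],
    "Group C": [("S2", "Mo"), ("S1", "NO_Mo")],
    "Group D": [("S2", "NO_Mo"), ("S1", "Mo")],
}


def is_counterbalanced(design):
    """Return True if every group uses each system and each method once,
    and every (system, method) pair occurs exactly once in each run."""
    for runs in design.values():
        systems = sorted(system for system, _ in runs)
        methods = sorted(method for _, method in runs)
        if systems != ["S1", "S2"] or methods != ["Mo", "NO_Mo"]:
            return False
    all_pairs = set(product(("S1", "S2"), ("Mo", "NO_Mo")))
    for run in (0, 1):
        if {runs[run] for runs in design.values()} != all_pairs:
            return False
    return True


print(is_counterbalanced(DESIGN))
```

A design in which one group used Mo in both runs would fail the first check, which is why the paired analysis of Method relies on this layout.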
Table 3
Performed analyses.
Factor/cofactors and their interaction  Investigation
Method                                  Wilcoxon test
Ability                                 Mann–Whitney test
Experience                              Mann–Whitney test
Method vs. Ability                      Interaction plot
Method vs. Experience                   Interaction plot
System                                  Mann–Whitney test
Order of method                         Mann–Whitney test
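To illustrate the paired comparison listed for Method in Table 3, the sketch below computes the rank sums of the Wilcoxon signed-rank test in plain Python. It is not the authors' analysis script: the scores are hypothetical, the function name is an assumption, and only the statistic is computed (no p-value), which a statistics package would normally provide.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Zero differences are dropped and tied absolute differences
    receive the average of the ranks they span.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical comprehension scores of six participants (not from the paper).
mo = [14, 13, 15, 12, 14, 11]
no_mo = [12, 13, 13, 13, 11, 10]
print(wilcoxon_signed_rank(mo, no_mo))
```

The two rank sums always add up to n(n + 1)/2, where n is the number of non-zero differences; a large imbalance between them is what the test evaluates.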
Table 4
Descriptive statistics of the comprehension dependent variable.
Experiment  System  Mo                                 NO_Mo
                    Min  Max  Med    Mean   Std. Dev.  Min  Max  Med    Mean   Std. Dev.
UniSa       All     10   15   14     13.06  1.53       9    14   12     11.75  1.29
UniSa       S1      11   14   14     13.12  1.25       9    13   11.5   11.62  1.41
UniSa       S2      10   15   13.5   13     1.85       10   14   12     11.88  1.25
UniBas      All     8    14   12     11.88  1.71       9    13   12     11.88  1.47
UniBas      S1      10   14   12     12     1.07       9    13   12     11.88  1.25
UniBas      S2      8    14   12.50  11.75  2.25       10   13   12     11.88  1.13
Table 5
Descriptive statistics of the task completion time dependent variable.
Experiment  System  Mo                                 NO_Mo
                    Min  Max  Med    Mean   Std. Dev.  Min  Max  Med    Mean   Std. Dev.
UniSa       All     15   65   27.50  30.06  13.13      12   34   20     23     7.61
UniSa       S1      15   65   31.50  33.12  16.17      15   34   20.50  23.38  7.25
UniSa       S2      15   42   27     27     9.30       12   34   20     22.62  8.43
UniBas      All     26   60   40     38.69  8.15       14   28   22     21.19  4.71
UniBas      S1      26   45   32.50  33.88  6.22       15   28   24.50  23.12  4.09
UniBas      S2      38   60   40     43.50  7.13       14   26   18.50  19.25  4.71
Table 6
Analysis on the comprehension dependent variable.
Exp.    Significant (p-value)  E. Size  S. Power  Mo > NO_Mo  Mo < NO_Mo  Mo = NO_Mo  Min  Max  Med   Mean   Std. Dev.
UniSa   Yes (< 0.01)           0.62     0.91      13/16       2/16        1/16        −3   3    1     1.31   1.66
UniBas  No (0.57)              0.05     0.05      5/16        6/16        5/16        −3   4    0     0      1.63

Table 7
Analysis on the task completion time dependent variable.
Exp.    Significant (p-value)  E. Size  S. Power  Mo > NO_Mo  Mo < NO_Mo  Mo = NO_Mo  Min  Max  Med   Mean   Std. Dev.
UniSa   No (0.12)              0.30     0.36      10/16       6/16        0/16        −14  50   9     7.06   16.61
UniBas  Yes (< 0.01)           0.83     1         16/16       0/16        0/16        4    43   16.5  17.5   10.09
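The values in Tables 6 and 7 can be cross-checked against Tables 4 and 5. The sketch below assumes that the summary statistics in Tables 6 and 7 describe the paired differences Mo − NO_Mo, so that the mean difference equals the difference between the "All" means; the dictionary names and the `check` helper are introduced here for illustration only.

```python
# "All" means copied from Tables 4 and 5, as (Mo, NO_Mo).
MEANS = {
    ("UniSa", "comprehension"): (13.06, 11.75),
    ("UniBas", "comprehension"): (11.88, 11.88),
    ("UniSa", "time"): (30.06, 23.00),
    ("UniBas", "time"): (38.69, 21.19),
}

# Mean paired differences as listed in Tables 6 and 7.
REPORTED_DIFF = {
    ("UniSa", "comprehension"): 1.31,
    ("UniBas", "comprehension"): 0.0,
    ("UniSa", "time"): 7.06,
    ("UniBas", "time"): 17.5,
}

# Participants with Mo > NO_Mo, Mo < NO_Mo and Mo = NO_Mo.
COUNTS = {
    ("UniSa", "comprehension"): (13, 2, 1),
    ("UniBas", "comprehension"): (5, 6, 5),
    ("UniSa", "time"): (10, 6, 0),
    ("UniBas", "time"): (16, 0, 0),
}


def check(tolerance=0.01, participants=16):
    """Return True if each reported mean difference matches Mo - NO_Mo
    and each row of counts adds up to the number of participants."""
    for key, (mo, no_mo) in MEANS.items():
        if abs((mo - no_mo) - REPORTED_DIFF[key]) > tolerance:
            return False
        if sum(COUNTS[key]) != participants:
            return False
    return True


print(check())
```

The same reasoning explains the sign of the minimum differences: whenever some participants scored lower with Mo than with NO_Mo, the smallest difference has to be negative.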
Table 8
Descriptive statistics on the comprehension dependent variable grouping participants by ability and experiment/experience.
Experiment  Ability  Observations  Mo                                 NO_Mo
                                   Min  Max  Med    Mean   Std. Dev.  Min  Max  Med    Mean   Std. Dev.
UniSa       High     9             11   15   14     13.33  1.32       10   14   12     12.11  1.27
UniSa       Low      7             10   15   13     12.71  1.80       9    13   11     11.29  1.23
UniBas      High     11            11   14   12     12.55  1.04       10   13   12     12.27  0.90
UniBas      Low      5             8    13   10     10.4   2.07       9    12   11     11     1.22
Table 9
Descriptive statistics on the task completion time dependent variable grouping participants by ability and experiment/experience.
Experiment  Ability  Observations  Mo                                 NO_Mo
                                   Min  Max  Med    Mean   Std. Dev.  Min  Max  Med    Mean   Std. Dev.
UniSa       High     9             15   65   30     32.44  15.79      12   34   20     22     8.17
UniSa       Low      7             15   40   24     27     8.91       15   34   20     24.29  7.22
UniBas      High     11            26   60   38     38     9.11       14   26   22     20.91  4.87
UniBas      Low      5             30   45   41     40.2   6.14       15   28   22     21.8   4.82
[Figure: interaction plot of Comprehension (y-axis) by Method (Mo, NO_Mo), with one line per Ability level (High, Low).]
Table 8 shows some descriptive statistics on the comprehension dependent variable, grouping the participants
by Ability and Experience (i.e., experiment). High ability
participants achieved better comprehension values than
low ability ones within each experiment, with both
Mo and NO_Mo. Furthermore, both high and low ability participants in UniSa achieved better comprehension values
with Mo than with NO_Mo. For UniBas, high ability participants
achieved nearly the same comprehension values with
and without the diagrams, while low ability participants
achieved slightly better comprehension values with NO_Mo. Regarding task completion time (see Table 9),
there is no large difference between high and low ability
participants within each experiment on NO_Mo. The only
remarkable difference concerns UniSa and Mo: high ability
participants spent slightly more time than low ability
participants to accomplish a source-code comprehension
task. For high ability participants the median is 30 and the
mean is 32.44, while for low ability participants the values
of these descriptive statistics are 24 and 27, respectively.
The effect of Ability on the considered dependent
variables is not statistically significant within each experiment, as the results of the Mann–Whitney test revealed.
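The patterns described above can be summarised with a simple difference of differences on the cell means of Tables 8 and 9. This is only a descriptive sketch and not a significance test; the names `method_effect` and `interaction_contrast` are assumptions made for illustration.

```python
# Cell means copied from Tables 8 and 9, as (Mo, NO_Mo) per ability level.
COMPREHENSION = {
    "UniSa": {"High": (13.33, 12.11), "Low": (12.71, 11.29)},
    "UniBas": {"High": (12.55, 12.27), "Low": (10.40, 11.00)},
}
TIME = {
    "UniSa": {"High": (32.44, 22.00), "Low": (27.00, 24.29)},
    "UniBas": {"High": (38.00, 20.91), "Low": (40.20, 21.80)},
}


def method_effect(cell):
    """Mean with the diagrams minus mean without them."""
    mo, no_mo = cell
    return mo - no_mo


def interaction_contrast(cells):
    """Method effect for High ability minus Method effect for Low ability."""
    return method_effect(cells["High"]) - method_effect(cells["Low"])


for variable, table in (("comprehension", COMPREHENSION), ("time", TIME)):
    for experiment, cells in table.items():
        print(variable, experiment, round(interaction_contrast(cells), 2))
```

A contrast close to zero corresponds to roughly parallel lines in an interaction plot, while a larger absolute value corresponds to lines that diverge or cross.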
Fig. 2. Analysis of Ability on Comprehension for UniBas, using the interaction plot.
Fig. 3. Analysis of Ability on Task Completion Time for UniSa, using the interaction plot.
Fig. 4. Analysis of Ability on Task Completion Time for UniBas, using the interaction plot.
Fig. 5. Analysis of Experience on Comprehension, using the interaction plot.
Fig. 6. Analysis of Experience on Task Completion Time, using the interaction plot.
Fig. 7. Boxplots of the answers to the post-experiment survey questionnaire for UniSa.
Fig. 8. Boxplots of the answers to the post-experiment survey questionnaire for UniBas.
4.4. The results of the post-experiment survey questionnaire
The answers to the post-experiment survey questionnaire of UniSa and UniBas are summarized by means of
boxplots in Figs. 7 and 8, respectively. Overall, we can
observe that the distributions of the answers in both
experiments are similar. In particular, the participants in
UniSa and UniBas considered the time they
had to accomplish the tasks in the laboratory trials appropriate (the
median is 1 for both experiments). They also clearly
understood both the objectives and the comprehension
tasks they were asked to accomplish: the medians are 1 for
UniSa and 2 for UniBas. A neutral judgment on the
complexity of S1 and S2 was given (3 is the median for
Q4 and Q5 in both experiments). All the participants
found the use of the UML effective for the comprehension
of source-code (2 is the median for Q6 in both
experiments).
4.5. Discussion
The effect of Method was significant for more experienced participants. An average improvement⁶ (or benefit)
of circa 12% was achieved when the participants accomplished the comprehension task with UML class and
sequence diagrams. To accomplish a task with these
diagrams, less experienced participants spent on average
44.8% of the time to accomplish the same task with
source-code alone.
High ability participants achieved a better comprehension of the source-code with respect to low ability participants independent from their level of experience. The
difference between these two groups of participants was
⁶ Given two values $a$ and $b$, the mean percentage improvement of $a$ is
computed as $(a - b)/b \times 100$. The values $a$ and $b$ are the mean comprehension values achieved by the participants on the systems used in the
experiments.
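As a worked instance of the formula in the footnote, the sketch below applies it to the UniSa "All" means from Table 4 (13.06 with Mo and 11.75 with NO_Mo). The function name is hypothetical, and this excerpt does not state how the authors aggregated over the two systems, so the result (roughly 11%) should only be read as being of the same order as the reported circa 12%.

```python
def percentage_improvement(a, b):
    """Mean percentage improvement of a over b: (a - b) / b * 100."""
    if b == 0:
        raise ValueError("b must be non-zero")
    return (a - b) / b * 100


# UniSa mean comprehension from Table 4 ("All" row): Mo vs. NO_Mo.
print(round(percentage_improvement(13.06, 11.75), 1))
```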
4.6. Implications
We adopted a perspective-based approach to judge the
practical implications of our investigation. In particular, we
based our discussion on the practitioner/consultant (only
practitioner in the following) and researcher perspectives
[34]:
5. Threats to validity
The threats that could affect the validity of the results
are presented here according to the schema proposed in
[22].
5.1. Internal validity
Internal validity concerns the degree to which conclusions can be drawn about the causal effect of the independent variable/s on the dependent variable/s considered
in the investigation:
Interaction with selection: This threat has been mitigated
because each group of participants worked on different
experimental objects with either Mo or NO_Mo. Further,
the participants within each experiment had similar experience with the UML, software system modeling, and computer programming. Additionally, both kinds of participants
found the experimental material clear.
Maturation: Participants might have learned how to
improve source-code comprehension and how to reduce
the task completion time when passing from the first
laboratory run to the subsequent one. The data analysis
showed that the order in which the participants performed these two tasks did not significantly affect the
comprehension of the source-code the participants
achieved or the time to accomplish these tasks.
Diffusion or imitation of treatments: This threat concerns
the information exchanged among the participants, while
performing each comprehension task and when passing
from the first run to the second one. We prevented this in
several ways. The participants were monitored by the
experiment supervisors, who did not allow the participants to communicate with each other. Another issue
could be related to the communication among participants
in different experiments. The participants in UniSa did not
have any opportunity to give information to those in
UniBas because they resided in different regions. Further,
the participants in UniSa were asked to give back all the
experiment material at the end of the experiment.
5.2. External validity
The main issue of the external validity refers to the
possibility of generalizing the results.
Interaction of selection and treatment: The use of students may affect external validity [26,37–39]. Threats are
related to the representativeness of the participants as
compared with professionals. However, the participants'
familiarity with the UML, the application domains of the
experimental objects, and the results of the industrial
survey presented in [17] suggest that the participants are
not far from novice software maintainers and junior
programmers. The participants in UniSa were probably
better trained in UML modeling than many senior software
professionals of small and medium software companies in
Italy. However, an increasing number of graduates with
such modeling skills is being integrated into the software
industry, which should therefore increase its UML capability.
Interaction of setting and treatment: In our case, this threat
concerns the software systems⁷ on which the participants were asked to perform the experimental tasks. The
authors were not involved in the realization of the documentation or in the implementation of the systems used
in the two experiments. Also, the size and complexity of
the experimental objects may affect the validity of
the obtained results. The rationale for selecting these
experimental objects relies on the need to simulate
actual comprehension tasks related to small maintenance
operations that novice software engineers and/or junior
programmers may perform in a software company. Larger
and more complex experimental objects could excessively
overload the participants, thus biasing the results. Nevertheless, it is also possible that with more complex
and larger objects, the help of UML diagrams may be more
effective. To analyze this issue, different user studies in
the form of case studies with professionals are needed. The
⁷ The used software systems (and their documentation) have never
undergone maintenance operations. Therefore, the software entropy can
be considered low within these systems and within their source-code, in
particular. The low level of entropy may positively affect code comprehension [1]. Software entropy is a concern that has never been studied.
Thus, it may represent a direction for our future investigations.
Acknowledgments
We wish to thank Ilaria Bilancia and Michela
Continanza for their valuable help in conducting the
experiments. We would also like to thank all the participants in the experiments.
References
[1] M.M. Lehman, Programs, life cycles and laws of software evolution,
Proc. IEEE 68 (9) (1980) 1060–1076.
[2] G. Canfora, M. Di Penta, New frontiers of reverse engineering, in:
Proceedings of the International Workshop on the Future of Software Engineering, 2007, pp. 326–341.
[3] Object Management Group, OMG Unified Modeling Language (OMG
UML), Infrastructure, v2.1.2, Technical Report, OMG. URL
http://www.omg.org/spec/UML/2.1.2/Infrastructure/PDF, November 2007.
[4] D. Budgen, A.J. Burn, O.P. Brereton, B.A. Kitchenham, R. Pretorius,
Empirical evidence about the UML: a systematic literature review,
Software: Pract. Exp. 41 (4) (2011) 363–392.
[5] A.M. Fernández-Sáez, M. Genero, M.R.V. Chaudron, Empirical studies
concerning the maintenance of UML diagrams and their use in the
maintenance of code: a systematic mapping study, Inf. Softw.
Technol. 55 (7) (2013) 1119–1142.
[6] S. Abrahão, C. Gravino, E.I. Pelozo, G. Scanniello, G. Tortora, Assessing
the effectiveness of sequence diagrams in the comprehension of
functional requirements: results from a family of five experiments,
IEEE Trans. Softw. Eng. 39 (3) (2013) 327–342.
[7] L.C. Briand, Y. Labiche, M. Di Penta, H. Yan-Bondoc, An experimental
investigation of formality in UML-based development, IEEE Trans.
Softw. Eng. 31 (10) (2005) 833–849.
[8] V. Basili, F. Shull, F. Lanubile, Building knowledge through families of
experiments, IEEE Trans. Softw. Eng. 25 (4) (1999) 456–473.
[9] F.J. Shull, J.C. Carver, S. Vegas, N. Juristo, The role of replications in
empirical software engineering, Empir. Softw. Eng. 13 (2) (2008)
211–218.
[10] G. Scanniello, C. Gravino, G. Tortora, Does the combined use of class
and sequence diagrams improve the source code comprehension?
results from a controlled experiment, in: Proceedings of the International Workshop on Experiences and Empirical Studies in Software Modelling, ACM, New York, NY, USA, 2012, pp. 25–30.
[11] E. Arisholm, L.C. Briand, S.E. Hove, Y. Labiche, The impact of UML
documentation on software maintenance: an experimental evaluation, IEEE Trans. Softw. Eng. 32 (2006) 365–381.
[12] W.J. Dzidek, E. Arisholm, L.C. Briand, A realistic empirical evaluation
of the costs and benefits of UML in software maintenance, IEEE
Trans. Softw. Eng. 34 (2008) 407–432.
[13] M. Staron, L. Kuzniarz, C. Wohlin, Empirical assessment of using
stereotypes to improve comprehension of UML models: a set of
experiments, J. Syst. Softw. 79 (5) (2006) 727–742.
[14] M. Genero, J.A. Cruz-Lemus, D. Caivano, S.M. Abrahão, E. Insfrán, J.A.
Carsí, Assessing the influence of stereotypes on the comprehension
of UML sequence diagrams: a controlled experiment, in: Proceedings of Model Driven Engineering Languages and Systems, Lecture
Notes in Computer Science, Springer Berlin, Heidelberg, 2008, pp.
280–294.
[15] F. Ricca, M. Di Penta, M. Torchiano, P. Tonella, M. Ceccato, How
developers' experience and ability influence Web application comprehension tasks supported by UML stereotypes: a series of four
experiments, IEEE Trans. Softw. Eng. 36 (1) (2010) 96–118.
[16] J. Conallen, Building Web Applications with UML, 2nd edition,
Addison-Wesley Publishing Company, Reading, MA, 2002.
comprehension, in: Proceedings of Conference on Software Maintenance and Reengineering, IEEE Computer Society Washington, DC,
USA, 2013, pp. 367–370.
[46] M. Genero, M. Piattini, M.R.V. Chaudron, Quality of UML models, Inf.
Softw. Technol. 51 (12) (2009) 1629–1630.
[47] A. Nugroho, B. Flaton, M.R.V. Chaudron, Empirical analysis of the
relation between level of detail in UML models and defect density,
in: Proceedings of Model Driven Engineering Languages and Systems, Lecture Notes in Computer Science, vol. 5301, Springer,
Heidelberg, 2008, pp. 600–614.
[48] K. Beck, Extreme Programming Explained: Embrace Change, Addison-Wesley, Boston, USA, 1999.