Requirements Tracing in a Distributed Online Collaboration System
Master Thesis
to obtain the academic degree of
Diplom-Ingenieur
in the Master's Program
Computer Science
Institute of Software Systems Engineering
Johannes Kepler University Linz
Statutory Declaration
I hereby declare that the thesis submitted is my own unaided work, that I have not used
other than the sources indicated, and that all direct and indirect sources are acknowledged
as references.
Abstract
Tracing, i.e. following requirements to their implementation, is extremely important, especially in larger software systems. These traceability links not only help in understanding the system, but are also beneficial for cost and effort estimation as well as for software maintenance. Creating these links by hand, however, is laborious and very tedious. In this thesis, various approaches are described together with their advantages and disadvantages compared to other processes. In addition, a method-based algorithm for the validation and prediction of traces is presented, which can be used to detect missing or incorrect traces.
1 Introduction
1.1 Motivation
1.2 Outlook and goal
1.3 Structure
1.4 Research questions
1.1 Motivation
The creation of these traces, however, is mainly a manual task which can be supported by automated processes. Such support is only applicable for requirements that are well defined or formal enough. One such process, more specifically information retrieval, will be described in chapter 3. There, code and requirements are processed and normalized to find similarities between a specific requirement and parts of the software. The two metrics, precision and recall, heavily depend on the specification of the requirements as well as on the naming of methods and variables, the number of comments, etc. [2]. The quality of these traces, whether they are created by the original developers or retrieved by automated processes, however, varies considerably [1].
In the author's experience, even in a small team (fewer than five developers), conventions must be established before any form of requirements tracing can be done. It was therefore defined in the author's firm that, for instance, every commit to the versioning system needs to contain the ID of the feature or requirement that is introduced or changed with the commit. Additionally, the requirement or feature descriptions need to have a specific level of detail and at least define which parts of the software they will change. This, in addition to comments in code and strict adherence to these commit message conventions, enables us as developers to understand why something changed in the code and what was changed due to which requirement. Being able to understand from the commit logs which feature influences which parts of the code is especially beneficial to developers who are new to the software.
Different techniques to retrieve candidate links for requirement-to-code traceability will be introduced in this work, and their methods as well as their limitations will be discussed. This includes a possible framework for a (semi-)automated traceability link creation process, information retrieval as an approach to generate candidate links for existing software, as well as deep learning as another approach to create an automated solution. The implementation in the DesignSpace enables these tracing capabilities to be used for all forms of software in a large-scale collaborative environment. The practical application lies in validating traces and ensuring that parts of the code perform the specific operations that the trace to a requirement suggests. Errors may be detected when a part of the code contradicts its predicted trace value to a requirement.
1.3 Structure
Chapter 4 refers to deep learning as a possible solution to improve automated approaches for traceability link retrieval. Domain knowledge is identified as an important aspect when trying to generate traceability links.
In chapter 6, an adaptation of the algorithm described in chapter 5 is presented and its implementation in the DesignSpace [4] [5] is depicted, including some practical examples.
Chapter 7 contains the conclusion as well as the performance evaluation for the algorithm in the DesignSpace.
The following research questions were formulated for this thesis:
2. Which shortcomings result from automated approaches for traceability link creation?
3. Which solutions exist for creating and maintaining traceability links?
In this chapter, the term traceability is introduced, and its usefulness in software development as well as obstacles to the recovery of traceability links are identified. A framework proposed by Tsuchiya et al. is presented as one solution to these problems [6]. That framework is a semi-automated approach to recover requirement-to-code traces.
Traceability should and does play an important part in a software life-cycle. To emphasize why it is so important, Winkler et al. gathered important application scenarios in their work [8]. Some of these are briefly depicted here:
• Validating artifacts
Traceability can be vital in detecting implementation defects with regard to the specification and its requirements. An example for this can be seen in section 6.7, where a requirement is violated because the code does not perform the actions as specified. Traceability also supports an early detection of inconsistencies as well as of incomplete implementations.
• Testing
In the scenario of software testing, trace links support the creation of appropriate test data and test specifications based on the requirements. This is especially vital to achieve high coverage.
• Improving changeability
Not only can time and cost estimations be improved by traceability; the changeability and maintainability of software artifacts are also improved, as the change impact of requirements is made observable.
• Monitoring progress
Monitoring of requirements and their implementation state progress is enabled by
traceability. This can be used to observe which requirements are planned, imple-
mented, tested and ready for roll-out.
Ambiguous traceability links are a risk to software development as they influence estimations and changes in code and requirements. The same applies to the maintainability of a software system if documents or traceability links are not updated after adding or removing features and therefore requirements. However, developers tend to disregard maintaining these links, either due to the management costs or because the advantages of maintaining them are not recognized [6]. It has been shown in many cases that recovering and capturing requirement traces is a labor-intensive task [1]. Nevertheless, any recovered trace link is vital to support software engineering and to help understand a software system.
In the case of [6], the so-called configuration management log offers information about changes in the system. In figure 2.1 such a change affecting one file can be observed, including the message and the files affected. It was observed that messages in the configuration management log contain words that are related to requirements; in the case of figure 2.1 the word is "XML", which belongs to the requirement 'Running tests in Automated mode' [6], as it is the only requirement that yields output in XML format. These words, combined with the file paths, allow link recovery without requiring a fixed identifier notation in the commit messages. This also enables the recovery of non-explicit traceability links, as long as only one file is changed by a revision. However, commits changing more than one file may unintentionally link all of these files to a requirement if the message can be linked to that requirement. To overcome this obstacle, revisions are classified into different types using commonality and variability analysis (CVA).
CVA is used to distinguish between commonly used features and features specific to a product. As can be observed in figure 2.2, features shared between requirements lie in the overlapping section of the diagram, while specific features like "Lookup" and "Activation" are dedicated to their products. The vector space model mentioned in [6] and [9] is used to determine the similarity of the sentences of a requirement. A sentence's content of "valid words" defines the direction of its vector.
The framework proposed by Tsuchiya et al. uses the configuration management log and CVA [6] to recover traceability links (see figure 2.3). The inputs for the framework are the requirement documents and the source code including the configuration management log. Traceability links are recovered by finding revisions whose messages are related to requirements; these candidate links are additionally refined using CVA. The process is divided into the following seven steps:
1. CVA of requirements
2. CVA of code
3. Keyword setting
Keywords are set for each requirement; term frequency and inverse document frequency (TF-IDF) ([6] [2]) is used to support this (a short sketch is given below).
4. Classification of revisions
Classification is based on the number of affected domains (e.g. the number of files changed or the number of domains affected).
5. Recovery of traceability links
6. Automated refinement
7. Manual refinement
In steps one and two, CVA is used to find the shared parts of requirements or code. In the case of code, the similarity between code fragments is compared against a threshold to determine whether that code is shared among several products or dedicated to one.
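To illustrate step three, the following minimal Java sketch shows how TF-IDF scores could be computed to rank candidate keywords for one requirement. The class and method names are hypothetical and are not taken from the framework in [6]; it is only a sketch of the underlying weighting idea.

```java
import java.util.*;

/** Hypothetical helper: ranks candidate keywords of one requirement by TF-IDF. */
public class TfIdfKeywords {

    /** Term frequency of a term within one requirement (given as a list of tokens). */
    static double tf(String term, List<String> doc) {
        long count = doc.stream().filter(t -> t.equals(term)).count();
        return (double) count / doc.size();
    }

    /** Inverse document frequency of a term over all requirement documents. */
    static double idf(String term, List<List<String>> allDocs) {
        long docsWithTerm = allDocs.stream().filter(d -> d.contains(term)).count();
        return Math.log((double) allDocs.size() / (1 + docsWithTerm));
    }

    static double tfIdf(String term, List<String> doc, List<List<String>> allDocs) {
        return tf(term, doc) * idf(term, allDocs);
    }

    public static void main(String[] args) {
        List<List<String>> requirements = List.of(
                List.of("running", "tests", "automated", "mode", "xml", "output"),
                List.of("lookup", "of", "tests", "by", "name"),
                List.of("activation", "of", "automated", "services"));
        List<String> req1 = requirements.get(0);
        // Terms that occur in this requirement but rarely elsewhere rank highest
        // and would be chosen as its keywords (e.g. "xml"); shared terms score 0.
        req1.stream().distinct()
                .sorted(Comparator.comparingDouble((String t) -> -tfIdf(t, req1, requirements)))
                .forEach(t -> System.out.printf("%s: %.3f%n", t, tfIdf(t, req1, requirements)));
    }
}
```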
Revisions are classified in step four to prevent linking unrelated requirements to code, e.g. when more than one file is modified by a revision. Three different types are obtained based on the number of domains they affect. The simplest case is type A, where only one file is changed in a revision; links recovered from this type are highly reliable (an example of such a log is shown in figure 2.1). Type B consists of revisions where the number of domains stays below a defined threshold. The threshold is set by the users of the framework and should be set in relation to the number of type A revisions found: if that number is low in comparison to the total number of revisions, the threshold needs to be adjusted upwards, whereas if the number of type A revisions is high, the threshold can be set to a low value to keep the reliability of the recovered traceability links high. If the number of domains is above the threshold, the revision is of type C and will not be considered in any of the following steps, as such revisions are likely to cause false-positives.
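The classification into the three revision types could be sketched as follows. This is a simplified illustration with hypothetical names, not the implementation from [6]; the threshold value is assumed to be chosen by the user as described above.

```java
/** Hypothetical sketch of the revision classification described above. */
enum RevisionType { A, B, C }

class RevisionClassifier {
    private final int threshold; // set by the framework user, see discussion above

    RevisionClassifier(int threshold) {
        this.threshold = threshold;
    }

    /** Type A: one affected domain; type B: up to the threshold; type C: discarded. */
    RevisionType classify(int affectedDomains) {
        if (affectedDomains == 1) return RevisionType.A;
        if (affectedDomains <= threshold) return RevisionType.B;
        return RevisionType.C;
    }
}
```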
The traceability links are recovered in step five by using the keywords from step three and the classified revisions from step four. Five different traceability link types are used, depending on which products a requirement or a component belongs to (see figure 2.4).
The automated refinement process from step six is done in the following sequence:
4. Elimination of false-positives
Any link of type 4 or 5 hints at a false-positive, as the requirement and the component belong to different products. These links are removed from the resulting trace matrix.
In the final step, a manual refinement process is performed by developers to check the validity of the previously recovered traceability links. The first task is to identify requirements in the trace matrix that trace to a high number or a wide range of components. The keywords for such a requirement are checked for overly frequent terms or terms shared with other requirements, and if necessary changes are found, the whole process is restarted from step 3 after adapting the keywords for this requirement. The second task is to review links whose relationship is not easy to understand at first glance. Here, the revision log for the traceability link needs to be reviewed. If such links are confirmed by the developers, a non-explicit traceability link has been successfully recovered.
One of the advantages of the framework proposed in [6] is that it can also recover non-explicit traceability links. Additionally, the method is robust against different identifiers for requirements and code when recovering traceability links. Another characteristic is that the chosen thresholds as well as the quality of the keywords defined for the requirements significantly influence the precision and recall values (for the definitions see chapter 3, [2] [9]).
In the experiments in [6] it was concluded that about 30% of the known links were not recovered after accounting for false-positives. A more critical limitation of the approach is that only code that has been modified can be recovered by the framework. That means that if code is, for instance, reused in another software system, the only change information may be the initial commit containing that code. Also, the method heavily depends on well-written log messages in the commits and revisions.
Traceability recovery tools based on information retrieval [2] [10] [11] tend to use similarities between artifacts to suggest candidate links [12]. Or, as De Lucia et al. state, such a tool '[...] compares a set of source artefacts against another set of target artefacts and ranks the similarity of all possible pairs of artefacts' [13]. Usually, the document databases tend to be exhaustive, which is why different approaches are required to obtain candidate links between requirements and source code [2]. Depending on the parameters set for an approach, a threshold needs to be found that limits the number of false-positives while still recovering the relevant traceability links.
The basic approach of information retrieval can be seen as a search operation [14]. Given a query, documents or information are expected to be returned. Some of these queries will not return all requested information; this is the classical problem of synonymy [15] [16]. An example would be a query for "car" that does not return documents referring to "automobiles". Another problem is called polysemy [15] [16], where a query is expected to return documents from a specific field but non-related documents are returned as well (a search for "yacht" also returns documents about "space ships"). These problems are often encountered with information retrieval, as many approaches simply perform word matching to determine the rank of relevance [14]. As a method to overcome these and other related problems, latent semantic indexing (LSI) is proposed. Latent means "hidden" in this case; the objective is to capture the latent concepts behind the terms in documents and queries.
Antoniol et al. suggest vector space information retrieval as an addition to the well-known probability-based approach [2]. Using two case studies, the introduced method
is verified by applying it to selected data and measuring the performance. Specifically,
a vector space model (VSM) for information retrieval is applied to rank the available
text documents against implementation artefacts. The process described in [2] uses the
following procedure to extract the required information from the text documents as well
as the code artefacts in question.
The documents are prepared by creating an index based on an extracted vocabulary, which is created by normalizing the text sections. This is performed by the following steps:
3. Using morphological analysis, plural words are converted to their singular form, and all verbs are reduced from their inflected forms (by tense, case, etc.) to their base form.
The source code artefacts are processed as well to create index-based queries for each class in the code, according to the following three steps:
2. Any identifiers which are composed of two or more words are separated into single words (e.g., AmountDue or amount_due).
3. Finally, the resulting data is processed in the same way as described for the text documents, using normalization to support indexing.
Each class is then assigned a ranked list of documents based on a classifier which is computed from the similarity between source code queries and text documents. According to [2], the particular information retrieval model which is adopted significantly impacts the indexing process of the source artefacts and text documents as well as the ranking process between them. Using the probabilistic model referenced in [2] and [9], documents are indexed by computing their stochastic language model, while source code artefacts are not indexed at all. To obtain a similarity link between a document and a code artefact, the product of the probabilities that each identifier in a query also appears in the document is used.
Vector space information retrieval is introduced as a model that creates a vector for each document and each code query. TF-IDF [2] [6] is used to calculate the term weights. This method is later compared against an established probabilistic information retrieval model.
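A minimal sketch of how such a vector space ranking could look is given below: both the code query and a document are represented as sparse TF-IDF weight vectors and compared by cosine similarity. The names and weights are hypothetical; this is not the tool described in [2].

```java
import java.util.*;

/** Hypothetical sketch: cosine similarity between a code query and a document vector. */
public class VsmSimilarity {

    /** Cosine similarity of two sparse term-weight vectors (term -> TF-IDF weight). */
    static double cosine(Map<String, Double> query, Map<String, Double> document) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : query.entrySet()) {
            dot += e.getValue() * document.getOrDefault(e.getKey(), 0.0);
        }
        double normQ = Math.sqrt(query.values().stream().mapToDouble(w -> w * w).sum());
        double normD = Math.sqrt(document.values().stream().mapToDouble(w -> w * w).sum());
        return (normQ == 0 || normD == 0) ? 0.0 : dot / (normQ * normD);
    }

    public static void main(String[] args) {
        // The weights would come from a TF-IDF computation over the normalized artefacts.
        Map<String, Double> codeQuery = Map.of("amount", 0.7, "due", 0.5, "invoice", 0.2);
        Map<String, Double> requirement = Map.of("amount", 0.6, "due", 0.4, "payment", 0.3);
        System.out.printf("similarity = %.3f%n", cosine(codeQuery, requirement));
    }
}
```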
The process to analyze source code was partly automated using a developed toolkit [2] which can interpret C++ and Java sources by traversing parse trees. For every class, its comments, attribute identifiers, methods and their parameters are extracted and stored in support files. The identifier separation process consists of two parts: the first part automatically separates words which are concatenated by underscores or by uppercase letters inside a word. In the second part, software engineers are prompted with suspected joined words, identified with the help of spell-checking facilities, and asked to decide whether to separate them. The other steps of the source code processing operation were completely automated.
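The automated part of the identifier separation could look roughly like the following Java sketch, which splits identifiers at underscores and at lower-to-upper case changes. It is an illustration by example, not the toolkit used in [2].

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: split composed identifiers such as AmountDue or amount_due. */
public class IdentifierSplitter {

    static List<String> split(String identifier) {
        // Separate at underscores and at lower-to-upper case transitions, then lower-case.
        String spaced = identifier
                .replace('_', ' ')
                .replaceAll("([a-z0-9])([A-Z])", "$1 $2");
        return Arrays.asList(spaced.toLowerCase().trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(split("AmountDue"));   // [amount, due]
        System.out.println(split("amount_due"));  // [amount, due]
    }
}
```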
The case studies mentioned in [2] use the probabilistic information retrieval model and apply the two widely used metrics precision and recall. Recall is defined as the ratio of the number of relevant documents retrieved for a code query over the total number of relevant documents for that query. Precision describes the ratio of the number of relevant documents retrieved for a query over the total number of documents retrieved for that query.
Using the same metrics, the vector space model is applied, which yielded results similar to those of the probabilistic approach. The vector space model requires less preparation effort for the query and document representations and tends to produce more consistent results, while the probabilistic model yields faster results when a high value for recall is not required [2]. The results are compared against a brute-force link creation attempt using the tool "grep" to emphasize the advantages of a more sophisticated approach; the comparison turned out in favor of the information retrieval approaches.
The process introduced in section 3.2 is a "one-shot" process, which means that it is typically executed once with a fixed threshold and then produces a list of candidate links. This sorted list usually contains the well-traced links with high similarity in the upper region, while the lower region tends to include a significant number of false-positives [13]. The threshold therefore influences both precision and recall. To overcome the limitations of such a "one-shot" process, an incremental process is proposed that adapts the threshold. Starting with a high value for the similarity threshold, it is incrementally decreased during the process, which gives developers more control over the number of correct links and false-positives. Applying the incremental process can result in a lower number of tracing errors while reducing the effort needed to recover a significant number of traceability links. This still does not result in a complete traceability matrix; that can only be achieved when the links are analyzed manually.
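The incremental process could be sketched as follows: candidate links are grouped into slices of decreasing similarity, and each slice would be reviewed by a developer before the threshold is lowered further. The record and class names are hypothetical, and the concrete threshold steps are only an assumption for illustration.

```java
import java.util.*;

/** Hypothetical sketch of an incremental link recovery loop with a decreasing threshold. */
public class IncrementalRecovery {

    record CandidateLink(String requirement, String codeArtifact, double similarity) {}

    /** Returns the links whose similarity lies in [lower, upper) for one iteration. */
    static List<CandidateLink> slice(List<CandidateLink> all, double lower, double upper) {
        return all.stream()
                .filter(l -> l.similarity() >= lower && l.similarity() < upper)
                .toList();
    }

    public static void main(String[] args) {
        List<CandidateLink> candidates = List.of(
                new CandidateLink("r1", "Billing.java", 0.91),
                new CandidateLink("r1", "Invoice.java", 0.74),
                new CandidateLink("r2", "Login.java", 0.42));
        // Start with a high threshold and lower it step by step; after each step the
        // newly proposed links would be reviewed by a developer before continuing.
        for (int t = 80; t >= 20; t -= 20) {
            double lower = t / 100.0, upper = (t + 20) / 100.0;
            System.out.println("threshold " + lower + ": " + slice(candidates, lower, upper));
        }
    }
}
```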
To tackle the problems with precision and recall that result from information retrieval
methods, several improvements are suggested for the different kinds of methods.
The vector space model can benefit from key-phrases [12], thesauri [19] [12] and the "pivot normalization weighting process" [19] when retrieving candidate links. The idea behind key-phrases is to incorporate a list of technical terms into the documents, which reduces the number of false-positives and increases precision. An advantage in this case is that design documents usually contain a section with acronyms and definitions which can be reused. The thesaurus in this case is a list where each entry is a triple containing two words and a perceived similarity coefficient [12]. When calculating the similarity, the thesaurus is included in the steps from section 3.2. This can improve recall, as words that would otherwise not be related can now be linked via the thesaurus. Pivot normalization includes the length of documents when calculating the similarity, so that longer documents (which will have more words and more repetitions of the same word) are not necessarily preferred.
Several solutions have been proposed and developed to automatically create and maintain
traceability links. Some of these include techniques like information retrieval [2] [12] [13]
[20], machine learning [21] [22] and heuristic techniques [23]. Considering the requested levels of precision, these approaches perform below expectations when trying to achieve high levels of recall (over 90%) on large data sets as they occur in industrial-sized applications [24]. A cause for this is often a mismatch of terms
or wordings between pairs of related artifacts. Especially when domain knowledge is
required to find traceability links between requirements and code, solutions using term
similarity to create those links will fail to recognize them.
Guo et al. propose a solution that uses deep learning to incorporate requirement artifact semantics and domain knowledge into the recovery process [24]. With this, they aimed to construct a scalable, portable and fully automated solution. The process is separated into two phases.
In the first phase, a collection of word embeddings [24] for the specific domain is learned. This is performed by an unsupervised learning approach trained over numerous documents from that domain. Each word is assigned a high-dimensional vector that captures its distributional semantics and co-occurrence statistics.
The second phase consists of training the tracing network using an existing training set of already validated trace links taken from the specific domain. This tracing network is based on a recurrent neural network (RNN) architecture and is used to predict the likelihood of a trace link. The idea behind an RNN [24] [25] is to generate the output state not only from the current input but also from the output of the previous state; this is referred to as the "hidden state" or "hidden input". The word vectors created in phase 1 are fed sequentially into the tracing network, which yields a vector containing semantic information for an artifact as the final output. These semantic vectors are compared pairwise to calculate the probability that two artifacts are linked.
Deep learning techniques have been applied successfully to natural language processing (NLP) tasks, including e.g. parsing and sentiment analysis [24]. Leveraging either supervised or unsupervised learning techniques, language can be transformed into a representation usable for NLP tasks; these techniques are described in the following.
As mentioned before, word embedding is used to encode information about words in a continuous high-dimensional vector. Syntactic and semantic relationships can be represented as linear relationships between these vectors, so that similar words tend to cluster in vector space. This is considered one of the primary reasons for the success of deep learning in NLP tasks, especially in comparison to information retrieval techniques, where associations among words are not taken into account (words are treated as atomic symbols). Guo et al. utilize the skip-gram model [24] [26] for word embedding. It is 'an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships' [26], which selects context windows from the training text that are scanned to train the prediction models [24]. The probability of target words appearing near "center" words in a window of a specified size is maximized, while the probability of random words occurring in the same range is kept to a minimum.
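The window-based training data for a skip-gram model can be illustrated with the following sketch, which only extracts (center, context) word pairs and leaves out the actual training of the prediction model; all names are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch: extract (center, context) pairs as used to train a skip-gram model. */
public class SkipGramPairs {

    static List<String[]> pairs(List<String> tokens, int windowSize) {
        List<String[]> result = new ArrayList<>();
        for (int center = 0; center < tokens.size(); center++) {
            int from = Math.max(0, center - windowSize);
            int to = Math.min(tokens.size() - 1, center + windowSize);
            for (int context = from; context <= to; context++) {
                if (context != center) {
                    result.add(new String[] { tokens.get(center), tokens.get(context) });
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sentence = List.of("the", "gripper", "releases", "the", "object");
        // The model maximizes the probability of each context word given its center word.
        pairs(sentence, 2).forEach(p -> System.out.println(p[0] + " -> " + p[1]));
    }
}
```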
Recurrent neural networks (RNNs) are considered a good fit for natural language processing (NLP), as they are well-suited for tasks that involve processing sequential data like text or audio [24]. The output of an RNN depends both on the input and on the "hidden state" from the previous time step of the same network. This serves as a kind of "memory" and enables an RNN to process sequential data of arbitrary length. A drawback of a standard RNN is that it is difficult to train when long dependencies exist in the sequence. This is caused by degradation of the network as gradients explode or vanish during the back-propagation process of the model [27].
Several adaptations have been discussed to limit these effects. Exploding gradients can be scaled down when their norm exceeds a threshold value [27]. To reduce the effect of vanishing gradients, the RNN can be adapted to support the preservation of long-term dependencies that would otherwise degrade quickly [24].
Two solutions for the vanishing gradient problem were compared against each other, namely long short-term memory (LSTM) and the gated recurrent unit (GRU). Both include a so-called "gating mechanism", which controls whether information should be written into a unit of the recurrent system. The gate essentially determines the ratio of how much information should be used from the hidden input and from the current input. While the GRU has a simpler internal structure to control the gating mechanism, it achieves performance comparable to LSTM [24].
The resulting tracing process consists of four steps which are repeated for all source
artifacts:
In figure 4.1, the resulting design of the tracing network is depicted. First, word em-
beddings are learned from a domain corpus which are then processed with the RNN.
In the presented approach, both LSTM and GRU are used and later compared as RNN variants.
4.2 Results
The results from Guo et al. suggest that a GRU with a bi-directional configuration performs best for the task of automatically generating traceability links [24]. Bi-directional in terms of RNNs means that not only past results are included in the current evaluation (back-propagation), but future data is considered as well [25]. In the specific application, the word vectors are also sent to the RNN in reverse order after the word embedding process in order to use the RNN as a bi-directional RNN. The results of the network using GRU in a bi-directional configuration were the best among the evaluated variants [24].
In this chapter, a project called TraceabilityCDG (https://github.com/jku-isse/TraceabilityCDG) will be introduced and its capability to validate and predict trace values is described. The basis for that project was previous work by Achraf Ghabi [3]. This project was not realized by the author but was adapted to introduce method-based and class-based trace prediction in the DesignSpace [5], which will be described in chapter 6. In [3], an algorithm is used to decide whether a requirement traces to a method. Requirement-to-code trace matrices (RTM) are introduced as a structure that holds, for each method and requirement, the information whether the requirement traces to the method or not. This information is called a "trace link" and is defined in each cell of the RTM. In the examples provided in this thesis, the columns of an RTM will always be requirements, while the rows are either methods or classes of the provided code. The aim of [3] was not to create an oracle to predict or generate traces but to emphasize unlikely trace situations (which are likely to be errors). Additionally, the trace values of the surrounding methods do not need to be fully correct as long as they are complete.
The next sections will describe the patterns which will be utilized to predict traces as well
as the existing method based trace algorithm from TraceProcessor which is then adopted
to class based tracing.
A hypothesis was formed in [3] stating that a tracing method is expected to be surrounded
by other tracing methods. If a method has neighboring methods like callers (calling
the method in question) and callees (other methods called by this method) and these
neighboring methods trace to a given requirement, chances are high that the examined
method also traces to that requirement. This applies analogously to a method's callers.
The hypothesis was supported by observations made in [3]. A metric called "connectedness" was used to determine 'the percentage of methods that are directly connected to at least one other method implementing the same requirement (where connected means having a caller or callee).' [3]. The metric was applied to four different projects with between 3 and 72 thousand lines of code (KLOC) and a total of 59 requirements (see table 5.1).
Table 5.1: Percentage of connected trace/no-trace methods. Data taken from [3].
To better analyze the metric, the methods were divided into "inner-nodes" and "leaves". The distinctions for methods and classes will be described in the next sections. Given the data obtained in [3], which is visible in table 5.1, a "connectedness" value of 88-99% was found for "inner-nodes". The value for "leaves" was determined to be in the range of 59-79%. The better value for "inner-nodes" can be explained by the fact that, in the case of methods, callers as well as callees influence this value, while "leaves" can only depend on their callers. Additionally, the values for "no-trace connectedness" are significantly higher than those for "trace connectedness". This results from the higher number of "no-trace" links compared to trace links. From this observation follows the implication that a high "connectedness" value for a method tracing to a requirement implies a high likelihood of finding a trace to that requirement in a neighboring method [3].
Another observation made in [3] further supported the hypothesis of "surroundedness". After grouping the methods according to their tracing relationships for a requirement and adding the calling relationships, regions of connected methods tracing to the same requirement were found. These regions were found to form continuous chains of connected methods.
The first group of basic patterns obtained in [3] is called "surrounding patterns". These patterns can be applied to "inner-nodes", as they have callers as well as callees (when using methods), and the "surrounding patterns" make use of caller and callee relationships. A "t-surrounding method" is a method that has both a caller and a callee tracing to the same given requirement. Analogously, an "n-surrounding method" is one that has both a caller and a callee not tracing to the given requirement. These relationships can be denoted as patterns. A pattern for a "t-surrounding method" is "T-x-T", for instance; "x" in this case is the method that is examined. The same applies to "N-x-N", describing the pattern for an "n-surrounding method". The likelihood for a "t-surrounding method" to trace to a given requirement was found to be between 61 and 96%, while the likelihood for an "n-surrounding method" to not trace to a given requirement was observed to be between 86 and 96% [3]. Both value ranges depend on the software, its size and the number of requirements. The given value ranges are extracted from the data and software described in [3] and table 5.2.
Table 5.2: Likelihood for different calling relationship patterns. Data taken from [3].
The "surrounding patterns" can only be used for "inner-nodes", which leads to another group of patterns, the so-called "leaf patterns". These can also be found in table 5.2. The likelihood value for these patterns is significantly lower than that of the "surrounding patterns". For "t-leaf methods" with a pattern of "T-x", the likelihood ranges from 34 to 76%, and for "n-leaf methods" with a pattern of "N-x", the value is in the range of 79 to 94%. These patterns consider only a single caller, or a single caller and callee, for a method's trace value. As a method can have an arbitrary number of callers and callees, the patterns need to be extended to consider all of them.
So far, the patterns only considered "pure relationships", meaning that in the case of surrounding patterns there was only one type of surrounding trace value. For "t-surrounding methods" or "n-surrounding methods" the neighboring methods only had trace or no-trace relationships, but they were not mixed. If the trace value of the callers differs from the trace value of the callees of a method, we obtain "boundary patterns". Examples for such patterns are "T-x-N" or "N-x-T". For such patterns it is not possible to obtain a predicted value; the prediction remains undefined, or "U". The aforementioned extension of the patterns means that when considering more than one possible caller or callee for a "surrounding pattern", these boundary patterns can be handled. A property called "T over N dominance" is used for these cases [3]. If, for instance, a method has callers that are tracing as well as not tracing to a requirement and also callees that are tracing as well as not tracing to the same requirement ("TN-x-TN"), then the "T" (tracing) value dominates the "N" (not tracing) value. An example for this can be seen in table 5.3, where the method "m5" has the methods "m1" and "m3" as callers and the methods "m2" and "m4" as callees. Due to the dominance of T over N, the prediction value for method "m5" is "PT".
Table 5.3: Example RTM (rows: methods, column: requirement r1) illustrating the dominance of T over N.

  m1    T
  m2    T
  m3    N
  m4    N
  m5    x -> PT
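A simplified sketch of how the "T over N dominance" could be applied to the example above is given below; it collapses the caller and the callee trace values separately and is only an assumption about the mechanism, not the actual TraceabilityCDG implementation.

```java
import java.util.List;

/** Hypothetical sketch of "T over N dominance" for a surrounded method such as m5 above. */
public class Dominance {

    /** Collapses a group of neighbor trace values to T, N or U (T dominates N). */
    static char dominant(List<Character> neighborValues) {
        if (neighborValues.contains('T')) return 'T'; // any tracing neighbor dominates
        if (neighborValues.contains('N')) return 'N';
        return 'U';
    }

    public static void main(String[] args) {
        List<Character> callers = List.of('T', 'N'); // m1 traces, m3 does not
        List<Character> callees = List.of('T', 'N'); // m2 traces, m4 does not
        char prediction = (dominant(callers) == 'T' && dominant(callees) == 'T') ? 'T'
                : (dominant(callers) == 'N' && dominant(callees) == 'N') ? 'N' : 'U';
        System.out.println("predicted value for m5: P" + prediction); // prints PT
    }
}
```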
Using all this information, the following patterns were created for TraceabilityCDG. These patterns were also introduced in the DesignSpace.
Inner patterns are used for inner methods, also referred to as "inner-nodes". These patterns will also be reused for inner classes (see section 5.5) and are listed together with their trace value in table 5.4.
Leaf patterns are used for leaf methods and leaf classes. Table 5.5 shows all leaf patterns with their predicted trace value.
Root methods and root classes utilize root patterns, which are depicted in table 5.6 together with their prediction.
Using the patterns described, the resulting algorithm is depicted in simplified form in figure 5.1. Given the existing method relationships and an RTM containing existing trace values for the methods "m1", "m3", "m4", "m5" and "m6", prediction values are obtained from the algorithm via the created patterns. Chapter 6 further explains how the patterns are derived for a method of a given type.
As mentioned before, method relationships and the trace values of surrounding methods are used in the form of patterns to validate and generate trace values for any method. If a visited method is, for instance, a leaf method (it only has callers), then a pattern is created with the information whether its callers have tracing, not-tracing or undefined relationships, and using this pattern the trace information is obtained for the method in question. The patterns which are used are defined for five different types of methods:
• Inner methods (have both callers and callees)
• Leaf methods (have callers but no callees)
• Root methods (have callees but no callers)
• Isolated methods (have neither callers nor callees)
• Not-applicable methods
Any method that does not have any callers or callees (isolated) or cannot be categorized into any of the types above (not applicable) will have a predicted trace value of undefined (PU).
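The classification into these types could be sketched as follows (the "not applicable" case is omitted); the names are hypothetical and the actual implementation in TraceabilityCDG or the DesignSpace may differ.

```java
import java.util.List;

/** Hypothetical sketch: derive the pattern category of a method from its neighbors. */
public class MethodTypes {

    enum Type { INNER, LEAF, ROOT, ISOLATED }

    static Type typeOf(List<String> callers, List<String> callees) {
        boolean hasCallers = !callers.isEmpty();
        boolean hasCallees = !callees.isEmpty();
        if (hasCallers && hasCallees) return Type.INNER;
        if (hasCallers) return Type.LEAF;      // has callers but no callees
        if (hasCallees) return Type.ROOT;      // has callees but no callers
        return Type.ISOLATED;                  // neither callers nor callees
    }

    public static void main(String[] args) {
        System.out.println(typeOf(List.of("m1"), List.of("m2"))); // INNER
        System.out.println(typeOf(List.of("m1"), List.of()));     // LEAF
    }
}
```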
The work in [3] and the TraceabilityCDG project only allow methods to be validated and predicted. The author therefore adapted the work from TraceabilityCDG to also enable class-based tracing. To obtain a relationship between classes similar to the caller / callee relationship between methods, the class scope uses super-class / sub-class and interface / implementation affiliations for the predictions. The result of the implementation in the DesignSpace will be visualized in section 6.6. In the context of classes, a caller in the method scope corresponds to either the super-class or any interface of a class; this will be called a parent in the following. Analogously, any sub-classes as well as implementors (consumers of a given interface) of a class act as callees in the method scope; these are denoted as children in the following. The patterns are defined for the following types of classes:
• Inner classes (have both parents and children)
• Leaf classes (have parents but no children)
• Root classes (have children but no parents)
• Isolated classes (have neither parents nor children)
In this chapter, the DesignSpace [5] is described, and the integration of the trace validation and prediction algorithm (as described in chapter 5) into the DesignSpace is shown in the form of examples.
The DesignSpace is 'a cloud infrastructure for engineering knowledge and service' [4] and was developed at Johannes Kepler University Linz. This environment can be used to connect single-space applications into a collaborative system and to link artifacts together that would otherwise be elaborate to join. The DesignSpace can be used to check the consistency of elements, propagate values across systems and different tools, and repair inconsistent states for different types of models. The DesignSpace consists of a server which can be extended by modules to enable the aforementioned functionalities. It enables different software, models and data to be used in a cloud-based work area. Shared work areas can be created so that multiple users can cooperate on projects. This works with virtually any data, as not only the data and models can be stored in the DesignSpace but also the meta-models for any tool. The server handles any updates, changes and additions to the data so that the users connected to it receive the changes.
The meta-models used for the traceability functionality in the DesignSpace are stored in
form of instance-types and property-types.
A property-type contains information about the structure of a property and consists of the
name, the type of value it stores as well as the cardinality of that data.
A special kind of property-type is the "opposable" property-type. It combines two cardinalities to link the owning instance-type to another instance-type according to those cardinalities. Additionally, the opposing property-types are available on both instance-types, and setting or changing a value on one side of the "opposed" property-type is reflected on the other side, too. Examples for this are "1:n" (see the "opposable" property-type "parentClass/methods" in table 6.6) or "m:n" (see the "opposable" property-type "callers/callees" in table 6.7) relationships between instance-types.
In the following sections, the meta-models used to support traceability in the DesignSpace
are described. A property-type without "opposable" (opp.) properties is referred to as "simple" in the following tables; the instance-type given in a table always refers to the instance-type the property links to. For "opposable" property-types this means that the
own instance-type is implicitly that of the belonging instance-type while the "opposing"
instance-type is given. Cardinality is abbreviated with "C" in the tables.
In the following list, some "basic" instance-types can be found, which are used in the
meta-models for traceability:
The cardinalities presented in this work are SINGLE, SET and LIST. A SINGLE cardinality means that there is a "1:1" relationship between the property-type and the target instance-type. A SET can be understood like the Java Set structure, a collection without duplicates, while a LIST allows duplicates in its collection.
Only the property-types that provide the neighboring methods are used in the trace validation and prediction algorithm; the remaining property-types are not consumed by it.
Using the TraceabilityCDG project and the information from section 5.4, the author integrated tracing capabilities into the DesignSpace.
The DesignSpace uses trace matrices to model tracing information between instances of two arbitrary instance-types. This leads to two different trace matrices, storing either requirement-to-method traces or requirement-to-class traces. A trace matrix is initialized with the two instance-types and their respective folders in the DesignSpace cloud system; each of these instance-types is represented on one edge of the matrix. The requirement instances will always be on the column edge of the matrix, while the method or class instances are on the row edge.
To load tracing information, existing data can be supplied using several JSON (JavaScript Object Notation) files with the correct structure. This can be used to load "gold traces", which are either obtained by analyses like those presented in the first chapters or defined by the developers of a software system. The JSON files contain information about requirements, classes, methods, calling relationships, "gold traces" between requirements and classes or methods, and more. All of that information is interpreted and stored in the DesignSpace using the aforementioned meta-models. Tracing data can also be created by adding code to the DesignSpace, introducing requirements and then connecting them using the user interface (see section 6.5).
The integrated event-based update functionality of the DesignSpace handles all changes to any data, models and meta-models. The tracing evaluation was integrated into this mechanism to immediately obtain updated trace validation and prediction values. Based on the type of a change, different operations are performed: whenever a trace value is changed, the corresponding trace matrix is determined and all methods registered in that matrix are re-evaluated to obtain any trace values changed by the algorithm. Whenever a method adds or removes one of its callers or callees, all requirements tracing or not tracing to that method are collected and the algorithm is executed again for each of these requirements and all methods (as new link chains may have been created). Classes are handled in an analogous manner: any changes in super-class or sub-class relationships as well as any changes in interface / implementor relationships cause the algorithm to be executed using the changed relationships.
To visualize the results of the trace validation and to highlight problems, a color scheme was created for the elements of the matrix, shown in table 6.8. Problematic differences between a defined and a predicted value are displayed in red (#FF0000). An example for this can be seen in figure 6.7.
6.4 Implementation
After the corresponding trace matrix has been found, all requirements that are impacted by the changes are queried from the RTM to perform the prediction algorithm for each of them. In the case of a "traces" or "notTraces" "propertyUpdate" event this is usually one requirement, for which a trace or no-trace relationship to a method or class was added or removed. If a method's "callers" or "callees" property-type is changed (adding or removing callers or callees), then all methods need to check whether they can update traceability links to any requirements involved in these methods. This applies analogously to classes that update their "superClass", "subClasses", "interfaces" or "implementations" property-types.
Each class or method loads its surrounding methods or classes by using the according
property-types.
• Load all callers of the method and store them to the list "methodCallers".
• Load all callees of the method and store them to the list "methodCallees".
• For each entry in the list "methodCallers", get its callers and store them into the list
"methodCallersOfCallers".
• For each entry in the list "methodCallees", get its callees and store them into the list
"methodCalleesOfCallees".
• Load the super-class of the class and add it to the list "classSuperClasses".
• Load the sub-classes of the class and add them to the list "classSubClasses".
• Load the implementations of the class and add them to the list "classSubClasses".
• For each entry in the list "classSuperClasses", get its super-classes (super-class and
interfaces) and store them to the list "classSuperClassesOfSuperClasses".
• For each entry in the list "classSubClasses", get its sub-classes (sub-classes and
implementations) and store them to the list "classSubClassesOfSubClasses".
After the surrounding methods or classes have been found, the method type or class type (inner / root / leaf / isolated / other) is determined according to the definitions in sections 5.4 and 5.5. The patterns are built as follows:
• Inner
Each parent instance is checked for its trace value in the RTM. If there is any occurrence of trace (T), no-trace (N) or undefined (U), the parents-pattern adds that occurrence. The same is performed for each child instance, resulting in the children-pattern. The pattern is built in the form "<parents-pattern>-x-<children-pattern>".
– Class - uses the list "classSuperClasses" for the parent instances and the list
"classSubClasses" for the child instances.
– Method - uses the list "methodCallers" for the parent instances and the list
"methodCallees" for the child instances.
• Root
Each child instance is checked for its trace value in the RTM. If there is any occurrence
of trace (T), no-trace (N) or undefined (U), the children-pattern adds that occurrence.
The same is performed for each grand-child instance and results in the grand-
children-pattern. The pattern is built in the form "x-<children-pattern>-<grand-children-pattern>".
– Class - uses the list "classSubClasses" for the child instances and the list "class-
SubClassesOfSubClasses" for the grand-child instances.
– Method - uses the list "methodCallees" for the child instances and the list
"methodCalleesOfCallees" for the grand-child instances.
• Leaf
Each parent instance is checked for its trace value in the RTM and, analogous to the root case, the same is performed for each grand-parent instance, resulting in the parents-pattern and the grand-parents-pattern.
– Class - uses the list "classSuperClasses" for the parent instances and the list
"classSuperClassesOfSuperClasses" for the grand-parent instances.
– Method - uses the list "methodCallers" for the parent instances and the list
"methodCallersOfCallers" for the grand-parent instances.
• Isolated or other
For these cases, a specific prediction pattern is returned which leads to an undefined prediction.
Finally, the defined patterns are queried with the method's or class's pattern obtained in the previous step to get the according prediction value (see chapter 5 for details on the defined patterns).
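A simplified sketch of the pattern construction for the inner case is given below. The helper name calculateTNU is borrowed from the evaluation chapter, but the body shown here is only an illustrative assumption, not the actual DesignSpace code.

```java
import java.util.List;

/** Hypothetical sketch: assemble an inner pattern string from surrounding trace values. */
public class PatternBuilder {

    /** Collects which of the values T, N and U occur among the neighbors (e.g. "TN"). */
    static String calculateTNU(List<Character> traceValues) {
        StringBuilder pattern = new StringBuilder();
        if (traceValues.contains('T')) pattern.append('T');
        if (traceValues.contains('N')) pattern.append('N');
        if (traceValues.contains('U')) pattern.append('U');
        return pattern.length() == 0 ? "U" : pattern.toString();
    }

    static String innerPattern(List<Character> parents, List<Character> children) {
        return calculateTNU(parents) + "-x-" + calculateTNU(children);
    }

    public static void main(String[] args) {
        // Parents m1 (T) and m3 (N) and children m2 (T) and m4 (N) yield "TN-x-TN",
        // which is then looked up in the defined pattern table to obtain the prediction.
        System.out.println(innerPattern(List.of('T', 'N'), List.of('T', 'N')));
    }
}
```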
The DesignSpace user interface has been extended by the author to use and show trace
information. At run-time, trace links between requirements and methods / classes can
also be set and reset. To establish a trace relationship, a user can simply click into an
element of the matrix with the mouse. To change that link to a non-trace relationship, a
user can click the element while holding down the "SHIFT" key of the keyboard. Holding
down the "CTRL" key of the keyboard while clicking an element will remove any trace
and change the relationship to undefined.
In the following examples, two requirements and several methods have been created.
Methods are visible as rows in the editor while requirements are depicted as columns. The
methods have the following relationships:
• Methods m22, m31 and m32 do not call any other methods and are not called by any other method (they are isolated from all other methods in this example).
Using this information, m12 has been predicted to also trace to requirement r1 using the pattern "T-x-T". This prediction is shown as "U -> PT" (see figure 6.2), which means that an undefined trace has changed to a predicted trace.
Figure 6.2: Existing traces for requirement r1 and prediction on method m12
Due to the calling relationships between m11, m12 and m21, m21 has now been predicted to not trace to requirement r2 either, using the pattern "N-N-x". The value has changed to "U -> PN" (see figure 6.3), describing a predicted no-trace derived from an undefined trace.
As a third example, a trace is created between method m32 and requirement r2. As method m32 is not in any relationship with any other method, it does not change any prediction value and was not influenced by other methods before either (see figure 6.4).
Figure 6.4: Adding trace for requirement r2 and isolated method m32
In the following examples, two requirements and several classes have been created. Classes
are visible as rows in the editor while requirements are depicted as columns. The classes
have the following relationships:
• Class c3 is super-class of c5
• Class c4 is super-class of c6
Based on this setup, c1 has been predicted to also trace to requirement r1 using the pattern
"x-TN-TN", as can be observed in figure 6.5.
Due to inheritance between c1, c3 and c5 (c1 is super-class of c3, c3 is super-class of c5),
c3 has now obtained a predicted value to also trace to requirement r2 using the pattern
"T-x-T". The value of c3 for requirement r2 has changed to "U -> PT" (see figure 6.6).
Figure 6.6: Adding traces for requirement r2 and creating predicted trace on class c3
In the following example, an IntelliJ plugin is used to make trace errors visible in IntelliJ Idea. This plugin was implemented by peers of the author based on the trace prediction adaptations in the DesignSpace done by the author. The requirement "Pick up an object" is not supposed to trace to the method "Grapper.releaseObject", which is shown on the left side by a trace value of "N" in figure 6.7. However, implementing the code in a way that the method releaseObject in the class Grapper uses the close method of the GripperJoint class causes a change to "U -> PT", which is shown as an error in IntelliJ Idea (see figure 6.7), hinting at the problem that the release method should not close but open the gripper. This can be useful to detect wrong implementations in relation to the requirements.
Figure 6.7: Using the DesignSpace trace prediction in IntelliJ Idea (source: [4])
In the beginning of my thesis I defined three research questions. In the following, I will
answer these questions.
Traceability is a vital aspect of software development, whether due to the scale of a software system or due to security or functionality requirements for that software. It is defined as '[...] the ability to describe and follow the life of a requirement, in both a forwards and backwards direction [...] through all periods of on-going refinement and iteration [...]' [7]. Creating traces from requirements to code is often a manual and tedious task. Risks arise when these trace links are not well-maintained or are missing altogether. To emphasize why traces are important, some benefits, especially for software developers, are listed in the following:
• Validating artifacts - An example for this can be seen in figure 6.7, where an imple-
mentation fault is detected using traceability.
• Testing - Test cases can be improved if the specification is known for a code artifact, for instance by limiting the possible inputs or by finding valid test values.
I will now answer my second question, 'Which shortcomings result from automated approaches for traceability link creation?'.
In section 2.3, a framework was introduced which uses requirement documents together with the "configuration management log" (a change log including information about the changed files and a message). One of the advantages of that framework is that non-explicit traceability links, which are non-trivial to identify, can also be recovered. Experiments showed that the proposed framework was not able to recover 30% of the known links. More critical is the fact that, due to the method of recovery, only modified code can be traced back to requirements, as the configuration management log is the basis for it.
In chapter 3, several information retrieval methods for traceability link recovery were introduced. Essentially, the probability that two artifacts are linked is calculated using the similarity between terms or words from code and requirement documents. Several improvements to these approaches were also described, which help to overcome the basic problems of information retrieval processes, synonymy and polysemy. Still, these approaches struggle to find all correct links without the intervention of a developer. Also, the precision of the recovered links tends to be low, even when a high value for recall is obtained.
The answer for my third question, ’Which solutions exist for creating and maintaining
traceability links?’, is as follows:
Each of the approaches for automated traceability link creation introduced in chapters 2 to 4 has its own techniques and advantages as well as disadvantages. The best way in terms of recall and precision may seem to be to have developers create all links manually, but the quality of these traces varies with their experience and the complexity of the software [1]. The presented approaches can support creating and maintaining traceability links. An idea would be to combine some of them to obtain a more complete collection of valid traces, for instance by using information retrieval methods to generate a basic set of traces and then further processing them using a deep learning approach. It remains part of future research to investigate whether this can be the solution to the traceability link creation problem. The method introduced in chapter 5 can support developers in validating implementations against a requirement specification and further in predicting valid trace links. The results of Ghabi showed that the algorithm captures traceability links in less than three seconds with a precision of over 90% and a recall of about 80% for the projects mentioned in their paper [3].
7.2 Evaluation
In this section, the performance of the algorithm in the DesignSpace is analyzed using the following four projects:
Each of these programs was imported into the DesignSpace using the JSON import functionality. The import time and other execution times as well as the impact of the algorithm were measured.
The import time depends on the project size and the number of requirements. The import was used to add the code to the DesignSpace using the meta-models mentioned in chapter 6 as well as the so-called "gold traces" (traces defined by the developers or created for the specific software). The conclude operation transfers the data into the DesignSpace; in one case the tracing functionality is turned off, which simply results in persisting the imported data. Concluding the transaction with the tracing algorithm enabled also performs the prediction algorithm, where the average total time as well as the number of trace executions for a project is measured. The time per trace is the average execution time needed to perform all predictions for a requirement. The "TraceService" is a class in the DesignSpace where the trace information is managed; the average proportion of the total execution time needed to insert predictions and persist them in the DesignSpace was measured per project (1). To determine the surrounding methods or classes, the method "getPropertyAsSet" (3) is used heavily, as can be seen from its average percentage of time. The method "addAll" (2) for collections is used as well during these operations; its average percentage of run time is also documented. The method "calculateTNU" (4) is used to collect the surrounding trace values (trace, no-trace and undefined).
In total, the algorithm runs in acceptable time, considering that adding a trace using the user interface performs all predictions for the requirement concerned in less than 20 milliseconds for small to medium-sized projects. During the analysis, some smaller performance problems were found. One adaptation improved the import performance by a factor of 4. Other smaller fixes improved the performance, especially when querying collections as properties from the DesignSpace. Compared to the speed of the implementation of Ghabi [3], it still does not perform as well, but most of the drawbacks are due to the meta-model overhead of the DesignSpace and the structure of the data. With over 30% of the average total execution time, the operation to insert predictions (which simply stores them) in the DesignSpace accounts for a significant share of the algorithm's run time. This was verified by turning off the functionality to store predictions, after which the algorithm took about 30% less time to complete. The performance may be further improved after analyzing the reasons for the overhead and the poor performance of the insert-predicted-trace operation in the DesignSpace; however, this remains part of future research and implementation.
[1] Alexander Egyed, Florian Graf, and Paul Grünbacher. “Effort and Quality of Re-
covering Requirements-to-Code Traces: Two Exploratory Experiments”. In: 2010
18th IEEE International Requirements Engineering Conference. 2010, pp. 221–230. DOI:
10.1109/RE.2010.34 (cit. on pp. 1, 7, 52).
[2] Antoniol et al. “Information retrieval models for recovering traceability links be-
tween code and documentation”. In: Proceedings 2000 International Conference on
Software Maintenance. 2000, pp. 40–49. DOI: 10.1109/ICSM.2000.883003 (cit. on
pp. 1, 9, 13, 14, 15, 16, 17, 19).
[3] Achraf Ghabi. Automatic approach to validating requirement-to-code traces. 2010
(cit. on pp. 2, 3, 24, 25, 26, 27, 32, 33, 52, 54).
[4] DesignSpace | JKU Linz. Dec. 28, 2021. URL: https://www.jku.at/en/institute-
of-software-systems-engineering/research/tools/designspace/ (visited on
12/28/2021) (cit. on pp. 2, 3, 35, 49).
[5] Andreas Demuth et al. “DesignSpace: An Infrastructure for Multi-User/Multi-
Tool Engineering”. In: Proceedings of the 30th Annual ACM Symposium on Applied
Computing. SAC ’15. Salamanca, Spain: Association for Computing Machinery,
2015, pp. 1486–1491. ISBN: 9781450331968. DOI: 10.1145/2695664.2695697. URL:
https://doi.org/10.1145/2695664.2695697 (cit. on pp. 2, 3, 24, 35).
[6] Ryosuke Tsuchiya et al. “Recovering Traceability Links between Requirements and
Source Code in the Same Series of Software Products”. In: Proceedings of the 17th
International Software Product Line Conference. SPLC ’13. Tokyo, Japan: Association
for Computing Machinery, 2013, pp. 121–130. ISBN: 9781450319683. DOI: 10.1145/
2491627.2491633. URL: https://doi.org/10.1145/2491627.2491633 (cit. on pp. 5,
7, 8, 9, 10, 11, 12, 13, 16).