Learning Software Requirements Syntax: An Unsupervised Approach to Recognize Templates
Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys
Article history: Received 27 September 2021; Received in revised form 4 April 2022; Accepted 27 April 2022; Available online 4 May 2022.

Keywords: Requirements Engineering; Requirements templates recognition; Natural Language Processing (NLP); Syntax learning; Graph community detection

Abstract

Requirements are textual representations of the desired software capabilities. Many templates have been used to standardize the structure of requirement statements, such as Rupp, EARS, and User Stories. Templates provide a good solution to improve different Requirements Engineering (RE) tasks, since their well-defined syntax facilitates the text processing steps in RE automation research. However, many empirical studies have concluded that there is a gap between this RE research and its implementation in industrial and real-life projects. The success of RE automation approaches strongly depends on the consistency of the requirements with the syntax of the predefined templates. Such consistency cannot be guaranteed in real projects, especially in large development projects, or when one has little control over the requirements authoring environment.

In this paper, we propose an unsupervised approach to recognize templates from the requirements themselves by extracting their common syntactic structures. The resultant templates reflect the actual syntactic structure of requirements; hence the approach can recognize both standard and non-standard templates. Our approach uses techniques from Natural Language Processing and Graph Theory to handle this problem through three main stages: (1) we formulate the problem as a graph problem, where each requirement is represented as a vertex and each pair of requirements has a structural similarity; (2) we detect the main communities in the resultant graph by applying a hybrid technique combining limited dynamic programming and greedy algorithms; (3) finally, we reinterpret the detected communities as templates.

Our experiments show that the suggested approach can detect templates that follow well-known standards with a 0.90 F1-measure. Moreover, the approach can detect common syntactic features for non-standard templates in more than 73.5% of the cases. Our evaluation indicates that these results are robust regardless of the number and the length of the processed requirements.

© 2022 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.knosys.2022.108933
R. Sonbol, G. Rebdawi and N. Ghneim Knowledge-Based Systems 248 (2022) 108933
However, many empirical studies concluded that there is a gap between RE task automation research and its implementation in industrial and real-life projects [18–20]. These automation approaches usually need the requirements to be represented based on specific templates [21]; hence their success strongly depends on the consistency of the requirements with the predefined templates [4,21]. Using a ‘‘template-based’’ approach to handle RE tasks could lead to lower precision when applied to new requirements written based on some variation of the predefined template, or on a completely different template. These situations are common in industrial projects [4,17–22]. Two main sources have been mentioned in the literature for this gap: (1) the template is insufficient to fully express the requirements of some industrial cases; (2) in some real-life projects, it is hard to control requirements authoring environments, especially in large development projects or when one has little control over the requirements authoring environment. The latter case is common when multiple organizations are involved in requirements writing, or when working on projects related to crossover services [4,18].

Focusing on the above problems, we set a research goal to develop an automated approach enabling template recognition from the requirements themselves. Our work is motivated by the perceived advantages and potential applications of recognizing templates:

(1) Each company has its specific jargon and standards [22]; thus, a tool which automatically recognizes the actually used (not the supposed) templates can recognize both standard and non-standard templates. We argue that RE tasks can achieve more real-life solutions when requirements representation and analysis start from the requirements themselves, not from standard or predefined templates.
(2) Understanding the syntax of requirements is an essential step to understand their semantics; recognizing the key syntactic components in requirements texts helps find more suitable semantic representations for requirements [23].
(3) Recognizing requirements syntax manually becomes a laborious task for large projects, since one project may contain a large number of requirements¹ [16,25].
(4) Detecting the syntax of requirements texts leads to building a supportive environment for requirements authoring; it can help in building more adaptive tools to detect syntactic inconsistency and to measure the quality of requirements.

In order to achieve our research goal, we developed an unsupervised approach to detect the syntactic structure of the requirements and to recognize their templates. The proposed solution starts by applying a set of Natural Language Processing (NLP) steps to process requirements and represent them as a graph. Each vertex in the graph represents one requirement, and each pair of vertexes has a structural similarity which reflects the common syntax of their related requirements. Then, the main communities in the resultant graph are detected by applying a hierarchical community detection algorithm. Each community consists of a set of requirements which share a common syntax. Finally, we extract templates based on the recognized communities.

Our work was guided by the following research questions:

RQ1: To what extent can we recognize templates automatically when requirements follow well-known standard templates?
RQ2: How effectively does our approach detect templates when handling requirements which do not follow one of the well-known templates, or do not follow any clear template?
RQ3: To what degree do the number of processed requirements and the length of requirement statements influence the experimental results?

The main contributions of this paper include:

(1) One of the first works which studies the syntax of templates and tries to recognize software requirements templates automatically.
(2) A methodology, based on NLP and graph theory, which can be applied to any set of requirements to detect common syntax.
(3) Public-access source code² to facilitate the replication of our study and to enable other researchers to build on our results.
(4) A data set of 82 lists of requirements labeled manually based on their templates. To our knowledge, this is the first large labeled data set in this domain.
(5) An experimental evaluation of the methodology to answer the above three research questions.

The rest of the paper is structured as follows: Section 2 provides a quick overview of Natural Language Processing (NLP), graph theory, and the community detection (CD) problem; these topics are used in the different stages of our approach. In Section 3, we provide an overview of the related works in the literature. Section 4 presents the suggested approach to extract templates automatically. Section 5 provides an initial evaluation of our work. Section 6 identifies the limitations and analyzes threats to validity. We conclude our paper in Section 7 by discussing our findings and future works.

2. Background

This section provides a brief overview of two key concepts used in our work: Natural Language Processing, which is used to analyze requirements texts, and community detection in a graph, which is used to group syntactically similar requirements to predict templates.

2.1. Natural language processing

Natural language processing is one of the main artificial intelligence disciplines. It aims to enable computer programs to ‘‘understand’’ and process natural language texts to achieve some specific goal [26,27]. NLP has been used in various domains to build applications like text classification, semantic relation detection, conceptual diagram extraction, semantic labeling, etc. In this section, we define the main NLP concepts which are related to our approach:

• Tokenization: the process of tokenizing or splitting a text into a list of tokens. Tokens can be words, numbers or punctuation marks. For example, the sentence ‘‘As an applicant, I want to submit Supporting Documentation’’ consists of 10 tokens: [‘‘As’’, ‘‘an’’, ‘‘applicant’’, ‘‘,’’, ‘‘I’’, ‘‘want’’, ‘‘to’’, ‘‘submit’’, ‘‘Supporting’’, ‘‘Documentation’’]

• Lemmatization: the process of finding the dictionary form, or the lemma, of each word. It is useful to group all inflected forms of a word in a single form (the lemma). For example, the lemma of ‘‘Supporting’’ and ‘‘Supported’’ is ‘‘Support’’.

¹ Although there are no previous works that define what number of requirements is considered large, some researchers [24] suppose that 50 requirements per project is considered a ‘‘large’’ number.
² https://zenodo.org/record/6525271
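The two steps above can be sketched in a few lines of Python. This is an illustrative simplification (a real pipeline would use a library such as NLTK or spaCy); in particular, the tiny suffix-stripping lemmatizer is an assumption for demonstration only, not the method used in the paper.

```python
import re

def tokenize(text):
    # Split a sentence into words, numbers, and punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

def lemmatize(word):
    # Toy lemmatizer: strips a few common inflectional suffixes.
    # A real system would use a dictionary-based lemmatizer instead.
    for suffix in ("ing", "ed", "s"):
        if word.lower().endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

tokens = tokenize("As an applicant, I want to submit Supporting Documentation")
print(len(tokens))              # 10 tokens, as in the example above
print(lemmatize("Supporting"))  # Support
```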
Fig. 2. Overview of the proposed approach to recognize templates from requirements showing the key stages.
(2) Detecting main communities in the resultant graph by applying a hierarchical community detection algorithm. The output of this stage is a number of communities, each of which consists of a set of vertexes (i.e. requirements) which have a large degree of structural similarity.

(3) Reinterpreting the output of the community detection algorithm into a set of templates. We consider each community as a candidate template, and we define its structure using the community's related requirements.

The next subsections explain the detailed steps for each stage.

4.1. Formulating the problem as a graph problem

We convert the requirements list into a simple undirected graph, where each requirement r is represented as a vertex v in the graph, and each pair of requirements is represented as an edge e. We denote the resultant graph G(V, E), where V consists of all vertexes and E consists of all edges based on the previous definitions.

To formulate the task of template recognition as a community detection task, we define the following concepts:

4.1.1. Vertex structure

Let v ∈ V. The structure of vertex v, denoted by Γ(v), reflects the core syntactic structure of its related requirement r. Γ ‘‘blurs’’ the details of requirement statements to focus on their overall structure. It behaves like blurring techniques in image processing, which apply a set of image filters to hide the tiny details and focus on the overall ‘‘shape’’ of the image (Fig. 3).

Γ values can be retrieved by applying the following steps to requirements texts:

1. Text preprocessing:

(a) Tokenization: We split each requirement text into a list of words.
(b) POS-tagging: We use the Stanford POS-tagger [45] to tag each word with its suitable part-of-speech tag, which gives the syntactic role of the word (such as plural noun, singular noun, adverb, adjective, ...).
(c) Remove articles: We remove words that have the tag ‘‘DET’’, i.e. we remove all articles.
(d) Find top frequent words: Top frequent words, denoted by W, are the words that occur in more than a certain percentage N of the processed requirements. The threshold N is determined experimentally; in this work, we used N = 1/3, i.e. W contains the words that occur in at least one-third of the requirements. These words are expected to be part of templates' structures, like ‘‘as’’, ‘‘so’’, ‘‘that’’, ‘‘ability’’, ‘‘shall’’, etc.
(e) Noun phrase chunking: We extract noun phrases from each requirement using the OpenNLP toolkit [46]. A noun phrase is a noun plus all the words that surround and modify it, such as adjectives, relative clauses and prepositional phrases, for example ‘‘Car Alarm’’.

2. ‘‘Blurring’’ process:

For each vertex v, the final Γ(v) is retrieved by applying a set of blurring rules to v's related requirement. These rules hide the details that are related to the specific case described in the processed requirement, and focus on its overall syntactic structure. The blurring rules are as follows:

(a) All detected noun phrases are converted to empty slots. Applying this rule does not affect the general structure of the requirements, and therefore preserves the overall syntax of the desired templates. For example, all these sentences share the same overall structure: ‘‘Car alarm shall be inhibited’’; ‘‘Electrical and manual commands shall be inhibited’’.
(b) All verbs and nouns are converted to empty slots unless they are contained in the top frequent words W. These verbs and nouns are usually related to the specific functionality or constraint which is described in each requirement. We blur these details since they are not related to the desired syntax, such as ‘‘key’’ and ‘‘ignition’’ in Fig. 3.
(c) All remaining words are kept in their original form (as they appear in the requirement) without any blurring.
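The blurring rules can be sketched as follows. For simplicity, the sketch takes pre-tagged (token, POS) pairs with articles already removed, instead of running a tagger and a noun-phrase chunker; the Penn-Treebank-style tag names and the input data are our own illustrative assumptions. Because noun-phrase chunking is skipped, whole noun phrases are approximated by collapsing consecutive blurred words into one slot.

```python
def blur(tagged_tokens, top_frequent):
    """Apply the blurring rules to one requirement.

    tagged_tokens: list of (token, POS) pairs, articles already removed.
    top_frequent: the set W of words kept because they occur in at
    least one-third of the processed requirements.
    Nouns (NN*) and verbs (VB*) outside W become empty slots '[ ]';
    everything else is kept as-is (rule c).
    """
    out = []
    for token, pos in tagged_tokens:
        content_word = pos.startswith("NN") or pos.startswith("VB")
        if content_word and token.lower() not in top_frequent:
            # Rules (a)/(b): blur content words into empty slots,
            # collapsing consecutive slots (a rough stand-in for
            # blurring a whole noun phrase at once).
            if not out or out[-1] != "[ ]":
                out.append("[ ]")
        else:
            out.append(token)
    return " ".join(out)

W = {"shall", "be"}
tagged = [("Car", "NN"), ("alarm", "NN"), ("shall", "MD"),
          ("be", "VB"), ("inhibited", "VBN")]
print(blur(tagged, W))  # [ ] shall be [ ]
```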
Fig. 3. Example showing how Γ ‘‘blurs’’ the details of a requirement statement and focuses on its general syntactic structure.
Fig. 5. The two main steps to detect the main communities in the graph.
Γ(C^(1)) = Γ(α), where V(C^(1)) = {α}    (1)

Then, we generate larger communities and calculate their structures gradually in a bottom-up direction as follows:

Γ(C^(i+1)) = σ(Γ(C^(i)), α), where V(C^(i+1)) = V(C^(i)) ∪ {α}

We stop this process after k iterations, since the number of generated communities increases exponentially after each step. The number of possible communities at level k can be calculated as the binomial coefficient of the number of vertexes |V| and the level k, i.e. (|V| choose k): there are (|V| choose k) different ways to select k vertexes from |V| vertexes. Fig. 6 shows how the number of generated communities grows as the number of vertexes (i.e. the number of requirements) increases, for values of k ranging from 2 to 5. These charts clarify the need to set a limit on the dynamic programming step. For example, the 4th level includes more than 2 × 10⁵ communities when |V| = 50.

In our work, we set k = 3, and we considered the communities retrieved at that level as seed communities. These seed communities are passed to the next step, where a greedy algorithm is used to retrieve the final communities.

2. Greedy step: In this step, we apply a greedy technique to find the main communities based on the output of the dynamic programming step (see Fig. 5). The proposed algorithm is as follows:

1: Input: Seed Communities
2: Output: Final Recognized Communities
3: Q ← Seed Communities
4: output ← { }    ▷ This set will include the recognized templates
5: covered ← { }    ▷ List of covered requirements
6: while Q ≠ ∅ and V ⊄ covered do
7:     C ← MAX(Q)    ▷ Get the community that has the max internal similarity in Q
8:     Remove C from Q
9:     if C is a meaningful community then    ▷ Based on its definition in Section 4.1.3
10:        output.add(C)
11:        covered.add(V(C))
12:    else
13:        Find Cᵢ in Q which has the max structural similarity σ with C
14:        Remove Cᵢ from Q
15:        Merge Cᵢ with C and add the resultant community to Q
16:    end if
17: end while
18: return output

At the end of this procedure, we retrieve a set of communities, each of which has a set of related vertexes and a Γ(C) representing the common structure over these vertexes. Table 1 shows a sample output for this stage.
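The greedy step can be sketched as follows. This is our own simplification, not the paper's implementation: the structural similarity σ of two communities is approximated by a common token subsequence obtained from difflib, the internal similarity of a community is taken as |Γ(C)|, and the ‘‘meaningful’’ test uses the two thresholds reported in Section 6.2 (community size above 10% of all requirements, common structure longer than 3 tokens).

```python
from difflib import SequenceMatcher

def common_structure(a, b):
    # A common token subsequence via difflib's matching blocks;
    # a stand-in for the paper's structural-similarity function sigma.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for block in matcher.get_matching_blocks():
        out.extend(a[block.a:block.a + block.size])
    return out

def meaningful(community, total_requirements):
    # Thresholds taken from Section 6.2: size > 10% of all requirements
    # and a common structure longer than 3 tokens.
    gamma, members = community
    return len(members) > 0.1 * total_requirements and len(gamma) > 3

def greedy_merge(seeds, total_requirements):
    # seeds: list of (gamma_tokens, member_id_set) pairs from the DP step.
    queue = list(seeds)
    output = []
    covered = set()
    while queue and not covered.issuperset(range(total_requirements)):
        # Pop the community with the largest internal similarity |Gamma(C)|.
        queue.sort(key=lambda c: len(c[0]))
        c = queue.pop()
        if meaningful(c, total_requirements):
            output.append(c)
            covered |= c[1]
        elif queue:
            # Merge c with the most structurally similar remaining community.
            i = max(range(len(queue)),
                    key=lambda j: len(common_structure(c[0], queue[j][0])))
            other = queue.pop(i)
            queue.append((common_structure(c[0], other[0]), c[1] | other[1]))
    return output

seeds = [(["when", "[ ]", "is", "[ ]", ",", "I", "want", "to", "[ ]"], {0, 1}),
         (["when", "[ ]", "is", "[ ]", ",", "I", "want", "to", "[ ]"], {2, 3}),
         (["[ ]", "and"], {4})]
print(len(greedy_merge(seeds, 10)))  # the tiny third seed is dropped
```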
Table 3
General statistics about the final data set.

Number of sets: 82
Number of requirements: 8084
Avg. number of requirements per set: 99
Well-known templates used: Rupp, User Story, EARS, Use Case

Table 1
Sample output showing how the ‘‘Detecting main communities’’ stage divides a set of 35 requirement statements into 4 communities (empty slots are shown as [ ]).

Community   Γ(C)                                    Size(C)
C1          ‘‘if [ ] is [ ], then [ ] shall [ ]’’   10
C2          ‘‘while [ ] is [ ], [ ] shall [ ]’’     12
C3          ‘‘when [ ] is [ ], I want to [ ]’’       8
C4          ‘‘[ ] shall [ ]’’                        5
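A structure such as Γ(C3) in Table 1 can be matched against raw requirement text with a generated regular expression, once it has been padded with dummy slots as described in Section 4.3. The sketch below is our own illustration: the '[?]' slot marker, the helper name, and the example requirement are invented for demonstration.

```python
import re

def template_to_regex(template_tokens):
    """Build a regex from a Gamma with dummy slots; '[?]' marks a slot."""
    parts = []
    for tok in template_tokens:
        if tok == "[?]":
            parts.append(r"(.*?)")   # a slot may also match an empty chunk
        else:
            parts.append(re.escape(tok))
    # Literal tokens may be separated by whitespace in the requirement.
    return re.compile(r"\s*".join(parts) + r"$")

# Gamma(C3) with dummy slots A..G: "A when B is C , D I E want F to G"
gamma_c3 = ["[?]", "when", "[?]", "is", "[?]", ",", "[?]", "I",
            "[?]", "want", "[?]", "to", "[?]"]
rx = template_to_regex(gamma_c3)
m = rx.match("when the door is open , I want to lock it")
print(m.groups())  # ('', 'the door', 'open', '', '', '', 'lock it')
```

Slots that come back empty for every requirement (like slot A here) are the ‘‘fake’’ slots that Section 4.3 discards.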
4.3. Recognizing the syntax of possible templates

In this stage, we reinterpret the output of the CD algorithm into requirements templates. We consider each community as a candidate template, and Γ(C) is used to define the syntactic structure of each template. For each resultant community, the following steps are applied to find the representative template:

1. Insert dummy slots: The community structure Γ(C) is calculated based on the LCS; thus we cannot assume that all successive tokens in Γ(C) are also successive in the original requirements texts. For example, theoretically, we cannot guarantee that there are no chunks of text between ‘‘want’’ and ‘‘to’’ in all related requirements of C3 in the previous example (Table 1). For this reason, we add slots in all possible places to match any possible undetected chunk of text. Γ(C3) becomes:

Γ(C3) = ‘‘[A] when [B] is [C], [D] I [E] want [F] to [G]’’

This includes adding a slot between every two tokens (like slots ‘‘E’’ and ‘‘F’’) and at the beginning (slot ‘‘A’’).

2. Retrieve slot examples: We use a regular expression to slice each requirement based on the Γ of its community. Based on that, we can retrieve a set of examples for each slot. Table 2 shows sample matched chunks for the first three slots in Γ(C3). Note that slot A is a fake slot, since it is empty in all related requirements. These fake slots are ignored in the final representative template.

3. Analyze slot syntax: In this step, we can understand the syntax of each non-fake slot using its retrieved examples. Using the syntactic tags of these examples, we can decide whether a slot is verbal (like slot C in the last example) or nominal (like slot B).

Fig. 7 shows the result of applying the last 3 steps on the community C3.

5. Experiments and results

Our data sets were collected to satisfy the following criteria:

• Covering different levels of control over the requirements authoring environment, i.e. including both requirements with clear templates and requirements with no clear templates.
• Covering different templates, including both standard templates (such as user stories, Rupp) and non-standard ones (such as company-related templates, author-related templates).
• Covering different levels of conformance with the templates.
• Covering different set sizes in terms of the number of requirements in each set.
• Covering various domains (healthcare, e-commerce, ...).
• Covering both academic and industrial sources.

5.1.2. Data cleaning and standardization

The collected data sets have different formats: text files, PDFs, XMLs, and MySQL tables. We extracted the requirements texts from each of them, and prepared a text file for each set of requirements, where each line represents one requirement. Fig. 8 shows statistics about the resultant requirements sets. About 72% of these sets consist of more than 50 requirements, and more than 30% of the sets include more than 100 requirements (see Fig. 8(a)). In addition, the final sets include both short and long requirements in terms of the number of words per requirement; Fig. 8(b) shows the distribution of the sets in terms of the length of their requirements. About 75% of the data set comes from industrial sources; the rest comes from academic sources (Fig. 9). Table 3 provides general statistics about the final data set.

5.2. Data annotation

To evaluate our approach, we annotated each requirement with its matched template. We used 5 labels to annotate all sets: 4 of them for the well-known templates (User Story, Rupp, EARS, Use Case), and an additional label for the remaining cases (Others). The initial annotation stage was done by two annotators with business analysis experience. The inter-annotator agreement (Cohen's kappa) between the two annotators reaches 0.92, with a percentage agreement of 93.3%, which represents
Fig. 8. The distribution of different sets over the number of requirements and the average length of requirements.
Fig. 9. The distribution of different sets over sources.

Table 4
The distribution of data set requirements over templates.

Template     Number of requirements   Percentage
Rupp         3678                     45.5%
User Story   1855                     22.9%
EARS          235                      2.9%
Use Case       31                      0.4%
Others       2285                     28.3%

For each template Ti, the evaluation measures are defined as:

Precision(Ti) = (Number of requirements with T = Ti and T̂ matching Ti) / (Number of requirements with T̂ matching Ti)

Recall(Ti) = (Number of requirements with T = Ti and T̂ matching Ti) / (Number of requirements with T = Ti)

F1(Ti) = 2 · Precision(Ti) · Recall(Ti) / (Precision(Ti) + Recall(Ti))

These measures are widely used in similar problems. Precision gives an idea about the quality of the recognized templates, while recall gives an idea about the coverage of the results.
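The three measures can be computed per template label from the annotated pairs (T, T̂) as follows; the five toy pairs are invented for illustration only.

```python
def scores(pairs, label):
    """pairs: list of (annotated T, recognized T_hat) per requirement."""
    tp = sum(1 for t, t_hat in pairs if t == label and t_hat == label)
    fp = sum(1 for t, t_hat in pairs if t != label and t_hat == label)
    fn = sum(1 for t, t_hat in pairs if t == label and t_hat != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

pairs = [("Rupp", "Rupp"), ("Rupp", "Rupp"), ("Rupp", "Others"),
         ("User Story", "User Story"), ("Others", "Others")]
print(scores(pairs, "Rupp"))  # precision 1.0, recall 2/3, F1 0.8
```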
almost perfect agreement level [53]. Finally, a third annotator (the first author of this paper) resolved the conflicts (about 630 requirements) to produce the final data set.

Table 4 shows the number of requirements for each label. About 72% of the annotated requirements follow one of the well-known standard templates; the most used templates are Rupp and User Story. On the other hand, more than 28% of the data set requirements do not follow any well-known standard: about 37.5% of these follow different non-standard templates, while the remaining 62.5% have no clear standard. The final annotated data set is publicly available for research purposes.⁴

5.3. Evaluation methodology

For each set of requirements, we apply our approach to detect templates. The final output is a set of recognized templates, each of which is matched with a set of requirements. In our evaluation, we consider that a recognized template T̂ matches the template T based on these three cases:

5.4. Results

To answer our research questions, three experiments were conducted. We explain the details of each experiment below.

RQ1: To what extent can we recognize templates automatically when requirements follow well-known standard templates?

Table 6 provides detailed results for the recognized syntax when applying our approach to the prepared requirements sets. The results show that the automatically recognized templates match the manually annotated ones with a 0.90 F1-measure (0.92 precision and 0.89 recall). This percentage increases to more than 0.98 when requirements follow templates with more restrictions (such as User Story or Use Case), while it decreases to 0.89 when more flexible templates (like Rupp) are used.

The detailed values of precision and recall show that the approach recognizes well-known templates with perfect precision, i.e. whenever the approach recognizes a well-known template T̂, this template matches the manually annotated template T in all tested cases.

To check the stability of these results over all sets, we calculated the F1-measure values for each of the 82 sets separately.

⁴ https://github.com/...
Table 5
Examples of matched and not matched templates (slot positions are shown as bracketed letters).

T̂                                                         T            Evaluation
‘‘as [B], I want to know [G], so that I can [L].’’         User Story   Matched
‘‘[A] should be [C] to [D].’’                              Rupp         Matched
‘‘when [B] is [C], [F] I want to [G]’’                     EARS         Matched
‘‘[A] want to know [D], so that [J].’’                     User Story   Not matched
‘‘[A] and [B]’’                                            Others       Matched
‘‘Requirement definition: [D] system shall provide [G]
to [H]. requirement specification: [L] system will [N]
to [O]. origin: [S] priority: [U].’’                       Others       Matched
Table 6
Evaluation results.
Label Number of requirements TP FP FN Precision Recall F1-measure
Rupp 3678 2925 0 753 1.00 0.78 0.89
User Story 1855 1796 0 59 1.00 0.97 0.98
EARS 235 188 0 47 1.00 0.80 0.89
Use Case 31 30 0 1 1.00 0.97 0.98
Others 2285 2285 860 0 0.73 1.00 0.84
Weighted average 0.92 0.89 0.90
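As a sanity check, the weighted averages in the last row of Table 6 follow directly from the per-label rows, weighting each label's score by its number of requirements:

```python
# (label, number of requirements, precision, recall, F1) rows from Table 6
rows = [
    ("Rupp",       3678, 1.00, 0.78, 0.89),
    ("User Story", 1855, 1.00, 0.97, 0.98),
    ("EARS",        235, 1.00, 0.80, 0.89),
    ("Use Case",     31, 1.00, 0.97, 0.98),
    ("Others",     2285, 0.73, 1.00, 0.84),
]
total = sum(n for _, n, _, _, _ in rows)                    # 8084 requirements
w_precision = sum(n * p for _, n, p, _, _ in rows) / total
w_recall = sum(n * r for _, n, _, r, _ in rows) / total
w_f1 = sum(n * f for _, n, _, _, f in rows) / total
print(round(w_precision, 2), round(w_recall, 2), round(w_f1, 2))  # 0.92 0.89 0.9
```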
Table 7
Evaluating the non-standard cases.

Outcome                          Number of requirements   Percentage
Template is correctly detected   611                      73.5%
Template is partially detected   171                      20.1%
Template is not detected          49                       5.9%
Fig. 11. F1-measure values when dividing the data set based on the requirement length (# of words per requirement).
Fig. 12. F1-measure values when dividing the data set based on their set size.
new templates, (6) the ability to handle non-standard templates (organization-specific templates), and (7) the number of provided case studies.

Table 8 summarizes the main differences between the related works and our approach. As this comparison shows, our approach differs from other works in its focus: instead of handling the requirements based on pre-defined templates, we learn the templates from the requirements themselves. Also, using an unsupervised technique makes our approach more practical and efficient – compared to related works in the literature – when analyzing requirements without any previous knowledge about their authoring environments or the templates used in these requirements. In addition, we use 82 sets of requirements to validate the efficiency of our approach, which represents the largest set of case studies in the related literature.

6. Limitations and threats to validity

This section discusses the limitations and potential threats to validity of our methodology and experimental results.

6.1. Limitations

Our approach is applicable to any clean list of requirements: recognizing whether a statement is part of a requirement text or not is out of the scope of our work. Thus, in our experiments, we cleaned the used software requirements specification documents by selecting only the requirements sections.

6.2. Internal validity

A potential threat to internal validity arises from the fact that the evaluation data set was developed during this research. Several mitigation actions have been taken to control bias related to the data set:

1. We constructed our data set based on well-known requirements sets which have been used previously in various RE automation works [4,48–52].
2. Data annotation was carried out by two domain experts, and the differences were then rechecked. Our inter-annotator agreement analysis shows almost perfect agreement (more than 0.9), which provides confidence about the quality of the annotated data set.
3. A detailed guideline was made available to the annotators explaining the syntax of the used well-known templates.

Another potential threat to validity is the number of levels used in the dynamic programming step when detecting the main communities (Section 4.2). We apply this step on three levels and then pass the results (the seed communities) to the greedy step. The number of levels was chosen based on a theoretical analysis, since practical experiments cannot be done because of the complexity of the problem.

Moreover, another possible threat to internal validity arises from the two thresholds used to define meaningful communities: the community size, which should be more than 10% of the total number of requirements, and the internal similarity, which should be more than 3. However, we consider these thresholds acceptable since they only eliminate non-important communities, i.e. communities with a small number of requirements, or communities that do not have a significant common structure. Note that Γ(C) is used to construct templates, and the internal similarity |Γ(C)| equals the size of the template which represents C's requirements. Thus, the internal similarity threshold only eliminates templates with two tokens (one of them representing a slot), such as ‘‘[ ] and’’ or ‘‘admin [ ]’’.

6.3. External validity

We tested our approach on 82 sets of requirements covering different aspects of requirements: various syntactic structures,
Table 8
A comparative study with related works.

Covered (or supported) standard templates:
Arora et al. [4]: Rupp, EARS. Lucassen et al. [38]: User Story. Femmer et al. [41]: Rupp, User Story, Use Case. DODT [39]: Rupp. Our approach: Rupp, EARS, User Story, Use Case.

Needed effort to support new templates:
Arora et al. [4]: High; new patterns and rules need to be added for any new template. Lucassen et al. [38]: High; the suggested criteria are specific to user stories. Femmer et al. [41]: Low; it just applies a rapid checklist that is applicable for any template. DODT [39]: High; it requires a high-quality domain ontology to be developed first. Our approach: No additional work; the approach is dynamic and can learn new templates automatically.
different sources, different domains, different sizes, and different requirement lengths. Our experiments provide reasonable confidence that the quality of the results is preserved over these aspects. However, larger-scale empirical studies would help to improve the external validity of our approach.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
7. Conclusion

In this paper, we presented an automated approach to learn requirements syntax and recognize requirements templates. The proposed approach uses NLP techniques to extract the syntactic features of requirement statements, and uses a community detection algorithm to group them into coherent communities based on their syntactic similarity. These communities are then used to construct the final templates. Our experiments show that the suggested approach can detect well-known standard templates with a 0.90 F1-measure. This result depends on the template structure: it increases to about 0.98 for templates with strong restrictions (like User Story), and decreases to 0.89 for less restricted templates (like the Rupp template). The experimental results indicate that the F1-measure is approximately preserved regardless of the number and the length of the processed requirements. Moreover, the approach can detect common syntactic features for non-standard templates in more than 73.5% of the cases. For future work, we plan to investigate how we can use the retrieved templates to formulate a semantic representation of requirements. These syntactic and semantic representations may lead to more accurate techniques for different RE tasks.

CRediT authorship contribution statement

Riad Sonbol: Conceptualization, Methodology, Software, Data curation, Writing – original draft, Writing – review & editing. Ghaida Rebdawi: Supervision, Writing – review & editing. Nada Ghneim: Supervision, Writing – review & editing.

References

[1] ISO/IEC/IEEE, Systems and Software Engineering—Life Cycle Processes—Requirements Engineering, ISO, Switzerland, 2018.
[2] A. Chakraborty, M.K. Baowaly, A. Arefin, A.N. Bahar, The role of requirement engineering in software development life cycle, J. Emerg. Trends Comput. Inf. Sci. 3 (5) (2012) 723–729.
[3] J. Holtmann, J.-P. Steghöfer, M. Rath, D. Schmelter, Cutting through the jungle: Disambiguating model-based traceability terminology, in: 2020 IEEE 28th International Requirements Engineering Conference (RE), IEEE, 2020, pp. 8–19.
[4] C. Arora, M. Sabetzadeh, L. Briand, F. Zimmer, Automated checking of conformance to requirements templates using natural language processing, IEEE Trans. Softw. Eng. 41 (10) (2015) 944–968.
[5] F.S. Bäumer, M. Geierhos, Flexible ambiguity resolution and incompleteness detection in requirements descriptions via an indicator-based configuration of text analysis pipelines, 2018.
[6] F. Dalpiaz, I. Van der Schalk, G. Lucassen, Pinpointing ambiguity and incompleteness in requirements engineering via information visualization and NLP, in: International Working Conference on Requirements Engineering: Foundation for Software Quality, Springer, 2018, pp. 119–135.
[7] A. Aurum, C. Wohlin, Requirements engineering: setting the context, in: Engineering and Managing Software Requirements, Springer, 2005, pp. 1–15.
[8] T. Ambreen, N. Ikram, M. Usman, M. Niazi, Empirical research in requirements engineering: trends and opportunities, Requir. Eng. 23 (1) (2018) 63–95.
[9] X. Lian, M. Rahimi, J. Cleland-Huang, L. Zhang, R. Ferrai, M. Smith, Mining requirements knowledge from collections of domain documents, in: 2016 IEEE 24th International Requirements Engineering Conference (RE), IEEE, 2016, pp. 156–165.
[10] A. Ferrari, A. Esuli, An NLP approach for cross-domain ambiguity detection in requirements engineering, Autom. Softw. Eng. 26 (3) (2019) 559–598.
R. Sonbol, G. Rebdawi and N. Ghneim Knowledge-Based Systems 248 (2022) 108933
[11] C. Denger, D.M. Berry, E. Kamsties, Higher quality requirements specifications through natural language patterns, in: Proceedings 2003 Symposium on Security and Privacy, IEEE, 2003, pp. 80–90.
[12] B. DeVries, B.H. Cheng, Automatic detection of incomplete requirements via symbolic analysis, in: Proceedings of the ACM/IEEE 19th International Conference on Model Driven Engineering Languages and Systems, 2016, pp. 385–395.
[13] A. Umber, I.S. Bajwa, Minimizing ambiguity in natural language software requirements specification, in: 2011 Sixth International Conference on Digital Information Management, IEEE, 2011, pp. 102–107.
[14] J. Schumann, Generation of formal requirements from structured natural language, in: Requirements Engineering: Foundation for Software Quality: 26th International Working Conference, REFSQ 2020, Pisa, Italy, March 24–27, 2020, Proceedings, vol. 12045, Springer Nature, 2020, p. 19.
[15] A. Mavin, P. Wilkinson, A. Harwood, M. Novak, Easy approach to requirements syntax (EARS), in: 2009 17th IEEE International Requirements Engineering Conference, IEEE, 2009, pp. 317–322.
[16] K. Pohl, C. Rupp, Requirements Engineering Fundamentals, Rocky Nook Inc, 2011.
[17] Y. Wautelet, S. Heng, M. Kolp, I. Mirbel, Unifying and extending user story models, in: International Conference on Advanced Information Systems Engineering, Springer, 2014, pp. 211–225.
[18] Z. Liu, B. Li, J. Wang, R. Yang, Requirements engineering for crossover services: Issues, challenges and research directions, IET Softw. 15 (1) (2021) 107–125.
[19] A. Mavin, P. Wilkinson, S. Teufl, H. Femmer, J. Eckhardt, J. Mund, Does goal-oriented requirements engineering achieve its goal? in: 2017 IEEE 25th International Requirements Engineering Conference, RE, IEEE, 2017, pp. 174–183.
[20] U. Eklund, H.H. Olsson, N.J. Strøm, Industrial challenges of scaling agile in mass-produced embedded systems, in: International Conference on Agile Software Development, Springer, 2014, pp. 30–42.
[21] G. Fanmuy, A. Fraga, J. Llorens, Requirements verification in the industry, in: Complex Systems Design & Management, Springer, 2012, pp. 145–160.
[22] A. Ferrari, F. Dell'Orletta, A. Esuli, V. Gervasi, S. Gnesi, Natural language requirements processing: A 4D vision, IEEE Softw. 34 (6) (2017) 28–35.
[23] R. Sonbol, G. Rebdawi, N. Ghneim, Towards a semantic representation for functional software requirements, in: 2020 IEEE Seventh International Workshop on Artificial Intelligence for Requirements Engineering, AIRE, IEEE, 2020, pp. 1–8.
[24] S. Hatton, Early prioritisation of goals, in: International Conference on Conceptual Modeling, Springer, 2007, pp. 235–244.
[25] C. Arora, M. Sabetzadeh, L. Briand, F. Zimmer, R. Gnaga, RUBRIC: A flexible tool for automated checking of conformance to requirement boilerplates, in: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, 2013, pp. 599–602.
[26] D. Jurafsky, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, Upper Saddle River, N.J, 2000.
[27] V. Teller, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, MIT Press, Cambridge, MA, 2000.
[28] G.A. Pavlopoulos, M. Secrier, C.N. Moschopoulos, T.G. Soldatos, S. Kossida, J. Aerts, R. Schneider, P.G. Bagos, Using graph theory to analyze biological networks, BioData Min. 4 (1) (2011) 1–27.
[29] S. Fortunato, D. Hric, Community detection in networks: A user guide, Phys. Rep. 659 (2016) 1–44.
[30] M.E. Newman, Detecting community structure in networks, Eur. Phys. J. B 38 (2) (2004) 321–330.
[31] A. Clauset, C. Moore, M.E. Newman, Hierarchical structure and the prediction of missing links in networks, Nature 453 (7191) (2008) 98–101.
[32] N. Gulbahce, S. Lehmann, The art of community detection, BioEssays 30 (10) (2008) 934–938.
[33] A. Lancichinetti, S. Fortunato, Community detection algorithms: a comparative analysis, Phys. Rev. E 80 (5) (2009) 056117.
[34] E. Castrillo, E. León, J. Gómez, Fast heuristic algorithm for multi-scale hierarchical community detection, in: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, 2017, pp. 982–989.
[35] L. Zhao, W. Alhoshan, A. Ferrari, K.J. Letsholo, M.A. Ajagbe, E.-V. Chioasca, R.T. Batista-Navarro, Natural language processing for requirements engineering: A systematic mapping study, ACM Comput. Surv. 54 (3) (2021) 1–41.
[36] F. Nazir, W.H. Butt, M.W. Anwar, M.A.K. Khattak, The applications of natural language processing (NLP) for software requirement engineering - a systematic literature review, in: International Conference on Information Science and Applications, Springer, 2017, pp. 485–493.
[37] R. Sonbol, G. Rebdawi, N. Ghneim, The use of NLP-based text representation techniques to support requirement engineering tasks: A systematic mapping review, IEEE Access, under review (2022).
[38] G. Lucassen, F. Dalpiaz, J.M.E. Van Der Werf, S. Brinkkemper, Forging high-quality user stories: towards a discipline for agile requirements, in: 2015 IEEE 23rd International Requirements Engineering Conference, RE, IEEE, 2015, pp. 126–135.
[39] S. Farfeleder, T. Moser, A. Krall, T. Stålhane, H. Zojer, C. Panis, DODT: Increasing requirements formalism using domain ontologies for improved embedded systems development, in: 14th IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems, IEEE, 2011, pp. 271–274.
[40] RQA: The Requirements Quality Analyzer Tool, https://www.reusecompany.com/rqa-quality-studio.
[41] H. Femmer, D.M. Fernández, S. Wagner, S. Eder, Rapid quality assurance with requirements smells, J. Syst. Softw. 123 (2017) 190–213.
[42] ISO, IEC, IEEE: ISO/IEC/IEEE 29148, systems and software engineering, life cycle processes, Requir. Eng. (2011).
[43] T. Stålhane, T. Wien, The DODT tool applied to sub-sea software, in: 2014 IEEE 22nd International Requirements Engineering Conference, RE, IEEE, 2014, pp. 420–427.
[44] M. Kamalrudin, N. Mustafa, S. Sidek, A template for writing security requirements, in: Asia Pacific Requirements Engineering Conference, Springer, 2017, pp. 73–86.
[45] P. Qi, Y. Zhang, Y. Zhang, J. Bolton, C.D. Manning, Stanza: A Python natural language processing toolkit for many human languages, 2020, arXiv preprint arXiv:2003.07082.
[46] J. Baldridge, The Apache OpenNLP project, 2005, URL: https://opennlp.apache.org.
[47] J.W. Hunt, T.G. Szymanski, A fast algorithm for computing longest common subsequences, Commun. ACM 20 (5) (1977) 350–353.
[48] A. Ferrari, G.O. Spagnolo, S. Gnesi, PURE: A dataset of public requirements documents, in: 2017 IEEE 25th International Requirements Engineering Conference, RE, IEEE, 2017, pp. 502–505.
[49] F. Dalpiaz, Requirements data sets (user stories), Mendeley, 2018, http://dx.doi.org/10.17632/7ZBK8ZSD8Y.1, URL https://data.mendeley.com/datasets/7zbk8zsd8y/1.
[50] E. Knauss, S. Houmb, K. Schneider, S. Islam, J. Jürjens, Supporting requirements engineers in recognising security issues, in: International Working Conference on Requirements Engineering: Foundation for Software Quality, Springer, 2011, pp. 4–18.
[51] G. Lucassen, M. Robeer, F. Dalpiaz, J.M.E. Van Der Werf, S. Brinkkemper, Extracting conceptual models from user stories with visual narrator, Requir. Eng. 22 (3) (2017) 339–358.
[52] J.H. Hayes, J. Payne, M. Leppelmeier, Toward improved artificial intelligence in requirements engineering: Metadata for tracing datasets, in: 2019 IEEE 27th International Requirements Engineering Conference Workshops, REW, IEEE, 2019, pp. 256–262.
[53] J. Pustejovsky, A. Stubbs, Natural Language Annotation for Machine Learning: A Guide to Corpus-Building for Applications, O'Reilly Media, Inc., 2012.