
COSMIC-Functional Size Classification of Agile Software Development: Deep Learning Approach


Abstract – Knowing the size of a software early (i.e., at the requirement stage) helps to manage software projects ahead of resources, especially in the agile approach. With the proliferation of agile software industries, a large number of requirements are not clearly defined at the early phase of software development and are left unmeasured; this leads to inaccurate size and effort estimations and, in turn, failure of software projects. In addition, it is challenging to apply the Common Software Measurement International Consortium (COSMIC) standard in agile developments, because COSMIC needs strict formalization of requirements, whereas agile relies on less formal specifications. Given a formal or informal functional process or user story description, COSMIC-based estimation divides this process description into four data elements (entry, read, write, and exit) and counts the number of each data element so that the size of the process description can be determined (small, medium, large, or complex). By exploiting the advantages of COSMIC on agile methods, in this study we develop domain-specific vocabularies through a domain-specific pre-trained model for classification of COSMIC functional sizes in agile developments. We employ an experimental research methodology for implementing our proposed approach. We further pre-train a generic BERT model over requirement engineering domain texts and produce a new domain-specific pre-trained model called RE-BERT. Using RE-BERT, we develop deep learning classifiers (RE-BERT Seq., BASE-BERT Seq., BASE-BERT-LSTM, RE-BERT-LSTM, BASE-BERT-Bi-LSTM, RE-BERT-Bi-LSTM) for conducting COSMIC-based functional size classification. The experimental results show that the RE-BERT Seq. Classifier provides 78.97% prediction accuracy, the best among the classifier models (RE-BERT-LSTM, RE-BERT-Bi-LSTM, BASE-BERT-LSTM, BASE-BERT-Bi-LSTM, and BASE-BERT Seq. Classifier). Overall, RE-BERT-based classifiers provide a 1.40 to 4.80% average improvement over BASE-BERT-based classifiers. Moreover, the experimental results show that domain-specific pre-trained models have a promising effect on improving the performance of machine learning or deep learning models on a particular downstream task in that domain (in our case, the functional size classification task).

Keywords – BERT, COSMIC, Domain-specific pre-training, Downstream Task, Agile Development, RE-BERT, Functional Size Classification

I. INTRODUCTION
Requirement-level software size measurement and estimation at the early phase of software development helps organizations and agile practitioners to have a better project plan ahead of resources [1]. Source Lines of Code (SLOC) is the most widely used metric for determining the size of a software application [2]. However, SLOC relies on human judgment when counting code segments, is applicable only at a later stage of the software development life-cycle, and source instructions vary with programming languages, implementation methods, and the programmer's ability [3].

To alleviate the limitations of LOC, the software industry has developed a standardized function-point-based functional size measuring approach called the Common Software Measurement International Consortium (COSMIC) [4]. COSMIC measurement considers the size of each functional process independently. The size of a functional process is measured by counting four COSMIC elements, also called data movements. As shown in Fig. 1, these data movements are Entry (E), Exit (X), Read (R), and Write (W). One data movement is mapped to one COSMIC Function Point (CFP), so a functional process comprising three data movements has a size of 3 CFP. Consider the functional process "the user selects an exam from the list in the home screen and clicks on the button 'update' to update its data in the database. The system provides error/confirmation messages". Its total number of data movements is 6 CFP, as shown in Fig. 2.

FIGURE 1: COSMIC PROCESS
FIGURE 2: COUNTING COSMIC DATA MOVEMENTS
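To make the counting rule concrete, the sketch below (not from the paper) tallies CFP for the example process quoted above; the movement-by-movement annotation in Fig. 2 is not reproduced in the text, so the six movements listed here are an illustrative assumption.

```python
# Minimal sketch of COSMIC counting: one data movement = 1 CFP, so the size
# of a functional process is simply the number of identified movements.
from collections import Counter

def count_cfp(data_movements):
    """Return the total CFP and the per-movement breakdown."""
    counts = Counter(data_movements)
    return sum(counts.values()), counts

# Hypothetical annotation of the 'update exam' process quoted above:
# Entry (selection), Read (exam record), Entry (button click),
# Write (database update), Exit (confirmation), Exit (error message).
movements = ["E", "R", "E", "W", "X", "X"]
total, breakdown = count_cfp(movements)
print(total, dict(breakdown))  # -> 6 {'E': 2, 'R': 1, 'W': 1, 'X': 2}
```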

Even though COSMIC provides an effective functional size measurement approach, modern software development environments such as agile-based development methods are not well suited for such functional size measurement schemes [5]. To the best of our knowledge, little or no work has been done on investigating COSMIC functional size estimation in agile software environments. So, by exploiting the advantages of both sides (COSMIC and agile), effective COSMIC-based functional size measurement and estimation can be developed for agile developments. The work described in this paper investigated a new pre-trained language model called RE-BERT, produced by further pre-training a generic BERT model over requirement engineering domain texts. Using RE-BERT and deep learning classifier models, we performed COSMIC functional process classification as a downstream task. Using RE-BERT as a feature extraction and embedding model [9], deep learning models such as LSTM [6], Bi-LSTM [7], and a BERT Sequential Classifier [8] were used for conducting experiments on functional process classification.

The research questions to be answered after conducting the experiments are: RQ1. To what extent does the newly pre-trained model represent requirement engineering domain vocabulary compared to the generic BERT language model? RQ2. Does the newly pre-trained domain-specific model outperform the generic BERT model in classification performance? RQ3. Which classifier model (among all classifiers used) performs best?

II. LITERATURE REVIEW


A deep learning-based model for software effort estimation in agile projects using story points was proposed by [10]. Multiple Machine-Learning (ML) algorithms were investigated and used to train on a large story point dataset obtained from 16 publicly available projects. The study took advantage of natural language processing methods for extracting effective features from requirements written in the form of user stories. In addition to the employed deep learning algorithms, the study also implemented the traditional Random Forest ML algorithm, since it has been shown to be a powerful model for software effort estimation. The study was carried out with seven different deep learning regressors to compare the algorithms and evaluate the story-point-based effort prediction accuracy of each. The experiments showed that the deep learning and pre-trained word embedding techniques yield promising results when compared to traditional effort estimation regressors such as Random Forest. From the experiments, the authors concluded that the larger the volume and variety of samples in the corpus used, the higher the performance of the models, and good story point estimates can be achieved.

Khan investigated a Deep Neural Network (DNN) based effort estimation model built on meta-heuristic algorithms [11]. The meta-heuristic algorithms Grey Wolf Optimizer (GWO) and StrawBerry (SB) were used to provide a logical and acceptable parametric model for software effort estimation. A set of nine benchmark functions with large dimensions was used to validate the performance of these two techniques. The results of the GWO and SB algorithms were compared against five different meta-heuristic techniques for software effort estimation published in the previous literature. The findings of the experiment showed that GWO has a clear advantage in estimation accuracy over the optimization algorithms employed in the previous literature. The study also compared the proposed work with previous work employing DNNs for software effort estimation; the suggested DNN model (GWDNNSB) generated improved results by using meta-heuristic techniques for initial weight and learning rate selection.

[12] proposed a COSMIC FP method that employed an Artificial Neural Network (ANN) based Orthogonal Array for building effort estimation models. The major contribution of this study was identifying the effect of the input values of the COSMIC FP on the change of the Magnitude of Relative Error (MRE). Two simple Neural Network architectures were developed. The study used different values from the International Software Benchmarking Standards Group (ISBSG) and other datasets in the experiment. The minimum MRE was used to evaluate these architectures, considering the cost effect function and the type of data used in the training, testing, and validation of the proposed models. The authors designed fuzzification followed by a clustering technique in order to obtain seven different datasets. Comparisons made on the results obtained from experiments on each dataset achieved excellent reliability and accuracy.
III. METHODOLOGY

We employed an experimental research methodology [13] that involves the investigation, design, experimentation, and evaluation of COSMIC-based functional size classification in an agile context. The proposed approach contains three major phases: data collection and analysis, domain-specific further pre-training, and development of functional process classifier models. The high-level architecture of the proposed approach is shown in Fig. 3.

FIGURE 3: PROPOSED APPROACH

Data collection and analysis: Because there was no publicly available historical data in the form required by our method, the first step was developing datasets by collecting different software project artifacts, such as agile user stories and requirement specifications, from software projects of various domains. After collecting the data, train-test and pre-training corpuses were prepared. We divided the major sources of datasets into Functional User Requirements and User Stories. A total of 91,941 user stories and 8,345 functional user requirements were extracted from different software repositories [14] such as Kaggle, PROMISE, PURE, ZENODO, Jira, and COSMIC websites and forums [15]. Of these collected data, 6,990 were found already measured following COSMIC principles.

In order to increase the volume of train-test data, we conducted COSMIC-based functional size measurement using human experts and the ScopeMaster tool. 15,000 requirements and user stories were measured using ScopeMaster and 3,000 with human experts. A total of 21,990 measured and more than 400,000 unmeasured user requirements and user stories were collected. The unmeasured datasets were used for performing further pre-training and the measured datasets for training the classifier models. After removing outliers, the final set of data ready for model training was 20,371. Due to the capacity of the running machine and the available time, we were forced to reduce the pre-training data to 204,027. Table 1 shows the summary of extracted data (both train-test and pre-training).

TABLE 1: DATASET SUMMARY

Corpus group   Details                             Count      Group total
Train-test     Measured by experts                  3,000
(measured)     Measured by ScopeMaster             12,000     21,990
               Previously measured                  6,990
Pre-train      Unmeasured requirements/stories     78,296
               Stack Overflow                      57,600     356,286
               Train-test data (used unlabeled)    21,990
               Miscellaneous                      198,400

Outliers removed: 1,619                    Extracted pre-training data: 356,286
Train-test after outlier removal: 20,371   Pre-training data used: 204,027
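Table 1 is internally consistent: the 21,990 measured items minus 1,619 outliers leave the 20,371 train-test examples, which are later split 80:20. A minimal sketch of that preparation step, with a hypothetical file and label column:

```python
# Sketch only: reproduces the dataset arithmetic from Table 1 and the 80:20
# split described in the experiment section. File/column names are assumed.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("measured_requirements.csv")  # text + COSMIC size label
assert len(df) == 21990 - 1619                 # 20,371 rows after outlier removal

train_df, test_df = train_test_split(
    df, test_size=0.20, stratify=df["size_class"], random_state=42)
print(len(train_df), len(test_df))             # ~16.3k train, ~4.1k test
```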

Experiment and evaluation: In order to build an efficient model, we performed effective preprocessing on the textual requirements of both the train-test dataset and the pre-training corpus. We performed basic data cleaning tasks such as lower casing; removal of punctuation, URLs, and HTML tags; correction of spellings; stemming; and lemmatization. After the data were cleaned, we tokenized and extracted textual features using the newly pre-trained RE-BERT word embedding model. The first phase of the proposed approach was creating a domain-specific pre-trained model in the area of requirement engineering. To do so, we further pre-trained the generic BERT-base pre-trained model over a requirement engineering corpus that contains about 200,000 requirement and user story texts.

The sentences contain a variety of word lengths ranging from 7 to 250 words. The further pre-training took 6 days and 14 hours on a single Core-i9 CPU with 128 GB memory for 5 epochs. We tested and compared the results of the newly pre-trained model against the generic BERT model. The pre-trained model (RE-BERT) achieved 74.60% objective prediction accuracy and 86.72% training accuracy. RE-BERT is now available in the Hugging Face community at https://huggingface.co/yohannesSM/re-bert for further investigation of requirement engineering problems. Using the newly pre-trained model, we build deep learning models for performing COSMIC functional process classification and size estimation downstream tasks.
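A compact sketch of the cleaning steps listed above (lower casing and removal of punctuation, URLs, and HTML tags); the paper does not name its tooling, so this plain-Python version, with spelling correction and stemming/lemmatization left to a library such as NLTK, is an assumption:

```python
import re
import string

def clean_requirement(text: str) -> str:
    """Basic cleaning as described: lowercase, strip URLs/HTML/punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"<[^>]+>", " ", text)        # remove HTML tags
    text = text.translate(str.maketrans("", "", string.punctuation))
    # spelling correction, stemming, and lemmatization would follow here
    # (e.g., via NLTK); the paper does not specify its tooling.
    return re.sub(r"\s+", " ", text).strip()
```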
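The further pre-training itself is masked-language-model training continued from the BERT-base checkpoint. A minimal Hugging Face sketch under assumed hyperparameters (only the 5 epochs are stated in the paper; the corpus file name and batch size are illustrative):

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          LineByLineTextDataset, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# One requirement or user story per line (hypothetical corpus file).
dataset = LineByLineTextDataset(tokenizer=tokenizer,
                                file_path="re_corpus.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="re-bert", num_train_epochs=5,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
trainer.save_model("re-bert")  # the resulting domain-specific checkpoint
```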
Functional Process Classification: This is the targeted downstream task addressed in this study. The final set of train-test data with an 80:20 train-test split ratio was used for performing the classification task. Using RE-BERT as the tokenization and word-embedding model, we applied deep learning models such as LSTM, Bi-LSTM, and a BERT Sequential Classifier to conduct functional process classification experiments. Vector representations of the textual requirements together with their granularity levels were used as input for the training. The granularity levels are small, medium, large, and complex; requirement or user story texts falling under one of these levels were annotated following COSMIC measurement standards (i.e., during data collection). The trained model then predicts the class or category of an unseen requirement text. The same set of hyperparameters, such as epochs, batch size, learning rate, maximum sequence length, etc., was used for each deep learning classifier model. Fig. 4b shows the learning curves of the classifier models and Fig. 4a shows the overall summary of training and validation performances of the classifier models.
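As one concrete reading of the "(RE-)BERT + Bi-LSTM" variants, the sketch below freezes the RE-BERT encoder as a feature extractor and stacks a bidirectional LSTM with a 4-way linear head on top; the hidden sizes and last-timestep pooling are assumptions, not the paper's reported configuration.

```python
import torch.nn as nn
from transformers import AutoModel

class BertBiLSTMClassifier(nn.Module):
    """BERT embeddings -> Bi-LSTM -> 4-class (small/medium/large/complex) head."""
    def __init__(self, checkpoint="yohannesSM/re-bert", num_classes=4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        self.encoder.requires_grad_(False)   # BERT used purely as feature extractor
        self.bilstm = nn.LSTM(input_size=768, hidden_size=128,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 128, num_classes)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids,
                              attention_mask=attention_mask).last_hidden_state
        out, _ = self.bilstm(hidden)         # (batch, seq_len, 2*128)
        return self.head(out[:, -1, :])      # logits over the four size classes
```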


FIGURE 4: LEARNING CURVE (a) AND SUMMARY OF EXPERIMENTAL EVALUATION RESULTS (b) OF CLASSIFIER MODELS

The experimental results show that the LSTM model using BASE BERT provided 88.79% validation accuracy and 0.48 loss, whereas, using RE-BERT, it achieved 91.13% validation accuracy and 0.17 loss. The Bi-LSTM model using the BASE BERT pre-trained model achieved 90.23% validation accuracy and 0.35 loss, and it achieved 92.73% validation accuracy and 0.18 loss using the RE-BERT model. The BERT classifier model using BASE BERT achieved 90.52% validation accuracy and 0.27 loss, and it achieved 95.10% validation accuracy and 0.18 loss when using RE-BERT. Fig. 4a shows the summary of the experimental evaluation of all deep learning classifier models.
IV. DISCUSSION OF RESULTS

The newly further-pre-trained RE-BERT model achieved 74.60% objective prediction accuracy. Comparing the newly pre-trained RE-BERT with the generic BASE BERT on sample sentence pairs and masked word predictions, RE-BERT shows a 13.85% average improvement over BASE BERT. This shows that pre-training towards a particular domain on a particular task can produce an efficient language vocabulary for performing context-based learning towards a specific downstream task in that domain.

As shown in Table 2, the overall average improvement rate of RE-BERT based classifiers over the BASE BERT LSTM, Bi-LSTM, and BERT Sequential classifiers is 4.82%, 2.68%, and 1.40%, respectively. RE-BERT Bi-LSTM achieved a moderate improvement over RE-BERT LSTM; this is because RE-BERT has sufficient vocabulary to extract semantically similar features from the input domain texts, and the bidirectional nature of Bi-LSTM helps attain better feature learning than its counterpart LSTM. Among the RE-BERT based classifier models, the RE-BERT Sequential Classifier achieved higher accuracy than the others; this is because the sequential classifier on top of RE-BERT helps to process long sequences simultaneously and efficiently.

TABLE 2: IMPROVEMENT OF RE-BERT OVER BASE BERT CLASSIFIERS

                                     RE-BERT classifier (test accuracy)
                                     LSTM      Bi-LSTM   BERT      Overall average
Classifier models (test accuracy)    (76.19)   (77.02)   (78.37)   improvement
BASE BERT   LSTM      (73.64)        3.46%     4.59%     6.42%     4.82%
            Bi-LSTM   (74.72)        1.97%     3.08%     3.0%      2.68%
            BERT      (76.13)        0.08%     1.17%     2.94%     1.40%
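For the best-performing configuration, a hypothetical usage sketch: loading the published RE-BERT checkpoint with a sequence-classification head. Note that this head is newly initialized when loaded from the MLM checkpoint and would first have to be fine-tuned on the labeled train split; the label order is also an assumption.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

labels = ["small", "medium", "large", "complex"]   # assumed label order
tokenizer = AutoTokenizer.from_pretrained("yohannesSM/re-bert")
model = AutoModelForSequenceClassification.from_pretrained(
    "yohannesSM/re-bert", num_labels=len(labels))  # head needs fine-tuning first

story = ("As a student, I want to update an exam's details "
         "so that its record in the database stays current.")
inputs = tokenizer(story, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(labels[logits.argmax(-1).item()])            # predicted granularity class
```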

In general, using a pre-trained model in a specific domain (in our case, the requirement engineering domain) provides domain-specific vocabulary for semantic relationship and context understanding towards a specific downstream task in that domain.

V. CONCLUSION AND RECOMMENDATIONS

Conclusion: In this study, we conducted COSMIC-based functional process classification using a domain-specific pre-trained model called RE-BERT together with deep learning models. RE-BERT is used for word embedding and feature extraction, as well as serving as a sequential classifier itself. All the models were trained and tested using both the BASE-BERT and RE-BERT pre-trained models. Each of the classifier models was trained for 25 epochs with the same configuration of hyperparameters. The evaluation results show that the RE-BERT based BERT Sequential Classifier achieved a 1.4-4.8% average improvement over the other classifier models. In general, the performance of deep learning and NLP models on a particular downstream task can be improved by developing and applying a domain-specific pre-trained model for that task rather than using a generic pre-trained model.

Contribution: The contributions are twofold. The first is adding knowledge to the science: since research investigations in the circle of software metrics, requirement engineering, and software engineering at large are not yet well-matured, future researchers can take advantage of this work and consolidate it to address further software engineering problems. The second is the deployment of the models to real-world software organizations. Early size estimation helps organizations to have better project plans ahead of resources, which in turn reduces the failure rate of software projects.

Future Work and Recommendation: The dataset used for pre-training was not sufficient to produce a more efficient and richer domain vocabulary; its volume needs to be increased, and the vocabulary of RE-BERT should be updated periodically whenever new technology emerges or changes occur. In the future, we plan to incorporate nature-inspired metaheuristic algorithms for optimal feature selection and hyperparameter optimization so that the performance of our models can be boosted.

REFERENCES

[1] A. Vetrò, R. Dürre, M. Conoscenti, D. M. Fernández, and M. Jørgensen, "Combining Data Analytics with Team Feedback to Improve the Estimation Process in Agile Software Development," Found. Comput. Decis. Sci., vol. 43, no. 4, pp. 305–334, 2018, doi: 10.1515/fcds-2018-0016.
[2] E. I. Mustafa, "SEERA: A Software Cost Estimation Dataset for Constrained Environments," Nov. 2020, doi: 10.1145/3416508.3417119.
[3] A. Sinhal and B. Verma, "Software Development Effort Estimation: A Review," Int. J. Adv. Res. Comput. Sci. Softw. Eng., vol. 3, no. 6, pp. 1120–1135, 2013.
[4] A. Corazza, S. Di Martino, F. Ferrucci, C. Gravino, and F. Sarro, "From Function Points to COSMIC – A Transfer Learning Approach for Effort Estimation," vol. 1, pp. 251–267, 2015, doi: 10.1007/978-3-319-26844-6.
[5] S. Ait, "Agility & COSMIC Function Points in Jira (Project Management Tool for Agile Teams)," adapted and translated by A. Abran, 2017.
[6] Y. Wu et al., "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation," pp. 1–23, 2016. [Online]. Available: http://arxiv.org/abs/1609.08144
[7] K. Zhang, X. Wang, J. Ren, and C. Liu, "Efficiency Improvement of Function Point-Based Software Size Estimation with Deep Learning Model," IEEE Access, vol. 9, pp. 107124–107136, 2021.
[8] Q. Liu, M. J. Kusner, and P. Blunsom, "A Survey on Contextual Embeddings," 2020.
[9] F. Murtagh, "Multilayer perceptrons for classification and regression," Neurocomputing, vol. 2, no. 5–6, pp. 183–197, 1991, doi: 10.1016/0925-2312(91)90023-5.
[10] U. Urbas, D. Vlah, and N. Vukašinović, "Machine learning method for predicting the influence of scanning parameters on random measurement error," Meas. Sci. Technol., vol. 32, no. 6, May 2021, doi: 10.1088/1361-6501/abd57a.
[11] M. S. Khan, F. Jabeen, S. Ghouzali, Z. Rehman, and W. Abdul, "Metaheuristic Algorithms in Optimizing Deep Neural Network Model for Software Effort Estimation," vol. 4, 2021, doi: 10.1109/ACCESS.2021.3072380.
[12] M. Choetkiertikul, H. K. Dam, T. Tran, T. Pham, A. Ghose, and T. Menzies, "A Deep Learning Model for Estimating Story Points," IEEE Trans. Softw. Eng., vol. 45, no. 7, pp. 637–656, 2019, doi: 10.1109/TSE.2018.2792473.
[13] V. Mildner, "Experimental Research," in The SAGE Encyclopedia of Human Communication Sciences and Disorders, SAGE Reference, pp. 728–732, 2019.
[14] V. Tawosi, A. Al-Subaihin, R. Moussa, and F. Sarro, A Versatile Dataset of Agile Open Source Software Projects, vol. 1, no. 1. Association for Computing Machinery, 2022.
[15] R. R. Dumke and A. Abran, COSMIC Function Points: Theory and Advanced Practices. 2016.
