Software Effort Estimation Based On Use Case Reuse (Back Propagation)
https://doi.org/10.22214/ijraset.2023.50822
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
Abstract: Project managers often use effort estimation strategies to manage the human resources of current or upcoming
software projects. Before a project is implemented, estimates of cost, time, and personnel are necessary. Achieving accurate
effort estimates has always been difficult for software projects. In this study, software development effort is estimated
using a back propagation model. The goal is to investigate the capabilities and potential of artificial neural networks (ANN)
as a tool for forecasting the effort required for software development. Of the available machine learning methods, we apply
back propagation, an algorithm based on artificial neural networks. The approach is tested on the Desharnais dataset, a
well-known publicly available dataset for software effort estimation. The performance and accuracy of the tested model are
evaluated using three metrics: MMRE, MRE, and Pred(0.25). The sections that follow explain the algorithm and its results.
Index Terms: Artificial Neural Network, Software Effort Estimation (SEE), Machine Learning, Back propagation.
I. INTRODUCTION
Effort estimation is the activity of estimating the time necessary to create software. Estimating this work is a critical job in
the software industry, and many computational models have been developed to produce accurate estimates. Initial estimates made
without a clear understanding of the requirements are inaccurate, but estimate accuracy increases as the project advances.
Choosing the right estimating technique is therefore crucial. Effort estimates can be used as input to budgets, investment plans,
iteration plans, project plans, evaluations, pricing methods, and bidding rounds. Researchers and practitioners in the software
field have addressed the problem of software development effort estimation since at least the 1960s. The biggest difficulty in
project scheduling in the software industry is deciding how much of the project's resources should go toward the testing phase.
It has been found that the testing phase often uses between 40% and 50% of the resources.
Estimating the specific amount of work that has to be put into the testing phase is quite difficult, though. As a result, project
planning is flawed, and inadequate testing of a project could cause the company to suffer severe losses. Research in this field
has focused primarily on creating formal models for estimating software effort. Software effort estimation has been investigated
using a range of methodologies, including support vector regression (SVR) [4], radial basis function (RBF) neural networks [1],
bagging predictors [9], and other machine learning techniques, as well as models such as COCOMO [12]. Machine learning techniques
build models from data on previous projects, which are then used to predict how much work will go into upcoming ones. The vast
majority of techniques for calculating software effort offer only a point estimate [1][4][7][9][10]. However, it would be
important to report measures of the estimate's accuracy in addition to the estimate itself [10]. An estimation technique would
then be able to give a range within which the effort is expected to fall.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2858
A. Dataset (Desharnais)
Jean-Marc Desharnais produced this dataset in 1988 [4]. It is one of the earliest datasets for software development effort
estimation (SDEE) and has been used in numerous empirical studies, including [10] [7] [9]. The dataset has 81 rows and 12
attributes and consists of real software project data from a Canadian software company. Based on their technological
environments, the 81 projects are divided into three subgroups: the conventional environment (46 projects), the "improved"
traditional environment (25 projects), and the micro environment (10 projects). Each project has 12 features, nine of which
(Team Experience, Manager Experience, Length (months), Entities, Transactions, Adjustment, Points Non Adjust, Points Adjust,
and Language (1, 2, 3)) are independent, and one of which (Effort) is dependent.
Mathematical model
1) Equation 1 shows the calculation performed in the hidden layer:
$$y_j(p) = \mathrm{sigmoid}\left[\sum_{i=1}^{n} x_i(p)\, w_{ij}(p) - \theta_j\right]$$
Here n is the number of input neurons, j indexes the neurons of the hidden layer, $w_{ij}$ is the estimated weight mapped
between input i and hidden neuron j, and $\theta_j$ is the threshold of hidden neuron j (Abdel Karim Baareh, Journal of
Computer Science, 2019, Volume 15(3), pp. 321-331). The threshold is the value that determines whether the weighted input is
treated as valid. The signal reaches the output layer after passing through the input layer and the hidden layer.
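2) Equation 2 shows the sigmoid function that was used:
$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
3) Equation 3 shows the calculated output of the output layer:
$$y_k(p) = \mathrm{sigmoid}\left[\sum_{j=1}^{m} x_{jk}(p)\, w_{jk}(p) - \theta_k\right]$$
where m is the number of inputs feeding neuron k, k indexes the neurons of the output layer, and $\theta_k$ is the threshold
of output neuron k.
4) Equation 4 gives the error gradient obtained at the output layer:
$$\delta_k(p) = y_k(p)\,[1 - y_k(p)]\, e_k(p)$$
where $e_k(p)$ is the error at output neuron k.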
Neural networks use supervised learning to produce output vectors from the input vectors presented to the network. The network
compares the generated output with the desired output and, if the two do not match, produces an error signal. The weights are
then adjusted according to this error so that the network produces the desired results.
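The forward pass and the output-layer gradient of Equations 1 to 4 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the function names, the learning rate `lr`, and the delta-rule weight update are assumptions introduced here, and the error is taken as the desired output minus the actual output.

```python
import math


def sigmoid(x):
    """Logistic sigmoid (Equation 2)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, theta_hidden, w_out, theta_out):
    """Equations 1 and 3: hidden-layer outputs, then output-layer outputs.

    w_hidden[i][j] is the weight from input i to hidden neuron j;
    w_out[j][k] is the weight from hidden neuron j to output neuron k.
    """
    hidden = [
        sigmoid(sum(x[i] * w_hidden[i][j] for i in range(len(x))) - theta_hidden[j])
        for j in range(len(theta_hidden))
    ]
    out = [
        sigmoid(sum(hidden[j] * w_out[j][k] for j in range(len(hidden))) - theta_out[k])
        for k in range(len(theta_out))
    ]
    return hidden, out


def output_gradients(out, desired):
    """Equation 4: delta_k = y_k * (1 - y_k) * e_k, with e_k = desired_k - y_k."""
    return [y * (1.0 - y) * (d - y) for y, d in zip(out, desired)]


def update_output_weights(w_out, hidden, deltas, lr=0.1):
    """Assumed delta-rule update: w_jk <- w_jk + lr * y_j * delta_k."""
    return [
        [w_out[j][k] + lr * hidden[j] * deltas[k] for k in range(len(deltas))]
        for j in range(len(hidden))
    ]
```

Repeating the forward pass, gradient computation, and weight update over all training projects for several epochs is what drives the network output toward the desired effort values.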
The multiple linear regression model is a statistical tool that uses two or more independent variables to predict the outcome of
a dependent variable. Analysts can use the technique to determine the total variance explained by the model as well as the
proportional contribution of each independent variable.
Because it draws on several explanatory variables, which may enter the model in linear or nonlinear form, this regression model
can explain more of the variance and predict outcomes more accurately, and it allows the impact of each explanatory variable on
the model's overall variance to be analyzed.
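As a hedged illustration of how such a model can be fitted, the sketch below solves the ordinary least squares normal equations with a small Gauss-Jordan elimination. The function names `fit_linear_regression` and `predict` are hypothetical, and the solver is chosen only to keep the example dependency-free; it is not taken from the study.

```python
def fit_linear_regression(rows, targets):
    """Fit y = b0 + b1*x1 + ... + bm*xm by ordinary least squares.

    Solves the normal equations (X^T X) b = X^T y and returns
    [b0, b1, ..., bm], where b0 is the intercept.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[a] * x[b] for x in X) for b in range(k)]
        + [sum(x[a] * t for x, t in zip(X, targets))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coeffs, row):
    """Predicted value of the dependent variable for one observation."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))
```

Each fitted coefficient describes how the predicted value changes when one independent variable increases by one unit while the others are held fixed.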
The multiple linear regression model is used to assess the relationship between two or more independent variables and one
dependent variable. You can use it to determine:
1) How strongly two or more independent variables are related to one dependent variable (for example, how rainfall,
temperature, and the amount of fertilizer added affect crop growth).
2) The value of the dependent variable at specific values of the independent variables (for instance, the crop's anticipated
yield at specific levels of temperature, rainfall, and fertilizer application).
In this study, the effort values are calculated from the most relevant attributes. For each input, the effort predicted by the
algorithmic model is compared with the actual effort value. Each time, the weight vector is updated and the output is computed
and compared again. The predicted values are then plotted against the actual values.
The estimated effort depends mainly on the attributes present in the dataset. Each estimated effort is compared with the actual
effort recorded in the dataset, and the relative difference between the two is the MRE.
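The evaluation metrics named in this paper follow their standard definitions: MRE is the absolute difference between actual and predicted effort divided by the actual effort, MMRE and MdMRE are its mean and median over all projects, and Pred(0.25) is the share of projects whose MRE is at most 0.25. A minimal sketch, with hypothetical function names and assuming every actual effort is greater than zero:

```python
from statistics import mean, median


def mre(actual, predicted):
    """Magnitude of relative error for one project (actual must be > 0)."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean MRE over all projects."""
    return mean(mre(a, p) for a, p in zip(actuals, predictions))


def mdmre(actuals, predictions):
    """Median MRE over all projects."""
    return median(mre(a, p) for a, p in zip(actuals, predictions))


def pred(actuals, predictions, level=0.25):
    """Fraction of projects whose MRE is less than or equal to `level`."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= level)
    return hits / len(actuals)
```

Lower MMRE and MdMRE values and a higher Pred(0.25) value indicate a more accurate estimator.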
IV. RESULT
First, only the attributes that affect the effort are kept; the remaining attributes are dropped based on the correlation matrix.
The heatmap of the attribute values is shown below.
Figure 7: heatmap
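Correlation-based attribute selection of this kind can be sketched as follows. The sketch computes the Pearson correlation between each attribute and the effort and keeps the attributes whose absolute correlation reaches a cutoff. The cutoff value of 0.3 and the function names are assumptions made for illustration; the paper does not state the threshold it used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (spread_x * spread_y)


def select_features(columns, target, threshold=0.3):
    """Keep attributes whose |correlation with target| >= threshold.

    `columns` maps an attribute name to its list of values.
    Returns the kept names and the full name -> correlation mapping.
    """
    scores = {name: pearson(values, target) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores
```

The `scores` mapping holds the same values that a correlation heatmap displays in its effort row.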
The results revealed that the NFR feature "Language" is the most statistically significant attribute, and that removing it from
the regression model's training leads to a 25 percent rise in MMRE.
The next most statistically significant attribute was "Envergure," followed by the less statistically significant attributes
"Team Exp" and "Manager Exp."
Software effort has a negative correlation with the attribute "Team Exp" and a positive correlation with "Manager Exp."
The findings also showed that the MMRE increases by 100% when the quality-standard attributes are disregarded and prediction
models are built using software size as the only independent variable.
Figure 7 shows the training and validation loss graph. The validation loss is higher than the training loss, which indicates
that the model is overfitting: it learns patterns that are present in the training data but do not hold in general, and
therefore do not hold for the validation data.
The MMRE and MdMRE accuracy of the trained model on the validation dataset are:
V. CONCLUSION
The system presented here builds on neural network techniques that have been applied to effort estimation. Every such technique
aims to provide the most accurate software effort estimate. In this research, we suggest that the back propagation algorithm,
an ANN model, is a good method for calculating software development effort. Back propagation, which propagates errors quickly,
is recommended for complex and computationally intensive tasks, and its outcomes were contrasted with those of multiple linear
regression. It remains necessary to assess the accuracy of these approaches, since accuracy is what software effort estimation
chiefly requires. We found that neuron-based models are more accurate estimators and can therefore be used for software effort
estimation across all types of projects.
REFERENCES
[1] S. G. MacDonell, “A comparison of modeling techniques for software development effort prediction”, in: Proceedings of the International Conference on Neural
Information Processing and Intelligent Information Systems, Dunedin, New Zealand, Springer, Berlin, 1997, pp. 869–872.
[2] K. Strike, K. El-Emam, “Software cost estimation with incomplete data”, IEEE Transactions on Software Engineering, 27(10), 2001.
[3] A. C. Hodgkinson and P. W. Garratt, “A neuro fuzzy cost estimator”, in: Proceedings of the Third International Conference on Software Engineering and
Applications, SAE 1999, pp. 401–
[4] C. Kirsopp, M. J. Shepperd, and J. Hart, “Search heuristics, case-based reasoning and software project effort prediction”, Genetic and Evolutionary
Computation Conference (GECCO), New York, AAAI, 2002.
[5] X. Huang, D. Ho, J. Ren and L. F. Capretz, “Improving the COCOMO model using a neuro-fuzzy approach”, Applied Soft Computing, vol. 7, pp. 29-40,
January 2007.
[6] P. L. Braga, A. L. I. Oliveira, G. H. T. Ribeiro and S. R. L. Meira, “Bagging predictors for estimation of software project effort”, in: IEEE/INNS
International Joint Conference on Neural Networks, IJCNN’2007, Orlando, Florida.
[7] R. Babuska, Fuzzy Modeling for Control, Kluwer Academic Publishers, Dordrecht, 1999.
[8] A. A. Porter and R. W. Selby, Evaluating techniques for generating metric-based classification trees, J. Syst. Softw., 12(3):209–218, July 1990.
[9] Krishnamoorthy Srinivasan and Douglas Fisher, Machine learning approaches to estimating software development effort, IEEE Trans. Softw. Eng., 21(2):126–
137, February 1995.
[10] S. Sutton, Informed projection, Proceedings of the 2018 International Conference on Software and System Process, pp. 76-85.
[11] B. Tanveer, A. Vollmer and S. Braun, A hybrid methodology for effort estimation in Agile development, Proceedings of the 2018 International Conference on
Software and System Process.
[12] B. Boehm, Software cost estimation meets software diversity, Proceedings of the 39th International Conference on Software Engineering Companion.
[13] Barry W. Boehm, “Software Engineering Economics”, Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632, 1981, ISBN 0-13-822122-7.
[14] S. H. Lee, H. Goëau, P. Bonnet, and A. Joly, "New perspectives on plant disease characterization based on deep learning," Comput. Electron. Agricult., vol. 170,
Mar. 2020, Art. no. 105220.
[15] Z.-W. Hu, H. Yang, J.-M. Huang, and Q.-Q. Xie, "Fine-grained tomato disease recognition based on attention residual mechanism," J. South China Agricult.
Univ., vol. 40, no. 6, pp. 124132, Jul. 2019.
[16] Rahul Kumar Yadav, Dr. S. Niranjan, Software Effort Estimation Using Fuzzy Logic: A Review, International Journal of Engineering Research & Technology
(IJERT), Vol. 2, Issue 5, pp. 1377-1384, May 2013. ISSN: 2278-0181.
[17] Iman Attarzadeh and Siew Hock Ow, Software Development Effort Estimation based on a New Fuzzy Logic Model, International Journal of Computer
Theory and Engineering, Vol. 1, No. 4, October 2009.