
2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS)

An Empirical Study on Software Defect Prediction using Function Point Analysis

Xinghan Zhao1,2 and Cong Tian1,*

978-1-6654-7704-8/22/$31.00 ©2022 IEEE | DOI: 10.1109/QRS57517.2022.00027

1 Xidian University, Xi'an, Shaanxi, China, 710071
2 The 27th Research Institute of CETC, Zhengzhou, Henan, China, 450047
[email protected], [email protected]
*corresponding author

Abstract—A software defect prediction method based on the requirement specification is proposed to address the need for defect prediction in the requirements phase when the organization adopts the W-model of software development. The theoretical synthesis shows that function points and the number of defects should be positively correlated. The theory's correctness is verified by analyzing the correlation between function points and defect distribution in eight software applications. Then, the mathematical equations for software configuration testing defects are derived, and the specific meaning of the equation is explained. Finally, the shortcomings of this study and the subsequent research directions are pointed out.

Keywords—software defect prediction; function point analysis; software configuration testing; IFPUG

I. INTRODUCTION

Software defect prediction is one of the key research directions in software testing and software engineering. This is because failures caused by software defects can have very severe consequences, including property damage, monetary loss, or even human casualties [34], [36]. Software defect prediction estimates the number of potential software bugs and their distribution based on the code, documents, configuration process information, etc. The results of software defect prediction are very instructive for the organization's software testing resource allocation, software product quality judgment, software process quality management, and software testing result assessment. Defect prediction methods are generally based on the size of the software code [2], its complexity [8], and various design parameters [11], [35]. After preprocessing, fitting, and regression of the inherent properties of the software and its process information, a targeted prediction model is formed. The model predicts the likely distribution of defects in the target software.

Software configuration testing is the process of testing software configuration items' functionality, performance, and other characteristics. This type of testing is typically performed by an organization-level software testing department and always occurs after the development team has completed the coding work and unit testing. Configuration testing is crucial throughout the software life cycle as the last step of the organization's software quality control. In the traditional waterfall model [3], the overall activity of configuration testing begins when the source code of the SUT (software under test) is fully developed. The software testing members can use the source code and the process information from the configuration management system to predict the number and the distribution of defects. However, in the present day, to increase the software testers' involvement, the V-model or the more explicit W-model [32] is usually chosen. In the W-model, testers' activity tends to shift left [21]: the configuration testing plan starts in the requirements phase of the project, and testers must decide on test strategies, allocate test resources, and complete the main test case design work before the requirements phase is completed. Unlike in the waterfall and V-models, the design and coding work has not been carried out at that point, so software defect prediction based on source code or process information is impossible.

This study proposes a method for predicting software defects using software requirement specifications as input. Thus, managers can obtain software fault prediction results in the requirement phase, which may be helpful for software activities such as software quality assurance and test resource allocation. In terms of input parameter selection, the function points of every requirement item are used to determine which requirement items are more likely to have bugs. Because we did not find an open-source database that we could use, we randomly selected eight software projects from our organization's historical software repository. The function points of each requirement item were analyzed with the IFPUG method [16], and the configuration testing bugs were collected from the software configuration testing reports. This study involves 374 functional items, 5011.24 function points, and 233 software defects. The results show that the probability of occurrence and the number of functional errors of software configuration items positively correlate with the number of function points of the software requirements. We then use the least-squares method with curve fitting to build a mathematical model for defect risk calculation. The model takes the function points of each requirement as input and calculates the relative risk level of defects. Finally, the differences between the actual data and the theory are given, and our insights are provided.

The subsequent chapters are organized as follows: Chapter 2 gives a brief description of the development of defect prediction and related research works; Chapter 3 introduces the theoretical basis of this study; Chapter 4 presents the information of the selected validation software and the calculation of function points, as well as the difficulties encountered in data collection and our solutions; Chapter 5 discusses the relationship between the final function points and defects, gives the fitting equation, and analyzes the engineering significance of each parameter of the equation; Chapter 6 then gives some results of the data distribution. Chapter 7 explores the reasons
for the data distribution in the context of the research; Chapter 8 describes the follow-up research plan; Chapter 9 gives the summary and outlook of this study.

II. RELATED WORKS

The original defect prediction methods were mainly based on lines of code and complexity-derived metrics. In 1971, Akiyama proposed a measure of complexity based on lines of code and used it for defect prediction [2]. Akiyama also gave a correspondence between software defects and lines of code based on his research sample. In 1973, Ferdinand used information theory to analyze the complexity of a system and the number and density of its defects, showing that the larger the system, the more defects it is likely to contain [11].

There is one obvious problem with using lines of code to predict defects: the method does not account for the complexity of the software system. Building on Akiyama's research, Halstead proposed complexity metrics derived from operators and operands in the following years. He calculated the correlation between these metrics and defects, and found correlation coefficients greater than 0.9 [13]. Lipow et al. generalized Halstead's study by proposing a polynomial correspondence between LOC and defects whose coefficients depend on language-related operands [18]. Lipow's conclusions, however, remained debated. Gaffney proposed that the correspondence between defects and LOC is language-independent, and he used Lipow's sample to derive a more simplified relational expression independent of the programming language [12].

Unlike the route taken by Halstead et al., who based their calculations on metrics derived from code and operands, McCabe used a language-independent, structure-based approach to complexity measurement. He used a graphical method to calculate the complexity of programs by computing the cyclomatic complexity of the program's flowchart, and used this as the primary metric for defect distribution prediction [8]. Studies like Halstead's and McCabe's were prevalent around the 1980s, and the effectiveness of these metrics in practice has been evaluated and compared [4].

Most of the studies in the 1970s and early 1980s focused on fitting prediction models to existing program data; the performance of the models and their parameters was not validated on new SUTs. Shen et al. selected three commercial programs as test subjects to address this issue and validated the correlation between models and metrics using linear regression [30]. Munson et al. stated that the regression models of the time yielded inaccurate predictions of the number of defects and proposed classifying modules into high- and low-risk categories instead of predicting counts. Their classification model obtained 92% prediction accuracy on their target software [23].

The above studies are based on code, and many more studies try to establish useful metrics in other ways. Henry and Kafura et al. established a metric system for module complexity based on statements and data flow in design documents [15]. Ohlsson et al. implemented the automatic collection of information from design documents and made a successful defect prediction on Ericsson's telephone switch software [27]. In addition, another research direction is complexity analysis and defect prediction based on object-oriented (OO) design; Basili, Chidamber, et al. have published their study results in this area [4], [6].

After 2000, with the popularity of version control systems such as SVN and Git, using process information to predict defects became a new direction [24]. Moser et al. built metrics from records of code revisions, refactoring, and bug fixes, and showed that a model using change records is much more accurate than a code-based model [22].

Since 2000, machine learning has also become more and more popular, and using machine learning in software defect prediction has become a new research hotspot. In the direction of cross-project prediction, machine learning has helped researchers a lot in processing and analyzing large amounts of data, and many scholars have proposed their own models for cross-project prediction [5], [14], [17], [19], [25], [33]. However, how to evaluate these cross-project models is also a problem that needs to be faced; Zimmermann [38] and He [14] et al. have done research in this area.

There are many studies on software defect prediction, but there is little literature on how to predict defects in the requirements phase. The Air Force's Rome Laboratory developed a model for early software reliability prediction, based on data collected from software requirement specifications [20]. Smidts suggests a reliability prediction model based on the requirement changes during the life cycle [31]. Yadav et al. [37] propose a software defect prediction model using fuzzy logic to solve the problem of early-stage defect prediction, but it uses not only the requirement information but also metrics about size and historical quality. Sangeeta et al. present a failure rate model centered on the iterative software development life cycle [29]. However, all the studies above focus on the defect distribution across the different phases of the software development life cycle. They have not answered the question of how to analyze the defect distribution across different functions, or how a software manager can predict defects when given a new requirement specification.

III. THEORETICAL ANALYSIS

There is no suitable theory or method for predicting defects directly from software requirement specifications. However, it is possible to do so indirectly using some relevant research findings. The ideas involved are mainly in the following two areas.

Theory 1: The more function points there are, the larger the LOC will be.

Function point analysis (FPA) is one of the most mature and popular methods of software size estimation today. The main FPA methods are IFPUG [16] and COSMIC [9]. Although the calculation methods of the two differ, both calculate function points that are proportional to LOC.

Theory 2: The larger the LOC of a software module, the larger the number of bugs.

As discussed in Chapter 2, whether for Halstead or Akiyama, or the later Lipow, Gaffney, and other subsequent studies, although the final prediction models of their studies differ, their results show that the larger the software size (LOC), the higher the number and likelihood of bugs embedded in the software modules.

By combining Theory 1 and Theory 2, we can draw the inference below.

Inference: The more function points a software module has, the higher the probability and the larger the number of bugs in the software.

Using this inference, we can predict the distribution of defects in software by calculating the function points of the software requirement specification in the early stage of the software life cycle, when only the software requirement specification is available.

IV. DATA AND METHODS

A. Target Programs

We did not find a publicly available database of software requirement function points and defects. To make the experimental data more realistic, we randomly selected eight commercial programs from the organization's project database for validation. The descriptions of these projects are given in Table 1.

Table 1. Experimental Validation Software
Index | Software Name | Software Type | Platform | Language | Size
P1 | Display and control software | Non-Embedded | QT | C++ | Medium
P2 | Photoelectric tracking software | Embedded | IAR | C | Small
P3 | Resource allocation management software | Non-Embedded | Eclipse | JAVA | Medium
P4 | Data census analysis software | Non-Embedded | QT | C++ | Medium
P5 | Data customization platform software | Non-Embedded | Eclipse | JAVA | Medium
P6 | Data processing and forwarding software | Embedded | Keil | C | Small
P7 | Information processing and control software | Non-Embedded | VS 2015 | C# | Medium
P8 | Control software | Non-Embedded | VS 2015 | C++ | Small

These projects were developed by different teams and had completed unit testing and integration testing before being handed over to the organization-level software testing department for configuration testing. The requirements of all projects were formulated according to the relevant standards, and all functions were described in natural language. All projects have been in operation for more than one year, and no escaped defects that had not been detected by the configuration tests were found during operation.

B. FPA Method

Nowadays, the most widely used FPA methods are IFPUG and COSMIC [10], [26]. Although COSMIC is simpler than IFPUG, we still use IFPUG because it is the recommended method in our organization. According to the related research [1], [7], [28], the difference between the results of the IFPUG and COSMIC methods is small for the final prediction results. Since the difference introduced by the choice of method is within our tolerance range, it does not affect our research results.

The IFPUG method consists of the following steps.

1. Analyze user functional requirements
The functional requirements are identified by analyzing the documentation related to the software requirements. In this study, the functional requirements do not include performance, quality, and environmental requirements.

2. Decompose functional requirements
Decompose the requirement entries according to the functional units in Table 2, down to the smallest functional unit possible.

Table 2. Functional Unit
Data Functions: internal logic files; external interface files
Operation Functions: external inputs; external outputs; external queries

Internal Logic File (ILF): a set of logically related data or control information that can be identified by the user and maintained within this software. The primary use of an internal logic file is to control data through one or more of the software's basic processes.

External Interface File (EIF): a set of logically related data or control information that can be identified by the user and referenced by this software but maintained by other software. The primary purpose of the external interface file is to control data references through one or more fundamental processes of this software; i.e., the external interface file of one software should be the internal logic file of another software.

External Input (EI): a basic process of data or control information coming from outside the boundary of this software. The main purpose of an external input is to maintain one or more internal logic files and/or to change the behavior of the system.

External Output (EO): a basic process of sending data or control information outside the boundary of this software. The main purpose of an external output is to provide information to the user through processing logic or through the retrieval of data or control information. The process should contain at least one mathematical formula or calculation, generate derived data, maintain one or more internal logic files, or change system behavior.

External Query (EQ): a basic process of sending data or control information outside the boundary of this software. The main purpose of an external query is to provide information to the user by retrieving data or control information from internal logic files or external interface files. The processing logic contains no mathematical formula or calculation, produces no derived data, and the process neither maintains internal logic files nor changes system behavior.

3. Determine the weighting factor
Each functional unit is classified as high, average, or low. The level is determined jointly by the number of data element types and the number of record element types or referenced file types involved in the particular function. Different functional units at different levels carry different corresponding weights.

4. Calculate the number of unadjusted function points
The numbers of external inputs (EI), external outputs (EO), external queries (EQ), internal logic files (ILF), and external interface files (EIF) are multiplied by their corresponding weighting factors, and the products are added together; the result is the number of unadjusted function points (UFP).

UFP = N_EI · θ_EI + N_EO · θ_EO + N_EQ · θ_EQ + N_ILF · θ_ILF + N_EIF · θ_EIF    (1)

5. Determine the adjustment factor
Each function was analyzed for system impact according to fourteen system characteristics: data communication, distributed data processing, performance, system configuration requirements, processing rate, online data entry, end-user efficiency, online updates, complex processing, reusability, ease of installation, ease of operation, multiple workplaces, and ease of change. Each system characteristic was scored on a scale of 0 to 5 for system impact, where 0 indicates no impact and 5 indicates strong impact. The total impact degree was then obtained by adding up the scores of all the system characteristics, and the value of the adjustment factor (VAF) was calculated by equation 2:

VAF = 0.65 + (Σ_{i=1}^{n} N_i) / 100    (2)

where n is the number of system characteristics (not limited to 14) determined based on the actual impact, and N_i is the degree of influence of the i-th influence factor.

6. Calculate the number of delivered function points
Multiplying the unadjusted function points (UFP) by the adjustment factor (VAF) yields the IFPUG function points (FP):

FP = UFP × VAF    (3)

C. Difficulties and Solutions in Implementation

In the actual implementation, we also found some difficulties in applying the IFPUG method; the details and our solutions are described as follows.

1. Function point calculation for data-related functional units
The IFPUG calculation involves five main parameters: external input (EI), external output (EO), external query (EQ), internal logic file (ILF), and external interface file (EIF). EI, EO, and EQ are relatively easy to analyze, but ILF and EIF may be highly subjective if the conclusion is drawn only from the textual descriptions in the software requirement specifications. For example, a certain piece of complex logic control information can be divided into several small ILFs or counted as one larger ILF; however, the resulting FP would be somewhat different.

We made some adaptations for this situation when calculating the function points. For an ILF, when the requirement specification clearly shows a more complex process or logical relationship, or when the process is complex according to our judgment, we consider raising the ILF level to medium or high, so that this ILF's function points become larger, and vice versa. If the requirements clearly state that the internal logic is divided into several parts or that the processing is divided into several steps, we divide the ILF into several parts according to the description of the requirements; otherwise, all of it is treated as one ILF.

2. Treatment of adjustment factors
The adjustment factor in the IFPUG method requires the estimation of 14 factors. However, in the actual implementation, we found that a large part of the impact factors recommended by IFPUG was not suitable for our projects. So we did not follow the method recommended by IFPUG in selecting impact factors; instead, based on our understanding of the project requirements, we made an overall estimate of the adjustment factor VAF directly. We estimated a value between 0.65 and 1 for the VAF by judging the requirements in terms of performance, reliability, fault tolerance, and criticality. We should note that this method may decrease estimation accuracy compared to the method recommended by IFPUG, but it greatly reduces the effort of estimating the VAF.

3. Handling of interfaces, performance, and other non-functional requirements
Although the interface specification is often a very important section of the requirement document, there is no specific treatment for interface requirements in IFPUG. In the target projects we selected, the interface requirements are all external interfaces, so we merged all contents of the interface requirement specification into the relevant functional requirement entries and treated them as external inputs (EI) to the functional requirement units.

The IFPUG approach also gives no explicit treatment of performance and other non-functional requirements. We choose to reflect such elements in the adjustment factors: if a functional unit has performance or other non-functional requirements, such as special safety and reliability requirements, we adjust its adjustment factor (VAF) by adding a value of 0.05 to 0.1 to reflect the requirement.

4. Statistical methods for software defects
We use projects that have passed organization-level testing and have operated for a long time, to ensure that there are no obvious remaining defects in the software. Our defect statistics are derived from the organization's software configuration test reports. The reports only contain the software defects related to functionality, performance, reliability, etc. that were found by the organization-level software testing department during configuration testing; they do not include the defects found by static analysis, code review, unit testing, and integration testing.

In the statistics, we also eliminated some low-level errors that existed in the software, because these defects are useless for analyzing the statistical results. For example, in the P3 software, none of the input controls in the program had a length limit, and we intentionally excluded these defects from the statistics.

In terms of classifying functional defects, this study classifies defects into two types: functional errors and functional omissions. If a function does not behave as expected due to incorrect design or coding, the defect is classified as a functional error; if human negligence causes a function to be ignored in whole or in part, the defect is classified as a functional omission.

In terms of counting defects, we use different treatments for different defects in the software.

1) Defects that appear separately in functional tests and have no obvious correlation with other defects are counted individually, each regular defect counting as one defect. However, if there are several defects of the same kind in one requirement item, we count them as one defect; if they are in two different requirement items, we count them as two defects.
2) The requirement items also involve fault-tolerance, environment-adaptation, and performance-related requirements. For fault-tolerance requirements, we count the defects in the corresponding functional requirement items. In the defect statistics of this study, we ignore environmental adaptability defects and performance defects.
3) This study ignores defects classified as documentation bugs found in all inspections.

V. FUNCTION POINT STATISTICS AND DEFECT DISTRIBUTION

A. Function point and defect data

After the calculation of the function points, the numbers of function points and defects of the software selected for this study are shown in Table 3.

Table 3. Function Point and Defects Information
Software | Function items | Function points | Errors | Omissions
P1 | 32 | 578.1 | 19 | 4
P2 | 12 | 196.8 | 2 | 0
P3 | 81 | 812.95 | 32 | 2
P4 | 62 | 764.53 | 34 | 4
P5 | 94 | 1264.91 | 92 | 0
P6 | 20 | 308.35 | 3 | 1
P7 | 58 | 915.4 | 33 | 1
P8 | 15 | 170.2 | 6 | 0

The distribution of function points and defects of the target software is shown in Figure 1 (the raw data has been published at [email protected]:zhaoxinghan/FPA.git). The horizontal coordinate is the index of the function items, the vertical axis is the function points, the red vertical axis on the left is the functional error defects, and the right vertical axis is the functional omission defects. The blue dots in the figure indicate the function points of the corresponding requirement items, the red dots indicate the number of functional error defects, and the green dots indicate the number of functional omission defects. To represent the relationship between function points, functional errors, and functional omissions more clearly, the requirement points in the figure are reordered from smallest to largest function points.

The relationship between defects and function points in Figure 1 supports our conclusion in Chapter 3: it is obvious from Figure 1 that the number and probability of defects in software increase as the number of function points increases. However, we also need to note that more function points do not always result in defects; some functions with many function points have no defects, and, similarly, requirement items with relatively few function points still have a certain probability of having defects. In addition, functional omissions often occur in requirements with fewer function points.

B. Modeling

We use the results of the FPA and the defect distributions as inputs to generate the predictive model by curve fitting. The specific steps are described as follows.

1. Normalize the function points
The function points of different software vary greatly, so we normalize the function points to allow horizontal comparison between software. For the m-th requirement, the corresponding normalization method is

f_m = F_m / Σ_{i=0}^{n} F_i

where F_m is the number of function points corresponding to the original requirement, and f_m is the number of function points after normalization.

2. Normalize the software defects
For the same reason, we use the same method to normalize the software defects. For the m-th requirement, the normalization of defects is

e_m = E_m / Σ_{i=0}^{n} E_i

where E_m denotes the number of defects corresponding to the original requirement, and e_m denotes the normalized value of defects.

In the statistics here, we did not include functional omission defects in the generation of the model because

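As a hedged sketch of the IFPUG calculation in Section IV-B (equations 1 to 3): the weighting factors θ below are illustrative placeholders rather than the official IFPUG weight tables, and the unit counts and impact scores are invented for the example.

```python
# Illustrative weighting factors theta for the five functional unit types
# (placeholder values, not the official IFPUG complexity weight tables).
ILLUSTRATIVE_WEIGHTS = {"EI": 4, "EO": 5, "EQ": 4, "ILF": 10, "EIF": 7}

def unadjusted_fp(counts):
    """Equation (1): UFP = sum of N_x * theta_x over EI, EO, EQ, ILF, EIF."""
    return sum(ILLUSTRATIVE_WEIGHTS[unit] * n for unit, n in counts.items())

def vaf(impact_degrees):
    """Equation (2): VAF = 0.65 + (sum of the impact degrees N_i) / 100."""
    return 0.65 + sum(impact_degrees) / 100

def function_points(counts, impact_degrees):
    """Equation (3): FP = UFP * VAF."""
    return unadjusted_fp(counts) * vaf(impact_degrees)

# Invented example: 3 EIs, 2 EOs, 1 EQ, 2 ILFs, 1 EIF, and four scored
# system characteristics with impact degrees 3, 2, 0, 5.
counts = {"EI": 3, "EO": 2, "EQ": 1, "ILF": 2, "EIF": 1}
ufp = unadjusted_fp(counts)                  # 3*4 + 2*5 + 1*4 + 2*10 + 1*7 = 53
fp = function_points(counts, [3, 2, 0, 5])   # VAF = 0.65 + 10/100 = 0.75
print(ufp, fp)                               # 53 39.75
```

In practice the weights would come from the IFPUG complexity tables chosen in step 3, and, as Section IV-C notes, this study estimates the VAF directly instead of scoring all 14 recommended characteristics.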
Figure 1. Defects Distribution (one panel per project, p1–p8; each panel plots the function points (FP), functional error count, and functional omission count of each requirement item against the requirement item index)
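The normalization of Section V-B and a least-squares curve fit over the resulting pairs can be sketched together in Python. Since this excerpt does not show the paper's final fitting function (equation 5), the power-law form y = a·x^b is assumed here purely for illustration, and the per-requirement data are invented.

```python
import math

def normalize(values):
    """Steps 1-2 of Section V-B: divide each value by the project total,
    i.e. f_m = F_m / sum(F_i) and e_m = E_m / sum(E_i)."""
    total = sum(values)
    return [v / total for v in values]

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log a + b * log x, returning (a, b).
    The power-law form y = a * x**b is an assumption for illustration,
    not the paper's published equation 5."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - b * mx)
    return a, b

# Invented per-requirement data for one project: function points F_m
# and defect counts E_m for four requirement items.
fp_per_req = [50.0, 100.0, 150.0, 200.0]
defects_per_req = [1, 2, 4, 6]

f = normalize(fp_per_req)        # normalized function points f_m (sum to 1)
e = normalize(defects_per_req)   # normalized defect counts e_m (sum to 1)
a, b = fit_power_law(f, e)
print(round(b, 2))               # ≈ 1.3: defects grow faster than size here
```

An exponent b greater than 1 would mean that defect share grows faster than function point share, which matches the role of the "configuration item maturity factor" discussed in the interpretation section.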
the number of functional omission defects is small and not sufficient to support the mathematical model generation.

3. Form the set of function point and defects

After normalization, each software will have a normalized set of function point and defects, and these sets are then grouped into a universal set A:

A = ⋃_{j=1}^{8} ⋃_{m=1}^{n_j} (f_m^j, e_m^j)

where n_j denotes the number of requirement items in the j-th software, and (f_m^j, e_m^j) denotes the data pair of the m-th function point and its defects in the j-th program.

4. Construct the required mathematical expression using curve fitting

A curve fitting approach is used to construct the most suitable mathematical expression for the set A, so that the error between the mathematical model and the actual values is minimized.

In the case of independent requirements, there should be a linear relationship between the requirements of the software and the corresponding defects. In other words, if we combine any two requirement items, the number of defects after their combination should equal the sum of the defects of the original two requirements. Suppose the fitting function is f(x), the function points of any two requirement items are (x_1, x_2), and their numbers of defects are (y_1, y_2); then the relationship between them should be

    y_1 + y_2 = f(x_1 + x_2)    (4)

We assume the fitting function takes the form

    y = f(x) = ax^2 + bx + c

Because y must satisfy equation 4, we get

    a = 0

If a requirement item is empty, its function point should be 0, i.e.

    f(0) = 0

from which we can deduce that

    c = 0

So the final form of the fitting function is

    y = bx

We use the least squares method for fitting; the value of b is calculated to be 1.21, with a residual sum of squares of 0.61. The prediction model for the defects is therefore

    y = 1.21x    (5)

The distribution of the normalized relationship in A and the fitting curve are shown in Figure 2. The horizontal coordinates are the normalized function points and the vertical coordinates are the numbers of defects after normalization.

C. Interpretation of the prediction equation

In equation 5, x represents the proportion of a certain function point to the total function points of the software, while y represents the proportion of defects of a certain functional item to the total number of defects.

In the software engineering context, y in equation 5 denotes the probability that a certain functional item has bugs: the higher the value, the greater the probability of failure and the more defects that may exist.

The final value of the coefficient b is 1.21, which we name the 'configuration item maturity factor'. In the software engineering context, the smaller this value, the slower defects grow in the software's requirement items as the function points (size) grow, which can be interpreted to mean that the software is more stable. Of course, stability here does not mean excellent quality.

The residual sum of squares after fitting in Figure 2 is 0.61, and some data points still deviate from the fit to a considerable extent. This is because the appearance of software defects is not a deterministic event and can be influenced by the designer's condition and many external circumstances. Our prediction model can only represent a trend and a probability; the defect data of a specific software will not exactly match the prediction results.

VI. STATISTICAL RESULTS

By combining two acknowledged theories, we obtain the theoretical inference that the more function points a functional requirement item has, the higher the probability that the item has defects. We randomly chose 8 empirical programs to verify this theory and obtained the following results.

1) The distribution of function points and defects is generally in line with the trend of the theoretical conclusion. The probability and number of defects in most requirement items in the configuration test increase as the number of function points increases.

2) However, some function items do not follow this trend: in some requirement items with large function point counts, no defects were found in the configuration test, and vice versa.

VII. ISSUES AND ANALYSIS

A. Function point is not in keeping with the LOC

From our statistics, the LOC of the software is basically consistent with the function points, but not completely. Specifically, for the same number of function points, the scale of embedded software is always larger than that of non-embedded software. From our investigation and analysis, the reasons are mainly reflected in two aspects.

[Figure 2: scatter of the normalized function point share ("function point %", horizontal axis) against the normalized defect share ("bug %", vertical axis), together with the fitting curve.]

Figure 2. Distribution relationship between FPs and defects
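The through-origin least-squares fit shown in Figure 2 can be reproduced with a few lines of code. The sketch below is illustrative only: the data pairs are hypothetical stand-ins for the normalized set A, not the paper's measurements; only the closed-form estimator b = Σxy / Σx² corresponds to the fitting method described above.

```python
# Least-squares fit of y = b*x through the origin, as in Section V.
# The data pairs are hypothetical stand-ins for the normalized
# (function point share, defect share) set A, not the paper's data.

def fit_through_origin(points):
    """Closed-form least squares for y = b*x: b = sum(x*y) / sum(x*x)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    b = sxy / sxx
    rss = sum((y - b * x) ** 2 for x, y in points)  # residual sum of squares
    return b, rss

# Hypothetical normalized pairs: x = an item's share of total function
# points, y = the same item's share of total defects.
A = [(0.05, 0.06), (0.10, 0.13), (0.20, 0.22), (0.15, 0.17)]
b, rss = fit_through_origin(A)
print(b, rss)  # the paper's fit over its real data gave b = 1.21, RSS = 0.61
```

Because the model is constrained through the origin, no intercept is estimated; an empty requirement item (x = 0) is always predicted to have zero defects, matching the derivation f(0) = 0.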

1. In terms of support for the development environment, the embedded development environment is not as well supported as the non-embedded one.

In non-embedded development environments, such as Java and C#, the various standard libraries and algorithm libraries are more mature, and most of them are integrated with the IDE, while most embedded development environments lack the corresponding standard libraries, so many basic functions need to be implemented by the designers themselves.

2. Embedded software projects often contain code that is not written by the developers.

The code of non-embedded software is usually closely related to the program's functions, and most of it is written by the program developers. However, embedded software often contains basic libraries provided by the manufacturer, such as the bootloader and packages of basic functions, as well as general standard libraries, such as a TCP/IP library. This code is often integrated into the source code and counted in the LOC statistics.

B. The difference between the distribution of defects and function point

When conducting the data statistics, it was found that although the distribution of function points and defects of the software satisfied equation 5 in terms of the overall statistical tendency, there were many cases with large deviations. For example, in program P4 there were 9 defects in a function item with a small function point count, while in several other projects there were function items with large function point counts in which no defects were found. After analysis and verification, the main reasons for this phenomenon are as follows.
1) The degree of unit test coverage within the project team
2) The difficulty of the processing involved in the function points of the requirement item
3) The project team having no similar engineering experience
4) The designer possibly being in a good or bad state at that time
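The deviation screening just described can be sketched as a simple residual check against equation 5. Everything below except the coefficient 1.21 is a hypothetical illustration: the item names, their shares, and the 0.05 flagging threshold are assumptions, not values from the study.

```python
# Flag requirement items whose observed defect share deviates strongly
# from the fitted model y = 1.21x (equation 5). Item data and the
# threshold are illustrative assumptions, not from the paper.

B = 1.21  # 'configuration item maturity factor' from the paper's fit

def flag_deviations(items, threshold=0.05):
    """Return names of items where |observed - predicted| > threshold."""
    flagged = []
    for name, fp_share, defect_share in items:
        predicted = B * fp_share          # expected defect share
        if abs(defect_share - predicted) > threshold:
            flagged.append(name)
    return flagged

# Hypothetical items: (name, share of function points, share of defects).
items = [
    ("login",  0.10, 0.12),  # close to the predicted 0.121
    ("report", 0.05, 0.20),  # far more defects than predicted
    ("export", 0.20, 0.02),  # large item with almost no defects found
]
print(flag_deviations(items))  # → ['report', 'export']
```

Items flagged this way are candidates for the causes listed above (unit test coverage, processing difficulty, missing engineering experience) and would warrant a closer look during test planning.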

C. Use natural language processing methods

Manually parsing software requirements and calculating the function points is not only a heavy workload but also prone to errors. Natural language processing (NLP) methods can automatically parse out the various types of entities in the requirement description, which not only greatly improves efficiency but also ensures the correctness of the parsing process. This is the only way to enlarge the number of samples quickly.

IX. SUMMARY AND OUTLOOK

By exploring the relationship between the function points of software requirement items and the defects of configuration items, we identify the basic law that the defects of configuration items grow consistently with the number of function points. Through the method of least squares, we derive the mathematical model of the relationship between function points and defects, elaborate the significance of the model's parameters in practical engineering, and analyze the causes of the problems found during the calculation and statistics of requirement function points.

However, we only extracted eight projects as the study target due to resource constraints and selected only one input parameter, the function point. From the final results, although the conclusions on the general trend are consistent with our theoretical derivation, the deviations around the fitting curve are somewhat large when the model is applied to specific projects. To achieve more accurate predictions, it is necessary to increase the number of data samples and obtain more input parameters as well as various historical and process data, so as to form a more accurate prediction model.

REFERENCES

[1] A. Z. Abualkishik and L. Lavazza, "Ifpug function points to cosmic function points convertibility: A fine-grained statistical approach," Information and Software Technology, vol. 97, pp. 179–191, 2018.
[2] F. Akiyama, "An example of software system debugging," pp. 353–359, 1971.
[3] B. W. Boehm, Software Engineering Economics. Prentice-Hall, 1981.
[4] V. R. Basili, L. C. Briand, and W. L. Melo, "A validation of object-oriented design metrics as quality indicators," IEEE Transactions on Software Engineering, vol. 22, no. 10, pp. 751–761, 1996.
[5] X. Cheng, G. Zhang, H. Wang, and Y. Sui, "Path-sensitive code embedding via contrastive learning for software vulnerability detection," in Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, 2022, pp. 519–531.
[6] S. R. Chidamber and C. F. Kemerer, "A metrics suite for object oriented design," IEEE Transactions on Software Engineering, vol. 20, no. 6, pp. 476–493, 1994.
[7] J. J. Cuadrado-Gallego, L. Buglione, M. J. Domínguez-Alda, M. F. De Sevilla, J. A. G. De Mesa, and O. Demirors, "An experimental study on the conversion between ifpug and cosmic functional size measurement units," Information and Software Technology, vol. 52, no. 3, pp. 347–357, 2010.
[8] B. Curtis, S. B. Sheppard, P. Milliman, M. Borst, and T. Love, "Measuring the psychological complexity of software maintenance tasks with the halstead and mccabe metrics," IEEE Transactions on Software Engineering, no. 2, pp. 96–104, 1979.
[9] R. Dumke and A. Abran, COSMIC Function Points: Theory and Advanced Practices. CRC Press, 2016.
[10] C. Eduardo Carbonera, K. Farias, and V. Bischoff, "Software development effort estimation: A systematic mapping study," IET Software, vol. 14, no. 4, pp. 328–344, 2020.
[11] A. E. Ferdinand, "A theory of system complexity," International Journal of General System, vol. 1, no. 1, pp. 19–33, 1974.
[12] J. E. Gaffney, "Estimating the number of faults in code," IEEE Transactions on Software Engineering, no. 4, pp. 459–464, 1984.
[13] M. H. Halstead, "Natural laws controlling algorithm structure?" ACM Sigplan Notices, vol. 7, no. 2, pp. 19–26, 1972, ACM New York, NY, USA.
[14] Z. He, F. Shu, Y. Yang, M. Li, and Q. Wang, "An investigation on the feasibility of cross-project defect prediction," Automated Software Engineering, vol. 19, no. 2, pp. 167–199, 2012.
[15] S. Henry and D. Kafura, "The evaluation of software systems' structure using quantitative software metrics," Software: Practice and Experience, vol. 14, no. 6, pp. 561–573, 1984.
[16] IFPUG, The IFPUG Guide to IT and Software Measurement. CRC Press, 2012.
[17] M. Li, H. Zhang, R. Wu, and Z.-H. Zhou, "Sample-based software defect prediction with active and semi-supervised learning," Automated Software Engineering, vol. 19, no. 2, pp. 201–230, 2012.
[18] M. Lipow, "Number of faults per line of code," IEEE Transactions on Software Engineering, no. 4, pp. 437–439, 1982.
[19] Y. Ma, G. Luo, X. Zeng, and A. Chen, "Transfer learning for cross-company software defect prediction," Information and Software Technology, vol. 54, no. 3, pp. 248–256, 2012.
[20] J. McCall, W. Randall, C. Bowen, N. McKelvey, and R. Senn, "Methodology for software reliability prediction," Rome Air Development Center (RADC) Technical Reports, RADC-TR-87-171 (Volumes 1 and 2), 1987.
[21] S. Miller and D. Firesmith, "Four types of shift left testing," Carnegie Mellon University, Pittsburgh, PA, Report, 2021.
[22] R. Moser, W. Pedrycz, and G. Succi, "A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction," in Proceedings of the 30th International Conference on Software Engineering, Conference Proceedings, pp. 181–190.
[23] J. C. Munson and T. M. Khoshgoftaar, "The detection of fault-prone programs," IEEE Transactions on Software Engineering, vol. 18, no. 5, p. 423, 1992.
[24] N. Nagappan and T. Ball, "Use of relative code churn measures to predict system defect density," in Proceedings of the 27th International Conference on Software Engineering, Conference Proceedings, pp. 284–292.
[25] J. Nam, S. J. Pan, and S. Kim, "Transfer defect learning," in 2013 35th International Conference on Software Engineering (ICSE). IEEE, Conference Proceedings, pp. 382–391.
[26] M. Nasir, "A survey of software estimation techniques and project planning practices," in Seventh ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD'06). IEEE, 2006, pp. 305–310.
[27] N. Ohlsson and H. Alberg, "Predicting fault-prone software modules in telephone switches," IEEE Transactions on Software Engineering, vol. 22, no. 12, pp. 886–894, 1996.
[28] C. Quesada-López, D. Madrigal-Sánchez, and M. Jenkins, "An empirical analysis of ifpug fpa and cosmic ffp measurement methods," in International Conference on Information Technology & Systems. Springer, Conference Proceedings, pp. 265–274.
[29] K. Sharma, M. Bala et al., "New failure rate model for iterative software development life cycle process," Automated Software Engineering, vol. 28, no. 2, pp. 1–22, 2021.
[30] V. Y. Shen, T.-J. Yu, S. M. Thebaut, and L. R. Paulsen, "Identifying error-prone software—an empirical study," IEEE Transactions on Software Engineering, no. 4, pp. 317–324, 1985.
[31] C. Smidts, M. Stutzke, and R. W. Stoddard, "Software reliability modeling: an approach to early reliability prediction," IEEE Transactions on Reliability, vol. 47, no. 3, pp. 268–278, 1998.
[32] A. Spillner and H. Bremenn, "The W-model. Strengthening the bond between development and test," in Int. Conf. on Software Testing, Analysis and Review, Conference Proceedings, pp. 15–17.
[33] S. Wang, T. Liu, J. Nam, and L. Tan, "Deep semantic feature learning for software defect prediction," IEEE Transactions on Software Engineering, vol. 46, no. 12, pp. 1267–1293, 2018.
[34] W. E. Wong, V. Debroy, A. Surampudi, H. Kim, and M. F. Siok, "Recent catastrophic accidents: Investigating how software was responsible," in

2010 Fourth International Conference on Secure Software Integration
and Reliability Improvement. IEEE, 2010, pp. 14–22.
[35] W. E. Wong, J. R. Horgan, M. Syring, W. Zage, and D. Zage, “Applying
design metrics to predict fault-proneness: a case study on a large-scale
software system,” Software: Practice and Experience, vol. 30, no. 14,
pp. 1587–1608, 2000.
[36] W. E. Wong, X. Li, and P. A. Laplante, “Be more familiar with our
enemies and pave the way forward: A review of the roles bugs played
in software failures,” Journal of Systems and Software, vol. 133, pp.
68–94, 2017.
[37] D. K. Yadav, S. Chaturvedi, and R. B. Misra, “Early software defects
prediction using fuzzy logic.” International Journal of Performability
Engineering, vol. 8, no. 4, 2012.
[38] T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy,
“Cross-project defect prediction: a large scale experiment on data vs.
domain vs. process,” in Proceedings of the 7th joint meeting of the
European software engineering conference and the ACM SIGSOFT
symposium on The foundations of software engineering, Conference
Proceedings, pp. 91–100.

