Software Measurement and Software Metrics in Software Quality
Abstract
The software measurement process must be a goal-oriented, methodical process that measures,
evaluates, adjusts, and ultimately improves the software development process. The main
contribution of this work is an easy and extensible approach to the validation and verification of
software quality in the software development process. We therefore use formal approaches in order to
describe the fundamental aspects of the software. This formalization supports the evaluation
of the metrics and of the measurement levels themselves. We discuss several metrics in each of five
types of software quality metrics: product quality, in-process quality, testing quality,
maintenance quality, and customer satisfaction quality.
Keywords: Software metrics; software quality; software measurement
1. Introduction
The software measurement process must be a goal-oriented, methodical process that measures,
evaluates, adjusts, and ultimately improves the software development process (Shanthi and
Duraiswamy, 2011). Throughout the entire life cycle, quality, progress, and performance are
evaluated using the measurement process (Liu et al., 2008). Software measurement has become
a key aspect of good software engineering practice (Farooq and Quadri, 2011). Software metrics
deal with the measurement of the software product and of the software development process, and
they guide the evaluation of models and tools (Ma et al., 2006).
Metrics are measurements of different aspects of an endeavor that help us determine
whether or not we are progressing toward the goal of that endeavor. Many software measurement
activities have been proposed in the literature, among them (Baumert and McWihinnet,
1992; Hammer et al., 1998; Janakiram and Rajasree, 2005; Loconsole, 2001; Paulk et al.,
1993). Software metrics can be classified under different categories, although the same metric
may belong to more than one category. Table 1 lists some notable software metrics broken
up into five categories: (1) commercial perspective, (2) significance perspective, (3)
observation perspective, (4) measurement perspective, and (5) software development perspective
(Farooq and Quadri, 2011).
A framework for the study of metrics is defined in order to examine the influence of classes
of metrics upon each other (Woodings and Bundell, 2001). This work uses the framework to
discuss software quality metrics. Distinguishing between product and process metrics
has now become a well-established practice. The attributes of the product supporting services
(maintenance) influence the mode of usage and the degree of acceptance by the customer. A
hierarchy of levels of influence in software production is depicted in Figure 1. Section 2
presents some formal approaches to software measurement in order to describe the fundamental
aspects of the software; it includes the functional approach, the structure-based approach, the
information-theoretic approach, and methods of statistical analysis. In Section 3, we use the
fundamental aspects of the software to discuss software quality metrics. Finally, the
conclusions are summarized in Section 4.
2.2.1. Halstead’s software science: A lexical analysis of source code was introduced by Halstead
(1977).
The measure of vocabulary: n = n1 + n2
Halstead defined the following formulas of software characterization, for instance:
Program length: N = N1 + N2
Program volume: V = N log2(n)
Program level: L = V* / V
Where n1 = the number of unique operators, n2 = the number of unique operands, N1 = the total
number of operators, and N2 = the total number of operands.
The best predictor of the time required to develop and run a program successfully was
Halstead’s metric for program volume. Researchers at IBM (Christensen et al., 1988) have
taken the idea further and produced a metric called difficulty. V* is the minimal program
volume, assuming the minimal set of operands and operators for the implementation of a given
algorithm:
Program effort: E = V / L
Difficulty of implementation: D = (n1 N2) / (2 n2)
Programming time in seconds: T = E / S
With S as the Stroud number (5 ≤ S ≤ 20), which is taken from psychological
science. Based on difficulty and volume, Halstead proposed an estimator for the actual
programming effort, namely
Effort = difficulty * volume
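To make the formulas concrete, the following Python sketch computes the basic Halstead measures from assumed token counts; the counts and the Stroud number S = 18 are hypothetical values, and since V* is usually unknown the sketch uses the common approximation L ≈ 1/D.

import math

# Hypothetical token counts for a small program (illustrative only).
n1, n2 = 10, 15          # unique operators, unique operands
N1, N2 = 40, 60          # total operators, total operands

n = n1 + n2              # vocabulary
N = N1 + N2              # program length
V = N * math.log2(n)     # program volume
D = (n1 * N2) / (2 * n2) # difficulty of implementation
L = 1 / D                # program level, approximated as 1/D (assumption)
E = V / L                # program effort (equivalently difficulty * volume)
S = 18                   # assumed Stroud number, 5 <= S <= 20
T = E / S                # programming time in seconds

print(f"n={n}, N={N}, V={V:.1f}, D={D:.1f}, E={E:.1f}, T={T:.1f}s")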
2.2.2. Complexity Metrics: The lines of code (LOC) metric has also been proposed as a
complexity metric. McCabe (McCabe, 1976) proposed a complexity metric based on
mathematical graph theory. The complexity of a program is defined in terms of its control
structure and is represented by the maximum number of linearly independent paths through
the program. The formula for the cyclomatic complexity proposed by McCabe is:
V(G) = e - n + 2p
Where e = the number of edges in the graph
n = the number of nodes in the graph
p = the number of connected components in the graph.
According to Arthur (Arthur, 1985), the cyclomatic complexity metric is based on the
number of decision elements (IF-THEN-ELSE, DO WHILE, DO UNTIL, CASE) in the
language and the number of AND, OR, and NOT phrases in each decision. The formula of
the metric is: cyclomatic complexity = number of decisions + number of conditions + 1. The
calculated counts represent “the total number of structured test paths in the program” and
“the amount of logic in the program”. The information flow metric describes the amount of
information which flows into and out of a procedure. The complexity of a procedure p is
defined as (Lewis and Henry, 1990):
c(p) = (fan-in * fan-out)^2
Where fan-in = the number of local flows into a procedure plus the number of global data
structures from which the procedure retrieves information, and
fan-out = the number of local flows out of a procedure plus the number of global data
structures which the procedure updates.
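A minimal sketch of both measures, assuming a hand-built control-flow graph and hypothetical fan-in/fan-out counts (the function names are illustrative, not part of any standard library):

def cyclomatic_complexity(edges, nodes, components=1):
    """McCabe's cyclomatic complexity V(G) = e - n + 2p."""
    return edges - nodes + 2 * components

def information_flow_complexity(fan_in, fan_out):
    """Information flow complexity of a procedure, c(p) = (fan-in * fan-out)^2."""
    return (fan_in * fan_out) ** 2

# Hypothetical control-flow graph with 9 edges, 7 nodes, 1 connected component.
print(cyclomatic_complexity(edges=9, nodes=7))            # 4
# Hypothetical procedure with fan-in = 3 and fan-out = 2.
print(information_flow_complexity(fan_in=3, fan_out=2))   # 36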
2.2.3. Reliability Metrics: A frequently used measure of reliability and availability in
computer-based systems is the mean time between failures (MTBF). The sum of the mean time to
failure (MTTF) and the mean time to repair (MTTR) gives this measure, i.e.,
MTBF = MTTF + MTTR
The availability of software is the percentage of time that a program is operating
according to requirements at a given point in time and is given by the formula:
Availability = MTTF / (MTTF + MTTR) * 100%
Reliability growth models assume in general that all defects found during the development
and testing phases are corrected, and that no new errors are introduced during these phases. All
models include some constraints on the distribution of defects or on the hazard rate, i.e., the
defects remaining in the system.
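A small sketch of the MTBF and availability calculations, using hypothetical failure and repair times:

def availability(mttf_hours, mttr_hours):
    """Availability = MTTF / (MTTF + MTTR) * 100%."""
    return mttf_hours / (mttf_hours + mttr_hours) * 100.0

# Hypothetical values: mean time to failure 400 h, mean time to repair 8 h.
mttf, mttr = 400.0, 8.0
mtbf = mttf + mttr        # MTBF = MTTF + MTTR
print(f"MTBF = {mtbf} h, availability = {availability(mttf, mttr):.2f}%")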
2.2.4. Readability Metrics: Walston and Felix (1977) give a ratio of documentation pages to
LOC as
D = 49 L^1.01
Where D = the number of pages of documentation
L = the number of lines of code, in thousands.
2.2.5. Error Prediction Metrics: Halstead’s program volume serves as the base for a prediction of
the number of errors, B1 = Volume / 3000, found during the validation phase. An approximation
of the total number of errors found during the entire development process is also given as
B2 = Volume / 750
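Both the Walston-Felix documentation estimate and the volume-based error predictions are simple enough to sketch directly; the 10 KLOC size and the volume of 6000 below are hypothetical inputs.

def documentation_pages(kloc):
    """Walston-Felix estimate of documentation pages: D = 49 * L^1.01."""
    return 49 * kloc ** 1.01

def predicted_errors(volume):
    """Volume-based error estimates: B1 = V/3000 (validation), B2 = V/750 (whole process)."""
    return volume / 3000, volume / 750

print(round(documentation_pages(10)))   # about 500 pages for a hypothetical 10 KLOC system
print(predicted_errors(6000))           # (2.0, 8.0) for a hypothetical volume of 6000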
The approach of Fenton and Pfleeger (1997) is based on flow graphs built from the following
Dijkstra structures.
The cyclomatic numbers are V(a) = V(b) = V(c) = V(d) = 2 and V(e) = n - 1.
The entropy of a distribution p over the domain of x is H(x) = - Σ p_l log2 p_l, summed over
l = 1, ..., n_x, where l is an index over the domain of x and n_x is the cardinality of that
domain. H(x) is the average information per sampled measurement object from the distribution. The
probability mass function p is determined by the occurrence of the executed operators or functions
in the software system. The entropy can be very helpful when considering dynamic software
measurement.
The test statistic is rs = 1 - (6 Σ d_i^2) / (n (n^2 - 1)), where n is the sample size and d_i
is the difference in ranks of the i-th pair of data.
Decision rule: The rs value must exceed a specified threshold, i.e., rs > r_α at significance
level α; if so, we reject H0, and otherwise we accept H0.
Example B: Mann-Whitney U test
H0: populations A and B are identical
H1: there is some difference between samples A and B
Test statistic: Let U = min(U1, U2), let n1 be the size of the smaller sample and n2 the size of
the larger sample, and let R1 and R2 be the rank sums of each sample, where
U1 = n1 n2 + n1(n1 + 1)/2 - R1
U2 = n1 n2 + n2(n2 + 1)/2 - R2
Critical value: Use the table to find the critical value for the U statistic at the 5% level for
sample sizes n1 and n2.
For example, one group of subjects was instructed to solve a given problem (the Sorting
Experiment) in C++, while another group of subjects was instructed to solve the same problem in
Pascal. Table 3 shows the ranks of the programming times for the Sorting Experiment.
R1 = 183.5 (C++) and R2 = 281.5 (Pascal). Consequently, U1 = 161.5 and U2 = 63.5,
leading to U = 63.5. From the table of critical values for the Mann-Whitney test, with
α = 0.05 the critical value is Uc = 64.
Since U = 63.5 < Uc = 64, we reject H0.
Therefore, we conclude that the performance in the two languages is different, with H0
rejected at the 0.05 level.
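The U statistics of this example can be reproduced with a short sketch; equal group sizes of n1 = n2 = 15 are assumed here, consistent with U1 + U2 = n1 n2 = 225.

def mann_whitney_u(n1, n2, r1, r2):
    """U statistics from rank sums: U_i = n1*n2 + n_i*(n_i + 1)/2 - R_i."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2, min(u1, u2)

# Rank sums from the Sorting Experiment; equal group sizes of 15 are assumed.
u1, u2, u = mann_whitney_u(15, 15, 183.5, 281.5)
print(u1, u2, u)                 # 161.5 63.5 63.5
uc = 64                          # critical value at the 5% level (from the table)
print("reject H0" if u < uc else "do not reject H0")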
12. Sign test for matched pairs: When one member of a pair is associated with treatment A and the
other with treatment B, the sign test has wide applicability.
13. Run test for randomness: The run test is used for examining whether or not a set of
observations constitutes a random sample from an infinite population. Testing for randomness is
of major importance because the assumption of randomness underlies statistical inference.
14. Wilcoxon signed rank test for matched pairs: Where there is some kind of pairing between
observations in two samples, ordinary two-sample tests are not appropriate.
15. Kolmogorov-Smirnov test: Where there is an unequal number of observations in two samples,
the Kolmogorov-Smirnov test is appropriate. This test is used to test whether there is any
significant difference between two treatments A and B.
Test statistic: Let r denote the number of runs. To obtain r, list the n1 + n2 observations from
the two samples in order of magnitude, denoting observations from one sample by x's and from the
other by y's, and count the number of runs.
Critical value: The critical region for this test is always one-sided. The critical value used to
decide whether or not the number of runs is too small is obtained from the table, which gives the
critical value r_c for n1 and n2 at the 5% level of significance.
Decision rule: If r ≤ r_c, reject H0. For sample sizes larger than 20, the critical value r_c at
the 5% level of significance is given by
r_c = μ - 1.96 σ
Where μ = 2 n1 n2 / (n1 + n2) + 1 and
σ^2 = 2 n1 n2 (2 n1 n2 - n1 - n2) / ((n1 + n2)^2 (n1 + n2 - 1))
For example, one group of subjects was instructed to solve a given problem (the Sorting
Experiment) in C++, while another group of subjects was instructed to solve the same
problem in Pascal. Four sample pieces of execution time were collected in A (Pascal), and five
sample pieces of execution time were collected in B (C++). Table 4 shows the origin of each piece
and its rank, and Table 5 shows the combined ordered data.
H0: populations A and B are identical
H1: there is some difference between samples A and B
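A sketch of the large-sample run test, using the mean and standard deviation given above; the label sequence below is hypothetical and stands in for the combined ordered data of Table 5.

import math

def runs_test_critical_value(n1, n2, z=1.96):
    """Large-sample critical value r_c = mu - z*sigma for the run test."""
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return mu - z * math.sqrt(var)

def count_runs(labels):
    """Count runs in a combined, ordered sequence of sample labels."""
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

# Hypothetical combined ordering of observations from samples A and B.
sequence = list("AABABBABABABBAABABABAB")
r = count_runs(sequence)
rc = runs_test_critical_value(sequence.count("A"), sequence.count("B"))
print(r, round(rc, 1), "reject H0" if r <= rc else "do not reject H0")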
(3) Release-origin defects (field and internal) per KCSI (a measure of development
quality)
(4) Release-origin field defects per KCSI (a measure of development quality in terms of defects
found by customers)
Consider the following hypothetical example:
Initial release of product X
KCSI = KSSI = 50 KLOC
Defects / KCSI = 2.0
Total number of defects =2.0 x 50 = 100
Second release of product X
KCSI = 20
KSSI = 50 + 20 (new and changed lines of code) - 4 (assuming 20% are changed lines of code) = 66
Defects / KCSI = 1.8 (assuming 10% improvement over the first release)
Total number of defects =1.8 x 20 = 36
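The release arithmetic of this example can be captured in a few lines; the 20% changed-code and 10% improvement figures are the assumptions stated in the example itself.

def release_defects(kcsi, defects_per_kcsi):
    """Total defects injected by the new and changed code of a release."""
    return kcsi * defects_per_kcsi

print(release_defects(50, 2.0))   # initial release: 100 defects
print(release_defects(20, 1.8))   # second release: 36 defects (10% improvement assumed)
# Shipped source of the second release: 50 + 20 - 20% of the 20 KCSI changed lines = 66 KSSI.
print(50 + 20 - 0.2 * 20)         # 66.0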
The function count (FC) is a weighted sum over five major components at three complexity
levels, FC = Σ w_ij x_ij, where w_ij are the weighting factors of the five components by
complexity level (low, average, high) and x_ij are the numbers of each component in the
application. The weights of the five major components are:
・External input: Low complexity, 3; average complexity, 4; high complexity, 6
・External output: Low complexity, 4; average complexity, 5; high complexity, 7
・Logical internal file: Low complexity, 5; average complexity, 7; high complexity, 10
・External interface file: Low complexity, 7; average complexity, 10; high complexity, 15
・External inquiry: Low complexity, 3; average complexity, 4; high complexity, 6
Step 2: This step involves a scale from 0 to 5 to assess the impact of 14 general system
characteristics in terms of their likely effect on the application. The 14 characteristics are:
data communications, distributed functions, performance, heavily used configuration, transaction
rate, online data entry, end-user efficiency, online update, complex processing, reusability,
installation ease, operational ease, multiple sites, and facilitation of change.
The scores (ranging from 0 to 5) for these characteristics are then summed, based on the
following formula, to arrive at the value adjustment factor (VAF):
VAF = 0.65 + 0.01 Σ c_i, with the sum taken over i = 1, ..., 14
Where c_i is the score for the i-th general system characteristic. The number of function points
is obtained by multiplying the function count by the value adjustment factor:
FP = FC x VAF
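Assuming hypothetical component counts and general-system-characteristic scores, the function point calculation can be sketched as follows:

# Weights w_ij for the five components at (low, average, high) complexity.
FP_WEIGHTS = {
    "external_input":          (3, 4, 6),
    "external_output":         (4, 5, 7),
    "logical_internal_file":   (5, 7, 10),
    "external_interface_file": (7, 10, 15),
    "external_inquiry":        (3, 4, 6),
}

def function_count(counts):
    """FC = sum over components and complexity levels of w_ij * x_ij."""
    return sum(w * x for name, ws in FP_WEIGHTS.items()
               for w, x in zip(ws, counts[name]))

def value_adjustment_factor(scores):
    """VAF = 0.65 + 0.01 * sum of the 14 characteristic scores (each 0..5)."""
    assert len(scores) == 14
    return 0.65 + 0.01 * sum(scores)

# Hypothetical application: (low, average, high) counts per component.
counts = {
    "external_input":          (5, 4, 1),
    "external_output":         (3, 3, 2),
    "logical_internal_file":   (2, 2, 0),
    "external_interface_file": (1, 1, 0),
    "external_inquiry":        (4, 2, 0),
}
fc = function_count(counts)
vaf = value_adjustment_factor([3] * 14)      # all 14 characteristics scored 3 (assumption)
print(fc, vaf, fc * vaf)                     # FP = FC x VAF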
Example 3: By applying the defect removal efficiency to the overall defect rate per function
point, the following defect rates for the delivered software were estimated. By Software
Engineering Institute (SEI) capability maturity model (CMM) level, the estimated defect rates per
function point are as follows (a small calculation sketch follows the list):
・SEI CMM level 1: 0.75
・SEI CMM level 2: 0.44
・SEI CMM level 3: 0.27
・SEI CMM level 4: 0.14
・SEI CMM level 5: 0.05
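A small sketch of how these rates translate into an estimate of delivered defects for a project of a given size in function points; the 1,000-FP project size is hypothetical.

# Estimated delivered defects per function point by SEI CMM level (from the list above).
DEFECTS_PER_FP = {1: 0.75, 2: 0.44, 3: 0.27, 4: 0.14, 5: 0.05}

def estimated_delivered_defects(function_points, cmm_level):
    return function_points * DEFECTS_PER_FP[cmm_level]

for level in sorted(DEFECTS_PER_FP):
    # Hypothetical 1,000-function-point project.
    print(level, estimated_delivered_defects(1000, level))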
3.2.2. Defect Arrival / Removal During Testing: The objective is always to look for defect
arrivals that stabilize at a very low level, or times between failures that are far apart, before
ending the test effort and releasing the software to the field. Some metrics for defect arrival
during testing are:
・Bad fix defects: defects whose resolution gives rise to new defects are bad fix defects.
Bad fix defect percentage = (number of bad fix defects / total number of valid defects) * 100
The Putnam estimation model (Putnam, 1978) assumes a specific distribution of effort over
the software development project. The distribution of effort can be described by the Rayleigh-
Norden curve. The equation is:
L = Ck K^(1/3) td^(4/3)
Where L is the size in lines of code, Ck is a technology constant, K is the total life-cycle
effort, and td is the development time.
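Rearranging the equation gives the life-cycle effort K implied by a target size and schedule; the technology constant Ck and the project numbers below are purely illustrative.

def putnam_effort(loc, ck, td_years):
    """Solve L = Ck * K^(1/3) * td^(4/3) for the life-cycle effort K (person-years)."""
    return (loc / (ck * td_years ** (4.0 / 3.0))) ** 3

# Purely illustrative: 100,000 LOC, technology constant Ck = 5000, 2-year schedule.
print(round(putnam_effort(100_000, 5000, 2.0), 1))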
This metric is not a metric for real-time delinquent management because it covers closed
problems only.
Fix quality or the number of defective fixes is another important quality metric for the
maintenance phase.
20. Performance execution data - client side: This metric gives detailed client-side information
for the execution (unit testing, acceptance testing, response time). Some of the data points of
this metric are running users, response time, throughput, total transactions per second, errors
per second, etc.
21. Performance execution data - server side: This metric gives detailed server-side information
for the execution (CPU time, memory utilization). Some of the data points of this metric are CPU
time, memory utilization, database connections per second, etc.
22. Performance test efficiency (PTE): This metric determines the quality of the performance
testing team in meeting the requirements, which can be used as an input for further improvement.
PTE = requirements met during the performance test / (requirements met during the performance
test + requirements not met after signoff of the performance test) * 100%. Some of the
requirements of performance testing are: average response time, transactions per second, the
application must be able to handle the maximum user load, and server stability.
23. Automation scripting productivity (ASP): This metric gives the scripting productivity for
automated test scripts, from which one can analyze and draw conclusions. ASP = total number of
operations performed (no. of clicks, no. of input parameters, no. of checkpoints added) / effort
(hours). The higher this number, the higher the automation scripting productivity.
24. Automation coverage: This metric gives the percentage of manual test cases that have been
automated. Automation coverage = total number of test cases automated / total number of manual
test cases. The higher this number, the greater the improvement in the quality of testing.
D_D: the number of defects of this defect type that are detected after the test phase.
D_T: the number of defects found by the test team during the product cycle.
D_U: the number of defects found in the product under test (before official release).
D_F: the number of defects found in the product after release.
D_N: the number of defects of this defect type (any particular type) that remain undiscovered
after the test phase.
Example 5:
In production, the average response time is greater than expected; the number of requirements
met during the performance test is 4, and the number of requirements not met after signoff of
the performance test is 1. Since average response time is an important requirement that has not
been met, the tester can open a defect with severity critical. Performance severity index =
(4 * 1) / 1 = 4 (critical). Performance test efficiency = 4 / (4 + 1) * 100% = 80%.
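The performance test efficiency calculation of Example 5 can be reproduced directly; the function name below is illustrative.

def performance_test_efficiency(met_during_test, not_met_after_signoff):
    """PTE = requirements met during the performance test /
    (requirements met during the test + requirements not met after signoff) * 100%."""
    return met_during_test / (met_during_test + not_met_after_signoff) * 100.0

print(performance_test_efficiency(4, 1))   # 80.0, as in Example 5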
Example 6:
Some companies use the net satisfaction index (NSI) to facilitate comparisons across
products. The NSI has the following weight factors:
Completely satisfied = 100%, satisfied = 75%, neutral = 50%, dissatisfied = 25%, and
completely dissatisfied = 0%.
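A minimal sketch applying these weight factors as a weighted average of survey responses; the response counts are hypothetical.

# NSI weight factors from the text.
NSI_WEIGHTS = {
    "completely satisfied": 1.00,
    "satisfied": 0.75,
    "neutral": 0.50,
    "dissatisfied": 0.25,
    "completely dissatisfied": 0.00,
}

def net_satisfaction_index(responses):
    """Weighted average of survey responses, expressed as a percentage."""
    total = sum(responses.values())
    return sum(NSI_WEIGHTS[k] * n for k, n in responses.items()) / total * 100.0

# Hypothetical survey of 200 customers.
print(net_satisfaction_index({
    "completely satisfied": 60, "satisfied": 90, "neutral": 30,
    "dissatisfied": 15, "completely dissatisfied": 5,
}))   # 73.125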
4. Conclusions
This paper is an introduction to the software quality metrics found in the software engineering
literature. Software measurement and metrics help greatly in evaluating the software process as
well as the software product. The set of measures identified in this paper provides the
organization with better insight into the validation activity, improving the software process
toward the goal of having a managed process.
Well-designed metrics with documented objectives can help an organization obtain the
information it needs to continue to improve its software products, processes, and customer
services. Therefore, future research is needed to extend and improve the methodology so that
metrics that have been validated on one project, using our criteria, remain valid measures of
quality on future software projects.
References
[1] L. J. Arthur, “Measuring programmer productivity and software quality”, John Wiley & Son, NY, (1985).
[2] J. H. Baumert and M. S. McWihinnet, “Software measurement and the capability maturity model”, Software
Engineering Institute Technical Report, CMU/SEI-92-TR, ESC-TR-92-0, (1992).
[3] B. W. Boehm, “Software Engineering Economics”, Englewood Cliffs, NJ, Prentice Hall, (1981).
[4] M. Bundschuh and A. Fabry, “Auswandschatzung von IT-project”, MITP publisher, Bonn, (2000).
[5] K. Christensen, G. P. Fistos and C. P. Smith, “A perspective on the software science”, IBM systems Journal,
(1988), vol. 29, no. 4, pp. 372-387.
[6] M. Dao, M. Huchard, T. Libourel and H. Leblance, “A new approach to factorization-introducing metrics”,
Proceeding of the IEEE Symposium on Software Metrics METRICS, (2002) June 4-7, pp. 227-236.
[7] R. Dumke, M. Lother and C. Wille, “Situation and Trends in Software Measurement - A Statistical Analysis of
the SML Metrics Bibliography”, Dumke / Abran: Software Measurement and Estimation, Shaker Publisher,
(2002), pp. 298-514.
[8] S. U. Farooq and A. M. K. Quadri, “Software measurements and metrics: Role in effective software testing”,
International Journal of Engineering Science and Technology, vol. 3, no. 1, (2011), pp. 671-680.
[9] S. U. Farooq, S. M. K. and N. Ahmad, “Software measurements and metrics: role in effective software
testing”, International Journal of Engineering Science and Technology, vol. 3, no. 1, (2011), pp. 671-680.
[10] N. E. Fenton and S. L. Pfleeger, “Software Metrics - a rigorous and practical approach”, Thompson Publ.,
(1997).
[11] N. Fenton, P. Krause and M. Neil, “Software measurement: Uncertainty and causal modeling”, IEEE
Software, (2002) July-August, pp. 116-122.
[12] R. T. Futrell, F. Donald and L. S. Shafer, “Quality software project management”, Prentice Hall Professional,
(2002).
[13] T. F. Hammer, L. J. Huffman and L. H. Rosenberg, “Doing requirements right the first time”, CROSSTALK,
The Journal of Defense Software Engineering, (1998) December, pp. 20-25.
[14] D. Janakiram and M. S. Rajasree, “ReQuEst Requirements-driven quality estimator”, ACM SIGSOFT
software engineering notes, vol. 30, no. 1, (2005).
[15] N. Juristo and A. M. Moreno, “Basics of Software Engineering Experimentation”, Kluwer Academic Publishers,
Boston, (2003).
[16] A. Kaur, B. Suri and A. Sharma, “Software testing product metrics-a survey”, Proceedings of National
Conference on Challenges & Opportunities in Information Technology (COIT-2007), RIMT-IET, Mandi
Gobindgarh, (2007) March 23.
[17] T. M. Khoshgoftaar and N. Seliya, “Tree-based Software Quality Estimation Models for Fault Prediction”,
Proceedings of the Eighth IEEE Symposium on Software Metrics (METRICS 2002), (2002) June 4-7, Ottawa, pp.
203-215.
[18] M. Khraiwesh, “Validation measures in CMMI”, World of Computer Science and Information Technology
Journal, vol. 1, no. 2, (2011), pp. 26-33.
[19] A. Kaur, B. Suri and A. Sharma, “Software testing product metrics - A Survey”, Proc. of National Conference
on Challenges & Opportunities in Information Technology, RIMT-IET, Mandi Gobindgarh, (2007) March 23.
[20] S. Lei and M. R. Smith, “Evaluation of several non-parametric bootstrap methods to estimate confidence
intervals for software metrics”, IEEE Transactions on Software Engineering, vol. 29, no. 1, (2003), pp.
996-1004.
[21] J. A. Lewis and S. M. Henry, “On the benefits and difficulties of a maintainability via metrics methodology”,
Journal of Software maintenance, Research and Practice, vol. 2, no. 2, (1990), pp. 113-131.
[22] Y. Liu, W. P. Cheah, B. K. Kim and H. Park, “Predict software failure-prone by learning Bayesian network”,
International Journal of Advance Science and Technology, vol. 1, no. 1, (2008), pp. 35-42.
[23] A. Loconsole, “Measuring the requirements management key process area”, Proceedings of ESCOM –
European Software Control and Metrics Conference, London, UK, (2001) April.
[24] Y. Ma, K. He, D. Du, J. Liu and Y. Yan, “A complexity metrics set for large-scale object-oriented software
system”, Proceedings of the Sixth IEEE International Conference on Computer and Information Technology,
Washington, DC, USA, (2006), pp. 189-189.
[25] T. J. McCabe, “A complexity measure”, IEEE Transactions on Software Engineering, SE-2(4), (1976)
December, pp. 308-320.
[26] E. E. Mills, “Software metrics, SEI curriculum module SEI-CM-12”, Carnegie Mellon University, Software
Engineering Institute, (1988) December.
[27] J. C. Munson, “Software engineering Measurement”, CRC Press Company, Boca Raton London, NY, (2003).
[28] C. R. Pandian, “Software metrics-A guide to planning, Analysis, and Application”, CRC press Co., (2004).
[29] M. C. Paulk, C. V. Weber, S. Garcia, M. B. Chrissis and M. Bush, “Key practices of the capability maturity
model version 1.1”, Software Engineering Institute Technical Report, CMU/SEI-93-TR-25, ESC-TR-93-178,
Pittsburgh, PA, USA, (1993) February.
[30] R. E. Prather, “An Axiomatic theory of software complexity measure”, The Computer Journal, vol. 27, no. 4,
(1984), pp. 340-347.
[31] B. N. Premal and K. V. Kale, “A brief overview of software testing metrics”, International Journal of
Computer Science and Engineering, vol. 1, no. 3/1, (2011), pp. 204-211.
[32] R. S. Pressman, “Making Software engineering happen: A Guide for instituting the technology”, Prentice Hall,
New Jersey, (1988).
[33] L. H. Putnam, “A general empirical solution to the macro software sizing and estimating
problem”, IEEE Transactions on Software Engineering, SE-4(4), (1978) July, pp. 345-361.
[34] P. M. Shanthi and K. Duraiswamy, “An empirical validation of software quality metric suits on open source
software for fault-proneness prediction in object oriented system”, European journal of Scientific Research,
vol. 5, no. 2, (2011), pp. 168-181.
[35] J. Tian and V. Zelkowitz, “Complexity measure evaluation and selection”, IEEE Transaction on Software
Engineering, vol. 21, no. 8, (1995), pp. 641-650.
[36] D. K. Wallace and R. U. Fujii, “Software verification and validation: an overview”, IEEE Software, (1989)
May, pp. 10-17.
[37] C. E. Walston and C. P. Felix, “A method of programming measurement and estimation”, IBM Systems
Journal, vol. 16, (1977), pp. 54-73.
[38] T. L. Woodings and G. A. Bundell, “A framework for software project metrics”, Proceeding of the 12th
European Conference on Software Control and Metrics, (2001).
[39] H. Zuse and P. Bollmann, “Using measurement theory to describe the properties and scales of static software
complexity metrics”, SIGPLAN Notices, vol. 24, no. 8, (1989), pp. 23-33.
[40] H. Zuse and P. Bollmann, “Measurement theory and software measures”, Proceedings of the BCS-FACS
Workshop, London, (1992) May.