UNIT III Software Metrics
Definitions
Measure - quantitative indication of extent,
amount, dimension, capacity, or size of some
attribute of a product or process.
E.g., Number of errors
To analyze defects
To determine productivity
Why Measure Software?
Determine the quality of the current product or process
Measured by:
individual
module
during development
Processes
Activities related to production of software
Resources
Inputs into the software development activities
hardware, knowledge, people
Product vs. Process
Process Metrics
Insights of process paradigm, software engineering tasks, work
product, or milestones
Lead to long term process improvement
Product Metrics
Assesses the state of the project
Track potential risks
Uncover problem areas
Adjust workflow or tasks
Evaluate the team's ability to control quality
Types of Measures
Direct Measures (internal attributes)
Cost, effort, LOC, speed, memory
Easy to compute
Capability-Maturity assessment
The Software Engineering Institute (SEI) proposed the Capability Maturity Model
(CMM) to measure an organization's ability to develop quality software.
The CMM describes an evolutionary improvement path from an ad hoc,
immature process (dependent on individuals) to a mature, disciplined
process that can be optimized based on continuous feedback.
Classifying software measures
Three types of software entities to measure :-
Classifying software measures
Within each class, we have two attributes :-
Internal attributes –
Internal attributes of a product, process or resource are those that can be measured
purely in terms of the entity itself.
These attributes are measured without executing the code.
Internal Attributes can be measured by examining the product, process or resource on
its own.
External attributes –
External attributes of a product, process or resource are those that can be measured only with
respect to how the entity relates to its environment.
Behavior of the entity is important
Managers want to be able to measure and predict external attributes
However, external attributes are more difficult to measure than internal ones, and are
measured late in the development process
Desire is to predict external attributes in terms of more easily-measured internal
attributes
Processes :-
We can measure…
How long it takes for a process to complete ?
How much it costs ?
Comparison with other processes.
Properties of sub-processes
Internal process Attributes :-
Duration of process or one of its activities
Effort associated with process
Number of requirement changes
Number of coding faults
External Process Attribute :-
Quality
Cost
Stability
Effectiveness
Products :-
We can Measure…
Size of product by measuring the number of pages or number of words it contains.
Assessment of specification in terms of their length, reuse, redundancy and syntactic
Reliability
understandability of document
Size of sub-products and average module size
Internal Product Attributes :-
Size, reuse, redundancy, functionality, algorithmic complexity, control flow, etc.
External Product Attributes:-
Quality, reliability, usability, maintainability, integrity, reusability, portability, etc.
Θ(n²): n²
5n² + 4n + 6
n² + 5
Θ(log n): log n
log n²
log (n + n³)
SOFTWARE METRICS
Measurement and Scaling
Chapter Outline
1) Overview
2) Measurement and Scaling
3) Primary Scales of Measurement
i. Nominal Scale
ii. Ordinal Scale
iii. Interval Scale
iv. Ratio Scale
4) A Comparison of Scaling Techniques
5) Comparative Scaling Techniques
i. Paired Comparison
ii. Rank Order Scaling
iii. Constant Sum Scaling
iv. Q-Sort and Other Procedures
6) Verbal Protocols
7) International Marketing Research
8) Ethics in Marketing Research
9) Internet and Computer Applications
10) Focus on Burke
11) Summary
12) Key Terms and Concepts
Measurement and Scaling
Measurement means assigning numbers or other symbols
to characteristics of objects according to certain
prespecified rules.
One-to-one correspondence between the numbers and
the characteristics being measured.
The rules for assigning numbers should be standardized
and applied uniformly.
Rules must not change over objects or time.
Measurement and Scaling
Scaling involves creating a continuum upon which
measured objects are located.
Interval: performance rating on a 0 to 10 scale: 8.2, 9.1, 9.6
Ratio: time to finish: 15.2, 14.1, 13.4
Primary Scales of Measurement
Nominal Scale
The numbers serve only as labels or tags for identifying
and classifying objects.
When used for identification, there is a strict one-to-one
correspondence between the numbers and the objects.
The numbers do not reflect the amount of the characteristic
possessed by the objects.
The only permissible operation on the numbers in a
nominal scale is counting.
Only a limited number of statistics, all of which are based
on frequency counts, are permissible, e.g., percentages, and
mode.
Measurement Basics
Nominal scale
Most primitive form of measurement – define classes or categories, and
place each entity in a particular class or category
Two major characteristics
Empirical relation consists only of different classes – no notion of
ordering
Any distinct number or symbolic representation is an acceptable
measure – no notion of magnitude associated with numbers or symbols.
Any two mappings, M and M’, will be related to each other in that M’ can
be obtained from M by a one-to-one mapping
Example – software faults can belong to one of the following classes,
according to where they were first introduced during development:
Specification
Design
Code
Primary Scales of Measurement
Ordinal Scale
A ranking scale in which numbers are assigned to objects
to indicate the relative extent to which the objects possess
some characteristic.
Can determine whether an object has more or less of a
characteristic than some other object, but not how much
more or less.
Any series of numbers can be assigned that preserves the
ordered relationships between the objects.
In addition to the counting operation allowable for nominal
scale data, ordinal scales permit the use of statistics based
on centiles, e.g., percentile, quartile, median.
Measurement Basics
Measurement types and scale
Ordinal scale
Augments nominal scale with ordering information.
Three major characteristics
Empirical relation system consists of classes that are ordered with
respect to the attribute
Any mapping preserving the ordering (i.e., a monotonic function)
is acceptable
Numbers represent ranking only, so arithmetic operations have no
meaning
Set of admissible transformations is set of all monotonic mappings
Example – software “complexity” – two valid measures
Value  Meaning             Value  Meaning
1      Trivial             2      Trivial
2      Simple              4      Simple
3      Moderate            6      Moderate
4      Complex             9      Complex
5      Incomprehensible    12     Incomprehensible
Primary Scales of Measurement
Interval Scale
Numerically equal distances on the scale represent equal
values in the characteristic being measured.
It permits comparison of the differences between objects.
The location of the zero point is not fixed. Both the zero
point and the units of measurement are arbitrary.
Any positive linear transformation of the form y = a + bx
will preserve the properties of the scale.
It is not meaningful to take ratios of scale values.
Statistical techniques that may be used include all of those
that can be applied to nominal and ordinal data, and in
addition the arithmetic mean, standard deviation, and other
statistics commonly used in marketing research.
Measurement Basics
Measurement type and scale
Interval scale
Captures information about size of intervals that separate classes.
Three characteristics
Preserves order
Preserves differences, but not ratios
Addition and subtraction are acceptable, but not multiplication and
division
Class of admissible transformations is the set of affine transformations:
M’=aM+b, where a>0.
Example – software complexity – suppose the difference in complexity
between a trivial and a simple system is the same as that between a simple
and a moderate system. Where this equal step applies to each class, we have
an attribute measurable on an interval scale.
Value  Meaning             Value  Meaning             Value  Meaning
1      Trivial             0      Trivial             1.1    Trivial
2      Simple              2      Simple              2.2    Simple
3      Moderate            4      Moderate            3.3    Moderate
4      Complex             6      Complex             4.4    Complex
5      Incomprehensible    8      Incomprehensible    5.5    Incomprehensible
Primary Scales of Measurement
Ratio Scale
Possesses all the properties of the nominal, ordinal, and
interval scales.
It has an absolute zero point.
It is meaningful to compute ratios of scale values.
Only proportionate transformations of the form y = bx,
where b is a positive constant, are allowed.
All statistical techniques can be applied to ratio data.
Measurement Basics
Measurement type and scale
Ratio scale
Most useful scale, common in physical sciences – captures
information about ratios.
4 characteristics
Preserves ordering, size of intervals between entities, and ratios
between entities
There is a zero element, representing total lack of the attribute
Measurement mapping must start at 0 and increase at equal
intervals (units)
All arithmetic can be meaningfully applied to classes in the range
of the mapping.
Acceptable transformations are ratio transformations – M’=aM,
where a is a positive scalar.
Example – program length can be measured by lines of code,
number of characters, etc. Number of characters may be obtained
by multiplying the number of lines by the average number of
characters per line.
Primary Scales of Measurement
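Table 8.1
Nominal
  Basic characteristics: Numbers identify and classify objects
  Common examples: Social Security nos., numbering of football players
  Marketing examples: Brand nos., store types
  Permissible statistics (descriptive): Percentages, mode
  Permissible statistics (inferential): Chi-square, binomial test
Ordinal
  Basic characteristics: Nos. indicate the relative positions of objects but not the magnitude of differences between them
  Common examples: Quality rankings, rankings of teams in a tournament
  Marketing examples: Preference rankings, market position, social class
  Permissible statistics (descriptive): Percentile, median
  Permissible statistics (inferential): Rank-order correlation, Friedman ANOVA
Interval
  Basic characteristics: Differences between objects can be compared
  Common examples: Temperature (Fahrenheit)
  Marketing examples: Attitudes, opinions, index numbers
  Permissible statistics (descriptive): Range, mean, standard deviation
  Permissible statistics (inferential): Product-moment correlation
Ratio
  Basic characteristics: Zero point is fixed, ratios of scale values can be compared
  Common examples: Length, weight
  Marketing examples: Age, sales, income, costs
  Permissible statistics (descriptive): Geometric mean, harmonic mean
  Permissible statistics (inferential): Coefficient of variation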
A Classification of Scaling Techniques
Figure 8.2
Scaling Techniques
  Comparative Scales
    Paired Comparison
    Rank Order
    Constant Sum
    Q-Sort and Other Procedures
  Noncomparative Scales
    Continuous Rating Scales
    Itemized Rating Scales
      Likert
      Semantic Differential
      Stapel
A Comparison of Scaling
Techniques
Comparative scales involve the direct comparison of
stimulus objects. Comparative scale data must be
interpreted in relative terms and have only ordinal or rank
order properties.
In noncomparative scales, each object is scaled
independently of the others in the stimulus set. The
resulting data are generally assumed to be interval or ratio
scaled.
Relative Advantages of Comparative Scales
Small differences between stimulus objects can be
detected.
Same known reference points for all respondents.
Easily understood and can be applied.
Involve fewer theoretical assumptions.
Tend to reduce halo or carryover effects from one
judgment to another.
Relative Disadvantages of Comparative Scales
Ordinal nature of the data
Inability to generalize beyond the stimulus objects scaled.
Comparative Scaling Techniques
Paired Comparison Scaling
A respondent is presented with two objects and asked to
select one according to some criterion.
The data obtained are ordinal in nature.
Paired comparison scaling is the most widely used
comparative scaling technique.
With n brands, [n(n - 1) /2] paired comparisons are
required
Under the assumption of transitivity, it is possible to
convert paired comparison data to a rank order.
Obtaining Shampoo Preferences Using Paired
Comparisons
Figure 8.3
Instructions: We are going to present you with ten pairs of
shampoo brands. For each pair, please indicate which one of the
two brands of shampoo you would prefer for personal use.
Recording Form:
                           Jhirmack  Finesse  Vidal Sassoon  Head & Shoulders  Pert
Jhirmack                      -         0          0                1            0
Finesse                       1(a)      -          0                1            0
Vidal Sassoon                 1         1          -                1            1
Head & Shoulders              0         0          0                -            0
Pert                          1         1          0                1            -
Number of Times Preferred(b)  3         2          0                4            1
(a) A 1 in a particular box means that the brand in that column was preferred
over the brand in the corresponding row. A 0 means that the row brand
was preferred over the column brand.
(b) The number of times a brand was preferred is obtained by summing the 1s
in each column.
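The conversion from paired comparisons to a rank order can be sketched in a few lines of Python. This is a minimal illustration rather than a standard procedure: the matrix restates the recording form above, the diagonal (a brand paired with itself) is stored as 0 by assumption, and the function names are made up for this example.

```python
# Brands in the order used by the recording form.
brands = ["Jhirmack", "Finesse", "Vidal Sassoon", "Head & Shoulders", "Pert"]

# prefs[r][c] == 1 means the column brand was preferred over the row brand.
# Diagonal cells are never asked; they are stored as 0 (assumption).
prefs = [
    [0, 0, 0, 1, 0],  # Jhirmack row
    [1, 0, 0, 1, 0],  # Finesse row
    [1, 1, 0, 1, 1],  # Vidal Sassoon row
    [0, 0, 0, 0, 0],  # Head & Shoulders row
    [1, 1, 0, 1, 0],  # Pert row
]


def pair_count(n):
    """Number of paired comparisons needed for n brands: n(n - 1)/2."""
    return n * (n - 1) // 2


def times_preferred(matrix):
    """Sum the 1s in each column."""
    return [sum(row[c] for row in matrix) for c in range(len(matrix))]


def rank_order(names, matrix):
    """Order brands from most to least preferred (assumes transitivity)."""
    counts = dict(zip(names, times_preferred(matrix)))
    return sorted(names, key=lambda name: counts[name], reverse=True)


print(pair_count(len(brands)))      # 10 pairs for 5 brands
print(times_preferred(prefs))       # [3, 2, 0, 4, 1]
print(rank_order(brands, prefs))
```

The resulting order is still ordinal: it says Head & Shoulders ranks first, not by how much it is preferred.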
Paired Comparison Selling
The most common method of taste testing is paired comparison. The
consumer is asked to sample two different products and select the
one with the most appealing taste. The test is done in private and a
minimum of 1,000 responses is considered an adequate sample. A
blind taste test for a soft drink, where imagery, self-perception and
brand reputation are very important factors in the consumer’s
purchasing decision, may not be a good indicator of performance in
the marketplace. The introduction of New Coke illustrates this point.
New Coke was heavily favored in blind paired comparison taste tests,
but its introduction was less than successful, because image plays a
major role in the purchase of Coke.
A paired comparison
taste test
Comparative Scaling Techniques
Rank Order Scaling
Respondents are presented with several objects
simultaneously and asked to order or rank them according
to some criterion.
It is possible that the respondent may dislike the brand
ranked 1 in an absolute sense.
Furthermore, rank order scaling also results in ordinal data.
Form
Brand Rank Order
1. Crest _________
2. Colgate _________
3. Aim _________
4. Gleem _________
5. Macleans _________
Instructions
On the next slide, there are eight attributes of
bathing soaps. Please allocate 100 points among
the attributes so that your allocation reflects the
relative importance you attach to each attribute.
The more points an attribute receives, the more
important the attribute is. If an attribute is not
at all important, assign it zero points. If an
attribute is twice as important as some other
attribute, it should receive twice as many points.
Importance of Bathing Soap Attributes Using a Constant Sum
Scale
Figure 8.5 cont.
Form
Average Responses of Three Segments
Attribute            Segment I   Segment II   Segment III
1. Mildness              8           2            4
2. Lather                2           4           17
3. Shrinkage             3           9            7
4. Price                53          17            9
5. Fragrance             9           0           19
6. Packaging             7           5            9
7. Moisturizing          5           3           20
8. Cleaning Power       13          60           15
Sum                    100         100          100
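A constant sum response is only valid if the points add up to the fixed total. The following sketch checks that for the three segment averages and picks the attribute each segment weights most heavily. The dictionary layout and function names are assumptions made for illustration.

```python
# Average points per attribute for (Segment I, Segment II, Segment III).
responses = {
    "Mildness": (8, 2, 4),
    "Lather": (2, 4, 17),
    "Shrinkage": (3, 9, 7),
    "Price": (53, 17, 9),
    "Fragrance": (9, 0, 19),
    "Packaging": (7, 5, 9),
    "Moisturizing": (5, 3, 20),
    "Cleaning Power": (13, 60, 15),
}


def segment_totals(table):
    """Column sums: one total per segment."""
    return [sum(column) for column in zip(*table.values())]


def top_attribute(table, segment_index):
    """Attribute that received the most points in the given segment."""
    return max(table, key=lambda attribute: table[attribute][segment_index])


print(segment_totals(responses))                       # each should equal 100
print([top_attribute(responses, i) for i in range(3)])
```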
Measurement Basics
Measurement type and scale - summary
Measurement Basics
Meaningfulness in measurement
After making measurements, key question is “can we deduce
meaningful statements about entities being measured?”
Harder to answer than it first appears – consider these
statements:
1. The number of errors discovered during the integration testing of a
program X was at least 100
2. The cost of fixing each error in program X is at least 100
3. A semantic error takes twice as long to fix as a syntactic error
4. A semantic error is twice as complex as a syntactic error
Measurement Basics
Meaningfulness in measurement (cont’d)
First statement seems to make sense
Second statement doesn’t make sense – number of errors may
be specified without reference to a particular scale, but cost to
fix them must be
Statement 3 seems sensible – the ratio of time taken is the
same, whether time is measured in seconds, hours, or fortnights
Statement 4 does not appear to be meaningful and requires
clarification:
If complexity means time to understand the error, then it makes sense
Other definitions of complexity may not admit measurement on a ratio
scale (e.g. examples in previous slides) in which case statement 4 is
meaningless.
Measurement Basics
Meaningfulness in measurement
Definition: a statement involving measurement is
meaningful if its truth value is invariant under every
admissible transformation of the scales involved.
Measurement Basics
Meaningfulness in measurement – examples
John is twice as tall as Fred
Implies measures are at least on the ratio scale. It’s meaningful because no
matter what transformation we use (and all we have is ratio transformations),
the truth or falsity of the statement remains constant.
Temperature in Tokyo today is twice that in London
Implies a ratio scale, but is not meaningful. We measure in ° F and ° C. If
Tokyo is 40° C and London is 20° C, then the statement is true, but if Tokyo
is 104° F and London is 68° F, the statement is no longer true.
Failure x is twice as critical as failure y
Not meaningful if we only have an ordinal scale for criticality (common scale
for software failures is catastrophic, significant, moderate, minor, and
insignificant).
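The invariance test behind these examples can be checked numerically. The sketch below is an illustration only: it applies an admissible transformation to each scale and asks whether a "twice as much" statement keeps its truth value. The heights of 180 cm and 90 cm are hypothetical values, and the function names are invented for this example.

```python
import math


def celsius_to_fahrenheit(c):
    """Affine transformation: admissible for an interval scale."""
    return c * 9 / 5 + 32


def cm_to_inches(cm):
    """Ratio transformation (M' = aM): admissible for a ratio scale."""
    return cm / 2.54


def is_twice(a, b):
    """True when a is twice b, allowing for floating-point rounding."""
    return math.isclose(a, 2 * b)


# Temperature (interval scale): the statement changes truth value.
print(is_twice(40, 20))                                                 # True in Celsius
print(is_twice(celsius_to_fahrenheit(40), celsius_to_fahrenheit(20)))   # False: 104 vs 68

# Height (ratio scale): the statement stays true after rescaling.
print(is_twice(180, 90))                                                # True in cm
print(is_twice(cm_to_inches(180), cm_to_inches(90)))                    # True in inches
```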
Measurement Basics
Meaningfulness in measurement
Note that our notion of meaningfulness says nothing
about
Usefulness
Practicality
Worthwhile
Ease of measurement
Measurement Basics
Statistical operations on measures
Analyses don’t have to be sophisticated, but we want to know
something about how a set of data is distributed.
What types of statistical analysis are relevant to a given
measurement scale?
Scale type Defining relations Examples of appropriate statistics
Measurement Basics
Indirect measurement and meaningfulness
Done when measuring a complex attribute in terms of simpler
sub-attributes
Scale type for an indirect measure M is generally no stronger
than the weakest of the scale types of the sub-attributes
Example – testing efficiency=defects/effort
Defects is on the absolute scale, while effort is on the ratio scale. Therefore
testing efficiency is on the ratio scale.
What is E=2.7v+121w+26x+12y+22z-497, where
o v is the number of program instructions
o x and y are the number of internal and external documents
o z is the program size in words
o w is a subjective measure of complexity
Halstead’s Metrics
Amenable to experimental verification [1970s]
Program length: N = N1 + N2
Program vocabulary: n = n1 + n2
Estimated length: N̂ = n1 log2 n1 + n2 log2 n2
Close estimate of length for well structured programs
Purity ratio: PR = N̂ / N
Program Complexity
Volume: V = N log2 n
Number of bits to provide a unique designator for each of the n
items in the program vocabulary.
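The length, vocabulary, purity ratio and volume formulas above can be computed directly once the four basic counts are known. In Halstead's notation, n1 and n2 are the numbers of distinct operators and operands, and N1 and N2 are their total occurrences. The function below is a minimal sketch; the sample counts are hypothetical and chosen so that the logarithms come out as whole numbers.

```python
import math


def halstead(n1, n2, N1, N2):
    """Compute the Halstead measures defined above from the four basic counts."""
    length = N1 + N2                                    # N = N1 + N2
    vocabulary = n1 + n2                                # n = n1 + n2
    estimated_length = n1 * math.log2(n1) + n2 * math.log2(n2)
    purity_ratio = estimated_length / length            # PR = N-hat / N
    volume = length * math.log2(vocabulary)             # V = N log2 n
    return {
        "length": length,
        "vocabulary": vocabulary,
        "estimated_length": estimated_length,
        "purity_ratio": purity_ratio,
        "volume": volume,
    }


# Hypothetical counts: 8 distinct operators, 8 distinct operands,
# 20 operator occurrences and 28 operand occurrences.
result = halstead(n1=8, n2=8, N1=20, N2=28)
for name, value in result.items():
    print(f"{name}: {value}")
```

With these counts the estimated length equals the actual length, so the purity ratio is 1, which is the situation the slide describes for well structured programs.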
Difficulty
(Figure: flow graph notation for the if-then-else and until constructs)
Cyclomatic Complexity
Set of independent paths through the graph (basis set)
V(G) = E – N + 2
E is the number of flow graph edges
N is the number of nodes
V(G) = P + 1
P is the number of predicate nodes
Example
i = 0;
while (i < n-1) do
    j = i + 1;
    while (j < n) do
        if A[i] < A[j] then
            swap(A[i], A[j]);
        end if;
        j = j + 1;
    end do;
    i = i + 1;
end do;
Flow Graph
(Figure: flow graph of the example, nodes 1 to 7)
Computing V(G)
V(G) = 9 – 7 + 2 = 4
V(G) = 3 + 1 = 4
Basis Set
1, 7
1, 2, 6, 1, 7
1, 2, 3, 4, 5, 2, 6, 1, 7
1, 2, 3, 5, 2, 6, 1, 7
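Both formulas for V(G) can be checked against this example in Python. The edge list below is reconstructed from the four basis paths listed above, which is an assumption about the figure. The predicate-based formula is applied under the further assumption that every predicate node has exactly two outgoing edges.

```python
from collections import Counter

# Directed edges (source, target) recovered from the basis set paths.
edges = [
    (1, 7), (1, 2), (2, 6), (6, 1),
    (2, 3), (3, 4), (3, 5), (4, 5), (5, 2),
]


def cyclomatic_by_edges(edge_list):
    """V(G) = E - N + 2 for a connected flow graph."""
    nodes = {node for edge in edge_list for node in edge}
    return len(edge_list) - len(nodes) + 2


def cyclomatic_by_predicates(edge_list):
    """V(G) = P + 1, where P counts nodes with more than one outgoing edge."""
    out_degree = Counter(source for source, _ in edge_list)
    predicates = [node for node, degree in out_degree.items() if degree > 1]
    return len(predicates) + 1


print(cyclomatic_by_edges(edges))        # 9 - 7 + 2 = 4
print(cyclomatic_by_predicates(edges))   # 3 + 1 = 4
```

The two results agree with the size of the basis set, which contains four independent paths.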
Another Example
(Figure: a second flow graph; the node labels 1, 2, 3, 4, 5, 6 and 8 are what remains of it)
What is V(G)?
Meaning
V(G) is the number of regions of the planar flow graph:
the enclosed regions plus the one outer region
PRODUCT OPERATION: Correctness, Reliability, Usability, Integrity, Efficiency
A Comment
McCall’s quality factors were proposed in the
early 1970s. They are as valid today as they were
at that time. It’s likely that software built to conform
to these factors will exhibit high quality well into
the 21st century, even if there are dramatic changes
in technology.
Quality Model
product
Metrics
High level Design Metrics
Structural Complexity
Data Complexity
System Complexity
Card & Glass ’80
Graph based:
Nodes + edges
Modules + lines of control
Depth of tree, arc to node ratio
Coupling
Data and control flow
di: number of input data parameters
ci: number of input control parameters
do: number of output data parameters
co: number of output control parameters
Global
gd: number of global variables used as data
gc: number of global variables used as control
Environmental
w: fan-in
r: fan-out
Metrics for Coupling
Mc = k/m, k=1
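The slide gives Mc = k/m without stating how m is built from the counts above. The sketch below assumes the weighting usually presented with this metric in software engineering textbooks: m = di + a·ci + do + b·co + gd + c·gc + w + r, with a = b = c = 2. Treat that formula, the default weights and the parameter names as assumptions and not as part of the slide.

```python
def module_coupling(d_in, c_in, d_out, c_out, g_data, g_ctrl, w, r,
                    k=1, a=2, b=2, c=2):
    """Coupling indicator Mc = k / m (assumed textbook weighting, a = b = c = 2).

    d_in, c_in   : input data and control parameters
    d_out, c_out : output data and control parameters
    g_data, g_ctrl : global variables used as data and as control
    w, r         : environmental coupling counts (fan-in and fan-out)
    """
    m = d_in + a * c_in + d_out + b * c_out + g_data + c * g_ctrl + w + r
    if m == 0:
        raise ValueError("module has no connections, Mc is undefined")
    return k / m


# Hypothetical module: one input and one output data parameter,
# no globals, connected to a single other module.
loosely_coupled = module_coupling(1, 0, 1, 0, 0, 0, 1, 0)

# Hypothetical module with many parameters, globals and connections.
tightly_coupled = module_coupling(5, 5, 5, 5, 10, 0, 3, 4)

print(round(loosely_coupled, 2))   # larger Mc: lower coupling
print(round(tightly_coupled, 2))   # smaller Mc: higher coupling
```

Under this definition Mc shrinks as the number of connections grows, so a smaller value signals heavier coupling.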
Formulate
Collect
Analysis
Interpretation
Feedback
Metrics for the Object Oriented
Chidamber & Kemerer ’94 TSE 20(6)
Direct measures
Weighted Methods per Class
WMC = \sum_{i=1}^{n} c_i
Viewpoints:
Lower level subclasses inherit a number of methods
making behavior harder to predict
Deeper trees indicate greater design complexity
Number of Children
NOC is the number of subclasses immediately
subordinate to a class
Viewpoints:
As NOC grows, reuse increases - but the abstraction may be diluted
Classes higher up in the hierarchy should have more subclasses than those
lower down
NOC gives an idea of the potential influence a class has on the design:
classes with large number of children may require more testing
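NOC, and the inheritance tree depth that the earlier viewpoints refer to (DIT in the C&K suite), can both be read off a parent map. The class names below are hypothetical, and a root class is given depth 0 by assumption.

```python
# Hypothetical single-inheritance hierarchy: class name -> parent (None for the root).
parents = {
    "Shape": None,
    "Polygon": "Shape",
    "Circle": "Shape",
    "Triangle": "Polygon",
    "Square": "Polygon",
}


def noc(cls):
    """Number of Children: subclasses immediately subordinate to cls."""
    return sum(1 for parent in parents.values() if parent == cls)


def dit(cls):
    """Depth of Inheritance Tree: number of ancestors between cls and the root."""
    depth = 0
    while parents[cls] is not None:
        cls = parents[cls]
        depth += 1
    return depth


for name in parents:
    print(name, "NOC =", noc(name), "DIT =", dit(name))
```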
Coupling between Classes
CBO is the number of collaborations between a class and other classes
(fan-out of a class C):
the number of other classes that are referenced in the class C
(a reference to another class, A, is a reference to a method
or a data member of class A)
Viewpoints:
As collaboration increases reuse decreases
High fan-outs represent class coupling to other classes/objects and thus are
undesirable
High fan-ins represent good object designs and high level of reuse
Not possible to maintain high fan-in and low fan outs across the entire
system
Response for a Class
RFC is the number of methods that could be called
in response to a message to a class (local + remote)
Viewpoints:
As RFC increases
testing effort increases
greater the complexity of the object
harder it is to understand
Lack of Cohesion in Methods
LCOM – poorly described in Pressman
Thus LCOM = 1
Explanation
LCOM is the number of empty intersections minus
the number of non-empty intersections
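The definition above can be made concrete by comparing, pair by pair, the sets of instance variables each method uses. The sketch follows the Chidamber and Kemerer formulation, in which a negative difference is reported as 0; that floor is not stated on the slide and is noted here as an assumption. The method and attribute names are hypothetical.

```python
from itertools import combinations


def lcom(method_attributes):
    """LCOM = (pairs with empty intersection) - (pairs with non-empty intersection).

    A negative difference is reported as 0 (assumption taken from the
    original Chidamber and Kemerer definition).
    """
    empty = 0
    non_empty = 0
    for first, second in combinations(method_attributes.values(), 2):
        if first & second:
            non_empty += 1
        else:
            empty += 1
    return max(empty - non_empty, 0)


# Hypothetical class with three methods and the attributes each one touches.
example = {
    "m1": {"a", "b", "c", "d", "e"},
    "m2": {"a", "b", "e"},
    "m3": {"x", "y", "z"},
}

print(lcom(example))   # 2 empty pairs - 1 non-empty pair = 1
```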
MIF
MIF = \frac{\sum_{i=1}^{n} M_i(C_i)}{\sum_{i=1}^{n} M_a(C_i)}
Mi(Ci) is the number of methods inherited and not
overridden in Ci
Ma(Ci) is the number of methods that can be invoked
with Ci
Md(Ci) is the number of methods declared in Ci
MIF
Ma(Ci) = Md(Ci) + Mi(Ci)
All that can be invoked = new or overloaded + things
inherited
MIF lies in the range [0,1]
MIF near 1 means little specialization
MIF near 0 means large change
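A small calculation shows how the ratio behaves. The sketch below uses Ma(Ci) = Md(Ci) + Mi(Ci) and sums over all classes; the three classes and their method counts are hypothetical.

```python
def mif(classes):
    """Method Inheritance Factor: inherited methods over all invocable methods.

    Each entry is a (declared, inherited) pair, i.e. (Md(Ci), Mi(Ci)).
    """
    inherited = sum(mi for _, mi in classes)
    available = sum(md + mi for md, mi in classes)   # Ma = Md + Mi per class
    if available == 0:
        return 0.0
    return inherited / available


# Hypothetical system: a base class and two subclasses.
system = [
    (4, 0),   # base class: 4 declared, nothing inherited
    (2, 4),   # subclass: 2 new or overriding methods, 4 inherited
    (1, 5),   # subclass: 1 new method, 5 inherited
]

print(mif(system))   # 9 inherited out of 16 invocable methods
```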
Coupling Factor
CF = \frac{\sum_{i=1}^{TC} \sum_{j=1}^{TC} is\_client(C_i, C_j)}{TC^2 - TC}
is_client(x,y) = 1 iff a relationship exists between
the client class and the server class. 0 otherwise
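The coupling factor is the number of actual client-server links divided by the number of possible ones, TC² − TC, where TC is the total number of classes. The sketch below counts links from a hypothetical dependency map; a class is never counted as its own client.

```python
def coupling_factor(classes, client_of):
    """CF = (number of client-server links) / (TC^2 - TC).

    client_of maps a client class to the set of server classes it uses.
    """
    tc = len(classes)
    if tc < 2:
        return 0.0
    links = sum(
        1
        for client in classes
        for server in client_of.get(client, set())
        if server != client
    )
    return links / (tc * tc - tc)


# Hypothetical system of four classes.
classes = ["A", "B", "C", "D"]
client_of = {
    "A": {"B", "C"},
    "B": {"C"},
    "D": {"A"},
}

print(coupling_factor(classes, client_of))   # 4 links out of 12 possible
```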
Metrics calculated
Lines Of Code (LOC)
McCabe’s cyclomatic complexity
C&K suite (WMC, NOC, DIT, CBO)
freely available
http://cccc.sourceforge.net/
JMetric
OO metric calculation tool for Java code (by Cain and
Vasa for a project at COTAR, Australia)
Metrics
Lines Of Code per class (LOC)
Cyclomatic complexity
LCOM (by Henderson-Sellers)
Availability
is distributed under GPL
http://www.it.swin.edu.au/projects/jmetric/products/jmetric/
JMetric tool result
GEN++
(University of California, Davis and Bell Laboratories)