Unit 4 Notes
UNIT-IV
Product Metrics: Software Quality, Framework for Product Metrics, Metrics for
Analysis Model, Metrics for Design Model, Metrics for Source code, Metrics for testing,
Metrics for maintenance.
Metrics for Process and Projects: Software Measurement, Metrics for software quality.
Testing Strategies
Testing is the process of exercising a program with the specific intent of finding errors
prior to delivery to the end user.
A strategy for software testing integrates software test case design methods into a
well-planned series of steps that results in the successful construction of software.
Testing begins at the component level and works outward toward the integration of
the entire computer-based system.
Testing is conducted by the developer of the software and (for large projects) by an
independent test group
A strategy for software testing must accommodate low-level tests that are necessary
to verify that a small source code segment has been correctly implemented, as well as
high-level tests that validate major system functions against customer requirements.
A strategy must provide guidance for the practitioner and a set of milestones for the
manager. Because the steps of the test strategy occur at a time when deadline pressure
begins to rise, progress must be measurable and problems must surface as early as
possible.
Verification refers to the set of activities that ensure that software correctly implements a
specific function.
Validation refers to the set of activities that ensure that the software that has been built is
traceable to customer requirements.
Verification and validation encompass a wide array of SQA activities that
include formal technical reviews, quality and configuration audits, performance
monitoring, simulation, feasibility study, documentation review, database review,
algorithm analysis, development testing, usability testing, qualification testing, and
installation testing. Although testing plays an extremely important role in V & V, many
other activities are also necessary.
There are common misconceptions about testing: that the software should be given to a
secret team of testers who will test it unmercifully, and that the testers get involved with
the project only when the testing steps are about to begin. Both views are incorrect.
The software developer is always responsible for testing the individual units of the
program, ensuring that each performs the function or exhibits the behavior for
which it was designed.
In many cases the developer also conducts integration testing, a testing step that leads
to the construction of the complete software architecture. Only after the software
architecture is complete does an independent test group (ITG) become involved. The role
of the ITG is:
To remove the inherent problems associated with letting the builder test the
software that has been built
To work closely with the software developer during analysis and design to
ensure that thorough testing occurs.
The software process may be viewed as the spiral illustrated in the following
figure:
A strategy for software testing may also be viewed in the context of the spiral.
Unit testing begins at the vortex of the spiral and concentrates on each unit of
the software as implemented in source code.
Testing progresses by moving outward along the spiral to integration testing
where the focus is on design and the construction of the software architecture.
Taking another turn outward on the spiral, we encounter validation testing,
where requirements established as part of software requirements analysis are
validated against the software that has been constructed.
Finally, we arrive at system testing, where the software and other system
elements are tested as a whole.
Testing is actually a series of four steps that are implemented sequentially. The
steps are shown in the following figure:
Unit testing
Integration testing
Focuses on inputs and outputs, and how well the components fit together
and work together
Validation testing
Provides final assurance that the software meets all functional, behavioral,
and performance requirements
System testing
Verifies that all system elements mesh properly and that overall system function and
performance is achieved
For object-oriented software, the view of testing broadens: errors in the analysis and
design models must also be detected, unit testing loses some of its meaning, and
integration testing changes significantly
Test "in the small" and then work out to testing "in the large―
Every time a user executes the software, the program is being tested
Sadly, testing usually stops when a project is running out of time, money, or
both
One approach is to divide the test results into various severity levels
Then consider testing to be complete when certain levels of errors no longer occur
or have been repaired or eliminated
Strategic Issues
Understand the user of the software (through use cases) and develop a profile for
each user category
Develop a testing plan that emphasizes rapid cycle testing to get quick feedback to
control quality levels and adjust the test strategy
Build robust software that is designed to test itself and can diagnose certain kinds
of errors
Use effective formal technical reviews as a filter prior to testing to reduce the
amount of testing required
Conduct formal technical reviews to assess the test strategy and test cases
themselves
Develop a continuous improvement approach for the testing process through the
gathering of metrics
At one extreme, a software team could wait until the system is fully constructed
and then conduct tests on the overall system to find errors. This approach usually
results in buggy software that disappoints the customer and end user.
At the other extreme, a software engineer could conduct tests on a daily basis,
whenever any part of the system is constructed. This approach can be very
effective, but most software developers hesitate to use it.
The testing strategy chosen by most software teams falls between the two
extremes.
It takes an incremental view of testing, beginning with the testing of individual program
units, moving to tests designed to facilitate the integration of the units, and culminating
with tests that exercise the constructed system.
Unit Testing:
Unit testing focuses verification effort on the smallest unit of software design, the
software component or module.
Concentrates on the internal processing logic and data structures within the
boundaries of a component.
The tests that occur as part of unit test are illustrated schematically in following
figure:
The module interface is tested to ensure that information properly flows into and
out of the program unit under test.
Local data structures are examined to ensure that data stored temporarily
maintains its integrity during all steps in an algorithm's execution.
All independent paths through the control structure are exercised to ensure that
all statements in a module have been executed at least once.
Boundary conditions are tested to ensure that the module operates properly at
boundaries established to limit or restrict processing.
Finally all error handling paths are tested.
Among the potential errors that should be tested when error handling is
evaluated are: an error description that is unintelligible or does not provide enough
information to locate the cause of the error, an error noted that does not correspond to
the error actually encountered, an error condition that causes system intervention prior
to error handling, and incorrect exception-condition processing.
Because a component is not a stand-alone program, driver and/or stub software must
be developed for each unit test.
Stubs are dummy modules that are always distinguished as "called programs";
they are used when subprograms are still under construction.
Stubs can be thought of as dummy modules that simulate the low-level
modules.
Drivers are also a form of dummy module, always distinguished as "calling
programs"; they are used only when the main programs are still under
construction.
Drivers can be thought of as dummy modules that simulate the high-level
modules.
When only one function is addressed by a component, the number of test cases is
reduced and errors can be more easily predicted and uncovered.
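The following is a minimal sketch, in Python, of how a driver and a stub (as described above) let a component be unit tested in isolation; the component, its subordinate call, and the test data are hypothetical and only illustrate the idea.

    # Hypothetical component under test: it normally calls a subordinate
    # module (a tax-rate lookup) that is still under construction, so a
    # stub stands in for that module during unit testing.

    def stub_tax_rate(region):
        # Stub: a dummy "called program" that simulates the low-level
        # module by returning a fixed, predictable value.
        return 0.10

    def compute_total(price, region, tax_rate_fn):
        # The component (module) under test.
        return round(price * (1.0 + tax_rate_fn(region)), 2)

    def driver():
        # Driver: a dummy "calling program" that feeds test-case inputs
        # to the component under test and checks its outputs.
        assert compute_total(100.0, "EU", stub_tax_rate) == 110.0
        assert compute_total(0.0, "EU", stub_tax_rate) == 0.0   # boundary condition
        print("unit tests passed")

    if __name__ == "__main__":
        driver()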
Integration Testing:
Once all modules have been unit tested: "If they all work individually, why do you
doubt that they'll work when we put them together?"
Interfacing is the problem: data can be lost across an interface, one module can have
an inadvertent adverse effect on another, and sub-functions, when combined, may not
produce the desired major function.
Integration testing focuses on combining all unit-tested components into one
single architecture and then implementing test cases to uncover errors associated
with interfacing.
Non-incremental ("big bang") approach:
All components are combined in advance and the entire program is tested as a whole.
Correcting the errors is complicated; once these errors are corrected, new ones appear
and the process seems endless. Hence, this approach is set aside.
Incremental approach:
The program is constructed and tested in small increments, where errors are
easier to isolate and correct.
[Figure: incremental integration of modules A, B, C and D, with test sequences T1 through T5 applied as each module is added.]
Two common incremental integration strategies are top-down integration and
bottom-up integration.
Top-down integration:
1. The main control module is used as a test driver and stubs are substituted
for all components directly subordinate to the main control module.
2. Depending on the integration approach selected (depth first or breadth first),
subordinate stubs are replaced one at a time with actual components.
3. Tests are conducted as each component is integrated.
4. On completion of each set of tests, another stub is replaced with the real
component.
5. Regression testing may be conducted to ensure that new errors have not
been introduced.
The process continues from step 2 until the entire program structure is built. The
top-down integration strategy verifies major control or decision points early in the test
process.
The top-down integration follows the pattern illustrated in the following figure:
Bottom-up integration:
1. Low-level components are combined into clusters (sometimes called builds)
that perform a specific software sub-function.
2. A driver (a control program for testing) is written to coordinate test case
input and output.
3. The cluster is tested.
4. Drivers are removed and clusters are combined moving upward in the
program structure.
The Bottom-up integration follows the pattern illustrated in the following figure:
Regression Testing:
Each time a new module is added as part of integration testing, the software
changes.
New data flow paths are established, new I/O may occur, and new control logic is
invoked. These changes may cause problems with functions that previously
worked flawlessly.
Capture/playback tools enable the software engineer to capture test cases and
results for subsequent playback and comparison.
The regression test suite (the subset of tests to be executed) contains three
different classes of test cases:
A representative sample of tests that will exercise all software functions.
Additional tests that focus on software functions that are likely to be affected by
the change.
Tests that focus on the software components that have been changed.
Smoke Testing
Smoke testing is an integration testing approach that is commonly used for time-critical
projects: the software is rebuilt frequently (often daily) as newly coded components are
integrated into a build, and the build is exercised with a series of tests designed to expose
show-stopper errors as early as possible.
Basically there are many forms of testing, but two types of testing are of primary
focus. They are
Black-box testing
White box-testing
Black-Box Testing:
This type of testing is conducted so as to ensure that the software satisfies its
purpose of development.
The software is exercised in all its functional aspects and is closely analyzed to
conclude that its modules (together) function as expected.
Errors that affect the normal functioning of the software are uncovered so
that they can be rectified.
White-Box Testing:
This type of testing lays stress on testing the internal framework (structure) of the
software rather than only its externally visible behavior.
Each individual unit of the software is tested along with the way each unit
collaborates with others to bring up the required functionality.
Validation Testing:
Validation testing may begin after successfully completing the integration testing
Validation can be defined in many ways, but a simple definition is that validation
succeeds when software functions in a manner that can be reasonably expected
by the customer.
After each validation test case has been conducted, one of two possible conditions
exists: (1) the function or performance characteristic conforms to specification and is
accepted, or (2) a deviation from specification is uncovered and a deficiency list is
created.
Configuration Review:
An important element of validation is a configuration review, whose intent is to ensure
that all elements of the software configuration have been properly developed, are
cataloged, and have the necessary detail to support the maintenance phase.
Alpha and Beta Testing:
The alpha test is conducted at the developer's site by end-users. Alpha tests are
conducted in a controlled environment with the developer looking over the typical users
and recording errors and their usage problems.
The beta test is conducted at end-user sites. The beta test is a "live" application of
the software in an environment that cannot be controlled by the developer. The end-user
records all problems that are encountered during beta testing and reports these to the
developer at regular intervals.
System Testing:
Although each test has a different purpose, all work to verify that system elements
have been properly integrated and perform allocated functions.
Following are the types of system tests that are worthwhile for software-based
systems.
Recovery Testing
Stress Testing
Performance Testing
Recovery Testing:
Recovery testing is a system test that forces the software to fail in a variety of ways
and verifies that recovery is properly performed.
Stress Testing:
Stress testing executes a system in a manner that demands resources in abnormal
quantity, frequency, or volume. A variation of stress testing is sensitivity testing, which
attempts to uncover data combinations within valid input classes that may cause
instability or improper processing.
Performance Testing:
Performance testing occurs throughout all steps in the testing process. Even at
the unit level, the performance of an individual module may be assessed as white-
box tests are conducted
However, it is not until all system elements are fully integrated that the true
performance of a system can be ascertained.
Debugging Process:
Debugging occurs as a consequence of successful testing: when a test case uncovers an
error, the results are assessed and a lack of correspondence between expected and actual
performance is encountered.
The debugging process attempts to match symptom with cause, thereby leading to
error correction.
A few characteristics of bugs provide some clues such as
The symptom and the cause may be geographically remote. i.e., the
symptom may appear in one part of a program, while the cause may
actually be located at a site that is far removed. Highly coupled program
structures exacerbate this situation.
The symptom may be caused by human error that is not easily traced.
The symptom may be due to causes that are distributed across a number of
tasks running on different processors.
Debugging Strategies:
Brute force
Backtracking
Cause elimination
Each of these strategies can be conducted manually, but modern debugging tools can
make the process much more effective.
Debugging Tactics: The brute force category of debugging is probably the most common
and least efficient method for isolating the cause of a software error. We apply brute
force debugging methods when all else fails.
Product Metrics: Software Quality, Framework for Product Metrics, Metrics for
Analysis Model, Metrics for Design Model, Metrics for Source code, Metrics for testing,
Metrics for maintenance.
Introduction:
Software process and product metrics are quantitative measures that enable
software people to gain insight into the efficacy of the software process and the
projects that are conducted using the process as a framework
Software metrics are analyzed and assessed by software managers. Measures are
often collected by software engineers.
If you don't measure, judgment can be based only on subjective evaluation. With
measurement, trends (either good or bad) can be spotted, better estimates can be
made, and true improvement can be accomplished over time.
Software quality:
There is a set of implicit requirements that often goes unmentioned (e.g., ease
of use). If software conforms to its explicit requirements but fails to meet
implicit requirements, software quality is suspect
The factors that affect software quality can be categorized in two broad groups:
(1) factors that can be directly measured (e.g., defects uncovered during testing) and
(2) factors that can be measured only indirectly (e.g., usability or maintainability).
McCall, Richards, and Walters propose a useful categorization of factors that affect
software quality. These software quality factors are shown in the following figure.
Correctness: The extent to which a program satisfies its specification and fulfills the
customer's mission objectives.
Reliability: The extent to which a program can be expected to perform its intended
function with required precision.
Integrity: The extent to which access to software or data by unauthorized persons can
be controlled.
Usability: The effort required to learn, operate, prepare input for, and interpret output
of a program.
Testability: The effort required to test a program to ensure that it performs its intended
function
Portability: The effort required to transfer the program from one hardware and/or
software system environment to another.
The ISO 9126 standard was developed in an attempt to identify the key quality
attributes for computer software. It identifies six attributes: functionality, reliability,
usability, efficiency, maintainability, and portability.
Reliability: The amount of time that the software is available for use as
indicated by the following sub-attributes: maturity, fault tolerance,
recoverability.
Efficiency: The degree to which the software makes optimal use of system
resources as indicated by the following sub-attributes: time behavior,
resource behavior.
Maintainability: The ease with which repair may be made to the software as
indicated by the following sub-attributes:analyzability, changeability,
stability, testability.
Portability: The ease with which the software can be transposed from one
environment to another as indicated by the following sub-attributes:
adaptability, installability, conformance, replaceability.
It is worthwhile to establish a fundamental framework and a set of principles for the
measurement of product metrics for software.
Although the terms measure, measurement, and metrics are often used
interchangeably, it is important to note the subtle differences between them.
When a single data point has been collected (e.g., the number of errors uncovered
within a single software component), a measure has been established.
Measurement occurs as the result of the collection of one or more data points
(e.g., a number of component reviews and unit tests are investigated to collect
measures of the number of errors for each).
A software metric relates the individual measures in some way (e.g., the average
number of errors found per review or the average number of errors found per unit
test).
A software engineer collects measures and develops metrics so that indicators will
be obtained.
Measurement Principles:
Software metrics will be useful only if they are characterized effectively and
validated so that their worth is proven. The following principles are representative
of many that can be proposed for metrics characterization and validation:
A goal-oriented approach, the Goal/Question/Metric (GQM) paradigm, suggests that you:
(1) establish an explicit measurement goal that is specific to the process activity or
product characteristic to be assessed; (2) define a set of questions that must be answered
in order to achieve the goal; and (3) identify well-formulated metrics that help to answer
these questions.
A goal definition template can be used to define each measurement goal. The
template takes the form: Analyze {the name of the activity or attribute to be measured}
for the purpose of {the overall objective of the analysis} with respect to {the aspect of the
activity or attribute that is considered} from the viewpoint of {the people who have an
interest in the measurement} in the context of {the environment in which the
measurement takes place}.
Simple and computable: It should be relatively easy to learn how to derive the
metric, and its computation should not demand inordinate effort or time.
Empirically and intuitively persuasive: The metric should satisfy the engineer's
intuitive notions about the product attribute under consideration.
Consistent and objective: the metric should always yield results that are
unambiguous.
An effective mechanism for high-quality feedback. That is, the metric should
lead to a higher quality end product.
A wide variety of metrics taxonomies have been proposed; the following outline
addresses the most important metrics areas:
These metrics address various aspects of the analysis model and include: functionality
delivered, system size, and specification quality.
These metrics quantify design attributes in a manner that allows a software engineer to
assess design quality. They include architectural metrics, component-level metrics,
interface design metrics, and specialized object-oriented design metrics.
These metrics measure the source code and can be used to assess its complexity,
maintainability, and testability, among other characteristics.
These metrics assist in the design of test cases and evaluate the efficacy of testing:
Statement and branch coverage metrics lead to the design of test cases that
provide program coverage.
These metrics examine the analysis model with the intent of predicting the size of the
resultant system. Size is sometimes an indicator of design complexity and is almost
always an indicator of increased coding, integration and testing effort.
Function-Based Metrics:
The function point (FP) metric can be used effectively as a means for measuring the
functionality delivered by a system. Using historical data, the FP metric can then be used to:
estimate the cost or effort required to design, code, and test the software;
predict the number of errors that will be encountered during testing; and
forecast the number of components and/or the number of projected source lines in
the implemented system.
Number of internal logical files (ILFs). Each internal logical file is a logical
grouping of data that resides within the application's boundary and is
maintained via external inputs.
The Fi (i = 1 to 14) are value adjustment factors (VAF) based on responses to the
following questions:
Is performance critical?
Does the online data entry require the input transaction to be built over
multiple screens or operations?
Is the application designed to facilitate change and ease of use by the user?
Each of these questions is answered using a scale that ranges from 0 (not
important or applicable) to 5 (absolutely essential).
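A small sketch of the function point computation follows, using the standard adjustment formula FP = count total x [0.65 + 0.01 x sum(Fi)]; the information-domain counts, the weights, and the fourteen Fi ratings below are illustrative assumptions, not values taken from the text.

    # Illustrative information-domain counts (external inputs, external outputs,
    # external inquiries, internal logical files, external interface files) and
    # average-complexity weights; a real count assigns simple/average/complex
    # weights per item.
    counts  = {"EI": 12, "EO": 8, "EQ": 5, "ILF": 4, "EIF": 2}
    weights = {"EI": 4,  "EO": 5, "EQ": 4, "ILF": 10, "EIF": 7}

    count_total = sum(counts[k] * weights[k] for k in counts)

    # Fourteen value adjustment factors, each rated 0 (not important) to 5
    # (absolutely essential); the ratings here are made up for illustration.
    value_adjustment_factors = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3, 2, 5, 4, 3]

    fp = count_total * (0.65 + 0.01 * sum(value_adjustment_factors))
    print(f"count total = {count_total}, FP = {fp:.1f}")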
Davis and his colleagues propose a list of characteristics that can be used to assess the
quality of the analysis model and the corresponding requirements specification:
Completeness,
Correctness,
Understandability,
Verifiability,
Achievability,
Concision (pointedness),
Traceability,
Modifiability,
Reusability.
Design metrics for computer software, like all other software metrics are not
perfect. Debate continues over their efficacy and the manner in which they should be
applied.
Card and Glass define three software design complexity measures: structural
complexity, data complexity, and system complexity.
Structural complexity: S(i) = fout(i)^2, where fan-out, fout(i), is defined as the number
of modules that are directly invoked by module i.
Data complexity: D(i) = v(i) / [fout(i) + 1], where v(i) is the number of input and output
variables that are passed to and from module i.
System complexity: C(i) = S(i) + D(i), the sum of structural and data complexity.
As each of these values grows, the overall architectural complexity of the system
increases.
This leads to a greater likelihood that integration and testing effort will also
increase.
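A short sketch of the structural, data, and system complexity computations described above, applied to a hypothetical set of modules; the module names, fan-out counts, and variable counts are assumptions made only for illustration.

    # For each module: fan_out = number of modules it directly invokes,
    # v = number of input/output variables passed to and from the module.
    modules = {
        "a": {"fan_out": 3, "v": 4},
        "b": {"fan_out": 1, "v": 6},
        "c": {"fan_out": 0, "v": 2},
    }

    for name, m in modules.items():
        s = m["fan_out"] ** 2            # structural complexity S(i) = fout(i)^2
        d = m["v"] / (m["fan_out"] + 1)  # data complexity D(i) = v(i) / (fout(i) + 1)
        c = s + d                        # system complexity C(i) = S(i) + D(i)
        print(f"module {name}: S = {s}, D = {d:.2f}, C = {c:.2f}")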
Fenton suggests a number of simple morphology (i.e., shape) metrics that enable
different program architectures to be compared using a set of straightforward
dimensions. Referring to the call-and-return architecture in Figure, the following
metrics can be defined:
Size = n + a, where n is the number of nodes (modules) and a is the number of arcs
(lines of control). For the architecture shown in the figure, n = 17 and a = 18, so size = 35.
Depth: the longest path from the root (top) node to a leaf node; for the figure, depth = 4.
Width: the maximum number of nodes at any one level of the architecture; for the
figure, width = 6.
Arc-to-node ratio, r = a/n, which measures the connectivity density of the architecture;
here r = 18/17 = 1.06.
The U.S. Air Force Systems Command has developed a number of software quality
indicators that are based on measurable design characteristics of a computer
program.
The Air Force uses information obtained from data and architectural design to
derive a design structure quality index (DSQI) that ranges from 0 to 1.
S4 = number of database items (includes data objects and all attributes that
define objects)
Once values S1 through S7 are determined for a computer program, the following
intermediate values can be computed:
Module independence: D2 = 1 - ( S2 / S1 ).
Database size: D4 = 1 - ( S5 / S4 ).
Database compartmentalization: D5 = 1 - ( S6 / S4 ).
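The intermediate values above are combined into the single DSQI figure as a weighted sum; the sketch below assumes equal weights and made-up S-counts purely for illustration (only the three intermediate values shown in the text are included).

    # Assumed counts (not taken from the text).
    S1, S2 = 120, 30        # total modules, modules dependent on the source of data input
    S4, S5, S6 = 40, 8, 6   # database items, unique database items, database segments

    # Intermediate values as defined above.
    D2 = 1 - (S2 / S1)   # module independence
    D4 = 1 - (S5 / S4)   # database size
    D5 = 1 - (S6 / S4)   # database compartmentalization

    # DSQI is a weighted sum of the intermediate values; equal weights are
    # assumed here, and only the Di computed above are included.
    weights = [1 / 3, 1 / 3, 1 / 3]
    dsqi = sum(w * d for w, d in zip(weights, [D2, D4, D5]))
    print(f"DSQI = {dsqi:.2f}")   # ranges from 0 to 1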
Complexity.
Like size, there are many differing views of software complexity. Whitmire views
complexity in terms of structural characteristics by examining how classes of an
OO design are interrelated to one another.
Coupling.
The physical connections between elements of the OO design (e.g., the number of
collaborations between classes or the number of messages passed between
objects) represent coupling within an OO system.
Sufficiency.
A design component (e.g., a class) is sufficient if it fully reflects all properties of the
application domain object that it is modeling.
Completeness.
Sufficiency compares the abstraction from the point of view of the current
application. Completeness considers multiple points of view.
Cohesion.
Like its counterpart in conventional software, the cohesion of an OO component is the
degree to which all of its operations work together to achieve a single, well-defined purpose.
Primitiveness.
Applied to both operations and classes, primitiveness is the degree to which an operation
is atomic, that is, the operation cannot be constructed out of a sequence of other
operations contained within the class.
Similarity.
The degree to which two or more classes are similar in terms of their structure,
function, behavior or purpose is indicated by this measure.
Volatility.
Volatility measures the likelihood that a change to the design component will occur.
The class is the fundamental unit of an OO system. Therefore, measures and metrics
for an individual class, the class hierarchy, and class collaborations will be invaluable
when you are required to assess OO design quality.
Chidamber and Kemerer have proposed six class-based design metrics for OO systems.
Weighted methods per class (WMC): Assume that n methods of complexity c1, c2, . . . , cn
are defined for a class C. The specific complexity metric that is chosen (e.g., cyclomatic
complexity) should be normalized so that nominal complexity for a method takes on a
value of 1.0. Then WMC = sum of ci, for i = 1 to n.
Depth of the inheritance tree (DIT): This metric is "the maximum length from the node to the root of the tree".
As DIT grows, it is likely that lower-level classes will inherit many methods.
This leads to potential difficulties when attempting to predict the behavior of a
class. A deep class hierarchy (DIT is large) also leads to greater design complexity.
On the positive side, large DIT values imply that many methods may be reused.
Number of children (NOC): As the number of children grows, reuse increases, but also, the abstraction
represented by the parent class can be diluted if some of the children are not
appropriate members of the parent class. As NOC increases, the amount of testing
(required to exercise each child in its operational context) will also increase.
Coupling between object classes (CBO): The CRC model may be used to determine the value for CBO.
In essence, CBO is the number of collaborations listed for a class on its CRC
index card. As CBO increases, it is likely that the reusability of a class will
decrease. High values of CBO also complicate modifications and the testing that
ensues when modifications are made.
In general, the CBO values for each class should be kept as low as is
reasonable. This is consistent with the general guideline to reduce coupling in
conventional software.
Response for a class (RFC): The response set of a class is the set of methods that can
potentially be executed in response to a message received by an object of the class.
RFC is the number of methods in the response set. As RFC increases, the
effort required for testing also increases because the test sequence grows. It also
follows that, as RFC increases, the overall design complexity of the class
increases.
Lack of cohesion in methods (LCOM): Each method within a class C accesses one or more
attributes (also called instance variables). LCOM is the number of methods that access
one or more of the same attributes. If no methods access the same attributes, then
LCOM = 0.
If LCOM is high, methods may be coupled to one another via attributes. This
increases the complexity of the class design. Although there are cases in which a
high value for LCOM is justifiable, it is desirable to keep cohesion high; that is,
keep LCOM low.
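As a minimal sketch (with a made-up class description), the fragment below computes two of the metrics above: WMC as the sum of normalized method complexities, and LCOM counted the way the text defines it, as the number of methods that access at least one attribute that some other method also accesses.

    # Toy class description (assumed for illustration):
    # method name -> (normalized complexity, set of attributes the method accesses)
    methods = {
        "open":   (1.0, {"handle", "mode"}),
        "read":   (2.0, {"handle", "buffer"}),
        "write":  (2.0, {"handle", "buffer"}),
        "status": (1.0, {"mode"}),
    }

    # WMC: the sum of the (normalized) complexities of the n methods of the class.
    wmc = sum(c for c, _ in methods.values())

    # LCOM, following the definition in the text: the number of methods that
    # access one or more of the same attributes as some other method.
    def shares_attribute(name):
        attrs = methods[name][1]
        return any(attrs & other_attrs
                   for other, (_, other_attrs) in methods.items() if other != name)

    lcom = sum(1 for name in methods if shares_attribute(name))
    print(f"WMC = {wmc}, LCOM = {lcom}")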
Harrison, Counsell, and Nithi propose a set of metrics for object-oriented design that
provide quantitative indicators for OO design characteristics. A sampling of MOOD
metrics follows.
Method inheritance factor (MIF): the degree to which the class architecture of an OO
system makes use of inheritance for both methods and attributes.
MIF = (sum of Mi(Ci)) / (sum of Ma(Ci)), with the summations taken over i = 1 to Tc,
where Tc is the total number of classes in the architecture, Ci is a class within the
architecture, Ma(Ci) = Md(Ci) + Mi(Ci) is the number of methods that can be invoked in
association with Ci, Md(Ci) is the number of methods declared in Ci, and Mi(Ci) is the
number of methods inherited (and not overridden) in Ci.
Coupling factor (CF): CF = (sum over i and j of is_client(Ci, Cj)) / (Tc^2 - Tc), for
i, j = 1 to Tc, where is_client(Ci, Cj) = 1 if a relationship exists between the client class Ci
and the server class Cj (with Ci different from Cj),
and is_client(Ci, Cj) = 0 otherwise.
As the value for CF increases, the complexity of the OO software will also
increase and understandability, maintainability, and the potential for reuse may suffer
as a result.
Cohesion Metrics:
Bieman and Ott define a collection of metrics that provide an indication of the
cohesiveness of a module.
Data slice. A data slice is a backward walk through a module that looks for
data values that affect the module location at which the walk began.
Data tokens. The variables defined for a module can be defined as data
tokens for the module.
Glue tokens. This set of data tokens lies on one or more data slices.
Superglue tokens. These data tokens are common to every data slice in a
module.
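The sketch below illustrates the vocabulary above with made-up data slices: each slice is the set of data tokens reached on a backward walk from one output of a module, glue tokens appear on at least one slice, and superglue tokens appear on every slice. The cohesion ratio at the end is only an illustrative indicator, not the published metric.

    # Made-up data tokens for a module with three output-driven data slices.
    slices = [
        {"n", "i", "sum", "avg"},   # slice for output 'avg'
        {"n", "i", "sum"},          # slice for output 'sum'
        {"n", "i", "max"},          # slice for output 'max'
    ]
    all_tokens = set().union(*slices)

    # Glue tokens: data tokens that lie on one or more data slices.
    glue = {t for t in all_tokens if any(t in s for s in slices)}

    # Superglue tokens: data tokens common to every data slice in the module.
    superglue = set.intersection(*slices)

    # Illustrative indicator: the proportion of tokens that bind every slice
    # together (a higher value suggests stronger cohesion).
    ratio = len(superglue) / len(all_tokens)
    print(f"glue = {sorted(glue)}, superglue = {sorted(superglue)}, ratio = {ratio:.2f}")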
Coupling Metrics:
Coupling metrics provide an indication of the connectedness of a module to other
modules, to global data, and to the outside environment.
Interface Design Metrics:
Sears suggests that layout appropriateness (LA) is a worthwhile design metric for
human/computer interfaces. A typical GUI uses layout entities (icons, text, menus,
windows) to assist the user, and layout appropriateness measures how well a given
screen layout supports the user's most frequently used sequences of operations.
Metrics for Source Code:
Halstead's "software science" is based on a set of primitive measures that may be derived
after the code is generated or estimated once design is complete:
n1 = the number of distinct operators that appear in a program
n2 = the number of distinct operands that appear in a program
N1 = the total number of operator occurrences
N2 = the total number of operand occurrences
Program length: N = N1 + N2 (Halstead shows that N can also be estimated as
n1 log2 n1 + n2 log2 n2).
Program volume: V = N log2 (n1 + n2). V will vary with programming language and
represents the volume of information (in bits) required to specify a program.
Volume ratio: L = (2/n1) x (n2/N2), the ratio of the volume of the most compact form of
a program to the volume of the actual program.
Metrics for Testing:
Halstead effort, e = V/PL (where the program level PL = 1/[(n1/2) x (N2/n2)]), can be
used to allocate testing effort: the percentage of overall testing effort to allocate to a
module k is e(k) / (sum of e(i)), where e(k) is computed for module k using the equations
above and the summation in the denominator is the sum of Halstead effort across all
modules of the system.
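A brief sketch of these computations, applied to assumed operator/operand counts for two hypothetical modules (the counts would normally be extracted from the source code); it also shows how the per-module effort values feed the testing-effort allocation described above.

    import math

    # Assumed primitive measures for two modules.
    modules = {
        "parser": {"n1": 12, "n2": 20, "N1": 80, "N2": 110},
        "report": {"n1": 8,  "n2": 14, "N1": 40, "N2": 60},
    }

    def halstead_effort(m):
        N = m["N1"] + m["N2"]                                # program length
        V = N * math.log2(m["n1"] + m["n2"])                 # volume V = N log2(n1 + n2)
        PL = 1.0 / ((m["n1"] / 2.0) * (m["N2"] / m["n2"]))   # program level
        return V / PL                                        # effort e = V / PL

    efforts = {name: halstead_effort(m) for name, m in modules.items()}
    total = sum(efforts.values())

    for name, e in efforts.items():
        print(f"{name}: e = {e:.0f}, share of testing effort = {e / total:.0%}")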
Binder suggests a broad array of design metrics that have a direct influence on
the "testability" of an OO system.
Lack of cohesion in methods (LCOM): The higher the value of LCOM, the more states must be tested to ensure
that methods do not generate side effects.
Percentage of public and protected attributes (PAP): This metric indicates the percentage
of class attributes that are public or protected. High values for PAP increase the
likelihood of side effects among classes because public and protected attributes lead to
high potential for coupling. Tests must be designed to ensure that such side effects are
uncovered.
Public access to data members (PAD): This metric indicates the number of classes (or
methods) that can access another class's attributes, a violation of encapsulation. High
values for PAD lead
to the potential for side effects among classes. Tests must be designed to ensure
that such side effects are uncovered.
Number of root classes (NOR): This metric is a count of the distinct class hierarchies that are described in
the design model. Test suites for each root class and the corresponding class
hierarchy must be developed. As NOR increases, testing effort also increases.
Fan-in (FIN): In an OO context, fan-in for the inheritance hierarchy is an indication of
multiple inheritance; FIN > 1 indicates that a class inherits its attributes and operations
from more than one root class, a situation that should be avoided where possible.
Metrics for Maintenance:
IEEE Std. 982.1-1988 suggests a software maturity index (SMI) that provides an
indication of the stability of a software product, based on changes that occur for each
release:
MT = the number of modules in the current release
Fc = the number of modules in the current release that have been changed
Fa = the number of modules in the current release that have been added
Fd = the number of modules from the preceding release that were deleted in the
current release
The software maturity index is computed as SMI = [MT - (Fa + Fc + Fd)] / MT.
As SMI approaches 1.0, the product begins to stabilize.
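A one-function sketch of the software maturity index computed from the counts defined above; the release figures plugged in at the end are assumptions chosen only for illustration.

    def software_maturity_index(mt, fa, fc, fd):
        # SMI = [MT - (Fa + Fc + Fd)] / MT; values approaching 1.0
        # indicate that the product is beginning to stabilize.
        return (mt - (fa + fc + fd)) / mt

    # Assumed release data: 940 modules, 40 added, 90 changed, 12 deleted.
    print(f"SMI = {software_maturity_index(mt=940, fa=40, fc=90, fd=12):.2f}")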
Metrics for Process and Projects:
Process metrics are collected across all projects and over long periods of time.
Their intent is to provide a set of process indicators that lead to long-term software
process improvement.
Software Measurement
Direct measures of the software process (e.g., cost and effort applied) and
product (e.g., lines of code (LOC) produced, execution speed, memory size,
and defects reported over some set period of time).
Size-Oriented Metrics:
Size-oriented metrics are not universally accepted as the best way to measure the
software process. Most of the controversy swirls around the use of lines of code as a key
measure.
Function-Oriented Metrics
The most widely used function-oriented metric is the function point (FP).
The relationship between lines of code and function points depends upon the
programming language that is used to implement the software and the quality of
the design.
Object-Oriented Metrics:
where initiator is the object that requests some service (that initiates a
message), action is the result of the request, and participant is the server object
that satisfies the request.
Key classes are the "highly independent components" and are central to the
problem domain.
Support classes are required to implement the system but are not
immediately related to the problem domain.
If the average number of support classes per key class were known for a
given problem domain, estimating (based on the total number of classes) would be
greatly simplified, as the brief worked example after this list illustrates.
Number of subsystems.
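As a rough illustration of that estimate (all numbers here are assumed, not taken from the text): if a problem domain yields 20 key classes and roughly 2 support classes are expected per key class, the total class count is about 20 x (1 + 2) = 60 classes; multiplying by an assumed average effort per class (say 15 to 20 person-days) gives a first-cut estimate of roughly 900 to 1200 person-days of effort.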
Use cases are used widely as a method for describing customer level or business
domain requirements that imply software features and functions.
The UCP is a function of the number of actors and transactions implied by the
use-case models.
To achieve this goal, you must apply effective methods coupled with modern tools
within the context of a mature software process.
Measuring Quality:
Correctness
The most common measure for correctness is defects per KLOC, where a defect is
defined as a verified lack of conformance to requirements.
Maintainability
Software maintenance and support accounts for more effort than any other
software engineering activity.
Maintainability is the ease with which a program can be corrected if an
error is encountered, adapted if its environment changes, or enhanced if the
customer desires a change in requirements. There is no way to measure
maintainability directly; therefore, you must use indirect measures.
A simple time-oriented metric is mean-time-to-change (MTTC), the time it
takes to analyze the change request, design an appropriate modification,
implement the change, test it, and distribute the change to all users. On average,
programs that are maintainable will have a lower MTTC (for equivalent types of
changes) than programs that are not maintainable.
Integrity
Software integrity measures a system's ability to withstand attacks (both accidental
and intentional) on its security. To measure integrity, two additional attributes are
defined: threat, the probability that an attack of a specific type will occur within a given
time, and security, the probability that an attack of a specific type will be repelled. The
integrity of a system can then be computed as integrity = sum of [(1 - threat) x
(1 - security)], taken over each type of attack.
Usability
If a program is not easy to use, it is often doomed to failure even if the functions it
performs are valuable. Usability is an attempt to quantify ease of use.
A quality metric that provides benefit at both the project level and the process level is
defect removal efficiency (DRE). When considered for a project as a whole, DRE is
defined in the following manner:
DRE = E / (E + D)
where E is the number of errors found before delivery of the software to the
end user and D is the number of defects found after delivery.
The ideal value for DRE is 1. That is, no defects are found in the software.
Realistically, D will be greater than 0, but the value of DRE can still approach 1 as
E increases for a given value of D.
In fact, as E increases, it is likely that the final value of D will decrease (errors are
filtered out before they become defects).
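As an illustration with assumed numbers: if 90 errors are found before delivery (E = 90) and 10 defects are reported after delivery (D = 10), then DRE = 90 / (90 + 10) = 0.9, meaning 90 percent of all defects were removed before release; raising E for the same D pushes DRE closer to 1.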