Chap. 3 Software Quality Metrics
-Software measurement is concerned with deriving a numeric value for some attribute of a software product or a software process.
Comparing these values to each other and to relevant standards allows drawing conclusions about the quality of the software product or the effectiveness of the software processes. Examples of software metrics: program size, defect count for a given program, head count required to deliver a system component, development effort and time, product cost, etc.
-These give rise to three broad classes of software metrics: project (i.e., resource) metrics, product metrics, and process metrics.
Process metrics
-Measure the efficacy of processes
-Focus on the quality achieved as a consequence of a repeatable or managed process
-Reuse data from historical/past projects for prediction
-Examples (discussed later in this chapter): defect arrival pattern during testing, defect removal effectiveness
Product metrics
-Measure predefined product attributes
-Focus on the quality of deliverables
-Examples (discussed later in this chapter): mean time to failure (MTTF), defect density, customer-reported problems, customer satisfaction
-Software quality metrics can be further divided into end-product quality metrics and in-process quality metrics.
The ultimate goal of software quality engineering is to investigate the relationships among in-process metrics, project characteristics, and end-product quality, and based on these findings to engineer improvements in both process and product quality.
-The two end-product quality metrics discussed below, mean time to failure (MTTF) and defect density, are correlated but differ in both purpose and usage.
MTTF:
-Most often used with special-purpose systems such as safety-critical systems
-Measures the time between failures
Defect density:
-Mostly used in general-purpose systems or commercial software
-Measures defects relative to the software size (e.g., lines of code, function points)
Size may be estimated:
-Directly, using size-oriented metrics: lines of code (LOC)
-Indirectly, using function-oriented metrics: function points (FP)
Differences between physical lines and instruction statements (or logical lines of code) and differences among languages contribute to the huge variations in counting LOCs. Possible approaches used include:
-Count only executable lines
-Count executable lines plus data definitions
-Count executable lines, data definitions, and comments
-Count executable lines, data definitions, comments, and job control language
-Count lines as physical lines on an input screen
-Count lines as terminated by logical delimiters
-In any case, when any data on size of program products and their quality are presented, the method for LOC counting should be described: whether it is based on physical or logical LOC.
When a straight LOC count is used, size and defect rate comparisons across different programming languages may not be valid.
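To illustrate how the counting convention changes the result, the following is a minimal sketch (not any particular organization's counting rule) that contrasts a physical-line count with a simplified logical-line count on the same source text; the classification heuristics are assumptions for demonstration only.

```python
# Minimal sketch: contrast two LOC counting conventions on the same file.
# The classification heuristics below are simplified assumptions, not a
# standard counting rule.

def count_loc(source_text):
    physical = 0   # every non-blank physical line (including comments)
    logical = 0    # executable-style lines only, excluding comment-only lines
    for line in source_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                  # skip blank lines
        physical += 1
        if stripped.startswith("#"):
            continue                  # comment-only line: not counted as logical
        logical += 1                  # crude approximation of a logical statement
    return physical, logical

sample = """# compute area
def area(w, h):
    # width times height
    return w * h
"""
print(count_loc(sample))   # the two conventions give different counts: (4, 2)
```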
Defect Density Metrics
-At IBM, the following LOC count metrics, based on logical LOC, are used:
-Shipped Source Instructions (SSI): LOC count for the total product
-Changed Source Instructions (CSI): LOC count for the new and changed code of the new release
-Defects after the release of the product are tracked. Defects can be field defects (found by customers), or internal defects (found internally).
-Postrelease defect rate metrics can be computed per thousand SSI (KSSI) or per thousand CSI (KCSI):
1. Total defects per KSSI: a measure of the code quality of the total product
2. Field defects per KSSI: a measure of the defect rate in the field
3. Release-origin defects (field and internal) per KCSI: a measure of development quality
4. Release-origin field defects per KCSI: a measure of development quality based on defects found by customers
Metrics (1) and (3) are the same for the initial release, where the entire product is new; thereafter, metric (1) is affected by aging and by the improvement (or deterioration) of metric (3).

Metrics (1) and (3) are process measures; their field counterparts, metrics (2) and (4), represent the customers' perspective. Given an estimated defect rate (per KCSI or KSSI), software developers can minimize the impact on customers by finding and fixing the defects before customers encounter them.

From the customers' point of view, the defect rate is not as relevant as the total number of defects that might affect their business. Therefore, a good defect rate target should lead to a release-to-release reduction in the total number of defects, regardless of size.
a. Considering that the quality goal of the company is to achieve a 10% improvement in the overall defect rate from release to release, calculate the total number of additional defects for the third release.
b. What should be the maximum (overall) defect rate target for the third release to ensure that the number of new defects does not exceed that of the second release?
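A small sketch of the arithmetic behind such an exercise is shown below; the exercise's own data set is not reproduced here, so all release sizes and defect counts are hypothetical, chosen only to illustrate how a 10% release-to-release defect rate improvement target interacts with product growth.

```python
# Hypothetical illustration of the KSSI defect rate metrics and a 10%
# release-to-release defect rate improvement target. All numbers are made up.

def defect_rate(defects, kloc):
    """Defects per thousand lines of code (KLOC = KSSI or KCSI)."""
    return defects / kloc

# Release 2 (hypothetical): 100 KSSI shipped, 200 total defects, 60 field defects
release2 = {"kssi": 100.0, "total_defects": 200, "field_defects": 60}

rate_total_kssi = defect_rate(release2["total_defects"], release2["kssi"])   # metric (1)
rate_field_kssi = defect_rate(release2["field_defects"], release2["kssi"])   # metric (2)

# Target for release 3: 10% lower overall defect rate than release 2
target_rate_r3 = 0.9 * rate_total_kssi

# Suppose release 3 ships 110 KSSI (hypothetical growth): the total number of
# defects implied by the target is the rate times the new size.
release3_kssi = 110.0
expected_defects_r3 = target_rate_r3 * release3_kssi

print(f"Release 2 overall rate: {rate_total_kssi:.2f} defects/KSSI")
print(f"Release 3 target rate:  {target_rate_r3:.2f} defects/KSSI")
print(f"Release 3 expected defects at target: {expected_defects_r3:.0f}")
# Because the product grew, a 10% lower *rate* can still mean almost as many
# (or more) total defects -- which is why a good target is framed in terms of
# reducing the total number of defects.
```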
-The problems metric is usually expressed in terms of problems per user month (PUM):
PUM = Total problems that customers reported (true defects and non-defect-oriented problems) for a time period / Total number of license-months of the software during the period
Where:
Number of license-months = Number of installed licenses of the software x Number of months in the calculation period
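A minimal sketch of the PUM calculation, using made-up figures:

```python
# Hypothetical PUM (problems per user-month) calculation.
reported_problems = 150        # true defects + non-defect-oriented problems in the period
installed_licenses = 2_000     # number of installed licenses of the software
months_in_period = 3           # length of the calculation period, in months

license_months = installed_licenses * months_in_period
pum = reported_problems / license_months
print(f"PUM = {pum:.4f} problems per license-month")   # 150 / 6000 = 0.025
```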
-The parameters used, for instance, by IBM to monitor customer satisfaction include the CUPRIMDSO categories:
capability (functionality), usability, performance, reliability, installability, maintainability, documentation/information, service, and overall.
-Based on the five-point scale, various metrics may be computed. For example:
(1) Percent of completely satisfied customers
(2) Percent of satisfied customers: satisfied and completely satisfied
(3) Percent of dissatisfied customers: dissatisfied and completely dissatisfied
(4) Percent of nonsatisfied customers: neutral, dissatisfied, and completely dissatisfied
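For illustration only, a sketch that computes these four percentages from hypothetical five-point survey responses:

```python
# Hypothetical survey responses on the five-point satisfaction scale.
from collections import Counter

responses = ["completely satisfied"] * 40 + ["satisfied"] * 30 + \
            ["neutral"] * 10 + ["dissatisfied"] * 15 + ["completely dissatisfied"] * 5
counts = Counter(responses)
n = len(responses)

def pct(*levels):
    """Percentage of respondents falling into any of the given levels."""
    return 100.0 * sum(counts[level] for level in levels) / n

print("(1) completely satisfied:", pct("completely satisfied"))
print("(2) satisfied:", pct("completely satisfied", "satisfied"))
print("(3) dissatisfied:", pct("dissatisfied", "completely dissatisfied"))
print("(4) nonsatisfied:", pct("neutral", "dissatisfied", "completely dissatisfied"))
```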
Methods of Survey Data Collection
-Various methods can be used to gather customer survey data, including face-to-face interviews, phone interviews, and mailed questionnaires.
-When the customer base is large, it is too costly to survey all customers; estimating the satisfaction level through a representative sample is more efficient.
-In general, probability sampling methods such as simple random sampling are used to obtain representative samples.
Sample Size
-For each probability sampling method, specific formulas are available to calculate the sample size, some of which are quite complicated.
-For simple random sampling, the sample size n required to estimate a population proportion (e.g., percent satisfied) is given by:

n = N Z^2 p (1 - p) / (N B^2 + Z^2 p (1 - p))

where N is the population size, Z is the standard normal value for the desired confidence level, p is the estimated proportion, and B is the margin of error.
Example: For a software product with 10,000 customers and an expected satisfaction level of 90%, estimate the sample size required at the chosen level of confidence with a 5% margin of error.
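A sketch of the sample-size calculation follows. The Z value used assumes a 90% confidence level (Z of about 1.645); this is an assumption, since the confidence level is not stated in the example above.

```python
# Simple random sampling sample-size estimate for a finite population.
# The confidence level (and hence Z) is an assumption here.
N = 10_000    # customer population size
p = 0.90      # expected satisfaction level (estimated proportion)
B = 0.05      # margin of error
Z = 1.645     # standard normal value for a 90% confidence level (assumed)

n = (N * Z**2 * p * (1 - p)) / (N * B**2 + Z**2 * p * (1 - p))
print(f"Required sample size: {n:.0f}")   # roughly 96-97 customers under these assumptions
```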
-Hence, the simple metric of defects per KLOC or per function point is a good indicator of quality while the software is still being tested.
-Overall, defect density during testing is a summary indicator; more information is actually given by the pattern of defect arrivals (which is based on the times between failures).
Even with the same overall defect rate during testing, different patterns of defect arrivals indicate different quality levels in the field.
(Figure: weekly defect arrival patterns for two test scenarios with the same overall defect rate during testing)
-The objective is to look for defect arrivals that stabilize at a very low level, or times between failures that are far apart, before stopping the testing effort and releasing the software to the field.
-The time unit for observing the arrival pattern is usually weeks and occasionally months.
-For models that require execution-time data, the time intervals are in units of CPU time.
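To make the idea concrete, here is a small sketch (with made-up defect discovery dates) that groups defect arrivals into weekly counts, the form in which the arrival pattern is usually examined:

```python
# Hypothetical sketch: bucket defect discovery dates into weekly arrival counts.
from collections import Counter
from datetime import date

test_start = date(2024, 1, 1)                          # assumed start of the test phase
defect_dates = [date(2024, 1, 3), date(2024, 1, 9), date(2024, 1, 10),
                date(2024, 1, 22), date(2024, 2, 5)]   # made-up discovery dates

weekly_arrivals = Counter((d - test_start).days // 7 + 1 for d in defect_dates)
for week in sorted(weekly_arrivals):
    print(f"week {week}: {weekly_arrivals[week]} defect(s)")
# A pattern that stabilizes at a very low level late in testing suggests
# better field quality than one still rising at release time.
```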
Defect Removal Effectiveness
Defect Injection and Removal
-Defects are injected into the software product or intermediate deliverables of the product at various phases.
Development Phase     | Defect Injection                                      | Defect Removal
Requirements          | Requirements gathering and functional specifications  | Requirements analysis and review
High-level design     | Design work                                           | High-level design inspections
Low-level design      | Design work                                           | Low-level design inspections
Code implementation   | Coding                                                | Code inspections
Integration/build     | Integration and build process                         | Build verification testing
Unit test             | Bad fixes                                             | Testing itself
Component test        | Bad fixes                                             | Testing itself
System test           | Bad fixes                                             | Testing itself
Defect Removal Effectiveness (DRE) Metrics -Measure the overall defect removal ability of the development process.
-When used for the front end, it is referred to as early defect removal
-When used for a specific phase, it is referred to as phase effectiveness
It can be calculated for the entire development process, for the front end (before code integration), or for each phase. The higher the value of the metric, the more effective the development process and the fewer the defects that escape to the next phase or to the field.
-Defined as follows:
DRE = (Defects removed during a development phase / Defects latent in the product at that phase) x 100%
-The number of defects latent in the product at any given phase is usually obtained through estimation as:
Defects latent at a phase = Defects removed during the phase + Defects found later
(Figure: phase defect removal effectiveness across the development phases, including high-level design review (DR), low-level design review (LR), and code inspection (CI))
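A hedged sketch of the phase effectiveness calculation, using made-up defect counts for one phase:

```python
# Hypothetical phase defect removal effectiveness (DRE) calculation.
# "Found later" means defects that escaped the phase and were detected in
# subsequent phases or in the field; all counts here are illustrative.

def phase_dre(removed_in_phase, found_later):
    latent = removed_in_phase + found_later   # estimated defects latent at the phase
    return 100.0 * removed_in_phase / latent

# e.g., code inspection removed 120 defects; 30 defects present at that point
# were only found later (in testing or in the field)
print(f"Code inspection DRE: {phase_dre(120, 30):.1f}%")   # 80.0%
```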
4. Effort/Outcome Model
-One major problem with the defect removal model is that the interpretation of the rates could be flawed: when tracking the defect removal rates against the model, a lower actual defect removal rate could be the result of either lower error injection or poor reviews and inspections; conversely, a higher actual defect removal rate could be the result of either higher error injection or better reviews and inspections.
-To solve this problem, additional indicators must be incorporated into the context of the model for better interpretation of the data. -One such additional indicator is the quality of process execution.
For instance, the metric of inspection effort (operationalized as the number of hours the team spent on design and code inspections, normalized per thousand lines of code inspected) is used as an indicator of how rigorously the inspection process is executed. This metric, combined with the defect rate, can provide a useful interpretation of the defect model.
The matrix is formed by combining the scenarios of an effort indicator and an outcome indicator. The model can be applied to any phase of the development process with any pair of meaningful indicators. The high-low comparisons are between actual data and the model, or between the current and previous releases of the product.
Effort/Outcome matrix (effort = inspection effort; outcome = defect rate found by inspections):

                 | Higher defect rate (Outcome)  | Lower defect rate (Outcome)
Higher effort    | Cell 1: Good/Not Bad          | Cell 2: Best Case
Lower effort     | Cell 3: Worst Case            | Cell 4: Unsure
-Best-case scenario (Cell 2): the design/code was cleaner before inspections, and yet the team spent enough effort in design review/code inspection that good quality was ensured.
-Good/not-bad scenario (Cell 1): error injection may be higher, but the higher effort spent is a positive sign, and that may be why more defects were removed.
-Worst-case scenario (Cell 3): high error injection, but inspections were not rigorous enough; chances are that more defects remained in the design or code at the end of the inspection process.
-Unsure scenario (Cell 4): one cannot ascertain whether the design and code were better and therefore less inspection time was needed, or whether inspections were hastily done, so fewer defects were found.
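For illustration, a minimal sketch of the cell classification, comparing an actual release's inspection effort and defect rate against a baseline (the model or the previous release); the simple threshold logic is an assumption for demonstration:

```python
# Hypothetical effort/outcome matrix classification.
# "Effort" = inspection hours per KLOC inspected; "outcome" = defects found
# per KLOC inspected. Baselines could come from the model or the prior release.

def classify(effort, defect_rate, baseline_effort, baseline_defect_rate):
    higher_effort = effort >= baseline_effort
    higher_defects = defect_rate >= baseline_defect_rate
    if higher_effort and higher_defects:
        return "Cell 1: Good/Not Bad (higher injection, but rigorous inspections)"
    if higher_effort and not higher_defects:
        return "Cell 2: Best Case (cleaner design/code and rigorous inspections)"
    if not higher_effort and higher_defects:
        return "Cell 3: Worst Case (high injection, inspections not rigorous enough)"
    return "Cell 4: Unsure (cleaner work, or hasty inspections -- cannot tell)"

# Example with made-up numbers: 40 h/KLOC and 12 defects/KLOC vs. a baseline
# of 50 h/KLOC and 10 defects/KLOC
print(classify(effort=40, defect_rate=12, baseline_effort=50, baseline_defect_rate=10))
# -> Cell 3: Worst Case ...
```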