Engineering "Just Right" Reliability

(Software Reliability)
(SE-308)

By Priya Singh (Assistant Professor, Dept of SE, DTU)


1. Background
Importance of quantifying reliability-
• Defining what we mean by "necessary" reliability for a product in quantitative terms is one of the
key steps in achieving the benefits of software reliability engineering.
• The quantitative definition of reliability makes it possible for us to balance customer needs for
reliability, delivery date, and cost precisely and to develop and test the product more efficiently.
• A failure is a departure of system behavior in execution from user needs; it is a user-oriented
concept.
• A fault is the defect that causes, or can potentially cause, the failure when executed; it is a
developer-oriented concept.
• A fault doesn't necessarily result in a failure, but a failure can only occur if a fault exists. To
resolve a failure you must find the fault.
• Some failures have more impact than others. For this reason, projects typically assign failure
severity classes to differentiate them from each other.
• A failure severity class is a set of failures that have the same per failure impact on users.



• Extensive experience with software-based products has shown that it is often more
convenient to express failure intensity as failures per natural unit. A natural unit is a unit
that is related to the output of a software-based product and hence to the amount of
processing done, e.g., pages of output (1 failure/K pages printed), transactions such as
reservations, sales, or deposits (1 failure/K transactions), and telephone calls (1 failure/K
calls).
• Users prefer natural units because they express reliability in terms that are oriented
toward and important to their business.
• The measurement of natural units is often easier to implement than that of execution time,
especially for distributed systems, since otherwise we must deal with a complex of
execution times spread across many processors.
• We have been talking about failure intensity as an alternative way of expressing
reliability. Thus, the units we choose for expressing failure intensity are used for
expressing reliability as well. For example, if we speak of a failure intensity of 5 failures /
1000 printed pages, we will express the reliability in terms of some specified number of
printed pages.
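As a sketch of how the two views relate, assume the failure intensity is constant (the program is not being changed, so no reliability growth is occurring). Then the reliability over n natural units is exp(-λn), where λ is the failure intensity in failures per natural unit. The function names below are illustrative only.

```python
import math


def failure_intensity(failures: float, natural_units: float) -> float:
    """Failure intensity expressed as failures per natural unit."""
    return failures / natural_units


def reliability(lam: float, natural_units: float) -> float:
    """Probability of failure-free operation over `natural_units`,
    assuming a constant failure intensity `lam` (failures per natural unit)."""
    return math.exp(-lam * natural_units)


# The example above: 5 failures per 1000 printed pages.
lam = failure_intensity(5, 1000)  # failures per page
r_100 = reliability(lam, 100)     # reliability for a 100-page print run
print(f"lambda = {lam} failures/page, R(100 pages) = {r_100:.4f}")
```

With 5 failures per 1000 pages, the chance of printing 100 pages without a failure is exp(-0.5), or about 0.61.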



2. Steps in engineering "just right" reliability
1. Define what you mean by "failure"
2. Choose a common reference unit for all failure intensities
3. Set a system failure intensity objective for each associated system
4. For any software you develop:
a. Find the developed software failure intensity objective
b. Choose software reliability strategies to optimally meet the developed software
failure intensity objective.

• Suppliers who are system integrators only need Steps 1, 2, and 3. They just acquire and assemble
components and do no software development.
• Although system engineers and system architects have traditionally performed all the foregoing
activities, involving testers in them provides a strong basis for a better testing effort.
• Similarly, you should involve users in failure definition and in setting system failure intensity
objectives.



2.1 Defining "failure" for the product
• Defining failures implies establishing negative requirements on program behavior, as desired by
users.
• This sharpens the definition of the function of the system by providing the perspective of what the
system should not be doing.
• Traditionally we specify only positive requirements for systems, but negative requirements are
important because they amplify and clarify the positive requirements.
• They indicate the product behaviors that the customer cannot accept.
• Even if negative requirements are not complete, they are still valuable.
• The degree of completeness is typically greater for legacy systems, and it increases with the age of
the product as you gain more experience.
• Always remember that you must focus on your users' definition of failure. You should make sure
that the definition is consistent over the life of a product release.
• You also define failure severity classes with examples at this point for later use in prioritizing
failure resolution (and hence fault removal), but this is not part of the core software reliability
engineering process.



2.2 Choosing a common measure for all associated systems
• When you choose a common reference unit for all failure intensities, failures per natural unit are
normally preferable.
• This measure expresses the frequency of failure in user terms.
• Thus, using natural units improves communication with your users.
• In some cases, a product may have multiple natural units, each related to an important (from the
viewpoint of use and criticality) set of operations of the product. If your product has this situation,
select one natural unit as a reference and convert the others to it. If this is not possible, the best
solution in theory would be to use failures per unit execution time as the common measure.
• To choose among alternate natural units, consider (in order):
1. Average (over a typical failure interval) amount of processing for the natural unit (which can be
viewed as the number of instructions executed, summed over all machines in the case of
distributed processing) should be reasonably constant.
2. The natural unit should be meaningful to users
3. The natural unit should be easy to measure
• It is desirable but not essential that a natural unit represent the execution of an operation.
"Execution of an operation" means that it runs through to completion.
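When a product has several natural units, one way to convert them to a reference unit is to scale by the relative amount of processing each unit represents; this is one reason criterion 1 above asks for reasonably constant processing per natural unit. The sketch assumes failures are proportional to processing, and the unit names and processing figures are hypothetical.

```python
# Hypothetical average processing per natural unit (for example, millions of
# instructions executed, summed over all machines). Illustrative values only.
PROCESSING_PER_UNIT = {
    "transaction": 2.0,
    "query": 0.5,
}


def convert_failure_intensity(lam: float, from_unit: str, to_unit: str) -> float:
    """Convert failures per `from_unit` into failures per `to_unit`.

    Assumes failures occur in proportion to the processing performed, so one
    `to_unit` carries (processing of to_unit / processing of from_unit) times
    the exposure of one `from_unit`.
    """
    ratio = PROCESSING_PER_UNIT[to_unit] / PROCESSING_PER_UNIT[from_unit]
    return lam * ratio


# 1 failure per 1000 queries, restated per transaction (the reference unit).
lam_tx = convert_failure_intensity(1 / 1000, "query", "transaction")
print(f"{lam_tx:.4f} failures per transaction")
```

If the processing per unit varies widely, a single ratio like this is misleading, which is when the text suggests falling back to failures per unit execution time.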



2.3 Setting system failure intensity objectives
• Our next step is to set the system failure intensity objective for
each associated system.
• How we do this will depend on whether your product has
supersystems or not.
• If supersystems exist, then you follow the left-hand path in the
figure.
1. You choose the failure intensity objectives of the
supersystems and from them determine the failure intensity
objectives of the base product and its variations.
2. If you have no supersystems, then you follow the right-hand
path in the figure. You directly choose the failure intensity
objective of each standalone base product or variation.
[Figure: left-hand path "Choose failure intensity objectives of supersystem"; right-hand path "Choose failure intensity objective of each BP (base product) and variation".]
• In either case, the way in which you choose the failure
intensity objectives is the same.
• The only difference when supersystems exist is that you derive
the failure intensity objectives for the base product and its
variations from those for the related supersystems.



• Choosing a system failure intensity objective requires three steps:
1. Determine whether your users need reliability or availability or both.
2. Determine the overall (for all operations) reliability and/or availability objectives.
3. Find the common failure intensity objective for the reliability and availability
objectives.
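The three steps can be sketched as follows, under stated assumptions: a constant failure intensity, so a reliability objective R over a period t converts to a failure intensity of -ln(R)/t; the availability relation A = 1/(1 + λ·MTTR), which follows from the availability definition given later in this section; and, for step 3, taking the smaller (more stringent) of the two failure intensities so that both objectives are met. All objective values and function names are invented for illustration.

```python
import math
from typing import Optional, Tuple


def fio_from_reliability(r: float, period: float) -> float:
    """Failure intensity giving reliability r over `period`, assuming a
    constant failure intensity so that R = exp(-lambda * period)."""
    return -math.log(r) / period


def fio_from_availability(a: float, mttr: float) -> float:
    """Failure intensity giving availability a, from A = 1 / (1 + lambda * MTTR)."""
    return (1.0 - a) / (a * mttr)


def common_fio(reliability: Optional[Tuple[float, float]] = None,
               availability: Optional[Tuple[float, float]] = None) -> float:
    """Step 3: the objective that satisfies both needs is the smaller one."""
    candidates = []
    if reliability is not None:       # (R, period)
        candidates.append(fio_from_reliability(*reliability))
    if availability is not None:      # (A, MTTR)
        candidates.append(fio_from_availability(*availability))
    if not candidates:
        raise ValueError("need a reliability and/or availability objective")
    return min(candidates)


# Illustrative objectives, all times in hours:
# R = 0.99 over an 8-hour shift, and A = 0.9999 with an MTTR of 0.1 hour.
lam = common_fio(reliability=(0.99, 8.0), availability=(0.9999, 0.1))
print(f"common failure intensity objective: {lam:.6f} failures/hour")
```

Here the availability objective turns out to be the binding one; with a longer MTTR or a stricter reliability target, the reliability objective could govern instead.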



Reliability and Availability
• Reliability-
i. The probability that a system or a capability of a system will continue to function
without failure for a specified period in a specified environment.
ii. "Failure" means the program in its functioning has not met user requirements in some
way. "Not functioning to meet user requirements" is really a very broad definition.
iii. Thus, reliability incorporates many of the properties that can be associated with the
execution of the program. E.g. it includes correctness, safety, and the operational
aspects of usability and user-friendliness.
iv. Note that safety is actually a specialized subcategory of software reliability.
v. Reliability does not include portability, modifiability, or understandability of
documentation.



Natural Unit-
• A unit other than time that is related to the amount of processing performed by a
software-based product.
• E.g. runs, pages of output, transactions, telephone calls, jobs, semiconductor wafers,
queries, or API calls.
• Other possible natural units include database accesses, active user hours, and packets.
Customers generally prefer natural units.
• Failure intensity, an alternative way of expressing reliability, is stated in failures per
natural or time unit.



Availability-
i. It is the average (over time) probability that a system or a capability of a system is currently
functional in a specified environment.
ii. We usually define software availability as the expected fraction of operating time during which
a software component or system is functioning acceptably.
iii. Availability depends on the probability of failure and the length of downtime when a failure
occurs.
iv. Assume that the program is operational and that we are not modifying it with new features or
repairs. Then it has a constant failure intensity and constant availability. We can compute
availability for software as we do for hardware.
v. It is the ratio of uptime to the sum of uptime and downtime, as the time interval over which the
measurement is made approaches infinity. The downtime for a given interval is the product of
the length of the interval, the failure intensity, and the mean time to repair (MTTR).
vi. Since we assume the program is not being repaired while in operation, we ordinarily determine
MTTR for software as the average time required to restore the program's data, reload the program,
and resume execution.
vii. If we wish to determine the availability of a system containing both hardware and software
components, we find the MTTR as the maximum of the hardware repair and software
restoration times.
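The ratio in item v reduces to a closed form: if T is the operating time (uptime), downtime is λ·T·MTTR, so A = 1/(1 + λ·MTTR). A minimal sketch, assuming a constant failure intensity λ and an MTTR in the same time unit; the function names and numbers are illustrative only.

```python
def availability(lam: float, mttr: float) -> float:
    """Steady-state availability = uptime / (uptime + downtime).

    Over an operating interval T there are lam * T failures, each costing
    `mttr` of downtime, so A = T / (T + lam * T * mttr) = 1 / (1 + lam * mttr).
    `lam` and `mttr` must use the same time unit.
    """
    return 1.0 / (1.0 + lam * mttr)


def system_mttr(hw_repair_time: float, sw_restoration_time: float) -> float:
    """MTTR for a system with hardware and software components, taken (as in
    the text) as the larger of hardware repair and software restoration times."""
    return max(hw_repair_time, sw_restoration_time)


# Illustrative values: 0.001 failures/hour, hardware repair 2 h,
# software restoration (restore data, reload, resume) 0.25 h.
t_m = system_mttr(2.0, 0.25)
a = availability(0.001, t_m)
print(f"MTTR = {t_m} h, availability = {a:.6f}")
```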



2.4 Determining developed software failure intensity objectives
• If you are developing any software for the product or its variations, then in each case you
will need to set the developed software failure intensity objective.
• You need the developed software failure intensity objectives so you can choose the
software reliability strategies you will need to use and so that you can track reliability
growth during system tests with the failure intensity to failure intensity objective ratio.
• The software you are developing may be all or a component of the base product or its
variations.
• Note that suppliers who simply integrate software components will not need developed
software failure intensity objectives unless the control program that links the components
is sufficiently large that we should consider it in itself as developed software.
• You first find the expected acquired failure intensity and then compute the developed
software failure intensity objective for the base product and each variation.



• To find the expected acquired failure intensity, you must find the failure intensity for each
acquired component of the product and its variations.
• The acquired components include the hardware and the acquired software components.
• The estimates should be based (in order of preference) on:
1. Operational data
2. Vendor warranty or specification by your customer (if your customer is purchasing
these components separately and they are not under your contractual control)
3. Experience of experts
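This computation is commonly done by subtracting the expected acquired failure intensity (the sum of the acquired components' failure intensities) from the system failure intensity objective; what remains is the budget for the software you develop. The sketch assumes any component failure is a system failure, so component failure intensities add, and that all values share one unit. The component names, values, and function names are hypothetical.

```python
from typing import List, Optional


def best_estimate(operational: Optional[float] = None,
                  vendor_or_customer: Optional[float] = None,
                  expert: Optional[float] = None) -> float:
    """Return a component failure intensity using the first available source,
    in the order of preference listed above."""
    for value in (operational, vendor_or_customer, expert):
        if value is not None:
            return value
    raise ValueError("no failure intensity estimate available")


def developed_software_fio(system_fio: float, acquired: List[float]) -> float:
    """Developed software FIO = system FIO - expected acquired failure intensity.

    Assumes any component failure is a system failure, so the failure
    intensities of the components simply add.
    """
    expected_acquired = sum(acquired)
    if expected_acquired >= system_fio:
        raise ValueError("acquired components alone exceed the system objective")
    return system_fio - expected_acquired


# Illustrative values in failures per million transactions.
acquired = [
    best_estimate(operational=10.0),         # hardware: field data
    best_estimate(vendor_or_customer=20.0),  # acquired OS: vendor figure
    best_estimate(expert=15.0),              # acquired database: expert estimate
]
dev_fio = developed_software_fio(100.0, acquired)
print(f"developed software FIO: {dev_fio} failures per million transactions")
```

If the acquired components already consume the whole system objective, no developed software objective is feasible, which is why the sketch raises an error rather than returning a negative value.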



3. Engineering software reliability strategies
• Once you have set developed software failure intensity objectives for the product and its variations,
then in each case engineer the right balance among software reliability strategies so as to meet the
developed software failure intensity and schedule objectives with the lowest development cost.
• The quality and success of this engineering effort depend on the data you collect about your
software engineering process, fine-tuned with feedback from your results.
• A software reliability strategy is a development activity that reduces failure intensity, incurring
development cost and perhaps development time in doing so.
• Since the objectives for the product and its variations are often the same, the strategies will often
be the same.
• You usually choose the reliability strategies when you plan the first release of a product. Fault-
tolerant features are generally designed and implemented at that time and then retained through all
the subsequent releases. Hence reliability strategies must also be chosen at that time.
• We plan the software reliability strategies for a new release of a product in its requirements phase.



• A software reliability strategy may be selectable (requirements, design, or code reviews) or
controllable (amount of system test, amount of fault tolerance).
• A selectable software reliability strategy is determined in a binary fashion: you either employ it or
you don't.
• You can apply a "measured amount" of a controllable strategy.
• It is not at present practical to specify the "amount" of review that you undertake; we have no reliable
way to quantify the amount and relate it to the amount of failure intensity reduction that you will
obtain.
• However, the foregoing does not prevent you from maximizing the efficiency of the review by
allocating it among operations by use of the operational profile.
• In choosing software reliability strategies, we only make decisions that are reasonably optional (it is
not optional to skip written requirements). These optional decisions currently are:
i. use of requirements reviews,
ii. use of design reviews,
iii. use of code reviews,
iv. degree of fault tolerance designed into the system, and
v. amount of system test.



• We only consider the basic failure intensity for the new operations because we assume
that the failure intensity objective has already been achieved for the previous release.
• If the failure intensity objective changes between releases, the choice process for software
reliability strategies as presented here gives only an approximation to the optimal choice.
• However, it is probably impractical to make the adjustments required for optimality.
• You will end up with more system testing required if the failure intensity objective is
lower; less, if higher.



• The prediction of basic failure intensity is possible because the basic failure intensity
depends on parameters we know or can determine:
1. Fault exposure ratio (the probability that a pass through the program will cause a fault
in the program to produce a failure)
2. Fault density per source instruction at the start of the system test
3. Fraction of developed software code that is new
4. Average throughput (object instructions executed per unit execution time)
5. Ratio of object to source instructions
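Combining the five parameters gives a closed form. A minimal sketch, assuming faults exist only in the new code and that the failure intensity equals the fault exposure ratio times the number of passes through the program per unit execution time times the number of faults present; program size then cancels out. The parameter values are purely illustrative, not measured data.

```python
def basic_failure_intensity(K: float, fault_density: float, new_fraction: float,
                            throughput: float, object_per_source: float) -> float:
    """Basic failure intensity, in failures per unit execution time.

    Derivation sketch for a program of I_s source instructions:
      object instructions     I_o = object_per_source * I_s
      passes per unit time    f   = throughput / I_o
      faults at system test   w0  = fault_density * new_fraction * I_s
      lambda0 = K * f * w0
              = K * fault_density * new_fraction * throughput / object_per_source
    I_s cancels, so program size is not needed.
    """
    return K * fault_density * new_fraction * throughput / object_per_source


lam0 = basic_failure_intensity(
    K=4e-7,               # fault exposure ratio
    fault_density=0.005,  # faults per source instruction at start of system test
    new_fraction=0.5,     # fraction of developed code that is new
    throughput=1e8,       # object instructions executed per CPU second
    object_per_source=4,  # object instructions per source instruction
)
print(f"basic failure intensity: {lam0:.3f} failures per CPU second")
```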



Procedure for choosing strategies-
• The procedure for choosing software reliability strategies is to first determine the required
failure intensity reduction objective and then to allocate the failure intensity reduction
objective among the software reliability strategies available with the present technology.

• To determine the failure intensity reduction objective:


1. Express the developed software failure intensity objective in execution time.
2. Compute the basic failure intensity.
3. Compute the failure intensity reduction objective.
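Step 3 is a ratio: the failure intensity reduction objective is the basic failure intensity divided by the developed software failure intensity objective, both in execution time. The sketch also shows one way to allocate it, assuming the reduction factors of individual strategies multiply, so the selectable reviews are accounted for first and whatever reduction is left falls to the controllable strategies. The review factors and function names are hypothetical placeholders, not values from the text.

```python
import math
from typing import Dict


def firo(basic_fi: float, developed_fio_exec: float) -> float:
    """Failure intensity reduction objective: the factor by which the basic
    failure intensity must be reduced. Both arguments in failures per unit
    execution time."""
    return basic_fi / developed_fio_exec


def remaining_reduction(required: float, review_factors: Dict[str, float]) -> float:
    """Reduction still needed from the controllable strategies (system test,
    fault tolerance) after the selected reviews, assuming reduction factors
    multiply. Returns 1.0 if the reviews alone already meet the objective."""
    achieved = math.prod(review_factors.values())
    return max(required / achieved, 1.0)


# Illustrative inputs: basic FI 0.025 failures/CPU s, objective 1e-5 failures/CPU s.
required = firo(0.025, 1e-5)
# Hypothetical reduction factors for the selectable strategies.
reviews = {"requirements": 2.0, "design": 2.0, "code": 2.5}
left = remaining_reduction(required, reviews)
print(f"FIRO = {required:.0f}, left for system test and fault tolerance = {left:.0f}")
```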



• Usually you can obtain the execution time per natural or operating time unit from the performance
analysis for the system.
• Divide the developed software failure intensity objective in natural or operating time units
by the execution time per natural or operating time unit.
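The division in the last bullet is a unit conversion: failures per natural unit divided by execution time per natural unit gives failures per unit execution time. A sketch with illustrative numbers; the execution time per transaction is an assumed value of the kind performance analysis would supply, and the function name is hypothetical.

```python
def fio_in_execution_time(fio_per_unit: float, exec_time_per_unit: float) -> float:
    """Convert a failure intensity objective from failures per natural (or
    operating time) unit into failures per unit execution time."""
    return fio_per_unit / exec_time_per_unit


# Illustrative: objective of 1 failure per 100,000 transactions, with an
# assumed average of 0.5 CPU seconds of execution time per transaction.
lam_f = fio_in_execution_time(1 / 100_000, 0.5)
print(f"{lam_f:.1e} failures per CPU second")
```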

