Unit 1: Software Engineering Fundamentals
Learning Objectives
After reading this unit, you should appreciate the following:
Software
Software Engineering Concepts
Software Characteristics
Software Applications
Software Metrics
Size Metrics
Halstead’s Theory
Function Point Analysis
Software
Before 1970, very few people knew what "computer software" meant. Today, professionals and many members of the public at large understand what software means.
A description of software might take the following form. Software is
(1) instructions (computer programs) that, when executed, provide the desired function and performance,
(2) data structures that enable the programs to adequately manipulate information to produce the required results, and
(3) documents that describe the operation and use of the programs, so that users can understand them.
2 SOFTWARE ENGINEERING
Software Characteristics
To understand software, it is important to examine the characteristics that make it different from other things that human beings build. When hardware is built, the human creative process (analysis, design, construction, testing) is ultimately translated into a physical form. If we build a new computer, our initial sketches, formal design drawings, and breadboarded prototype evolve into a physical product (chips, circuit boards, power supplies, etc.).
Because software is a logical rather than a physical system element, it has characteristics that are considerably different from those of hardware:
1. Software is developed or engineered but it is not manufactured in the classical
sense: Although some similarities exist between software development and
hardware manufacture, the two activities are fundamentally different. In both
activities, high quality is achieved through good design, but the manufacturing
phase for hardware can introduce quality problems that are nonexistent (or
easily corrected) for software. Both activities are dependent on people, but the
relationship between people applied and work accomplished is entirely different.
Both activities require the construction of a "product" but the approaches are
different. Software costs are concentrated in engineering. This means that
software projects cannot be managed as if they were manufacturing projects.
2. Software doesn't "wear out": Figure 1.1 shows failure rate as a function of time for hardware. The relationship, often called the "bathtub curve," indicates that hardware exhibits relatively high failure rates early in its life (these failures are often attributable to design or manufacturing defects); defects are corrected and the failure rate drops to a steady-state level (ideally, quite low) for some period of time. As time passes, however, the failure rate rises again as hardware components suffer from the cumulative effects of dust, vibration, abuse, temperature extremes, and many other environmental maladies. Stated simply, the hardware begins to wear out. Software, by contrast, is not susceptible to the environmental maladies that cause hardware to wear out.
3. Consider the manner in which the control hardware for a computer-based product is designed and built: The design engineer draws a simple schematic of the digital circuitry, does some fundamental analysis to ensure that proper function will be achieved, and then goes to the shelf where catalogs of digital components exist. Each integrated circuit (called an IC or a chip) has a part number, a defined and validated function, a well-defined interface, and a standard set of integration guidelines. After each component is selected, it can be ordered off the shelf.
SOFTWARE ENGINEERING FUNDAMENTAL 5
Software Applications
Software may be applied in any situation for which a pre-specified set of procedural
steps (i.e., an algorithm) has been defined. Information content and determinacy are
important factors in determining the nature of a software application. Content refers
to the meaning and form of incoming and outgoing information. For example, many
business applications use highly structured input data (e.g., a database) and produce
formatted "reports." Software that controls an automated machine (e.g., a numerical
control) accepts discrete data items with limited structure and produces individual
machine commands in rapid succession.
Information determinacy refers to the predictability of the order and timing of
information. An engineering analysis program accepts data that have a predefined
order, executes the analysis algorithm(s) without interruption, and produces resultant
data in report or graphical format. Such applications are determinate. A multi-user
operating system, on the other hand, accepts inputs that have varied content and
arbitrary timing, executes algorithms that can be interrupted by external conditions,
and produces output that varies as a function of environment and time. Applications
with these characteristics are indeterminate.
System software: System software is a collection of programs and utilities that provide services to other programs. Some system software (e.g., compilers, editors, and file management utilities) processes complex but determinate information structures. Other system applications (e.g., operating system components, drivers, telecommunications processors) process largely indeterminate data. In either case, the system software area is characterized by
Software Metrics
Without software, computer hardware is of no use. Every organization spends a major share of its computing budget on developing or purchasing software, so software systems are valuable and important products for both developers and users. Software is not a single-attribute product; it has many characteristics that can be measured, for example, size in lines of code, cost of development and maintenance in rupees, development time in person-months, memory requirement in bytes, and so on. However, different observers of the same computer program may obtain different results even when they measure the same characteristic.
For example, consider the lines-of-code property of a computer program. One observer may count all the lines present in the program, including blank lines and comments. Another observer may drop comments and blank lines from the count, realizing that these do not affect the performance of the program. Therefore, a standard and precise definition of the line-of-code metric is required, so that different persons obtain identical counts for the same program. Only under such standard, identical, and homogeneous conditions can we compare the results of empirical studies conducted by different people at different times or places.
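To make the disagreement concrete, here is a small Python sketch that counts the same program under both conventions: observer A counts every physical line, while observer B skips blank lines and comment-only lines. It assumes Python-style `#` comments and uses a naive line-by-line check (it does not handle docstrings or `#` characters inside strings); `count_lines` and `PROGRAM` are illustrative names, not part of any standard tool.

```python
def count_lines(source: str, skip_blank_and_comments: bool) -> int:
    """Count lines of Python-style source under one of two conventions.

    skip_blank_and_comments=False: observer A counts every physical line.
    skip_blank_and_comments=True:  observer B ignores blank lines and
    lines containing only a '#' comment.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if skip_blank_and_comments and (not stripped or stripped.startswith("#")):
            continue  # observer B drops blank and comment-only lines
        count += 1
    return count


# A hypothetical program used only for this illustration.
PROGRAM = """\
# compute the average of a list

def average(values):
    # guard against an empty list
    if not values:
        return 0.0
    return sum(values) / len(values)
"""

print(count_lines(PROGRAM, skip_blank_and_comments=False))  # 7
print(count_lines(PROGRAM, skip_blank_and_comments=True))   # 4
```

The two observers report 7 and 4 lines for the same program, which is why a standard definition is needed before LOC figures from different studies can be compared.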
Over time, different software metrics have been developed to quantify various attributes of a software product. Broadly speaking, these may be grouped into two categories:
i. Product metrics
ii. Process metrics
Software metrics such as size, which can be derived from the software itself, are called product metrics. Measurements of a software product that depend upon the development environment are called process metrics. Such metrics do not require analysis of the program itself and are related to the development process. For example, the time required by a programmer to design, code, and test a program is a process metric. This metric depends upon many things, including the complexity of the problem, the knowledge and ability of the developer, the type of algorithm used, and the availability of the computer time during the
Size Metrics
Programs for solving different problems on computers are developed, written and implemented by different programmers. To achieve different objectives, programs are written in different programming languages: some in C or C++, a few in Pascal and FORTRAN, some in COBOL, and others in VB, VC++, Java, Ada and so on. Some programs are of good quality, well documented and written with the latest software engineering techniques, while others are written in a “quick-and-dirty” way with no comments or planning at all. Despite these differences, there is one feature that all programs share: all have size.
Size is a very simple and important metric for the software industry. It has many useful characteristics:
It is very easy to calculate once the program is completed.
It is one of the most important parameters in many software development models, such as those for cost and effort estimation.
Productivity is also expressed in terms of size.
Memory requirements can also be decided on the basis of size.
The principal size measures, which have received more attention than others, are:
1. Lines of Code (LOC)
2. Token count
3. Function count
Lines of Code
It is one of the earliest and simplest metrics for calculating the size of a computer program. It is generally used in calculating and comparing the productivity of programmers, where productivity is measured as LOC per person-month. Among researchers, there is no general agreement on what makes a line of code. Due to the lack of a standard and precise definition of the LOC measure, different workers may obtain different counts for the same program. Further, LOC gives equal weight to each line of code, although in fact some statements of a program are more difficult to code and comprehend than others. Despite all this, the metric continues to be popular and useful in the software industry because of its simplicity.
The most important requirement of this metric is therefore a precise and standard definition. There is general agreement among researchers that this measure should not include comment and blank lines, because these are used only for internal documentation of the program; their presence or absence does not affect the functionality or efficiency of the program. Some observers are also of the view that only executable statements should be included in the count, because only these implement the functions of the program.
The predominant definition of the LOC measure used today by software personnel is:
“A line of code is any line of program text that is not a comment or blank line, regardless of the number of statements or fragments of statements on the line. This specifically includes all lines containing program headers, declarations, and executable and non-executable statements.”
Token Count
The drawback in LOC size measure of treating all lines alike can be solved by giving
more weight to those lines, which are difficult to code and have more “stuff”. One
natural solution to this problem may be to count the basic symbols used in a line
instead of lines themselves. These basic symbols are called “tokens”. Such a scheme
was used by Halstead in his theory of software science. In this theory, a computer
program is considered to be a collection of tokens, which may be classified as either
operators or operands. All software science metrics can be defined in terms of these
basic symbols. The basic measures are:
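The four basic counts (n1 unique operators, n2 unique operands, N1 total operators, N2 total operands) can be gathered mechanically from source text. The sketch below uses Python's standard `tokenize` module on Python source and assumes one possible counting convention: operator and punctuation tokens plus keywords count as operators, while identifiers, numbers and string literals count as operands. Published studies use varying conventions (for example, whether a pair of parentheses is one operator or two), so `halstead_counts` is a hypothetical helper for illustration only.

```python
import io
import keyword
import tokenize
from collections import Counter


def halstead_counts(source: str) -> tuple[int, int, int, int]:
    """Return (n1, n2, N1, N2) for a piece of Python source.

    Assumed convention: operators are OP tokens plus Python keywords;
    operands are other names, numbers and string literals.
    """
    operators = Counter()
    operands = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP:
            operators[tok.string] += 1
        elif tok.type == tokenize.NAME:
            # Keywords such as 'if' or 'return' act as operators;
            # other names (variables, functions) are operands.
            if keyword.iskeyword(tok.string):
                operators[tok.string] += 1
            else:
                operands[tok.string] += 1
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            operands[tok.string] += 1
        # COMMENT, NEWLINE, NL, INDENT, DEDENT and ENDMARKER are ignored.
    n1, n2 = len(operators), len(operands)
    N1, N2 = sum(operators.values()), sum(operands.values())
    return n1, n2, N1, N2


print(halstead_counts("x = x + 1\n"))  # (2, 2, 2, 3)
```

For `x = x + 1`, this convention gives n1 = 2 (`=`, `+`), n2 = 2 (`x`, `1`), N1 = 2 and N2 = 3.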
Function Count
The size of a large software product can be estimated better through a larger unit, called a module, than through the LOC measure. A module can be defined as a segment of code that may be compiled independently. For large software systems, it is easier to predict the number of modules than the lines of code. For example, let a software product require n modules. It is generally agreed that the size of a module should be about 50 to 60 lines of code. Therefore, the size estimate of this software product is about n x 60 lines of code. However, this metric requires precise and strict rules for dividing a program into modules; in the absence of such rules, it may not be very useful.
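The module-based estimate is simple arithmetic. A minimal Python sketch, assuming the 50 to 60 lines-per-module rule of thumb quoted above (the function name is hypothetical):

```python
def estimate_size_from_modules(n_modules: int,
                               loc_per_module: tuple[int, int] = (50, 60)) -> tuple[int, int]:
    """Return a (low, high) LOC estimate for a product of n_modules modules,
    assuming each module holds roughly 50 to 60 lines of code."""
    low, high = loc_per_module
    return n_modules * low, n_modules * high


print(estimate_size_from_modules(40))  # (2000, 2400)
```

A product expected to need 40 modules would thus be estimated at roughly 2,000 to 2,400 lines of code; the estimate is only as good as the rules used to divide the program into modules.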
A module may consist of one or more functions. In a program, a function may be defined as a group of executable statements that performs a definite task. The number of lines of code in a function should not be very large, because human memory is limited and a programmer cannot perform a task efficiently if the amount of information to be manipulated is large.
Halstead’s Theory
Researchers generally agree that simple size measures such as lines of code (LOC) are not adequate for determining software complexity and development effort. In their view, a programming process model is needed for this purpose, based upon a manageable number of factors that affect the complexity and quality of software systems. A number of researchers, including Halstead and McCabe, have attempted to define such programming models.
Halstead’s model, also known as the theory of software science, is based on the hypothesis that program construction involves a process of mental manipulation of the unique operators (n1) and unique operands (n2). This means that a program of N1 operators and N2 operands is constructed by selecting among n1 unique operators and n2 unique operands. Using this model, Halstead derived a number of equations related to programming, such as program level, implementation effort, language level and so on. It is one of the most widely studied theories and has been supported by a number of empirical studies.
An important and interesting characteristic of this model is that a program can be analysed for various features, such as size and effort, simply by counting its basic parameters n1, n2, N1 and N2. From these counts, the vocabulary n and the length N of the program are defined as:
$$n = n_1 + n_2$$
$$N = N_1 + N_2$$
One of the hypotheses of this theory is that the length of a well-structured program is a function of n1 and n2 only. This relationship is known as the length prediction equation and is defined as:
$$N_h = n_1 \log_2 n_1 + n_2 \log_2 n_2$$
This length equation estimates the size of the program from the counts of unique operators (n1) and unique operands (n2). If the actual length (N) agrees well with the estimated value (Nh), the program is considered well structured.
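As a sketch, the length prediction equation Nh = n1 log2 n1 + n2 log2 n2 and its comparison with the actual length N = N1 + N2 can be written in Python as below. The 10% tolerance used to decide whether the two values "agree well" is an assumption made for illustration; the theory does not fix a threshold. All counts in the example are hypothetical.

```python
import math


def estimated_length(n1: int, n2: int) -> float:
    """Halstead's length prediction Nh = n1*log2(n1) + n2*log2(n2).

    Both counts must be positive, since log2(0) is undefined.
    """
    return n1 * math.log2(n1) + n2 * math.log2(n2)


def looks_well_structured(N: int, n1: int, n2: int, tolerance: float = 0.1) -> bool:
    """Return True if the actual length N lies within `tolerance`
    (a fraction of Nh) of the estimated length. The 10% default is an
    illustrative assumption, not part of the theory."""
    Nh = estimated_length(n1, n2)
    return abs(N - Nh) <= tolerance * Nh


n1, n2 = 8, 16       # hypothetical unique operator / operand counts
N1, N2 = 50, 40      # hypothetical total occurrences
N = N1 + N2          # actual length N = 90
print(estimated_length(n1, n2))          # 88.0
print(looks_well_structured(N, n1, n2))  # True
```

Here Nh = 8 x 3 + 16 x 4 = 88, close to the actual length of 90, so under the assumed tolerance the program would be considered well structured.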
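The size of the program in bits, called its volume V, is:
$$V = N \log_2 n$$
It is assumed that during the programming process the human mind follows a binary search technique in selecting the next token from the vocabulary of size n, so each token costs log2 n bits.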
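The volume formula V = N log2 n translates directly into code. This Python sketch assumes the four basic counts are already known; the sample counts are hypothetical.

```python
import math


def volume(N1: int, N2: int, n1: int, n2: int) -> float:
    """Halstead volume V = N * log2(n), in bits.

    N = N1 + N2 is the program length and n = n1 + n2 the vocabulary;
    log2(n) is the number of bits needed to pick one token out of n
    by binary search.
    """
    N = N1 + N2
    n = n1 + n2
    return N * math.log2(n)


print(volume(N1=50, N2=40, n1=8, n2=24))  # 450.0
```

With n = 8 + 24 = 32, each token costs log2 32 = 5 bits, so a program of length N = 90 has a volume of 450 bits.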
The potential volume V* is the volume of the most succinct possible form of the algorithm:
$$V^* = (2 + n_2^*) \log_2 (2 + n_2^*)$$
where n2*, the minimum number of operands, is the number of unique input and output parameters. For small, simple programs it can be calculated easily, but for large, complex programs such as compilers and operating systems it is difficult to compute.
The program level L is then defined as:
$$L = V^* / V$$
Since it is difficult to determine the potential volume (V*), Halstead gave an alternate formula for program level:
$$L = \frac{2 n_2}{n_1 N_2}$$
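The potential volume and the two forms of program level can be sketched in Python as follows, using V* = (2 + n2*) log2(2 + n2*), L = V*/V and the alternate estimate L = 2 n2 / (n1 N2). All sample values are hypothetical and were chosen so that both forms happen to agree.

```python
import math


def potential_volume(n2_star: int) -> float:
    """V* = (2 + n2*) * log2(2 + n2*), where n2* is the number of
    unique input and output parameters."""
    return (2 + n2_star) * math.log2(2 + n2_star)


def program_level(v_star: float, v: float) -> float:
    """L = V* / V; it reaches 1 only when V equals the potential volume."""
    return v_star / v


def estimated_program_level(n1: int, n2: int, N2: int) -> float:
    """Alternate form that avoids V*: L = 2*n2 / (n1*N2)."""
    return (2 * n2) / (n1 * N2)


v_star = potential_volume(6)                 # 8 * log2(8) = 24.0 bits
print(program_level(v_star, 480.0))          # 0.05
print(estimated_program_level(10, 20, 80))   # 0.05
```

With n2* = 6, V* = 8 log2 8 = 24 bits; for an actual volume of 480 bits, L = 24/480 = 0.05, and counts of n1 = 10, n2 = 20, N2 = 80 give the same value from the alternate formula. A level of 1 would mean the program is written in its most succinct possible form.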
Impurities may be introduced into a program by a poorly designed, poorly structured, lengthy, or complex module. In a program, the following types of impurities may be found [15, 36]:
Complementary Operations
Ambiguous Operands
Redundant Conditions
Dead Code
Synonymous Operands
Common Sub-expressions
Unwarranted Assignment, and
Unfactored Expressions.
A program should not be ‘just’ a working program; it must be a good-quality program. Various characteristics of a good-quality program have been explained earlier, and freedom from impurities is one more step toward it.
The theory of software science is very useful for quantifying various aspects of a program, and its metrics have been supported by many experimental studies. However, it reflects only one type of program complexity: size. It does not take into account the structural properties of the program or the interactions between its modules. Therefore, it cannot be used to measure the overall complexity of a program.
Function-Point Analysis
Function-oriented software metrics use a measure of the functionality delivered by
the application as a normalization value. Since 'functionality' cannot be measured
directly, it must be derived indirectly using other direct measures. Function-oriented
metrics were first proposed as a measure called the function point. Function points
are derived using an empirical relationship based on countable (direct) measures of
software's information domain and assessments of software complexity.
Function points are computed by completing the table shown in Figure 1.3. Five
information domain characteristics are determined and counts are provided in the
appropriate table location. Information domain values are defined in the following
manner:
Number of user inputs: Each user input that provides distinct application-oriented data to the software is counted. Inputs should be distinguished from inquiries, which are counted separately.
Number of user outputs: Each user output that provides application-oriented information to the user is counted. In this context, output refers to reports, screens, error messages, etc. Individual data items within a report are not counted separately.
Number of user inquiries: An inquiry is defined as an on-line input that results in the
generation of some immediate software response in the form of an on-line output.
Each distinct inquiry is counted.
Number of files: Each logical master file (i.e., a logical grouping of data that may be
one part of a large database or a separate file) is counted.
Number of external interfaces: All machine-readable interfaces (e.g., data files on
storage media) that are used to transmit information to another system are counted.
Once these data have been collected, a complexity value is associated with each
count. Organizations that use function point methods develop criteria for determining
whether a particular entry is simple, average, or complex. Nonetheless, the
determination of complexity is somewhat subjective.
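Measurement parameter            Count        Weighting factor
                                              Simple   Average   Complex
Number of user inputs            ____    x    3        4         6          =  ____
Number of user outputs           ____    x    4        5         7          =  ____
Number of user inquiries         ____    x    3        4         6          =  ____
Number of files                  ____    x    7        10        15         =  ____
Number of external interfaces    ____    x    5        7         10         =  ____
Count total                                                                 =  ____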
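The count total is a weighted sum over the five information domain values: each count is multiplied by the weight for its complexity rating and the products are added. The Python sketch below assumes the weights 7/10/15 for files and, for the other four rows, the commonly published values (inputs 3/4/6, outputs 4/5/7, inquiries 3/4/6, external interfaces 5/7/10); the counts and complexity ratings in the example are hypothetical.

```python
# Weights per information domain value: (simple, average, complex).
# Files use 7/10/15; the other rows are assumed standard published values.
WEIGHTS = {
    "inputs": (3, 4, 6),
    "outputs": (4, 5, 7),
    "inquiries": (3, 4, 6),
    "files": (7, 10, 15),
    "interfaces": (5, 7, 10),
}
LEVEL = {"simple": 0, "average": 1, "complex": 2}


def count_total(counts: dict[str, int], complexity: dict[str, str]) -> int:
    """Sum count x weight over the information domain values."""
    total = 0
    for name, count in counts.items():
        weight = WEIGHTS[name][LEVEL[complexity[name]]]
        total += count * weight
    return total


# Hypothetical counts, every entry rated "average".
counts = {"inputs": 20, "outputs": 12, "inquiries": 8, "files": 4, "interfaces": 2}
complexity = {name: "average" for name in counts}
print(count_total(counts, complexity))  # 226
```

Rating every entry as average gives 20 x 4 + 12 x 5 + 8 x 4 + 4 x 10 + 2 x 7 = 226.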
Summary
Software engineering is the systematic approach to the development, operation,
maintenance, and retirement of software.
Software is a logical rather than a physical system element. Therefore, software
has characteristics that are considerably different from those of hardware.
A software component should be designed and implemented so that it can be
reused in many different programs.
Modern reusable components encapsulate both data and the processing applied to the data, enabling the software engineer to create new applications from reusable parts.
Within the software engineering context, a measure provides a quantitative
indication of the extent, amount, dimension, capacity, or size of some attribute of
a product or process. Measurement is the act of determining a measure. A metric is a quantitative measure of the degree to which a system, component, or process possesses a given attribute.
Project metrics are used to minimize the development schedule by making the
adjustments necessary to avoid delays and mitigate potential problems and
risks. They are also used to assess product quality on an ongoing basis and,
when necessary, modify the technical approach to improve quality.
Size-oriented software metrics are derived by normalizing quality and/or
productivity measures by considering the size of the software that has been
produced.
Function-oriented software metrics use a measure of the functionality delivered by
the application as a normalization value.
Function points are derived using an empirical relationship based on countable
(direct) measures of software's information domain and assessments of software
complexity.
Complexity metrics can be used to predict critical information about reliability and
maintainability of software systems from automatic analysis of source code or
procedural design information. Complexity metrics also provide feedback during
the software project to help control the design activity.
Self-assessment Questions
Solved Exercise
I. True or False
1. Function point and LOC measurements for the size of software are not
correlated.
2. In today’s computer systems, hardware is much costlier than software.
3. Both, McCabe and Halstead metrics, measure complexity of a program.
4. Software becomes old or aged without being worn out.
5. Artificial intelligence software makes use of numerical algorithms.
II. Fill in the blanks.
1. A _________ system is heavily constrained on time of execution.
2. ________ is an example of a private metric.
3. Minimum number of bits necessary to represent a program is known as
__________ of the program.
4. FP per month may be used as a metric for _________ of a programmer.
5. The number of statements between two successive references of a variable
is known as _______.
Answers
I. True or False.
1. False
2. False
3. True
4. True
5. False
II. Fill in the blanks.
1. real-time
2. defect-rate/module
3. volume
4. productivity
5. span
Unsolved Exercise
I. True or False.
1. Information determinacy refers to the predictability of the order and timing
of information.
2. Function point metric is an absolute measure of a software characteristic.
3. McCabe metric is an absolute measure of a software complexity.
4. Software product and software process are two names for the same thing.
5. Variable span and program complexity are directly proportional to each
other.
Detailed Questions
1. Describe software metrics and different types of metric models.
2. Define the following:
a. Size metric
b. Complexity metric
c. Function point analysis
3. Describe software characteristics.
4. Describe Halstead’s theory in detail and its use in software engineering.
5. What are size metrics? How is function point metric advantageous over LOC
metric? Explain.
6. What is Halstead’s theory of software science? For a given program having number of unique operators n1 = 16 and number of unique operands n2 = 32, determine the following:
i. Program length (through estimator)
ii. Program volume
iii. Program level
iv. Language level