Cobol
B. Dwyer © 2001
Contents
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1 Introduction
This is not the usual course on databases. Yes, it does cover the standard material on SQL, which you
will need if you are going to use a database properly. It also covers some Cobol. This may come as a
shock, especially if you thought Cobol was dead; the news of Cobol’s death has been grossly
exaggerated. Even today, about one-half of all source code is written in Cobol. Why do you need it?
Once, programmers wrote in machine code or assembler language. This was the first generation of
programming languages. Second generation languages knew how to evaluate arithmetic expressions or
format output. Third generation languages incorporated ideas like structured programming and type
checking to help protect programmers against common mistakes. It wasn’t hard to understand how the
first three generations of languages translated into machine code. All three were procedural; the pro-
grammer had to tell the computer what to do. Then came the fourth generation. In theory, no longer
did the programmer have to tell the computer how to solve the problem, the programmer stated the
problem, and the computer figured out how to solve it. Does this sound too good to be true?
There never was and never will be a way of solving all problems. What is possible is to identify
problems that fit a certain pattern, and to have a way of solving all problems of that kind. Discovering
patterns that can be solved is one way computer science makes progress. In 1970, E.F. Codd proved
that queries in an SQL-like language could be evaluated by a few basic operations on files. Codd’s
proof only showed it could be done; it didn’t show how to do it efficiently. Even now, we don’t know
how to find the fastest way to answer an SQL query, except by searching through many possibilities.
This has created a new situation. In third generation languages, the programmer can predict which
programs are efficient and which are not. But in the case of an SQL query, the programmer can only
guess, because it is up to the computer to find a good algorithm — at least in theory. In practice, many
database management systems are not very smart, and stating the same problem in different ways can
make a big difference to how fast the answer is found. As a result, the programmer has a good deal of
control over efficiency. This is important, because although computers go a lot faster today than they
did in the days of machine code, the disks on which they store their data don’t.
SQL was designed as a language for the ‘end user’. Someone with no knowledge of programming
would be able to answer complex questions just by querying a database. As many organisations found,
with very little training, end users were able to tie up computer resources so effectively that no serious
work could get done. To use SQL well, to be a computer scientist and not just a ‘dumb user’, you need
to understand operations on files. This is where Cobol comes in.
Cobol is a third-generation language with just the right range of features to show how databases work.
Cobol has been around since 1960, and has evolved steadily since. Many other languages we use
today are just as old; they just change their names. For example, ‘C’ became C++, and C++ became
Java, and it is possible to trace the evolution of most other languages in a similar way. Even so, Cobol
still has some features that definitely belong in the past. For the most part we shall ignore them.
Unfortunately, one ancient feature cannot be ignored: the layout of Cobol source programs is still
geared to 80-column punched cards. The sooner this anachronism is removed, the better.
Cobol is a big language, with many features. It is not the aim of this course to teach Cobol. It is the
aim of this course to teach some algorithms. These are the same algorithms an SQL database system
uses internally. It just happens that they are best expressed in Cobol. Even if you use a different
language, you will still need the same algorithms. The course will teach only enough Cobol to do the
job. It won’t teach you anything wrong, but it won’t tell you the whole truth. For that, you should
consult a Cobol programming manual or the official Cobol standard.
Apart from languages, what else does the course cover? Quite a range of things. We shall carry one
case study throughout the course: a stock control system, typical of many business systems. The case
study is simpler than a real system, ignoring sales taxes, sales representatives’ commissions, discounts,
and the like. It is reasonably complex though; it needs to be rich enough to provide many examples.
The same case study will be examined in Cobol, then in SQL. This will make a number of things
clear. We shall see how an SQL query can be much more concise than the corresponding Cobol
program. We shall see that Cobol can do things that SQL cannot. We shall understand how SQL
works, in terms of Cobol algorithms. In addition, we shall study disk drives, how files are
implemented, client-server databases, record locking, deadlock, and learn how to estimate the
performance of programs and queries.
2 Know Your Enemy
Fig. 2.1: The important parts of a Hard Disk Drive
The (fictional) Phantom II drive specified in Table 2.1 has a stack of 8 disks. They are 3 in. (76 mm.)
in diameter. At 10,000 rpm (revolutions per minute), their rims travel at 140 kph. Due to centrifugal
force, points on their rims are subject to an acceleration over 4,000 times that due to gravity.
According to Table 2.1, the drive has 15 recording surfaces, so all but one of its 8 disks must be
coated with magnetic material on both sides. The remaining surface isn’t used to contain data, but the
specification doesn’t tell us what it is used for.
Data is read from the disks by 15 heads, one per surface. The heads don’t actually touch the disks;
they hover close to them on a thin cushion of air trapped by the speed of the disk. That is why the
disks have a mirror-like finish. Any small irregularity might cause a head to touch the disk, destroying
both the head and the disk surface.
At each position of the head assembly, each head can record (write) or play back (read) data on a
circular track. The 15 tracks at one head position make a stack of 15 circles. Taken together, they are
called a cylinder. The specification says that the drive has 10,000 cylinders, which means the read
heads can be moved to 10,000 different positions. Each of the 15 surfaces therefore has 10,000 tracks:
150,000 tracks altogether. The ‘track density’ specification tells us that these positions are only
0.0002 cm (about 0.00008 in) apart.
A track does not consist of one long recording, but many short ones. Each recording is called a
sector. A sector contains 512 bytes of data. The number of sectors per track varies between 250 and
450, an average of 350. Therefore, each track contains between 128,000 and 230,400 bytes,
179,200 bytes on average. Multiplying this by the 150,000 tracks gives a total storage capacity of
26,880 million bytes. Why then does the specification claim a capacity of only 25 GB?
To engineers, 1K (kilo) means 1 thousand, 1M (mega) means 1 million, and 1G (giga) means 1 billion
(thousand million). In the other direction, 1m (milli) means 1 thousandth, 1µ (micro) means
1 millionth, and 1n (nano) means 1 billionth. Computer scientists tend to use a different scale based on
powers of 2, especially when they are referring to storage capacity. 2^10 equals 1,024, which is
reasonably close to 1,000 or 1K. Computer scientists make 1K=1,024, 1M=1,048,576, and
1G=1,073,741,824. Unfortunately, they aren’t very consistent about this, so we sometimes need to
check. Here, the stated storage capacity of 25 GB must be in computer science units.
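We can check this arithmetic with a short sketch (in Python here, since it is only arithmetic; the figures are those quoted from Table 2.1):

```python
# Capacity of the (fictional) Phantom II, using the figures quoted above.
SECTOR_BYTES = 512
AVG_SECTORS_PER_TRACK = 350           # between 250 and 450
TRACKS = 15 * 10_000                  # 15 surfaces x 10,000 cylinders

capacity = SECTOR_BYTES * AVG_SECTORS_PER_TRACK * TRACKS
print(capacity)                       # 26880000000 -- the '26,880 million bytes'

# Engineering units: 1G = 10**9. Computer science units: 1G = 2**30.
print(round(capacity / 10**9))        # 27 (engineering GB)
print(round(capacity / 2**30))        # 25 (computer science GB) -- matches the spec
```

The same 26,880 million bytes is 27 GB in engineering units but 25 GB in powers-of-two units, which is how we know which scale the specification uses.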
The heads move quickly. On average, they take 6 ms (6 thousandths of a second) to move between
tracks and be ready to read data. In Table 2.1, this is called the ‘average seek time’. During an average
seek the heads are accelerated and decelerated with over 100 times the force of gravity.
Seek time varies. The further the heads have to move, the bigger it is. A ‘full seek’ is a movement
between the outermost and innermost tracks — the worst case. A track-to-track seek is a movement
between two adjacent tracks — the best case. An average seek is 1/3 of a full seek, not 1/2 as you might
expect. That is because if both the start and finish positions are chosen at random, the average distance
between them is 1/3 of the maximum. Notice that all the seek times are slightly longer for writing than
for reading. The head can start to read a track before it has even had time to stop moving, but when it
writes a track, it needs to be exactly in position.
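The one-third claim is easy to confirm by simulation. The sketch below (Python, purely illustrative) picks random start and finish positions on a unit-length track span and averages the distance between them:

```python
import random

# If start and finish tracks are both chosen uniformly at random, the mean
# distance between them is 1/3 of the maximum -- so average seek = full seek / 3.
random.seed(1)
n = 100_000
mean_dist = sum(abs(random.random() - random.random()) for _ in range(n)) / n
print(mean_dist)   # close to 0.333...
```

For the Phantom II, one third of a full seek works out at the quoted 6 ms average.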
When a sector has to be read from disk, four things have to happen:
1 The correct head has to be selected to read the required surface. This happens electronically, and
takes next to no time.
2 The heads have to move to the correct cylinder. As we have seen, this takes an average of 6 ms.
3 The correct sector has to arrive at the read head. Since the disks rotate at 10,000 rpm, one full
rotation also takes 6 ms. The required sector may have just gone past the head, in which case it
will pass it again in 6 ms, or it may be just going to pass it. On average, it will take 3 ms for the
sector to reach the head. In Table 2.1, this is called ‘average latency’.
4 To be read, the sector must pass under the read head. With 250 sectors or more per track, the
transfer time will be at most 0.024 ms.
The total average access time is therefore 9.024 ms. Because the time to read one sector is so short, it
can pay to read several sectors at a time. 10 sectors (5K bytes) can be read in 9.24 ms, and a whole
track (125K or more) can be read in 15 ms.
Writing to a sector is similar, but the seek time is a little longer, 7 ms on average. In addition, an
option called verification is often used, in which a sector is first written, then read back to ensure it was
written correctly. Since a sector can’t be read until the next rotation after it is written, this adds another
6 ms to the total, making a little over 15 ms altogether.
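The four-step read breakdown above can be captured in a few lines. The sketch below (Python, working in whole microseconds so the arithmetic stays exact) uses the Phantom II figures; `avg_read_us` is just an illustrative name:

```python
# Average time to read one or more consecutive sectors on the Phantom II.
AVG_SEEK_US = 6_000      # step 2: move the heads to the cylinder
AVG_LATENCY_US = 3_000   # step 3: half of one 6 ms rotation
TRANSFER_US = 24         # step 4: one sector past the head (250+ sectors/track)

def avg_read_us(sectors: int) -> int:
    """Seek and latency are paid once; then one transfer per sector."""
    return AVG_SEEK_US + AVG_LATENCY_US + sectors * TRANSFER_US

print(avg_read_us(1))    # 9024 microseconds = 9.024 ms
print(avg_read_us(10))   # 9240 microseconds = 9.24 ms
```

Because the fixed seek-plus-latency cost dwarfs the per-sector transfer cost, reading ten sectors costs barely more than reading one, which is why transfers are batched.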
If you made a video recording of a disk drive and slowed it down to see the heads moving, you would
be disappointed. The heads can move between tracks 3 times between two video half-frames.
When we read or write successive sectors, we are using sequential access. When we read or write
sectors scattered about the disk, we are using random access. Other things being equal, sequential
access is faster, but both techniques have their proper places, which we shall discuss in detail later.
When we use the word ‘random’, we don’t mean that the sector the computer reads is left to chance;
we mean that the program can choose which one to read at random, i.e., without constraint.
three variants that differ from it by 1 bit. Therefore, we can only distinguish two (8÷4) patterns safely.
Suppose instead we have a stream of 127 bits. These can form 2^127 patterns, each having 127 1-bit
variants. We can therefore distinguish 2^120 (2^127÷128) patterns. In this case the ability to recover from
1-bit errors increases the amount of data stored by only 6%. In general, the number of bits needed for
error correction grows only with the logarithm of the number of bits of data. Using this simple
scheme, only three extra bytes would be needed to correct all possible 1-bit errors in a 512-byte sector.
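The counting argument can be checked directly. Here is a hedged Python sketch of the scheme above, in which each codeword accounts for itself plus its n one-bit variants:

```python
# With n total bits, each distinguishable codeword 'uses up' itself plus
# its n one-bit variants, so at most 2**n // (n + 1) patterns can be told apart.
n = 127
distinct = 2**n // (n + 1)       # n + 1 = 128 = 2**7
assert distinct == 2**120        # 7 check bits, 120 data bits

overhead = 7 / 120               # extra bits relative to data bits
print(round(overhead * 100))     # 6 (per cent), as claimed
```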
Practical schemes are much more sophisticated than this, but the general idea is the same: a few extra
bytes (called a ‘cyclic check sum’) can be used to detect, and often correct, corrupted data. When an
error is detected, the disk drive will typically attempt to read the sector again, in the hope that the data
was written correctly but read wrongly. Errors in reading are called ‘soft’ read errors. Errors in
writing are called ‘hard’ read errors.
Because of error checking, a disk drive can never read less than one complete sector at a time.
Although the assembly that moves them is large, the read heads themselves are minuscule. The
‘recording density’ is 100,000 bits/cm. (There is always some doubt whether ‘B’ stands for ‘bytes’ or
‘bits’. Here, ‘bytes’ would be inconsistent with the media transfer rate.) This means that one bit is
recorded in a length of only 100nm (billionths of a meter). For comparison, the wavelength of violet
light is 410nm, and the wavelength of red light is 770nm. We conclude that the head is probably
manufactured using ultra-violet photolithography, similar to how VLSI computer chips are made.
Presumably, future advances in chip manufacture will continue to be paralleled by equal
improvements in recording density, so that disk drives will always keep pace with RAM.
The width of one bit is effectively the distance between two tracks. The ‘track density’ is 5,000/cm,
which makes one bit 2000nm wide—only 4 wavelengths of light.
allocate storage as it would like to, but must take storage where it can find it. As a result, instead of
files occupying a few large extents, they become fragmented into many small ones.
To use a file, a Cobol program must first open it, telling the operating system its name and directory
path. If the file already exists, the operating system then locates and reads its directory entry. If the file
is a new one, the operating system can create a directory entry for it. From then on, the program reads
or writes records. A record contains items relating to one object, such as a customer or product. Since
hardware reads or writes sectors, the Cobol run-time system communicates with the operating system
in terms of blocks of one or more sectors. When a program reads the first record of a file, the operat-
ing system will retrieve its first block. After that, the program may be able to read several records from
the block before the next block has to be fetched. Likewise, a program may write several records
before the operating system writes a block. When a program has finished with a file, it should close it,
to ensure that the operating system promptly writes its updated directory entry back to disk.
The following summary relates the terms used by Cobol, the hardware, and the operating system:
A Cobol file consists of many blocks, each containing several records. Records contain items, which
consist of one or more characters. A disk is divided into many cylinders, each of which contains
several tracks, one per surface. Tracks contain many sectors: short recordings, often of 512 bytes. A
Cobol block consists of one or more hardware sectors, and a Cobol character occupies one byte.
There is no particular relationship between records and sectors; sectors can contain several short
records, but long records can span several sectors. The operating system allocates space to files in
units called segments, which consist of at least one sector. A contiguous series of segments allocated
to a file is called an extent. Operating system files and Cobol files are usually the same thing.
If one program is reading one file sequentially, we may expect that very little time is wasted in seeks
(movements of the read heads). But if two programs are reading files sequentially, or one program is
reading two files sequentially, then the disk drive must constantly move the heads from one file to the
other and back again. The operating system can ease this problem by transferring several sectors of the
file at a time, which takes only a little longer than transferring one. A group of sectors read or written
in this way is called a block. Some operating systems choose block sizes dynamically according to the
patterns of accesses that occur. Others rely on the program to specify how many sectors should be
transferred as a block.
When two or more client programs make read or write requests at the same time, the server must
queue them. The response time experienced by an individual program then depends on how busy the
disk is. It is determined by the server’s service ratio, or load factor: the ratio of its actual load to its
potential throughput. We may estimate average response time using simple queuing theory: Suppose
a disk is busy 80% of the time, and idle 20% of the time. To any given program, the effect is as if the
disk had only 20% of its potential performance, so that access time is increased by a factor of 5. If the
disk is busy 95% and idle 5% of the time, the response time is increased by a factor of 20. At 100%
load, there is no idle time, and the average response time becomes infinite. (The argument may seem
simplistic, but the results are correct.) Anyone who has used a network file server is aware that as the
number of the network’s clients increases, its response time slows dramatically. Once the number of
clients reaches a critical level, the file system virtually stops.
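The estimate above amounts to dividing access time by the disk’s idle fraction. A small Python sketch (`response_factor` is an illustrative name, and this is only the simple model described above, not a full queuing analysis):

```python
def response_factor(load: float) -> float:
    """Simple queuing estimate: at a given load, each request effectively
    sees only the idle fraction of the disk's performance."""
    idle = 1.0 - load
    return 1.0 / idle        # tends to infinity as load approaches 100%

print(response_factor(0.80))   # about 5  -- access time increased 5x
print(response_factor(0.95))   # about 20
```

The curve is gentle at low loads and explodes near 100%, which is exactly the behaviour of an overloaded file server.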
An operating system or file server does not necessarily serve requests in the order they are made. In
the widely used ‘elevator algorithm’, the disk heads are swept alternately inwards and outwards from
track to track, like an elevator (lift) going up and down from floor to floor. In this way, requests are
served in an order that minimises seek time, rather than first-come, first-served order. The more
requests are in the queue, the smaller is the average seek distance. The Phantom II has a track-to-track
seek time of 0.5 ms. Therefore, in the limit, the elevator algorithm could reduce its average access time
from 9 ms to 3.5 ms — nearly tripling its potential throughput.
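One sweep of the elevator algorithm can be sketched as follows (Python, with illustrative names; real disk schedulers are more elaborate and also merge adjacent requests):

```python
def elevator_order(head: int, requests: list[int]) -> list[int]:
    """Serve pending track requests in one upward sweep past the head,
    then sweep back down through the remaining tracks."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return up + down

# Head at track 50, five requests pending:
print(elevator_order(50, [95, 10, 60, 40, 55]))   # [55, 60, 95, 40, 10]
```

Compare this with first-come, first-served order, which would drag the heads back and forth across the disk for the same five requests.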
Finally, an operating system usually sets aside part of RAM as a disk cache. A disk cache is
essentially a large buffer that is shared by all files. Sectors read from disk remain in the cache until
room is needed for other sectors. The sector that is purged is usually determined by the LRU (least
recently used) algorithm. A consequence of this is that a small file may fit entirely within the cache, so
once a sector has been read, future reads from it are virtually free of delay.
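The LRU policy is simple enough to sketch. The following Python fragment (`SectorCache` is a hypothetical name; a real cache would hold whole blocks and handle writes too) keeps the most recently used sectors and purges the least recently used when room is needed:

```python
from collections import OrderedDict

class SectorCache:
    """Minimal LRU disk cache: reads hit the cache when possible;
    the least recently used sector is purged when room is needed."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()      # insertion order = recency order

    def read(self, sector, read_from_disk):
        if sector in self.cache:
            self.cache.move_to_end(sector)      # mark most recently used
            return self.cache[sector]
        data = read_from_disk(sector)           # miss: go to the disk
        self.cache[sector] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # purge least recently used
        return data
```

With a capacity of two sectors, reading sectors 1, 2, 1, 3 touches the disk only three times: the second read of sector 1 is served from the cache, and reading sector 3 purges sector 2, the least recently used.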
What happens when a sector is written? One policy is to update the cache but to delay writing the
change to the disk until the sector is purged. In this way, several writes to the cache may result in only
one write to the disk. But this is a dangerous game. If the power fails, there may be insufficient time
to record the contents of the volatile cache on disk. A safer but slower alternative is to write sectors to
disk every time they are changed.
no moving parts need means of being accessed, so each byte needs to be associated with a reasonably
complex physical structure. Having a small number of read heads that can move relative to a medium
may always prove cheaper. Likewise, the recording density on a uniform medium may always prove
greater than on a structured one. One day, perhaps the moving parts will be manufactured using
nanotechnology, so that even the largest stores will be very fast. Then, for most of us, efficient use of
these stores will no longer be an issue. But however fast they become, it will always be possible to
make even faster stores at somewhat greater cost. In other words, the idea of a storage hierarchy
(registers, RAM, secondary storage and back up) will always be with us. Somewhere, someone will
still have too much data and not enough time.
3 Cobol Basics
To illustrate files and databases, we shall use an example of a wholesale distribution operation:
Serv-U-Rite buys goods in bulk from suppliers and sells them in smaller numbers to customers, who
are typically retail stores. Orders are made through the postal system or by telephone. Serv-U-Rite’s
Cobol database consists of three master files: Suppliers, Customers, and Products. The master files
store information about the status of long-lived objects. The other files are transaction files, which
record business activities, such as a sale or a customer payment.
The case study may seem complex at first. Bear two things in mind: it has to be complicated enough
to illustrate a lot of different points, and real systems are far more complicated than this. We are going
to ignore discounts, taxes, commissions, and a host of other real-life complications.
default, Cobol systems typically store signs in these two bits. This changes the ASCII code to one that
represents some other character, so ‘+$123,456.78’ may well be stored as ‘1234567H’ — if the ASCII
character set is used. The sign is represented by an ‘s’. Because the sign is packed into unused bits,
and the decimal point is implicit, Balance occupies 8 bytes.
Cobol does not store numbers in binary form by default, although it is possible to override this. The
reason is that Cobol programs don’t usually do much arithmetic compared with text input and output.
Storing numbers in decimal saves converting numbers between binary and decimal notation. Many
computers efficiently support packed-decimal arithmetic. Packed-decimal notation packs two decimal
digits per byte, and converting between packed decimal and ASCII is simple and fast.
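Packing two digits per byte is easy to illustrate. A Python sketch (`pack_decimal` is an illustrative name; real packed-decimal formats also reserve a nibble for the sign, which we omit here):

```python
def pack_decimal(digits: str) -> bytes:
    """Pack decimal digits two per byte, one per 4-bit nibble --
    the idea behind packed-decimal storage. Sign handling omitted."""
    if len(digits) % 2:
        digits = "0" + digits            # pad to a whole number of bytes
    return bytes(int(digits[i]) << 4 | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

print(pack_decimal("12345678").hex())    # '12345678': 8 digits in 4 bytes
```

Because each nibble holds one decimal digit, converting to and from the character form is just a matter of splitting and joining nibbles, with no binary-to-decimal division needed.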
Group items — those made up of elementary items, such as Address and Supplier — don’t need
pictures. To a good approximation, they are regarded simply as character strings whose length is the
sum of the lengths of the items they contain. Thus Address effectively has ‘pic x(90)’ and Supplier
effectively has ‘pic x(102)’.
Serv-U-Rite’s Customers file is used to keep track of how much customers owe. It must therefore
contain similar information to the Suppliers file. However, although Serv-U-Rite are willing to owe
large amounts of money to their suppliers, they don’t like their customers to owe them too much.
Accordingly, each customer is set a maximum amount that they may owe, called ‘Credit-Limit’. This
is always a positive multiple of $1,000. Serv-U-Rite also keep track of how much credit a customer
has left, called ‘Available-Credit’. If a customer orders goods that would make ‘Available-Credit’
become negative, the order is rejected.
It might seem that Available-Credit could be calculated as the difference between Credit-Limit and
Balance, but it’s not that simple. If a customer orders goods that are not in stock, Serv-U-Rite auto-
matically create a ‘back order’, a reminder to supply the goods when they become available. These
goods will eventually have to be paid for, so they reduce the Available-Credit, but since the customer
hasn’t received them, they don’t count towards the Balance. We have the following equation:
Available-Credit = Credit-Limit – Balance – Back-Orders
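The order-acceptance rule follows directly from this equation. A sketch in Python (hypothetical names; amounts in dollars):

```python
# Hedged sketch of Serv-U-Rite's credit check, following the equation above.
def available_credit(credit_limit, balance, back_orders):
    return credit_limit - balance - back_orders

def accept_order(credit_limit, balance, back_orders, order_value):
    """Reject any order that would make Available-Credit negative."""
    return available_credit(credit_limit, balance, back_orders) >= order_value

print(accept_order(5000, 1200, 800, 2500))   # True: 3000 available, 2500 wanted
print(accept_order(5000, 1200, 800, 3500))   # False: would go negative
```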
The Cobol record description of a customer record is as follows.
01 Customer.
02 Account pic a999.
02 Address.
03 Name pic x(30).
03 Street pic x(30).
03 Suburb pic x(30).
02 Balance pic s9(6)v99.
02 Credit-Limit pic 999ppp.
02 Available-Credit pic s9(6)v99.
Credit-Limit is never negative, so it does not need a sign, nor does it need a decimal point. Since its
last 3 digits are always zeros, they don’t need to be stored either. The letter ‘P’ indicates an implicit
zero. Credit-Limit therefore occupies only 3 bytes. Available-Credit is like Balance. It uses 8 bytes.
The group item ‘Address’ occupies 90 bytes. The level-1 ‘Customer’ record occupies 113 bytes.
Finally, we need to describe the Product records. Here is the Cobol description.
01 Product.
02 Item-No pic x(6).
02 Description pic x(40).
02 Supplier pic a999.
02 Stock pic 9999.
02 On-Order pic 9999.
02 Reorder-Level pic 9999.
02 Reorder-Qty pic 9999.
02 Price pic 9999v99.
02 Valuation pic 9(6)v99.
Each product is uniquely identified by a 6-character ‘Item-No’, such as ‘ACLOTP’. It has a 40-char-
acter description, such as ‘Alcatel One Touch Phone’. ‘Supplier’ identifies the supplier who currently
sells the product to Serv-U-Rite. It has the same format as the ‘Account’ in the Suppliers file. Serv-U-
Rite’s computer system is intended to reorder products when their stocks become low. The number of
items in stock is given by ‘Stock’. ‘On-Order’ records the number of items already on order from the
supplier, but not yet delivered. If the sum of these two does not exceed ‘Reorder-Level’, a new order
will be created to purchase ‘Reorder-Qty’ items. ‘Price’ indicates the unit price (charge per item) of
the product to customers. Since the cost of items as determined by the supplier varies, it is pointless to
store it. Instead, each product record stores a ‘Valuation’, which measures the total cost of all the
items in stock. When goods are added to stock, Valuation is increased by their true cost; when they are
sold, Valuation is decreased pro rata, i.e., by the average cost per item.
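The pro-rata rule can be illustrated with a short sketch (Python; `receive` and `sell` are hypothetical names):

```python
# Hedged sketch of Valuation bookkeeping: increase by true cost on receipt,
# decrease pro rata (average cost per item) on sale.
def receive(stock, valuation, qty, total_cost):
    return stock + qty, valuation + total_cost

def sell(stock, valuation, qty):
    avg_cost = valuation / stock          # average cost per item in stock
    return stock - qty, valuation - qty * avg_cost

stock, valuation = receive(0, 0.0, 100, 600.0)   # 100 items costing $600
stock, valuation = sell(stock, valuation, 25)    # sell a quarter of them
print(stock, valuation)                          # 75 450.0
```

Selling a quarter of the stock removes a quarter of the Valuation, so the average cost per remaining item is preserved even though individual purchase prices varied.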
Adding up the lengths of the elementary items, we see that each Product record occupies 80 bytes.
If we assume that records are stored within 512-byte blocks, the Suppliers file stores 5 records per
block, the Customers file stores 4 records per block, and the Products file stores 6 records per block.
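These blocking factors follow from simple integer division (a Python sketch, assuming, as stated above, that records never span blocks):

```python
# Whole records per 512-byte block, using the record sizes derived earlier.
BLOCK = 512
RECORD_BYTES = {"Supplier": 102, "Customer": 113, "Product": 80}
per_block = {name: BLOCK // size for name, size in RECORD_BYTES.items()}
print(per_block)   # {'Supplier': 5, 'Customer': 4, 'Product': 6}
```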
All but the most trivial Cobol programs consist of the same four divisions: identification,
environment, data, and procedure.
• The identification division, as the name suggests, identifies the program.
• The environment division links the program to its environment: the operating system. This is
where we expect to find all the operating system dependent features of the program. If we ported
the program to a different operating system, we might expect to make a few changes here, but the
rest of it shouldn’t need to be touched.
• The data division describes all the data used by the program.
• The procedure division defines the executable instructions the computer should follow.
3.3.1 The Identification Division
Here is the identification division again.
000010 IDENTIFICATION DIVISION.
000020 Program-ID. copysupp.
000030
000040* Copies the supplier file.
000050* Written on 20/11/00.
000055* Balance added to Supplier file 04/02/01
First, let’s explain the numbers on the left. Once upon a time, they identified the lines of the program
for editing purposes. However, modern program editors have no use for line numbers, so this is our
last example that will show them. Even so, if we don’t use them, we still have to leave 6 blank
spaces in their place. Cobol assumes that lines consist of 80 columns, used as follows:
1–6 Line number
7 Comment indicator
8–11 Area A
12–72 Area B
73–80 Program identification
Column 7 is also normally blank, but if it contains an asterisk, the whole line following is a comment
and is ignored by the compiler.
Columns 73 onwards are always ignored. Some programmers like to write the date of any revisions
they make to the program there, so that the history of the program can be traced. Again, modern
revision control systems make this unnecessary. However, it is important to remember it, because if
any text strays beyond column 72 it is totally ignored.
Cobol also requires a minimum standard of program layout. Specifically, headings, or Area-A entries,
have to begin in columns 8–11, and other statements (Area-B entries) have to begin in column 12
onwards. Any additional attention to layout is nice to have, but Cobol doesn’t demand it.
Line 000010 is the identification division heading, and must begin in Area A. So must the
paragraph heading on Line 000020. The program-id paragraph gives the program a name. Most
Cobol compilers expect this name to agree with the name of the program file. For example, the
program file containing the ‘copysupp’ program would typically be named ‘copysupp.cbl’ — where
the ‘.cbl’ extension tells the Cobol compiler that it is a program text file.
In the early days of Cobol, the identification division was more complicated. There were standard
ways of noting who wrote the program, when and where it was written, and so on. Today, this
information is written as comments.
3.3.2 The Environment Division
The environment division is where we link the program to the operating system environment.
Because files are visible to both the Cobol program and the operating system, they are usually an
important element of the environment division. If we are using files at all, the environment division,
input-output section, and file-control paragraph headings are all required. These are then followed
here by two select statements, one for the file to be copied (‘Suppliers’), and one for the copy to be
created (‘Saved-Suppliers’). We have already discussed the other features of these statements.
000100 ENVIRONMENT DIVISION.
000110 Input-Output Section.
000120 File-Control.
000130 select Suppliers assign to "newsupp.ndx"
000140 organization is indexed,
000150 record key is Account of Suppliers
000160 access is sequential.
000170 select Saved-Suppliers assign to "oldsupp.ndx"
000180 organization is indexed,
000190 record key is Account of Saved-Suppliers
000200 access is sequential.
3.3.3 The Data Division
If a program uses any variables at all, the data division heading is required, and if it uses any files, so
is the file section heading. Each file then needs to be described by an FD (File Definition) entry. This
consists of the name of the file, followed by the record descriptions of the records it contains. We have
already discussed the description of the Suppliers file. Since both files have the same layout, their
record descriptions are identical.
000300 DATA DIVISION.
000310 File Section.
000320 FD Suppliers.
000330 01 Supplier.
000340 02 Account pic a999.
000350 02 Address.
000360 03 Name pic x(30).
000370 03 Street pic x(30).
000380 03 Suburb pic x(30).
000385 02 Balance pic s9(6)v99. 04/02/01
000390 FD Saved-Suppliers.
000400 01 Supplier.
000410 02 Account pic a999.
000420 02 Address.
000430 03 Name pic x(30).
000440 03 Street pic x(30).
000450 03 Suburb pic x(30).
000455 02 Balance pic s9(6)v99. 04/02/01
3.3.4 The Procedure Division
The Cobol procedure division specifies the program logic. It is usually divided into a number of
short procedures. This is an unfortunate name for them, because Cobol procedures do not correspond
to what are called procedures in other languages; they don’t have parameters. Cobol calls
parameterised procedures ‘programs’. Thus, it is possible to nest one Cobol program inside another,
or link to a library program. Cobol procedures are best thought of as refinements. We sketch the
outline of the program, then fill in the details later as refinements.
Let’s consider the procedure division one refinement at a time. After the procedure division heading,
comes the paragraph heading for the first refinement, ‘Process-All-Suppliers’.
000500 PROCEDURE DIVISION.
000510 Process-All-Suppliers.
000520 open input Suppliers, output Saved-Suppliers
000530 perform Get-Next-Supplier
000540 perform until Account of Suppliers = high-values
000550 perform Copy-One-Supplier
000560 perform Get-Next-Supplier
000570 end-perform
000580 close Suppliers, Saved-Suppliers
000590 stop run.
The program begins at Line 000520 by opening the existing supplier file for input, and its new copy
for output. Opening a file for input means checking that the file exists and finding where it is stored on
disk. This information is stored in the operating system directory structure, so the environment divis-
ion entry for ‘Suppliers’ is consulted to translate the name into ‘newsupp.ndx’, which the operating
system can understand. Similarly, opening the output file means that the operating system will create a
directory entry for it. If a file named ‘oldsupp.ndx’ already exists, it might be over-written. After the
open statement, ‘Suppliers’ is poised to read its first record, and ‘Saved-Suppliers’ is poised to write
its first record.
The program then does whatever is needed to read the first Suppliers record (Line 000530). The
perform verb indicates that the program will execute the procedure ‘Get-Next-Supplier’, which we
will refine shortly.
Lines 000540–000570 form a loop. The loop repeatedly performs ‘Copy-One-Supplier’ and ‘Get-Next-Supplier’. This goes on until every record has been copied. This is signalled by the condition
‘Account of Suppliers = high-values’. Why we test this particular condition will be explained later.
The important thing to notice is that a loop is needed. Cobol processes files one record at a time. The
assumption is that the whole file will not fit into memory, but individual records will.
Confusingly, Cobol uses the perform verb for two unrelated purposes: to execute a procedure or, in conjunction with end-perform, to delimit a loop. The reason is historical.
After the loop, Line 000580 closes the two files. Among other things, this ensures that the last block
of the Saved-Suppliers file is written to disk and that its directory entry is made to show which blocks
it contains.
Last, but not least, Line 000590 stops the program, returning control to the operating system. For
historical reasons, this does not happen automatically. If you forget to return control to the operating
system, all sorts of strange things can happen.
We now have two refinements to consider: ‘Get-Next-Supplier’ and ‘Copy-One-Supplier’.
001000 Get-Next-Supplier.
001010 read Suppliers next record,
001020 at end
001030 move high-values to Account of Suppliers
001040 end-read.
‘Get-Next-Supplier’ consists of a single read statement. The first time we execute this statement, it
will make the first record of the Suppliers file available in what is called the file’s current record area.
You can think of this area as the record defined in the file section, within the FD for the Suppliers file.
The second time ‘Get-Next-Supplier’ is executed — which is at the end of the first iteration of the loop — the second record of the Suppliers file will be made available in the current record area. Each
iteration of the loop will make a new record available, until the last one is reached. After this, the read
statement cannot make a new record available, so instead, it activates its at end clause. This causes a
special value to be moved (ie, copied) into the first 4 bytes of the current record area (Line 001030).
Note that the end-read delimiter is needed to mark the end of the scope of the at end clause.
The special value used is a predefined ‘figurative constant’ called high-values. This denotes one or
more bytes containing binary 1s, i.e. character code 255. On most computers this is not a printable
character. We can safely assume that no valid account code will
match high-values, and in fact high-values is greater than any valid account code.
If we consider the behaviour of ‘Account of Suppliers’ throughout the program, we can see it will
increase steadily. This is because the Suppliers file has indexed organization, so records will be read in
order of increasing primary key. When the end of file is reached and ‘Account of Suppliers’ finally is
set to high-values, the loop will exit and the program will terminate.
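The sentinel idea does not depend on Cobol. The following Python sketch (an illustration, not Cobol; the sample account codes are invented) demonstrates the property that makes high-values work: a key of all-ones bytes compares greater than any valid account code.

```python
# Python sketch (not Cobol): why high-values works as an end-of-file
# sentinel. A byte of all binary 1s (code 255) sorts after every
# printable character, so a key made of such bytes compares greater
# than any valid account code.
HIGH_VALUES = "\xff" * 4          # same width as an account code

accounts = ["A001", "B003", "Z999"]   # keys arrive in ascending order
assert all(a < HIGH_VALUES for a in accounts)

# The 'at end' clause moves HIGH_VALUES into the key, so the loop
# condition 'Account of Suppliers = high-values' finally becomes true.
key = HIGH_VALUES
assert key == HIGH_VALUES and key > max(accounts)
```

Because the file is read in ascending key order, the sentinel is simply the largest possible key, so the loop condition needs no separate end-of-file flag.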
002000 Copy-One-Supplier.
002010 move Supplier of Suppliers to Supplier of Saved-Suppliers
002020 write Supplier of Saved-Suppliers
002030 invalid key
002040 stop run
002050 end-write.
Copying a supplier record requires two steps: first the record must be copied from the current record area
of the Suppliers file to the current record area of the Saved-Suppliers file (Line 002010), then it must
be written to the file (lines 002020–002050). When the record is moved, it is copied as a whole.
There is no need to move each of its components separately.
The write statement adds one record to the file at a time. Because we are dealing with an indexed
file, Cobol requires that the write statement has an invalid key clause; since the file is in sequential
access mode, the records written to it must be written in ascending order of ‘Account’. It is impossible
for such a sequence error to occur in this program, because the input file is being read in ascending
order. Even so, we must specify something here, so we make the program return control to the
operating system.
A Cobol program always begins by executing the first procedure in its procedure division. This will
typically perform other procedures. The order in which these are written doesn’t matter. However, a
useful convention is to place each procedure somewhere after the last line where it is performed. Then,
someone reading a perform statement knows to look for the procedure somewhere further on rather
than further back.
Because we have specified in the environment division that the Saved-Suppliers file is indexed and its
record key is Account, the Cobol run-time system will ensure that an index is created for the file, and
saved to disk when the file is closed. It may seem strange that this information is written in the
environment division. The reason is that, although Unix and DOS don’t do it, many operating systems
provide special support for indexed files. Therefore the primary key information is of concern to the
operating system.
Why did we say using the program was better than copying the file using an operating system
command? Because when an indexed file is written sequentially, its records are stored sequentially
and the file is well organised and efficient to use. Once the file has been modified by updating, it can
get into quite a mess. Most operating systems would simply copy the mess.
We need to be careful in the use of the current record areas. In certain situations their contents are
said to be ‘undefined’. This means we should not rely on them having any particular value. For
example, when end of file is detected, an input area may still contain the last record of the file, but we
can’t rely on it. Likewise, after a record has been written, an output area may still contain the record
that was written, but again we can’t rely on it. It depends on the Cobol run-time system. In some
cases the output record area is copied to a buffer where the current block of the file is being built. In
other cases it is a part of the block buffer itself, indicated by a pointer associated with the file. In this
case, when the record is written the pointer moves on to another part of the buffer, which might contain
anything.
Worse still, if a file has not yet been opened or has already been closed, its current record area might
not even exist!
3.3.5 Qualification
The example program has made use of qualified names, such as ‘Account of Suppliers’, or ‘Supplier
of Saved-Suppliers’. This is necessary because there are two elementary items named Account, and
two records named Supplier. A name is fully qualified when it is qualified by the name of every
structure of which it is a part. For example ‘Street of Address of Supplier of Suppliers’ is a fully
qualified name. Cobol doesn’t require names to be qualified more than is necessary. Since the
Suppliers file definition only contains one item called ‘Street’, ‘Street of Suppliers’ is enough to
distinguish it from ‘Street of Saved-Suppliers’.
Incidentally, the word ‘in’ can be used interchangeably with ‘of’. If at line 002010 we had preferred
to write ‘move Supplier in Suppliers to Supplier in Saved-Suppliers’, that would have had exactly the
same effect. The choice is merely a matter of taste.
In these notes, we shall be careful to make sure that no two items within a file have the same name.
Therefore it will always be enough to give the item name and file name, as in ‘Street of Suppliers’.
This is just a private convention. It isn’t a rule of Cobol.
3.4 Flowcharts
A flowchart is a diagram that shows the flow of control in a program. Flowcharts were once used as a step
in program design, but are no longer recommended for this purpose, because it is possible to draw a
flowchart that can’t be written as a properly structured program. However, they do help some people
understand programs better, so flowcharts will sometimes be used in these notes.
Flowcharts use three main kinds of boxes: rectangles represent actions executed by the program,
diamonds represent conditions the program tests, and wedge-shaped connectors mark its beginning and
ending. The boxes are linked by arrows that show how control passes from one box to another.
Actions have only one arrow leaving them, but conditions have two or more, each marked with a
possible result of the test (typically ‘true’ or ‘false’). Here is the flowchart of the copy program.
(Flowchart of the copy program: a start connector leading to actions that open both files and read the first record, then an ‘End of input?’ test whose True branch closes the files and stops, and whose False branch copies, writes and reads before returning to the test.)
Begin reading at the start connector. The program will be seen to open both files, then read the first
input record. This action is followed by a test for end of input, with flow continuing to the right if the
end of input is detected, or downwards if it is not. Assuming that the end of input is not detected, the
program copies the input record to the output area, writes the new output record, and then reads the
next input record. Control returns to the top of the loop, which is the test for end of input. Control will
flow round and round this loop once per record, until the input file is exhausted. When the end of
input is detected, the program will close both files, then stop.
4 An Example Sub-System
To understand transaction files, we first need to examine the structure of a typical Cobol sub-system.
A sub-system consists of a collection of files and programs having a unified purpose. The programs
cooperate in the use of master files, and they communicate with each other via transfer files. Sub-
systems are often described by run diagrams. The following run diagram describes the sub-system
that deals with deliveries from suppliers and payments to suppliers.
Documents, such as deliveries, payments, etc. (1) are entered using a data preparation program (2) to
produce a transaction file (3). This file is read by a program (4) that, in the case of a delivery, checks
that the Item-No on each order is recorded on the Products file (9), adjusts the stock and valuation of
the product, then writes a record on the transfer file (5). In turn, this file is read by a program (6),
which checks that the account code on the order refers to a record on the Suppliers file (10), and debits
the balance owed to the supplier. In the case of Payments, the first program (4) simply copies the
transaction record to the transfer file (5), and the second program (6) credits the balance. The Stock
update program (4) produces a report (7) that displays the updated stock information. The Supplier
update program (6) reports the updated supplier Balances (8).
Don’t assume that the programs are run equally often. An operator may use the data entry program
several times to build up a batch of transactions. A batch may then be processed by the Update Stock
program. Several of these batches may be accumulated before updating the Suppliers file.
Apart from a brief description of each process, a run diagram does not attempt to show the logic of the
programs. Nor does it specify the times when the programs are run. Although it makes no sense to
process transactions before they are prepared, programs often run to a fixed schedule, so it can happen.
Therefore, we need to be careful that programs won’t fail if their transaction files happen to be empty.
fractions of a second. Every transaction record will contain a time-stamp. Another thing that every
transaction record needs is a code to indicate the kind of transaction.
Here is how Cobol describes a Delivery record.
01 Delivery.
02 Time-Stamp.
03 YY-MM-DD pic 9(6).
03 HH-MM-SS pic 9(6).
02 Kind pic x.
02 Item-No pic x(6).
02 Account pic a999.
02 Qty-Delivered pic 9999.
02 Cost pic 9(6)v99.
‘Time-Stamp’ has already been explained. ‘Kind’ is a single character that indicates the kind of
transaction, in this case the letter ‘D’. ‘Account’ specifies the supplier making the delivery and
‘Item-No’ specifies the product being delivered. ‘Qty-Delivered’ says how many items of the product
were delivered, and ‘Cost’ is what the delivery cost.
A supplier Payment record is defined as follows.
01 Payment.
02 Time-Stamp-2.
03 YY-MM-DD-2 pic 9(6).
03 HH-MM-SS-2 pic 9(6).
02 Kind-2 pic x.
02 Item-No-2 pic x(6).
02 Account-2 pic a999.
02 Amount pic 9(6)v99.
A payment is indicated by a Kind coded as ‘$’. ‘Account-2’ specifies the supplier being paid,
and ‘Amount’ is the amount paid. ‘Item-No-2’ is a dummy value whose use will be
explained in a later section.
Some explanations are now in order, which are easier to understand from a diagram:
01  Delivery
02  Time-Stamp               | Kind   | Item-No   | Account   | Qty-Delivered | Cost
03  YY-MM-DD | HH-MM-SS

01  Payment
02  Time-Stamp-2             | Kind-2 | Item-No-2 | Account-2 | Amount
03  YY-MM-DD-2 | HH-MM-SS-2
Delivery records occupy 35 bytes, but Payment records are 4 bytes shorter. However, the first 12
bytes of either record contain the time-stamp, and the 13th byte always contains the kind, even though
the items have different identifiers. Suppose the program is reading transaction records from a file
called ‘Updates’. When it reads a record into the current record area, it can’t tell which kind of record
it has read until it checks the value of ‘Kind’. Logically, if it is a delivery, it should test ‘Kind of
Updates’; if it is a payment, it should test ‘Kind-2 of Updates’. Which should it test?
Actually, it doesn’t matter. Both names refer to the same byte in the current record area, so it doesn’t
matter which is used. The two names mean the same thing; they are aliases or synonyms. Writing
‘Kind of Updates’ refers to the 13th byte of the record area. If the record area actually contains a
payment record, Cobol isn’t smart enough to care.
What would not work is to change the identifier ‘Kind-2’ to ‘Kind’. Both items would then be
called ‘Kind of Updates’. The catch is that the Cobol compiler will regard this as ambiguous;
it could mean ‘Kind of Delivery of Updates’ or it could mean ‘Kind of Payment of Updates’. The
compiler is not clever enough to deduce that the distinction doesn’t matter.
It is up to the programmer to handle different record types properly, or strange things can happen:
Suppose ‘Kind of Updates’ contains ‘$’, so that the record area contains a Payment, but the program
refers to ‘Qty-Delivered of Updates’, which is only present in a Delivery record. It will see bytes
24–27 of the record, which are the first 4 bytes of ‘Amount of Updates’. Similarly, if it refers to ‘Cost
of Updates’ it will see the last 4 bytes of ‘Amount of Updates’ followed by 4 bytes of garbage from
beyond the end of the record. Conversely, a program that refers to ‘Amount of Updates’ when a
Delivery record is present will see 4 bytes from the Qty-Delivered, and the first 4 bytes of Cost. On
the other hand, the program can safely refer to ‘Time-Stamp’, ‘YY-MM-DD’, ‘HH-MM-SS’, ‘Kind’,
‘Account’ and ‘Item-No’ irrespective of what kind of record is present, because these occupy the same
positions in both kinds of record.
In what follows, we shall use the convention that names with suffixes, like ‘Kind-2’, are for
documentation only. We won’t refer to them. They could be omitted without affecting the program.
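The byte-level aliasing described above can be simulated outside Cobol. Here is a Python sketch (not Cobol, and not part of the sub-system): it lays the two record descriptions over one 35-byte record area. The field offsets follow the layouts given earlier; the sample record contents are invented.

```python
# Python sketch (not Cobol) of how Delivery and Payment share one
# current record area. Offsets follow the record layouts in the text:
# 12-byte Time-Stamp, 1-byte Kind, 6-byte Item-No, 4-byte Account, ...
record_area = bytearray(35)    # big enough for the longer (Delivery) record

DELIVERY = {"Time-Stamp": (0, 12), "Kind": (12, 13), "Item-No": (13, 19),
            "Account": (19, 23), "Qty-Delivered": (23, 27), "Cost": (27, 35)}
PAYMENT  = {"Time-Stamp-2": (0, 12), "Kind-2": (12, 13), "Item-No-2": (13, 19),
            "Account-2": (19, 23), "Amount": (23, 31)}

def field(layout, name):
    """Return the bytes a qualified name denotes in the record area."""
    lo, hi = layout[name]
    return bytes(record_area[lo:hi])

# Store an (invented) 31-byte payment record in the area.
record_area[0:31] = b"010402120000$ITEM99A0010012345{"

# 'Kind' and 'Kind-2' name the same 13th byte; they are aliases.
assert field(DELIVERY, "Kind") == field(PAYMENT, "Kind-2") == b"$"

# Referring to Qty-Delivered while a Payment is present yields the
# first 4 bytes of Amount -- exactly the aliasing described above.
assert field(DELIVERY, "Qty-Delivered") == field(PAYMENT, "Amount")[:4]
```

The sketch makes the point that the names carry no run-time type information: they are just windows onto byte positions, and it is the programmer's job to consult ‘Kind’ before trusting the rest.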
The procedure division also follows the previous example closely, at least at first:
PROCEDURE DIVISION.
Process-All-Updates.
open input Updates, output Saved-updates
perform Get-Next-Update
perform until Time-Stamp of Updates = high-values
perform Copy-One-Update
perform Get-Next-Update
end-perform
close Updates, Saved-Updates
stop run.
Reading a record from the Updates file is similar to reading one from the Suppliers file, even though
the Updates file contains more than one kind of record. A read statement refers to a file; it can’t
possibly refer to a particular record. It can’t know what kind of record to read until it has already read it!
Get-Next-Update.
read Updates next record,
at end
move high-values to Time-Stamp of Updates
end-read.
Writing records must be done with care. When Delivery records are written, 35 bytes need to be
recorded. When payment records are written, 31 bytes need to be recorded. Cobol requires the
program to specify what kind of record is being written. The program must contain two different
write statements, so there are two cases to deal with. The rule is, ‘Read the file, write the record.’
Cobol deals with case analysis using the evaluate statement. Evaluating ‘Kind of Updates’ yields
two possible values, ‘D’ for a delivery, or ‘$’ for a payment. The two when clauses deal with these
cases by executing the proper move and write statements. End-evaluate marks the end of the final
when clause.
Copy-One-Update.
evaluate Kind of Updates
when "D"
move Delivery of Updates to Delivery of Saved-Updates
write Delivery of Saved-Updates
when "$"
move Payment of Updates to Payment of Saved-Updates
write Payment of Saved-Updates
end-evaluate.
The main difference between this and the earlier example is that although a master file often contains
only one kind of record, we expect a transaction file to contain several kinds.
By defining a pair of constants, we can write this paragraph in an alternative, self-documenting way.
We first define some suitable constants in the working-storage section of the data division:
Working-Storage Section.
77 Delivery-Code pic x value "D".
77 Payment-Code pic x value "$".
We may then write,
Copy-One-Update.
evaluate Kind of Updates
when Delivery-Code
move Delivery of Updates to Delivery of Saved-Updates
write Delivery of Saved-Updates
when Payment-Code
move Payment of Updates to Payment of Saved-Updates
write Payment of Saved-Updates
end-evaluate.
Later, if the coding scheme were changed, only the constant definitions would need to be modified;
‘Copy-One-Update’ could remain the same.
(Flowchart of the transaction-copying program: a start connector, an ‘End of input?’ test whose True branch leads to Stop, and whose False branch chooses between ‘Move delivery record to output’ and ‘Move payment record to output’.)
Process-All-Updates.
open input Updates, output Saved-updates
perform with test after
until Time-Stamp of Updates = high-values
perform Get-Next-Update
if Time-Stamp of Updates not = high-values
perform Copy-One-Update
end-if
end-perform
close Updates, Saved-Updates
stop run.
Programmers who prefer this style reason that it makes more sense for the loop to read a record, then
process it. But this creates two exceptions: when the end of file is reached, the final read does not
return a record, so we have to be careful not to process it. Also, until we read the first record, we
cannot be sure that the record area doesn’t contain high-values, so we have to avoid testing the loop
condition before the first iteration.
Another style, recommended in at least one textbook, is as follows.
Process-All-Updates.
open input Updates, output Saved-updates
perform until 1 = 2
read Updates next record,
at end
close Updates, Saved-Updates
stop run
not at end
perform Copy-One-Update
end-read
end-perform.
We don’t recommend using either of these approaches. They work well enough when only one input
file is read, but they don’t adapt to reading more than one file at the same time. We have started as we
mean to go on. The read at the start of the procedure, we call a priming read. Together with the open
statement, it gets the first record into the record area. The read at the end of the loop, we call a
refreshing read. Once we have finally done with a record, the read replaces it by the next one.
In any case, we should not be embarrassed by having two read statements. If the file contains N
records, it is convenient for the loop to have N iterations, one for each record to be processed. But the
program must execute N+1 reads; the last one will return the at end condition instead of a record.
Therefore it is necessary to have one read outside the loop — and it can hardly come after it!
It is good style to keep all the input logic in one paragraph. For example, burying a read operation in
Copy-One-Update would make the program just that bit harder to understand.
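The priming-read/refreshing-read shape is not specific to Cobol. The following Python sketch (an illustration, not Cobol; read and process are hypothetical stand-ins for the Updates file and the copying paragraph) shows the N-records, N+1-reads structure recommended above.

```python
# Python sketch (not Cobol) of the priming-read loop shape.
# A hypothetical read() returns None at end of file, playing the
# role of the at end clause.
def copy_all(read, process):
    record = read()            # priming read: get the first record
    while record is not None:  # N iterations for N records
        process(record)        # Copy-One-Update
        record = read()        # refreshing read: replace it with the next
    # N records mean N+1 reads: the final read signals end of file.

data = iter(["D-rec", "$-rec"])
read = lambda: next(data, None)
out = []
copy_all(read, out.append)
assert out == ["D-rec", "$-rec"]
```

Keeping both reads in view, one priming and one refreshing, is what lets the same shape extend later to reading several files at once.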
Interacting With The Operator
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Customers assign to "newcust.ndx"
organization is indexed
record key is Account of Customers
access is sequential.
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
The main procedure is similar to the other read loops we have seen:
PROCEDURE DIVISION.
List-All-Customers.
open input Customers
perform Get-Next-Customer
perform until Account of Customers = high-values
perform Display-Customer
perform Get-Next-Customer
end-perform
close Customers
stop run.
So is the means of reading the Customers file:
Get-Next-Customer.
read Customers next record,
at end
move high-values to Account of Customers
end-read.
Listing a customer record can be trivial.
Display-Customer.
display Customer of Customers.
However, the simple display statement has a major drawback, which we experience as soon as we run
the program:
A001Autobarn Elizabeth 61 Elizabeth Way Elizabeth SA 5
158 0002085{0050047915{
B003BCR Mobile Installations 25 Sydney Street Ridgehaven SA
5058 0000000{0020020000{
B007Blaupunkt Cnr Centre and McNaughton Rds Clayton VIC 31
09 0102950{0250125372E
B012Bobs Electronic Repairs 28 Limbert Avenue Seacombe Garden
s SA 5047 0001295}0010011295{
and so on ...
By displaying customer records, we see how they are represented on file. The 113 bytes won’t fit on
one line of the monitor, and although the Account code and Address are readable, Balance,
Credit-Limit and Available-Credit are confusing, to say the least. Thus, in the case of account A001, Balance
is ‘0002085{’. This is less mysterious if we realise that ‘{’ is actually a zero with an extra bit set on to
indicate a ‘+’ sign. Remembering that the amount shown is in cents, Balance is therefore $208.50. The
next 3 bytes represent the (unsigned) Credit-Limit, in thousands of dollars, so ‘005’ is really $5,000.
Finally, ‘0047915{’ is the Available-Credit in cents, ie, $4,791.50. (Since the sum of Balance and
Available-Credit equals Credit-Limit, we can deduce that this customer has nothing on back order.)
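The trailing ‘{’, ‘}’ and letter characters are one common ASCII convention for ‘overpunched’ signs, in which the last byte carries both the final digit and the sign. As a rough illustration, here is a Python sketch (not Cobol) that decodes the figures shown above; the convention assumed is ‘{’ for +0, ‘A’ to ‘I’ for +1 to +9, ‘}’ for −0 and ‘J’ to ‘R’ for −1 to −9.

```python
# Python sketch (not Cobol): decoding the trailing sign character of
# a display-format field such as Balance '0002085{'.
POSITIVE = {"{": 0, **{chr(ord("A") + d - 1): d for d in range(1, 10)}}
NEGATIVE = {"}": 0, **{chr(ord("J") + d - 1): d for d in range(1, 10)}}

def decode_zoned(text):
    """Return the signed integer value of a display-format numeric field."""
    body, last = text[:-1], text[-1]
    if last in POSITIVE:
        return int(body) * 10 + POSITIVE[last]
    if last in NEGATIVE:
        return -(int(body) * 10 + NEGATIVE[last])
    return int(text)              # plain unsigned digits

# Account A001's Balance, in cents: +20850, i.e. $208.50.
assert decode_zoned("0002085{") == 20850
# Account B012's Balance ends in '}': a negative amount, -$129.50.
assert decode_zoned("0001295}") == -12950
# Blaupunkt's Available-Credit ends in 'E' (+5): $12,537.25.
assert decode_zoned("0125372E") == 1253725
```

The exact characters used vary between compilers and character sets, so treat the tables above as one plausible convention rather than a universal rule.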
We therefore have the task of editing the numbers to show money amounts. Cobol makes this easy.
We have already seen examples of picture clauses for numeric and alphanumeric data. Here is an
example of an edited-numeric picture: ‘$$$$,$$9.99bdb’. The ‘$’ signs show the positions where a
dollar sign, space or digit might go, depending on the value of the number. The ‘$’ sign is said to
‘float’. The comma shows where a comma, space or dollar sign might go, again depending on the
value of the number. A ‘9’ shows where a digit goes, irrespective of the value of the number. The ‘.’
displays an actual decimal point. The letter ‘b’ indicates a blank space. The pair of letters ‘db’ will
display as ‘DB’ if the number is negative, but will otherwise display as two blanks.
This is only one way to show a sign. The combination ‘cr’ works similarly to ‘db’. Which of these is
used is an accounting convention. The general rule is to display ‘cr’ if a negative amount is to the
benefit of the person for whom the report is intended, but to display ‘db’ if it is to their disadvantage.
Here, a negative balance owing means that Serv-U-Rite owe money to the customer, so we have used
‘db’. When in doubt, ask an accountant.
Other ways to show a sign are by means of ‘+’ and ‘–’. Both display as ‘–’ when the number is
negative. The difference is that ‘+’ displays as ‘+’ when the number is positive, but ‘–’ displays as a
blank. The sign can be placed either before or after the number. If the sign is to appear immediately
before the first digit, a series of signs should be written, as in ‘----,--9.99’. A ‘-’ sign will then either
be displayed as a blank or an actual sign, depending on the size of the number.
Two other symbols can be used to replace zeros at the start of a number: ‘z’ and ‘*’. A ‘z’ either
prints as a digit or a blank. A ‘*’ either prints as a digit or an asterisk. This is used on bank cheques to
prevent fraudulent alteration.
Finally, ‘b’, as we have seen, inserts a blank, and ‘/’ inserts a slash, as in a date. The reason ‘b’ is
needed is that a picture cannot include spaces. For example, we must write ‘x(30)’, not ‘x (30)’.
Here are some examples of how eight different pictures cause three different data values to be edited.
Study them carefully. (A ‘␣’ marks a space produced by the editing.)

Picture          -123,456.78      +123,456.78      0
999999.99        123456.78        123456.78        000000.00
zzzzzz.99+       123456.78-       123456.78+       .00+
zzz,zzz.zzb-     123,456.78␣-     123,456.78␣␣     (all spaces)
----,--9.99      -123,456.78      123,456.78       0.00
$***,**9.99      $123,456.78      $123,456.78      $***,**0.00
$$$$,$$9.99cr    $123,456.78CR    $123,456.78␣␣    $0.00␣␣
$$$,$$9.99cr     $23,456.78CR     $23,456.78␣␣     $0.00␣␣
99/99/99         12/34/56         12/34/56         00/00/00
Armed with this information, we can see that a picture of ‘$$$$,$$9.99bdb’ is just what we need to
produce the desired output format. The same picture can serve for all the amounts concerned, although
‘Credit-Limit’ cannot actually be negative.
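To make the effect concrete, here is a Python sketch (not Cobol, and only an approximation: it ignores the fixed field width that a real picture implies) of what ‘$$$$,$$9.99bdb’ does to an amount held in cents.

```python
# Python sketch (not Cobol) approximating the picture '$$$$,$$9.99bdb':
# floating dollar sign, comma insertion, two decimal places, and
# ' DB' shown only when the amount is negative (blanks otherwise).
def edit_amount(cents):
    """Format an integer number of cents roughly as the picture would."""
    sign = " DB" if cents < 0 else "   "
    return "${:,.2f}{}".format(abs(cents) / 100, sign)

assert edit_amount(20850) == "$208.50   "      # A001's Balance
assert edit_amount(-479150) == "$4,791.50 DB"  # a hypothetical debit
```

Note that a real edited picture also fixes the total field width, right-justifying the amount within it; the sketch above produces only the significant characters.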
What we don’t do is to modify the pictures used in the Customers record. This would be silly for two
reasons: First, it would make the records longer, by including redundant characters. Second, it would
not be possible to do arithmetic on the items in the record. Edited-numeric data are more like character
strings than numbers. Instead, we must introduce a working variable into the data division. It is not
part of a file, so it goes in the working-storage section.
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
Working-Storage Section.
77 Edited-Amount pic $$$$,$$9.99bdb.
Ordinary level numbers can range from 01 to 49. The special level number 77 shows that ‘Edited-
Amount’ is not part of a data structure. (The compiler is therefore free to align the item on a 32-bit or
64-bit word boundary if that would make the program more efficient.)
We can now write a more sophisticated version of ‘Display-Customer’:
Display-Customer.
move Credit-Limit of Customers to Edited-Amount
display Account of Customers, space, Name of Customers,
" Credit Limit: ", Edited-Amount
move Balance of Customers to Edited-Amount
display " ", Street of Customers,
" Balance Owing: ", Edited-Amount
move Available-Credit of Customers to Edited-Amount
display " ", Suburb of Customers,
" Available credit: ", Edited-Amount
display spaces.
The move statement does not copy data blindly; the numbers are scaled to have the decimal points in
the correct places. The implicit decimal point in the picture of ‘Balance’ is aligned with the actual
decimal point in ‘Edited-Amount’, effectively converting cents to dollars and cents. Likewise, the
implicit zeros in ‘Credit-Limit’ are replaced by actual zeros, expanding thousands to dollars and cents.
The display statement takes a list of operands, which can be a mixture of variables and constants.
Normally, each display statement causes a new line. ‘Display spaces’ displays a blank line.
Some programmers will prefer the following alternative style. It is long-winded, but it perhaps makes
it easier to get the spacing right:
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
Working-Storage Section.
01 Edited-Customer.
02 First-Line.
03 Account pic a999.
03 pic x value space.
03 Name pic x(30).
03 pic x(19) value " Credit Limit: ".
03 Credit-Limit pic $$$$,$$9.99.
02 Second-Line.
03 pic x(5) value spaces.
03 Street pic x(30).
03 pic x(19) value " Balance Owing: ".
03 Balance pic $$$$,$$9.99bdb.
02 Third-Line.
03 pic x(5) value spaces.
03 Suburb pic x(30).
03 pic x(19) value " Available Credit: ".
03 Available-Credit pic $$$$,$$9.99bdb.
Display-Customer.
move Account of Customers to Account of Edited-Customer
move Name of Customers to Name of Edited-Customer
move Street of Customers to Street of Edited-Customer
move Suburb of Customers to Suburb of Edited-Customer
move Credit-Limit of Customers
to Credit-Limit of Edited-Customer
move Balance of Customers to Balance of Edited-Customer
move Available-Credit of Customers
to Available-Credit of Edited-Customer
display First-Line of Edited-Customer
display Second-Line of Edited-Customer
display Third-Line of Edited-Customer
display spaces.
It turns out that we will need to display customer records in several more examples to follow, so we
will assume that the working-storage entries are stored in the file ‘editcust.cbl’ and the listing
procedure is contained in the file ‘dispcust.cbl’. Once this is done, instead of the above, we can write,
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
Working-Storage Section.
copy "editcust.cbl".
and,
Display-Customer.
copy "dispcust.cbl".
We can develop a similar program to list the Products file. Without some cues, its output would be
difficult to understand. The program should display output like this:
Listing product records ...
and so on ...
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Products assign to "newprod.ndx"
organization is indexed,
record key is Item-No of Products
access is sequential.
DATA DIVISION.
File Section.
FD Products.
copy "product.cbl".
Working-Storage Section.
copy "editprod.cbl".
PROCEDURE DIVISION.
Process-All-Products.
display "Listing product records ..."
display spaces
open input Products
perform Get-Next-Product
perform until Item-No of Products = high-values
perform Display-Product
perform Get-Next-Product
end-perform
close Products
display "Listing complete."
stop run.
Get-Next-Product.
read Products next record
at end
move high-values to Item-No of Products
end-read.
Display-Product.
copy "dispprod.cbl".
5.2 Constants
We have already seen some examples of alphanumeric literals. They are character strings enclosed in
double quotation marks. If a quotation mark appears as part of a literal, it must be written twice,
otherwise the compiler will assume it marks the end of the literal. Numeric literals are straightforward,
and don’t need quotation marks. There is therefore a distinction between -123456.78, which is
numeric, and "-123456.78", which is alphanumeric. When a ‘.’ character is used as a decimal point, it
must be directly followed by a digit. A ‘.’ used as a period end marker must be followed by white
space.
Cobol also has some built-in figurative constants, which define constants that are confusing or
impossible to write:
quote         "
comma         ,
space         a blank
high-value    ASCII 255
low-value     ASCII 0
zero          either a numeric or alphanumeric zero, depending on context.
These names may also be written in the plural as quotes, commas, spaces, high-values, low-values,
and zeros or zeroes. The effect is the same as in the singular, and the choice is one of taste.
Variables can be assigned values in the data division, using a value clause. This does not make these
variables into true constants. The value is moved to the variable at the start of the program, initialising
it, but the program can modify any variable later if desired.
In an alphanumeric move, if the receiving field has more characters than the sending field, the extra
positions are filled with spaces (blank fill). If it has fewer, the excess characters are lost (truncation). Normally the
left-most bytes are aligned, but if the receiving field has the justified option, the right-most bytes are
aligned instead.
Most moves between fields of different types are allowed, but cause type conversion to occur. In
particular, all moves involving group items are treated as alphanumeric moves of the whole item.
One of the more useful consequences is that constants do not have to be the same length as the items
they are moved to. For example, we may write,
move "Not Known" to Address of Customers
Because of the blank fill rule, this would set ‘Name of Customers’ to “Not Known” followed by 21
spaces, and would set all of ‘Street of Customers’ and ‘Suburb of Customers’ to spaces too.
One of the less useful consequences of these rules is that moves are allowed between any pair of
group items. For example,
move Supplier of Suppliers to Customer of Customers
would copy the Account, Address and Balance of a Supplier record to a Customer record, but because
of blank fill, would set its Credit-Limit and Available-Credit to spaces rather than zeros.
A move in the opposite direction:
move Customer of Customers to Supplier of Suppliers
would copy Account, Address and Balance, but because of truncation, Credit-Limit and Available-
Credit would be lost.
However, a move such as
move Order-Item of Updates to Supplier of Suppliers
would result in chaos. For example, ‘Account of Suppliers’ will contain the first 4 bytes of ‘Time-
Stamp of Updates’. Nonetheless, it is legal. Any Cobol compiler will accept it without a murmur.
5.4 Accept
The accept statement reads data from the keyboard. When a program executes an accept statement,
it pauses, waiting for something to be typed. Once the operator hits the RETURN key, the data is
moved to its destination. At least, that is what happens in theory. Some systems move the data as soon
as the last character that will fit the destination is typed. Others are prepared to apply special rules to
numeric data. For example, typing ‘$123,456.78’ might be treated the same as typing ‘123456.78’; in
other words, all non-numeric characters are ignored.
Special forms of the accept statement are also used to obtain information from the operating system.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Suppliers assign to "newsupp.ndx"
organization is indexed
record key is Account of Suppliers
access is random.
DATA DIVISION.
File Section.
FD Suppliers.
copy "supplier.cbl".
The main procedure is similar to other examples we have seen. This time, the Suppliers file is opened
for output. Remember that we don’t open the keyboard file. Also, we can’t use high-values to mark
the end of the input, because it is impossible to type high-values on most keyboards. So we use
spaces. The program also displays some redundant output to tell the operator what is happening.
PROCEDURE DIVISION.
Create-All-Suppliers.
display "Creating a new Suppliers file..."
display "(A blank account code terminates the program.)"
open output Suppliers
perform Get-Next-Account
perform until Account of Suppliers = spaces
perform Create-One-Supplier
perform Get-Next-Account
end-perform
close Suppliers
display "Program terminated by operator. All records saved."
stop run.
Getting the next account code is easy.
Get-Next-Account.
display "Account: " with no advancing
accept Account of Suppliers.
The computer will type ‘Account: ’, then pause, without advancing the cursor to the next line. The
operator is then expected to type the account code of the supplier and hit the RETURN key. To exit the
program, the operator types a blank account code. Because of the rules for moving alphanumeric data,
any number of blanks will do, including zero.
‘Create-One-Supplier’ has the job of reading the rest of the supplier data, then writing the new record.
It must read the address and balance details from the keyboard.
Create-One-Supplier.
display " Name: " with no advancing
accept Name of Suppliers
display " Street: " with no advancing
accept Street of Suppliers
display " Suburb: " with no advancing
accept Suburb of Suppliers
display "Balance: " with no advancing
accept Balance of Suppliers
write Supplier in Suppliers
invalid key
display "Sorry, a record already exists for account ",
Account of Suppliers
not invalid
display "Supplier record created, thank you."
end-write.
Because we are dealing with an indexed file, the write statement must include an invalid key clause.
It could only be activated if the operator tried to create two records with the same account number. No
two records can have the same primary key. The not invalid clause keeps the operator informed and
happy as each record is created.
A disadvantage of this program is that the operator can create records in any order. This means the
resulting file is not as well organised as it might be. Logically, its records will be in account code
order. Physically, they could be less well ordered; they might be stored in the order they were created.
This will lead to the file being inefficient both to read and to update. Copying the file sequentially, using a
program such as ‘copysupp’, will optimise the internal structure of the file.
Incidentally, it would be trivial to change the program to use sequential access. Only two things need
to be changed: the access clause in the environment division, and the display statement in the invalid
key clause of the write statement. This should be replaced by two display statements to read,
display "Sorry, accounts must be in ascending order. "
display Account of Suppliers,
" is smaller than the previous account code."
When an indexed file is written sequentially, each key value must be greater than the previous one.
The operator should therefore first sort the records to be entered into the right order.
Interacting With The Operator
[Flowchart: the supplier file creation program. Open the Suppliers output file, then loop until end of input, testing each new record for a duplicate key.]
01 Edited-Update.
02 Edited-Date pic 99/99/99.
02 Edited-Time pic 99/99/99.
02 Qty-Delivered pic z,zz9.
02 Cost pic $$$$,$$9.99.
02 Amount pic $$$$,$$9.99.
Likewise, imagine the following text is stored in ‘dispupd8.cbl’.
move YY-MM-DD of Updates to Edited-Date
move HH-MM-SS of Updates to Edited-Time
inspect Edited-Time replacing all "/" by ":"
evaluate Kind of Updates
when Delivery-Code
move Cost of Updates to Cost of Edited-Update
move Qty-Delivered of Updates
to Qty-Delivered of Edited-Update
display Edited-Date, space, Edited-Time ", Delivery: ",
Item-No of Updates, space,
Account of Updates, space,
Qty-Delivered of Edited-Update, space,
Cost of Edited-Update
when Payment-Code
move Amount of Updates to Amount of Edited-Update
display Edited-Date, space, Edited-Time ", Payment: ",
Account of Updates, space,
Amount of Edited-Update
end-evaluate.
The only new feature here is the way the date and time are displayed. The picture of ‘Edited-Date’
ensures that a value of ‘011225’ in ‘YY-MM-DD of Updates’ will be displayed as ‘01/12/25’. We
would like the picture of ‘Edited-Time’ to be ‘99:99:99’. Unfortunately this is not something we can
achieve simply by choosing the right picture. Instead, we allow a value such as ‘123645’ in
‘HH-MM-SS of Updates’ to be converted to ‘12/36/45’ by the move. The inspect statement will then
replace both instances of ‘/’ by ‘:’, giving ‘12:36:45’. The Cobol inspect statement can be used for a
variety of editing functions, but it is inappropriate to discuss it further here.
With these tools at our disposal, we can then develop a program to record transactions.
IDENTIFICATION DIVISION.
Program-ID. makeupd8.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select optional Updates assign to "updates.seq"
organization is sequential.
In the environment division, the Updates file is described as optional. This is because the program
will open it in extend mode. Extend mode is similar to output mode, but if the file exists, new
records are appended to the end of the existing records. Making the file optional means that it will not
be an error if the file does not exist, and a file will be created containing the new records.
DATA DIVISION.
File Section.
FD Updates.
copy "update.cbl".
Working-Storage Section.
copy "editupd8.cbl".
copy "constant.cbl".
77 HH-MM-SS-hh pic 9(8).
77 Yes-No-Response pic x.
The working-storage section contains two level-77 items whose use will be explained shortly.
PROCEDURE DIVISION.
Process-All-Updates.
display "Creating Update records ..."
open extend Updates
perform Get-Next-Kind
perform until Kind of Updates = "Q"
perform Process-One-update
perform Get-Next-Kind
end-perform
close Updates
display "Job complete."
stop run.
The procedure division opens the Updates file in extend mode. This allows the operator to add new
transactions to the end of the existing Updates file. The procedure division then continues with the
usual read loop, which terminates when the operator types the letter ‘Q’ (for ‘Quit’).
Otherwise, since different kinds of transaction require different data, the ensuing dialogue depends on
‘Kind of Updates’. If it does not equal ‘D’ or ‘$’, the program displays a list of valid options. A
payment has a dummy item number.
Process-One-Update.
perform Make-Time-Stamp
evaluate Kind of Updates
when Delivery-Code
perform Get-Item-No
perform Get-Account
perform Get-Qty-Delivered
perform Get-Cost
perform Confirm-Update
when Payment-Code
move Dummy-Item-No to Item-No of Updates
perform Get-Account
perform Get-Amount
perform Confirm-Update
when other
display "Choose one of the following:"
display "D Record a delivery from a supplier."
display "$ Record a payment to a supplier."
display "Q Quit the program."
end-evaluate.
After requesting all the required items, the program displays the transaction, then asks the operator to
confirm that it is correct. Any value in ‘Yes-No-Response’ other than ‘Y’ or ‘y’ is taken to mean ‘No’.
A ‘Yes’ response results in the correct record being written to the Updates file; a ‘No’ response results
in nothing being written. In each case, the operator is told what action was taken:
Confirm-Update.
perform Display-Update
display "Is this correct (Y/N)? " with no advancing
accept Yes-No-Response
if Yes-No-Response = "Y" or Yes-No-Response = "y"
evaluate Kind of Updates
when Delivery-Code
write Delivery of Updates
when Payment-Code
write Payment of Updates
end-evaluate
display "Transaction written to file."
else
display "Transaction ignored."
end-if.
Since the Updates file has sequential organization, it has no primary key. Consequently, an invalid
key clause is neither needed nor allowed.
The time-stamp data is obtained by special forms of the accept statement. These do not ask the
operator for information, they ask the operating system. The operating system gives the time to one
hundredth of a second. The hundredths are discarded by the divide statement.
Make-Time-Stamp.
accept YY-MM-DD of Updates from Date
accept HH-MM-SS-hh from Time
divide HH-MM-SS-hh by 100 giving HH-MM-SS of Updates.
Finally, there are a series of simple procedures for accepting information from the operator.
Get-Next-Kind.
display " Kind: " with no advancing
accept Kind of Updates.
Get-Item-No.
display " Item No: " with no advancing
accept Item-No of Updates.
Get-Account.
display " Account: " with no advancing
accept Account of Updates.
Get-Qty-Delivered.
display "Qty Delivered: " with no advancing
accept Qty-Delivered of Updates.
Get-Cost.
display " Total Cost: " with no advancing
accept Cost of Updates.
Get-Amount.
display " Amount Paid: " with no advancing
accept Amount of Updates.
Display-Update.
copy "dispupd8.cbl".
In reality, these procedures should contain extra statements to check that the data typed are reason-
able. Checking input is an interesting topic, but it is not part of this course. As it is, the program is
happy to accept blank account codes and item numbers, and might even fail if the operator types non-
numeric data when a number is expected — the Cobol standard doesn’t say what will happen.
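For the curious, the usual tool for such checks is the class condition; a sketch (not part of the course programs) might read:

```cobol
Get-Qty-Delivered.
    display "Qty Delivered: " with no advancing
    accept Qty-Delivered of Updates
    perform until Qty-Delivered of Updates is numeric
        display "Please type digits only."
        display "Qty Delivered: " with no advancing
        accept Qty-Delivered of Updates
    end-perform.
```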
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Customers assign to "newcust.ndx"
organization is indexed
record key is Account of Customers
access is random.
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
Working-Storage Section.
77 Edited-Amount pic $$$$,$$9.99bdb.
77 Desired-Account pic a999.
The main procedure is similar to that of ‘makesupp’:
PROCEDURE DIVISION.
Find-Random-Customers.
display "Customers file enquiry program ..."
display "(A blank account code terminates the program.)"
display spaces
open input Customers
perform Get-Next-Account
perform until Desired-Account = spaces
perform Process-One-Customer
perform Get-Next-Account
end-perform
close Customers
display "Program terminated. Thank you."
stop run.
‘Get-Next-Account’ follows the previous pattern, except that the account code is accepted into a
working-storage variable, ‘Desired-Account’:
Get-Next-Account.
display "Account: " with no advancing
accept Desired-Account.
Provided the operator doesn’t terminate the program, ‘Desired-Account’ will contain a putative
account code, and the program will perform ‘Process-One-Customer’:
Process-One-Customer.
move Desired-Account to Account of Customers
read Customers,
invalid key
display "Sorry, there is no customer with account ",
Desired-Account
display spaces
not invalid
perform Display-Customer
end-read.
There are two things to note here. First, when a file is read in random access mode, the at end clause
is replaced by an invalid key clause. The invalid key clause is activated if no record actually has the
primary key that is specified; otherwise, the not invalid clause (if it exists) is activated. Second, the
read statement offers no way to specify the desired primary key. Instead, the key value must be moved
to the current record area. (The reasoning behind this is that when a record is written, its key is in the
record area, so, by analogy, it should be in the same place when a record is read.)
Actually, there is no real need for the variable ‘Desired-Account’. ‘Get-Next-Account’ could read,
Get-Next-Account.
display "Account: " with no advancing
accept Account of Customers.
It would also be necessary to make a few other modifications. One of these would be to remove the
redundant move statement at the start of ‘Process-One-Customer’. Although this is neater in some
ways, it is less obvious how the correct key value finds its way to the record area.
The ‘Display-Customer’ paragraph is exactly the same as in ‘listcust’. In fact, it pays to store the
following text in the file ‘dispcust.cbl’ so it can be copied wherever it is needed.
Display-Customer.
move Credit-Limit of Customers to Edited-Amount
display Account of Customers, space, Name of Customers,
" Credit Limit: ", Edited-Amount
move Balance of Customers to Edited-Amount
display " ", Street of Customers,
" Balance Owing: ", Edited-Amount
move Available-Credit of Customers to Edited-Amount
display "      ", Suburb of Customers,
" Available Credit: ", Edited-Amount
display spaces.
Running the program might result in the following dialogue.
Customers file enquiry program ...
(A blank account code terminates the program.)
Account: B007↵
B007 Blaupunkt Credit Limit: $25,000.00
Cnr Centre and McNaughton Rds Balance Owing: $10,295.00
Clayton VIC 3109 Available Credit: $12,537.25
Account: B005↵
Sorry, there is no customer with account B005
Account: A001↵
A001 Autobarn Elizabeth Credit Limit: $5,000.00
61 Elizabeth Way Balance Owing: $208.50
Elizabeth SA 5158 Available Credit: $4,791.50
Account: ↵
Program terminated. Thank you.
[Flowchart: the customer enquiry program. Open the Customers input file; read account codes from the keyboard until a blank is typed; try to read each customer record; display the edited record if found, otherwise warn the user; then close the file and stop.]
Projection and Selection
6.1 Selection
First, we consider listing the records of those customers who have back orders. The method is to read
the Customers file sequentially, testing each record to see if Available-Credit equals the difference
between Credit-Limit and Balance. If it doesn’t, the discrepancy is the value of goods on back order.
We list only those records where the difference is non-zero.
The output should look like this,
Listing customers with back orders ...
and so on ...
Listing complete.
The first three divisions are similar to ‘listcust’:
IDENTIFICATION DIVISION.
Program-ID. slctcust.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Customers assign to "newcust.ndx"
organization is indexed,
record key is Account of Customers
access is sequential.
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
Working-Storage Section.
copy "editcust.cbl".
The main loop of the procedure division follows the usual formula:
PROCEDURE DIVISION.
Process-All-Customers.
display "Listing customers with back orders ..."
display spaces
open input Customers
perform Get-Next-Customer
perform until Account of Customers = high-values
perform Process-One-Customer
perform Get-Next-Customer
end-perform
close Customers
display "Listing complete."
stop run.
So does ‘Get-Next-Customer’:
Get-Next-Customer.
read Customers next record,
at end
move high-values to Account of Customers
end-read.
In ‘Process-One-Customer’ we perform ‘Display-One-Customer’ only if Available-Credit does not
equal the difference between Credit-Limit and Balance:
Process-One-Customer.
if Available-Credit of Customers not =
Credit-Limit of Customers - Balance of Customers
perform Display-One-Customer
end-if.
Display-One-Customer.
copy "dispcust.cbl".
There are two points of syntax: First, ‘≠’ is written as ‘not =’. Second, a minus operator must be
surrounded by spaces; ‘Customers-Balance’ would look like a data name. Indeed, the same rule
should be followed for all operators.
[Flowchart: the ‘slctcust’ program. Open the Customers input file; read each record until end of input; display only those with a non-zero back order; then stop.]
6.2 Projection
Projection basically means suppressing some of the information in a record. The term derives from
coordinate geometry; the projection of the point (x,y,z) onto the x-z plane is (x,z). It can also mean
displaying information derived from a record; for example, (x+y,z) is also a projection.
In this case, the program calculates the value of goods on back order, and displays the result. It also
displays the Account and Name of each customer, but that is all.
The output will look like this,
Listing customer back order values ...
A001 Autobarn Elizabeth
B003 BCR Mobile Installations
B007 Blaupunkt $2,167.75
B012 Bobs Electronic Repairs
C002 Car Audio Designs
C005 Car Audio Services
C007 Cargear Pty Ltd $135.00
C010 Cartronics
C020 Citisound $180.85
C027 Complete Audio
C031 Custom Audio Sound $362.15
D014 Doug Sunstroms Sound Mart
D015 Doug Sunstroms Sound Mart
E003 Electric Bug Pty Ltd $242.55
E007 Afrotechnics
F002 Fujitsu Ten (Australia) P/L $1,000.00 DB
G010 Global Car Audio
J005 JayCar Pty Ltd
N012 National Car Audio
N014 Northern Car Radio $156.74
P001 Pioneer Car Audio Services
R003 RS Automotive Development
S004 Sound 4 Australia Pty Ltd
S011 Southern Car Audio
S015 Strathfield Car Radios
T002 Tonkins Car Audio Pty Ltd
T003 Tonkins Car Audio Pty Ltd $27.00
T004 Tonkins Car Audio Pty Ltd
Listing complete.
In working-storage, we describe the output. Because the picture clause of ‘Back-Order-Value’
includes no 9’s, zero values will print as blanks (called ‘blank when zero’). This helps to highlight the
customers whose Back-Order-Value is not zero:
IDENTIFICATION DIVISION.
Program-ID. prjtcust.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Customers assign to "newcust.ndx"
organization is indexed,
record key is Account of Customers
access is sequential.
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
Working-Storage Section.
01 Edited-Customer.
02 Account pic a999.
02 pic x value space.
02 Name pic x(30).
02 pic x value space.
02 Back-Order-Value pic $$$$,$$$.$$bdb.
The procedure division starts in the usual way:
PROCEDURE DIVISION.
Process-All-Customers.
display "Listing customer back order values ..."
open input Customers
perform Get-Next-Customer
perform until Account of Customers = high-values
perform Process-One-Customer
perform Get-Next-Customer
end-perform
close Customers
display "Listing complete."
stop run.
Get-Next-Customer.
read Customers next record,
at end
move high-values to Account of Customers
end-read.
‘Process-One-Customer’ shows how Cobol usually does arithmetic. Cobol syntax was originally
designed for those who hadn’t done high-school algebra:
Process-One-Customer.
move Account of Customers to Account of Edited-Customer
move Name of Customers to Name of Edited-Customer
subtract Available-Credit of Customers, Balance of Customers
from Credit-Limit of Customers
giving Back-Order-Value of Edited-Customer
display Edited-Customer.
You may prefer a more algebraic style:
Process-One-Customer.
move Account of Customers to Account of Edited-Customer
move Name of Customers to Name of Edited-Customer
compute Back-Order-Value of Edited-Customer
= Credit-Limit of Customers
- Available-Credit of Customers
- Balance of Customers
display Edited-Customer.
The important thing to notice here is that the result of an arithmetic operation can be an edited-
numeric item, but the sources of operands should be numeric items. In principle, Cobol moves all
arithmetic operands to 18-digit registers, computes the result, then moves the result to the destination.
Any conversions associated with the move operations are carried out in the standard way.
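A sketch with invented names may help:

```cobol
77  Net-Pay         pic 9(6)v99.
77  Bonus           pic 9(4)v99.
77  Edited-Net-Pay  pic $$$$,$$9.99.
```

‘add Bonus to Net-Pay giving Edited-Net-Pay’ is legal, because the edited item merely receives the result; ‘add Bonus to Edited-Net-Pay’ is not, because the edited item would then be a source operand as well.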
Ordering and Grouping
7.1 Sorting
Unless the file already happens to be in the correct order, these operations require the file to be sorted.
Cobol provides a sort verb for this purpose. Conceptually, the sort statement takes an input file,
copies it to a work file, sorts the work file into the order required, then copies it to an output file. A
consequence is that the input file is not altered in any way, unless the input file is also used to
receive the output.
Here is the start of a program to sort the Customers file into descending order of Balance.
IDENTIFICATION DIVISION.
Program-ID. sortcust.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Unsorted-Customers assign to "newcust.ndx"
organization is indexed,
record key is Account of Unsorted-Customers
access is sequential.
select Sorted-Customers assign to "newcust.seq"
organization is sequential.
select Customers-Work-File assign to "work.tmp".
The environment division defines three files. The first is the existing Customers file (‘newcust.ndx’),
which is indexed, so we have to say so; even though this program does not use the index, describing
the file as anything other than what it is would certainly lead to a run-time error. The second file
(‘newcust.seq’) can’t be indexed; if it were, it would always be read in ascending order of its primary key.
Here, we want descending order, and Balance can’t be used as a primary key anyway; two records can
have the same Balance. The third file is the work file used by sort. It has neither indexed nor
sequential organization. It can’t be used at all outside the context of the sort operation. Indeed, if the
file to be sorted is small enough to fit in main memory, the sort work file may never be created.
DATA DIVISION.
File Section.
FD Unsorted-Customers.
copy "customer.cbl".
FD Sorted-Customers.
copy "customer.cbl".
SD Customers-Work-File.
copy "customer.cbl".
The data division describes the three files as having the same record structure. However, the sort
work file entry is written as SD rather than FD. This highlights that it is not a regular file.
PROCEDURE DIVISION.
Sort-Customers-by-Balance.
display "Sorting customers file ..."
sort Customers-Work-File
on descending Balance of Customers-Work-File
using Unsorted-Customers
giving Sorted-Customers
display "Sort complete."
stop run.
The procedure division is easy enough to understand. Note that the sort sequence must specify one or
more items of the work file, not the input or output file.
7.2 Ordering
In most programs, we sort a file in order to use it for some purpose. Rather than write the sorted
records to a new file, we can deal with them as they are sorted. This is done by using an output
procedure.
The following program displays the customer records with the greatest debts first.
IDENTIFICATION DIVISION.
Program-ID. debtors.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Unsorted-Customers assign to "newcust.ndx"
organization is indexed,
record key is Account of Unsorted-Customers
access is sequential.
select Customers assign to "work.tmp".
DATA DIVISION.
File Section.
FD Unsorted-Customers.
copy "customer.cbl".
SD Customers.
copy "customer.cbl".
Working-Storage Section.
copy "editcust.cbl".
The environment and data divisions are simpler than before; there is no output file. It turns out to be
convenient to call the work file ‘Customers’. Then we can use the usual procedure to display it.
Apart from using an output procedure, the procedure division starts like the last example:
PROCEDURE DIVISION.
Sort-Customers-by-Balance.
display "Listing customers according to balance owing ..."
display spaces
sort Customers
on descending Balance of Customers
using Unsorted-Customers
output procedure Process-All-Customers
display "List complete."
stop run.
The output procedure itself consists of a familiar read loop:
Process-All-Customers.
perform Get-Next-Customer
perform until Account of Customers = high-values
perform Display-Customer
perform Get-Next-Customer
end-perform.
However, a sort file is not a regular file, and needs special syntax:
Get-Next-Customer.
return Customers record
at end
move high-values to Account of Customers
end-return.
We use the usual procedure to display the customer records:
Display-Customer.
copy "dispcust.cbl".
There are no open or close statements in this program. The sort opens and closes its files all by itself.
Now consider the problem of sorting delivery records in descending order of Cost.
Unfortunately, the Cost in a delivery record occupies the same bytes of the current record area as the
last 4 bytes of the Amount in a payment record, plus 4 bytes beyond the end of it. An attempt to sort
on the imaginary Cost of a payment might cause the program to fail. The Cobol compiler will actually
detect this particular error because Cost lies beyond the end of a payment record. For the program to
be acceptable, the sort work file must be defined to contain delivery records, but nothing else.
IDENTIFICATION DIVISION.
Program-ID. Costs.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Updates assign to "updates.seq"
organization is sequential.
select Deliveries assign to "work.tmp".
DATA DIVISION.
File Section.
FD Updates.
copy "update.cbl".
SD Deliveries.
01 Delivery.
02 Time-Stamp.
03 YY-MM-DD pic 9(6).
03 HH-MM-SS pic 9(6).
02 Kind pic x.
02 Item-No pic x(6).
02 Account pic a999.
02 Qty-Delivered pic 9999.
02 Cost pic 9(6)v99.
Working-Storage Section.
copy "editupd8.cbl".
copy "constant.cbl".
Since the only records in the work file are deliveries, it is necessary to eliminate the payments before
the sort operation. We therefore use a sort input procedure to eliminate the unwanted records. It will
read the updates file, and write the deliveries to the sort work file, ignoring the payments.
In the procedure division, we use sort with both an input procedure and an output procedure. The
input procedure selects the deliveries; the output procedure displays them in sorted order:
PROCEDURE DIVISION.
Sort-Deliveries-by-Cost.
display "Listing Deliveries according to Cost ..."
display spaces
sort Deliveries
on descending Cost of Deliveries
input procedure Select-Deliveries
output procedure Process-All-Deliveries
display "List complete."
stop run.
The input procedure reads the unsorted Updates file using the usual read loop. Because the work file
is not a regular file, Cobol syntax requires the use of release instead of write. Deliveries are written to
the work-file; all other kinds of record are ignored.
Select-Deliveries.
open input Updates
perform Get-Next-Update
perform until Time-Stamp of Updates = high-values
perform Process-One-update
perform Get-Next-Update
end-perform
close Updates.
Process-One-Update.
if Kind of Updates = Delivery-Code
move Delivery in Updates to Delivery in Deliveries
release Delivery of Deliveries
end-if.
Get-Next-Update.
read Updates next record
at end
move high-values to Time-Stamp of Updates
end-read.
The output procedure is another read loop. In order to display delivery records, we copy the
procedure in the ‘dispupd8.cbl’ file. Although it can display several other kinds of record as well as
deliveries, this is obviously harmless.
Process-All-Deliveries.
perform Get-Next-Delivery
perform until Time-Stamp of Deliveries = high-values
perform Display-Delivery
perform Get-Next-Delivery
end-perform.
Get-Next-Delivery.
return Deliveries record
at end
move high-values to Time-Stamp of Deliveries
end-return.
Display-Delivery.
copy "dispupd8.cbl".
7.3 Grouping
A possible use of grouping is to find the total cost of deliveries and the total payments made to each
supplier. An obvious way to do this would be to read the updates file, and to accumulate the totals in a
table, with an entry for each supplier. This poses a problem. However big we make the table, it is
possible that the Suppliers file will grow to exceed its capacity.
It is safer, and easier, to sort the Updates file into Account order. Then all the records for the first
Account code will be grouped together. This makes it possible to reuse the same two accumulators
over and over for each supplier. No table is needed.
Here are the first three divisions of a program that groups by sorting.
IDENTIFICATION DIVISION.
Program-ID. suppgrp.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Unsorted-Updates assign to "updates.seq"
organization is sequential.
select Updates assign to "work.tmp".
DATA DIVISION.
File Section.
FD Unsorted-Updates.
copy "update.cbl".
SD Updates.
copy "update.cbl".
Working-Storage Section.
77 Current-Account pic a999.
77 Total-Cost-This-Account pic 9(6)v99.
77 Total-Paid-This-Account pic 9(6)v99.
77 Grand-Total-Cost pic 9(8)v99.
77 Grand-Total-Paid pic 9(8)v99.
77 Edited-Cost pic $$$,$$$,$$$.$$.
77 Edited-Paid pic $$$,$$$,$$$.$$.
‘Current-Account’ is used to detect when each pair of totals should be displayed. ‘Total-Cost-This-
Account’ and ‘Total-Paid-This-Account’ are used to accumulate the sub-totals for each supplier.
‘Grand-Total-Cost’ and ‘Grand-Total-Paid’ are used to accumulate the grand totals for the entire file.
‘Edited-Cost’ and ‘Edited-Paid’ are used to edit them for display.
The start of the procedure division is like an earlier example:
PROCEDURE DIVISION.
Sort-Updates-by-Account.
sort Updates on ascending Account of Updates
using Unsorted-Updates
output procedure Process-All-Updates
stop run.
However, the output procedure is more complex than before:
Process-All-Updates.
perform Start-All-Accounts
perform Get-Next-Update
perform until Account of Updates = high-values
move Account of Updates to Current-Account
perform Start-One-Account
perform until Account of Updates not = Current-Account
perform Process-One-Update
perform Get-Next-Update
end-perform
perform End-One-Account
end-perform
perform End-All-Accounts.
Get-Next-Update.
return Updates
at end
move high-values to Account of Updates
end-return.
Instead of one loop, it contains two nested loops. The outer loop iterates once per supplier account.
The inner loop iterates once per update transaction. Apart from ‘Get-Next-Update’, which must
obviously be performed once per update record, this divides the rest of the program into five procedures:
Start-One-Account.
move zeros to Total-Cost-This-Account,
Total-Paid-This-Account.
Process-One-Update.
evaluate Kind of Updates
when Delivery-Code
add Cost of Updates to Total-Cost-This-Account
when Payment-Code
add Amount of Updates to Total-Paid-This-Account
end-evaluate.
End-One-Account.
add Total-Cost-This-Account to Grand-Total-Cost
add Total-Paid-This-Account to Grand-Total-Paid
move Total-Cost-This-Account to Edited-Cost
move Total-Paid-This-Account to Edited-Paid
display Current-Account, space,
Edited-Cost, space,
Edited-Paid.
End-All-Accounts.
move Grand-Total-Cost to Edited-Cost
move Grand-Total-Paid to Edited-Paid
display "----------------------------------"
display "Total", Edited-Cost, space, Edited-Paid
display "==================================".
In general, if a file is sorted on one key, we may expect to see two levels of loop; if on two keys, three levels of loop; and so on.
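The control-break pattern is not peculiar to Cobol. As a sketch of the same two-level structure (in Python, with hypothetical data; the sentinel plays the role of high-values, and ‘group_totals’ is an invented name, not one of the course programs):

```python
# Control-break processing of a file sorted on one key.
# The sentinel plays the role of Cobol's high-values.
SENTINEL = "\uffff"

def group_totals(records):
    """records: (account, amount) pairs, sorted on account.
    Returns one (account, subtotal) pair per group."""
    results = []
    stream = iter(records)
    current = next(stream, (SENTINEL, 0))
    while current[0] != SENTINEL:          # outer loop: once per account
        account = current[0]               # like 'move ... to Current-Account'
        subtotal = 0                       # like 'Start-One-Account'
        while current[0] == account:       # inner loop: once per record
            subtotal += current[1]         # like 'Process-One-Update'
            current = next(stream, (SENTINEL, 0))
        results.append((account, subtotal))   # like 'End-One-Account'
    return results
```

For example, `group_totals([("N001", 10), ("N001", 5), ("N002", 7)])` yields one subtotal per account, and an empty input correctly produces no output lines at all.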
The output of the program looks like this:
Account Delivered  Paid
----------------------------------
N001    $19,590.00
N002               $2,840.46
P004    $1,000.00
----------------------------------
Total   $20,590.00 $2,840.46
==================================
[Flowchart: the single-loop alternative. After each read, test for end of input and for a change of account number; on a change of account number, end one account and save the new account number; otherwise process one update; at end of input, close the file and stop.]
Process-All-Updates.
perform Start-All-Accounts
perform Get-Next-Update
move Account of Updates to Current-Account
perform Start-One-Account
perform until Account of Updates = high-values
if Account of Updates not = Current-Account
perform End-One-Account
perform Start-One-Account
move Account of Updates to Current-Account
end-if
perform Process-One-Update
perform Get-Next-Update
end-perform
perform End-One-Account
perform End-All-Accounts.
This is messy. ‘Start-One-Account’ and ‘End-One-Account’ have to be performed in two different
places, and there are two move statements as well. And there is still a bug: if the input file is empty,
the program will display a random account code and two zeros.
DATA DIVISION.
File Section.
FD Unsorted-Updates.
copy "update.cbl".
SD Updates.
copy "update.cbl".
Working-Storage Section.
copy "constant.cbl".
77 Current-Item-No pic x(6).
77 Total-Cost-This-Item-No pic 9(6)v99.
77 Total-Qty-This-Item-No pic 9999.
77 Grand-Total-Cost pic 9(8)v99.
77 Grand-Total-Qty pic 9(8)v99.
77 Edited-Cost pic $$$,$$$,$$$.$$.
77 Edited-Qty pic zzz,zz9.
77 Edited-Unit-Cost pic $$$,$$$.$$.
This involves sorting and grouping on item number. A problem arises here. Payments to suppliers
don’t have an item number to sort on. One way to deal with this would be to use an input procedure to
ignore the payments, as we did in an earlier example. Here we consider an alternative approach.
We actually foresaw this problem when we defined the Payment record structure, by including a
dummy Item-No. This will always contain low-values, not because of the value clause in the record
definition (which is effectively a comment), but because the program that created the Updates file
initialises Item-No correctly. This means that after the sort, all the Payment records will be grouped
together at the start of the output sequence.
In the procedure division, the output procedure includes an extra loop to skip over any records with
low-valued item numbers:
PROCEDURE DIVISION.
Sort-Updates-by-Item-No.
sort Updates on ascending Item-No of Updates,
using Unsorted-Updates
output procedure Process-All-Updates
stop run.
Process-All-Updates.
perform Start-All-Items
perform Get-Next-Update
perform until Item-No in Updates not = Dummy-Item-No
perform Get-Next-Update
end-perform
perform until Item-No of Updates = high-values
move Item-No of Updates to Current-Item-No
perform Start-One-Item-No
perform until Item-No of Updates not = Current-Item-No
perform Process-One-Update
perform Get-Next-Update
end-perform
perform End-One-Item-No
end-perform
perform End-All-Items.
Get-Next-Update.
return Updates record
at end
move high-values to Item-No of Updates
end-return.
Without this preliminary loop, the program would have a bug. It would enter the main loop, setting
Current-Item-No to low-values. Since there are no deliveries that have a low-valued item number,
the program would attempt to find the statistics for an empty set. There would then be 3 separate bugs
in the code that follows:
Start-All-Items.
move zeros to Grand-Total-Cost, Grand-Total-Qty
display "Item-No Qty Cost Unit Cost"
display "----------------------------------------".
Start-One-Item-No.
move zeros to Total-Cost-This-Item-No,
Total-Qty-This-Item-No.
Process-One-Update.
add Cost of Updates to Total-Cost-This-Item-No
add Qty-Delivered of Updates to Total-Qty-This-Item-No.
End-One-Item-No.
add Total-Cost-This-Item-No to Grand-Total-Cost
add Total-Qty-This-Item-No to Grand-Total-Qty
if Total-Cost-This-Item-No not < 10
move Total-Cost-This-Item-No to Edited-Cost
move Total-Qty-This-Item-No to Edited-Qty
divide Total-Cost-This-Item-No by Total-Qty-This-Item-No
giving Edited-Unit-Cost rounded
display Current-Item-No, space, Edited-Qty, space,
Edited-Cost, space, Edited-Unit-Cost
end-if.
End-All-Items.
move Grand-Total-Cost to Edited-Cost
move Grand-Total-Qty to Edited-Qty
display "----------------------------------------"
display "Total ", Edited-Qty, space, Edited-Cost
display "=============================".
First, the program will attempt to display a ‘Current-Item-No’ of low-values. The effect is
unpredictable, because low-values is equivalent to ASCII null characters.
Second, ‘Process-One-Update’ will try to treat Payment records as if they were Deliveries. Due to
aliasing, ‘Cost of Updates’ will be assigned the value of the first 4 bytes of ‘Amount of Updates’, and
‘Qty-Delivered of Updates’ will be assigned its last 4 bytes, followed by 4 bytes of garbage. This
problem could be fixed by altering ‘Process-One-Update’ as follows,
Process-One-Update.
if Kind of Updates = Delivery-Code
add Cost of Updates to Total-Cost-This-Item-No
add Qty-Delivered of Updates to Total-Qty-This-Item-No
end-if.
which would reveal the third bug: In ‘End-One-Item-No’, ‘Total-Qty-This-Item-No’ would be zero,
causing a divide-by-zero error.
This problem could be fixed as follows,
End-One-Item-No.
add Total-Cost-This-Item-No to Grand-Total-Cost
add Total-Qty-This-Item-No to Grand-Total-Qty
move Total-Cost-This-Item-No to Edited-Cost
move Total-Qty-This-Item-No to Edited-Qty
if Total-Qty-This-Item-No not = zero
divide Total-Cost-This-Item-No by Total-Qty-This-Item-No
giving Edited-Unit-Cost rounded
else
move zero to Edited-Unit-Cost
end-if
display Current-Item-No, space, Edited-Qty, space,
Edited-Cost, space, Edited-Unit-Cost.
where the picture clause for ‘Edited-Unit-Cost’ will display a zero value as blank.
Both these changes are wise moves anyway. We shall consider several additional kinds of
transactions in later examples. This is quite realistic: in real life, systems are often modified by adding
new kinds of transaction. The modification to ‘Process-One-Update’ would ignore the new kinds of
transaction, and not misinterpret them as deliveries. But it would then be possible for a group of
transactions to occur for a valid Item-No that contains no deliveries. The modification to
‘End-One-Item-No’ would then be essential.
The rounded option of the divide statement means that ‘Edited-Unit-Cost’ will be computed to the
nearest cent. For example, if ‘Total-Cost-This-Item-No’ equals $1,231.00 and ‘Total-Qty-This-Item-
No’ equals 16, the exact unit cost is $76.9375. Since ‘Edited-Unit-Cost’ has only two decimal places,
without the rounded option, this will be truncated to $76.93; with it, it will be rounded up to $76.94.
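The arithmetic can be checked with a short sketch (Python’s decimal module, not part of the Cobol program; ROUND_HALF_UP matches Cobol’s rounded, and ROUND_DOWN matches truncation):

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_DOWN

cost = Decimal("1231.00")    # Total-Cost-This-Item-No
qty = Decimal("16")          # Total-Qty-This-Item-No
exact = cost / qty           # 76.9375, the exact unit cost

# Keep two decimal places, as the Edited-Unit-Cost picture does.
truncated = exact.quantize(Decimal("0.01"), ROUND_DOWN)      # without rounded: 76.93
rounded = exact.quantize(Decimal("0.01"), ROUND_HALF_UP)     # with rounded:    76.94
```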
Of course, this whole problem would be much easier if the transaction file contained just deliveries,
and nothing else. This is best done by using a sort input procedure to select just the deliveries, as in an
earlier example.
Finally, suppose we want to display statistics only for those Items having a total cost exceeding some
preset amount, say $10,000. We would need to modify ‘End-One-Item-No’ as follows.
End-One-Item-No.
add Total-Cost-This-Item-No to Grand-Total-Cost
add Total-Qty-This-Item-No to Grand-Total-Qty
if Total-Cost-This-Item-No > 10000
move Total-Cost-This-Item-No to Edited-Cost
move Total-Qty-This-Item-No to Edited-Qty
if Total-Qty-This-Item-No not = zero
divide Total-Cost-This-Item-No
by Total-Qty-This-Item-No
giving Edited-Unit-Cost rounded
else
move zero to Edited-Unit-Cost
end-if
display Current-Item-No, space, Edited-Qty, space,
Edited-Cost, space, Edited-Unit-Cost
end-if.
Note that this process is not correctly called ‘selection’. Selection is a process of choosing particular
records, such as deliveries. Here we are displaying aggregate data having certain properties.
Set Union, Intersection and Difference
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Suppliers assign to "newsupp.ndx"
organization is indexed,
record key is Account of Suppliers
access is sequential.
select Customers assign to "newcust.ndx"
organization is indexed,
record key is Account of Customers
access is sequential.
In the data division, we declare ‘Current-Account’. This will keep track of which account the
program is processing. The copied statements are used to edit supplier and customer records.
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
FD Suppliers.
copy "supplier.cbl".
Working-Storage Section.
77 Current-Account pic a999.
copy "editsupp.cbl".
copy "editcust.cbl".
The procedure division follows the logic outlined above:
PROCEDURE DIVISION.
Process-All-Accounts.
display "Listing all suppliers and customers ..."
open input Suppliers, Customers
perform Get-Next-Supplier
perform Get-Next-Customer
perform Choose-Current-Account
perform until Current-Account = high-values
perform Process-One-Account
if Account of Suppliers = Current-Account
perform Get-Next-Supplier
end-if
if Account of Customers = Current-Account
perform Get-Next-Customer
end-if
perform Choose-Current-Account
end-perform
close Suppliers, Customers
display "Listing complete."
stop run.
where ‘Choose-Current-Account’ sets ‘Current-Account’ to the lower of the two account codes:
Choose-Current-Account.
if Account of Suppliers < Account of Customers
move Account of Suppliers to Current-Account
else
move Account of Customers to Current-Account
end-if.
The program can test if there is a supplier record or customer record for the current account by
comparing its key with the value of ‘Current-Account’. This test is used to decide which refreshing
reads are needed at the end of the loop. Note that if the key of the Customers record or the Suppliers
record does not equal ‘Current-Account’, it must exceed it, because ‘Current-Account’ is the smaller of
the two keys. That means the record has not yet been processed, so no read is necessary. In a sense,
the read has already been done.
The trick of using high-values to mark the end of file proves especially useful here. There are three
cases to consider: both files end with the same key, the Customers file ends on a lower key than the
Suppliers file, or the Suppliers file ends on a lower key than the Customers file. Because high-values
is greater than any real account code, it will never be chosen as the current key until both files have
reached their end. This deals with all three cases properly. The loop finally exits when ‘Current-
Account’ equals high-values.
‘Process-One-Account’ must allow for either file to have a missing record, but in the case that both
are present, it is required to flag the account:
Process-One-Account.
if Account of Suppliers = Current-Account
perform Display-Supplier
end-if
if Account of Customers = Current-Account
perform Display-Customer
end-if
if Account of Suppliers = Account of Customers
display "Duplicate account number!"
display "========================="
end-if.
Note that if the two account codes are equal, they must also equal ‘Current-Account’.
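The merge logic can be summarised in a short sketch (Python, with hypothetical account codes; ‘merge_accounts’ is an invented name, and each file is assumed to yield its keys in sorted order with no duplicates):

```python
SENTINEL = "\uffff"   # plays the role of high-values

def merge_accounts(suppliers, customers):
    """suppliers, customers: sorted sequences of unique account codes.
    Yields (account, in_suppliers, in_customers) for every code
    in the union of the two files."""
    s, c = iter(suppliers), iter(customers)
    s_key = next(s, SENTINEL)
    c_key = next(c, SENTINEL)
    while True:
        current = min(s_key, c_key)        # Choose-Current-Account
        if current == SENTINEL:            # both files exhausted
            break
        yield (current, s_key == current, c_key == current)
        if s_key == current:               # refreshing reads: only the
            s_key = next(s, SENTINEL)      # file(s) just processed advance
        if c_key == current:
            c_key = next(c, SENTINEL)
```

A code present in both sequences is reported once, with both flags true, which is exactly the condition the program uses to flag a duplicate account.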
The rest of the program contains no surprises. We assume that we can copy the statements needed to
display the customer and supplier records:
Get-Next-Supplier.
read Suppliers next record,
at end
move high-values to Account of Suppliers
end-read.
Get-Next-Customer.
read Customers next record,
at end
move high-values to Account of Customers
end-read.
Display-Customer.
copy "dispcust.cbl".
Display-Supplier.
copy "dispsupp.cbl".
Start
True
End of
Close both files
input?
False
Stop
Process the
account
Is the Yes
Choose the lower Read next
supplier
account number supplier record
present?
No
Is the Yes
Read next
customer
customer record
present?
No
[Venn diagram: a rectangle U containing two overlapping circles C and S, with areas marked 1, 2 and 3.]
The rectangle U represents the universe of all possible account codes. The circle C encloses the set of
all Customer account codes. The circle S encloses the set of all Supplier account codes. In general,
these sets overlap. The area marked 2 represents the region of overlap, containing the codes both files
have in common. The following terms describe potentially interesting sets of account codes:
C union S Areas 1, 2 and 3 All codes in either file. (All values of Current-Account.)
C intersect S Area 2 only The codes common to both files. (Those to be flagged.)
C minus S Area 1 only Customer codes that are not also Supplier codes.
S minus C Area 3 only Supplier codes that are not also Customer codes.
Of the other possibilities, Area 4 represents the set of nearly 26,000 account codes that aren’t
currently used, and is therefore of limited interest. There are 8 ways of choosing the remaining 3
areas: Four have already been considered. Of the rest, one is the empty set, Areas 1 and 2 make set C,
Areas 2 and 3 make set S, leaving the combination 1 and 3, which is called the symmetric difference of
C and S, the set of all codes that are not common to both files: their union minus their intersection.
In these terms, the functions of the above program are to display accounts in the union of C and S, and
flag accounts in the intersection of C and S. This description is a harmless abuse of the terminology of
set theory. Strictly speaking, set operations can only be applied to the information shared by both files;
in this case, only the codes themselves.
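For comparison, a language with a built-in set type computes these operations directly. A Python sketch, using hypothetical account codes:

```python
C = {"A001", "B002", "C003"}   # customer account codes
S = {"B002", "C003", "D004"}   # supplier account codes

union        = C | S   # areas 1, 2 and 3: all codes in either file
intersection = C & S   # area 2: codes common to both files
c_minus_s    = C - S   # area 1: customer codes that aren't supplier codes
s_minus_c    = S - C   # area 3: supplier codes that aren't customer codes
symmetric    = C ^ S   # areas 1 and 3: union minus intersection
```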
In this particular example, both files shared a common primary key, so they were already in the same
order. If this had not been the case, we could have sorted one or both files to bring them into the same
order. It should be clear that, if desired, the program could easily be modified to find any desired set
operation. However, the program still assumes that each file contains at most one record with a given
key. We shall see how to deal with multiple records in the next section.
9 Join Algorithms
We have seen how to read and write files, select particular records, project particular results, and to
summarise totals. All these operations involved just one input file. We have also seen how to combine
information from two files that share a common primary key. But we often need to combine
information from files in more interesting ways than this. For example, we may want to associate
product data with its corresponding supplier data, or we may want to discover all pairs of customers
and suppliers that have addresses in the same suburb as one another. We call the combination of data
from two or more files a join. Unlike set union, intersection and difference, a join does not need the
files being combined to share a common primary key.
There are three basic algorithms for joining files, and each has its place:
Nested-loops For each record of file A we read all records of file B
Random-access For each record of file A we read the record of file B with the matching key.
Sort-merge We merge the files as above, but allow multiple records on one of the files.
The nested loops method is the most general, because it allows any kind of join to be made. The
other two methods can only be used when matching is done on a primary key of one of the files.
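As a sketch of the first algorithm (Python, with hypothetical records rather than the course’s data files; the names are invented), nested loops simply test the join condition on every possible pair:

```python
def nested_loops_join(file_a, file_b, condition):
    """For each record of file A, scan every record of file B;
    keep the pairs that satisfy the join condition."""
    pairs = []
    for a in file_a:                 # outer loop: one pass of file A
        for b in file_b:             # inner loop: a full pass of file B
            if condition(a, b):
                pairs.append((a, b))
    return pairs

suppliers = [{"name": "Acme", "suburb": "Clayton"}]
customers = [{"name": "Jones", "suburb": "Clayton"},
             {"name": "Smith", "suburb": "Unley"}]

# An equi-join on Suburb: keep supplier-customer pairs in the same suburb.
same_suburb = nested_loops_join(
    suppliers, customers,
    lambda s, c: s["suburb"] == c["suburb"])
```

Because any condition can be supplied, this method can make any kind of join, at the cost of examining every pair.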
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Suppliers assign to "newsupp.ndx",
organization is indexed,
record key is Account of Suppliers,
access is sequential.
select Customers assign to "newcust.ndx",
organization is indexed,
record key is Account of Customers,
access is sequential.
DATA DIVISION.
File Section.
FD Suppliers.
copy "supplier.cbl".
FD Customers.
copy "customer.cbl".
Working-Storage Section.
copy "editsupp.cbl".
copy "editcust.cbl".
Get-Next-Customer.
read Customers record
at end
move high-values to Account of Customers
end-read.
Display-Customer.
copy "dispcust.cbl".
Display-Supplier.
copy "dispsupp.cbl".
If ‘Process-Supplier-Customer-Pair’ did not contain the condition on Suburb, the program would list
every possible pair of supplier and customer records. Mathematically, the resulting set of pairs is
called a cartesian product. The pairs are ordered: supplier-customer pairs are distinct from customer-
supplier pairs. Because of the condition however, the program only lists a subset of the cartesian
product. A subset of a cartesian product is called a relation.
We can regard the files themselves as being formed in a similar way. If we take the set of all account
numbers, all names, all street addresses, all suburbs and all balances, then form their cartesian product,
we obtain a set containing every possible supplier record. The Suppliers file contains only a subset of
the cartesian product, so mathematically speaking, it too is a relation. Thus relations are a
mathematical abstraction of files. This is the origin of the term ‘relational database’. The join
operation is a means of forming new relations from existing relations.
[Flowchart: the nested-loops join. Open the Suppliers file and read its first record; for each supplier, open the Customers file, read every customer record, and display each supplier-customer pair that shares the same suburb; close the Customers file after each pass and read the next supplier record; at the end of the Suppliers file, close it and stop.]
The condition that determines which pairs of records are present in the join is called the join
condition. When the join condition requires an equality between two fields, it is called an equi-join.
Any other condition is called a theta-join (θ-join). For example, a join where the supplier’s Balance
exceeded the customer’s Credit-Limit would be a theta-join. Strictly, a join that involves fields with
the same name (such as Suburb) is called a natural join, although this term is often used loosely with a
slightly different meaning, as we shall see shortly.
Joins can be many-to-many, many-to-one or one-to-one. The join on Suburb we have just considered
is potentially many-to-many. There is nothing to prevent many customers and many suppliers sharing
the same suburb. In that case, each customer in that suburb would be paired with many suppliers, and
each supplier in the suburb would be paired with many customers. However, if we joined the Products
and Suppliers files so that each product record is paired with the supplier that normally supplies it,
there can be at most one supplier paired with each product — although many products can be paired
with each supplier — so the relationship is many-to-one. Finally, if we join the Customers and
Suppliers files on account number, there is at most one record from each file, so the join is one-to-one.
There is little difference between a one-to-one equi-join and set intersection.
What happens if we pair each product with its supplier and find that through an error some product
records don’t refer to valid supplier records? If we omit those product records from the join, it is
called an inner join. If instead, we pair the product records with dummy supplier records, that is called
an outer join.
Finally, it is possible to have a self-join. In this case a single file is paired with itself, forming for
example, a series of customer-customer pairs. From this product, it would be possible, for example, to
find all pairs of customers that share the same suburb. This would be a simple modification of the
above program.
[Flowchart: the random-access join. For each child record, read the matching parent record; report any child whose parent record is not found; read the next child record; at the end of the child file, close both files and stop.]
We illustrate the method by joining the Products and Suppliers files on the account of the supplier.
Since every product must have exactly one supplier, but a supplier can supply any number of products,
the Suppliers file is the parent, and the Products file is the child.
We can adapt two programs we have already studied: one to list the Products file sequentially
(‘listprod’), and one to read the Suppliers file randomly (‘findsupp’). In effect, all we have to do is
adapt ‘findsupp’ by replacing its keyboard input file by the Products file.
In the environment division, we select sequential access mode for the child file, and random access
mode for the parent file:
IDENTIFICATION DIVISION.
Program-ID. randjoin.
* Joins products and suppliers by random access to supplier.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Suppliers assign to "newsupp.ndx",
organization is indexed,
record key is Account of Suppliers,
access is random.
select Products assign to "newprod.ndx",
organization is indexed,
record key is Item-No of Products,
access is sequential.
Display-Product.
copy "dispprod.cbl".
Display-Supplier.
copy "dispsupp.cbl".
We can see that, when it is applicable, this method is much faster than the nested loops method. For
each product, it reads the required supplier record directly, instead of reading the entire Suppliers file.
This program is best described as forming an outer join, because products that have no matching
supplier are listed in the join, along with an error message. It forms an equi-join, because the Supplier
in the product record must equal the Account in the supplier record. (Note the first line of ‘Display-
Supplier-Status’!) Because ‘Supplier’ and ‘Account’ are not the same name, it is not strictly a natural
join. On the other hand, many programmers use the term loosely to refer to any join on the primary
key of a file. Actually, the supplier account code gets listed twice, once as part of the product record
and once as part of the supplier record. Although this redundancy is reassuring, normally only one of
the occurrences would be used in the join.
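The random-access method can be sketched with a dictionary standing in for the index (Python, hypothetical records and invented names):

```python
def random_access_join(products, supplier_index):
    """For each child (product) record, fetch its parent (supplier)
    record directly by key, instead of scanning the whole parent file."""
    joined = []
    for p in products:
        s = supplier_index.get(p["supplier"])   # one direct read per child
        joined.append((p, s))                   # s is None for an orphan
    return joined

# The dictionary stands in for the indexed Suppliers file.
supplier_index = {"S001": {"account": "S001", "name": "Acme"}}
products = [{"item": "P1", "supplier": "S001"},
            {"item": "P2", "supplier": "S999"}]   # no matching supplier
result = random_access_join(products, supplier_index)
```

Pairing the orphaned child with None rather than dropping it mirrors the outer-join behaviour described above.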
[Flowchart: the sort-merge join. Choose the smaller join key; at the end of both files, close them and stop; while the child record matches the join key, process the parent-child pair (when the parent also matches) and read the next child record; then read the next parent record.]
IDENTIFICATION DIVISION.
Program-ID. sortjoin.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Suppliers assign to "newsupp.ndx"
organization is indexed,
record key is Account of Suppliers
access is sequential.
select Unsorted-Products assign to "newprod.ndx"
organization is indexed,
record key is Item-No of Unsorted-Products
access is sequential.
select Products assign to "work.tmp".
DATA DIVISION.
File Section.
FD Unsorted-Products.
copy "product.cbl".
SD Products.
copy "product.cbl".
FD Suppliers.
copy "supplier.cbl".
Working-Storage Section.
77 Current-Supplier pic x(4).
77 Supplier-Status pic x.
77 Supplier-Exists pic x value "Y".
77 Supplier-Missing pic x value "N".
copy "editprod.cbl".
copy "editsupp.cbl".
The working-storage section contains ‘Current-Supplier’, which keeps track of the account code that
is being processed. It also contains a variable called ‘Supplier-Status’, which is used to keep track of
whether a supplier record exists for the current account number. Strictly speaking, it is redundant,
because we can check this condition by testing whether ‘Account of Suppliers’ equals ‘Current-Supplier’.
However, its use makes the program more readable. ‘Supplier-Status’ can have one of two values, ‘Y’
or ‘N’, referred to as ‘Supplier-Exists’ and ‘Supplier-Missing’.
The procedure division starts by sorting the product records into order by supplier account code. The
Suppliers file is already in this order because it is indexed by ‘Account’.
PROCEDURE DIVISION.
Process-All-Suppliers.
display "Sorting Products ..."
sort Products on ascending Supplier of Products,
using Unsorted-Products
output procedure Process-Sorted-Products
stop run.
The sort operation structures the Products file as a series of groups sharing the same account code.
The output procedure therefore needs two levels of loop: The outer loop deals with an account code;
the inner loop deals with individual products within each supplier group. The logic is a meld of the
‘suppgrp’ program, which finds the total cost of deliveries and total of the payments for each supplier,
and the ‘accounts’ program, which merges two files:
Process-Sorted-Products.
display "Joining Suppliers and Products ..."
display spaces
open input Suppliers
perform Get-Next-Supplier
perform Get-Next-Product
perform Choose-Current-Supplier
perform until Current-Supplier = high-values
perform Check-Supplier-Status
perform until Supplier of Products not = Current-Supplier
perform Process-One-Product
perform Get-Next-Product
end-perform
if Account of Suppliers = Current-Supplier
perform Get-Next-Supplier
end-if
perform Choose-Current-Supplier
end-perform
close Suppliers
display "Join complete.".
Choose-Current-Supplier.
if Supplier of Products < Account of Suppliers
move Supplier of Products to Current-Supplier
else
move Account of Suppliers to Current-Supplier
end-if.
The parent-child relationship means that the two files are not treated symmetrically. It is an error if a
child has no parent, but it is acceptable for a parent to have no child. Thus, the program forms a kind of
outer join, because products that do not match suppliers are listed, along with an error message.
The new feature here is that the inner loop allows there to be zero, one or many child records for each
parent record. If a product record is an orphan, the outer loop skips the refreshing read from the parent
file, because the account code in the supplier record area must already exceed ‘Current-Supplier’.
‘Check-Supplier-Status’ exists mainly to make the program easier to understand. Its true value will
be appreciated when we study updating in a later section:
Check-Supplier-Status.
if Account of Suppliers = Current-Supplier,
move Supplier-Exists to Supplier-Status
else
move Supplier-Missing to Supplier-Status
end-if.
We make use of Supplier-Status in Process-One-Product to test whether the product has a matching
supplier. We do not need to test if a supplier has a matching product, because Process-One-Product
would not be performed: the inner loop would have zero iterations.
Process-One-Product.
perform Display-Product
if Supplier-Status = Supplier-Exists
perform Display-Supplier
else
display Current-Supplier " is not on the supplier file."
display spaces
end-if.
The rest is routine:
Display-Supplier.
copy "dispsupp.cbl".
Display-Product.
copy "dispprod.cbl".
Get-Next-Supplier.
read Suppliers next record,
at end
move high-values to Account of Suppliers
end-read.
Get-Next-Product.
return Products record,
at end
move high-values to Supplier of Products
end-return.
As in the merge program, the trick of using high-values to indicate the end of file works beautifully.
It is worth noting that the sort-merge method doesn’t need either file to be indexed, and a join could
be made on any field that has a unique value for each parent record.
The sort-merge method is often the fastest of the three. It guarantees to read each parent record
exactly once, even if there are several children that have the same parent. On the other hand, it must
read every parent record, even if no child matches it. It is therefore likely to be better than the random
access method if a high proportion of parent records are accessed, and to perform worse than the
random access method if a low proportion are accessed.
One of the advantages of the sort-merge method or the skip-sequential method is that we can exploit
grouping at the same time as making the join. We illustrate this by applying the skip-sequential
method to write a program that shows a typical use of the join operation; it tells Serv-U-Rite what
products need to be purchased from suppliers. Its output should look like this:
Product Purchase Orders by Supplier
-----------------------------------
and so on ...
US Audio Imports
5 Penna Ave
Clayton VIC 3109
10 Audio-Gods Box Speakers
10 (Subtotal for supplier)
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Suppliers assign to "newsupp.ndx"
organization is indexed,
record key is Account of Suppliers
access is random.
select Unsorted-Products assign to "newprod.ndx"
organization is indexed,
record key is Item-No of Unsorted-Products
access is sequential.
select Products assign to "workfile.tmp".
DATA DIVISION.
File Section.
FD Unsorted-Products.
copy "product.cbl".
SD Products.
copy "product.cbl".
FD Suppliers.
copy "supplier.cbl".
Working-Storage Section.
77 Current-Supplier pic a999.
77 Supplier-Status pic x.
77 Supplier-Exists pic x value "Y".
77 Supplier-Missing pic x value "N".
77 Reorder-Qty-Subtotal pic 999999.
77 Reorder-Qty-Grand-Total pic 999999.
01 Report-Line.
02 Reorder-Qty pic zzz,zz9.
02 pic x value space.
02 Description pic x(40).
The procedure division begins by sorting the Products file:
PROCEDURE DIVISION.
Process-All-Suppliers.
sort Products on ascending Supplier of Products
using Unsorted-Products
output procedure Process-Sorted-Products
stop run.
It retains the nested loop structure of the sort-merge algorithm, but the Supplier record is read (within
‘Start-One-Supplier’) at the start of each group.
Process-Sorted-Products.
perform Start-All-Suppliers
open input Suppliers
perform Get-Next-Product
perform until Supplier of Products = high-values
move Supplier of Products to Current-Supplier
perform Start-One-Supplier
perform until Supplier of Products not = Current-Supplier
perform Process-One-Product
perform Get-Next-Product
end-perform
perform End-One-Supplier
end-perform
close Suppliers
perform End-All-Suppliers.
‘Start-All-Suppliers’ displays the heading, and clears the final total:
Start-All-Suppliers.
display "Product Purchase Orders by Supplier"
display "-----------------------------------"
display spaces
move zero to Reorder-Qty-Grand-Total.
‘Start-One-Supplier’ reads the matching supplier record in the same way as the random access
method. It then displays the supplier’s address and clears the sub-total.
Start-One-Supplier.
move Supplier of Products to Account of Suppliers
read Suppliers
invalid key
move Supplier-Missing to Supplier-Status
not invalid key
move Supplier-Exists to Supplier-Status
end-read.
if Supplier-Status = Supplier-Exists
display Name of Suppliers
display Street of Suppliers
display Suburb of Suppliers
else
display Current-Supplier " is not on the supplier file."
end-if
move zero to Reorder-Qty-Subtotal.
Processing a product consists of testing if the number of items in stock plus the number on order is
below the reorder level. If so, the required reorder quantity and the description of the product are
displayed and the sub-total for the current supplier is updated:
Process-One-Product.
if Stock of Products + On-Order of Products
< Reorder-Level of Products
move Reorder-Qty of Products
to Reorder-Qty of Report-Line
move Description of Products
to Description of Report-Line
display Report-Line
add Reorder-Qty of Products to Reorder-Qty-Subtotal
end-if.
At the end of each group of products, the sub-total is displayed, then added to the final total:
End-One-Supplier.
move "(Subtotal for supplier)"
to Description of Report-Line
move Reorder-Qty-Subtotal to Reorder-Qty of Report-Line
display Report-Line
display spaces
add Reorder-Qty-Subtotal to Reorder-Qty-Grand-Total.
At the end of file, the final total is displayed:
End-All-Suppliers.
move "(Grand Total)" to Description of Report-Line
move Reorder-Qty-Grand-Total to Reorder-Qty of Report-Line
display Report-Line.
Finally, we meet an old friend:
Get-Next-Product.
return Products record,
at end
move high-values to Supplier of Products
end-return.
We can see that the skip-sequential method is efficient both when a high proportion of supplier
records are accessed and when a low proportion are accessed. In the first case it behaves like the sort-
merge method. In the second case it behaves roughly like the random access method. Although
sorting the Products file adds some overhead, this will almost certainly be rewarded by a more orderly
access to the Suppliers file, reducing seek time.
This particular use of the skip-sequential method has the weakness that it accesses suppliers and
displays their addresses even if none of their products need to be reordered. This can be corrected by
selecting the required product records before the join is made:
Get-Next-Product.
perform with test after
until Supplier of Products = high-values
or Stock of Products + On-Order of Products
< Reorder-Level of Products
return Products,
at end
move high-values to Supplier of Products
end-return
end-perform.
The if statement in Process-One-Product is then redundant.
We can estimate the time taken by the nested-loops method as follows: The Suppliers file is read
once. This should take 100 × 10ms = 1 second. The Products file can be read once in 1,000 × 10ms =
10 seconds, but it must be read many times: once for each supplier. Since there are 500 suppliers,
5,000 seconds will be spent reading the Products file, so the total time taken is 5,001 seconds — over
1 hour and 23 minutes.
It is worth asking what would happen if the order of the nested loops were reversed. At first sight, we
might expect there to be no change in performance. With 500 suppliers and 6,000 products, there are
bound to be 3,000,000 reads in the inner loop, whichever order the loops are nested. Even so, there is
a difference: Reading the Products file once takes 10 seconds, and reading the Suppliers file 6,000
times takes 6,000 seconds, so the total is 6,010 seconds. Our first choice was better because there are
more product records than supplier records per sector. The question is how many blocks have to be
read, not how many records.
Using the same assumptions, how long should the random-access method take? The Products file is
read sequentially, taking 10 seconds, as before. The Suppliers file is read randomly, once for each
product record. If we assume that there is negligible chance of two successive supplier records lying
in the same sector, each access to the Suppliers file will take 10ms, so the total time spent accessing it
will be 6,000 × 10ms = 60 seconds. The total access time, including 10 seconds for the Products file, is
70 seconds.
To estimate the time taken by the sort-merge method, we need to know how long it takes to sort the
Products file. Since we have not yet discussed how sorting is done, we need to take a certain amount
on trust at this stage. We will assume that the unsorted Products file is copied to the work file, then the
work file records are read by the sort output procedure. We assume that all these transfers are made
one sector at a time. The time taken to sort the Products file is therefore 10 seconds to read it, 15
seconds to write it to the work file (including read-after-write verification), and 10 seconds to read the
work file: 35 seconds in all. Since the Suppliers file is read sequentially, the total access time is
36 seconds.
The skip-sequential method is likely to take exactly the same time. It would be faster only if one or
more whole sectors of the Suppliers file were skipped, which is most improbable in this case.
We therefore have the following estimates:
Nested-loops 5,001 seconds
Random-access 70 seconds
Sort-merge 36 seconds
Skip-sequential 36 seconds
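These figures follow directly from the stated cost model, and can be checked with a little arithmetic. The sketch below uses Python purely as a calculator; the constants are the assumptions made in the text (10ms per sector access, 15ms per verified write, 100 supplier sectors, 1,000 product sectors):

```python
# Cost model from the text, in milliseconds to keep the arithmetic exact.
ACCESS_MS = 10          # one sector read or seek
WRITE_MS = 15           # one verified (read-after-write) write
SUPPLIER_SECTORS = 100  # 500 supplier records, 5 per sector
PRODUCT_SECTORS = 1_000 # 6,000 product records
SUPPLIERS = 500
PRODUCTS = 6_000

# Nested loops: Suppliers once, Products once per supplier.
nested_ms = SUPPLIER_SECTORS * ACCESS_MS + SUPPLIERS * PRODUCT_SECTORS * ACCESS_MS

# Random access: Products sequentially, one random Suppliers read per product.
random_ms = PRODUCT_SECTORS * ACCESS_MS + PRODUCTS * ACCESS_MS

# Sort-merge: read Products, write the work file (verified), read it back,
# then read Suppliers sequentially.
merge_ms = PRODUCT_SECTORS * (ACCESS_MS + WRITE_MS + ACCESS_MS) \
           + SUPPLIER_SECTORS * ACCESS_MS

print(nested_ms // 1000, random_ms // 1000, merge_ms // 1000)  # 5001 70 36
```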
It would be wrong to conclude from this that the sort-merge method is always the best choice. First,
although the nested-loops method is certainly the least efficient, it can make many-to-many joins,
while the other methods can only make one-to-one or one-to-many joins. Consequently, nested-loops
is the method of last resort, used only when the more efficient methods cannot be used.
Second, before rushing to premature conclusions about the other three methods, consider what
happens if the Suppliers file is 100 times bigger. The random-access method is unaffected by this
change, but since it now takes 100 seconds to read all 10,000 sectors of the Suppliers file sequentially,
the sort-merge method takes 135 seconds. The skip-sequential method cannot possibly need to read
more than 6,000 sectors from the Suppliers file, so it takes at most 95 seconds. The sort-merge method
moves from equal first to third place, and the random-access method moves from third to first place.
The nested-loops method maintains its fourth place, taking over 5 days.
To a close approximation, the random-access method will read Rc blocks of the parent file (one per
child record), whereas the sort-merge method must read all Bp blocks (which is constant). Random
access is therefore likely to be faster when Rc < Bp, and sequential access is likely to be faster when
Rc > Bp. If Rc = Bp, this is called the break-even point.
Break-even occurs when the number of child records equals the number of parent blocks.
Since records are typically much smaller than blocks, at the break-even point the child file is typically
much smaller than the parent file, so this justifies ignoring the time taken to read it or sort it.
We may test this rule in the case of the Products and Suppliers files. Since the Suppliers file contains
100 blocks, at the break-even point the Products file contains 100 records, or 14 blocks. Reading the
Products file then takes 0.14 seconds, and sorting it takes 0.49 seconds. At the break-even point, read-
ing the Suppliers file randomly or sequentially takes 1 second. Therefore, the random-access method
takes 1.14 seconds, and the sort-merge method takes 1.49 seconds, so the approximation is close, but
not exact. Actually, with so few records, the whole Products file would occupy less than 7KB. In
these circumstances the Cobol sort algorithm would almost certainly discard the work file, and sort the
Products file in main memory. The two methods would then have exactly the same performance.
If the hit rate is so low that no parent record is accessed twice, it equals Rc ÷ Rp, where Rp is the
number of records in the parent file. By dividing the equation Rc = Bp by Rp, we get Rc ÷ Rp = Bp ÷ Rp.
The ratio Rp ÷ Bp is the number of records per block of the parent file, or its blocking factor. This
suggests the following alternative rule:
Break-even occurs when the hit rate equals the inverse of the parent file blocking factor.
Since the Suppliers file has a blocking factor of 5 records per sector, the break-even hit rate must be
0.2, or 20%. Since it contains 500 records, this hit-rate occurs when there are 100 product records, as
before.
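Both formulations of the rule can be checked against the figures used in the text. A minimal sketch (Python, purely for the arithmetic; the Suppliers file sizes are those assumed above):

```python
# Random access reads Rc parent blocks (one per child record);
# sequential access reads all Bp parent blocks.
parent_blocks = 100                      # Bp: Suppliers file, 100 sectors
parent_records = 500                     # Rp: supplier records
blocking_factor = parent_records // parent_blocks   # 5 records per block

# Rule 1: break-even when the number of child records equals Bp.
break_even_children = parent_blocks      # Rc = 100 product records

# Rule 2: the same point expressed as a hit rate, Rc / Rp.
hit_rate = break_even_children / parent_records
print(hit_rate)                          # 0.2, i.e. the inverse of the blocking factor
```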
This second formulation of the break-even point stresses that increasing the blocking factor will
reduce the hit rate needed to reach break-even. The more we increase the blocking factor of the parent
file, the fewer blocks it will have, the fewer seeks will occur, and the faster the sequential method will
become. For this reason, files are often written in blocks of many sectors. What limits block size is
either the length of a track on disk, or the amount of main memory that can be set aside to buffer a
block. Transfer time usually is only a small fraction of total access time, so larger blocks do not slow
random accesses much.
Usually the choice of join method is clear-cut. If there is no parent-child relationship, nested-loops is
the only choice. If there is, the hit rate will usually determine that random-access or sort-merge is the
clear winner. For example, the parent file may have a blocking factor of 20, with a break-even hit rate
of 5%, but the actual hit rate may be 50%. In the rare case that the two methods take much the same
time, it usually doesn’t matter which we choose. In the even rarer case that the choice is critical, there
is really no substitute for experiment. The skip-sequential method is always a safe choice: It mimics
the random-access method when the hit rate is low, and it mimics the sort-merge method when it is high.
IDENTIFICATION DIVISION.
Program-ID. lkupjoin.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Suppliers assign to "newsupp.ndx",
organization is indexed,
record key is Account of Suppliers,
access is sequential.
select Products assign to "newprod.ndx",
organization is indexed,
record key is Item-No of Products,
access is sequential.
DATA DIVISION.
File Section.
FD Suppliers.
copy "supplier.cbl".
FD Products.
copy "product.cbl".
Working-Storage Section.
01 Supplier-Table.
02 Supplier-Entry occurs 1000 times.
copy "supplier.cbl" replacing 01 by 03, 02 by 04, 03 by 05.
copy "editprod.cbl".
77 No-Of-Suppliers pic 9999.
77 Item pic 9999.
77 Least pic 9999.
77 Most pic 9999.
copy "editsupp.cbl".
An array of 1,000 entries is declared in the working storage, along with a few auxiliary variables.
Each entry has the same format as a supplier record. Unfortunately, the level numbers within the
copied text need to be adjusted in this context, so we use ‘copy … replacing …’. This makes a
textual replacement of ‘01’ by ‘03’, ‘02’ by ‘04’ and ‘03’ by ‘05’. We need to be careful. Replacing
‘1’ by ‘3’, ‘2’ by ‘4’ and ‘3’ by ‘5’ could have a disastrous side-effect: pic x(30) could be changed to
pic x(50). This is probably the best argument for writing ‘01’ instead of ‘1’, etc.
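The hazard is easy to demonstrate with an ordinary textual substitution. The sketch below uses Python’s string replace to stand in for the ‘copy … replacing …’ preprocessing; the data line is illustrative:

```python
line = "03 Name pic x(30)."

# Replacing the two-digit level number '03' by '05' leaves the picture
# clause alone, because "x(30)" contains "30", not "03" ...
safe = line.replace("03", "05")
print(safe)    # 05 Name pic x(30).

# ... but replacing the bare digit '3' by '5' corrupts the picture clause:
unsafe = line.replace("3", "5")
print(unsafe)  # 05 Name pic x(50).
```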
Cobol arrays are always numbered from 1 onwards, so the elements have indices from 1 to 1,000.
The usual read loop is preceded by loading the Suppliers file into working storage:
PROCEDURE DIVISION.
Join-Suppliers-and-Products.
perform Load-Look-Up-Table
open input Products
perform Get-Next-Product
perform until Item-No of Products = high-values
perform Process-One-Product
perform Get-Next-Product
end-perform
close Products
stop run.
Loading the array requires another read loop. At the end of the loop the supplier records lie in
locations 1 to ‘No-Of-Suppliers’. If there are more than 1,000 supplier records, the program fails.
Because an indexed sequential file is stored in order of its primary key, the resulting table is in account
code order:
Load-Look-Up-Table.
open input Suppliers
move zero to No-Of-Suppliers
perform Get-Next-Supplier
perform until Account of Suppliers = high-values
add 1 to No-Of-Suppliers
move Supplier of Suppliers
to Supplier of Supplier-Entry (No-Of-Suppliers)
perform Get-Next-Supplier
end-perform
close Suppliers.
‘Process-One-Product’ displays the product record, then attempts to find the matching supplier by
executing ‘Binary-Search’.
Process-One-Product.
perform Display-Product
perform Binary-Search
if Least > Most
display Supplier of Products,
" is not on the supplier file."
display spaces
else
perform Display-Supplier
end-if.
Binary-Search.
move 1 to Least
move No-Of-Suppliers to Most
compute Item = (Least + Most) / 2
perform until Least > Most or
Account of Supplier-Entry (Item) = Supplier of Products
if Account of Supplier-Entry (Item) < Supplier of Products
add 1, Item giving Least
else
if Account of Supplier-Entry (Item)
> Supplier of Products
subtract 1 from Item giving Most
end-if
end-if
compute Item = (Least + Most) / 2
end-perform.
In case you are not already familiar with binary search, here is how it works: ‘Least’ and ‘Most’ store
the lowest and highest positions of the table between which the desired supplier record might lie.
Equally, the record can neither lie below ‘Least’ nor above ‘Most’. The search progresses by bringing
‘Least’ and ‘Most’ closer together, narrowing the interval containing the desired account. To do this, it
tests an element, ‘Item’, within the range ‘Least’ to ‘Most’. Any element would do, but the one at the
mid-point between ‘Least’ and ‘Most’ is best because this halves the interval. If the account number of
the mid-point element is less than the desired account, the position above it is the lowest where the
desired element can lie. If it is greater than the desired account, the position below it is the highest
where the desired element can lie. If it is equal to the desired account, the proper supplier has been
found. If the account number of the product is not in the table at all, the values of ‘Least’ and ‘Most’
will eventually cross.
The method is efficient because each iteration halves the area of the table where the desired element
might lie. We assume there are 500 supplier records, as before. 500 can be halved only 9 times before
reaching 1, so the search makes at most 9 iterations even on an unsuccessful search. A simple linear
search of the table taking each element in turn would take an average of 250 iterations to find an
element, and 500 on an unsuccessful search. In general, the number of iterations in ‘Binary-Search’
increases only with the logarithm of the size of the array.
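Stripped of Cobol detail, the same search fits in a few lines. Here is a sketch in Python (using zero-based indices rather than Cobol’s 1-based ones; the account values are illustrative):

```python
def binary_search(table, key):
    """Return the index of key in the sorted table, or None if absent."""
    least, most = 0, len(table) - 1
    while least <= most:
        item = (least + most) // 2   # probe the mid-point of the interval
        if table[item] < key:
            least = item + 1         # key can only lie above the probe
        elif table[item] > key:
            most = item - 1          # key can only lie below the probe
        else:
            return item              # the desired element has been found
    return None                      # least and most have crossed: not found

accounts = ["B007", "N001", "N002", "S001"]
print(binary_search(accounts, "N002"))   # 2
print(binary_search(accounts, "X999"))   # None
```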
Cobol actually provides a verb, ‘search … all …’, that allows a binary search to be written as a single
statement. Unfortunately, explaining how to use it here would involve a long digression into features
of Cobol that are otherwise irrelevant to us. But it’s nice to know it’s there!
Display-Product.
copy "dispprod.cbl".
Get-Next-Supplier.
read Suppliers record
at end
move high-values to Account of Suppliers
end-read.
Get-Next-Product.
read Products record
at end
move high-values to Item-No of Products
end-read.
Since the supplier record that should be displayed is in the table, not the Suppliers file, copying
‘dispsupp.cbl’ will not work correctly unless the word ‘Suppliers’ is replaced by ‘Supplier-
Entry (Item)’, so again we must use ‘copy … replacing …’.
Giving the table space for 1,000 supplier records is generous compared with our test population.
What would happen if the file were much bigger? We could increase the number of entries in the
table. Many compilers refuse to create tables above a certain size, for example, 65,536 (2¹⁶) entries.
(The supplier table would then occupy over 6MB.) Some compilers allow bigger tables, but it is
important to avoid using virtual memory. Virtual memory is implemented by using secondary storage
to make RAM look bigger than it is. Using virtual memory would mean that at least part of the array
was really on disk, and efficiency would suffer.
The table is effectively a local cache for the Suppliers file. Operating systems and disk drives often
provide caches to store recently used records. The program uses the table a bit more intelligently than
an operating system uses a cache, because the operating system might discard a supplier record in
preference to a more recently used product record. This is not a good idea, because product records are
only used once, but the operating system cannot know this.
Even so, if the Suppliers file is actually smaller than the operating system’s disk cache, the random-
access method is likely to have much the same performance as using an internal table. Although the
program would access each supplier record many times, after the first access the record would usually
be fetched from the cache rather than the disk. In other words, the effort of writing the ‘lkupjoin’
program could have been saved simply by making the operating system’s cache big enough, or perhaps
by requesting enough buffer space in some other way.
On the other hand, if the parent file is much bigger than the size of RAM, neither a table nor a cache
can be expected to work well. For example, if the file is 5 times bigger than the cache, we might
expect 1 access in 5 to hit the cache, and 4 accesses in 5 to have to read from disk. In theory, the cache
would only reduce access time by 20%.
In practice, the system might perform better than this, because of what is known as the 80-20 law.
This ‘law’ says that 80% of the accesses occur to the active 20% of the records, and 20% of the
accesses occur to the inactive 80% of the records. If we assume this law applies to the Suppliers file,
we would expect 80% of the cache to contain active records and 20% to contain inactive records, in
proportion to the numbers of accesses. Assuming that the file is 5 times bigger than the cache, as
before, 80% of the cache would hold 80% of the active records, but the remaining 20% of the cache
could hold only 5% of the inactive records. Thus 4 accesses in 5 would have 80% chance of hitting the
cache, and the remaining 1 in 5 would have a 5% chance. Overall, 65% of accesses would hit the
cache and only 35% would need to read from disk. The cache might reduce access time by 65%.
The 80-20 law is claimed to be recursive, so that 80% of 80% of the accesses occur to 20% of 20% of
the records, and so on. In other words, 64% of the accesses occur to 4% of the records. We can
virtually guarantee that these records will remain in the cache at all times, implying that only about
20% of accesses would result in disk activity. In this particular case the cache would make the random-
access method five times faster.
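The cache arithmetic above can be reproduced directly. A sketch of the one-level 80-20 calculation (Python, for the arithmetic only), assuming as before a file five times bigger than the cache:

```python
file_to_cache = 5   # the file is five times bigger than the cache

# Without the 80-20 law, every access has a uniform 1-in-5 chance of a hit.
uniform_hit = 1 / file_to_cache

# With the law: 80% of accesses go to the active 20% of records, of which
# 80% are held in cache; the other 20% of accesses go to inactive records,
# of which the remaining cache space holds only 0.2 / (0.8 * 5) = 5%.
active_hit = 0.8 * 0.8
inactive_hit = 0.2 * (0.2 / (0.8 * file_to_cache))
hit_rate = active_hit + inactive_hit

print(round(uniform_hit, 2), round(hit_rate, 2))   # 0.2 0.65
```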
These statistical complications make it difficult to estimate the effect of a cache on actual data,
especially if the operating system shares the same cache space between many programs. However, it is
clear that the bigger the file is compared with the cache, the less effective the cache can be. Test files are
likely to be so small that caching will obscure any performance differences between the methods. If
we attempted to measure the number of accesses used by each method, we would almost certainly get
silly answers. Even the same program might give better results when it was run a second time. Only
realistic data can give reliable results.
Since the table look-up program’s performance is limited by disk access, it actually makes little
difference whether its internal search is efficient. If we had used a linear search to find the matching
supplier, the program would have been analogous to the nested-loops method. Indeed, a similar
internal table could also be used to speed up the many-to-many join of Suppliers and Customers on
‘Suburb’ discussed earlier. The only restriction is that one of the files to be joined should fit into main
memory. The same consideration applies to a cache: If the cache was bigger than the whole Suppliers
file, the nested-loops method and the random-access methods would have virtually the same
performance.
A development of this idea can be used to implement a fast version of the nested-loops method that
works even if one file is too big to fit into main memory. This time there are three loops. In the outer
loop, the program reads as much of the Suppliers file as will fit into its internal table. In the two inner
loops it reads a record from the Products file and forms its join with all the records in the table. It then
reads the next product record and joins it with the records in the table, and so on, until the Products file
is exhausted. It then returns to the outer loop, and fills the table from the next section of the Suppliers
file. The table is joined with the Products file as before, and the process is repeated until the whole
Suppliers file has been read.
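The three loops can be sketched as follows (Python, with lists standing in for the files; the record layout, names and chunk size are illustrative). Note that the two inner loops compare every table record with every child record, so many-to-many joins still work:

```python
def block_nested_join(parents, children, table_size, key):
    """Three-loop nested join: parents are read in table-sized chunks."""
    result = []
    for start in range(0, len(parents), table_size):  # outer: fill the table
        table = parents[start:start + table_size]
        for c in children:                            # middle: one pass of the file
            for p in table:                           # inner: scan the whole table
                if p[key] == c[key]:
                    result.append((p["name"], c["name"]))
    return result

suppliers = [{"name": "A", "suburb": "Parramatta"},
             {"name": "B", "suburb": "Glenelg"},
             {"name": "C", "suburb": "Parramatta"}]
customers = [{"name": "X", "suburb": "Parramatta"},
             {"name": "Y", "suburb": "Glenelg"}]
print(block_nested_join(suppliers, customers, table_size=2, key="suburb"))
# [('A', 'X'), ('B', 'Y'), ('C', 'X')]
```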
In the case of equi-joins, the same trick can be combined with the sort-merge method. Suppose we
want to join the Customers and Suppliers files on ‘Suburb’, as before. If our program sorts both files
on Suburb, it will group all records with the same suburb together, and each file will consist of a series
of suburb groups. The join can only contain records in matching groups. We read the first suburb
group from the Suppliers file into a table, then join it with the records from the Customers file that
belong to the same suburb group. We then read the next suburb group from the Suppliers file into the
table and join it with the matching group from the Customers file, and so on, until all the suburbs have
been processed. If it should happen that there are so many suppliers in one suburb that the table is too
small to hold them, the program must read as many as it can, and read through the corresponding
customer records more than once. If this is to be done efficiently, the program must be able to return
to the start of the current group in the file and read it again.
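A sketch of this group-wise join (Python again; it assumes, unlike a real Cobol sort, that both files fit in memory, and it omits the re-reading needed when a group overflows the table):

```python
from itertools import groupby

def sort_merge_join(left, right, key):
    """Many-to-many equi-join: sort both inputs, then pair matching groups.
    Each left group plays the role of the in-memory table."""
    lgroups = [(k, list(g)) for k, g in groupby(sorted(left, key=key), key)]
    rgroups = [(k, list(g)) for k, g in groupby(sorted(right, key=key), key)]
    result, i, j = [], 0, 0
    while i < len(lgroups) and j < len(rgroups):
        lk, rk = lgroups[i][0], rgroups[j][0]
        if lk < rk:
            i += 1                  # no matching group on the right: skip
        elif lk > rk:
            j += 1                  # no matching group on the left: skip
        else:                       # matching groups: join every pair
            for a in lgroups[i][1]:
                for b in rgroups[j][1]:
                    result.append((a[1], b[1]))
            i += 1
            j += 1
    return result

suppliers = [("Parramatta", "A"), ("Glenelg", "B"), ("Parramatta", "C")]
customers = [("Parramatta", "X"), ("Glenelg", "Y")]
print(sort_merge_join(suppliers, customers, key=lambda r: r[0]))
# [('B', 'Y'), ('A', 'X'), ('C', 'X')]
```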
10 Updating
Updating a file means changing the data stored in its records. Updating can occur on a periodic basis,
or it can occur as the result of transactions. In some cases, new records can be added to the file, or
existing records can be deleted. Combining these options with different access modes or different join
algorithms leads to a wide range of possibilities. We shall start with the simplest.
Earlier, we saw how the skip-sequential join could be used to tell Serv-U-Rite what products needed
reordering. Let us suppose that Serv-U-Rite followed these recommendations exactly, so the products
are now on order. We therefore need to update the On-Order amounts in the Products file. We can do
this by creating a new copy of the file or by altering the existing copy.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Old-Products assign to "oldprod.ndx"
organization is indexed,
record key is Item-No of Old-Products
access is sequential.
select Products assign to "newprod.ndx"
organization is indexed,
record key is Item-No of Products
access is sequential.
DATA DIVISION.
File Section.
FD Old-Products.
copy "product.cbl".
FD Products.
copy "product.cbl".
Working-Storage Section.
copy "editprod.cbl".
The procedure division starts in the familiar fashion:
PROCEDURE DIVISION.
Process-All-Products.
display "Updating and copying product records ..."
display spaces
open input Old-Products, output Products
perform Get-Next-Product
perform until Item-No of Old-Products = high-values
perform Process-One-Product
perform Get-Next-Product
end-perform
close Old-Products, Products
display "Update complete."
stop run.
Process-One-Product must copy the record in the input area to the output record area, modify it as
necessary, then write it to the new copy of the file. It also displays the modified record. Depending on
the Cobol run-time system, it may be important to display the record before writing it, because the
contents of the output area are undefined after the write statement.
Process-One-Product.
move Product of Old-Products to Product of Products
if Stock of Products + On-Order of Products
< Reorder-Level of Products
add Reorder-Qty of Products to On-Order of Products
end-if
perform Display-Product
write Product of Products
invalid key
display "Write error on above record."
stop run
end-write.
There are two ways an error could occur on writing a record: there is a logic error in the program, or
there is something wrong with the hardware. Either way, the wisest thing is to stop the program.
The rest is routine:
Get-Next-Product.
read Old-Products next record
at end
move high-values to Item-No of Old-Products
end-read.
Display-Product.
copy "dispprod.cbl".
The second approach updates the existing copy of the Products file in place, so only one product file needs to be declared:
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Products assign to "newprod.ndx"
organization is indexed,
record key is Item-No of Products
access is sequential.
DATA DIVISION.
File Section.
FD Products.
copy "product.cbl".
Working-Storage Section.
copy "editprod.cbl".
The procedure division has the usual form, except Products is opened in i-o (input-output) mode:
PROCEDURE DIVISION.
Process-All-Products.
display "Updating and copying product records ..."
display spaces
open i-o Products
perform Get-Next-Product
perform until Item-No of Products = high-values
perform Process-One-Product
perform Get-Next-Product
end-perform
close Products
display "Update complete."
stop run.
If a product should be placed on order, its record is updated, then the record is rewritten to the file,
using rewrite. (A write statement would attempt to create a new record.)
Process-One-Product.
if Stock of Products + On-Order of Products
< Reorder-Level of Products
add Reorder-Qty of Products to On-Order of Products
perform Display-Product
rewrite Product of Products
invalid key
display "Rewrite error on above record."
stop run
end-rewrite
end-if.
The rest is familiar territory:
Get-Next-Product.
read Products next record
at end
move high-values to Item-No of Products
end-read.
Display-Product.
copy "dispprod.cbl".
A very common application is one that updates a master file in response to a batch of recorded
transactions. This scheme was explained earlier in a run diagram, repeated above. For example, a
Delivery transaction (3) should increase the stock on hand of the product concerned, decrease the
number of items on order, update the valuation of the stock, and increase the balance owed to the
supplier. The product file (9) is updated first (4), producing a stock report (7). Then the supplier file
(10) is updated (6), and another report (8) is produced:
Blue Waters Nominees B007
1 Francis Road
Paramatta NSW 2150
Opening balance $120,459.56
Closing balance $120,459.56
The above report shows the financial activity for each supplier: There is no activity for account
B007, which is currently owed $120,459.56. Supplier N001 has supplied 50 products with Item-No
PLPDCD, costing a total of $14,950.00. This has increased the balance owing to them from $127.99
to $15,077.99. Supplier N002 has received a payment of $2,790.46, bringing their balance owing to
zero, then an overpayment of $50.00, resulting in a credit of $50.00.
In order to produce this report, the two programs must be connected by a transfer file. This file
contains two kinds of record:
01 Delivery.
02 Time-Stamp.
03 YY-MM-DD pic 9(6).
03 HH-MM-SS pic 9(6).
02 Kind pic x.
02 Account pic a999.
02 Item-No pic x(6).
02 Qty-Delivered pic 9999.
02 Cost pic 9(6)v99.
02 Description pic x(40).
01 Payment.
02 Time-Stamp-2.
03 YY-MM-DD-2 pic 9(6).
03 HH-MM-SS-2 pic 9(6).
02 Kind-2 pic x.
02 Account-2 pic a999.
02 Amount pic 9(6)v99.
The ‘Delivery’ record is similar to that in the Updates file, but it also includes ‘Description’. When
the product file is updated, the description of the product is copied into the transfer record, so that it
can appear on the supplier report. (This is called extending a transaction.) The ‘Payment’ record is
essentially the same as in the Updates file, except that it no longer needs a dummy Item-No.
Creating the transfer file is essentially a matter of joining the Products and Updates files. However,
the program must also update the product records and display a report. The report shows each delivery
transaction, followed by the updated product record:
Updating Products ...
and so on ...
PROCEDURE DIVISION.
Process-All-Products.
display "Updating Products ..."
display spaces
open input Updates, i-o Products, extend Transfers
perform Get-Next-Update
move Item-No of Updates to Current-Item-No
perform until Current-Item-No = high-values
perform Get-Product-Status
perform Process-One-Update
perform Record-Product-Status
perform Get-Next-Update
move Item-No of Updates to Current-Item-No
end-perform
close Updates, Products, Transfers
display "Update complete."
stop run.
The Updates file is read sequentially in the usual way:
Get-Next-Update.
read Updates next record,
at end
move high-values to Item-No of Updates
end-read.
Finding the initial status of the product record means moving ‘Current-Item-No’ to the record area,
then attempting to read the matching product. However, Payment transactions have low-valued item
numbers, so the program must avoid trying to read products for them:
Get-Product-Status.
if Current-Item-No not = Dummy-Item-No
move Current-Item-No to Item-No of Products
read Products record,
invalid key
move Product-Missing to Product-Status
not invalid key
move Product-Exists to Product-Status
end-read
else
move Product-Missing to Product-Status
end-if.
The processing of an update transaction depends on the kind of transaction and ‘Product-Status’.
There are three cases to consider: a delivery for a product that exists, a payment, and a delivery for a
non-existent product. In the first case, there are two potential error conditions that need to be checked.
Serv-U-Rite have deemed that they should be processed as follows. If the number of items delivered
exceeds the number on order, the delivery transaction is rejected. If the account of the supplier from
whom the delivery is received is different from the supplier account in the product record, then the
condition is flagged, but the delivery is accepted.
Process-One-Update.
evaluate Kind of Updates also Product-Status
when Delivery-Code also Product-Exists
perform Display-Update
if Qty-Delivered of Updates > On-Order of Products
display "More delivered than ordered. Ignored."
else
if Account of Updates not = Supplier of Products
display "Supplier is not the expected one."
end-if
add Qty-Delivered of Updates to Stock of Products
subtract Qty-Delivered of Updates
from On-Order of Products
add Cost of Updates to Valuation of Products
move Time-Stamp of Updates to Time-Stamp of Transfers
move Kind of Updates to Kind of Transfers
move Account of Updates to Account of Transfers
move Item-No of Updates to Item-No of Transfers
move Qty-Delivered of Updates to Qty-Delivered of Transfers
move Cost of Updates to Cost of Transfers
move Description of Products
to Description of Transfers
write Delivery of Transfers
end-if
perform Display-Product
when Payment-Code also any
move Time-Stamp of Updates to Time-Stamp of Transfers
move Kind of Updates to Kind of Transfers
move Account of Updates to Account of Transfers
move Amount of Updates to Amount of Transfers
write Payment of Transfers
when any also Product-Missing
perform Display-Update
display "Product update ignored. "
Current-Item-No " is not on file."
end-evaluate.
Apart from updating and displaying the product record in response to valid deliveries, it is also
necessary to write transfer records for deliveries and payments.
Finally, recording the updated product requires a rewrite—but only if the product exists:
Record-Product-Status.
if Product-Status = Product-Exists
rewrite Product of Products
invalid key
display "Error rewriting Product record."
perform Dump-and-Quit
end-rewrite
end-if.
Display-Update.
copy "dispupd8.cbl".
Display-Product.
copy "dispprod.cbl".
Cobol requires an invalid key clause to be specified here. Since there is no way it should ever be
invoked, it is all the more important to deal with it carefully:
Dump-and-Quit.
display "Current-Item-No: ", Current-Item-No,
", Product-Status: ", Product-Status
display "Product record: "
perform Display-Product
display "Update record: "
perform Display-Update
stop run.
Like the random access join, this program only reads product records that are updated. It is therefore
efficient if the fraction of updated records is low.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Old-Products assign to "oldprod.ndx"
organization is indexed,
record key is Item-No of Old-Products
access is sequential.
select Products assign to "newprod.ndx"
organization is indexed,
record key is Item-No of Products
access is sequential.
select Unsorted-Updates assign to "updates.seq"
organization is sequential.
select Updates assign to "work.tmp".
select optional Transfers assign to "transfer.seq"
organization is sequential.
‘Old-Products’ is the original version of the Products file, and ‘Products’ is the new updated version.
Working-Storage Section.
77 Current-Item-No pic x(6).
77 Product-Status pic x.
77 Product-Exists pic x value "Y".
77 Product-Missing pic x value "N".
copy "editupd8.cbl".
copy "editprod.cbl".
copy "constant.cbl".
The procedure division uses a sort output procedure:
PROCEDURE DIVISION.
Process-All-Products.
display "Sorting Updates ..."
sort Updates on ascending Item-No of Updates,
Time-Stamp of Updates
using Unsorted-Updates
output procedure Process-Sorted-Updates
stop run.
Since the sort operation will place all the low-valued records at the start of the sequence, the program
can deal with them before the main loop. Sorting also groups all the deliveries for the same product.
Therefore the program has two levels of loop. The sort key includes ‘Time-Stamp’. This means the
transactions for each group will be applied in the intended order.
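The effect of the composite sort key can be sketched outside Cobol. The following Python fragment (the transactions and field names are invented for illustration) shows how sorting on (item number, time stamp) floats the low-valued dummy records to the front, groups each product's transactions, and orders each group chronologically:

```python
# Hypothetical update transactions; the values and field names are invented.
updates = [
    {"item_no": "BT7725", "time_stamp": "020301120500", "kind": "D"},  # delivery
    {"item_no": "000000", "time_stamp": "020301090000", "kind": "P"},  # payment (dummy item no.)
    {"item_no": "BT7725", "time_stamp": "020301080000", "kind": "N"},  # new product
]

# Sort on the composite key (Item-No, Time-Stamp), as the Cobol sort does.
updates.sort(key=lambda u: (u["item_no"], u["time_stamp"]))

# The low-valued record comes first, and BT7725's transactions are grouped
# in time-stamp order, so the new product precedes its delivery.
print([u["kind"] for u in updates])  # → ['P', 'N', 'D']
```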
Process-Sorted-Updates.
display "Updating Products ..."
display spaces
open input Old-Products, output Products, extend Transfers
perform Get-Next-Update
perform until Item-No in Updates not = Dummy-Item-No
perform Process-One-Update
perform Get-Next-Update
end-perform
perform Get-Next-Product
perform Choose-Current-Item-No
perform until Current-Item-No = high-values
perform Get-Product-Status
perform until Item-No of Updates not = Current-Item-No
perform Process-One-Update
perform Get-Next-Update
end-perform
perform Record-Product-Status
if Item-No of Old-Products = Current-Item-No
perform Get-Next-Product
end-if
perform Choose-Current-Item-No
end-perform
close Old-Products, Products, Transfers
display "Update complete.".
The next few paragraphs are exactly the same as in the sort-merge join:
Choose-Current-Item-No.
if Item-No of Updates < Item-No of Old-Products
move Item-No of Updates to Current-Item-No
else
move Item-No of Old-Products to Current-Item-No
end-if.
Get-Next-Product.
read Old-Products next record,
at end
move high-values to Item-No of Old-Products
end-read.
Get-Next-Update.
return Updates record,
at end
move high-values to Item-No of Updates
end-return.
Get-Product-Status.
if Item-No of Old-Products = Current-Item-No,
move Product-Exists to Product-Status
move Product of Old-Products to Product of Products
else
move Product-Missing to Product-Status
end-if.
The text of ‘Process-One-Update’ is exactly the same as in the random access method just considered.
This is because we have been careful to separate the input-output logic from the business logic—a
really important idea. In a real-world application program, ‘Process-One-Update’ would deal with
many more cases than this example, whereas the input-output logic would be no more complex. The
input-output logic is generic, but ‘Process-One-Update’ contains the specific business rules. Keeping
these two concerns separate makes it possible to change the join method and updating mode without
affecting, or having to rewrite, any application-dependent code.
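To make the separation concrete, here is a sketch in Python (all names invented) of a generic copy mode merge-update driver. The merge and input-output logic is fixed, while every business rule lives in the function passed in, so the join method or updating mode can change without touching it:

```python
HIGH = "\uffff"  # stands in for Cobol's high-values

def merge_update(old_masters, updates, process_one_update):
    """Copy mode sort-merge update: generic merge and I/O logic only.
    Both inputs are lists of dicts sorted on "key"; all business rules
    live in the process_one_update callback (an illustrative sketch)."""
    new_masters = []
    mi = ui = 0

    def mkey():  # key of the current old master record, or high-values at end
        return old_masters[mi]["key"] if mi < len(old_masters) else HIGH

    def ukey():  # key of the current update record, or high-values at end
        return updates[ui]["key"] if ui < len(updates) else HIGH

    current = min(mkey(), ukey())
    while current != HIGH:
        # Like Get-Product-Status: copy the old record if one exists.
        exists = mkey() == current
        state = {"exists": exists,
                 "record": dict(old_masters[mi]) if exists else {"key": current}}
        while ukey() == current:            # apply this key's group of updates
            process_one_update(updates[ui], state)
            ui += 1
        if state["exists"]:                 # like Record-Product-Status
            new_masters.append(state["record"])
        if mkey() == current:
            mi += 1
        current = min(mkey(), ukey())
    return new_masters
```

A toy business rule, say one that adds a delivered quantity to an existing record, can then be plugged in without the driver knowing anything about products or suppliers.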
Finally, the product record, if any, needs to be written to the Products file:
Record-Product-Status.
if Product-Status = Product-Exists
write Product of Products
invalid key
display "Error writing Product record."
perform Dump-and-Quit
end-write
end-if.
Display-Update.
copy "dispupd8.cbl".
Display-Product.
copy "dispprod.cbl".
Dump-and-Quit.
display "Current-Item-No: ", Current-Item-No,
", Product-Status: ", Product-Status,
display "Product record: "
perform Display-Product
display "Update record: "
perform Display-Update
stop run.
Besides deliveries and payments, the Transfers file also records the opening and closing of supplier accounts, using two further kinds of record:
01 Open-Account.
02 Time-Stamp-2.
03 YY-MM-DD-2 pic 9(6).
03 HH-MM-SS-2 pic 9(6).
02 Kind-2 pic x.
02 Account-2 pic a999.
02 Address.
03 Name pic x(30).
03 Street pic x(30).
03 Suburb pic x(30).
01 Close-Account.
02 Time-Stamp-2.
03 YY-MM-DD-2 pic 9(6).
03 HH-MM-SS-2 pic 9(6).
02 Kind-2 pic x.
02 Account-2 pic a999.
Because the Transfers file has fewer kinds of records than the Updates file, and is therefore simpler,
we shall begin by using it to update the Suppliers file. As in the previous example, we use the sort-
merge join and copy mode updating. The program produces a report of each supplier’s business
activity, as shown earlier.
IDENTIFICATION DIVISION.
Program-ID. sminsdel.
* Update the Suppliers from the Transfers file using sort-merge.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select optional Old-Suppliers assign to "oldsupp.ndx"
organization is indexed,
record key is Account of Old-Suppliers
access is sequential.
select Suppliers assign to "newsupp.ndx"
organization is indexed,
record key is Account of Suppliers
access is sequential.
select Unsorted-Transfers assign to "transfer.seq"
organization is sequential.
select optional Transfers assign to "work.tmp".
The working-storage section of the data division includes a description of how payments and
deliveries are set out in the report. This approach makes it easy to space its columns correctly.
DATA DIVISION.
File Section.
FD Unsorted-Transfers.
copy "transfer.cbl".
SD Transfers.
copy "transfer.cbl".
FD Old-Suppliers.
copy "supplier.cbl".
FD Suppliers.
copy "supplier.cbl".
Working-Storage Section.
copy "constant.cbl"
01 Detail-Line.
02 YY-MM-DD pic 99/99/99.
02 pic x value space.
02 Qty-Delivered pic z,zz9.
02 pic x value space.
02 Item-No pic x(6).
02 pic x value space.
02 Description pic x(40).
02 pic x value space.
02 Debit pic $$$$,$$$.$$.
02 pic x value space.
02 Credit pic $$$$,$$$.$$.
02 pic x value space.
02 Balance pic $$$$,$$9.99bCR.
77 Current-Account pic a999.
77 Supplier-Status pic x.
77 Supplier-Exists pic x value "Y".
77 Supplier-Missing pic x value "N".
copy "editxfr.cbl".
The procedure division uses a sort output procedure, as before:
PROCEDURE DIVISION.
Process-All-Suppliers.
display "Sorting Transfers ..."
sort Transfers on ascending Account of Transfers,
ascending Time-Stamp of Transfers,
using Unsorted-Transfers
output procedure Process-Sorted-Transfers
stop run.
Because of the sort operation, the transfers are ordered by ‘Time-Stamp’ within ‘Account’. This
suggests the output procedure should contain three levels of loop. However, since each transaction
should have a unique time stamp, the file can only be grouped by ‘Account’. A ‘Time-Stamp’ group
can only contain one record. Therefore only two levels of loop are needed.
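The same point can be made with a short Python sketch (field names invented): grouping the sorted transfers by account alone gives the two-level loop structure, because grouping by time stamp as well would only ever produce one-record groups:

```python
from itertools import groupby

# Transfers before sorting on (Account, Time-Stamp); names are invented.
transfers = [
    {"account": "B002", "time_stamp": 3, "kind": "P"},
    {"account": "A001", "time_stamp": 2, "kind": "D"},
    {"account": "A001", "time_stamp": 1, "kind": "O"},
]
transfers.sort(key=lambda t: (t["account"], t["time_stamp"]))

# Grouping by account alone gives one inner loop per account; grouping by
# (account, time_stamp) as well would only yield one-record groups.
for account, group in groupby(transfers, key=lambda t: t["account"]):
    print(account, [t["kind"] for t in group])
```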
Process-Sorted-Transfers.
display "Updating Suppliers ..."
display spaces
open input Old-Suppliers, output Suppliers
perform Get-Next-Transfer
perform Get-Next-Supplier
perform Choose-Current-Account
perform until Current-Account = high-values
perform Get-Supplier-Status
perform Start-One-Account
perform until Account of Transfers not = Current-Account
perform Process-One-Transfer
perform Get-Next-Transfer
end-perform
perform End-One-Account
perform Record-Supplier-Status
if Account of Old-Suppliers = Current-Account
perform Get-Next-Supplier
end-if
perform Choose-Current-Account
end-perform
close Old-Suppliers, Suppliers
display "Update complete.".
Since this program has some reporting to do at the start and end of each group, ‘Start-One-Account’
and ‘End-One-Account’ are written as separate procedures, to keep the main output procedure as
clean as possible, and separate it from the business logic.
The next few paragraphs are routine:
Choose-Current-Account.
if Account of Transfers < Account of Old-Suppliers
move Account of Transfers to Current-Account
else
move Account of Old-Suppliers to Current-Account
end-if.
Get-Next-Supplier.
read Old-Suppliers next record,
at end
move high-values to Account of Old-Suppliers
end-read.
Get-Next-Transfer.
return Transfers record,
at end
move high-values to Account of Transfers
end-return.
Get-Supplier-Status.
if Account of Old-Suppliers = Current-Account,
move Supplier-Exists to Supplier-Status
move Supplier of Old-Suppliers to Supplier of Suppliers
else
move Supplier-Missing to Supplier-Status
end-if.
Before considering ‘Process-One-Transfer’, let’s look at ‘Record-Supplier-Status’. It writes a record
only if ‘Supplier-Status’ equals ‘Supplier-Exists’:
Record-Supplier-Status.
if Supplier-Status = Supplier-Exists
write Supplier of Suppliers
end-if.
Processing Delivery and Payment transfer records is simple enough. However, opening a new
supplier account requires that there should not be an existing supplier record for the account, i.e.,
‘Supplier-Status’ should equal ‘Supplier-Missing’. Now here’s the trick. The procedure initialises the
output record area, then sets ‘Supplier-Status’ to ‘Supplier-Exists’. First, this allows the record area to
be updated by payments and deliveries. Second, it will force ‘Record-Supplier-Status’ to write the
supplier record to the new Suppliers file, as soon as the group of updates finishes.
The complementary trick is that to close an account it is merely necessary to set ‘Supplier-Status’ to
‘Supplier-Missing’. Then ‘Record-Supplier-Status’ will fail to write it to the new file. Note that, in
one batch of transactions, it is even safe to open, update and close the same account several times.
Sorting on ‘Time-Stamp’ within ‘Account’ ensures that everything will be done in the right order.
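Stripped of the Cobol detail, the trick amounts to a small state machine. The Python sketch below (names invented; only delivery updates shown, and closing requires a zero balance, as in the program) keeps the status flag and record area for one account:

```python
# A sketch of the status-flag trick. Opening an account sets the flag, so
# later transfers apply and the record gets written; closing clears it, so
# the record is silently dropped.
EXISTS, MISSING = "Y", "N"

def apply_transfers(transfers):
    """Apply one account's transfer group (already in time-stamp order)."""
    status, record = MISSING, {}
    for t in transfers:
        if t["kind"] == "open" and status == MISSING:
            record = {"balance": 0}
            status = EXISTS        # later updates now apply to the record area
        elif t["kind"] == "delivery" and status == EXISTS:
            record["balance"] += t["cost"]
        elif t["kind"] == "close" and status == EXISTS and record["balance"] == 0:
            status = MISSING       # the record will simply not be written
    # Like Record-Supplier-Status: write only if the flag says "exists".
    return record if status == EXISTS else None

print(apply_transfers([{"kind": "open"}, {"kind": "delivery", "cost": 10}]))
```

Opening, updating and closing the same account several times in one batch works because the flag simply toggles; only the final state decides whether a record is written.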
Process-One-Transfer.
evaluate Kind of Transfers also Supplier-Status
when Delivery-Code also Supplier-Exists
add Cost of Transfers to Balance of Suppliers
perform Debit-Detail
when Payment-Code also Supplier-Exists
subtract Amount of Transfers from Balance of Suppliers
perform Credit-Detail
when Open-Account-Code also Supplier-Missing
move Account of Transfers to Account of Suppliers
move Address of Transfers to Address of Suppliers
move zero to Balance of Suppliers
move Supplier-Exists to Supplier-Status
display "Account opened."
perform Start-One-Account
when Close-Account-Code also Supplier-Exists
if Balance of Suppliers not = zero
display "Can't close account. Balance is non-zero."
else
move Supplier-Missing to Supplier-Status
perform End-One-Account
display "Account Closed."
end-if
when Open-Account-Code also Supplier-Exists
display "Can't open account. It already exists."
perform Display-Transfer
when Close-Account-Code also Supplier-Missing
display "Can't close account ", Current-Account,
". It doesn't exist."
perform Display-Transfer
when any also Supplier-Missing
display "Supplier Update ignored. Account not on file."
perform Display-Transfer
end-evaluate.
Display-Transfer.
copy "dispxfr.cbl".
The evaluate statement checks both the kind of transaction and ‘Supplier-Status’. After dealing with
the valid cases, it deals with attempts to open an account that already exists, close one that doesn’t, or
to update an account that isn’t open.
Finally, procedures are needed to report the changes that occur to each supplier:
Start-One-Account.
if Supplier-Status = Supplier-Exists
display Name of Suppliers space Current-Account
display Street of Suppliers
display Suburb of Suppliers
move spaces to Detail-Line
move "Opening balance" to Description of Detail-Line
move Balance of Suppliers to Balance of Detail-Line
display Detail-Line
else
display Current-Account " is closed."
end-if.
Debit-Detail.
move spaces to Detail-Line
move YY-MM-DD of Transfers to YY-MM-DD of Detail-Line
move Item-No of Transfers to Item-No of Detail-Line
move Qty-Delivered of Transfers
to Qty-Delivered of Detail-Line
move Description of Transfers to Description of Detail-Line
move Cost of Transfers to Debit of Detail-Line
move Balance of Suppliers to Balance of Detail-Line
display Detail-Line.
Credit-Detail.
move spaces to Detail-Line
move YY-MM-DD of Transfers to YY-MM-DD of Detail-Line
move "Payment" to Description of Detail-Line
move Amount of Transfers to Credit of Detail-Line
move Balance of Suppliers to Balance of Detail-Line
display Detail-Line.
End-One-Account.
if Supplier-Status = Supplier-Exists
move spaces to Detail-Line
move "Closing balance" to Description of Detail-Line
move Balance of Suppliers to Balance of Detail-Line
display Detail-Line
display spaces
else
display Current-Account " is closed."
end-if.
This program reports each supplier, even one that has no transactions. Serv-U-Rite consider this a
feature: an inactive supplier is an unusual situation, and it is more noticeable when the supplier’s record is still reported.
Returning to the Products file, the next program updates it in place, using skip-sequential access:
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Products assign to "newprod.ndx"
organization is indexed,
record key is Item-No of Products
access is random.
select Unsorted-Updates assign to "updates.seq"
organization is sequential.
select Updates assign to "work.tmp".
select optional Transfers assign to "transfer.seq"
organization is sequential.
The data division introduces a new variable, ‘Original-Status’, whose use will be explained later:
DATA DIVISION.
File Section.
FD Unsorted-Updates.
copy "update.cbl".
SD Updates.
copy "update.cbl".
FD Products.
copy "product.cbl".
FD Transfers.
copy "transfer.cbl".
Working-Storage Section.
77 Current-Item-No pic x(6).
77 Product-Status pic x.
77 Original-Status pic x.
77 Product-Exists pic x value "Y".
77 Product-Missing pic x value "N".
copy "constant.cbl".
copy "editupd8.cbl".
copy "editprod.cbl".
The procedure division uses a sort output procedure. The Products file is opened in i-o (input-
output) mode. Because transactions that only affect the Suppliers file have low-valued item-numbers,
they are processed in a separate loop. The status of each product record is displayed before each group
of transactions, then the final updated status is displayed after the group:
PROCEDURE DIVISION.
Process-All-Products.
display "Sorting Updates ..."
sort Updates on ascending Item-No of Updates,
Time-Stamp of Updates,
using Unsorted-Updates
output procedure Process-Sorted-Updates
stop run.
Process-Sorted-Updates.
display "Updating Products ..."
display spaces
open i-o Products, extend Transfers
perform Get-Next-Update
perform until Item-No in Updates not = Dummy-Item-No
perform Process-One-Update
perform Get-Next-Update
end-perform
move Item-No of Updates to Current-Item-No
perform until Current-Item-No = high-values
perform Get-Product-Status
perform Display-Product-Status
perform until Item-No of Updates not = Current-Item-No
perform Process-One-Update
perform Get-Next-Update
end-perform
perform Display-Product-Status
perform Record-Product-Status
move Item-No of Updates to Current-Item-No
end-perform
close Products, Transfers
display "Update complete.".
The next few paragraphs are routine, except that ‘Product-Status’ is copied to ‘Original-Status’.
Get-Next-Update.
return Updates record,
at end
move high-values to Item-No of Updates
end-return.
Get-Product-Status.
move Current-Item-No to Item-No of Products
read Products record
invalid key
move Product-Missing to Product-Status
not invalid
move Product-Exists to Product-Status
end-read
move Product-Status to Original-Status.
‘Process-One-Update’ repeats the same trick of altering ‘Product-Status’ to insert new product records
or to delete old ones. Otherwise, it is long but straightforward:
Process-One-Update.
perform Display-Update
evaluate Kind of Updates also Product-Status
when Delivery-Code also Product-Exists
if Qty-Delivered of Updates > On-Order of Products
display "Item: ", Current-Item-No,
", delivery exceeds No. on order. Ignored."
perform Display-Update
else
if Account of Updates not = Supplier of Products
display "Supplier is not the expected one."
end-if
add Qty-Delivered of Updates to Stock of Products
subtract Qty-Delivered of Updates
from On-Order of Products
add Cost of Updates to Valuation of Products
move Time-Stamp of Updates to Time-Stamp of Transfers
move Kind of Updates to Kind of Transfers
move Account of Updates to Account of Transfers
move Cost of Updates to Cost of Transfers
write Delivery of Transfers
end-if
when Payment-Code also any
move Time-Stamp of Updates to Time-Stamp of Transfers
move Kind of Updates to Kind of Transfers
move Account of Updates to Account of Transfers
move Amount of Updates to Amount of Transfers
write Payment of Transfers
when New-Product-Code also Product-Missing
move Item-No of Updates to Item-No of Products
move Account of Updates to Supplier of Products
move Description of Updates to Description of Products
move Reorder-Level of Updates
to Reorder-Level of Products
move Reorder-Qty of Updates to Reorder-Qty of Products
move Price of Updates to Price of Products
move zeros to Stock of Products, On-Order of Products
move Product-Exists to Product-Status
when Withdraw-Product-Code also Product-Exists
evaluate true
when Stock of Products not = zero
display "Item: ", Current-Item-No,
" is still in stock. Ignored."
perform Display-Update
when On-Order of Products not = zero
display "Item: ", Current-Item-No,
" is currently on order. Ignored."
perform Display-Update
when other
move Product-Missing to Product-Status
end-evaluate
when Open-Account-Code also any
move Time-Stamp of Updates to Time-Stamp of Transfers
move Kind of Updates to Kind of Transfers
move Account of Updates to Account of Transfers
move Address of Updates to Address of Transfers
write Open-Account of Transfers
when Close-Account-Code also any
move Time-Stamp of Updates to Time-Stamp of Transfers
move Kind of Updates to Kind of Transfers
move Account of Updates to Account of Transfers
write Close-Account of Transfers
when New-Product-Code also Product-Exists
display "New product ignored. Item-No already on file."
perform Display-Update
when Withdraw-Product-Code also Product-Missing
display "Withdraw product ignored. Item-No not on file."
perform Display-Update
when any also Product-Missing
display "Product update ignored. Item-No not on file."
end-evaluate.
Unlike the copy mode update, which just needs a write statement, ‘Record-Product-Status’ needs a
write statement to insert a new record, a rewrite statement to update an existing record, and a delete
statement to remove an existing record. The choice must be made by comparing the original state of the
product with its updated state. If ‘Product-Status’ changes from ‘Product-Missing’ to ‘Product-Exists’,
a new record must be written. If it changes from ‘Product-Exists’ to ‘Product-Missing’, the record
must be deleted. If it remains as ‘Product-Exists’, the record must be rewritten. If it remains as
‘Product-Missing’, nothing needs to be done. If the program had not saved the original value of
‘Product-Status’ in ‘Original-Status’, it wouldn’t know which action to take.
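The four cases form a simple decision table. Here is a Python sketch, using the same status codes as the program (the action names are invented labels for the three Cobol verbs):

```python
# The before/after pair of status flags selects the i-o verb.
EXISTS, MISSING = "Y", "N"

ACTIONS = {
    (EXISTS,  EXISTS):  "rewrite",  # existed and still does: update in place
    (EXISTS,  MISSING): "delete",   # existed but was withdrawn
    (MISSING, EXISTS):  "write",    # created by the update group
    (MISSING, MISSING): None,       # never existed: nothing to do
}

def record_action(original_status, product_status):
    return ACTIONS[(original_status, product_status)]

print(record_action(MISSING, EXISTS))  # → write
```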
Record-Product-Status.
evaluate Original-Status also Product-Status
when Product-Exists also Product-Exists
rewrite Product of Products
invalid key
display "Error rewriting Products record."
perform Dump-and-Quit
end-rewrite
when Product-Exists also Product-Missing
delete Products record
invalid key
display "Error deleting Products record."
perform Dump-and-Quit
end-delete
when Product-Missing also Product-Exists
write Product of Products
invalid key
display "Error writing Products record."
perform Dump-and-Quit
end-write
end-evaluate.
Like read, a delete statement must specify a file, not a record.
The invalid key clauses are required by Cobol. There seems to be no way they can be triggered here.
Therefore, just in case, the program dumps all the useful information it can, then stops.
Dump-and-Quit.
display "Current-Item-No: ", Current-Item-No,
", Product-Status: ", Product-Status,
", Original-Status: ”, Original-Status
display "Product record: "
perform Display-Product
display "Update record: "
perform Display-Update
stop run.
The rest is routine:
Display-Product-Status.
if Product-Status = Product-Exists
perform Display-Product
else
display Current-Item-No " is not on file."
display spaces
end-if.
Display-Update.
copy "dispupd8.cbl".
Display-Product.
copy "dispprod.cbl".
It is equally possible to update a master file in random access mode. There are no new ideas
involved. The program is similar to the skip-sequential update, except that the sort operation could be
omitted. However, the same product might then be displayed several times, and care would have to be
taken when dealing with low-valued item numbers. But ‘Process-One-Update’ would remain exactly
the same, as indeed it would even if it were transplanted to a copy mode update program.