Cobol
B. Dwyer © 2001
Contents
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1 Introduction
This is not the usual course on databases. Yes, it does cover the standard material on SQL, which you
will need if you are going to use a database properly. It also covers some Cobol. This may come as a
shock, especially if you thought Cobol was dead; the news of Cobol’s death has been grossly
exaggerated. Even today, about one-half of all source code is written in Cobol. Why do you need it?
Once, programmers wrote in machine code or assembler language. This was the first generation of
programming languages. Second generation languages knew how to evaluate arithmetic expressions or
format output. Third generation languages incorporated ideas like structured programming and type
checking to help protect programmers against common mistakes. It wasn’t hard to understand how the
first three generations of languages translated into machine code. All three were procedural; the pro-
grammer had to tell the computer what to do. Then came the fourth generation. In theory, no longer
did the programmer have to tell the computer how to solve the problem, the programmer stated the
problem, and the computer figured out how to solve it. Does this sound too good to be true?
There never was and never will be a way of solving all problems. What is possible is to identify
problems that fit a certain pattern, and to have a way of solving all problems of that kind. Discovering
patterns that can be solved is one way computer science makes progress. In 1970, E.F. Codd proved
that queries in an SQL-like language could be evaluated by a few basic operations on files. Codd’s
proof only showed it could be done; it didn’t show how to do it efficiently. Even now, we don’t know
how to find the fastest way to answer an SQL query, except by searching through many possibilities.
This has created a new situation. In third generation languages, the programmer can predict which
programs are efficient and which are not. But in the case of an SQL query, the programmer can only
guess, because it is up to the computer to find a good algorithm — at least in theory. In practice, many
database management systems are not very smart, and stating the same problem in different ways can
make a big difference to how fast the answer is found. As a result, the programmer has a good deal of
control over efficiency. This is important, because although computers go a lot faster today than they
did in the days of machine code, the disks on which they store their data don’t.
SQL was designed as a language for the ‘end user’. Someone with no knowledge of programming
would be able to answer complex questions just by querying a database. As many organisations found,
with very little training, end users were able to tie up computer resources so effectively that no serious
work could get done. To use SQL well, to be a computer scientist and not just a ‘dumb user’, you need
to understand operations on files. This is where Cobol comes in.
Cobol is a third-generation language with just the right range of features to show how databases work.
Cobol has been around since 1960, and has evolved steadily since. Many other languages we use
today are just as old; they just change their names. For example, ‘C’ became C++, and C++ became
Java, and it is possible to trace the evolution of most other languages in a similar way. Even so, Cobol
still has some features that definitely belong in the past. For the most part we shall ignore them.
Unfortunately, one ancient feature cannot be ignored: the layout of Cobol source programs is still
geared to 80-column punched cards. The sooner this anachronism is removed, the better.
Cobol is a big language, with many features. It is not the aim of this course to teach Cobol. It is the
aim of this course to teach some algorithms. These are the same algorithms an SQL database system
uses internally. It just happens that they are best expressed in Cobol. Even if you use a different
language, you will still need the same algorithms. The course will teach only enough Cobol to do the
job. It won’t teach you anything wrong, but it won’t tell you the whole truth. For that, you should
consult a Cobol programming manual or the official Cobol standard.
Apart from languages, what else does the course cover? Quite a range of things. We shall carry one
case study throughout the course: a stock control system, typical of many business systems. The case
study is simpler than a real system, ignoring sales taxes, sales representatives’ commissions, discounts,
and the like. It is reasonably complex though; it needs to be rich enough to provide many examples.
The same case study will be examined in Cobol, then in SQL. This will make a number of things
clear. We shall see how an SQL query can be much more concise than the corresponding Cobol
program. We shall see that Cobol can do things that SQL cannot. We shall understand how SQL
works, in terms of Cobol algorithms. In addition, we shall study disk drives, how files are
implemented, client-server databases, record locking, deadlock, and learn how to estimate the
performance of programs and queries.
2 Know Your Enemy
Fig. 2.1: The important parts of a Hard Disk Drive
The (fictional) Phantom II drive specified in Table 2.1 has a stack of 8 disks. They are 3 in. (76 mm.)
in diameter. At 10,000 rpm (revolutions per minute), their rims travel at 140 kph. Due to centrifugal
force, points on their rims are subject to an acceleration over 4,000 times that due to gravity.
According to Table 2.1, the drive has 15 recording surfaces, so all but one of its 8 disks must be
coated with magnetic material on both sides. The remaining surface isn’t used to contain data, but the
specification doesn’t tell us what it is used for.
Data is read from the disks by 15 heads, one per surface. The heads don’t actually touch the disks;
they hover close to them on a thin cushion of air trapped by the speed of the disk. That is why the
disks have a mirror-like finish. Any small irregularity might cause a head to touch the disk, destroying
both the head and the disk surface.
At each position of the head assembly, each head can record (write) or play back (read) data on a
circular track. The 15 tracks at one head position make a stack of 15 circles. Taken together, they are
called a cylinder. The specification says that the drive has 10,000 cylinders, which means the read
heads can be moved to 10,000 different positions. Each of the 15 surfaces therefore has 10,000 tracks:
150,000 tracks altogether. The ‘track density’ specification tells us that these positions are only
0.0002 cm (about 0.00008 in) apart.
A track does not consist of one long recording, but many short ones. Each recording is called a
sector. A sector contains 512 bytes of data. The number of sectors per track varies between 250 and
450, an average of 350. Therefore, each track contains between 128,000 and 230,400 bytes,
179,200 bytes on average. Multiplying this by the 150,000 tracks gives a total storage capacity of
26,880 million bytes. Why then does the specification claim a capacity of only 25 GB?
To engineers, 1K (kilo) means 1 thousand, 1M (mega) means 1 million, and 1G (giga) means 1 billion
(thousand million). In the other direction, 1m (milli) means 1 thousandth, 1µ (micro) means
1 millionth, and 1n (nano) means 1 billionth. Computer scientists tend to use a different scale based on
powers of 2, especially when they are referring to storage capacity. 2^10 equals 1,024, which is
reasonably close to 1,000 or 1K. Computer scientists make 1K=1,024, 1M=1,048,576, and
1G=1,073,741,824. Unfortunately, they aren’t very consistent about this, so we sometimes need to
check. Here, the stated storage capacity of 25 GB must be in computer science units.
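We can check this arithmetic with a short sketch (in Python here, since it is only arithmetic; the figures are those quoted from Table 2.1):

```python
# Capacity of the (fictional) Phantom II, using the figures quoted above.
SECTOR_BYTES = 512
AVG_SECTORS_PER_TRACK = 350           # between 250 and 450
TRACKS = 15 * 10_000                  # 15 surfaces x 10,000 cylinders

capacity = SECTOR_BYTES * AVG_SECTORS_PER_TRACK * TRACKS
print(capacity)                       # 26880000000 -- the '26,880 million bytes'

# Engineering units: 1G = 10**9. Computer science units: 1G = 2**30.
print(round(capacity / 10**9))        # 27 (engineering GB)
print(round(capacity / 2**30))        # 25 (computer science GB) -- matches the spec
```

The same 26,880 million bytes is 27 GB in engineering units but 25 GB in powers-of-two units, which is how we know which scale the specification uses.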
The heads move quickly. On average, they take 6 ms (6 thousandths of a second) to move between
tracks and be ready to read data. In Table 2.1, this is called the ‘average seek time’. During an average
seek the heads are accelerated and decelerated with over 100 times the force of gravity.
Seek time varies. The further the heads have to move, the bigger it is. A ‘full seek’ is a movement
between the outermost and innermost tracks — the worst case. A track-to-track seek is a movement
between two adjacent tracks — the best case. An average seek is 1/3 of a full seek, not 1/2 as you might
expect. That is because if both the start and finish positions are chosen at random, the average distance
between them is 1/3 of the maximum. Notice that all the seek times are slightly longer for writing than
for reading. The head can start to read a track before it has even had time to stop moving, but when it
writes a track, it needs to be exactly in position.
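The one-third claim is easy to confirm by simulation. The sketch below (Python, purely illustrative) picks random start and finish positions on a unit-length track span and averages the distance between them:

```python
import random

# If start and finish tracks are both chosen uniformly at random, the mean
# distance between them is 1/3 of the maximum -- so average seek = full seek / 3.
random.seed(1)
n = 100_000
mean_dist = sum(abs(random.random() - random.random()) for _ in range(n)) / n
print(mean_dist)   # close to 0.333...
```

For the Phantom II, one third of a full seek works out at the quoted 6 ms average.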
When a sector has to be read from disk, four things have to happen:
1 The correct head has to be selected to read the required surface. This happens electronically, and
takes next to no time.
2 The heads have to move to the correct cylinder. As we have seen, this takes an average of 6 ms.
3 The correct sector has to arrive at the read head. Since the disks rotate at 10,000 rpm, one full
rotation also takes 6 ms. The required sector may have just gone past the head, in which case it
will pass it again in 6 ms, or it may be just going to pass it. On average, it will take 3 ms for the
sector to reach the head. In Table 2.1, this is called ‘average latency’.
4 To be read, the sector must pass under the read head. With 250 sectors or more per track, the
transfer time will be at most 0.024 ms.
The total average access time is therefore 9.024 ms. Because the time to read one sector is so short, it
can pay to read several sectors at a time. 10 sectors (5K bytes) can be read in 9.24 ms, and a whole
track (125K or more) can be read in 15 ms.
Writing to a sector is similar, but the seek time is a little longer, 7 ms on average. In addition, an
option called verification is often used, in which a sector is first written, then read back to ensure it was
written correctly. Since a sector can’t be read until the next rotation after it is written, this adds another
6 ms to the total, making a little over 15 ms altogether.
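The four-step read breakdown above can be captured in a few lines. The sketch below (Python, working in whole microseconds so the arithmetic stays exact) uses the Phantom II figures; `avg_read_us` is just an illustrative name:

```python
# Average time to read one or more consecutive sectors on the Phantom II.
AVG_SEEK_US = 6_000      # step 2: move the heads to the cylinder
AVG_LATENCY_US = 3_000   # step 3: half of one 6 ms rotation
TRANSFER_US = 24         # step 4: one sector past the head (250+ sectors/track)

def avg_read_us(sectors: int) -> int:
    """Seek and latency are paid once; then one transfer per sector."""
    return AVG_SEEK_US + AVG_LATENCY_US + sectors * TRANSFER_US

print(avg_read_us(1))    # 9024 microseconds = 9.024 ms
print(avg_read_us(10))   # 9240 microseconds = 9.24 ms
```

Because the fixed seek-plus-latency cost dwarfs the per-sector transfer cost, reading ten sectors costs barely more than reading one, which is why transfers are batched.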
If you made a video recording of a disk drive and slowed it down to see the heads moving, you would
be disappointed. The heads can move between tracks 3 times between two video half-frames.
When we read or write successive sectors, we are using sequential access. When we read or write
sectors scattered about the disk, we are using random access. Other things being equal, sequential
access is faster, but both techniques have their proper places, which we shall discuss in detail later.
When we use the word ‘random’, we don’t mean that the sector the computer reads is left to chance;
we mean that the program can choose which one to read at random, i.e., without constraint.
three variants that differ from it by 1 bit. Therefore, we can only distinguish two (8÷4) patterns safely.
Suppose instead we have a stream of 127 bits. These can form 2^127 patterns, each having 127 1-bit
variants. We can therefore distinguish 2^120 (2^127÷128) patterns. In this case the ability to recover from
1-bit errors increases the amount of data stored by only 6%. In general, the number of bits needed for
error correction grows only with the logarithm of the number of bits of data. Using this simple
scheme, only three extra bytes would be needed to correct all possible 1-bit errors in a 512-byte sector.
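The counting argument can be checked directly. Here is a hedged Python sketch of the scheme above, in which each codeword accounts for itself plus its n one-bit variants:

```python
# With n total bits, each distinguishable codeword 'uses up' itself plus
# its n one-bit variants, so at most 2**n // (n + 1) patterns can be told apart.
n = 127
distinct = 2**n // (n + 1)       # n + 1 = 128 = 2**7
assert distinct == 2**120        # 7 check bits, 120 data bits

overhead = 7 / 120               # extra bits relative to data bits
print(round(overhead * 100))     # 6 (per cent), as claimed
```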
Practical schemes are much more sophisticated than this, but the general idea is the same: a few extra
bytes (called a ‘cyclic check sum’) can be used to detect, and often correct, corrupted data. When an
error is detected, the disk drive will typically attempt to read the sector again, in the hope that the data
was written correctly but read wrongly. Errors in reading are called ‘soft’ read errors. Errors in
writing are called ‘hard’ read errors.
Because of error checking, a disk drive can never read less than one complete sector at a time.
Although the assembly that moves them is large, the read heads themselves are minuscule. The
‘recording density’ is 100,000 bits/cm. (There is always some doubt whether ‘B’ stands for ‘bytes’ or
‘bits’. Here, ‘bytes’ would be inconsistent with the media transfer rate.) This means that one bit is
recorded in a length of only 100nm (billionths of a meter). For comparison, the wavelength of violet
light is 410nm, and the wavelength of red light is 770nm. We conclude that the head is probably
manufactured using ultra-violet photolithography, similar to how VLSI computer chips are made.
Presumably, future advances in chip manufacture will continue to be paralleled by equal
improvements in recording density, so that disk drives will always keep pace with RAM.
The width of one bit is effectively the distance between two tracks. The ‘track density’ is 5,000/cm,
which makes one bit 2000nm wide—only 4 wavelengths of light.
allocate storage as it would like to, but must take storage where it can find it. As a result, instead of
files occupying a few large extents, they become fragmented into many small ones.
To use a file, a Cobol program must first open it, telling the operating system its name and directory
path. If the file already exists, the operating system then locates and reads its directory entry. If the file
is a new one, the operating system can create a directory entry for it. From then on, the program reads
or writes records. A record contains items relating to one object, such as a customer or product. Since
hardware reads or writes sectors, the Cobol run-time system communicates with the operating system
in terms of blocks of one or more sectors. When a program reads the first record of a file, the operat-
ing system will retrieve its first block. After that, the program may be able to read several records from
the block before the next block has to be fetched. Likewise, a program may write several records
before the operating system writes a block. When a program has finished with a file, it should close it,
to ensure that the operating system promptly writes its updated directory entry back to disk.
The following summary relates the terms used by Cobol, the hardware, and the operating system:
A Cobol file consists of many blocks, each containing several records. Records contain items, which
consist of one or more characters. A disk is divided into many cylinders, each of which contains
several tracks, one per surface. Tracks contain many sectors: short recordings, often of 512 bytes. A
Cobol block consists of one or more hardware sectors, and a Cobol character occupies one byte.
There is no particular relationship between records and sectors; sectors can contain several short
records, but long records can span several sectors. The operating system allocates space to files in
units called segments, which consist of at least one sector. A contiguous series of segments allocated
to a file is called an extent. Operating system files and Cobol files are usually the same thing.
If one program is reading one file sequentially, we may expect that very little time is wasted in seeks
(movements of the read heads). But if two programs are reading files sequentially, or one program is
reading two files sequentially, then the disk drive must constantly move the heads from one file to the
other and back again. The operating system can ease this problem by transferring several sectors of the
file at a time, which takes only a little longer than transferring one. A group of sectors read or written
in this way is called a block. Some operating systems choose block sizes dynamically according to the
patterns of accesses that occur. Others rely on the program to specify how many sectors should be
transferred as a block.
When two or more client programs make read or write requests at the same time, the server must
queue them. The response time experienced by an individual program then depends on how busy the
disk is. It is determined by the server’s service ratio, or load factor: the ratio of its actual load to its
potential throughput. We may estimate average response time using simple queuing theory: Suppose
a disk is busy 80% of the time, and idle 20% of the time. To any given program, the effect is as if the
disk had only 20% of its potential performance, so that access time is increased by a factor of 5. If the
disk is busy 95% and idle 5% of the time, the response time is increased by a factor of 20. At 100%
load, there is no idle time, and the average response time becomes infinite. (The argument may seem
simplistic, but the results are correct.) Anyone who has used a network file server is aware that as the
number of the network’s clients increases, its response time slows dramatically. Once the number of
clients reaches a critical level, the file system virtually stops.
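The estimate above amounts to dividing access time by the disk’s idle fraction. A small Python sketch (`response_factor` is an illustrative name, and this is only the simple model described above, not a full queuing analysis):

```python
def response_factor(load: float) -> float:
    """Simple queuing estimate: at a given load, each request effectively
    sees only the idle fraction of the disk's performance."""
    idle = 1.0 - load
    return 1.0 / idle        # tends to infinity as load approaches 100%

print(response_factor(0.80))   # about 5  -- access time increased 5x
print(response_factor(0.95))   # about 20
```

The curve is gentle at low loads and explodes near 100%, which is exactly the behaviour of an overloaded file server.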
An operating system or file server does not necessarily serve requests in the order they are made. In
the widely used ‘elevator algorithm’, the disk heads are swept alternately inwards and outwards from
track to track, like an elevator (lift) going up and down from floor to floor. In this way, requests are
served in an order that minimises seek time, rather than first-come, first-served order. The more
requests are in the queue, the smaller is the average seek distance. The Phantom II has a track-to-track
seek time of 0.5 ms. Therefore, in the limit, the elevator algorithm could reduce its average access time
from 9 ms to 3.5 ms — nearly tripling its potential throughput.
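One sweep of the elevator algorithm can be sketched as follows (Python, with illustrative names; real disk schedulers are more elaborate and also merge adjacent requests):

```python
def elevator_order(head: int, requests: list[int]) -> list[int]:
    """Serve pending track requests in one upward sweep past the head,
    then sweep back down through the remaining tracks."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return up + down

# Head at track 50, five requests pending:
print(elevator_order(50, [95, 10, 60, 40, 55]))   # [55, 60, 95, 40, 10]
```

Compare this with first-come, first-served order, which would drag the heads back and forth across the disk for the same five requests.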
Finally, an operating system usually sets aside part of RAM as a disk cache. A disk cache is
essentially a large buffer that is shared by all files. Sectors read from disk remain in the cache until
room is needed for other sectors. The sector that is purged is usually determined by the LRU (least
recently used) algorithm. A consequence of this is that a small file may fit entirely within the cache, so
once a sector has been read, future reads from it are virtually free of delay.
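The LRU policy is simple enough to sketch. The following Python fragment (`SectorCache` is a hypothetical name; a real cache would hold whole blocks and handle writes too) keeps the most recently used sectors and purges the least recently used when room is needed:

```python
from collections import OrderedDict

class SectorCache:
    """Minimal LRU disk cache: reads hit the cache when possible;
    the least recently used sector is purged when room is needed."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()      # insertion order = recency order

    def read(self, sector, read_from_disk):
        if sector in self.cache:
            self.cache.move_to_end(sector)      # mark most recently used
            return self.cache[sector]
        data = read_from_disk(sector)           # miss: go to the disk
        self.cache[sector] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # purge least recently used
        return data
```

With a capacity of two sectors, reading sectors 1, 2, 1, 3 touches the disk only three times: the second read of sector 1 is served from the cache, and reading sector 3 purges sector 2, the least recently used.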
What happens when a sector is written? One policy is to update the cache but to delay writing the
change to the disk until the sector is purged. In this way, several writes to the cache may result in only
one write to the disk. But this is a dangerous game. If the power fails, there may be insufficient time
to record the contents of the volatile cache on disk. A safer but slower alternative is to write sectors to
disk every time they are changed.
no moving parts need means of being accessed, so each byte needs to be associated with a reasonably
complex physical structure. Having a small number of read heads that can move relative to a medium
may always prove cheaper. Likewise, the recording density on a uniform medium may always prove
greater than on a structured one. One day, perhaps the moving parts will be manufactured using
nanotechnology, so that even the largest stores will be very fast. Then, for most of us, efficient use of
these stores will no longer be an issue. But however fast they become, it will always be possible to
make even faster stores at somewhat greater cost. In other words, the idea of a storage hierarchy
(registers, RAM, secondary storage and back up) will always be with us. Somewhere, someone will
still have too much data and not enough time.
3 Cobol Basics
To illustrate files and databases, we shall use an example of a wholesale distribution operation:
Serv-U-Rite buys goods in bulk from suppliers and sells them in smaller numbers to customers, who
are typically retail stores. Orders are made through the postal system or by telephone. Serv-U-Rite’s
Cobol database consists of three master files: Suppliers, Customers, and Products. The master files
store information about the status of long-lived objects. The other files are transaction files, which
record business activities, such as a sale or a customer payment.
The case study may seem complex at first. Bear two things in mind: it has to be complicated enough
to illustrate a lot of different points, and real systems are far more complicated than this. We are going
to ignore discounts, taxes, commissions, and a host of other real-life complications.
default, Cobol systems typically store signs in these two bits. This changes the ASCII code to one that
represents some other character, so ‘+$123,456.78’ may well be stored as ‘1234567H’ — if the ASCII
character set is used. The sign is represented by an ‘s’. Because the sign is packed into unused bits,
and the decimal point is implicit, Balance occupies 8 bytes.
Cobol does not store numbers in binary form by default, although it is possible to override this. The
reason is that Cobol programs don’t usually do much arithmetic compared with text input and output.
Storing numbers in decimal saves converting numbers between binary and decimal notation. Many
computers efficiently support packed-decimal arithmetic. Packed-decimal notation packs two decimal
digits per byte, and converting between packed decimal and ASCII is simple and fast.
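Packing two digits per byte is easy to illustrate. A Python sketch (`pack_decimal` is an illustrative name; real packed-decimal formats also reserve a nibble for the sign, which we omit here):

```python
def pack_decimal(digits: str) -> bytes:
    """Pack decimal digits two per byte, one per 4-bit nibble --
    the idea behind packed-decimal storage. Sign handling omitted."""
    if len(digits) % 2:
        digits = "0" + digits            # pad to a whole number of bytes
    return bytes(int(digits[i]) << 4 | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

print(pack_decimal("12345678").hex())    # '12345678': 8 digits in 4 bytes
```

Because each nibble holds one decimal digit, converting to and from the character form is just a matter of splitting and joining nibbles, with no binary-to-decimal division needed.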
Group items — those made up of elementary items, such as Address and Supplier — don’t need
pictures. To a good approximation, they are regarded simply as character strings whose length is the
sum of the lengths of the items they contain. Thus Address effectively has ‘pic x(90)’ and Supplier
effectively has ‘pic x(102)’.
Serv-U-Rite’s Customers file is used to keep track of how much customers owe. It must therefore
contain similar information to the Suppliers file. However, although Serv-U-Rite are willing to owe
large amounts of money to their suppliers, they don’t like their customers to owe them too much.
Accordingly, each customer is set a maximum amount that they may owe, called ‘Credit-Limit’. This
is always a positive multiple of $1,000. Serv-U-Rite also keep track of how much credit a customer
has left, called ‘Available-Credit’. If a customer orders goods that would make ‘Available-Credit’
become negative, the order is rejected.
It might seem that Available-Credit could be calculated as the difference between Credit-Limit and
Balance, but it’s not that simple. If a customer orders goods that are not in stock, Serv-U-Rite auto-
matically create a ‘back order’, a reminder to supply the goods when they become available. These
goods will eventually have to be paid for, so they reduce the Available-Credit, but since the customer
hasn’t received them, they don’t count towards the Balance. We have the following equation:
Available-Credit = Credit-Limit – Balance – Back-Orders
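The order-acceptance rule follows directly from this equation. A sketch in Python (hypothetical names; amounts in dollars):

```python
# Hedged sketch of Serv-U-Rite's credit check, following the equation above.
def available_credit(credit_limit, balance, back_orders):
    return credit_limit - balance - back_orders

def accept_order(credit_limit, balance, back_orders, order_value):
    """Reject any order that would make Available-Credit negative."""
    return available_credit(credit_limit, balance, back_orders) >= order_value

print(accept_order(5000, 1200, 800, 2500))   # True: 3000 available, 2500 wanted
print(accept_order(5000, 1200, 800, 3500))   # False: would go negative
```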
The Cobol record description of a customer record is as follows.
01 Customer.
02 Account pic a999.
02 Address.
03 Name pic x(30).
03 Street pic x(30).
03 Suburb pic x(30).
02 Balance pic s9(6)v99.
02 Credit-Limit pic 999ppp.
02 Available-Credit pic s9(6)v99.
Credit-Limit is never negative, so it does not need a sign, nor does it need a decimal point. Since its
last 3 digits are always zeros, they don’t need to be stored either. The letter ‘P’ indicates an implicit
zero. Credit-Limit therefore occupies only 3 bytes. Available-Credit is like Balance. It uses 8 bytes.
The group item ‘Address’ occupies 90 bytes. The level-1 ‘Customer’ record occupies 113 bytes.
Finally, we need to describe the Product records. Here is the Cobol description.
01 Product.
02 Item-No pic x(6).
02 Description pic x(40).
02 Supplier pic a999.
02 Stock pic 9999.
02 On-Order pic 9999.
02 Reorder-Level pic 9999.
02 Reorder-Qty pic 9999.
02 Price pic 9999v99.
02 Valuation pic 9(6)v99.
Each product is uniquely identified by a 6-character ‘Item-No’, such as ‘ACLOTP’. It has a 40-char-
acter description, such as ‘Alcatel One Touch Phone’. ‘Supplier’ identifies the supplier who currently
sells the product to Serv-U-Rite. It has the same format as the ‘Account’ in the Suppliers file. Serv-U-
Rite’s computer system is intended to reorder products when their stocks become low. The number of
items in stock is given by ‘Stock’. ‘On-Order’ records the number of items already on order from the
supplier, but not yet delivered. If the sum of these two does not exceed ‘Reorder-Level’, a new order
will be created to purchase ‘Reorder-Qty’ items. ‘Price’ indicates the unit price (charge per item) of
the product to customers. Since the cost of items as determined by the supplier varies, it is pointless to
store it. Instead, each product record stores a ‘Valuation’, which measures the total cost of all the
items in stock. When goods are added to stock, Valuation is increased by their true cost; when they are
sold, Valuation is decreased pro rata, i.e., by the average cost per item.
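The pro-rata rule can be illustrated with a short sketch (Python; `receive` and `sell` are hypothetical names):

```python
# Hedged sketch of Valuation bookkeeping: increase by true cost on receipt,
# decrease pro rata (average cost per item) on sale.
def receive(stock, valuation, qty, total_cost):
    return stock + qty, valuation + total_cost

def sell(stock, valuation, qty):
    avg_cost = valuation / stock          # average cost per item in stock
    return stock - qty, valuation - qty * avg_cost

stock, valuation = receive(0, 0.0, 100, 600.0)   # 100 items costing $600
stock, valuation = sell(stock, valuation, 25)    # sell a quarter of them
print(stock, valuation)                          # 75 450.0
```

Selling a quarter of the stock removes a quarter of the Valuation, so the average cost per remaining item is preserved even though individual purchase prices varied.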
Adding up the lengths of the elementary items, we see that each Product record occupies 80 bytes.
If we assume that records are stored within 512-byte blocks, the Suppliers file stores 5 records per
block, the Customers file stores 4 records per block, and the Products file stores 6 records per block.
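These blocking factors follow from simple integer division (a Python sketch, assuming, as stated above, that records never span blocks):

```python
# Whole records per 512-byte block, using the record sizes derived earlier.
BLOCK = 512
RECORD_BYTES = {"Supplier": 102, "Customer": 113, "Product": 80}
per_block = {name: BLOCK // size for name, size in RECORD_BYTES.items()}
print(per_block)   # {'Supplier': 5, 'Customer': 4, 'Product': 6}
```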
All but the most trivial Cobol programs consist of the same four divisions: identification,
environment, data, and procedure.
• The identification division, as the name suggests, identifies the program.
• The environment division links the program to its environment: the operating system. This is
where we expect to find all the operating system dependent features of the program. If we ported
the program to a different operating system, we might expect to make a few changes here, but the
rest of it shouldn’t need to be touched.
• The data division describes all the data used by the program.
• The procedure division defines the executable instructions the computer should follow.
3.3.1 The Identification Division
Here is the identification division again.
000010 IDENTIFICATION DIVISION.
000020 Program-ID. copysupp.
000030
000040* Copies the supplier file.
000050* Written on 20/11/00.
000055* Balance added to Supplier file 04/02/01
First, let’s explain the numbers on the left. Once upon a time, they identified the lines of the program
for editing purposes. However, modern program editors have no use for line numbers, so this is our
last example that will show them. Even so, if we don’t use them, we still have to leave 6 blank
spaces in their place. Cobol assumes that lines consist of 80 columns, used as follows:
1–6 Line number
7 Comment indicator
8–11 Area A
12–72 Area B
73–80 Program identification
Column 7 is also normally blank, but if it contains an asterisk, the whole line following is a comment
and is ignored by the compiler.
Columns 73 onwards are always ignored. Some programmers like to write the date of any revisions
they make to the program there, so that the history of the program can be traced. Again, modern
revision control systems make this unnecessary. However, it is important to remember it, because if
any text strays beyond column 72 it is totally ignored.
Cobol also requires a minimum standard of program layout. Specifically, headings, or Area-A entries,
have to begin in columns 8–11, and other statements (Area-B entries) have to begin in column 12
onwards. Any additional attention to layout is nice to have, but Cobol doesn’t demand it.
Line 000010 is the identification division heading, and must begin in Area A. So must the
paragraph heading on Line 000020. The program-id paragraph gives the program a name. Most
Cobol compilers expect this name to agree with the name of the program file. For example, the
program file containing the ‘copysupp’ program would typically be named ‘copysupp.cbl’ — where
the ‘.cbl’ extension tells the Cobol compiler that it is a program text file.
In the early days of Cobol, the identification division was more complicated. There were standard
ways of noting who wrote the program, when and where it was written, and so on. Today, this
information is written as comments.
3.3.2 The Environment Division
The environment division is where we link the program to the operating system environment.
Because files are visible to both the Cobol program and the operating system, they are usually an
important element of the environment division. If we are using files at all, the environment division,
input-output section, and file-control paragraph headings are all required. These are then followed
here by two select statements, one for the file to be copied (‘Suppliers’), and one for the copy to be
created (‘Saved-Suppliers’). We have already discussed the other features of these statements.
000100 ENVIRONMENT DIVISION.
000110 Input-Output Section.
000120 File-Control.
000130 select Suppliers assign to "newsupp.ndx"
000140 organization is indexed,
000150 record key is Account of Suppliers
000160 access is sequential.
000170 select Saved-Suppliers assign to "oldsupp.ndx"
000180 organization is indexed,
000190 record key is Account of Saved-Suppliers
000200 access is sequential.
3.3.3 The Data Division
If a program uses any variables at all, the data division heading is required, and if it uses any files, so
is the file section heading. Each file then needs to be described by an FD (File Definition) entry. This
consists of the name of the file, followed by the record descriptions of the records it contains. We have
already discussed the description of the Suppliers file. Since both files have the same layout, their
record descriptions are identical.
000300 DATA DIVISION.
000310 File Section.
000320 FD Suppliers.
000330 01 Supplier.
000340 02 Account pic a999.
000350 02 Address.
000360 03 Name pic x(30).
000370 03 Street pic x(30).
000380 03 Suburb pic x(30).
000385 02 Balance pic s9(6)v99. 04/02/01
000390 FD Saved-Suppliers.
000400 01 Supplier.
000410 02 Account pic a999.
000420 02 Address.
000430 03 Name pic x(30).
000440 03 Street pic x(30).
000450 03 Suburb pic x(30).
000455 02 Balance pic s9(6)v99. 04/02/01
3.3.4 The Procedure Division
The Cobol procedure division specifies the program logic. It is usually divided into a number of
short procedures. This is an unfortunate name for them, because Cobol procedures do not correspond
to what are called procedures in other languages; they don’t have parameters. Cobol calls
parameterised procedures ‘programs’. Thus, it is possible to nest one Cobol program inside another,
or link to a library program. Cobol procedures are best thought of as refinements. We sketch the
outline of the program, then fill in the details later as refinements.
Let’s consider the procedure division one refinement at a time. After the procedure division heading,
comes the paragraph heading for the first refinement, ‘Process-All-Suppliers’.
000500 PROCEDURE DIVISION.
000510 Process-All-Suppliers.
000520 open input Suppliers, output Saved-Suppliers
000530 perform Get-Next-Supplier
000540 perform until Account of Suppliers = high-values
000550 perform Copy-One-Supplier
000560 perform Get-Next-Supplier
000570 end-perform
000580 close Suppliers, Saved-Suppliers
000590 stop run.
The program begins at Line 000520 by opening the existing supplier file for input, and its new copy
for output. Opening a file for input means checking that the file exists and finding where it is stored on
disk. This information is stored in the operating system directory structure, so the environment divis-
ion entry for ‘Suppliers’ is consulted to translate the name into ‘newsupp.ndx’, which the operating
system can understand. Similarly, opening the output file means that the operating system will create a
directory entry for it. If a file named ‘oldsupp.ndx’ already exists, it might be over-written. After the
open statement, ‘Suppliers’ is poised to read its first record, and ‘Saved-Suppliers’ is poised to write
its first record.
The program then does whatever is needed to read the first Suppliers record (Line 000530). The
perform verb indicates that the program will execute the procedure ‘Get-Next-Supplier’, which we
will refine shortly.
Lines 000540–000570 form a loop. The loop repeatedly performs ‘Copy-One-Supplier’ and ‘Get-Next-Supplier’. This goes on until every record has been copied. This is signalled by the condition
‘Account of Suppliers = high-values’. Why we test this particular condition will be explained later.
The important thing to notice is that a loop is needed. Cobol processes files one record at a time. The
assumption is that the whole file will not fit into memory, but individual records will.
Confusingly, Cobol uses the perform verb for two unrelated purposes: to execute a procedure or, in conjunction with end-perform, to delimit a loop. The reason is historical.
After the loop, Line 000580 closes the two files. Among other things, this ensures that the last block
of the Saved-Suppliers file is written to disk and that its directory entry is made to show which blocks
it contains.
Last, but not least, Line 000590 stops the program, returning control to the operating system. For
historical reasons, this does not happen automatically. If you forget to return control to the operating
system, all sorts of strange things can happen.
We now have two refinements to consider: ‘Get-Next-Supplier’ and ‘Copy-One-Supplier’.
001000 Get-Next-Supplier.
001010 read Suppliers next record,
001020 at end
001030 move high-values to Account of Suppliers
001040 end-read.
‘Get-Next-Supplier’ consists of a single read statement. The first time we execute this statement, it
will make the first record of the Suppliers file available in what is called the file’s current record area.
You can think of this area as the record defined in the file section, within the FD for the Suppliers file.
The second time ‘Get-Next-Supplier’ is executed — which is at the end of the first iteration of the loop — the second record of the Suppliers file will be made available in the current record area. Each
iteration of the loop will make a new record available, until the last one is reached. After this, the read
statement cannot make a new record available, so instead, it activates its at end clause. This causes a
special value to be moved (ie, copied) into the first 4 bytes of the current record area (Line 001030).
Note that the end-read delimiter is needed to mark the end of the scope of the at end clause.
The special value used is a predefined ‘figurative constant’ called high-values. This denotes one or
more bytes containing binary 1s, i.e. character code 255. On most computers this is not a printable
character. We can safely assume that no valid account code will
match high-values, and in fact high-values is greater than any valid account code.
If we consider the behaviour of ‘Account of Suppliers’ throughout the program, we can see it will
increase steadily. This is because the Suppliers file has indexed organization, so records will be read in
order of increasing primary key. When the end of file is reached and ‘Account of Suppliers’ finally is
set to high-values, the loop will exit and the program will terminate.
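The sentinel idea does not depend on Cobol. The following Python sketch (an illustration, not Cobol; the sample account codes are invented) demonstrates the property that makes high-values work: a key of all-ones bytes compares greater than any valid account code.

```python
# Python sketch (not Cobol): why high-values works as an end-of-file
# sentinel. A byte of all binary 1s (code 255) sorts after every
# printable character, so a key made of such bytes compares greater
# than any valid account code.
HIGH_VALUES = "\xff" * 4          # same width as an account code

accounts = ["A001", "B003", "Z999"]   # keys arrive in ascending order
assert all(a < HIGH_VALUES for a in accounts)

# The 'at end' clause moves HIGH_VALUES into the key, so the loop
# condition 'Account of Suppliers = high-values' finally becomes true.
key = HIGH_VALUES
assert key == HIGH_VALUES and key > max(accounts)
```

Because the file is read in ascending key order, the sentinel is simply the largest possible key, so the loop condition needs no separate end-of-file flag.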
002000 Copy-One-Supplier.
002010 move Supplier of Suppliers to Supplier of Saved-Suppliers
002020 write Supplier of Saved-Suppliers
002030 invalid key
002040 stop run
002050 end-write.
Copying a supplier record requires two steps: first the record must be copied from the current record area
of the Suppliers file to the current record area of the Saved-Suppliers file (Line 002010), then it must
be written to the file (lines 002020–002050). When the record is moved, it is copied as a whole.
There is no need to move each of its components separately.
The write statement adds one record to the file at a time. Because we are dealing with an indexed
file, Cobol requires that the write statement has an invalid key clause; since the file is in sequential
access mode, the records written to it must be written in ascending order of ‘Account’. It is impossible
for such a sequence error to occur in this program, because the input file is being read in ascending
order. Even so, we must specify something here, so we make the program return control to the
operating system.
A Cobol program always begins by executing the first procedure in its procedure division. This will
typically perform other procedures. The order in which these are written doesn’t matter. However, a
useful convention is to place each procedure somewhere after the last line where it is performed. Then,
someone reading a perform statement knows to look for the procedure somewhere further on rather
than further back.
Because we have specified in the environment division that the Saved-Suppliers file is indexed and its
record key is Account, the Cobol run-time system will ensure that an index is created for the file, and
saved to disk when the file is closed. It may seem strange that this information is written in the
environment division. The reason is that, although Unix and DOS don’t do it, many operating systems
provide special support for indexed files. Therefore the primary key information is of concern to the
operating system.
Why did we say using the program was better than copying the file using an operating system
command? Because when an indexed file is written sequentially, its records are stored sequentially
and the file is well organised and efficient to use. Once the file has been modified by updating, it can
get into quite a mess. Most operating systems would simply copy the mess.
We need to be careful in the use of the current record areas. In certain situations their contents are
said to be ‘undefined’. This means we should not rely on them having any particular value. For
example, when end of file is detected, an input area may still contain the last record of the file, but we
can’t rely on it. Likewise, after a record has been written, an output area may still contain the record
that was written, but again we can’t rely on it. It depends on the Cobol run-time system. In some
cases the output record area is copied to a buffer where the current block of the file is being built. In
other cases it is a part of the block buffer itself, indicated by a pointer associated with the file. In this
case, when the record is written the pointer moves on to another part of the buffer, which might contain
anything.
Worse still, if a file has not yet been opened or has already been closed, its current record area might
not even exist!
3.3.5 Qualification
The example program has made use of qualified names, such as ‘Account of Suppliers’, or ‘Supplier
of Saved-Suppliers’. This is necessary because there are two elementary items named Account, and
two records named Supplier. A name is fully qualified when it is qualified by the name of every
structure of which it is a part. For example ‘Street of Address of Supplier of Suppliers’ is a fully
qualified name. Cobol doesn’t require names to be qualified more than is necessary. Since the
Suppliers file definition only contains one item called ‘Street’, ‘Street of Suppliers’ is enough to
distinguish it from ‘Street of Saved-Suppliers’.
Incidentally, the word ‘in’ can be used interchangeably with ‘of’. If at line 002010 we had preferred
to write ‘move Supplier in Suppliers to Supplier in Saved-Suppliers’, that would have had exactly the
same effect. The choice is merely a matter of taste.
In these notes, we shall be careful to make sure that no two items within a file have the same name.
Therefore it will always be enough to give the item name and file name, as in ‘Street of Suppliers’.
This is just a private convention. It isn’t a rule of Cobol.
3.4 Flowcharts
A flowchart is a diagram that shows the flow of control in a program. Flowcharts were once used as a step
in program design, but are no longer recommended for this purpose, because it is possible to draw a
flowchart that can’t be written as a properly structured program. However, they do help some people
understand programs better, so flowcharts will sometimes be used in these notes.
Flowcharts use three main kinds of boxes: rectangles represent actions executed by the program,
diamonds represent conditions the program tests, and wedge-shaped connectors mark its beginning and
ending. The boxes are linked by arrows that show how control passes from one box to another.
Actions have only one arrow leaving them, but conditions have two or more, each marked with a
possible result of the test (typically ‘true’ or ‘false’). Here is the flowchart of the copy program.
(Flowchart of the copy program: a start connector leading to actions that open both files and read the first record, then an ‘End of input?’ test whose True branch closes the files and stops, and whose False branch copies, writes and reads before returning to the test.)
Begin reading at the start connector. The program will be seen to open both files, then read the first
input record. This action is followed by a test for end of input, with flow continuing to the right if the
end of input is detected, or downwards if it is not. Assuming that the end of input is not detected, the
program copies the input record to the output area, writes the new output record, and then reads the
next input record. Control returns to the top of the loop, which is the test for end of input. Control will
flow round and round this loop once per record, until the input file is exhausted. When the end of
input is detected, the program will close both files, then stop.
4 An Example Sub-System
To understand transaction files, we first need to examine the structure of a typical Cobol sub-system.
A sub-system consists of a collection of files and programs having a unified purpose. The programs
cooperate in the use of master files, and they communicate with each other via transfer files. Sub-
systems are often described by run diagrams. The following run diagram describes the sub-system
that deals with deliveries from suppliers and payments to suppliers.
Documents, such as deliveries, payments, etc. (1) are entered using a data preparation program (2) to
produce a transaction file (3). This file is read by a program (4) that, in the case of a delivery, checks
that the Item-No on each order is recorded on the Products file (9), adjusts the stock and valuation of
the product, then writes a record on the transfer file (5). In turn, this file is read by a program (6),
which checks that the account code on the order refers to a record on the Suppliers file (10), and debits
the balance owed to the supplier. In the case of Payments, the first program (4) simply copies the
transaction record to the transfer file (5), and the second program (6) credits the balance. The Stock
update program (4) produces a report (7) that displays the updated stock information. The Supplier
update program (6) reports the updated supplier Balances (8).
Don’t assume that the programs are run equally often. An operator may use the data entry program
several times to build up a batch of transactions. A batch may then be processed by the Update Stock
program. Several of these batches may be accumulated before updating the Suppliers file.
Apart from a brief description of each process, a run diagram does not attempt to show the logic of the
programs. Nor does it specify the times when the programs are run. Although it makes no sense to
process transactions before they are prepared, programs often run to a fixed schedule, so it can happen.
Therefore, we need to be careful that programs won’t fail if their transaction files happen to be empty.
fractions of a second. Every transaction record will contain a time-stamp. Another thing that every
transaction record needs is a code to indicate the kind of transaction.
Here is how Cobol describes a Delivery record.
01 Delivery.
02 Time-Stamp.
03 YY-MM-DD pic 9(6).
03 HH-MM-SS pic 9(6).
02 Kind pic x.
02 Item-No pic x(6).
02 Account pic a999.
02 Qty-Delivered pic 9999.
02 Cost pic 9(6)v99.
‘Time-Stamp’ has already been explained. ‘Kind’ is a single character that indicates the kind of
transaction, in this case the letter ‘D’. ‘Account’ specifies the supplier making the delivery and
‘Item-No’ specifies the product being delivered. ‘Qty-Delivered’ says how many items of the product
were delivered, and ‘Cost’ is what the delivery cost.
A supplier Payment record is defined as follows.
01 Payment.
02 Time-Stamp-2.
03 YY-MM-DD-2 pic 9(6).
03 HH-MM-SS-2 pic 9(6).
02 Kind-2 pic x.
02 Item-No-2 pic x(6).
02 Account-2 pic a999.
02 Amount pic 9(6)v99.
A payment is indicated by a Kind coded as ‘$’. ‘Account-2’ specifies the supplier being paid,
and ‘Amount’ is the amount paid. ‘Item-No-2’ is a dummy value whose use will be
explained in a later section.
Some explanations are now in order, which are easier to understand from a diagram:
01  Delivery
02  Time-Stamp               | Kind   | Item-No   | Account   | Qty-Delivered | Cost
03  YY-MM-DD | HH-MM-SS

01  Payment
02  Time-Stamp-2             | Kind-2 | Item-No-2 | Account-2 | Amount
03  YY-MM-DD-2 | HH-MM-SS-2
Delivery records occupy 35 bytes, but Payment records are 4 bytes shorter. However, the first 12
bytes of either record contain the time-stamp, and the 13th byte always contains the kind, even though
the items have different identifiers. Suppose the program is reading transaction records from a file
called ‘Updates’. When it reads a record into the current record area, it can’t tell which kind of record
it has read until it checks the value of ‘Kind’. Logically, if it is a delivery, it should test ‘Kind of
Updates’; if it is a payment, it should test ‘Kind-2 of Updates’. Which should it test?
Actually, it doesn’t matter. Both names refer to the same byte in the current record area, so it doesn’t
matter which is used. The two names mean the same thing; they are aliases or synonyms. Writing
‘Kind of Updates’ refers to the 13th byte of the record area. If the record area actually contains a
payment record, Cobol isn’t smart enough to care.
What would not work is to change the identifier ‘Kind-2’ to ‘Kind’. Both items would then be
called ‘Kind of Updates’. The catch is that the Cobol compiler will regard this as ambiguous;
it could mean ‘Kind of Delivery of Updates’ or it could mean ‘Kind of Payment of Updates’. The
compiler is not clever enough to deduce that the distinction doesn’t matter.
It is up to the programmer to handle different record types properly, or strange things can happen:
Suppose ‘Kind of Updates’ contains ‘$’, so that the record area contains a Payment, but the program
refers to ‘Qty-Delivered of Updates’, which is only present in a Delivery record. It will see bytes
24–27 of the record, which are the first 4 bytes of ‘Amount of Updates’. Similarly, if it refers to ‘Cost
of Updates’ it will see the last 4 bytes of ‘Amount of Updates’ followed by 4 bytes of garbage from
beyond the end of the record. Conversely, a program that refers to ‘Amount of Updates’ when a
Delivery record is present will see 4 bytes from the Qty-Delivered, and the first 4 bytes of Cost. On
the other hand, the program can safely refer to ‘Time-Stamp’, ‘YY-MM-DD’, ‘HH-MM-SS’, ‘Kind’,
‘Account’ and ‘Item-No’ irrespective of what kind of record is present, because these occupy the same
positions in both kinds of record.
In what follows, we shall use the convention that names with suffixes, like ‘Kind-2’, are for
documentation only. We won’t refer to them. They could be omitted without affecting the program.
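The byte-level aliasing described above can be simulated outside Cobol. Here is a Python sketch (not Cobol, and not part of the sub-system): it lays the two record descriptions over one 35-byte record area. The field offsets follow the layouts given earlier; the sample record contents are invented.

```python
# Python sketch (not Cobol) of how Delivery and Payment share one
# current record area. Offsets follow the record layouts in the text:
# 12-byte Time-Stamp, 1-byte Kind, 6-byte Item-No, 4-byte Account, ...
record_area = bytearray(35)    # big enough for the longer (Delivery) record

DELIVERY = {"Time-Stamp": (0, 12), "Kind": (12, 13), "Item-No": (13, 19),
            "Account": (19, 23), "Qty-Delivered": (23, 27), "Cost": (27, 35)}
PAYMENT  = {"Time-Stamp-2": (0, 12), "Kind-2": (12, 13), "Item-No-2": (13, 19),
            "Account-2": (19, 23), "Amount": (23, 31)}

def field(layout, name):
    """Return the bytes a qualified name denotes in the record area."""
    lo, hi = layout[name]
    return bytes(record_area[lo:hi])

# Store an (invented) 31-byte payment record in the area.
record_area[0:31] = b"010402120000$ITEM99A0010012345{"

# 'Kind' and 'Kind-2' name the same 13th byte; they are aliases.
assert field(DELIVERY, "Kind") == field(PAYMENT, "Kind-2") == b"$"

# Referring to Qty-Delivered while a Payment is present yields the
# first 4 bytes of Amount -- exactly the aliasing described above.
assert field(DELIVERY, "Qty-Delivered") == field(PAYMENT, "Amount")[:4]
```

The sketch makes the point that the names carry no run-time type information: they are just windows onto byte positions, and it is the programmer's job to consult ‘Kind’ before trusting the rest.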
The procedure division also follows the previous example closely, at least at first:
PROCEDURE DIVISION.
Process-All-Updates.
open input Updates, output Saved-updates
perform Get-Next-Update
perform until Time-Stamp of Updates = high-values
perform Copy-One-Update
perform Get-Next-Update
end-perform
close Updates, Saved-Updates
stop run.
Reading a record from the Updates file is similar to reading one from the Suppliers file, even though
the Updates file contains more than one kind of record. A read statement refers to a file; it can’t
possibly refer to a particular record. It can’t know what kind of record to read until it has already read it!
Get-Next-Update.
read Updates next record,
at end
move high-values to Time-Stamp of Updates
end-read.
Writing records must be done with care. When Delivery records are written, 35 bytes need to be
recorded. When payment records are written, 31 bytes need to be recorded. Cobol requires the
program to specify what kind of record is being written. The program must contain two different
write statements, so there are two cases to deal with. The rule is, ‘Read the file, write the record.’
Cobol deals with case analysis using the evaluate statement. Evaluating ‘Kind of Updates’ yields
two possible values, ‘D’ for a delivery, or ‘$’ for a payment. The two when clauses deal with these
cases by executing the proper move and write statements. End-evaluate marks the end of the final
when clause.
Copy-One-Update.
evaluate Kind of Updates
when "D"
move Delivery of Updates to Delivery of Saved-Updates
write Delivery of Saved-Updates
when "$"
move Payment of Updates to Payment of Saved-Updates
write Payment of Saved-Updates
end-evaluate.
The main difference between this and the earlier example is that although a master file often contains
only one kind of record, we expect a transaction file to contain several kinds.
By defining a pair of constants, we can write this paragraph in an alternative, self-documenting way.
We first define some suitable constants in the working-storage section of the data division:
Working-Storage Section.
77 Delivery-Code pic x value "D".
77 Payment-Code pic x value "$".
We may then write,
Copy-One-Update.
evaluate Kind of Updates
when Delivery-Code
move Delivery of Updates to Delivery of Saved-Updates
write Delivery of Saved-Updates
when Payment-Code
move Payment of Updates to Payment of Saved-Updates
write Payment of Saved-Updates
end-evaluate.
Later, if the coding scheme were changed, only the constant definitions would need to be modified;
‘Copy-One-Update’ could remain the same.
(Flowchart of the transaction-copying program: a start connector, an ‘End of input?’ test whose True branch leads to Stop, and whose False branch chooses between ‘Move delivery record to output’ and ‘Move payment record to output’.)
Process-All-Updates.
open input Updates, output Saved-updates
perform with test after
until Time-Stamp of Updates = high-values
perform Get-Next-Update
if Time-Stamp of Updates not = high-values
perform Copy-One-Update
end-if
end-perform
close Updates, Saved-Updates
stop run.
Programmers who prefer this style reason that it makes more sense for the loop to read a record, then
process it. But this creates two exceptions: when the end of file is reached, the final read does not
return a record, so we have to be careful not to process it. Also, until we read the first record, we
cannot be sure that the record area doesn’t contain high-values, so we have to avoid testing the loop
condition before the first iteration.
Another style, recommended in at least one textbook, is as follows.
Process-All-Updates.
open input Updates, output Saved-updates
perform until 1 = 2
read Updates next record,
at end
close Updates, Saved-Updates
stop run
not at end
perform Copy-One-Update
end-read
end-perform.
We don’t recommend using either of these approaches. They work well enough when only one input
file is read, but they don’t adapt to reading more than one file at the same time. We have started as we
mean to go on. The read at the start of the procedure, we call a priming read. Together with the open
statement, it gets the first record into the record area. The read at the end of the loop, we call a
refreshing read. Once we have finally done with a record, the read replaces it by the next one.
In any case, we should not be embarrassed by having two read statements. If the file contains N
records, it is convenient for the loop to have N iterations, one for each record to be processed. But the
program must execute N+1 reads; the last one will return the at end condition instead of a record.
Therefore it is necessary to have one read outside the loop — and it can hardly come after it!
It is good style to keep all the input logic in one paragraph. For example, burying a read operation in
Copy-One-Update would make the program just that bit harder to understand.
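The priming-read/refreshing-read shape is not specific to Cobol. The following Python sketch (an illustration, not Cobol; read and process are hypothetical stand-ins for the Updates file and the copying paragraph) shows the N-records, N+1-reads structure recommended above.

```python
# Python sketch (not Cobol) of the priming-read loop shape.
# A hypothetical read() returns None at end of file, playing the
# role of the at end clause.
def copy_all(read, process):
    record = read()            # priming read: get the first record
    while record is not None:  # N iterations for N records
        process(record)        # Copy-One-Update
        record = read()        # refreshing read: replace it with the next
    # N records mean N+1 reads: the final read signals end of file.

data = iter(["D-rec", "$-rec"])
read = lambda: next(data, None)
out = []
copy_all(read, out.append)
assert out == ["D-rec", "$-rec"]
```

Keeping both reads in view, one priming and one refreshing, is what lets the same shape extend later to reading several files at once.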
Interacting With The Operator
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Customers assign to "newcust.ndx"
organization is indexed
record key is Account of Customers
access is sequential.
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
The main procedure is similar to the other read loops we have seen:
PROCEDURE DIVISION.
List-All-Customers.
open input Customers
perform Get-Next-Customer
perform until Account of Customers = high-values
perform Display-Customer
perform Get-Next-Customer
end-perform
close Customers
stop run.
So is the means of reading the Customers file:
Get-Next-Customer.
read Customers next record,
at end
move high-values to Account of Customers
end-read.
Listing a customer record can be trivial.
Display-Customer.
display Customer of Customers.
However, the simple display statement has a major drawback, which we experience as soon as we run
the program:
A001Autobarn Elizabeth 61 Elizabeth Way Elizabeth SA 5
158 0002085{0050047915{
B003BCR Mobile Installations 25 Sydney Street Ridgehaven SA
5058 0000000{0020020000{
B007Blaupunkt Cnr Centre and McNaughton Rds Clayton VIC 31
09 0102950{0250125372E
B012Bobs Electronic Repairs 28 Limbert Avenue Seacombe Garden
s SA 5047 0001295}0010011295{
and so on ...
By displaying customer records, we see how they are represented on file. The 113 bytes won’t fit on
one line of the monitor, and although the Account code and Address are readable, Balance,
Credit-Limit and Available-Credit are confusing, to say the least. Thus, in the case of account A001, Balance
is ‘0002085{’. This is less mysterious if we realise that ‘{’ is actually a zero with an extra bit set on to
indicate a ‘+’ sign. Remembering that the amount shown is in cents, Balance is therefore $208.50. The
next 3 bytes represent the (unsigned) Credit-Limit, in thousands of dollars, so ‘005’ is really $5,000.
Finally, ‘0047915{’ is the Available-Credit in cents, ie, $4,791.50. (Since the sum of Balance and
Available-Credit equals Credit-Limit, we can deduce that this customer has nothing on back order.)
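The trailing ‘{’, ‘}’ and letter characters are one common ASCII convention for ‘overpunched’ signs, in which the last byte carries both the final digit and the sign. As a rough illustration, here is a Python sketch (not Cobol) that decodes the figures shown above; the convention assumed is ‘{’ for +0, ‘A’ to ‘I’ for +1 to +9, ‘}’ for −0 and ‘J’ to ‘R’ for −1 to −9.

```python
# Python sketch (not Cobol): decoding the trailing sign character of
# a display-format field such as Balance '0002085{'.
POSITIVE = {"{": 0, **{chr(ord("A") + d - 1): d for d in range(1, 10)}}
NEGATIVE = {"}": 0, **{chr(ord("J") + d - 1): d for d in range(1, 10)}}

def decode_zoned(text):
    """Return the signed integer value of a display-format numeric field."""
    body, last = text[:-1], text[-1]
    if last in POSITIVE:
        return int(body) * 10 + POSITIVE[last]
    if last in NEGATIVE:
        return -(int(body) * 10 + NEGATIVE[last])
    return int(text)              # plain unsigned digits

# Account A001's Balance, in cents: +20850, i.e. $208.50.
assert decode_zoned("0002085{") == 20850
# Account B012's Balance ends in '}': a negative amount, -$129.50.
assert decode_zoned("0001295}") == -12950
# Blaupunkt's Available-Credit ends in 'E' (+5): $12,537.25.
assert decode_zoned("0125372E") == 1253725
```

The exact characters used vary between compilers and character sets, so treat the tables above as one plausible convention rather than a universal rule.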
We therefore have the task of editing the numbers to show money amounts. Cobol makes this easy.
We have already seen examples of picture clauses for numeric and alphanumeric data. Here is an
example of an edited-numeric picture: ‘$$$$,$$9.99bdb’. The ‘$’ signs show the positions where a
dollar sign, space or digit might go, depending on the value of the number. The ‘$’ sign is said to
‘float’. The comma shows where a comma, space or dollar sign might go, again depending on the
value of the number. A ‘9’ shows where a digit goes, irrespective of the value of the number. The ‘.’
displays an actual decimal point. The letter ‘b’ indicates a blank space. The pair of letters ‘db’ will
display as ‘DB’ if the number is negative, but will otherwise display as two blanks.
This is only one way to show a sign. The combination ‘cr’ works similarly to ‘db’. Which of these is
used is an accounting convention. The general rule is to display ‘cr’ if a negative amount is to the
benefit of the person for whom the report is intended, but to display ‘db’ if it is to their disadvantage.
Here, a negative balance owing means that Serv-U-Rite owe money to the customer, so we have used
‘db’. When in doubt, ask an accountant.
Other ways to show a sign are by means of ‘+’ and ‘–’. Both display as ‘–’ when the number is
negative. The difference is that ‘+’ displays as ‘+’ when the number is positive, but ‘–’ displays as a
blank. The sign can be placed either before or after the number. If the sign is to appear immediately
before the first digit, a series of signs should be written, as in ‘----,--9.99’. A ‘-’ sign will then either
be displayed as a blank or an actual sign, depending on the size of the number.
Two other symbols can be used to replace zeros at the start of a number: ‘z’ and ‘*’. A ‘z’ either
prints as a digit or a blank. A ‘*’ either prints as a digit or an asterisk. This is used on bank cheques to
prevent fraudulent alteration.
Finally, ‘b’, as we have seen, inserts a blank, and ‘/’ inserts a slash, as in a date. The reason ‘b’ is
needed is that a picture cannot include spaces. For example, we must write ‘x(30)’, not ‘x (30)’.
Here are some examples of how eight different pictures cause three different data values to be edited.
Study them carefully. (A ‘␣’ marks a space produced by the editing.)

Picture          -123,456.78      +123,456.78      0
999999.99        123456.78        123456.78        000000.00
zzzzzz.99+       123456.78-       123456.78+       .00+
zzz,zzz.zzb-     123,456.78␣-     123,456.78␣␣     (all spaces)
----,--9.99      -123,456.78      123,456.78       0.00
$***,**9.99      $123,456.78      $123,456.78      $***,**0.00
$$$$,$$9.99cr    $123,456.78CR    $123,456.78␣␣    $0.00␣␣
$$$,$$9.99cr     $23,456.78CR     $23,456.78␣␣     $0.00␣␣
99/99/99         12/34/56         12/34/56         00/00/00
Armed with this information, we can see that a picture of ‘$$$$,$$9.99bdb’ is just what we need to
produce the desired output format. The same picture can serve for all the amounts concerned, although
‘Credit-Limit’ cannot actually be negative.
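To make the effect concrete, here is a Python sketch (not Cobol, and only an approximation: it ignores the fixed field width that a real picture implies) of what ‘$$$$,$$9.99bdb’ does to an amount held in cents.

```python
# Python sketch (not Cobol) approximating the picture '$$$$,$$9.99bdb':
# floating dollar sign, comma insertion, two decimal places, and
# ' DB' shown only when the amount is negative (blanks otherwise).
def edit_amount(cents):
    """Format an integer number of cents roughly as the picture would."""
    sign = " DB" if cents < 0 else "   "
    return "${:,.2f}{}".format(abs(cents) / 100, sign)

assert edit_amount(20850) == "$208.50   "      # A001's Balance
assert edit_amount(-479150) == "$4,791.50 DB"  # a hypothetical debit
```

Note that a real edited picture also fixes the total field width, right-justifying the amount within it; the sketch above produces only the significant characters.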
What we don’t do is to modify the pictures used in the Customers record. This would be silly for two
reasons: First, it would make the records longer, by including redundant characters. Second, it would
not be possible to do arithmetic on the items in the record. Edited-numeric data are more like character
strings than numbers. Instead, we must introduce a working variable into the data division. It is not
part of a file, so it goes in the working-storage section.
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
Working-Storage Section.
77 Edited-Amount pic $$$$,$$9.99bdb.
Ordinary level numbers can range from 01 to 49. The special level number 77 shows that ‘Edited-
Amount’ is not part of a data structure. (The compiler is therefore free to align the item on a 32-bit or
64-bit word boundary if that would make the program more efficient.)
We can now write a more sophisticated version of ‘Display-Customer’:
Display-Customer.
move Credit-Limit of Customers to Edited-Amount
display Account of Customers, space, Name of Customers,
" Credit Limit: ", Edited-Amount
move Balance of Customers to Edited-Amount
display " ", Street of Customers,
" Balance Owing: ", Edited-Amount
move Available-Credit of Customers to Edited-Amount
display " ", Suburb of Customers,
" Available credit: ", Edited-Amount
display spaces.
The move statement does not copy data blindly; the numbers are scaled to have the decimal points in
the correct places. The implicit decimal point in the picture of ‘Balance’ is aligned with the actual
decimal point in ‘Edited-Amount’, effectively converting cents to dollars and cents. Likewise, the
implicit zeros in ‘Credit-Limit’ are replaced by actual zeros, expanding thousands to dollars and cents.
The display statement takes a list of operands, which can be a mixture of variables and constants.
Normally, each display statement causes a new line. ‘Display spaces’ displays a blank line.
Some programmers will prefer the following alternative style. It is long-winded, but it perhaps makes
it easier to get the spacing right:
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
Working-Storage Section.
01 Edited-Customer.
02 First-Line.
03 Account pic a999.
03 pic x value space.
03 Name pic x(30).
03 pic x(19) value " Credit Limit: ".
03 Credit-Limit pic $$$$,$$9.99.
02 Second-Line.
03 pic x(5) value spaces.
03 Street pic x(30).
03 pic x(19) value " Balance Owing: ".
03 Balance pic $$$$,$$9.99bdb.
02 Third-Line.
03 pic x(5) value spaces.
03 Suburb pic x(30).
03 pic x(19) value " Available Credit: ".
03 Available-Credit pic $$$$,$$9.99bdb.
Display-Customer.
move Account of Customers to Account of Edited-Customer
move Name of Customers to Name of Edited-Customer
move Street of Customers to Street of Edited-Customer
move Suburb of Customers to Suburb of Edited-Customer
move Credit-Limit of Customers
to Credit-Limit of Edited-Customer
move Balance of Customers to Balance of Edited-Customer
move Available-Credit of Customers
to Available-Credit of Edited-Customer
display First-Line of Edited-Customer
display Second-Line of Edited-Customer
display Third-Line of Edited-Customer
display spaces.
It turns out that we will need to display customer records in several more examples to follow, so we
will assume that the working-storage entries are stored in the file ‘editcust.cbl’ and the listing
procedure is contained in the file ‘dispcust.cbl’. Once this is done, instead of the above, we can write,
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
Working-Storage Section.
copy "editcust.cbl".
and,
Display-Customer.
copy "dispcust.cbl".
We can develop a similar program to list the Products file. Without some cues, its output would be
difficult to understand. The program should display output like this:
Listing product records ...
and so on ...
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Products assign to "newprod.ndx"
organization is indexed,
record key is Item-No of Products
access is sequential.
DATA DIVISION.
File Section.
FD Products.
copy "product.cbl".
Working-Storage Section.
copy "editprod.cbl".
PROCEDURE DIVISION.
Process-All-Products.
display "Listing product records ..."
display spaces
open input Products
perform Get-Next-Product
perform until Item-No of Products = high-values
perform Display-Product
perform Get-Next-Product
end-perform
close Products
display "Listing complete."
stop run.
Get-Next-Product.
read Products next record
at end
move high-values to Item-No of Products
end-read.
Display-Product.
copy "dispprod.cbl".
5.2 Constants
We have already seen some examples of alphanumeric literals. They are character strings enclosed in
double quotation marks. If a quotation mark appears as part of a literal, it must be written twice,
otherwise the compiler will assume it marks the end of the literal. Numeric literals are straightforward,
and don’t need quotation marks. There is therefore a distinction between -123456.78, which is
numeric, and "-123456.78", which is alphanumeric. When a ‘.’ character is used as a decimal point, it
must be directly followed by a digit. A ‘.’ used as a period end marker must be followed by white
space.
Cobol also has some built-in figurative constants, which define constants that are confusing or
impossible to write:
quote         "
comma         ,
space         a blank
high-value    ASCII 255
low-value     ASCII 0
zero          either a numeric or alphanumeric zero, depending on context.
These names may also be written in the plural as quotes, commas, spaces, high-values, low-values,
and zeros or zeroes. The effect is the same as in the singular, and the choice is one of taste.
Variables can be assigned values in the data division, using a value clause. This does not make these
variables into true constants. The value is moved to the variable at the start of the program, initialising
it, but the program can modify any variable later if desired.
In an alphanumeric move, if the receiving field has more characters than the sending field, the extra
positions are filled with spaces (blank fill). If it has fewer, the excess characters are lost (truncation). Normally the
left-most bytes are aligned, but if the receiving field has the justified option, the right-most bytes are
aligned instead.
Most moves between fields of different types are allowed, but cause type conversion to occur. In
particular, all moves involving group items are treated as alphanumeric moves of the whole item.
One of the more useful consequences is that constants do not have to be the same length as the items
they are moved to. For example, we may write,
move "Not Known" to Address of Customers
Because of the blank fill rule, this would set ‘Name of Customers’ to “Not Known” followed by 21
spaces, and would set all of ‘Street of Customers’ and ‘Suburb of Customers’ to spaces too.
One of the less useful consequences of these rules is that moves are allowed between any pair of
group items. For example,
move Supplier of Suppliers to Customer of Customers
would copy the Account, Address and Balance of a Supplier record to a Customer record, but because
of blank fill, would set its Credit-Limit and Available-Credit to spaces rather than zeros.
A move in the opposite direction:
move Customer of Customers to Supplier of Suppliers
would copy Account, Address and Balance, but because of truncation, Credit-Limit and Available-
Credit would be lost.
However, a move such as
move Order-Item of Updates to Supplier of Suppliers
would result in chaos. For example, ‘Account of Suppliers’ will contain the first 4 bytes of ‘Time-
Stamp of Updates’. Nonetheless, it is legal. Any Cobol compiler will accept it without a murmur.
5.4 Accept
The accept statement reads data from the keyboard. When a program executes an accept statement,
it pauses, waiting for something to be typed. Once the operator hits the RETURN key, the data is
moved to its destination. At least, that is what happens in theory. Some systems move the data as soon
as the last character that will fit the destination is typed. Others are prepared to apply special rules to
numeric data. For example, typing ‘$123,456.78’ might be treated the same as typing ‘123456.78’; in
other words, all non-numeric characters are ignored.
Special forms of the accept statement are also used to obtain information from the operating system.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Suppliers assign to "newsupp.ndx"
organization is indexed
record key is Account of Suppliers
access is random.
DATA DIVISION.
File Section.
FD Suppliers.
copy "supplier.cbl".
The main procedure is similar to other examples we have seen. This time, the Suppliers file is opened
for output. Remember that we don’t open the keyboard file. Also, we can’t use high-values to mark
the end of the input, because it is impossible to type high-values on most keyboards. So we use
spaces. The program also displays some redundant output to tell the operator what is happening.
PROCEDURE DIVISION.
Create-All-Suppliers.
display "Creating a new Suppliers file..."
display "(A blank account code terminates the program.)"
open output Suppliers
perform Get-Next-Account
perform until Account of Suppliers = spaces
perform Create-One-Supplier
perform Get-Next-Account
end-perform
close Suppliers
display "Program terminated by operator. All records saved."
stop run.
Getting the next account code is easy.
Get-Next-Account.
display "Account: " with no advancing
accept Account of Suppliers.
The computer will type ‘Account: ’, then pause, without advancing the cursor to the next line. The
operator is then expected to type the account code of the supplier and hit the RETURN key. To exit the
program, the operator types a blank account code. Because of the rules for moving alphanumeric data,
any number of blanks will do, including zero.
‘Create-One-Supplier’ has the job of reading the rest of the supplier data, then writing the new record.
It must read the address and balance details from the keyboard.
Create-One-Supplier.
display " Name: " with no advancing
accept Name of Suppliers
display " Street: " with no advancing
accept Street of Suppliers
display " Suburb: " with no advancing
accept Suburb of Suppliers
display "Balance: " with no advancing
accept Balance of Suppliers
write Supplier in Suppliers
invalid key
display "Sorry, a record already exists for account ",
Account of Suppliers
not invalid
display "Supplier record created, thank you."
end-write.
Because we are dealing with an indexed file, the write statement must include an invalid key clause.
It could only be activated if the operator tried to create two records with the same account number. No
two records can have the same primary key. The not invalid clause keeps the operator informed and
happy as each record is created.
A disadvantage of this program is that the operator can create records in any order. This means the
resulting file is not as well organised as it might be. Logically, its records will be in account code
order. Physically, they could be less well ordered; they might be stored in the order they were created.
This will lead to the file being inefficient both to read and to update. Copying the file sequentially, using a
program such as ‘copysupp’, will optimise the internal structure of the file.
Incidentally, it would be trivial to change the program to use sequential access. Only two things need
to be changed: the access clause in the environment division, and the display statement in the invalid
key clause of the write statement. This should be replaced by two display statements to read,
display "Sorry, accounts must be in ascending order. "
display Account of Suppliers,
" is smaller than the previous account code."
When an indexed file is written sequentially, each key value must be greater than the previous one.
The operator should therefore first sort the records to be entered into the right order.
Interacting With The Operator
[Flowchart: the supplier file creation program. Open the Suppliers output file, then loop until end of input, testing each new record for a duplicate key.]
01 Edited-Update.
02 Edited-Date pic 99/99/99.
02 Edited-Time pic 99/99/99.
02 Qty-Delivered pic z,zz9.
02 Cost pic $$$$,$$9.99.
02 Amount pic $$$$,$$9.99.
Likewise, imagine the following text is stored in ‘dispupd8.cbl’.
move YY-MM-DD of Updates to Edited-Date
move HH-MM-SS of Updates to Edited-Time
inspect Edited-Time replacing all "/" by ":"
evaluate Kind of Updates
when Delivery-Code
move Cost of Updates to Cost of Edited-Update
move Qty-Delivered of Updates
to Qty-Delivered of Edited-Update
display Edited-Date, space, Edited-Time ", Delivery: ",
Item-No of Updates, space,
Account of Updates, space,
Qty-Delivered of Edited-Update, space,
Cost of Edited-Update
when Payment-Code
move Amount of Updates to Amount of Edited-Update
display Edited-Date, space, Edited-Time ", Payment: ",
Account of Updates, space,
Amount of Edited-Update
end-evaluate.
The only new feature here is the way the date and time are displayed. The picture of ‘Edited-Date’
ensures that a value of ‘011225’ in ‘YY-MM-DD of Updates’ will be displayed as ‘01/12/25’. We
would like the picture of ‘Edited-Time’ to be ‘99:99:99’. Unfortunately this is not something we can
achieve simply by choosing the right picture. Instead, we allow a value such as ‘123645’ in
‘HH-MM-SS of Updates’ to be converted to ‘12/36/45’ by the move. The inspect statement will then
replace both instances of ‘/’ by ‘:’, giving ‘12:36:45’. The Cobol inspect statement can be used for a
variety of editing functions, but it is inappropriate to discuss it further here.
With these tools at our disposal, we can then develop a program to record transactions.
IDENTIFICATION DIVISION.
Program-ID. makeupd8.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select optional Updates assign to "updates.seq"
organization is sequential.
In the environment division, the Updates file is described as optional. This is because the program
will open it in extend mode. Extend mode is similar to output mode, but if the file exists, new
records are appended to the end of the existing records. Making the file optional means that it will not
be an error if the file does not exist, and a file will be created containing the new records.
DATA DIVISION.
File Section.
FD Updates.
copy "update.cbl".
Working-Storage Section.
copy "editupd8.cbl".
copy "constant.cbl".
77 HH-MM-SS-hh pic 9(8).
77 Yes-No-Response pic x.
The working-storage section contains two level-77 items whose use will be explained shortly.
PROCEDURE DIVISION.
Process-All-Updates.
display "Creating Update records ..."
open extend Updates
perform Get-Next-Kind
perform until Kind of Updates = "Q"
perform Process-One-update
perform Get-Next-Kind
end-perform
close Updates
display "Job complete."
stop run.
The procedure division opens the Updates file in extend mode. This allows the operator to add new
transactions to the end of the existing Updates file. The procedure division then continues with the
usual read loop, which terminates when the operator types the letter ‘Q’ (for ‘Quit’).
Otherwise, since different kinds of transaction require different data, the ensuing dialogue depends on
‘Kind of Updates’. If it does not equal ‘D’ or ‘$’, the program displays a list of valid options. A
payment has a dummy item number.
Process-One-Update.
perform Make-Time-Stamp
evaluate Kind of Updates
when Delivery-Code
perform Get-Item-No
perform Get-Account
perform Get-Qty-Delivered
perform Get-Cost
perform Confirm-Update
when Payment-Code
move Dummy-Item-No to Item-No of Updates
perform Get-Account
perform Get-Amount
perform Confirm-Update
when other
display "Choose one of the following:"
display "D Record a delivery from a supplier."
display "$ Record a payment to a supplier."
display "Q Quit the program."
end-evaluate.
After requesting all the required items, the program displays the transaction, then asks the operator to
confirm that it is correct. Any value in ‘Yes-No-Response’ other than ‘Y’ or ‘y’ is taken to mean ‘No’.
A ‘Yes’ response results in the correct record being written to the Updates file; a ‘No’ response results
in nothing being written. In each case, the operator is told what action was taken:
Confirm-Update.
perform Display-Update
display "Is this correct (Y/N)? " with no advancing
accept Yes-No-Response
if Yes-No-Response = "Y" or Yes-No-Response = "y"
evaluate Kind of Updates
when Delivery-Code
write Delivery of Updates
when Payment-Code
write Payment of Updates
end-evaluate
display "Transaction written to file."
else
display "Transaction ignored."
end-if.
Since the Updates file has sequential organization, it has no primary key. Consequently, an invalid
key clause is neither needed nor allowed.
The time-stamp data is obtained by special forms of the accept statement. These do not ask the
operator for information, they ask the operating system. The operating system gives the time to one
hundredth of a second. The hundredths are discarded by the divide statement.
Make-Time-Stamp.
accept YY-MM-DD of Updates from Date
accept HH-MM-SS-hh from Time
divide HH-MM-SS-hh by 100 giving HH-MM-SS of Updates.
Finally, there are a series of simple procedures for accepting information from the operator.
Get-Next-Kind.
display " Kind: " with no advancing
accept Kind of Updates.
Get-Item-No.
display " Item No: " with no advancing
accept Item-No of Updates.
Get-Account.
display " Account: " with no advancing
accept Account of Updates.
Get-Qty-Delivered.
display "Qty Delivered: " with no advancing
accept Qty-Delivered of Updates.
Get-Cost.
display " Total Cost: " with no advancing
accept Cost of Updates.
Get-Amount.
display " Amount Paid: " with no advancing
accept Amount of Updates.
Display-Update.
copy "dispupd8.cbl".
In reality, these procedures should contain extra statements to check that the data typed are reason-
able. Checking input is an interesting topic, but it is not part of this course. As it is, the program is
happy to accept blank account codes and item numbers, and might even fail if the operator types non-
numeric data when a number is expected — the Cobol standard doesn’t say what will happen.
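For the curious, the usual tool for such checks is the class condition; a sketch (not part of the course programs) might read:

```cobol
Get-Qty-Delivered.
    display "Qty Delivered: " with no advancing
    accept Qty-Delivered of Updates
    perform until Qty-Delivered of Updates is numeric
        display "Please type digits only."
        display "Qty Delivered: " with no advancing
        accept Qty-Delivered of Updates
    end-perform.
```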
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Customers assign to "newcust.ndx"
organization is indexed
record key is Account of Customers
access is random.
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
Working-Storage Section.
77 Edited-Amount pic $$$$,$$9.99bdb.
77 Desired-Account pic a999.
The main procedure is similar to that of ‘makesupp’:
PROCEDURE DIVISION.
Find-Random-Customers.
display "Customers file enquiry program ..."
display "(A blank account code terminates the program.)"
display spaces
open input Customers
perform Get-Next-Account
perform until Desired-Account = spaces
perform Process-One-Customer
perform Get-Next-Account
end-perform
close Customers
display "Program terminated. Thank you."
stop run.
‘Get-Next-Account’ follows the previous pattern, except that the account code is accepted into a
working-storage variable, ‘Desired-Account’:
Get-Next-Account.
display "Account: " with no advancing
accept Desired-Account.
Provided the operator doesn’t terminate the program, ‘Desired-Account’ will contain a putative
account code, and the program will perform ‘Process-One-Customer’:
Process-One-Customer.
move Desired-Account to Account of Customers
read Customers,
invalid key
display "Sorry, there is no customer with account ",
Desired-Account
display spaces
not invalid
perform Display-Customer
end-read.
There are two things to note here. First, when a file is read in random access mode, the at end clause
is replaced by an invalid key clause. The invalid key clause is activated if no record actually has the
primary key that is specified; otherwise, the not invalid clause (if it exists) is activated. Second, the
read statement offers no way to specify the desired primary key. Instead, the key value must be moved
to the current record area. (The reasoning behind this is that when a record is written, its key is in the
record area, so, by analogy, it should be in the same place when a record is read.)
Actually, there is no real need for the variable ‘Desired-Account’. ‘Get-Next-Account’ could read,
Get-Next-Account.
display "Account: " with no advancing
accept Account of Customers.
It would also be necessary to make a few other modifications. One of these would be to remove the
redundant move statement at the start of ‘Process-One-Customer’. Although this is neater in some
ways, it is less obvious how the correct key value finds its way to the record area.
The ‘Display-Customer’ paragraph is exactly the same as in ‘listcust’. In fact, it pays to store the
following text in the file ‘dispcust.cbl’ so it can be copied wherever it is needed.
Display-Customer.
move Credit-Limit of Customers to Edited-Amount
display Account of Customers, space, Name of Customers,
" Credit Limit: ", Edited-Amount
move Balance of Customers to Edited-Amount
display " ", Street of Customers,
" Balance Owing: ", Edited-Amount
move Available-Credit of Customers to Edited-Amount
display "      ", Suburb of Customers,
" Available Credit: ", Edited-Amount
display spaces.
Running the program might result in the following dialogue.
Customers file enquiry program ...
(A blank account code terminates the program.)
Account: B007↵
B007 Blaupunkt Credit Limit: $25,000.00
Cnr Centre and McNaughton Rds Balance Owing: $10,295.00
Clayton VIC 3109 Available Credit: $12,537.25
Account: B005↵
Sorry, there is no customer with account B005
Account: A001↵
A001 Autobarn Elizabeth Credit Limit: $5,000.00
61 Elizabeth Way Balance Owing: $208.50
Elizabeth SA 5158 Available Credit: $4,791.50
Account: ↵
Program terminated. Thank you.
[Flowchart: the customer enquiry program. Open the Customers input file; read account codes from the keyboard until a blank is typed; try to read each customer record; display the edited record if found, otherwise warn the user; then close the file and stop.]
Projection and Selection
6.1 Selection
First, we consider listing the records of those customers who have back orders. The method is to read
the Customers file sequentially, testing each record to see if Available-Credit equals the difference
between Credit-Limit and Balance. If it doesn’t, the discrepancy is the value of goods on back order.
We list only those records where the difference is non-zero.
The output should look like this,
Listing customers with back orders ...
and so on ...
Listing complete.
The first three divisions are similar to ‘listcust’:
IDENTIFICATION DIVISION.
Program-ID. slctcust.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Customers assign to "newcust.ndx"
organization is indexed,
record key is Account of Customers
access is sequential.
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
Working-Storage Section.
copy "editcust.cbl".
The main loop of the procedure division follows the usual formula:
PROCEDURE DIVISION.
Process-All-Customers.
display "Listing customers with back orders ..."
display spaces
open input Customers
perform Get-Next-Customer
perform until Account of Customers = high-values
perform Process-One-Customer
perform Get-Next-Customer
end-perform
close Customers
display "Listing complete."
stop run.
So does ‘Get-Next-Customer’:
Get-Next-Customer.
read Customers next record,
at end
move high-values to Account of Customers
end-read.
In ‘Process-One-Customer’ we perform ‘Display-One-Customer’ only if Available-Credit does not
equal the difference between Credit-Limit and Balance:
Process-One-Customer.
if Available-Credit of Customers not =
Credit-Limit of Customers - Balance of Customers
perform Display-One-Customer
end-if.
Display-One-Customer.
copy "dispcust.cbl".
There are two points of syntax: First, ‘≠’ is written as ‘not =’. Second, a minus operator must be
surrounded by spaces; ‘Customers-Balance’ would look like a data name. Indeed, the same rule
should be followed for all operators.
[Flowchart: the ‘slctcust’ program. Open the Customers input file; read each record until end of input; display only those with a non-zero back order; then stop.]
6.2 Projection
Projection basically means suppressing some of the information in a record. The term derives from
coordinate geometry; the projection of the point (x,y,z) onto the x-z plane is (x,z). It can also mean
displaying information derived from a record; for example, (x+y,z) is also a projection.
In this case, the program calculates the value of goods on back order, and displays the result. It also
displays the Account and Name of each customer, but that is all.
The output will look like this,
Listing customer back order values ...
A001 Autobarn Elizabeth
B003 BCR Mobile Installations
B007 Blaupunkt $2,167.75
B012 Bobs Electronic Repairs
C002 Car Audio Designs
C005 Car Audio Services
C007 Cargear Pty Ltd $135.00
C010 Cartronics
C020 Citisound $180.85
C027 Complete Audio
C031 Custom Audio Sound $362.15
D014 Doug Sunstroms Sound Mart
D015 Doug Sunstroms Sound Mart
E003 Electric Bug Pty Ltd $242.55
E007 Afrotechnics
F002 Fujitsu Ten (Australia) P/L $1,000.00 DB
G010 Global Car Audio
J005 JayCar Pty Ltd
N012 National Car Audio
N014 Northern Car Radio $156.74
P001 Pioneer Car Audio Services
R003 RS Automotive Development
S004 Sound 4 Australia Pty Ltd
S011 Southern Car Audio
S015 Strathfield Car Radios
T002 Tonkins Car Audio Pty Ltd
T003 Tonkins Car Audio Pty Ltd $27.00
T004 Tonkins Car Audio Pty Ltd
Listing complete.
In working-storage, we describe the output. Because the picture clause of ‘Back-Order-Value’
includes no 9’s, zero values will print as blanks (called ‘blank when zero’). This helps to highlight the
customers whose Back-Order-Value is not zero:
IDENTIFICATION DIVISION.
Program-ID. prjtcust.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Customers assign to "newcust.ndx"
organization is indexed,
record key is Account of Customers
access is sequential.
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
Working-Storage Section.
01 Edited-Customer.
02 Account pic a999.
02 pic x value space.
02 Name pic x(30).
02 pic x value space.
02 Back-Order-Value pic $$$$,$$$.$$bdb.
The procedure division starts in the usual way:
PROCEDURE DIVISION.
Process-All-Customers.
display "Listing customer back order values ..."
open input Customers
perform Get-Next-Customer
perform until Account of Customers = high-values
perform Process-One-Customer
perform Get-Next-Customer
end-perform
close Customers
display "Listing complete."
stop run.
Get-Next-Customer.
read Customers next record,
at end
move high-values to Account of Customers
end-read.
‘Process-One-Customer’ shows how Cobol usually does arithmetic. Cobol syntax was originally
designed for those who hadn’t done high-school algebra:
Process-One-Customer.
move Account of Customers to Account of Edited-Customer
move Name of Customers to Name of Edited-Customer
subtract Available-Credit of Customers, Balance of Customers
from Credit-Limit of Customers
giving Back-Order-Value of Edited-Customer
display Edited-Customer.
You may prefer a more algebraic style:
Process-One-Customer.
move Account of Customers to Account of Edited-Customer
move Name of Customers to Name of Edited-Customer
compute Back-Order-Value of Edited-Customer
= Credit-Limit of Customers
- Available-Credit of Customers
- Balance of Customers
display Edited-Customer.
The important thing to notice here is that the result of an arithmetic operation can be an edited-
numeric item, but the sources of operands should be numeric items. In principle, Cobol moves all
arithmetic operands to 18-digit registers, computes the result, then moves the result to the destination.
Any conversions associated with the move operations are carried out in the standard way.
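A sketch with invented names may help:

```cobol
77  Net-Pay         pic 9(6)v99.
77  Bonus           pic 9(4)v99.
77  Edited-Net-Pay  pic $$$$,$$9.99.
```

‘add Bonus to Net-Pay giving Edited-Net-Pay’ is legal, because the edited item merely receives the result; ‘add Bonus to Edited-Net-Pay’ is not, because the edited item would then be a source operand as well.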
Ordering and Grouping
7.1 Sorting
Unless the file already happens to be in the correct order, these operations require the file to be sorted.
Cobol provides a sort verb for this purpose. Conceptually, the sort statement takes an input file,
copies it to a work file, sorts the work file into the order required, then copies it to an output file. A
consequence is that the input file is not altered in any way, unless the input file is also used to
receive the output.
Here is the start of a program to sort the Customers file into descending order of Balance.
IDENTIFICATION DIVISION.
Program-ID. sortcust.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Unsorted-Customers assign to "newcust.ndx"
organization is indexed,
record key is Account of Unsorted-Customers
access is sequential.
select Sorted-Customers assign to "newcust.seq"
organization is sequential.
select Customers-Work-File assign to "work.tmp".
The environment division defines three files. The first is the existing Customers file (‘newcust.ndx’),
which is indexed, so we have to say so; even though this program does not use the index, describing
the file as anything other than what it is would certainly lead to a run-time error. The second file
(‘newcust.seq’) can’t be indexed; if it were, it would always be read in ascending order of its primary key.
Here, we want descending order, and Balance can’t be used as a primary key anyway; two records can
have the same Balance. The third file is the work file used by sort. It has neither indexed nor
sequential organization. It can’t be used at all outside the context of the sort operation. Indeed, if the
file to be sorted is small enough to fit in main memory, the sort work file may never be created.
DATA DIVISION.
File Section.
FD Unsorted-Customers.
copy "customer.cbl".
FD Sorted-Customers.
copy "customer.cbl".
SD Customers-Work-File.
copy "customer.cbl".
The data division describes the three files as having the same record structure. However, the sort
work file entry is written as SD rather than FD. This highlights that it is not a regular file.
PROCEDURE DIVISION.
Sort-Customers-by-Balance.
display "Sorting customers file ..."
sort Customers-Work-File
on descending Balance of Customers-Work-File
using Unsorted-Customers
giving Sorted-Customers
display "Sort complete."
stop run.
The procedure division is easy enough to understand. Note that the sort sequence must specify one or
more items of the work file, not the input or output file.
7.2 Ordering
In most programs, we sort a file in order to use it for some purpose. Rather than write the sorted
records to a new file, we can deal with them as they are sorted. This is done by using an output
procedure.
The following program displays the customer records with the greatest debts first.
IDENTIFICATION DIVISION.
Program-ID. debtors.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Unsorted-Customers assign to "newcust.ndx"
organization is indexed,
record key is Account of Unsorted-Customers
access is sequential.
select Customers assign to "work.tmp".
DATA DIVISION.
File Section.
FD Unsorted-Customers.
copy "customer.cbl".
SD Customers.
copy "customer.cbl".
Working-Storage Section.
copy "editcust.cbl".
The environment and data divisions are simpler than before; there is no output file. It turns out to be
convenient to call the work file ‘Customers’. Then we can use the usual procedure to display it.
Apart from using an output procedure, the procedure division starts like the last example:
PROCEDURE DIVISION.
Sort-Customers-by-Balance.
display "Listing customers according to balance owing ..."
display spaces
sort Customers
on descending Balance of Customers
using Unsorted-Customers
output procedure Process-All-Customers
display "List complete."
stop run.
The output procedure itself consists of a familiar read loop:
Process-All-Customers.
perform Get-Next-Customer
perform until Account of Customers = high-values
perform Display-Customer
perform Get-Next-Customer
end-perform.
However, a sort file is not a regular file, and needs special syntax:
Get-Next-Customer.
return Customers record
at end
move high-values to Account of Customers
end-return.
We use the usual procedure to display the customer records:
Display-Customer.
copy "dispcust.cbl".
There are no open or close statements in this program. The sort opens and closes its files all by itself.
Now consider the problem of sorting delivery records in descending order of Cost.
Unfortunately, the Cost in a delivery record occupies the same bytes of the current record area as the
last 4 bytes of the Amount in a payment record, plus 4 bytes beyond the end of it. An attempt to sort
on the imaginary Cost of a payment might cause the program to fail. The Cobol compiler will actually
detect this particular error because Cost lies beyond the end of a payment record. For the program to
be acceptable, the sort work file must be defined to contain delivery records, but nothing else.
IDENTIFICATION DIVISION.
Program-ID. Costs.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Updates assign to "updates.seq"
organization is sequential.
select Deliveries assign to "work.tmp".
DATA DIVISION.
File Section.
FD Updates.
copy "update.cbl".
SD Deliveries.
01 Delivery.
02 Time-Stamp.
03 YY-MM-DD pic 9(6).
03 HH-MM-SS pic 9(6).
02 Kind pic x.
02 Item-No pic x(6).
02 Account pic a999.
02 Qty-Delivered pic 9999.
02 Cost pic 9(6)v99.
Working-Storage Section.
copy "editupd8.cbl".
copy "constant.cbl".
Since the only records in the work file are deliveries, it is necessary to eliminate the payments before
the sort operation. We therefore use a sort input procedure to eliminate the unwanted records. It will
read the updates file, and write the deliveries to the sort work file, ignoring the payments.
In the procedure division, we use sort with both an input procedure and an output procedure. The
input procedure selects the deliveries; the output procedure displays them in sorted order:
PROCEDURE DIVISION.
Sort-Deliveries-by-Cost.
display "Listing Deliveries according to Cost ..."
display spaces
sort Deliveries
on descending Cost of Deliveries
input procedure Select-Deliveries
output procedure Process-All-Deliveries
display "List complete."
stop run.
The input procedure reads the unsorted Updates file using the usual read loop. Because the work file
is not a regular file, Cobol syntax requires the use of release instead of write. Deliveries are written to
the work-file; all other kinds of record are ignored.
Select-Deliveries.
open input Updates
perform Get-Next-Update
perform until Time-Stamp of Updates = high-values
perform Process-One-update
perform Get-Next-Update
end-perform
close Updates.
Process-One-Update.
if Kind of Updates = Delivery-Code
move Delivery in Updates to Delivery in Deliveries
release Delivery of Deliveries
end-if.
Get-Next-Update.
read Updates next record
at end
move high-values to Time-Stamp of Updates
end-read.
The output procedure is another read loop. In order to display delivery records, we copy the
procedure in the ‘dispupd8.cbl’ file. Although it can display several other kinds of record as well as
deliveries, this is obviously harmless.
Process-All-Deliveries.
perform Get-Next-Delivery
perform until Time-Stamp of Deliveries = high-values
perform Display-Delivery
perform Get-Next-Delivery
end-perform.
Get-Next-Delivery.
return Deliveries record
at end
move high-values to Time-Stamp of Deliveries
end-return.
Display-Delivery.
copy "dispupd8.cbl".
7.3 Grouping
A possible use of grouping is to find the total cost of deliveries and the total payments made to each
supplier. An obvious way to do this would be to read the updates file, and to accumulate the totals in a
table, with an entry for each supplier. This poses a problem. However big we make the table, it is
possible that the Suppliers file will grow to exceed its capacity.
It is safer, and easier, to sort the Updates file into Account order. Then all the records for the first
Account code will be grouped together. This makes it possible to reuse the same two accumulators
over and over for each supplier. No table is needed.
Here are the first three divisions of a program that groups by sorting.
IDENTIFICATION DIVISION.
Program-ID. suppgrp.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Unsorted-Updates assign to "updates.seq"
organization is sequential.
select Updates assign to "work.tmp".
DATA DIVISION.
File Section.
FD Unsorted-Updates.
copy "update.cbl".
SD Updates.
copy "update.cbl".
Working-Storage Section.
77 Current-Account pic a999.
77 Total-Cost-This-Account pic 9(6)v99.
77 Total-Paid-This-Account pic 9(6)v99.
77 Grand-Total-Cost pic 9(8)v99.
77 Grand-Total-Paid pic 9(8)v99.
77 Edited-Cost pic $$$,$$$,$$$.$$.
77 Edited-Paid pic $$$,$$$,$$$.$$.
‘Current-Account’ is used to detect when each pair of totals should be displayed. ‘Total-Cost-This-
Account’ and ‘Total-Paid-This-Account’ are used to accumulate the sub-totals for each supplier.
‘Grand-Total-Cost’ and ‘Grand-Total-Paid’ are used to accumulate the grand totals for the entire file.
‘Edited-Cost’ and ‘Edited-Paid’ are used to edit them for display.
The start of the procedure division is like an earlier example:
PROCEDURE DIVISION.
Sort-Updates-by-Account.
sort Updates on ascending Account of Updates
using Unsorted-Updates
output procedure Process-All-Updates
stop run.
However, the output procedure is more complex than before:
Process-All-Updates.
perform Start-All-Accounts
perform Get-Next-Update
perform until Account of Updates = high-values
move Account of Updates to Current-Account
perform Start-One-Account
perform until Account of Updates not = Current-Account
perform Process-One-Update
perform Get-Next-Update
end-perform
perform End-One-Account
end-perform
perform End-All-Accounts.
Get-Next-Update.
return Updates
at end
move high-values to Account of Updates
end-return.
Instead of one loop, it contains two nested loops. The outer loop iterates once per supplier account.
The inner loop iterates once per update transaction. Apart from ‘Get-Next-Update’, which must
obviously be performed once per update record, this divides the rest of the program into five procedures:
Start-One-Account.
move zeros to Total-Cost-This-Account,
Total-Paid-This-Account.
Process-One-Update.
evaluate Kind of Updates
when Delivery-Code
add Cost of Updates to Total-Cost-This-Account
when Payment-Code
add Amount of Updates to Total-Paid-This-Account
end-evaluate.
End-One-Account.
add Total-Cost-This-Account to Grand-Total-Cost
add Total-Paid-This-Account to Grand-Total-Paid
move Total-Cost-This-Account to Edited-Cost
move Total-Paid-This-Account to Edited-Paid
display Current-Account, space,
Edited-Cost, space,
Edited-Paid.
End-All-Accounts.
move Grand-Total-Cost to Edited-Cost
move Grand-Total-Paid to Edited-Paid
display "----------------------------------"
display "Total", Edited-Cost, space, Edited-Paid
display "==================================".
In general, if a file is sorted on one key, we may expect to see two levels of loop; if on two keys, three levels of loop; and so on.
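The control-break pattern is not peculiar to Cobol. As a sketch of the same two-level structure (in Python, with hypothetical data; the sentinel plays the role of high-values, and ‘group_totals’ is an invented name, not one of the course programs):

```python
# Control-break processing of a file sorted on one key.
# The sentinel plays the role of Cobol's high-values.
SENTINEL = "\uffff"

def group_totals(records):
    """records: (account, amount) pairs, sorted on account.
    Returns one (account, subtotal) pair per group."""
    results = []
    stream = iter(records)
    current = next(stream, (SENTINEL, 0))
    while current[0] != SENTINEL:          # outer loop: once per account
        account = current[0]               # like 'move ... to Current-Account'
        subtotal = 0                       # like 'Start-One-Account'
        while current[0] == account:       # inner loop: once per record
            subtotal += current[1]         # like 'Process-One-Update'
            current = next(stream, (SENTINEL, 0))
        results.append((account, subtotal))   # like 'End-One-Account'
    return results
```

For example, `group_totals([("N001", 10), ("N001", 5), ("N002", 7)])` yields one subtotal per account, and an empty input correctly produces no output lines at all.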
The output of the program looks like this:
Account Delivered  Paid
----------------------------------
N001    $19,590.00
N002               $2,840.46
P004    $1,000.00
----------------------------------
Total   $20,590.00 $2,840.46
==================================
[Flowchart: the single-loop alternative. After each read, test for end of input and for a change of account number; on a change of account number, end one account and save the new account number; otherwise process one update; at end of input, close the file and stop.]
Process-All-Updates.
perform Start-All-Accounts
perform Get-Next-Update
move Account of Updates to Current-Account
perform Start-One-Account
perform until Account of Updates = high-values
if Account of Updates not = Current-Account
perform End-One-Account
perform Start-One-Account
move Account of Updates to Current-Account
end-if
perform Process-One-Update
perform Get-Next-Update
end-perform
perform End-One-Account
perform End-All-Accounts.
This is messy. ‘Start-One-Account’ and ‘End-One-Account’ have to be performed in two different
places, and there are two move statements as well. And there is still a bug: if the input file is empty,
the program will display a random account code and two zeros.
DATA DIVISION.
File Section.
FD Unsorted-Updates.
copy "update.cbl".
SD Updates.
copy "update.cbl".
Working-Storage Section.
copy "constant.cbl".
77 Current-Item-No pic x(6).
77 Total-Cost-This-Item-No pic 9(6)v99.
77 Total-Qty-This-Item-No pic 9999.
77 Grand-Total-Cost pic 9(8)v99.
77 Grand-Total-Qty pic 9(8)v99.
77 Edited-Cost pic $$$,$$$,$$$.$$.
77 Edited-Qty pic zzz,zz9.
77 Edited-Unit-Cost pic $$$,$$$.$$.
This involves sorting and grouping on item number. A problem arises here. Payments to suppliers
don’t have an item number to sort on. One way to deal with this would be to use an input procedure to
ignore the payments, as we did in an earlier example. Here we consider an alternative approach.
We actually foresaw this problem when we defined the Payment record structure, by including a
dummy Item-No. This will always contain low-values, not because of the value clause in the record
definition (which is effectively a comment), but because the program that created the Updates file
initialises Item-No correctly. This means that after the sort, all the Payment records will be grouped
together at the start of the output sequence.
In the procedure division, the output procedure includes an extra loop to skip over any records with
low-valued item numbers:
PROCEDURE DIVISION.
Sort-Updates-by-Item-No.
sort Updates on ascending Item-No of Updates,
using Unsorted-Updates
output procedure Process-All-Updates
stop run.
Process-All-Updates.
perform Start-All-Items
perform Get-Next-Update
perform until Item-No in Updates not = Dummy-Item-No
perform Get-Next-Update
end-perform
perform until Item-No of Updates = high-values
move Item-No of Updates to Current-Item-No
perform Start-One-Item-No
perform until Item-No of Updates not = Current-Item-No
perform Process-One-Update
perform Get-Next-Update
end-perform
perform End-One-Item-No
end-perform
perform End-All-Items.
Get-Next-Update.
return Updates record
at end
move high-values to Item-No of Updates
end-return.
Without this preliminary loop, the program would have a bug. It would enter the main loop, setting
Current-Item-No to low-values. Since there are no deliveries that have a low-valued item number,
the program would attempt to find the statistics for an empty set. There would then be 3 separate bugs
in the code that follows:
Start-All-Items.
move zeros to Grand-Total-Cost, Grand-Total-Qty
display "Item-No Qty Cost Unit Cost"
display "----------------------------------------".
Start-One-Item-No.
move zeros to Total-Cost-This-Item-No,
Total-Qty-This-Item-No.
Process-One-Update.
add Cost of Updates to Total-Cost-This-Item-No
add Qty-Delivered of Updates to Total-Qty-This-Item-No.
End-One-Item-No.
add Total-Cost-This-Item-No to Grand-Total-Cost
add Total-Qty-This-Item-No to Grand-Total-Qty
if Total-Cost-This-Item-No not < 10
move Total-Cost-This-Item-No to Edited-Cost
move Total-Qty-This-Item-No to Edited-Qty
divide Total-Cost-This-Item-No by Total-Qty-This-Item-No
giving Edited-Unit-Cost rounded
display Current-Item-No, space, Edited-Qty, space,
Edited-Cost, space, Edited-Unit-Cost
end-if.
End-All-Items.
move Grand-Total-Cost to Edited-Cost
move Grand-Total-Qty to Edited-Qty
display "----------------------------------------"
display "Total ", Edited-Qty, space, Edited-Cost
display "=============================".
First, the program will attempt to display a ‘Current-Item-No’ of low-values. The effect is
unpredictable, because low-values is equivalent to ASCII null characters.
Second, ‘Process-One-Update’ will try to treat Payment records as if they were Deliveries. Due to
aliasing, ‘Cost of Updates’ will be assigned the value of the first 4 bytes of ‘Amount of Updates’, and
‘Qty-Delivered of Updates’ will be assigned its last 4 bytes, followed by 4 bytes of garbage. This
problem could be fixed by altering ‘Process-One-Update’ as follows,
Process-One-Update.
if Kind of Updates = Delivery-Code
add Cost of Updates to Total-Cost-This-Item-No
add Qty-Delivered of Updates to Total-Qty-This-Item-No
end-if.
which would reveal the third bug: In ‘End-One-Item-No’, ‘Total-Qty-This-Item-No’ would be zero,
causing a divide-by-zero error.
This problem could be fixed as follows,
End-One-Item-No.
add Total-Cost-This-Item-No to Grand-Total-Cost
add Total-Qty-This-Item-No to Grand-Total-Qty
move Total-Cost-This-Item-No to Edited-Cost
move Total-Qty-This-Item-No to Edited-Qty
if Total-Qty-This-Item-No not = zero
divide Total-Cost-This-Item-No by Total-Qty-This-Item-No
giving Edited-Unit-Cost rounded
else
move zero to Edited-Unit-Cost
end-if
display Current-Item-No, space, Edited-Qty, space,
Edited-Cost, space, Edited-Unit-Cost.
where the picture clause for ‘Edited-Unit-Cost’ will display a zero value as blank.
Both these changes are wise moves anyway. We shall consider several additional kinds of
transactions in later examples. This is quite realistic: in real life, systems are often modified by adding
new kinds of transaction. The modification to ‘Process-One-Update’ would ignore the new kinds of
transaction, and not misinterpret them as deliveries. But it would then be possible for a group of
transactions to occur for a valid Item-No that contains no deliveries. The modification to
‘End-One-Item-No’ would then be essential.
The rounded option of the divide statement means that ‘Edited-Unit-Cost’ will be computed to the
nearest cent. For example, if ‘Total-Cost-This-Item-No’ equals $1,231.00 and ‘Total-Qty-This-Item-
No’ equals 16, the exact unit cost is $76.9375. Since ‘Edited-Unit-Cost’ has only two decimal places,
without the rounded option, this will be truncated to $76.93; with it, it will be rounded up to $76.94.
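The arithmetic can be checked with a short sketch (Python’s decimal module, not part of the Cobol program; ROUND_HALF_UP matches Cobol’s rounded, and ROUND_DOWN matches truncation):

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_DOWN

cost = Decimal("1231.00")    # Total-Cost-This-Item-No
qty = Decimal("16")          # Total-Qty-This-Item-No
exact = cost / qty           # 76.9375, the exact unit cost

# Keep two decimal places, as the Edited-Unit-Cost picture does.
truncated = exact.quantize(Decimal("0.01"), ROUND_DOWN)      # without rounded: 76.93
rounded = exact.quantize(Decimal("0.01"), ROUND_HALF_UP)     # with rounded:    76.94
```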
Of course, this whole problem would be much easier if the transaction file contained just deliveries,
and nothing else. This is best done by using a sort input procedure to select just the deliveries, as in an
earlier example.
Finally, suppose we want to display statistics only for those Items having a total cost exceeding some
preset amount, say $10,000. We would need to modify ‘End-One-Item-No’ as follows.
End-One-Item-No.
add Total-Cost-This-Item-No to Grand-Total-Cost
add Total-Qty-This-Item-No to Grand-Total-Qty
if Total-Cost-This-Item-No > 10000
move Total-Cost-This-Item-No to Edited-Cost
move Total-Qty-This-Item-No to Edited-Qty
if Total-Qty-This-Item-No not = zero
divide Total-Cost-This-Item-No
by Total-Qty-This-Item-No
giving Edited-Unit-Cost rounded
else
move zero to Edited-Unit-Cost
end-if
display Current-Item-No, space, Edited-Qty, space,
Edited-Cost, space, Edited-Unit-Cost
end-if.
Note that this process is not correctly called ‘selection’. Selection is a process of choosing particular
records, such as deliveries. Here we are displaying aggregate data having certain properties.
Set Union, Intersection and Difference
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Suppliers assign to "newsupp.ndx"
organization is indexed,
record key is Account of Suppliers
access is sequential.
select Customers assign to "newcust.ndx"
organization is indexed,
record key is Account of Customers
access is sequential.
In the data division, we declare ‘Current-Account’. This will keep track of which account the
program is processing. The copied statements are used to edit supplier and customer records.
DATA DIVISION.
File Section.
FD Customers.
copy "customer.cbl".
FD Suppliers.
copy "supplier.cbl".
Working-Storage Section.
77 Current-Account pic a999.
copy "editsupp.cbl".
copy "editcust.cbl".
The procedure division follows the logic outlined above:
PROCEDURE DIVISION.
Process-All-Accounts.
display "Listing all suppliers and customers ..."
open input Suppliers, Customers
perform Get-Next-Supplier
perform Get-Next-Customer
perform Choose-Current-Account
perform until Current-Account = high-values
perform Process-One-Account
if Account of Suppliers = Current-Account
perform Get-Next-Supplier
end-if
if Account of Customers = Current-Account
perform Get-Next-Customer
end-if
perform Choose-Current-Account
end-perform
close Suppliers, Customers
display "Listing complete."
stop run.
where ‘Choose-Current-Account’ sets ‘Current-Account’ to the lower of the two account codes:
Choose-Current-Account.
if Account of Suppliers < Account of Customers
move Account of Suppliers to Current-Account
else
move Account of Customers to Current-Account
end-if.
The program can test if there is a supplier record or customer record for the current account by
comparing its key with the value of ‘Current-Account’. This test is used to decide which refreshing
reads are needed at the end of the loop. Note that if the key of the Customers record or the Suppliers
record does not equal ‘Current-Account’, it must exceed it, because ‘Current-Account’ is the smaller of
the two keys. That means the record has not yet been processed, so no read is necessary. In a sense,
the read has already been done.
The trick of using high-values to mark the end of file proves especially useful here. There are three
cases to consider: both files end with the same key, the Customers file ends on a lower key than the
Suppliers file, or the Suppliers file ends on a lower key than the Customers file. Because high-values
is greater than any real account code, it will never be chosen as the current key until both files have
reached their end. This deals with all three cases properly. The loop finally exits when ‘Current-
Account’ equals high-values.
‘Process-One-Account’ must allow for either file to have a missing record, but in the case that both
are present, it is required to flag the account:
Process-One-Account.
if Account of Suppliers = Current-Account
perform Display-Supplier
end-if
if Account of Customers = Current-Account
perform Display-Customer
end-if
if Account of Suppliers = Account of Customers
display "Duplicate account number!"
display "========================="
end-if.
Note that if the two account codes are equal, they must also equal ‘Current-Account’.
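The merge logic can be summarised in a short sketch (Python, with hypothetical account codes; ‘merge_accounts’ is an invented name, and each file is assumed to yield its keys in sorted order with no duplicates):

```python
SENTINEL = "\uffff"   # plays the role of high-values

def merge_accounts(suppliers, customers):
    """suppliers, customers: sorted sequences of unique account codes.
    Yields (account, in_suppliers, in_customers) for every code
    in the union of the two files."""
    s, c = iter(suppliers), iter(customers)
    s_key = next(s, SENTINEL)
    c_key = next(c, SENTINEL)
    while True:
        current = min(s_key, c_key)        # Choose-Current-Account
        if current == SENTINEL:            # both files exhausted
            break
        yield (current, s_key == current, c_key == current)
        if s_key == current:               # refreshing reads: only the
            s_key = next(s, SENTINEL)      # file(s) just processed advance
        if c_key == current:
            c_key = next(c, SENTINEL)
```

A code present in both sequences is reported once, with both flags true, which is exactly the condition the program uses to flag a duplicate account.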
The rest of the program contains no surprises. We assume that we can copy the statements needed to
display the customer and supplier records:
Get-Next-Supplier.
read Suppliers next record,
at end
move high-values to Account of Suppliers
end-read.
Get-Next-Customer.
read Customers next record,
at end
move high-values to Account of Customers
end-read.
Display-Customer.
copy "dispcust.cbl".
Display-Supplier.
copy "dispsupp.cbl".
Start
True
End of
Close both files
input?
False
Stop
Process the
account
Is the Yes
Choose the lower Read next
supplier
account number supplier record
present?
No
Is the Yes
Read next
customer
customer record
present?
No
[Venn diagram: a rectangle U containing two overlapping circles C and S, with areas marked 1, 2 and 3.]
The rectangle U represents the universe of all possible account codes. The circle C encloses the set of
all Customer account codes. The circle S encloses the set of all Supplier account codes. In general,
these sets overlap. The area marked 2 represents the region of overlap, containing the codes both files
have in common. The following terms describe potentially interesting sets of account codes:
C union S Areas 1, 2 and 3 All codes in either file. (All values of Current-Account.)
C intersect S Area 2 only The codes common to both files. (Those to be flagged.)
C minus S Area 1 only Customer codes that are not also Supplier codes.
S minus C Area 3 only Supplier codes that are not also Customer codes.
Of the other possibilities, Area 4 represents the set of nearly 26,000 account codes that aren’t
currently used, and is therefore of limited interest. There are 8 ways of choosing the remaining 3
areas: Four have already been considered. Of the rest, one is the empty set, Areas 1 and 2 make set C,
Areas 2 and 3 make set S, leaving the combination 1 and 3, which is called the symmetric difference of
C and S, the set of all codes that are not common to both files: their union minus their intersection.
In these terms, the functions of the above program are to display accounts in the union of C and S, and
flag accounts in the intersection of C and S. This description is a harmless abuse of the terminology of
set theory. Strictly speaking, set operations can only be applied to the information shared by both files;
in this case, only the codes themselves.
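For comparison, a language with a built-in set type computes these operations directly. A Python sketch, using hypothetical account codes:

```python
C = {"A001", "B002", "C003"}   # customer account codes
S = {"B002", "C003", "D004"}   # supplier account codes

union        = C | S   # areas 1, 2 and 3: all codes in either file
intersection = C & S   # area 2: codes common to both files
c_minus_s    = C - S   # area 1: customer codes that aren't supplier codes
s_minus_c    = S - C   # area 3: supplier codes that aren't customer codes
symmetric    = C ^ S   # areas 1 and 3: union minus intersection
```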
In this particular example, both files shared a common primary key, so they were already in the same
order. If this had not been the case, we could have sorted one or both files to bring them into the same
order. It should be clear that, if desired, the program could easily be modified to find any desired set
operation. However, the program still assumes that each file contains at most one record with a given
key. We shall see how to deal with multiple records in the next section.
9 Join Algorithms
We have seen how to read and write files, select particular records, project particular results, and to
summarise totals. All these operations involved just one input file. We have also seen how to combine
information from two files that share a common primary key. But we often need to combine
information from files in more interesting ways than this. For example, we may want to associate
product data with its corresponding supplier data, or we may want to discover all pairs of customers
and suppliers that have addresses in the same suburb as one another. We call the combination of data
from two or more files a join. Unlike set union, intersection and difference, a join does not need the
files being combined to share a common primary key.
There are three basic algorithms for joining files, and each has its place:
Nested-loops For each record of file A we read all records of file B
Random-access For each record of file A we read the record of file B with the matching key.
Sort-merge We merge the files as above, but allow multiple records on one of the files.
The nested loops method is the most general, because it allows any kind of join to be made. The
other two methods can only be used when matching is done on a primary key of one of the files.
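As a sketch of the first algorithm (Python, with hypothetical records rather than the course’s data files; the names are invented), nested loops simply test the join condition on every possible pair:

```python
def nested_loops_join(file_a, file_b, condition):
    """For each record of file A, scan every record of file B;
    keep the pairs that satisfy the join condition."""
    pairs = []
    for a in file_a:                 # outer loop: one pass of file A
        for b in file_b:             # inner loop: a full pass of file B
            if condition(a, b):
                pairs.append((a, b))
    return pairs

suppliers = [{"name": "Acme", "suburb": "Clayton"}]
customers = [{"name": "Jones", "suburb": "Clayton"},
             {"name": "Smith", "suburb": "Unley"}]

# An equi-join on Suburb: keep supplier-customer pairs in the same suburb.
same_suburb = nested_loops_join(
    suppliers, customers,
    lambda s, c: s["suburb"] == c["suburb"])
```

Because any condition can be supplied, this method can make any kind of join, at the cost of examining every pair.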
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Suppliers assign to "newsupp.ndx",
organization is indexed,
record key is Account of Suppliers,
access is sequential.
select Customers assign to "newcust.ndx",
organization is indexed,
record key is Account of Customers,
access is sequential.
DATA DIVISION.
File Section.
FD Suppliers.
copy "supplier.cbl".
FD Customers.
copy "customer.cbl".
Working-Storage Section.
copy "editsupp.cbl".
copy "editcust.cbl".
Get-Next-Customer.
read Customers record
at end
move high-values to Account of Customers
end-read.
Display-Customer.
copy "dispcust.cbl".
Display-Supplier.
copy "dispsupp.cbl".
If ‘Process-Supplier-Customer-Pair’ did not contain the condition on Suburb, the program would list
every possible pair of supplier and customer records. Mathematically, the resulting set of pairs is
called a cartesian product. The pairs are ordered: supplier-customer pairs are distinct from customer-
supplier pairs. Because of the condition however, the program only lists a subset of the cartesian
product. A subset of a cartesian product is called a relation.
We can regard the files themselves as being formed in a similar way. If we take the set of all account
numbers, all names, all street addresses, all suburbs and all balances, then form their cartesian product,
we obtain a set containing every possible supplier record. The Suppliers file contains only a subset of
the cartesian product, so mathematically speaking, it too is a relation. Thus relations are a
mathematical abstraction of files. This is the origin of the term ‘relational database’. The join
operation is a means of forming new relations from existing relations.
[Flowchart: the nested-loops join. Open the Suppliers file and read its first record; for each supplier, open the Customers file, read every customer record, and display each supplier-customer pair that shares the same suburb; close the Customers file after each pass and read the next supplier record; at the end of the Suppliers file, close it and stop.]
The condition that determines which pairs of records are present in the join is called the join
condition. When the join condition requires an equality between two fields, it is called an equi-join.
Any other condition is called a theta-join (θ-join). For example, a join where the supplier’s Balance
exceeded the customer’s Credit-Limit would be a theta-join. Strictly, a join that involves fields with
the same name (such as Suburb) is called a natural join, although this term is often used loosely with a
slightly different meaning, as we shall see shortly.
Joins can be many-to-many, many-to-one or one-to-one. The join on Suburb we have just considered
is potentially many-to-many. There is nothing to prevent many customers and many suppliers sharing
the same suburb. In that case, each customer in that suburb would be paired with many suppliers, and
each supplier in the suburb would be paired with many customers. However, if we joined the Products
and Suppliers files so that each product record is paired with the supplier that normally supplies it,
there can be at most one supplier paired with each product — although many products can be paired
with each supplier — so the relationship is many-to-one. Finally, if we join the Customers and
Suppliers files on account number, there is at most one record from each file, so the join is one-to-one.
There is little difference between a one-to-one equi-join and set intersection.
What happens if we pair each product with its supplier and find that through an error some product
records don’t refer to valid supplier records? If we omit those product records from the join, it is
called an inner join. If instead, we pair the product records with dummy supplier records, that is called
an outer join.
Finally, it is possible to have a self-join. In this case a single file is paired with itself, forming for
example, a series of customer-customer pairs. From this product, it would be possible, for example, to
find all pairs of customers that share the same suburb. This would be a simple modification of the
above program.
[Flowchart: the random-access join. For each child record, read the matching parent record; report any child whose parent record is not found; read the next child record; at the end of the child file, close both files and stop.]
We illustrate the method by joining the Products and Suppliers files on the account of the supplier.
Since every product must have exactly one supplier, but a supplier can supply any number of products,
the Suppliers file is the parent, and the Products file is the child.
We can adapt two programs we have already studied: one to list the Products file sequentially
(‘listprod’), and one to read the Suppliers file randomly (‘findsupp’). In effect, all we have to do is
adapt ‘findsupp’ by replacing its keyboard input file by the Products file.
In the environment division, we select sequential access mode for the child file, and random access
mode for the parent file:
IDENTIFICATION DIVISION.
Program-ID. randjoin.
* Joins products and suppliers by random access to supplier.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Suppliers assign to "newsupp.ndx",
organization is indexed,
record key is Account of Suppliers,
access is random.
select Products assign to "newprod.ndx",
organization is indexed,
record key is Item-No of Products,
access is sequential.
Display-Product.
copy "dispprod.cbl".
Display-Supplier.
copy "dispsupp.cbl".
We can see that, when it is applicable, this method is much faster than the nested loops method. For
each product, it reads the required supplier record directly, instead of reading the entire Suppliers file.
This program is best described as forming an outer join, because products that have no matching
supplier are listed in the join, along with an error message. It forms an equi-join, because the Supplier
in the product record must equal the Account in the supplier record. (Note the first line of ‘Display-
Supplier-Status’!) Because ‘Supplier’ and ‘Account’ are not the same name, it is not strictly a natural
join. On the other hand, many programmers use the term loosely to refer to any join on the primary
key of a file. Actually, the supplier account code gets listed twice, once as part of the product record
and once as part of the supplier record. Although this redundancy is reassuring, normally only one of
the occurrences would be used in the join.
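The random-access method can be sketched with a dictionary standing in for the index (Python, hypothetical records and invented names):

```python
def random_access_join(products, supplier_index):
    """For each child (product) record, fetch its parent (supplier)
    record directly by key, instead of scanning the whole parent file."""
    joined = []
    for p in products:
        s = supplier_index.get(p["supplier"])   # one direct read per child
        joined.append((p, s))                   # s is None for an orphan
    return joined

# The dictionary stands in for the indexed Suppliers file.
supplier_index = {"S001": {"account": "S001", "name": "Acme"}}
products = [{"item": "P1", "supplier": "S001"},
            {"item": "P2", "supplier": "S999"}]   # no matching supplier
result = random_access_join(products, supplier_index)
```

Pairing the orphaned child with None rather than dropping it mirrors the outer-join behaviour described above.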
[Flowchart: the sort-merge join. Choose the smaller join key; at the end of both files, close them and stop; while the child record matches the join key, process the parent-child pair (when the parent also matches) and read the next child record; then read the next parent record.]
IDENTIFICATION DIVISION.
Program-ID. sortjoin.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Suppliers assign to "newsupp.ndx"
organization is indexed,
record key is Account of Suppliers
access is sequential.
select Unsorted-Products assign to "newprod.ndx"
organization is indexed,
record key is Item-No of Unsorted-Products
access is sequential.
select Products assign to "work.tmp".
DATA DIVISION.
File Section.
FD Unsorted-Products.
copy "product.cbl".
SD Products.
copy "product.cbl".
FD Suppliers.
copy "supplier.cbl".
Working-Storage Section.
77 Current-Supplier pic x(4).
77 Supplier-Status pic x.
77 Supplier-Exists pic x value "Y".
77 Supplier-Missing pic x value "N".
copy "editprod.cbl".
copy "editsupp.cbl".
The working-storage section contains ‘Current-Supplier’, which keeps track of the account code that
is being processed. It also contains a variable called ‘Supplier-Status’, which is used to keep track of
whether a supplier record exists for the current account number. Strictly speaking, it is redundant,
because we can check this condition by testing whether ‘Account of Suppliers’ equals ‘Current-Supplier’.
However, its use makes the program more readable. ‘Supplier-Status’ can have one of two values, ‘Y’
or ‘N’, referred to as ‘Supplier-Exists’ and ‘Supplier-Missing’.
The procedure division starts by sorting the product records into order by supplier account code. The
Suppliers file is already in this order because it is indexed by ‘Account’.
PROCEDURE DIVISION.
Process-All-Suppliers.
display "Sorting Products ..."
sort Products on ascending Supplier of Products,
using Unsorted-Products
output procedure Process-Sorted-Products
stop run.
The sort operation structures the Products file as a series of groups sharing the same account code.
The output procedure therefore needs two levels of loop: The outer loop deals with an account code;
the inner loop deals with individual products within each supplier group. The logic is a meld of the
‘suppgrp’ program, which finds the total cost of deliveries and total of the payments for each supplier,
and the ‘accounts’ program, which merges two files:
Process-Sorted-Products.
display "Joining Suppliers and Products ..."
display spaces
open input Suppliers
perform Get-Next-Supplier
perform Get-Next-Product
perform Choose-Current-Supplier
perform until Current-Supplier = high-values
perform Check-Supplier-Status
perform until Supplier of Products not = Current-Supplier
perform Process-One-Product
perform Get-Next-Product
end-perform
if Account of Suppliers = Current-Supplier
perform Get-Next-Supplier
end-if
perform Choose-Current-Supplier
end-perform
close Suppliers
display "Join complete.".
Choose-Current-Supplier.
if Supplier of Products < Account of Suppliers
move Supplier of Products to Current-Supplier
else
move Account of Suppliers to Current-Supplier
end-if.
The parent-child relationship means that the two files are not treated symmetrically. It is an error if a
child has no parent, but it is acceptable for a parent to have no child. Thus, the program forms a kind of
outer join, because products that do not match suppliers are listed, along with an error message.
The new feature here is that the inner loop allows there to be zero, one or many child records for each
parent record. If a product record is an orphan, the outer loop skips the refreshing read from the parent
file, because the account code in the supplier record area must already exceed ‘Current-Supplier’.
‘Check-Supplier-Status’ exists mainly to make the program easier to understand. Its true value will
be appreciated when we study updating in a later section:
Check-Supplier-Status.
if Account of Suppliers = Current-Supplier,
move Supplier-Exists to Supplier-Status
else
move Supplier-Missing to Supplier-Status
end-if.
We make use of Supplier-Status in Process-One-Product to test whether the product has a matching
supplier. We do not need to test if a supplier has a matching product, because Process-One-Product
would not be performed: the inner loop would have zero iterations.
Process-One-Product.
perform Display-Product
if Supplier-Status = Supplier-Exists
perform Display-Supplier
else
display Current-Supplier " is not on the supplier file."
display spaces
end-if.
The rest is routine:
Display-Supplier.
copy "dispsupp.cbl".
Display-Product.
copy "dispprod.cbl".
Get-Next-Supplier.
read Suppliers next record,
at end
move high-values to Account of Suppliers
end-read.
Get-Next-Product.
return Products record,
at end
move high-values to Supplier of Products
end-return.
As in the merge program, the trick of using high-values to indicate the end of file works beautifully.
It is worth noting that the sort-merge method doesn’t need either file to be indexed, and a join could
be made on any field that has a unique value for each parent record.
The sort-merge method is often the fastest of the three. It guarantees to read each parent record
exactly once, even if there are several children that have the same parent. On the other hand, it must
read every parent record, even if no child matches it. It is therefore likely to be better than the random
access method if a high proportion of parent records are accessed, and to perform worse than the
random access method if a low proportion are accessed.
One of the advantages of the sort-merge method or the skip-sequential method is that we can exploit
grouping at the same time as making the join. We illustrate this by applying the skip-sequential
method to write a program that shows a typical use of the join operation; it tells Serv-U-Rite what
products need to be purchased from suppliers. Its output should look like this:
Product Purchase Orders by Supplier
-----------------------------------
and so on ...
US Audio Imports
5 Penna Ave
Clayton VIC 3109
10 Audio-Gods Box Speakers
10 (Subtotal for supplier)
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Suppliers assign to "newsupp.ndx"
organization is indexed,
record key is Account of Suppliers
access is random.
select Unsorted-Products assign to "newprod.ndx"
organization is indexed,
record key is Item-No of Unsorted-Products
access is sequential.
select Products assign to "workfile.tmp".
DATA DIVISION.
File Section.
FD Unsorted-Products.
copy "product.cbl".
SD Products.
copy "product.cbl".
FD Suppliers.
copy "supplier.cbl".
Working-Storage Section.
77 Current-Supplier pic a999.
77 Supplier-Status pic x.
77 Supplier-Exists pic x value "Y".
77 Supplier-Missing pic x value "N".
77 Reorder-Qty-Subtotal pic 999999.
77 Reorder-Qty-Grand-Total pic 999999.
01 Report-Line.
02 Reorder-Qty pic zzz,zz9.
02 pic x value space.
02 Description pic x(40).
The procedure division begins by sorting the Products file:
PROCEDURE DIVISION.
Process-All-Suppliers.
sort Products on ascending Supplier of Products
using Unsorted-Products
output procedure Process-Sorted-Products
stop run.
It retains the nested loop structure of the sort-merge algorithm, but the Supplier record is read (within
‘Start-One-Supplier’) at the start of each group.
Process-Sorted-Products.
perform Start-All-Suppliers
open input Suppliers
perform Get-Next-Product
perform until Supplier of Products = high-values
move Supplier of Products to Current-Supplier
perform Start-One-Supplier
perform until Supplier of Products not = Current-Supplier
perform Process-One-Product
perform Get-Next-Product
end-perform
perform End-One-Supplier
end-perform
close Suppliers
perform End-All-Suppliers.
‘Start-All-Suppliers’ displays the heading, and clears the final total:
Start-All-Suppliers.
display "Product Purchase Orders by Supplier"
display "-----------------------------------"
display spaces
move zero to Reorder-Qty-Grand-Total.
‘Start-One-Supplier’ reads the matching supplier record in the same way as the random access
method. It then displays the supplier’s address and clears the sub-total.
Start-One-Supplier.
move Supplier of Products to Account of Suppliers
read Suppliers
invalid key
move Supplier-Missing to Supplier-Status
not invalid key
move Supplier-Exists to Supplier-Status
end-read.
if Supplier-Status = Supplier-Exists
display Name of Suppliers
display Street of Suppliers
display Suburb of Suppliers
else
display Current-Supplier " is not on the supplier file."
end-if
move zero to Reorder-Qty-Subtotal.
Processing a product consists of testing if the number of items in stock plus the number on order is
below the reorder level. If so, the required reorder quantity and the description of the product are
displayed and the sub-total for the current supplier is updated:
Process-One-Product.
if Stock of Products + On-Order of Products
< Reorder-Level of Products
move Reorder-Qty of Products
to Reorder-Qty of Report-Line
move Description of Products
to Description of Report-Line
display Report-Line
add Reorder-Qty of Products to Reorder-Qty-Subtotal
end-if.
At the end of each group of products, the sub-total is displayed, then added to the final total:
End-One-Supplier.
move "(Subtotal for supplier)"
to Description of Report-Line
move Reorder-Qty-Subtotal to Reorder-Qty of Report-Line
display Report-Line
display spaces
add Reorder-Qty-Subtotal to Reorder-Qty-Grand-Total.
At the end of file, the final total is displayed:
End-All-Suppliers.
move "(Grand Total)" to Description of Report-Line
move Reorder-Qty-Grand-Total to Reorder-Qty of Report-Line
display Report-Line.
Finally, we meet an old friend:
Get-Next-Product.
return Products record,
at end
move high-values to Supplier of Products
end-return.
We can see that the skip-sequential method is efficient both when a high proportion of supplier
records are accessed and when a low proportion are accessed. In the first case it behaves like the sort-
merge method. In the second case it behaves roughly like the random access method. Although
sorting the Products file adds some overhead, this will almost certainly be rewarded by a more orderly
access to the Suppliers file, reducing seek time.
This particular use of the skip-sequential method has the weakness that it accesses suppliers and
displays their addresses even if none of their products need to be reordered. This can be corrected by
selecting the required product records before the join is made:
Get-Next-Product.
perform with test after
until Supplier of Products = high-values
or Stock of Products + On-Order of Products
< Reorder-Level of Products
return Products,
at end
move high-values to Supplier of Products
end-return
end-perform.
The if statement in Process-One-Product is then redundant.
We can estimate the time taken by the nested-loops method as follows: The Suppliers file is read
once. This should take 100 × 10ms = 1 second. The Products file can be read once in 1,000 × 10ms =
10 seconds, but it must be read many times: once for each supplier. Since there are 500 suppliers,
5,000 seconds will be spent reading the Products file, so the total time taken is 5,001 seconds — over
1 hour and 23 minutes.
It is worth asking what would happen if the order of the nested loops were reversed. At first sight, we
might expect there to be no change in performance. With 500 suppliers and 6,000 products, there are
bound to be 3,000,000 reads in the inner loop, whichever order the loops are nested. Even so, there is
a difference: Reading the Products file once takes 10 seconds, and reading the Suppliers file 6,000
times takes 6,000 seconds, so the total is 6,010 seconds. Our first choice was better because there are
more product records than supplier records per sector. The question is how many blocks have to be
read, not how many records.
Using the same assumptions, how long should the random-access method take? The Products file is
read sequentially, taking 10 seconds, as before. The Suppliers file is read randomly, once for each
product record. If we assume that there is negligible chance of two successive supplier records lying
in the same sector, each access to the Suppliers file will take 10ms, so the total time spent accessing it
will be 6,000 × 10ms = 60 seconds. The total access time, including 10 seconds for the Products file, is
70 seconds.
To estimate the time taken by the sort-merge method, we need to know how long it takes to sort the
Products file. Since we have not yet discussed how sorting is done, we need to take a certain amount
on trust at this stage. We will assume that the unsorted Products file is copied to the work file, then the
work file records are read by the sort output procedure. We assume that all these transfers are made
one sector at a time. The time taken to sort the Products file is therefore 10 seconds to read it, 15
seconds to write it to the work file (including read-after-write verification), and 10 seconds to read the
work file: 35 seconds in all. Since the Suppliers file is read sequentially, the total access time is
36 seconds.
The skip-sequential method is likely to take exactly the same time. It would be faster only if one or
more whole sectors of the Suppliers file were skipped, which is most improbable in this case.
We therefore have the following estimates:
Nested-loops 5,001 seconds
Random-access 70 seconds
Sort-merge 36 seconds
Skip-sequential 36 seconds
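These figures follow directly from the stated cost model, and can be checked with a little arithmetic. The sketch below uses Python purely as a calculator; the constants are the assumptions made in the text (10ms per sector access, 15ms per verified write, 100 supplier sectors, 1,000 product sectors):

```python
# Cost model from the text, in milliseconds to keep the arithmetic exact.
ACCESS_MS = 10          # one sector read or seek
WRITE_MS = 15           # one verified (read-after-write) write
SUPPLIER_SECTORS = 100  # 500 supplier records, 5 per sector
PRODUCT_SECTORS = 1_000 # 6,000 product records
SUPPLIERS = 500
PRODUCTS = 6_000

# Nested loops: Suppliers once, Products once per supplier.
nested_ms = SUPPLIER_SECTORS * ACCESS_MS + SUPPLIERS * PRODUCT_SECTORS * ACCESS_MS

# Random access: Products sequentially, one random Suppliers read per product.
random_ms = PRODUCT_SECTORS * ACCESS_MS + PRODUCTS * ACCESS_MS

# Sort-merge: read Products, write the work file (verified), read it back,
# then read Suppliers sequentially.
merge_ms = PRODUCT_SECTORS * (ACCESS_MS + WRITE_MS + ACCESS_MS) \
           + SUPPLIER_SECTORS * ACCESS_MS

print(nested_ms // 1000, random_ms // 1000, merge_ms // 1000)  # 5001 70 36
```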
It would be wrong to conclude from this that the sort-merge method is always the best choice. First,
although the nested-loops method is certainly the least efficient, it can make many-to-many joins,
while the other methods can only make one-to-one or one-to-many joins. Consequently, nested-loops
is the method of last resort, used only when the more efficient methods cannot be used.
Second, before rushing to premature conclusions about the other three methods, consider what
happens if the Suppliers file is 100 times bigger. The random-access method is unaffected by this
change, but since it now takes 100 seconds to read all 10,000 sectors of the Suppliers file sequentially,
the sort-merge method takes 135 seconds. The skip-sequential method cannot possibly need to read
more than 6,000 sectors from the Suppliers file, so it takes at most 95 seconds. The sort-merge method
moves from equal first to third place, and the random-access method moves from third to first place.
The nested-loops method maintains its fourth place, taking over 5 days.
To a close approximation, the random-access method will read Rc blocks of the parent file (one per
child record), whereas the sort-merge method must read all Bp blocks (which is constant). Random
access is therefore likely to be faster when Rc < Bp, and sequential access is likely to be faster when
Rc > Bp. If Rc = Bp, this is called the break-even point.
Break-even occurs when the number of child records equals the number of parent blocks.
Since records are typically much smaller than blocks, at the break-even point the child file is typically
much smaller than the parent file, so this justifies ignoring the time taken to read it or sort it.
We may test this rule in the case of the Products and Suppliers files. Since the Suppliers file contains
100 blocks, at the break-even point the Products file contains 100 records, or 14 blocks. Reading the
Products file then takes 0.14 seconds, and sorting it takes 0.49 seconds. At the break-even point, read-
ing the Suppliers file randomly or sequentially takes 1 second. Therefore, the random-access method
takes 1.14 seconds, and the sort-merge method takes 1.49 seconds, so the approximation is close, but
not exact. Actually, with so few records, the whole Products file would occupy less than 7KB. In
these circumstances the Cobol sort algorithm would almost certainly discard the work file, and sort the
Products file in main memory. The two methods would then have exactly the same performance.
If the hit rate is so low that no parent record is accessed twice, it equals Rc ÷ Rp, where Rp is the
number of records in the parent file. By dividing the equation Rc = Bp by Rp, we get Rc ÷ Rp = Bp ÷ Rp.
The ratio Rp ÷ Bp is the number of records per block of the parent file, or its blocking factor. This
suggests the following alternative rule:
Break-even occurs when the hit rate equals the inverse of the parent file blocking factor.
Since the Suppliers file has a blocking factor of 5 records per sector, the break-even hit rate must be
0.2, or 20%. Since it contains 500 records, this hit-rate occurs when there are 100 product records, as
before.
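Both formulations of the rule can be checked against the figures used in the text. A minimal sketch (Python, purely for the arithmetic; the Suppliers file sizes are those assumed above):

```python
# Random access reads Rc parent blocks (one per child record);
# sequential access reads all Bp parent blocks.
parent_blocks = 100                      # Bp: Suppliers file, 100 sectors
parent_records = 500                     # Rp: supplier records
blocking_factor = parent_records // parent_blocks   # 5 records per block

# Rule 1: break-even when the number of child records equals Bp.
break_even_children = parent_blocks      # Rc = 100 product records

# Rule 2: the same point expressed as a hit rate, Rc / Rp.
hit_rate = break_even_children / parent_records
print(hit_rate)                          # 0.2, i.e. the inverse of the blocking factor
```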
This second formulation of the break-even point stresses that increasing the blocking factor will
reduce the hit rate needed to reach break-even. The more we increase the blocking factor of the parent
file, the fewer blocks it will have, the fewer seeks will occur, and the faster the sequential method will
become. For this reason, files are often written in blocks of many sectors. What limits block size is
either the length of a track on disk, or the amount of main memory that can be set aside to buffer a
block. Transfer time usually is only a small fraction of total access time, so larger blocks do not slow
random accesses much.
Usually the choice of join method is clear-cut. If there is no parent-child relationship, nested-loops is
the only choice. If there is, the hit rate will usually determine that random-access or sort-merge is the
clear winner. For example, the parent file may have a blocking factor of 20, with a break-even hit rate
of 5%, but the actual hit rate may be 50%. In the rare case that the two methods take much the same
time, it usually doesn’t matter which we choose. In the even rarer case that the choice is critical, there
is really no substitute for experiment. The skip-sequential method is always a safe choice: It mimics
the random-access method when the hit rate is low, and it mimics the sort-merge method when it is high.
IDENTIFICATION DIVISION.
Program-ID. lkupjoin.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Suppliers assign to "newsupp.ndx",
organization is indexed,
record key is Account of Suppliers,
access is sequential.
select Products assign to "newprod.ndx",
organization is indexed,
record key is Item-No of Products,
access is sequential.
DATA DIVISION.
File Section.
FD Suppliers.
copy "supplier.cbl".
FD Products.
copy "product.cbl".
Working-Storage Section.
01 Supplier-Table.
02 Supplier-Entry occurs 1000 times.
copy "supplier.cbl" replacing 01 by 03, 02 by 04, 03 by 05.
copy "editprod.cbl".
77 No-Of-Suppliers pic 9999.
77 Item pic 9999.
77 Least pic 9999.
77 Most pic 9999.
copy "editsupp.cbl".
An array of 1,000 entries is declared in the working storage, along with a few auxiliary variables.
Each entry has the same format as a supplier record. Unfortunately, the level numbers within the
copied text need to be adjusted in this context, so we use ‘copy … replacing …’. This makes a
textual replacement of ‘01’ by ‘03’, ‘02’ by ‘04’ and ‘03’ by ‘05’. We need to be careful. Replacing
‘1’ by ‘3’, ‘2’ by ‘4’ and ‘3’ by ‘5’ could have a disastrous side-effect: pic x(30) could be changed to
pic x(50). This is probably the best argument for writing ‘01’ instead of ‘1’, etc.
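The hazard is easy to demonstrate with an ordinary textual substitution. The sketch below uses Python’s string replace to stand in for the ‘copy … replacing …’ preprocessing; the data line is illustrative:

```python
line = "03 Name pic x(30)."

# Replacing the two-digit level number '03' by '05' leaves the picture
# clause alone, because "x(30)" contains "30", not "03" ...
safe = line.replace("03", "05")
print(safe)    # 05 Name pic x(30).

# ... but replacing the bare digit '3' by '5' corrupts the picture clause:
unsafe = line.replace("3", "5")
print(unsafe)  # 05 Name pic x(50).
```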
Cobol arrays are always numbered from 1 onwards, so the elements have indices from 1 to 1,000.
The usual read loop is preceded by loading the Suppliers file into working storage:
PROCEDURE DIVISION.
Join-Suppliers-and-Products.
perform Load-Look-Up-Table
open input Products
perform Get-Next-Product
perform until Item-No of Products = high-values
perform Process-One-Product
perform Get-Next-Product
end-perform
close Products
stop run.
Loading the array requires another read loop. At the end of the loop the supplier records lie in
locations 1 to ‘No-Of-Suppliers’. If there are more than 1,000 supplier records, the program fails.
Because an indexed sequential file is stored in order of its primary key, the resulting table is in account
code order:
Load-Look-Up-Table.
open input Suppliers
move zero to No-Of-Suppliers
perform Get-Next-Supplier
perform until Account of Suppliers = high-values
add 1 to No-Of-Suppliers
move Supplier of Suppliers
to Supplier of Supplier-Entry (No-Of-Suppliers)
perform Get-Next-Supplier
end-perform
close Suppliers.
‘Process-One-Product’ displays the product record, then attempts to find the matching supplier by
executing ‘Binary-Search’.
Process-One-Product.
perform Display-Product
perform Binary-Search
if Least > Most
display Supplier of Products,
" is not on the supplier file."
display spaces
else
perform Display-Supplier
end-if.
Binary-Search.
move 1 to Least
move No-Of-Suppliers to Most
compute Item = (Least + Most) / 2
perform until Least > Most or
Account of Supplier-Entry (Item) = Supplier of Products
if Account of Supplier-Entry (Item) < Supplier of Products
add 1, Item giving Least
else
if Account of Supplier-Entry (Item)
> Supplier of Products
subtract 1 from Item giving Most
end-if
end-if
compute Item = (Least + Most) / 2
end-perform.
In case you are not already familiar with binary search, here is how it works: ‘Least’ and ‘Most’ store
the lowest and highest positions of the table between which the desired supplier record might lie.
Equally, the record can neither lie below ‘Least’ nor above ‘Most’. The search progresses by bringing
‘Least’ and ‘Most’ closer together, narrowing the interval containing the desired account. To do this, it
tests an element, ‘Item’, within the range ‘Least’ to ‘Most’. Any element would do, but the one at the
mid-point between ‘Least’ and ‘Most’ is best because this halves the interval. If the account number of
the mid-point element is less than the desired account, the position above it is the lowest where the
desired element can lie. If it is greater than the desired account, the position below it is the highest
where the desired element can lie. If it is equal to the desired account, the proper supplier has been
found. If the account number of the product is not in the table at all, the values of ‘Least’ and ‘Most’
will eventually cross.
The method is efficient because each iteration halves the area of the table where the desired element
might lie. We assume there are 500 supplier records, as before. 500 can be halved only 9 times before
reaching 1, so the search makes at most 9 iterations even on an unsuccessful search. A simple linear
search of the table taking each element in turn would take an average of 250 iterations to find an
element, and 500 on an unsuccessful search. In general, the number of iterations in ‘Binary-Search’
increases only with the logarithm of the size of the array.
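Stripped of Cobol detail, the same search fits in a few lines. Here is a sketch in Python (using zero-based indices rather than Cobol’s 1-based ones; the account values are illustrative):

```python
def binary_search(table, key):
    """Return the index of key in the sorted table, or None if absent."""
    least, most = 0, len(table) - 1
    while least <= most:
        item = (least + most) // 2   # probe the mid-point of the interval
        if table[item] < key:
            least = item + 1         # key can only lie above the probe
        elif table[item] > key:
            most = item - 1          # key can only lie below the probe
        else:
            return item              # the desired element has been found
    return None                      # least and most have crossed: not found

accounts = ["B007", "N001", "N002", "S001"]
print(binary_search(accounts, "N002"))   # 2
print(binary_search(accounts, "X999"))   # None
```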
Cobol actually provides a verb, ‘search … all …’, that allows a binary search to be written as a single
statement. Unfortunately, explaining how to use it here would involve a long digression into features
of Cobol that are otherwise irrelevant to us. But it’s nice to know it’s there!
Display-Product.
copy "dispprod.cbl".
Get-Next-Supplier.
read Suppliers record
at end
move high-values to Account of Suppliers
end-read.
Get-Next-Product.
read Products record
at end
move high-values to Item-No of Products
end-read.
Since the supplier record that should be displayed is in the table, not the Suppliers file, copying
‘dispsupp.cbl’ will not work correctly unless the word ‘Suppliers’ is replaced by ‘Supplier-
Entry (Item)’, so again we must use ‘copy … replacing …’.
Giving the table space for 1,000 supplier records is generous compared with our test population.
What would happen if the file were much bigger? We could increase the number of entries in the
table. Many compilers refuse to create tables above a certain size, for example, 65,536 (2¹⁶) entries.
(The supplier table would then occupy over 6MB.) Some compilers allow bigger tables, but it is
important to avoid using virtual memory. Virtual memory is implemented by using secondary storage
to make RAM look bigger than it is. Using virtual memory would mean that at least part of the array
was really on disk, and efficiency would suffer.
The table is effectively a local cache for the Suppliers file. Operating systems and disk drives often
provide caches to store recently used records. The program uses the table a bit more intelligently than
an operating system uses a cache, because the operating system might discard a supplier record in
preference to a more recently used product record. This is not a good idea, because product records are
only used once, but the operating system cannot know this.
Even so, if the Suppliers file is actually smaller than the operating system’s disk cache, the random-
access method is likely to have much the same performance as using an internal table. Although the
program would access each supplier record many times, after the first access the record would usually
be fetched from the cache rather than the disk. In other words, the effort of writing the ‘lkupjoin’
program could have been saved simply by making the operating system’s cache big enough, or perhaps
by requesting enough buffer space in some other way.
On the other hand, if the parent file is much bigger than the size of RAM, neither a table nor a cache
can be expected to work well. For example, if the file is 5 times bigger than the cache, we might
expect 1 access in 5 to hit the cache, and 4 accesses in 5 to have to read from disk. In theory, the cache
would only reduce access time by 20%.
In practice, the system might perform better than this, because of what is known as the 80-20 law.
This ‘law’ says that 80% of the accesses occur to the active 20% of the records, and 20% of the
accesses occur to the inactive 80% of the records. If we assume this law applies to the Suppliers file,
we would expect 80% of the cache to contain active records and 20% to contain inactive records, in
proportion to the numbers of accesses. Assuming that the file is 5 times bigger than the cache, as
before, 80% of the cache would hold 80% of the active records, but the remaining 20% of the cache
could hold only 5% of the inactive records. Thus 4 accesses in 5 would have 80% chance of hitting the
cache, and the remaining 1 in 5 would have a 5% chance. Overall, 65% of accesses would hit the
cache and only 35% would need to read from disk. The cache might reduce access time by 65%.
The 80-20 law is claimed to be recursive, so that 80% of 80% of the accesses occur to 20% of 20% of
the records, and so on. In other words, 64% of the accesses occur to 4% of the records. We can
virtually guarantee that these records will remain in the cache at all times, implying that only about
20% of accesses would result in disk activity. In this particular case the cache would make the random-
access method five times faster.
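The cache arithmetic above can be reproduced directly. A sketch of the one-level 80-20 calculation (Python, for the arithmetic only), assuming as before a file five times bigger than the cache:

```python
file_to_cache = 5   # the file is five times bigger than the cache

# Without the 80-20 law, every access has a uniform 1-in-5 chance of a hit.
uniform_hit = 1 / file_to_cache

# With the law: 80% of accesses go to the active 20% of records, of which
# 80% are held in cache; the other 20% of accesses go to inactive records,
# of which the remaining cache space holds only 0.2 / (0.8 * 5) = 5%.
active_hit = 0.8 * 0.8
inactive_hit = 0.2 * (0.2 / (0.8 * file_to_cache))
hit_rate = active_hit + inactive_hit

print(round(uniform_hit, 2), round(hit_rate, 2))   # 0.2 0.65
```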
These statistical complications make it difficult to estimate the effect of a cache on actual data,
especially if the operating system shares the same cache space between many programs. However, it is
clear that the bigger the file is compared with the cache, the less effective the cache can be. Test files are
likely to be so small that caching will obscure any performance differences between the methods. If
we attempted to measure the number of accesses used by each method, we would almost certainly get
silly answers. Even the same program might give better results when it was run a second time. Only
realistic data can give reliable results.
Since the table look-up program’s performance is limited by disk access, it actually makes little
difference whether its internal search is efficient. If we had used a linear search to find the matching
supplier, the program would have been analogous to the nested-loops method. Indeed, a similar
internal table could also be used to speed up the many-to-many join of Suppliers and Customers on
‘Suburb’ discussed earlier. The only restriction is that one of the files to be joined should fit into main
memory. The same consideration applies to a cache: If the cache was bigger than the whole Suppliers
file, the nested-loops method and the random-access methods would have virtually the same
performance.
A development of this idea can be used to implement a fast version of the nested-loops method that
works even if one file is too big to fit into main memory. This time there are three loops. In the outer
loop, the program reads as much of the Suppliers file as will fit into its internal table. In the two inner
loops it reads a record from the Products file and forms its join with all the records in the table. It then
reads the next product record and joins it with the records in the table, and so on, until the Products file
is exhausted. It then returns to the outer loop, and fills the table from the next section of the Suppliers
file. The table is joined with the Products file as before, and the process is repeated until the whole
Suppliers file has been read.
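The three loops can be sketched as follows (Python, with lists standing in for the files; the record layout, names and chunk size are illustrative). Note that the two inner loops compare every table record with every child record, so many-to-many joins still work:

```python
def block_nested_join(parents, children, table_size, key):
    """Three-loop nested join: parents are read in table-sized chunks."""
    result = []
    for start in range(0, len(parents), table_size):  # outer: fill the table
        table = parents[start:start + table_size]
        for c in children:                            # middle: one pass of the file
            for p in table:                           # inner: scan the whole table
                if p[key] == c[key]:
                    result.append((p["name"], c["name"]))
    return result

suppliers = [{"name": "A", "suburb": "Parramatta"},
             {"name": "B", "suburb": "Glenelg"},
             {"name": "C", "suburb": "Parramatta"}]
customers = [{"name": "X", "suburb": "Parramatta"},
             {"name": "Y", "suburb": "Glenelg"}]
print(block_nested_join(suppliers, customers, table_size=2, key="suburb"))
# [('A', 'X'), ('B', 'Y'), ('C', 'X')]
```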
In the case of equi-joins, the same trick can be combined with the sort-merge method. Suppose we
want to join the Customers and Suppliers files on ‘Suburb’, as before. If our program sorts both files
on Suburb, it will group all records with the same suburb together, and each file will consist of a series
of suburb groups. The join can only contain records in matching groups. We read the first suburb
group from the Suppliers file into a table, then join it with the records from the Customers file that
belong to the same suburb group. We then read the next suburb group from the Suppliers file into the
table and join it with the matching group from the Customers file, and so on, until all the suburbs have
been processed. If it should happen that there are so many suppliers in one suburb that the table is too
small to hold them, the program must read as many as it can, and read through the corresponding
customer records more than once. If this is to be done efficiently, the program must be able to return
to the start of the current group in the file and read it again.
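A sketch of this group-wise join (Python again; it assumes, unlike a real Cobol sort, that both files fit in memory, and it omits the re-reading needed when a group overflows the table):

```python
from itertools import groupby

def sort_merge_join(left, right, key):
    """Many-to-many equi-join: sort both inputs, then pair matching groups.
    Each left group plays the role of the in-memory table."""
    lgroups = [(k, list(g)) for k, g in groupby(sorted(left, key=key), key)]
    rgroups = [(k, list(g)) for k, g in groupby(sorted(right, key=key), key)]
    result, i, j = [], 0, 0
    while i < len(lgroups) and j < len(rgroups):
        lk, rk = lgroups[i][0], rgroups[j][0]
        if lk < rk:
            i += 1                  # no matching group on the right: skip
        elif lk > rk:
            j += 1                  # no matching group on the left: skip
        else:                       # matching groups: join every pair
            for a in lgroups[i][1]:
                for b in rgroups[j][1]:
                    result.append((a[1], b[1]))
            i += 1
            j += 1
    return result

suppliers = [("Parramatta", "A"), ("Glenelg", "B"), ("Parramatta", "C")]
customers = [("Parramatta", "X"), ("Glenelg", "Y")]
print(sort_merge_join(suppliers, customers, key=lambda r: r[0]))
# [('B', 'Y'), ('A', 'X'), ('C', 'X')]
```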
10 Updating
Updating a file means changing the data stored in its records. Updating can occur on a periodic basis,
or it can occur as the result of transactions. In some cases, new records can be added to the file, or
existing records can be deleted. Combining these options with different access modes or different join
algorithms leads to a wide range of possibilities. We shall start with the simplest.
Earlier, we saw how the skip-sequential join could be used to tell Serv-U-Rite what products needed
reordering. Let us suppose that Serv-U-Rite followed these recommendations exactly, so the products
are now on order. We therefore need to update the On-Order amounts in the Products file. We can do
this by creating a new copy of the file or by altering the existing copy.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Old-Products assign to "oldprod.ndx"
organization is indexed,
record key is Item-No of Old-Products
access is sequential.
select Products assign to "newprod.ndx"
organization is indexed,
record key is Item-No of Products
access is sequential.
DATA DIVISION.
File Section.
FD Old-Products.
copy "product.cbl".
FD Products.
copy "product.cbl".
Working-Storage Section.
copy "editprod.cbl".
The procedure division starts in the familiar fashion:
PROCEDURE DIVISION.
Process-All-Products.
display "Updating and copying product records ..."
display spaces
open input Old-Products, output Products
perform Get-Next-Product
perform until Item-No of Old-Products = high-values
perform Process-One-Product
perform Get-Next-Product
end-perform
close Old-Products, Products
display "Update complete."
stop run.
Process-One-Product must copy the record in the input area to the output record area, modify it as
necessary, then write it to the new copy of the file. It also displays the modified record. Depending on
the Cobol run-time system, it may be important to display the record before writing it, because the
contents of the output area are undefined after the write statement.
Process-One-Product.
move Product of Old-Products to Product of Products
if Stock of Products + On-Order of Products
< Reorder-Level of Products
add Reorder-Qty of Products to On-Order of Products
end-if
perform Display-Product
write Product of Products
invalid key
display "Write error on above record."
stop run
end-write.
There are two ways an error could occur on writing a record: there is a logic error in the program, or
there is something wrong with the hardware. Either way, the wisest thing is to stop the program.
The rest is routine:
Get-Next-Product.
read Old-Products next record
at end
move high-values to Item-No of Old-Products
end-read.
Display-Product.
copy "dispprod.cbl".
The second approach updates the existing copy of the Products file in place, so only one product file needs to be declared:
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Products assign to "newprod.ndx"
organization is indexed,
record key is Item-No of Products
access is sequential.
DATA DIVISION.
File Section.
FD Products.
copy "product.cbl".
Working-Storage Section.
copy "editprod.cbl".
The procedure division has the usual form, except Products is opened in i-o (input-output) mode:
PROCEDURE DIVISION.
Process-All-Products.
display "Updating and copying product records ..."
display spaces
open i-o Products
perform Get-Next-Product
perform until Item-No of Products = high-values
perform Process-One-Product
perform Get-Next-Product
end-perform
close Products
display "Update complete."
stop run.
If a product should be placed on order, its record is updated, then the record is rewritten to the file,
using rewrite. (A write statement would attempt to create a new record.)
Process-One-Product.
if Stock of Products + On-Order of Products
< Reorder-Level of Products
add Reorder-Qty of Products to On-Order of Products
perform Display-Product
rewrite Product of Products
invalid key
display "Rewrite error on above record."
stop run
end-rewrite
end-if.
The rest is familiar territory:
Get-Next-Product.
read Products next record
at end
move high-values to Item-No of Products
end-read.
Display-Product.
copy "dispprod.cbl".
A very common application is one that updates a master file in response to a batch of recorded
transactions. This scheme was explained earlier in a run diagram, repeated above. For example, a
Delivery transaction (3) should increase the stock on hand of the product concerned, decrease the
number of items on order, update the valuation of the stock, and increase the balance owed to the
supplier. The product file (9) is updated first (4), producing a stock report (7). Then the supplier file
(10) is updated (6), and another report (8) is produced:
Blue Waters Nominees B007
1 Francis Road
Paramatta NSW 2150
Opening balance $120,459.56
Closing balance $120,459.56
The above report shows the financial activity for each supplier: There is no activity for account
B007, which is currently owed $120,459.56. Supplier N001 has supplied 50 products with Item-No
PLPDCD, costing a total of $14,950.00. This has increased the balance owing to them from $127.99
to $15,077.99. Supplier N002 has received a payment of $2,790.46, bringing their balance owing to
zero, then an overpayment of $50.00, resulting in a credit of $50.00.
In order to produce this report, the two programs must be connected by a transfer file. This file
contains two kinds of record:
01 Delivery.
02 Time-Stamp.
03 YY-MM-DD pic 9(6).
03 HH-MM-SS pic 9(6).
02 Kind pic x.
02 Account pic a999.
02 Item-No pic x(6).
02 Qty-Delivered pic 9999.
02 Cost pic 9(6)v99.
02 Description pic x(40).
01 Payment.
02 Time-Stamp-2.
03 YY-MM-DD-2 pic 9(6).
03 HH-MM-SS-2 pic 9(6).
02 Kind-2 pic x.
02 Account-2 pic a999.
02 Amount pic 9(6)v99.
The ‘Delivery’ record is similar to that in the Updates file, but it also includes ‘Description’. When
the product file is updated, the description of the product is copied into the transfer record, so that it
can appear on the supplier report. (This is called extending a transaction.) The ‘Payment’ record is
essentially the same as in the Updates file, except that it no longer needs a dummy Item-No.
Creating the transfer file is essentially a matter of joining the Products and Updates files. However,
the program must also update the product records and display a report. The report shows each delivery
transaction, followed by the updated product record:
Updating Products ...
and so on ...
PROCEDURE DIVISION.
Process-All-Products.
display "Updating Products ..."
display spaces
open input Updates, i-o Products, extend Transfers
perform Get-Next-Update
move Item-No of Updates to Current-Item-No
perform until Current-Item-No = high-values
perform Get-Product-Status
perform Process-One-Update
perform Record-Product-Status
perform Get-Next-Update
move Item-No of Updates to Current-Item-No
end-perform
close Updates, Products, Transfers
display "Update complete."
stop run.
The Updates file is read sequentially in the usual way:
Get-Next-Update.
read Updates next record,
at end
move high-values to Item-No of Updates
end-read.
Finding the initial status of the product record means moving ‘Current-Item-No’ to the record area,
then attempting to read the matching product. However, Payment transactions have low-valued item
numbers, so the program must avoid trying to read products for them:
Get-Product-Status.
if Current-Item-No not = Dummy-Item-No
move Current-Item-No to Item-No of Products
read Products record,
invalid key
move Product-Missing to Product-Status
not invalid key
move Product-Exists to Product-Status
end-read
else
move Product-Missing to Product-Status
end-if.
The processing of an update transaction depends on the kind of transaction and ‘Product-Status’.
There are three cases to consider: a delivery for a product that exists, a payment, and a delivery for a
non-existent product. In the first case, there are two potential error conditions that need to be checked.
Serv-U-Rite have deemed that they should be processed as follows. If the number of items delivered
exceeds the number on order, the delivery transaction is rejected. If the account of the supplier from
whom the delivery is received is different from the supplier account in the product record, then the
condition is flagged, but the delivery is accepted.
Process-One-Update.
evaluate Kind of Updates also Product-Status
when Delivery-Code also Product-Exists
perform Display-Update
if Qty-Delivered of Updates > On-Order of Products
display "More delivered than ordered. Ignored."
else
if Account of Updates not = Supplier of Products
display "Supplier is not the expected one."
end-if
add Qty-Delivered of Updates to Stock of Products
subtract Qty-Delivered of Updates
from On-Order of Products
add Cost of Updates to Valuation of Products
move Time-Stamp of Updates to Time-Stamp of Transfers
move Kind of Updates to Kind of Transfers
move Account of Updates to Account of Transfers
move Item-No of Updates to Item-No of Transfers
move Qty-Delivered of Updates to Qty-Delivered of Transfers
move Cost of Updates to Cost of Transfers
move Description of Products
to Description of Transfers
write Delivery of Transfers
end-if
perform Display-Product
when Payment-Code also any
move Time-Stamp of Updates to Time-Stamp of Transfers
move Kind of Updates to Kind of Transfers
move Account of Updates to Account of Transfers
move Amount of Updates to Amount of Transfers
write Payment of Transfers
when any also Product-Missing
perform Display-Update
display "Product update ignored. "
Current-Item-No " is not on file."
end-evaluate.
Apart from updating and displaying the product record in response to valid deliveries, it is also
necessary to write transfer records for deliveries and payments.
Finally, recording the updated product requires a rewrite—but only if the product exists:
Record-Product-Status.
if Product-Status = Product-Exists
rewrite Product of Products
invalid key
display "Error rewriting Product record."
perform Dump-and-Quit
end-rewrite
end-if.
Display-Update.
copy "dispupd8.cbl".
Display-Product.
copy "dispprod.cbl".
Cobol requires an invalid key clause to be specified here. Since there is no way it should ever be
invoked, it is all the more important to deal with it carefully:
Dump-and-Quit.
display "Current-Item-No: ", Current-Item-No,
", Product-Status: ", Product-Status
display "Product record: "
perform Display-Product
display "Update record: "
perform Display-Update
stop run.
Like the random access join, this program only reads product records that are updated. It is therefore
efficient if the fraction of updated records is low.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Old-Products assign to "oldprod.ndx"
organization is indexed,
record key is Item-No of Old-Products
access is sequential.
select Products assign to "newprod.ndx"
organization is indexed,
record key is Item-No of Products
access is sequential.
select Unsorted-Updates assign to "updates.seq"
organization is sequential.
select Updates assign to "work.tmp".
select optional Transfers assign to "transfer.seq"
organization is sequential.
‘Old-Products’ is the original version of the Products file, and ‘Products’ is the new updated version.
Working-Storage Section.
77 Current-Item-No pic x(6).
77 Product-Status pic x.
77 Product-Exists pic x value "Y".
77 Product-Missing pic x value "N".
copy "editupd8.cbl".
copy "editprod.cbl".
copy "constant.cbl".
The procedure division uses a sort output procedure:
PROCEDURE DIVISION.
Process-All-Products.
display "Sorting Updates ..."
sort Updates on ascending Item-No of Updates,
Time-Stamp of Updates
using Unsorted-Updates
output procedure Process-Sorted-Updates
stop run.
Since the sort operation will place all the low-valued records at the start of the sequence, the program
can deal with them before the main loop. Sorting also groups all the deliveries for the same product.
Therefore the program has two levels of loop. The sort key includes ‘Time-Stamp’. This means the
transactions for each group will be applied in the intended order.
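The effect of the composite sort key can be sketched outside Cobol. The following Python fragment (the transactions and field names are invented for illustration) shows how sorting on (item number, time stamp) floats the low-valued dummy records to the front, groups each product's transactions, and orders each group chronologically:

```python
# Hypothetical update transactions; the values and field names are invented.
updates = [
    {"item_no": "BT7725", "time_stamp": "020301120500", "kind": "D"},  # delivery
    {"item_no": "000000", "time_stamp": "020301090000", "kind": "P"},  # payment (dummy item no.)
    {"item_no": "BT7725", "time_stamp": "020301080000", "kind": "N"},  # new product
]

# Sort on the composite key (Item-No, Time-Stamp), as the Cobol sort does.
updates.sort(key=lambda u: (u["item_no"], u["time_stamp"]))

# The low-valued record comes first, and BT7725's transactions are grouped
# in time-stamp order, so the new product precedes its delivery.
print([u["kind"] for u in updates])  # → ['P', 'N', 'D']
```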
Process-Sorted-Updates.
display "Updating Products ..."
display spaces
open input Old-Products, output Products, extend Transfers
perform Get-Next-Update
perform until Item-No in Updates not = Dummy-Item-No
perform Process-One-Update
perform Get-Next-Update
end-perform
perform Get-Next-Product
perform Choose-Current-Item-No
perform until Current-Item-No = high-values
perform Get-Product-Status
perform until Item-No of Updates not = Current-Item-No
perform Process-One-Update
perform Get-Next-Update
end-perform
perform Record-Product-Status
if Item-No of Old-Products = Current-Item-No
perform Get-Next-Product
end-if
perform Choose-Current-Item-No
end-perform
close Old-Products, Products, Transfers
display "Update complete.".
The next few paragraphs are exactly the same as in the sort-merge join:
Choose-Current-Item-No.
if Item-No of Updates < Item-No of Old-Products
move Item-No of Updates to Current-Item-No
else
move Item-No of Old-Products to Current-Item-No
end-if.
Get-Next-Product.
read Old-Products next record,
at end
move high-values to Item-No of Old-Products
end-read.
Get-Next-Update.
return Updates record,
at end
move high-values to Item-No of Updates
end-return.
Get-Product-Status.
if Item-No of Old-Products = Current-Item-No,
move Product-Exists to Product-Status
move Product of Old-Products to Product of Products
else
move Product-Missing to Product-Status
end-if.
The text of ‘Process-One-Update’ is exactly the same as in the random access method just considered.
This is because we have been careful to separate the input-output logic from the business logic—a
really important idea. In a real-world application program, ‘Process-One-Update’ would deal with
many more cases than this example, whereas the input-output logic would be no more complex. The
input-output logic is generic, but ‘Process-One-Update’ contains the specific business rules. Keeping
these two concerns separate makes it possible to change the join method and updating mode without
affecting, or having to rewrite, any application-dependent code.
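To make the separation concrete, here is a sketch in Python (all names invented) of a generic copy mode merge-update driver. The merge and input-output logic is fixed, while every business rule lives in the function passed in, so the join method or updating mode can change without touching it:

```python
HIGH = "\uffff"  # stands in for Cobol's high-values

def merge_update(old_masters, updates, process_one_update):
    """Copy mode sort-merge update: generic merge and I/O logic only.
    Both inputs are lists of dicts sorted on "key"; all business rules
    live in the process_one_update callback (an illustrative sketch)."""
    new_masters = []
    mi = ui = 0

    def mkey():  # key of the current old master record, or high-values at end
        return old_masters[mi]["key"] if mi < len(old_masters) else HIGH

    def ukey():  # key of the current update record, or high-values at end
        return updates[ui]["key"] if ui < len(updates) else HIGH

    current = min(mkey(), ukey())
    while current != HIGH:
        # Like Get-Product-Status: copy the old record if one exists.
        exists = mkey() == current
        state = {"exists": exists,
                 "record": dict(old_masters[mi]) if exists else {"key": current}}
        while ukey() == current:            # apply this key's group of updates
            process_one_update(updates[ui], state)
            ui += 1
        if state["exists"]:                 # like Record-Product-Status
            new_masters.append(state["record"])
        if mkey() == current:
            mi += 1
        current = min(mkey(), ukey())
    return new_masters
```

A toy business rule, say one that adds a delivered quantity to an existing record, can then be plugged in without the driver knowing anything about products or suppliers.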
Finally, the product record, if any, needs to be written to the Products file:
Record-Product-Status.
if Product-Status = Product-Exists
write Product of Products
invalid key
display "Error writing Product record."
perform Dump-and-Quit
end-write
end-if.
Display-Update.
copy "dispupd8.cbl".
Display-Product.
copy "dispprod.cbl".
Dump-and-Quit.
display "Current-Item-No: ", Current-Item-No,
", Product-Status: ", Product-Status,
display "Product record: "
perform Display-Product
display "Update record: "
perform Display-Update
stop run.
Besides deliveries and payments, the Transfers file also records the opening and closing of supplier accounts, using two further kinds of record:
01 Open-Account.
02 Time-Stamp-2.
03 YY-MM-DD-2 pic 9(6).
03 HH-MM-SS-2 pic 9(6).
02 Kind-2 pic x.
02 Account-2 pic a999.
02 Address.
03 Name pic x(30).
03 Street pic x(30).
03 Suburb pic x(30).
01 Close-Account.
02 Time-Stamp-2.
03 YY-MM-DD-2 pic 9(6).
03 HH-MM-SS-2 pic 9(6).
02 Kind-2 pic x.
02 Account-2 pic a999.
Because the Transfers file has fewer kinds of records than the Updates file, and is therefore simpler,
we shall begin by using it to update the Suppliers file. As in the previous example, we use the sort-
merge join and copy mode updating. The program produces a report of each supplier’s business
activity, as shown earlier.
IDENTIFICATION DIVISION.
Program-ID. sminsdel.
* Update the Suppliers from the Transfers file using sort-merge.
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select optional Old-Suppliers assign to "oldsupp.ndx"
organization is indexed,
record key is Account of Old-Suppliers
access is sequential.
select Suppliers assign to "newsupp.ndx"
organization is indexed,
record key is Account of Suppliers
access is sequential.
select Unsorted-Transfers assign to "transfer.seq"
organization is sequential.
select optional Transfers assign to "work.tmp".
The working-storage section of the data division includes a description of how payments and
deliveries are set out in the report. This approach makes it easy to space its columns correctly.
DATA DIVISION.
File Section.
FD Unsorted-Transfers.
copy "transfer.cbl".
SD Transfers.
copy "transfer.cbl".
FD Old-Suppliers.
copy "supplier.cbl".
FD Suppliers.
copy "supplier.cbl".
Working-Storage Section.
copy "constant.cbl"
01 Detail-Line.
02 YY-MM-DD pic 99/99/99.
02 pic x value space.
02 Qty-Delivered pic z,zz9.
02 pic x value space.
02 Item-No pic x(6).
02 pic x value space.
02 Description pic x(40).
02 pic x value space.
02 Debit pic $$$$,$$$.$$.
02 pic x value space.
02 Credit pic $$$$,$$$.$$.
02 pic x value space.
02 Balance pic $$$$,$$9.99bCR.
77 Current-Account pic a999.
77 Supplier-Status pic x.
77 Supplier-Exists pic x value "Y".
77 Supplier-Missing pic x value "N".
copy "editxfr.cbl".
The procedure division uses a sort output procedure, as before:
PROCEDURE DIVISION.
Process-All-Suppliers.
display "Sorting Transfers ..."
sort Transfers on ascending Account of Transfers,
ascending Time-Stamp of Transfers,
using Unsorted-Transfers
output procedure Process-Sorted-Transfers
stop run.
Because of the sort operation, the transfers are ordered by ‘Time-Stamp’ within ‘Account’. This
suggests the output procedure should contain three levels of loop. However, since each transaction
should have a unique time stamp, the file can only be grouped by ‘Account’. A ‘Time-Stamp’ group
can only contain one record. Therefore only two levels of loop are needed.
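The same point can be made with a short Python sketch (field names invented): grouping the sorted transfers by account alone gives the two-level loop structure, because grouping by time stamp as well would only ever produce one-record groups:

```python
from itertools import groupby

# Transfers before sorting on (Account, Time-Stamp); names are invented.
transfers = [
    {"account": "B002", "time_stamp": 3, "kind": "P"},
    {"account": "A001", "time_stamp": 2, "kind": "D"},
    {"account": "A001", "time_stamp": 1, "kind": "O"},
]
transfers.sort(key=lambda t: (t["account"], t["time_stamp"]))

# Grouping by account alone gives one inner loop per account; grouping by
# (account, time_stamp) as well would only yield one-record groups.
for account, group in groupby(transfers, key=lambda t: t["account"]):
    print(account, [t["kind"] for t in group])
```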
Process-Sorted-Transfers.
display "Updating Suppliers ..."
display spaces
open input Old-Suppliers, output Suppliers
perform Get-Next-Transfer
perform Get-Next-Supplier
perform Choose-Current-Account
perform until Current-Account = high-values
perform Get-Supplier-Status
perform Start-One-Account
perform until Account of Transfers not = Current-Account
perform Process-One-Transfer
perform Get-Next-Transfer
end-perform
perform End-One-Account
perform Record-Supplier-Status
if Account of Old-Suppliers = Current-Account
perform Get-Next-Supplier
end-if
perform Choose-Current-Account
end-perform
close Old-Suppliers, Suppliers
display "Update complete.".
Since this program has some reporting to do at the start and end of each group, ‘Start-One-Account’
and ‘End-One-Account’ are written as separate procedures, to keep the main output procedure as
clean as possible, and separate it from the business logic.
The next few paragraphs are routine:
Choose-Current-Account.
if Account of Transfers < Account of Old-Suppliers
move Account of Transfers to Current-Account
else
move Account of Old-Suppliers to Current-Account
end-if.
Get-Next-Supplier.
read Old-Suppliers next record,
at end
move high-values to Account of Old-Suppliers
end-read.
Get-Next-Transfer.
return Transfers record,
at end
move high-values to Account of Transfers
end-return.
Get-Supplier-Status.
if Account of Old-Suppliers = Current-Account,
move Supplier-Exists to Supplier-Status
move Supplier of Old-Suppliers to Supplier of Suppliers
else
move Supplier-Missing to Supplier-Status
end-if.
Before considering ‘Process-One-Transfer’, let’s look at ‘Record-Supplier-Status’. It writes a record
only if ‘Supplier-Status’ equals ‘Supplier-Exists’:
Record-Supplier-Status.
if Supplier-Status = Supplier-Exists
write Supplier of Suppliers
end-if.
Processing Delivery and Payment transfer records is simple enough. However, opening a new
supplier account requires that there should not be an existing supplier record for the account, i.e.,
‘Supplier-Status’ should equal ‘Supplier-Missing’. Now here’s the trick. The procedure initialises the
output record area, then sets ‘Supplier-Status’ to ‘Supplier-Exists’. First, this allows the record area to
be updated by payments and deliveries. Second, it will force ‘Record-Supplier-Status’ to write the
supplier record to the new Suppliers file, as soon as the group of updates finishes.
The complementary trick is that to close an account it is merely necessary to set ‘Supplier-Status’ to
‘Supplier-Missing’. Then ‘Record-Supplier-Status’ will fail to write it to the new file. Note that, in
one batch of transactions, it is even safe to open, update and close the same account several times.
Sorting on ‘Time-Stamp’ within ‘Account’ ensures that everything will be done in the right order.
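Stripped of the Cobol detail, the trick amounts to a small state machine. The Python sketch below (names invented; only delivery updates shown, and closing requires a zero balance, as in the program) keeps the status flag and record area for one account:

```python
# A sketch of the status-flag trick. Opening an account sets the flag, so
# later transfers apply and the record gets written; closing clears it, so
# the record is silently dropped.
EXISTS, MISSING = "Y", "N"

def apply_transfers(transfers):
    """Apply one account's transfer group (already in time-stamp order)."""
    status, record = MISSING, {}
    for t in transfers:
        if t["kind"] == "open" and status == MISSING:
            record = {"balance": 0}
            status = EXISTS        # later updates now apply to the record area
        elif t["kind"] == "delivery" and status == EXISTS:
            record["balance"] += t["cost"]
        elif t["kind"] == "close" and status == EXISTS and record["balance"] == 0:
            status = MISSING       # the record will simply not be written
    # Like Record-Supplier-Status: write only if the flag says "exists".
    return record if status == EXISTS else None

print(apply_transfers([{"kind": "open"}, {"kind": "delivery", "cost": 10}]))
```

Opening, updating and closing the same account several times in one batch works because the flag simply toggles; only the final state decides whether a record is written.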
Process-One-Transfer.
evaluate Kind of Transfers also Supplier-Status
when Delivery-Code also Supplier-Exists
add Cost of Transfers to Balance of Suppliers
perform Debit-Detail
when Payment-Code also Supplier-Exists
subtract Amount of Transfers from Balance of Suppliers
perform Credit-Detail
when Open-Account-Code also Supplier-Missing
move Account of Transfers to Account of Suppliers
move Address of Transfers to Address of Suppliers
move zero to Balance of Suppliers
move Supplier-Exists to Supplier-Status
display "Account opened."
perform Start-One-Account
when Close-Account-Code also Supplier-Exists
if Balance of Suppliers not = zero
display "Can't close account. Balance is non-zero."
else
move Supplier-Missing to Supplier-Status
perform End-One-Account
display "Account Closed."
end-if
when Open-Account-Code also Supplier-Exists
display "Can't open account. It already exists."
perform Display-Transfer
when Close-Account-Code also Supplier-Missing
display "Can't close account ", Current-Account,
". It doesn't exist."
perform Display-Transfer
when any also Supplier-Missing
display "Supplier Update ignored. Account not on file."
perform Display-Transfer
end-evaluate.
Display-Transfer.
copy "dispxfr.cbl".
The evaluate statement checks both the kind of transaction and ‘Supplier-Status’. After dealing with
the valid cases, it deals with attempts to open an account that already exists, close one that doesn’t, or
to update an account that isn’t open.
Finally, procedures are needed to report the changes that occur to each supplier:
Start-One-Account.
if Supplier-Status = Supplier-Exists
display Name of Suppliers space Current-Account
display Street of Suppliers
display Suburb of Suppliers
move spaces to Detail-Line
move "Opening balance" to Description of Detail-Line
move Balance of Suppliers to Balance of Detail-Line
display Detail-Line
else
display Current-Account " is closed."
end-if.
Debit-Detail.
move spaces to Detail-Line
move YY-MM-DD of Transfers to YY-MM-DD of Detail-Line
move Item-No of Transfers to Item-No of Detail-Line
move Qty-Delivered of Transfers
to Qty-Delivered of Detail-Line
move Description of Transfers to Description of Detail-Line
move Cost of Transfers to Debit of Detail-Line
move Balance of Suppliers to Balance of Detail-Line
display Detail-Line.
Credit-Detail.
move spaces to Detail-Line
move YY-MM-DD of Transfers to YY-MM-DD of Detail-Line
move "Payment" to Description of Detail-Line
move Amount of Transfers to Credit of Detail-Line
move Balance of Suppliers to Balance of Detail-Line
display Detail-Line.
End-One-Account.
if Supplier-Status = Supplier-Exists
move spaces to Detail-Line
move "Closing balance" to Description of Detail-Line
move Balance of Suppliers to Balance of Detail-Line
display Detail-Line
display spaces
else
display Current-Account " is closed."
end-if.
This program reports each supplier, even one that has no transactions. Serv-U-Rite consider this a
feature: an inactive supplier is an unusual situation, and it is more noticeable when the supplier’s record is still reported.
Returning to the Products file, the next program updates it in place, using skip-sequential access:
ENVIRONMENT DIVISION.
Input-Output Section.
File-Control.
select Products assign to "newprod.ndx"
organization is indexed,
record key is Item-No of Products
access is random.
select Unsorted-Updates assign to "updates.seq"
organization is sequential.
select Updates assign to "work.tmp".
select optional Transfers assign to "transfer.seq"
organization is sequential.
The data division introduces a new variable, ‘Original-Status’, whose use will be explained later:
DATA DIVISION.
File Section.
FD Unsorted-Updates.
copy "update.cbl".
SD Updates.
copy "update.cbl".
FD Products.
copy "product.cbl".
FD Transfers.
copy "transfer.cbl".
Working-Storage Section.
77 Current-Item-No pic x(6).
77 Product-Status pic x.
77 Original-Status pic x.
77 Product-Exists pic x value "Y".
77 Product-Missing pic x value "N".
copy "constant.cbl".
copy "editupd8.cbl".
copy "editprod.cbl".
The procedure division uses a sort output procedure. The Products file is opened in i-o (input-
output) mode. Because transactions that only affect the Suppliers file have low-valued item-numbers,
they are processed in a separate loop. The status of each product record is displayed before each group
of transactions, then the final updated status is displayed after the group:
PROCEDURE DIVISION.
Process-All-Products.
display "Sorting Updates ..."
sort Updates on ascending Item-No of Updates,
Time-Stamp of Updates,
using Unsorted-Updates
output procedure Process-Sorted-Updates
stop run.
Process-Sorted-Updates.
display "Updating Products ..."
display spaces
open i-o Products, extend Transfers
perform Get-Next-Update
perform until Item-No in Updates not = Dummy-Item-No
perform Process-One-Update
perform Get-Next-Update
end-perform
move Item-No of Updates to Current-Item-No
perform until Current-Item-No = high-values
perform Get-Product-Status
perform Display-Product-Status
perform until Item-No of Updates not = Current-Item-No
perform Process-One-Update
perform Get-Next-Update
end-perform
perform Display-Product-Status
perform Record-Product-Status
move Item-No of Updates to Current-Item-No
end-perform
close Products, Transfers
display "Update complete.".
The next few paragraphs are routine, except that ‘Product-Status’ is copied to ‘Original-Status’.
Get-Next-Update.
return Updates record,
at end
move high-values to Item-No of Updates
end-return.
Get-Product-Status.
move Current-Item-No to Item-No of Products
read Products record
invalid key
move Product-Missing to Product-Status
not invalid
move Product-Exists to Product-Status
end-read
move Product-Status to Original-Status.
‘Process-One-Update’ repeats the same trick of altering ‘Product-Status’ to insert new product records
or to delete old ones. Otherwise, it is long but straightforward:
Process-One-Update.
perform Display-Update
evaluate Kind of Updates also Product-Status
when Delivery-Code also Product-Exists
if Qty-Delivered of Updates > On-Order of Products
display "Item: ", Current-Item-No,
", delivery exceeds No. on order. Ignored."
perform Display-Update
else
if Account of Updates not = Supplier of Products
display "Supplier is not the expected one."
end-if
add Qty-Delivered of Updates to Stock of Products
subtract Qty-Delivered of Updates
from On-Order of Products
add Cost of Updates to Valuation of Products
move Time-Stamp of Updates to Time-Stamp of Transfers
move Kind of Updates to Kind of Transfers
move Account of Updates to Account of Transfers
move Cost of Updates to Cost of Transfers
write Delivery of Transfers
end-if
when Payment-Code also any
move Time-Stamp of Updates to Time-Stamp of Transfers
move Kind of Updates to Kind of Transfers
move Account of Updates to Account of Transfers
move Amount of Updates to Amount of Transfers
write Payment of Transfers
when New-Product-Code also Product-Missing
move Item-No of Updates to Item-No of Products
move Account of Updates to Supplier of Products
move Description of Updates to Description of Products
move Reorder-Level of Updates
to Reorder-Level of Products
move Reorder-Qty of Updates to Reorder-Qty of Products
move Price of Updates to Price of Products
move zeros to Stock of Products, On-Order of Products
move Product-Exists to Product-Status
when Withdraw-Product-Code also Product-Exists
evaluate true
when Stock of Products not = zero
display "Item: ", Current-Item-No,
" is still in stock. Ignored."
perform Display-Update
when On-Order of Products not = zero
display "Item: ", Current-Item-No,
" is currently on order. Ignored."
perform Display-Update
when other
move Product-Missing to Product-Status
end-evaluate
when Open-Account-Code also any
move Time-Stamp of Updates to Time-Stamp of Transfers
move Kind of Updates to Kind of Transfers
move Account of Updates to Account of Transfers
move Address of Updates to Address of Transfers
write Open-Account of Transfers
when Close-Account-Code also any
move Time-Stamp of Updates to Time-Stamp of Transfers
move Kind of Updates to Kind of Transfers
move Account of Updates to Account of Transfers
write Close-Account of Transfers
when New-Product-Code also Product-Exists
display "New product ignored. Item-No already on file."
perform Display-Update
when Withdraw-Product-Code also Product-Missing
display "Withdraw product ignored. Item-No not on file."
perform Display-Update
when any also Product-Missing
display "Product update ignored. Item-No not on file."
end-evaluate.
Unlike the copy mode update, which just needs a write statement, ‘Record-Product-Status’ needs a
write statement to insert a new record, a rewrite statement to update an existing record, and a delete
statement to remove an existing record. The choice must be made by comparing the original state of the
product with its updated state. If ‘Product-Status’ changes from ‘Product-Missing’ to ‘Product-Exists’,
a new record must be written. If it changes from ‘Product-Exists’ to ‘Product-Missing’, the record
must be deleted. If it remains as ‘Product-Exists’, the record must be rewritten. If it remains as
‘Product-Missing’, nothing needs to be done. If the program had not saved the original value of
‘Product-Status’ in ‘Original-Status’, it wouldn’t know which action to take.
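The four cases form a simple decision table. Here is a Python sketch, using the same status codes as the program (the action names are invented labels for the three Cobol verbs):

```python
# The before/after pair of status flags selects the i-o verb.
EXISTS, MISSING = "Y", "N"

ACTIONS = {
    (EXISTS,  EXISTS):  "rewrite",  # existed and still does: update in place
    (EXISTS,  MISSING): "delete",   # existed but was withdrawn
    (MISSING, EXISTS):  "write",    # created by the update group
    (MISSING, MISSING): None,       # never existed: nothing to do
}

def record_action(original_status, product_status):
    return ACTIONS[(original_status, product_status)]

print(record_action(MISSING, EXISTS))  # → write
```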
Record-Product-Status.
evaluate Original-Status also Product-Status
when Product-Exists also Product-Exists
rewrite Product of Products
invalid key
display "Error rewriting Products record."
perform Dump-and-Quit
end-rewrite
when Product-Exists also Product-Missing
delete Products record
invalid key
display "Error deleting Products record."
perform Dump-and-Quit
end-delete
when Product-Missing also Product-Exists
write Product of Products
invalid key
display "Error writing Products record."
perform Dump-and-Quit
end-write
end-evaluate.
Like read, a delete statement must specify a file, not a record.
The invalid key clauses are required by Cobol. There seems to be no way they can be triggered here.
Therefore, just in case, the program dumps all the useful information it can, then stops.
Dump-and-Quit.
display "Current-Item-No: ", Current-Item-No,
", Product-Status: ", Product-Status,
", Original-Status: ”, Original-Status
display "Product record: "
perform Display-Product
display "Update record: "
perform Display-Update
stop run.
The rest is routine:
Display-Product-Status.
if Product-Status = Product-Exists
perform Display-Product
else
display Current-Item-No " is not on file."
display spaces
end-if.
Display-Update.
copy "dispupd8.cbl".
Display-Product.
copy "dispprod.cbl".
It is equally possible to update a master file in random access mode. There are no new ideas
involved. The program is similar to the skip-sequential update, except that the sort operation could be
omitted. However, the same product might then be displayed several times, and care would have to be
taken when dealing with low-valued item numbers. But ‘Process-One-Update’ would remain exactly
the same, as indeed it would even if it were transplanted to a copy mode update program.