paths that behave similarly to fuses. In the original state of the device, all the fuses are
intact. Programming the device involves blowing those fuses along the paths that must
be removed in order to obtain the particular configuration of the desired logic function.
In this chapter, we introduce the configuration of PLDs and indicate procedures for their
use in the design of digital systems. We also present CMOS FPGAs, which are configured
by downloading a stream of bits into the device to configure transmission gates to estab-
lish the internal connectivity required by a specified logic function (combinational or
sequential).
A typical PLD may have hundreds to millions of gates interconnected through hun-
dreds to thousands of internal paths. In order to show the internal logic diagram of such
a device in a concise form, it is necessary to employ a special gate symbology applicable
to array logic. Figure 7.1 shows the conventional and array logic symbols for a multiple‐
input OR gate. Instead of having multiple input lines into the gate, we draw a single line
entering the gate. The input lines are drawn perpendicular to this single line and are
connected to the gate through internal fuses. In a similar fashion, we can draw the array
logic for an AND gate. This type of graphical representation for the inputs of gates will
be used throughout the chapter in array logic diagrams.
FIGURE 7.2
Block diagram of a memory unit (k address lines select one of 2^k words of n bits each; Read and Write control inputs)
Communication between memory and its environment is achieved through data input
and output lines, address selection lines, and control lines that specify the direction of
transfer. A block diagram of a memory unit is shown in Fig. 7.2. The n data input lines
provide the information to be stored in memory, and the n data output lines supply the
information coming out of memory. The k address lines specify the particular word
chosen among the many available. The two control inputs specify the direction of trans-
fer desired: The Write input causes binary data to be transferred into the memory, and
the Read input causes binary data to be transferred out of memory.
The memory unit is specified by the number of words it contains and the number of
bits in each word. The address lines select one particular word. Each word in memory
is assigned an identification number, called an address, starting from 0 up to 2^k - 1,
where k is the number of address lines. The selection of a specific word inside memory
is done by applying the k‐bit address to the address lines. An internal decoder accepts
this address and opens the paths needed to select the word specified. Memories vary
greatly in size and may range from 1,024 words, requiring an address of 10 bits, to 2^32
words, requiring 32 address bits. It is customary to refer to the number of words (or
bytes) in memory with one of the letters K (kilo), M (mega), and G (giga). K is equal to
2^10, M is equal to 2^20, and G is equal to 2^30. Thus, 64K = 2^16, 2M = 2^21, and 4G = 2^32.
Consider, for example, a memory unit with a capacity of 1K words of 16 bits each.
Since 1K = 1,024 = 2^10 and 16 bits constitute two bytes, we can say that the memory
can accommodate 2,048 = 2K bytes. Figure 7.3 shows possible contents of the first
three and the last three words of this memory. Each word contains 16 bits that can be
divided into two bytes. The words are recognized by their decimal address from 0 to
1,023. The equivalent binary address consists of 10 bits. The first address is specified with
ten 0’s; the last address is specified with ten 1’s, because 1,023 in binary is equal to
1111111111. A word in memory is selected by its binary address. When a word is read or
written, the memory operates on all 16 bits as a single unit.
The 1K * 16 memory of Fig. 7.3 has 10 bits in the address and 16 bits in each word.
As another example, a 64K * 10 memory will have 16 bits in the address (since
64K = 2^16) and each word will consist of 10 bits. The number of address bits needed in
Memory address (binary / decimal)    Memory content
0000000000 / 0                       1011010101011101
0000000001 / 1                       1010101110001001
0000000010 / 2                       0000110101000110
        .                                    .
        .                                    .
FIGURE 7.3
Contents of a 1024 * 16 memory
a memory is dependent on the total number of words that can be stored in the memory
and is independent of the number of bits in each word. The number of bits in the address
is determined from the relationship 2^k >= m, where m is the total number of words and
k is the number of address bits needed to satisfy the relationship.
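As a quick check of this relationship, the following Python sketch (illustrative only; the helper name address_bits is ours) computes the smallest k satisfying 2^k >= m:

    def address_bits(m: int) -> int:
        """Smallest k such that 2**k >= m, where m is the total number of words."""
        return (m - 1).bit_length() if m > 1 else 0

    # Examples from the text: 1,024 words need a 10-bit address; 64K words need 16 bits.
    print(address_bits(1024))       # 10
    print(address_bits(64 * 1024))  # 16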
Table 7.1
Control Inputs to Memory Chip

Memory Enable    Read/Write    Memory Operation
0                X             None
1                0             Write to selected word
1                1             Read from selected word
The memory unit will then take the bits from the word that has been selected by the
address and apply them to the output data lines. The contents of the selected word do
not change after the read operation; i.e., the read operation is nondestructive.
Commercial memory components available in integrated‐circuit chips sometimes
provide the two control inputs for reading and writing in a somewhat different configu-
ration. Instead of having separate read and write inputs to control the two operations,
most integrated circuits provide two other control inputs: One input selects the unit and
the other determines the operation. The memory operations that result from these
control inputs are specified in Table 7.1.
The memory enable (sometimes called the chip select) is used to enable the particu-
lar memory chip in a multichip implementation of a large memory. When the memory
enable is inactive, the memory chip is not selected and no operation is performed. When
the memory enable input is active, the read/write input determines the operation to be
performed.
Execution of this statement causes a transfer of four bits from the selected memory word
specified by Address onto the DataOut lines. If ReadWrite is 0, the memory performs a
write operation symbolized by the statement
Mem[Address] ← DataIn;
Execution of this statement causes a transfer from the four‐bit DataIn lines into the
memory word selected by Address. When Enable is equal to 0, the memory is disabled
and the outputs are assumed to be in a high‐impedance state, indicated by the symbol z.
Thus, the memory has three‐state outputs.
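The behavior just described can be sketched as a rough Python model (an illustration only; the Enable, ReadWrite, Address, DataIn, and DataOut names follow the signals used above, and the four-bit word size is taken from the example):

    class BehavioralRAM:
        """Rough model: ReadWrite = 1 reads, ReadWrite = 0 writes,
        Enable = 0 leaves the outputs in the high-impedance state 'z'."""

        def __init__(self, words=16, bits=4):
            self.bits = bits
            self.mem = [0] * words

        def access(self, enable, read_write, address, data_in=0):
            if enable == 0:
                return "z" * self.bits                                # three-state outputs disabled
            if read_write == 1:
                return format(self.mem[address], f"0{self.bits}b")    # DataOut <- Mem[Address]
            self.mem[address] = data_in & ((1 << self.bits) - 1)      # Mem[Address] <- DataIn
            return "z" * self.bits                                    # modeling choice: no read data during a write

    ram = BehavioralRAM()
    ram.access(1, 0, 3, 0b1010)   # write 1010 into word 3
    print(ram.access(1, 1, 3))    # '1010'
    print(ram.access(0, 1, 3))    # 'zzzz'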
Timing Waveforms
The operation of the memory unit is controlled by an external device such as a central
processing unit (CPU). The CPU is usually synchronized by its own clock. The memory,
however, does not employ an internal clock. Instead, its read and write operations are
specified by control inputs. The access time of memory is the time required to select a
word and read it. The cycle time of memory is the time required to complete a write
operation. The CPU must provide the memory control signals in such a way as to syn-
chronize its internal clocked operations with the read and write operations of memory.
This means that the access time and cycle time of the memory must be within a time
equal to a fixed number of CPU clock cycles.
Suppose as an example that a CPU operates with a clock frequency of 50 MHz, giv-
ing a period of 20 ns for one clock cycle. Suppose also that the CPU communicates with
a memory whose access time and cycle time do not exceed 50 ns. This means that the
FIGURE 7.4
Memory cycle timing waveforms: (a) write cycle and (b) read cycle, each spanning three 20-ns clock periods T1, T2, T3 of the 50-ns memory cycle; the waveforms show the clock, memory address (address valid), memory enable, read/write, and data (data valid) signals.
write cycle terminates the storage of the selected word within a 50‐ns interval and that
the read cycle provides the output data of the selected word within 50 ns or less. (The
two numbers are not always the same.) Since the period of the CPU cycle is 20 ns, it will
be necessary to devote at least two‐and‐a‐half, and possibly three, clock cycles for each
memory request.
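A short calculation (illustrative only) makes the arithmetic behind this explicit:

    import math

    clock_period_ns = 1e9 / 50e6      # 50-MHz CPU clock -> 20 ns per cycle
    memory_cycle_ns = 50              # worst-case memory access/cycle time

    cycles_needed = memory_cycle_ns / clock_period_ns   # 2.5 clock cycles
    cycles_allotted = math.ceil(cycles_needed)          # rounded up to 3 whole cycles

    print(clock_period_ns, cycles_needed, cycles_allotted)   # 20.0 2.5 3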
The memory timing shown in Fig. 7.4 is for a CPU with a 50‐MHz clock and a memory
with 50 ns maximum cycle time. The write cycle in part (a) shows three 20‐ns cycles: T1,
T2, and T3. For a write operation, the CPU must provide the address and input data to
the memory. This is done at the beginning of T1. (The two lines that cross each other in
the address and data waveforms designate a possible change in value of the multiple
lines.) The memory enable and the read/write signals must be activated after the signals
in the address lines are stable in order to avoid destroying data in other memory words.
The memory enable signal switches to the high level and the read/write signal switches
to the low level to indicate a write operation. The two control signals must stay active
for at least 50 ns. The address and data signals must remain stable for a short time after
the control signals are deactivated. At the completion of the third clock cycle, the mem-
ory write operation is completed and the CPU can access the memory again with the
next T1 cycle.
The read cycle shown in Fig. 7.4(b) has an address for the memory provided by the
CPU. The memory‐enable and read/write signals must be in their high level for a read
operation. The memory places the data of the word selected by the address into the out-
put data lines within a 50‐ns interval (or less) from the time that the memory enable is
activated. The CPU can transfer the data into one of its internal registers during the
negative transition of T3. The next T1 cycle is available for another memory request.
Types of Memories
The mode of access of a memory system is determined by the type of components used.
In a random‐access memory, the word locations may be thought of as being separated
in space, each word occupying one particular location. In a sequential‐access memory,
the information stored in some medium is not immediately accessible, but is available
only at certain intervals of time. A magnetic disk or tape unit is of this type. Each
memory location passes the read and write heads in turn, but information is read out
only when the requested word has been reached. In a random‐access memory, the access
time is always the same regardless of the particular location of the word. In a sequential‐
access memory, the time it takes to access a word depends on the position of the word
with respect to the position of the read head; therefore, the access time is variable.
Integrated circuit RAM units are available in two operating modes: static and
dynamic. Static RAM (SRAM) consists essentially of internal latches that store the
binary information. The stored information remains valid as long as power is applied to
the unit. Dynamic RAM (DRAM) stores the binary information in the form of electric
charges on capacitors provided inside the chip by MOS transistors. The stored charge
on the capacitors tends to discharge with time, and the capacitors must be periodically
recharged by refreshing the dynamic memory. Refreshing is done by cycling through the
words every few milliseconds to restore the decaying charge. DRAM offers reduced
power consumption and larger storage capacity in a single memory chip. SRAM is
easier to use and has shorter read and write cycles.
Memory units that lose stored information when power is turned off are said to be
volatile. CMOS integrated circuit RAMs, both static and dynamic, are of this category, since
the binary cells need external power to maintain the stored information. In contrast, a
nonvolatile memory, such as magnetic disk, retains its stored information after the removal
of power. This type of memory is able to retain information because the data stored on
magnetic components are represented by the direction of magnetization, which is retained
after power is turned off. ROM is another nonvolatile memory. A nonvolatile memory
enables digital computers to store programs that will be needed again after the computer
is turned on. Programs and data that cannot be altered are stored in ROM, while other
large programs are maintained on magnetic disks. The latter programs are transferred into
the computer RAM as needed. Before the power is turned off, the binary information from
the computer RAM is transferred to the disk so that the information will be retained.
Internal Construction
The internal construction of a RAM of m words and n bits per word consists of m * n
binary storage cells and associated decoding circuits for selecting individual words. The
binary storage cell is the basic building block of a memory unit. The equivalent logic of
a binary cell that stores one bit of information is shown in Fig. 7.5. The storage part of
the cell is modeled by an SR latch with associated gates to form a D latch. Actually, the
FIGURE 7.5
Memory cell: (a) logic diagram (an SR latch with Select and Read/Write inputs) and (b) block diagram
cell is an electronic circuit with four to six transistors. Nevertheless, it is possible and
convenient to model it in terms of logic symbols. A binary storage cell must be very small
in order to be able to pack as many cells as possible in the small area available in the
integrated circuit chip. The binary cell stores one bit in its internal latch. The select input
enables the cell for reading or writing, and the read/write input determines the operation
of the cell when it is selected. A 1 in the read/write input provides the read operation by
forming a path from the latch to the output terminal. A 0 in the read/write input provides
the write operation by forming a path from the input terminal to the latch.
The logical construction of a small RAM is shown in Fig. 7.6. This RAM consists of
four words of four bits each and has a total of 16 binary cells. The small blocks labeled
BC represent the binary cell with its three inputs and one output, as specified in
Fig. 7.5(b). A memory with four words needs two address lines. The two address inputs
go through a 2 * 4 decoder to select one of the four words. The decoder is enabled with
FIGURE 7.6
Diagram of a 4 * 4 RAM (the address inputs drive a 2 * 4 decoder with enable EN; each of the four word lines, Word 0 through Word 3, selects a row of four binary cells BC, and the cells share the input data, Read/Write, memory enable, and output data lines)
the memory‐enable input. When the memory enable is 0, all outputs of the decoder are
0 and none of the memory words are selected. With the memory select at 1, one of the
four words is selected, dictated by the value in the two address lines. Once a word has
been selected, the read/write input determines the operation. During the read opera-
tion, the four bits of the selected word go through OR gates to the output terminals.
(Note that the OR gates are drawn according to the array logic established in Fig. 7.1.)
During the write operation, the data available in the input lines are transferred into the
four binary cells of the selected word. The binary cells that are not selected are disabled,
and their previous binary values remain unchanged. When the memory select input that
goes into the decoder is equal to 0, none of the words are selected and the contents of
all cells remain unchanged regardless of the value of the read/write input.
Commercial RAMs may have a capacity of thousands of words, and each word may
range from 1 to 64 bits. The logical construction of a large‐capacity memory would be a
direct extension of the configuration shown here. A memory with 2^k words of n bits per
word requires k address lines that go into a k * 2^k decoder. Each one of the decoder
outputs selects one word of n bits for reading or writing.
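The decoder-based word selection described above can be sketched in Python for the 4 * 4 RAM of Fig. 7.6 (a simplified illustration; for brevity the disabled and write cases simply return all 0s rather than modeling three-state outputs):

    def decode(address, k, enable):
        """k * 2**k decoder with enable: returns a one-hot list of word-select lines."""
        lines = [0] * (2 ** k)
        if enable:
            lines[address] = 1
        return lines

    memory = [[0, 0, 0, 0] for _ in range(4)]   # four words of four bits each

    def ram_cycle(enable, read_write, address, data_in=(0, 0, 0, 0)):
        select = decode(address, 2, enable)     # 2 * 4 decoder selects one word line
        output = [0, 0, 0, 0]
        for word, sel in enumerate(select):
            if sel and read_write == 1:         # read: selected word gated to the outputs
                output = memory[word][:]
            elif sel and read_write == 0:       # write: input data stored in the selected word
                memory[word] = list(data_in)
        return output

    ram_cycle(1, 0, 2, (1, 0, 1, 1))   # write 1011 into word 2
    print(ram_cycle(1, 1, 2))          # [1, 0, 1, 1]
    print(ram_cycle(0, 1, 2))          # [0, 0, 0, 0]  (decoder disabled, no word selected)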
Coincident Decoding
A decoder with k inputs and 2^k outputs requires 2^k AND gates with k inputs per gate.
The total number of gates and the number of inputs per gate can be reduced by
employing two decoders in a two‐dimensional selection scheme. The basic idea in
two‐dimensional decoding is to arrange the memory cells in an array that is as close as
possible to square. In this configuration, two k/2‐input decoders are used instead of
one k‐input decoder. One decoder performs the row selection and the other the col-
umn selection in a two‐dimensional matrix configuration.
The two‐dimensional selection pattern is demonstrated in Fig. 7.7 for a 1K‐word
memory. Instead of using a single 10 * 1,024 decoder, we use two 5 * 32 decoders.
With the single decoder, we would need 1,024 AND gates with 10 inputs in each. In the
two‐decoder case, we need 64 AND gates with 5 inputs in each. The five most significant
bits of the address go to input X and the five least significant bits go to input Y. Each
word within the memory array is selected by the coincidence of one X line and one Y
line. Thus, each word in memory is selected by the coincidence between 1 of 32 rows and
1 of 32 columns, for a total of 1,024 words. Note that each intersection represents a word
that may have any number of bits.
As an example, consider the word whose address is 404. The 10‐bit binary equivalent
of 404 is 01100 10100. This makes X = 01100 (binary 12) and Y = 10100 (binary 20).
The n‐bit word that is selected lies in the X decoder output number 12 and the Y decoder
output number 20. All the bits of the word are selected for reading or writing.
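The row/column split can be written out directly (a small illustrative sketch; the function name is ours):

    def split_address(addr, k=10):
        """Split a k-bit address into X (upper k/2 bits, row) and Y (lower k/2 bits, column)."""
        half = k // 2
        x = addr >> half                 # most significant half -> X (row) decoder
        y = addr & ((1 << half) - 1)     # least significant half -> Y (column) decoder
        return x, y

    # Example from the text: address 404 = 01100 10100 in binary.
    print(split_address(404))            # (12, 20)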
Address Multiplexing
The SRAM memory cell modeled in Fig. 7.5 typically contains six transistors. In order to
build memories with higher density, it is necessary to reduce the number of transistors in
a cell. The DRAM cell contains a single MOS transistor and a capacitor. The charge stored
FIGURE 7.7
Two‐dimensional decoding structure for a 1K‐word memory (two 5 * 32 decoders, one for the X rows 0 through 31 and one for the Y columns 0 through 31; the binary address 01100 10100 selects row X = 12 and column Y = 20)
on the capacitor discharges with time, and the memory cells must be periodically recharged
by refreshing the memory. Because of their simple cell structure, DRAMs typically have
four times the density of SRAMs. This allows four times as much memory capacity to be
placed on a given size of chip. The cost per bit of DRAM storage is three to four times
less than that of SRAM storage. A further cost savings is realized because of the lower
power requirement of DRAM cells. These advantages make DRAM the preferred tech-
nology for large memories in personal digital computers. DRAM chips are available in
capacities from 64K to 256M bits. Most DRAMs have a 1‐bit word size, so several chips
have to be combined to produce a larger word size.
Because of their large capacity, the address decoding of DRAMs is arranged in a
two‐dimensional array, and larger memories often have multiple arrays. To reduce the
number of pins in the IC package, designers utilize address multiplexing whereby one
set of address input pins accommodates the address components. In a two‐dimensional
array, the address is applied in two parts at different times, with the row address first and
the column address second. Since the same set of pins is used for both parts of the
address, the size of the package is decreased significantly.
We will use a 64K‐word memory to illustrate the address‐multiplexing idea.
A diagram of the decoding configuration is shown in Fig. 7.8. The memory consists of
FIGURE 7.8
Address multiplexing for a 64K DRAM (an 8‐bit row register loaded by RAS and an 8‐bit column register loaded by CAS, each driving an 8 * 256 decoder; the array has single data-in and data-out lines)
a two‐dimensional array of cells arranged into 256 rows by 256 columns, for a total of
2^8 * 2^8 = 2^16 = 64K words. There is a single data input line, a single data output line,
and a read/write control, as well as an eight‐bit address input and two address strobes,
the latter included for enabling the row and column address into their respective regis-
ters. The row address strobe (RAS) enables the eight‐bit row register, and the column
address strobe (CAS) enables the eight‐bit column register. The bar on top of the name
of the strobe symbol indicates that the registers are enabled on the zero level of the
signal.
The 16‐bit address is applied to the DRAM in two steps using RAS and CAS. Initially,
both strobes are in the 1 state. The 8‐bit row address is applied to the address inputs and
RAS is changed to 0. This loads the row address into the row address register. RAS also
enables the row decoder so that it can decode the row address and select one row of the
array. After a time equivalent to the settling time of the row selection, RAS goes back
to the 1 level. The 8‐bit column address is then applied to the address inputs, and CAS
is driven to the 0 state. This transfers the column address into the column register and
enables the column decoder.
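The two-step transfer can be modeled roughly as follows (a Python illustration; the register names follow the description above, and the example addresses are arbitrary):

    class MultiplexedAddress:
        """The same 8 address pins carry first the row address (latched when RAS
        falls to 0) and then the column address (latched when CAS falls to 0)."""

        def __init__(self):
            self.row_register = None
            self.col_register = None

        def ras_low(self, pins):        # step 1: row address on the pins, RAS driven to 0
            self.row_register = pins & 0xFF

        def cas_low(self, pins):        # step 2: column address on the pins, CAS driven to 0
            self.col_register = pins & 0xFF

        def selected_word(self):
            """Full 16-bit word address assembled from the two 8-bit halves."""
            return (self.row_register << 8) | self.col_register

    dram = MultiplexedAddress()
    dram.ras_low(0x2A)                  # 8-bit row address
    dram.cas_low(0x15)                  # 8-bit column address
    print(hex(dram.selected_word()))    # 0x2a15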
7.5 READ-ONLY MEMORY
A read‐only memory (ROM) is essentially a memory device in which permanent binary
information is stored. The binary information must be specified by the designer and is
then embedded in the unit to form the required interconnection pattern. Once the pat-
tern is established, it stays within the unit even when power is turned off and on again.
A block diagram of a ROM consisting of k inputs and n outputs is shown in Fig. 7.9.
The inputs provide the address for memory, and the outputs give the data bits of the
stored word that is selected by the address. The number of words in a ROM is deter-
mined from the fact that k address input lines are needed to specify 2^k words. Note that
ROM does not have data inputs, because it does not have a write operation. Integrated
FIGURE 7.9
ROM block diagram
circuit ROM chips have one or more enable inputs and sometimes come with three‐state
outputs to facilitate the construction of large arrays of ROM.
Consider, for example, a 32 * 8 ROM. The unit consists of 32 words of 8 bits each.
There are five input lines that form the binary numbers from 0 through 31 for the
address. Figure 7.10 shows the internal logic construction of this ROM. The five inputs
are decoded into 32 distinct outputs by means of a 5 * 32 decoder. Each output of the
decoder represents a memory address. The 32 outputs of the decoder are connected to
each of the eight OR gates. The diagram shows the array logic convention used in com-
plex circuits. (See Fig. 7.1.) Each OR gate must be considered as having 32 inputs. Each
output of the decoder is connected to one of the inputs of each OR gate. Since each OR
gate has 32 input connections and there are 8 OR gates, the ROM contains 32 * 8 = 256
internal connections. In general, a 2^k * n ROM will have an internal k * 2^k decoder
and n OR gates. Each OR gate has 2^k inputs, which are connected to each of the outputs
of the decoder.
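Functionally, then, a ROM is a fixed lookup table: the k-bit address selects one of 2^k stored n-bit words. A minimal Python illustration (the stored values below are arbitrary placeholders, not the contents of any ROM in this chapter):

    ROM_WORDS = 32   # 2**5 addresses, as in the 32 * 8 example
    ROM_BITS = 8

    # Arbitrary placeholder contents; a real ROM's contents come from its truth table.
    rom = [(17 * addr + 3) % 256 for addr in range(ROM_WORDS)]

    def rom_read(address):
        """The decoder selects one word; the OR-gate outputs form that word's bits A7 ... A0."""
        word = rom[address % ROM_WORDS]
        return format(word, f"0{ROM_BITS}b")

    print(rom_read(0))    # bits of word 0
    print(rom_read(31))   # bits of word 31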
FIGURE 7.10
Internal logic of a 32 * 8 ROM (inputs I0 through I4 drive a 5 * 32 decoder whose outputs 0 through 31 feed eight OR gates producing outputs A7 through A0)
Table 7.3
ROM Truth Table (Partial)

        Inputs                  Outputs
I4  I3  I2  I1  I0     A7  A6  A5  A4  A3  A2  A1  A0
0   0   0   0   0      1   0   1   1   0   1   1   0
0   0   0   0   1      0   0   0   1   1   1   0   1
0   0   0   1   0      1   1   0   0   0   1   0   1
0   0   0   1   1      1   0   1   1   0   0   1   0
         ...                       ...
1   1   1   0   0      0   0   0   0   1   0   0   1
1   1   1   0   1      1   1   1   0   0   0   1   0
1   1   1   1   0      0   1   0   0   1   0   1   0
1   1   1   1   1      0   0   1   1   0   0   1   1
FIGURE 7.11
Programming the ROM according to Table 7.3 (inputs I0 through I4 drive the 5 * 32 decoder; the connections from its 32 outputs to the eight OR gates producing A7 through A0 are programmed to match the truth table)
EXAMPLE 7.1
Design a combinational circuit using a ROM. The circuit accepts a three‐bit number and
outputs a binary number equal to the square of the input number.
The first step is to derive the truth table of the combinational circuit. In most cases,
this is all that is needed. In other cases, we can use a partial truth table for the ROM by
utilizing certain properties in the output variables. Table 7.4 is the truth table for the
combinational circuit. Three inputs and six outputs are needed to accommodate all
possible binary numbers. We note that output B0 is always equal to input A0, so there
is no need to generate B0 with a ROM, since it is equal to an input variable. Moreover,
output B1 is always 0, so this output is a known constant. We actually need to generate
only four outputs with the ROM; the other two are readily obtained. The minimum size
of ROM needed must have three inputs and four outputs. Three inputs specify eight
words, so the ROM must be of size 8 * 4. The ROM implementation is shown in
Fig. 7.12. The three inputs specify eight words of four bits each. The truth table in
Fig. 7.12(b) specifies the information needed for programming the ROM. The block
diagram of Fig. 7.12(a) shows the required connections of the combinational circuit.
Table 7.4
Truth Table for Circuit of Example 7.1
Inputs Outputs
A2 A1 A0 B5 B4 B3 B2 B1 B0 Decimal
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 1 1
0 1 0 0 0 0 1 0 0 4
0 1 1 0 0 1 0 0 1 9
1 0 0 0 1 0 0 0 0 16
1 0 1 0 1 1 0 0 1 25
1 1 0 1 0 0 1 0 0 36
1 1 1 1 1 0 0 0 1 49
FIGURE 7.12
ROM implementation of Example 7.1: (a) block diagram — inputs A0, A1, A2 drive an 8 * 4 ROM producing B2, B3, B4, B5, while B0 is taken directly from A0 and B1 is the constant 0; (b) ROM truth table:

A2  A1  A0     B5  B4  B3  B2
0   0   0      0   0   0   0
0   0   1      0   0   0   0
0   1   0      0   0   0   1
0   1   1      0   0   1   0
1   0   0      0   1   0   0
1   0   1      0   1   1   0
1   1   0      1   0   0   1
1   1   1      1   1   0   0
■
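The programming table of Fig. 7.12(b) can also be generated with a few lines of Python (an illustrative script; only the four stored bits B5 through B2 are produced, since B1 = 0 and B0 = A0 are supplied outside the ROM):

    # 8 * 4 ROM contents for the squarer of Example 7.1.
    for a in range(8):
        square = a * a                      # full 6-bit result B5 ... B0
        rom_word = square >> 2              # drop B1 (always 0) and B0 (equal to A0)
        print(f"{a:03b}  {rom_word:04b}")   # address A2 A1 A0, stored bits B5 B4 B3 B2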
Types of ROMs
The required paths in a ROM may be programmed in four different ways. The first
is called mask programming and is done by the semiconductor company during the
last fabrication process of the unit. The procedure for fabricating a ROM requires
that the customer fill out the truth table he or she wishes the ROM to satisfy. The
truth table may be submitted in a special form provided by the manufacturer or in
a specified format on a computer output medium. The manufacturer makes the cor-
responding mask for the paths to produce the 1’s and 0’s according to the customer’s
truth table. This procedure is costly because the vendor charges the customer a
special fee for custom masking the particular ROM. For this reason, mask program-
ming is economical only if a large quantity of the same ROM configuration is to be
ordered.
For small quantities, it is more economical to use a second type of ROM called pro-
grammable read‐only memory, or PROM. When ordered, PROM units contain all the
fuses intact, giving all 1’s in the bits of the stored words. The fuses in the PROM are
blown by the application of a high‐voltage pulse to the device through a special pin.
A blown fuse defines a binary 0 state and an intact fuse gives a binary 1 state. This pro-
cedure allows the user to program the PROM in the laboratory to achieve the desired
relationship between input addresses and stored words. Special instruments called
PROM programmers are available commercially to facilitate the procedure. In any case,
all procedures for programming ROMs are hardware procedures, even though the word
programming is used.
The hardware procedure for programming ROMs or PROMs is irreversible, and once
programmed, the fixed pattern is permanent and cannot be altered. Once a bit pattern
has been established, the unit must be discarded if the bit pattern is to be changed. A
third type of ROM is the erasable PROM, or EPROM, which can be restructured to the
initial state even though it has been programmed previously. When the EPROM is
placed under a special ultraviolet light for a given length of time, the shortwave radiation
discharges the internal floating gates that serve as the programmed connections. After
erasure, the EPROM returns to its initial state and can be reprogrammed to a new set
of values.
The fourth type of ROM is the electrically erasable PROM (EEPROM or E2PROM).
This device is like the EPROM, except that the previously programmed connections can
be erased with an electrical signal instead of ultraviolet light. The advantage is that the
device can be erased without removing it from its socket.
Flash memory devices are similar to EEPROMs, but have additional built‐in circuitry
to selectively program and erase the device in‐circuit, without the need for a special
programmer. They have widespread application in modern technology in cell phones,
digital cameras, set‐top boxes, digital TV, telecommunications, nonvolatile data storage,
and microcontrollers. Their low consumption of power makes them an attractive storage
medium for laptop and notebook computers. Flash memories incorporate additional
circuitry, too, allowing simultaneous erasing of blocks of memory, for example, of size
16 to 64 K bytes. Like EEPROMs, flash memories are subject to fatigue, typically having
about 10^5 block erase cycles.
FIGURE 7.13
Configuration of the three combinational PLDs (the PROM form is shown: inputs drive a fixed AND array acting as a decoder, followed by a programmable OR array that produces the outputs)
Combinational PLDs
The PROM is a combinational programmable logic device (PLD)—an integrated circuit
with programmable gates divided into an AND array and an OR array to provide an
AND–OR sum‐of‐product implementation. There are three major types of combina-
tional PLDs, differing in the placement of the programmable connections in the AND–
OR array. Figure 7.13 shows the configuration of the three PLDs. The PROM has a fixed
AND array constructed as a decoder and a programmable OR array. The programmable
OR gates implement the Boolean functions in sum‐of‐minterms form. The PAL has a
programmable AND array and a fixed OR array. The AND gates are programmed to
provide the product terms for the Boolean functions, which are logically summed in each
OR gate. The most flexible PLD is the PLA, in which both the AND and OR arrays can
be programmed. The product terms in the AND array may be shared by any OR gate
to provide the required sum‐of‐products implementation. The names PAL and PLA
emerged from different vendors during the development of PLDs. The implementation
of combinational circuits with PROM was demonstrated in this section. The design of
combinational circuits with PLA and PAL is presented in the next two sections.
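The difference between the three devices is easiest to see in a small sum-of-products evaluator. The Python sketch below models the most general case, the PLA, in which both the product terms (AND array) and their connection to the outputs (OR array) are programmable; the particular functions programmed here are arbitrary examples, not taken from the text:

    # Each product term gives, for every input, the literal it uses:
    # 1 = true literal, 0 = complemented literal, None = input not used.
    AND_ARRAY = [
        [1, 1, None],   # term 0: A AND B
        [None, 0, 1],   # term 1: B' AND C
        [1, None, 1],   # term 2: A AND C
    ]

    # The OR array marks which product terms feed each output.
    OR_ARRAY = [
        [1, 1, 0],      # F1 = AB + B'C
        [0, 1, 1],      # F2 = B'C + AC
    ]

    def pla(inputs):
        terms = [all(inp == lit for inp, lit in zip(inputs, term) if lit is not None)
                 for term in AND_ARRAY]
        return [int(any(t for t, used in zip(terms, row) if used)) for row in OR_ARRAY]

    print(pla([1, 1, 0]))   # A=1, B=1, C=0 -> [1, 0]
    print(pla([1, 0, 1]))   # A=1, B=0, C=1 -> [1, 1]

A PROM would fix the AND array as a full decoder of the inputs, and a PAL would fix the OR array, so in each case only one of the two arrays above would remain programmable.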
Learning Objectives
After studying this chapter, you should be able to:
■■ Explain the general functions and structure of a digital computer.
■■ Present an overview of the evolution of computer technology from early
digital computers to the latest microprocessors.
■■ Present an overview of the evolution of the x86 architecture.
■■ Define embedded systems and list some of the requirements and constraints
that various embedded systems must meet.
included a number of models. The customer with modest requirements could buy a
cheaper, slower model and, if demand increased, later upgrade to a more expensive,
faster model without having to abandon software that had already been developed.
Over the years, IBM has introduced many new models with improved technology
to replace older models, offering the customer greater speed, lower cost, or both.
These newer models retained the same architecture so that the customer’s soft-
ware investment was protected. Remarkably, the System/370 architecture, with a
few enhancements, has survived to this day as the architecture of IBM’s mainframe
product line.
In a class of computers called microcomputers, the relationship between archi-
tecture and organization is very close. Changes in technology not only influence
organization but also result in the introduction of more powerful and more complex
architectures. Generally, there is less of a requirement for generation-to-generation
compatibility for these smaller machines. Thus, there is more interplay between
organizational and architectural design decisions. An intriguing example of this is
the reduced instruction set computer (RISC), which we examine in Chapter 15.
This book examines both computer organization and computer architecture.
The emphasis is perhaps more on the side of organization. However, because a
computer organization must be designed to implement a particular architectural
specification, a thorough treatment of organization requires a detailed examination
of architecture as well.
lower layers of the hierarchy. The remainder of this section provides a very brief
overview of this plan of attack.
Function
Both the structure and functioning of a computer are, in essence, simple. In general
terms, there are only four basic functions that a computer can perform:
■■ Data processing: Data may take a wide variety of forms, and the range of pro-
cessing requirements is broad. However, we shall see that there are only a few
fundamental methods or types of data processing.
■■ Data storage: Even if the computer is processing data on the fly (i.e., data
come in and get processed, and the results go out immediately), the computer
must temporarily store at least those pieces of data that are being worked on
at any given moment. Thus, there is at least a short-term data storage function.
Equally important, the computer performs a long-term data storage function.
Files of data are stored on the computer for subsequent retrieval and update.
■■ Data movement: The computer’s operating environment consists of devices
that serve as either sources or destinations of data. When data are received
from or delivered to a device that is directly connected to the computer, the
process is known as input–output (I/O), and the device is referred to as a
peripheral. When data are moved over longer distances, to or from a remote
device, the process is known as data communications.
■■ Control: Within the computer, a control unit manages the computer’s
resources and orchestrates the performance of its functional parts in response
to instructions.
The preceding discussion may seem absurdly generalized. It is certainly
possible, even at a top level of computer structure, to differentiate a variety of func-
tions, but to quote [SIEW82]:
Structure
We now look in a general way at the internal structure of a computer. We begin with
a traditional computer with a single processor that employs a microprogrammed
control unit, then examine a typical multicore structure.
simple single-processor computer  Figure 1.1 provides a hierarchical view
of the internal structure of a traditional single-processor computer. There are four
main structural components:
■■ Central processing unit (CPU): Controls the operation of the computer and
performs its data processing functions; often simply referred to as processor.
■■ Main memory: Stores data.
Figure 1.1  Hierarchical view of a simple single-processor computer: the COMPUTER comprises I/O, main memory, a system bus, and the CPU; the CPU comprises registers, the ALU, an internal bus, and the control unit; the CONTROL UNIT comprises sequencing logic, control unit registers and decoders, and control memory
■■ I/O: Moves data between the computer and its external environment.
■■ System interconnection: Some mechanism that provides for communication
among CPU, main memory, and I/O. A common example of system intercon-
nection is by means of a system bus, consisting of a number of conducting
wires to which all the other components attach.
There may be one or more of each of the aforementioned components. Tra-
ditionally, there has been just a single processor. In recent years, there has been
increasing use of multiple processors in a single computer. Some design issues relat-
ing to multiple processors crop up and are discussed as the text proceeds; Part Five
focuses on such computers.
Each of these components will be examined in some detail in Part Two. How-
ever, for our purposes, the most interesting and in some ways the most complex
component is the CPU. Its major structural components are as follows:
■■ Control unit: Controls the operation of the CPU and hence the computer.
■■ Arithmetic and logic unit (ALU): Performs the computer’s data processing
functions.
■■ Registers: Provides storage internal to the CPU.
■■ CPU interconnection: Some mechanism that provides for communication
among the control unit, ALU, and registers.
Part Three covers these components, where we will see that complexity is added by
the use of parallel and pipelined organizational techniques. Finally, there are sev-
eral approaches to the implementation of the control unit; one common approach is
a microprogrammed implementation. In essence, a microprogrammed control unit
operates by executing microinstructions that define the functionality of the control
unit. With this approach, the structure of the control unit can be depicted, as in
Figure 1.1. This structure is examined in Part Four.
multicore computer structure As was mentioned, contemporary
computers generally have multiple processors. When these processors all reside
on a single chip, the term multicore computer is used, and each processing unit
(consisting of a control unit, ALU, registers, and perhaps cache) is called a core. To
clarify the terminology, this text will use the following definitions.
■■ Central processing unit (CPU): That portion of a computer that fetches and
executes instructions. It consists of an ALU, a control unit, and registers.
In a system with a single processing unit, it is often simply referred to as a
processor.
■■ Core: An individual processing unit on a processor chip. A core may be equiv-
alent in functionality to a CPU on a single-CPU system. Other specialized pro-
cessing units, such as one optimized for vector and matrix operations, are also
referred to as cores.
■■ Processor: A physical piece of silicon containing one or more cores. The
processor is the computer component that interprets and executes instruc-
tions. If a processor contains multiple cores, it is referred to as a multicore
processor.
After about a decade of discussion, there is broad industry consensus on this usage.
Another prominent feature of contemporary computers is the use of multiple
layers of memory, called cache memory, between the processor and main memory.
Chapter 4 is devoted to the topic of cache memory. For our purposes in this section,
we simply note that a cache memory is smaller and faster than main memory and is
used to speed up memory access by placing in the cache data from main memory that
are likely to be used in the near future. A greater performance improvement may
be obtained by using multiple levels of cache, with level 1 (L1) closest to the core
and additional levels (L2, L3, and so on) progressively farther from the core. In this
scheme, level n is smaller and faster than level n + 1.
Figure 1.2  Motherboard, processor chip, and core: the MOTHERBOARD holds main memory chips, the processor chip, and I/O chips; the PROCESSOR CHIP contains multiple cores and an L3 cache; each CORE contains instruction logic, an arithmetic and logic unit (ALU), load/store logic, an L1 I-cache and L1 data cache, and an L2 instruction cache and L2 data cache
The motherboard contains a slot or socket for the processor chip, which typ-
ically contains multiple individual cores, in what is known as a multicore processor.
There are also slots for memory chips, I/O controller chips, and other key computer
components. For desktop computers, expansion slots enable the inclusion of more
components on expansion boards. Thus, a modern motherboard connects only a
few individual chip components, with each chip containing from a few thousand up
to hundreds of millions of transistors.
Figure 1.2 shows a processor chip that contains eight cores and an L3 cache.
Not shown is the logic required to control operations between the cores and the
cache and between the cores and the external circuitry on the motherboard. The
figure indicates that the L3 cache occupies two distinct portions of the chip surface.
However, typically, all cores have access to the entire L3 cache via the aforemen-
tioned control circuits. The processor chip shown in Figure 1.2 does not represent
any specific product, but provides a general idea of how such chips are laid out.
Next, we zoom in on the structure of a single core, which occupies a portion of
the processor chip. In general terms, the functional elements of a core are:
■■ Instruction logic: This includes the tasks involved in fetching instructions,
and decoding each instruction to determine the instruction operation and the
memory locations of any operands.
■■ Arithmetic and logic unit (ALU): Performs the operation specified by an
instruction.
■■ Load/store logic: Manages the transfer of data to and from main memory via
cache.
The core also contains an L1 cache, split between an instruction cache
(I-cache) that is used for the transfer of instructions to and from main memory, and
an L1 data cache, for the transfer of operands and results. Typically, today’s pro-
cessor chips also include an L2 cache as part of the core. In many cases, this cache
is also split between instruction and data caches, although a combined, single L2
cache is also used.
Keep in mind that this representation of the layout of the core is only intended
to give a general idea of internal core structure. In a given product, the functional
elements may not be laid out as the three distinct elements shown in Figure 1.2,
especially if some or all of these functions are implemented as part of a micropro-
grammed control unit.
examples  It will be instructive to look at some real-world examples that
illustrate the hierarchical structure of computers. Figure 1.3 is a photograph of the
motherboard for a computer built around two Intel Quad-Core Xeon processor
chips. Many of the elements labeled on the photograph are discussed subsequently
in this book. Here, we mention the most important, in addition to the processor
sockets:
■■ PCI-Express slots for a high-end display adapter and for additional peripher-
als (Section 3.6 describes PCIe).
■■ Ethernet controller and Ethernet ports for network connections.
■■ USB sockets for peripheral devices.
Figure 1.3  Motherboard with two Intel Quad-Core Xeon processor chips (labeled elements: 2x Quad-Core Intel® Xeon® processors with integrated memory controllers, six-channel DDR3-1333 memory interfaces supporting up to 48 GB, Serial ATA/300 (SATA) interfaces, Intel® 3420 chipset, 2x internal and 2x external USB 2.0 ports, BIOS, 2x Ethernet ports, and a 10/100/1000Base-T Ethernet controller)
■ Serial ATA (SATA) sockets for connection to disk memory (Section 7.7
discusses Ethernet, USB, and SATA).
■ Interfaces for DDR (double data rate) main memory chips (Section 5.3
discusses DDR).
■ Intel 3420 chipset is an I/O controller for direct memory access operations
between peripheral devices and main memory (Section 7.5 discusses DMA).
Following our top-down strategy, as illustrated in Figures 1.1 and 1.2, we can
now zoom in and look at the internal structure of a processor chip. For variety, we
look at an IBM chip instead of the Intel processor chip. Figure 1.4 is a photograph
of the processor chip for the IBM zEnterprise EC12 mainframe computer. This chip
has 2.75 billion transistors. The superimposed labels indicate how the silicon real
estate of the chip is allocated. We see that this chip has six cores, or processors.
In addition, there are two large areas labeled L3 cache, which are shared by all six
processors. The L3 control logic controls traffic between the L3 cache and the cores
and between the L3 cache and the external environment. Additionally, there is stor-
age control (SC) logic between the cores and the L3 cache. The memory controller
(MC) function controls access to memory external to the chip. The GX I/O bus
controls the interface to the channel adapters accessing the I/O.
Going down one level deeper, we examine the internal structure of a single
core, as shown in the photograph of Figure 1.5. Keep in mind that this is a portion
of the silicon surface area making up a single-processor chip. The main sub-areas
within this core area are the following:
■ ISU (instruction sequence unit): Determines the sequence in which instructions
are executed in what is referred to as a superscalar architecture (Chapter 16).
■ IFU (instruction fetch unit): Logic for fetching instructions.
■ IDU (instruction decode unit): The IDU is fed from the IFU buffers, and is
responsible for the parsing and decoding of all z/Architecture operation codes.
■ LSU (load-store unit): The LSU contains the 96-kB L1 data cache,1 and man-
ages data traffic between the L2 data cache and the functional execution
units. It is responsible for handling all types of operand accesses of all lengths,
modes, and formats as defined in the z/Architecture.
■ XU (translation unit): This unit translates logical addresses from instructions
into physical addresses in main memory. The XU also contains a translation
lookaside buffer (TLB) used to speed up memory access. TLBs are discussed
in Chapter 8.
■ FXU (fixed-point unit): The FXU executes fixed-point arithmetic operations.
■ BFU (binary floating-point unit): The BFU handles all binary and hexadeci-
mal floating-point operations, as well as fixed-point multiplication operations.
■ DFU (decimal floating-point unit): The DFU handles both fixed-point and
floating-point operations on numbers that are stored as decimal digits.
■ RU (recovery unit): The RU keeps a copy of the complete state of the sys-
tem that includes all registers, collects hardware fault signals, and manages the
hardware recovery actions.
■ COP (dedicated co-processor): The COP is responsible for data compression
and encryption functions for each core.
■ I-cache: This is a 64-kB L1 instruction cache, allowing the IFU to prefetch
instructions before they are needed.
■ L2 control: This is the control logic that manages the traffic through the two
L2 caches.
■ Data-L2: A 1-MB L2 data cache for all memory traffic other than instructions.
■ Instr-L2: A 1-MB L2 instruction cache.
As we progress through the book, the concepts introduced in this section will
become clearer.
1
kB = kilobyte = 1024 bytes. Numerical prefixes are explained in a document under the “Other Useful”
tab at ComputerScienceStudent.com.
Part Two  The Computer System

Chapter 3
A Top-Level View of Computer Function and Interconnection
3.1 Computer Components
3.2 Computer Function
Instruction Fetch and Execute
Interrupts
I/O Function
3.3 Interconnection Structures
3.4 Bus Interconnection
3.5 Point-to-Point Interconnect
QPI Physical Layer
QPI Link Layer
QPI Routing Layer
QPI Protocol Layer
3.6 PCI Express
PCI Physical and Logical Architecture
PCIe Physical Layer
PCIe Transaction Layer
PCIe Data Link Layer
3.7 Key Terms, Review Questions, and Problems
Learning Objectives
After studying this chapter, you should be able to:
■■ Understand the basic elements of an instruction cycle and the role of
interrupts.
■■ Describe the concept of interconnection within a computer system.
■■ Assess the relative advantages of point-to-point interconnection compared to
bus interconnection.
■■ Present an overview of QPI.
■■ Present an overview of PCIe.
At a top level, a computer consists of CPU (central processing unit), memory, and
I/O components, with one or more modules of each type. These components are
interconnected in some fashion to achieve the basic function of the computer,
which is to execute programs. Thus, at a top level, we can characterize a computer
system by describing (1) the external behavior of each component, that is, the data
and control signals that it exchanges with other components, and (2) the intercon-
nection structure and the controls required to manage the use of the interconnec-
tion structure.
This top-level view of structure and function is important because of its
explanatory power in understanding the nature of a computer. Equally important is
its use to understand the increasingly complex issues of performance evaluation. A
grasp of the top-level structure and function offers insight into system bottlenecks,
alternate pathways, the magnitude of system failures if a component fails, and the
ease of adding performance enhancements. In many cases, requirements for greater
system power and fail-safe capabilities are being met by changing the design rather
than merely increasing the speed and reliability of individual components.
This chapter focuses on the basic structures used for computer component
interconnection. As background, the chapter begins with a brief examination of the
basic components and their interface requirements. Then a functional overview is
provided. We are then prepared to examine the use of buses to interconnect system
components.
Figure 3.1  Hardware and software approaches: (a) programming in hardware — data flow through a fixed sequence of arithmetic and logic functions to produce results; (b) programming in software — instruction codes drive an instruction interpreter whose control signals direct general-purpose arithmetic and logic functions operating on the data to produce results
and let us add to the general-purpose hardware a segment that can accept a code
and generate control signals (Figure 3.1b).
Programming is now much easier. Instead of rewiring the hardware for each
new program, all we need to do is provide a new sequence of codes. Each code is, in
effect, an instruction, and part of the hardware interprets each instruction and gen-
erates control signals. To distinguish this new method of programming, a sequence
of codes or instructions is called software.
Figure 3.1b indicates two major components of the system: an instruction
interpreter and a module of general-purpose arithmetic and logic functions. These
two constitute the CPU. Several other components are needed to yield a function-
ing computer. Data and instructions must be put into the system. For this we need
some sort of input module. This module contains basic components for accepting
data and instructions in some form and converting them into an internal form
of signals usable by the system. A means of reporting results is needed, and this
is in the form of an output module. Taken together, these are referred to as I/O
components.
One more component is needed. An input device will bring instructions and
data in sequentially. But a program is not invariably executed sequentially; it may
jump around (e.g., the IAS jump instruction). Similarly, operations on data may
require access to more than just one element at a time in a predetermined sequence.
Thus, there must be a place to temporarily store both instructions and data. That
module is called memory, or main memory, to distinguish it from external storage or
peripheral devices. Von Neumann pointed out that the same memory could be used
to store both instructions and data.
Figure 3.2 illustrates these top-level components and suggests the interac-
tions among them. The CPU exchanges data with memory. For this purpose, it typ-
ically makes use of two internal (to the CPU) registers: a memory address register
(MAR), which specifies the address in memory for the next read or write, and a
memory buffer register (MBR), which contains the data to be written into memory
or receives the data read from memory. Similarly, an I/O address register (I/OAR)
specifies a particular I/O device. An I/O buffer register (I/OBR) is used for the
exchange of data between an I/O module and the CPU.
A memory module consists of a set of locations, defined by sequentially num-
bered addresses. Each location contains a binary number that can be interpreted as
either an instruction or data. An I/O module transfers data from external devices to
CPU and memory, and vice versa. It contains internal buffers for temporarily hold-
ing these data until they can be sent on.
Having looked briefly at these major components, we now turn to an overview
of how these components function together to execute programs.
Figure 3.2  Computer Components: Top-Level View (the CPU with its execution unit, the I/O module with its buffers, and memory exchange data; PC = program counter; IR = instruction register; MAR = memory address register; MBR = memory buffer register; I/O AR = input/output address register; I/O BR = input/output buffer register)
the key elements of program execution. In its simplest form, instruction processing
consists of two steps: The processor reads (fetches) instructions from memory one
at a time and executes each instruction. Program execution consists of repeating
the process of instruction fetch and instruction execution. The instruction execution
may involve several operations and depends on the nature of the instruction (see, for
example, the lower portion of Figure 2.4).
The processing required for a single instruction is called an instruction cycle.
Using the simplified two-step description given previously, the instruction cycle is
depicted in Figure 3.3. The two steps are referred to as the fetch cycle and the execute
cycle. Program execution halts only if the machine is turned off, some sort of unrecov-
erable error occurs, or a program instruction that halts the computer is encountered.
always increments the PC after each instruction fetch so that it will fetch the next
instruction in sequence (i.e., the instruction located at the next higher memory
address). So, for example, consider a computer in which each instruction occupies
one 16-bit word of memory. Assume that the program counter is set to memory loca-
tion 300, where the location address refers to a 16-bit word. The processor will next
fetch the instruction at location 300. On succeeding instruction cycles, it will fetch
instructions from locations 301, 302, 303, and so on. This sequence may be altered, as
explained presently.
The fetched instruction is loaded into a register in the processor known as
the instruction register (IR). The instruction contains bits that specify the action
the processor is to take. The processor interprets the instruction and performs the
required action. In general, these actions fall into four categories:
■■ Processor-memory: Data may be transferred from processor to memory or
from memory to processor.
■■ Processor-I/O: Data may be transferred to or from a peripheral device by
transferring between the processor and an I/O module.
■■ Data processing: The processor may perform some arithmetic or logic oper-
ation on data.
■■ Control: An instruction may specify that the sequence of execution be altered.
For example, the processor may fetch an instruction from location 149, which
specifies that the next instruction be from location 182. The processor will
remember this fact by setting the program counter to 182. Thus, on the next
fetch cycle, the instruction will be fetched from location 182 rather than 150.
An instruction’s execution may involve a combination of these actions.
Consider a simple example using a hypothetical machine that includes the
characteristics listed in Figure 3.4. The processor contains a single data register,
called an accumulator (AC). Both instructions and data are 16 bits long. Thus, it is
convenient to organize memory using 16-bit words. The instruction format provides
4 bits for the opcode, so that there can be as many as 2^4 = 16 different opcodes, and
up to 2^12 = 4096 (4K) words of memory can be directly addressed.
Figure 3.5 illustrates a partial program execution, showing the relevant por-
tions of memory and processor registers.1 The program fragment shown adds the
contents of the memory word at address 940 to the contents of the memory word at
1
Hexadecimal notation is used, in which each digit represents 4 bits. This is the most convenient nota-
tion for representing the contents of memory and registers when the word length is a multiple of 4. See
Chapter 9 for a basic refresher on number systems (decimal, binary, hexadecimal).
Figure 3.4  Characteristics of a hypothetical machine: instruction format (bits 0-3 opcode, bits 4-15 address) and integer format (bit 0 plus a 15-bit magnitude in bits 1-15)

Figure 3.5  Partial program execution, Steps 1 through 6, showing the relevant portions of memory and processor registers
address 941 and stores the result in the latter location. Three instructions, which can
be described as three fetch and three execute cycles, are required:
1. The PC contains 300, the address of the first instruction. This instruction (the
value 1940 in hexadecimal) is loaded into the instruction register IR, and
the PC is incremented. Note that this process involves the use of a memory
address register and a memory buffer register. For simplicity, these intermedi-
ate registers are ignored.
2. The first 4 bits (first hexadecimal digit) in the IR indicate that the AC is to be
loaded. The remaining 12 bits (three hexadecimal digits) specify the address
(940) from which data are to be loaded.
3. The next instruction (5941) is fetched from location 301, and the PC is
incremented.
4. The old contents of the AC and the contents of location 941 are added, and
the result is stored in the AC.
5. The next instruction (2941) is fetched from location 302, and the PC is
incremented.
6. The contents of the AC are stored in location 941.
In this example, three instruction cycles, each consisting of a fetch cycle and an
execute cycle, are needed to add the contents of location 940 to the contents of 941.
With a more complex set of instructions, fewer cycles would be needed. Some older
processors, for example, included instructions that contain more than one memory
address. Thus, the execution cycle for a particular instruction on such processors
could involve more than one reference to memory. Also, instead of memory refer-
ences, an instruction may specify an I/O operation.
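Before moving on, the three-instruction fragment above can be traced with a short simulation of the hypothetical machine (an illustrative sketch; only the three opcodes actually used are modeled — 1 loads the AC from memory, 5 adds a memory word to the AC, and 2 stores the AC to memory — and the data values at 940 and 941 are arbitrary):

    # Program at 300-302 and data at 940-941 (hexadecimal addresses, as in the text).
    memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941, 0x940: 0x0003, 0x941: 0x0002}
    pc, ac = 0x300, 0

    for _ in range(3):                      # three instruction cycles
        ir = memory[pc]                     # fetch: load the IR, then increment the PC
        pc += 1
        opcode, address = ir >> 12, ir & 0x0FFF
        if opcode == 0x1:                   # load AC from memory
            ac = memory[address]
        elif opcode == 0x5:                 # add memory word to AC
            ac = (ac + memory[address]) & 0xFFFF
        elif opcode == 0x2:                 # store AC to memory
            memory[address] = ac
        print(f"PC={pc:03X} IR={ir:04X} AC={ac:04X}")

    print(f"memory[941] = {memory[0x941]:04X}")   # 0003 + 0002 = 0005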
For example, the PDP-11 processor includes an instruction, expressed symboli-
cally as ADD B,A, that stores the sum of the contents of memory locations B and A
into memory location A. A single instruction cycle with the following steps occurs:
■ Fetch the ADD instruction.
■ Read the contents of memory location A into the processor.
■ Read the contents of memory location B into the processor. In order that the
contents of A are not lost, the processor must have at least two registers for
storing memory values, rather than a single accumulator.
■ Add the two values.
■ Write the result from the processor to memory location A.
Thus, the execution cycle for a particular instruction may involve more than one
reference to memory. Also, instead of memory references, an instruction may specify
an I/O operation. With these additional considerations in mind, Figure 3.6 provides a
more detailed look at the basic instruction cycle of Figure 3.3. The figure is in the form
of a state diagram. For any given instruction cycle, some states may be null and others
may be visited more than once. The states can be described as follows:
■ Instruction address calculation (iac): Determine the address of the next
instruction to be executed. Usually, this involves adding a fixed number to
the address of the previous instruction.
Figure 3.6 Instruction Cycle State Diagram
On some machines, a single instruction may specify an operation to be performed
on a vector (one-dimensional array) of numbers or a string (one-dimensional
array) of characters. As Figure 3.6 indicates, this would involve repetitive operand
fetch and/or store operations.
Interrupts
Virtually all computers provide a mechanism by which other modules (I/O, memory)
may interrupt the normal processing of the processor. Table 3.1 lists the most com-
mon classes of interrupts. The specific nature of these interrupts is examined later in
this book, especially in Chapters 7 and 14. However, we need to introduce the concept
now to understand more clearly the nature of the instruction cycle and the impli-
cations of interrupts on the interconnection structure. The reader need not be con-
cerned at this stage about the details of the generation and processing of interrupts,
but only focus on the communication between modules that results from interrupts.
Interrupts are provided primarily as a way to improve processing efficiency.
For example, most external devices are much slower than the processor. Suppose
that the processor is transferring data to a printer using the instruction cycle scheme
of Figure 3.3. After each write operation, the processor must pause and remain
idle until the printer catches up. The length of this pause may be on the order of
many hundreds or even thousands of instruction cycles that do not involve memory.
Clearly, this is a very wasteful use of the processor.
Figure 3.7a illustrates this state of affairs. The user program performs a ser-
ies of WRITE calls interleaved with processing. Code segments 1, 2, and 3 refer to
sequences of instructions that do not involve I/O. The WRITE calls are to an I/O
program that is a system utility and that will perform the actual I/O operation. The
I/O program consists of three sections:
■■ A sequence of instructions, labeled 4 in the figure, to prepare for the actual I/O
operation. This may include copying the data to be output into a special buffer
and preparing the parameters for a device command.
■■ The actual I/O command. Without the use of interrupts, once this command
is issued, the program must wait for the I/O device to perform the requested
function (or periodically poll the device). The program might wait by simply
repeatedly performing a test operation to determine if the I/O operation is done.
■■ A sequence of instructions, labeled 5 in the figure, to complete the operation.
This may include setting a flag indicating the success or failure of the operation.
Figure 3.7 Program Flow of Control Without and With Interrupts: (a) no interrupts; (b) interrupts, short I/O wait; (c) interrupts, long I/O wait
Because the I/O operation may take a relatively long time to complete, the I/O
program is hung up waiting for the operation to complete; hence, the user program
is stopped at the point of the WRITE call for some considerable period of time.
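The wasted time can be sketched in a few lines of Python. The device object and its start_output, done, and status methods are hypothetical placeholders; the point is only that, without interrupts, the processor sits in a test loop between code segments 4 and 5 until the device finishes.

# With no interrupts (Figure 3.7a), the processor busy-waits on the device.
# The device object and its start_output/done/status methods are hypothetical.
def write_without_interrupts(device, data):
    buffer = bytes(data)             # code segment 4: prepare for the I/O operation
    device.start_output(buffer)      # the actual I/O command
    while not device.done():         # the processor is tied up testing the device
        pass                         # potentially thousands of wasted instruction cycles
    return device.status()           # code segment 5: complete the operation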
Interrupts and the Instruction Cycle
With interrupts, the processor can
be engaged in executing other instructions while an I/O operation is in progress.
Consider the flow of control in Figure 3.7b. As before, the user program reaches a
point at which it makes a system call in the form of a WRITE call. The I/O program
that is invoked in this case consists only of the preparation code and the actual I/O
command. After these few instructions have been executed, control returns to the
user program. Meanwhile, the external device is busy accepting data from computer
memory and printing it. This I/O operation is conducted concurrently with the
execution of instructions in the user program.
When the external device becomes ready to be serviced—that is, when it is
ready to accept more data from the processor—the I/O module for that external
device sends an interrupt request signal to the processor. The processor responds by
suspending operation of the current program, branching off to a program to service
that particular I/O device, known as an interrupt handler, and resuming the original
execution after the device is serviced. The points at which such interrupts occur are
indicated by an asterisk in Figure 3.7b.
Let us try to clarify what is happening in Figure 3.7. We have a user program
that contains two WRITE commands. There is a segment of code at the beginning,
then one WRITE command, then a second segment of code, then a second WRITE
command, then a third and final segment of code. The WRITE command invokes
the I/O program provided by the OS. Similarly, the I/O program consists of a seg-
ment of code, followed by an I/O command, followed by another segment of code.
The I/O command invokes a hardware I/O operation.
Figure 3.8 Transfer of Control via Interrupts (the user program is interrupted after instruction i; after the interrupt handler completes, execution resumes at instruction i + 1)
From the point of view of the user program, an interrupt is just that: an interrup-
tion of the normal sequence of execution. When the interrupt processing is completed,
execution resumes (Figure 3.8). Thus, the user program does not have to contain any
special code to accommodate interrupts; the processor and the operating system are
responsible for suspending the user program and then resuming it at the same point.
To accommodate interrupts, an interrupt cycle is added to the instruction
cycle, as shown in Figure 3.9. In the interrupt cycle, the processor checks to see if
any interrupts have occurred, indicated by the presence of an interrupt signal. If no
interrupts are pending, the processor proceeds to the fetch cycle and fetches the
next instruction of the current program. If an interrupt is pending, the processor
does the following:
■■ It suspends execution of the current program being executed and saves its
context. This means saving the address of the next instruction to be executed
(current contents of the program counter) and any other data relevant to the
processor’s current activity.
■■ It sets the program counter to the starting address of an interrupt handler routine.
Figure 3.9 Instruction Cycle with Interrupts (fetch next instruction, execute instruction, then, if interrupts are enabled, check for and process any pending interrupt)
The processor now proceeds to the fetch cycle and fetches the first instruction
in the interrupt handler program, which will service the interrupt. The interrupt
handler program is generally part of the operating system. Typically, this program
determines the nature of the interrupt and performs whatever actions are needed.
In the example we have been using, the handler determines which I/O module gen-
erated the interrupt and may branch to a program that will write more data out to
that I/O module. When the interrupt handler routine is completed, the processor
can resume execution of the user program at the point of interruption.
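As a rough illustration, the fetch–execute–interrupt loop just described might be sketched in Python as follows. The CPU model, the "instructions", and the handler address table are hypothetical and exist only to show where the interrupt check and the context save fit in the cycle; real hardware performs these steps in circuitry.

from dataclasses import dataclass, field

handler_address = {"io": 0x800}     # start address of each interrupt handler (illustrative)
pending_interrupts = []             # interrupt request signals latched by hardware

@dataclass
class CPU:
    memory: dict
    pc: int
    halted: bool = False
    saved_context: list = field(default_factory=list)

    def execute(self, instruction):
        if instruction == "halt":
            self.halted = True      # every other "instruction" is a no-op in this sketch

def run(cpu, interrupts_enabled=True):
    while not cpu.halted:
        instruction = cpu.memory.get(cpu.pc, "nop")    # fetch cycle
        cpu.pc += 1
        cpu.execute(instruction)                       # execute cycle
        if interrupts_enabled and pending_interrupts:  # interrupt cycle
            request = pending_interrupts.pop(0)
            cpu.saved_context.append(cpu.pc)           # save the PC (and, in reality, more state)
            cpu.pc = handler_address[request]          # next fetch is the handler's first instruction

run(CPU(memory={0x100: "nop", 0x101: "halt"}, pc=0x100))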
It is clear that there is some overhead involved in this process. Extra instruc-
tions must be executed (in the interrupt handler) to determine the nature of the inter-
rupt and to decide on the appropriate action. Nevertheless, because of the relatively
large amount of time that would be wasted by simply waiting on an I/O operation,
the processor can be employed much more efficiently with the use of interrupts.
To appreciate the gain in efficiency, consider Figure 3.10, which is a timing
diagram based on the flow of control in Figures 3.7a and 3.7b. In this figure, user
program code segments are shaded green, and I/O program code segments are shaded gray.
Figure 3.10 Program Timing: Short I/O Wait
Figure 3.10a shows the case in which interrupts are not used. The processor must wait while an I/O operation is performed.
Figures 3.7b and 3.10b assume that the time required for the I/O operation is rela-
tively short: less than the time to complete the execution of instructions between write
operations in the user program. In this case, the segment of code labeled code segment 2
is interrupted. A portion of the code (2a) executes (while the I/O operation is performed)
and then the interrupt occurs (upon the completion of the I/O operation). After the inter-
rupt is serviced, execution resumes with the remainder of code segment 2 (2b).
The more typical case, especially for a slow device such as a printer, is that the
I/O operation will take much more time than executing a sequence of user instruc-
tions. Figure 3.7c indicates this state of affairs. In this case, the user program reaches
the second WRITE call before the I/O operation spawned by the first call is com-
plete. The result is that the user program is hung up at that point. When the preced-
ing I/O operation is completed, this new WRITE call may be processed, and a new
I/O operation may be started. Figure 3.11 (Program Timing: Long I/O Wait) shows the timing for
this situation with and without the use of interrupts. We can see that there is still a gain in efficiency
because part of the time during which the I/O operation is under way overlaps with
the execution of user instructions.
Figure 3.12 shows a revised instruction cycle state diagram that includes inter-
rupt cycle processing.
Multiple Interrupts
The discussion so far has focused only on the occurrence
of a single interrupt. Suppose, however, that multiple interrupts can occur. For
example, a program may be receiving data from a communications line and
printing results. The printer will generate an interrupt every time it completes
a print operation. The communication line controller will generate an interrupt
every time a unit of data arrives. The unit could either be a single character or a
block, depending on the nature of the communications discipline. In any case, it is
possible for a communications interrupt to occur while a printer interrupt is being
processed.
Two approaches can be taken to dealing with multiple interrupts. The first
is to disable interrupts while an interrupt is being processed. A disabled interrupt
simply means that the processor can and will ignore that interrupt request signal.
If an interrupt occurs during this time, it generally remains pending and will be
checked by the processor after the processor has enabled interrupts. Thus, when a
user program is executing and an interrupt occurs, interrupts are disabled immedi-
ately. After the interrupt handler routine completes, interrupts are enabled before
resuming the user program, and the processor checks to see if additional interrupts
have occurred. This approach is nice and simple, as interrupts are handled in strict
sequential order (Figure 3.13a).
The drawback to the preceding approach is that it does not take into account
relative priority or time-critical needs. For example, when input arrives from the
communications line, it may need to be absorbed rapidly to make room for more
input. If the first batch of input has not been processed before the second batch
arrives, data may be lost.
A second approach is to define priorities for interrupts and to allow an interrupt
of higher priority to cause a lower-priority interrupt handler to be itself interrupted
(Figure 3.13b). As an example of this second approach, consider a system with three
I/O devices: a printer, a disk, and a communications line, with increasing priori-
ties of 2, 4, and 5, respectively. Figure 3.14 illustrates a possible sequence. A user
program begins at t = 0. At t = 10, a printer interrupt occurs; user information is
placed on the system stack and execution continues at the printer interrupt service
routine (ISR). While this routine is still executing, at t = 15, a communications inter-
rupt occurs. Because the communications line has higher priority than the printer,
the interrupt is honored. The printer ISR is interrupted, its state is pushed onto the
stack, and execution continues at the communications ISR. While this routine is exe-
cuting, a disk interrupt occurs (t = 20). Because this interrupt is of lower priority, it
is simply held, and the communications ISR runs to completion.
When the communications ISR is complete (t = 25), the previous proces-
sor state is restored, which is the execution of the printer ISR. However, before
even a single instruction in that routine can be executed, the processor honors the
higher-priority disk interrupt and control transfers to the disk ISR. Only when that
routine is complete (t = 35) is the printer ISR resumed. When that routine completes
(t = 40), control finally returns to the user program.
(Figure 3.13 illustrates the transfer of control with multiple interrupts, showing a user program and interrupt handlers X and Y; Figure 3.14 shows the time sequence of the user program and the printer, communications, and disk interrupt service routines in this example.)
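The preceding sequence can be reproduced with a small Python sketch of the priority rule: a request preempts whatever is running only if its priority is higher; otherwise it is held until the current ISR completes. The priorities (printer = 2, disk = 4, communications = 5) come from the example; the data structures and function names are illustrative, not a real kernel.

priority = {"printer": 2, "disk": 4, "comm": 5}

active = ["user program"]   # stack of whatever is currently running
pending = []                # lower-priority requests held for later

def current_priority():
    return priority.get(active[-1], 0)   # the user program runs at priority 0

def interrupt(source):
    if priority[source] > current_priority():
        active.append(source)            # preempt: current state is "pushed", the ISR runs
    else:
        pending.append(source)           # hold the request until the current ISR ends

def isr_complete():
    active.pop()                         # restore the preempted routine
    if pending:
        best = max(pending, key=priority.get)        # highest-priority held request
        if priority[best] > current_priority():
            pending.remove(best)
            active.append(best)

# Replaying the example of Figure 3.14: printer at t=10, comm at t=15 (preempts),
# disk at t=20 (held), comm done at t=25, disk done at t=35, printer done at t=40.
interrupt("printer"); interrupt("comm"); interrupt("disk")
isr_complete()      # comm finishes; the held disk request preempts the printer ISR
isr_complete()      # disk finishes; the printer ISR resumes
isr_complete()      # printer finishes; control returns to the user program
print(active)       # ['user program']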
I/O Function
Thus far, we have discussed the operation of the computer as controlled by the pro-
cessor, and we have looked primarily at the interaction of processor and memory.
The discussion has only alluded to the role of the I/O component. This role is dis-
cussed in detail in Chapter 7, but a brief summary is in order here.
An I/O module (e.g., a disk controller) can exchange data directly with the
processor. Just as the processor can initiate a read or write with memory, desig-
nating the address of a specific location, the processor can also read data from or
write data to an I/O module. In this latter case, the processor identifies a specific
device that is controlled by a particular I/O module. Thus, an instruction sequence
similar in form to that of Figure 3.5 could occur, with I/O instructions rather than
memory-referencing instructions.
In some cases, it is desirable to allow I/O exchanges to occur directly with
memory. In such a case, the processor grants to an I/O module the authority to read
from or write to memory, so that the I/O-memory transfer can occur without tying
up the processor. During such a transfer, the I/O module issues read or write com-
mands to memory, relieving the processor of responsibility for the exchange. This
operation is known as direct memory access (DMA) and is examined in Chapter 7.
Interconnection Structures
The collection of paths connecting the various computer modules (processor, memory, I/O) is called the interconnection structure. The design of this structure depends on the exchanges that must be made among modules.
Figure: Computer Modules. A memory module, an I/O module, and a CPU, showing the major forms of input and output for each. (The wide arrows represent multiple signal lines carrying multiple bits of information in parallel; each narrow arrow represents a single signal line.)
■■ Memory: A memory module typically consists of N words of equal length, each
assigned a unique numerical address (0, 1, …, N - 1). A word of data can be
read from or written into the memory. The nature of the operation is indicated
by read and write control signals. The location for the operation is
specified by an address.
■■ I/O module: From an internal (to the computer system) point of view, I/O is
functionally similar to memory. There are two operations: read and write. Fur-
ther, an I/O module may control more than one external device. We can refer
to each of the interfaces to an external device as a port and give each a unique
address (e.g., 0, 1, …, M - 1). In addition, there are external data paths for
the input and output of data with an external device. Finally, an I/O module
may be able to send interrupt signals to the processor.
■■ Processor: The processor reads in instructions and data, writes out data after
processing, and uses control signals to control the overall operation of the sys-
tem. It also receives interrupt signals.
The preceding list defines the data to be exchanged. The interconnection
structure must support the following types of transfers:
■■ Memory to processor: The processor reads an instruction or a unit of data
from memory.
■■ Processor to memory: The processor writes a unit of data to memory.
■■ I/O to processor: The processor reads data from an I/O device via an I/O
module.
■■ Processor to I/O: The processor sends data to the I/O device.
■■ I/O to or from memory: For these two cases, an I/O module is allowed to
exchange data directly with memory, without going through the processor,
using direct memory access.
Over the years, a number of interconnection structures have been tried. By far
the most common are (1) the bus and various multiple-bus structures, and (2) point-
to-point interconnection structures with packetized data transfer. We devote the
remainder of this chapter to a discussion of these structures.
Bus Interconnection
The bus was the dominant means of computer system component interconnection
for decades. For general-purpose computers, it has gradually given way to various
point-to-point interconnection structures, which now dominate computer system
design. However, bus structures are still commonly used for embedded systems, par-
ticularly microcontrollers. In this section, we give a brief overview of bus structure.
Appendix C provides more detail.
A bus is a communication pathway connecting two or more devices. A key
characteristic of a bus is that it is a shared transmission medium. Multiple devices
connect to the bus, and a signal transmitted by any one device is available for recep-
tion by all other devices attached to the bus. If two devices transmit during the same
time period, their signals will overlap and become garbled. Thus, only one device at
a time can successfully transmit.
Control lines
Control lines are used to control the access to and the use of the data and address
lines. Because the data and address lines are shared by all components, there must
be a means of controlling their use. Control signals transmit both command and
timing information among system modules. Timing signals indicate the
validity of data and address information. Command signals specify operations to be
performed. Typical control lines include:
■ Memory write: causes data on the bus to be written into the addressed location.
■ Memory read: causes data from the addressed location to be placed on the bus.
■ I/O write: causes data on the bus to be output to the addressed I/O port.
■ I/O read: causes data from the addressed I/O port to be placed on the bus.
■ Transfer ACK: indicates that data have been accepted from or placed on the
bus.
■ Bus request: indicates that a module needs to gain control of the bus.
■ Bus grant: indicates that a requesting module has been granted control of the
bus.
■ Interrupt request: indicates that an interrupt is pending.
■ Interrupt ACK: acknowledges that the pending interrupt has been recognized.
■ Clock: is used to synchronize operations.
■ Reset: initializes all modules.
The operation of the bus is as follows. If one module wishes to send data to
another, it must do two things: (1) obtain the use of the bus, and (2) transfer data
via the bus. If one module wishes to request data from another module, it must
(1) obtain the use of the bus, and (2) transfer a request to the other module over the
appropriate control and address lines. It must then wait for that second module to
send the data.
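A toy Python model of this two-step discipline is shown below. The Bus class and its request and transfer methods are invented for illustration and merely stand in for the arbitration and signaling that real bus hardware performs.

# Toy model of the shared bus: a module must be granted the bus before it may
# drive data onto it, and only one module can transmit at a time.
class Bus:
    def __init__(self):
        self.owner = None       # current bus master, if any
        self.data = None        # value currently on the data lines

    def request(self, module):
        if self.owner is None:  # grant the bus only if it is free
            self.owner = module
            return True
        return False            # otherwise the module must wait and retry

    def transfer(self, module, value):
        assert self.owner == module, "a module must own the bus before transmitting"
        self.data = value       # visible to every module attached to the bus
        self.owner = None       # release the bus for the next transfer

bus = Bus()
if bus.request("cpu"):
    bus.transfer("cpu", ("memory write", 0x941, 0x0005))
print(bus.data)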
COMPUTER ORGANIZATION
Dr. Anala M R
Professor
Information Science and Engineering
RV College of Engineering
Bengaluru – 59
E-mail: [email protected]
Computer Organization Vs Architecture
• Computer organization refers to the operational units
and their interconnections that realize the architectural specifications.
• Organizational attributes include those hardware details
transparent to the programmer, such as control signals; interfaces between the
computer and peripherals; and the memory technology used.
• Data storage: short-term and long-term data storage functions
• Control Unit
Structure
• Single Core Architecture
• Multicore Architecture
Single Core Architecture -Structure
• CPU: Controls the operation of the computer
and does data processing functions
• Main memory: Stores data
• I/O: Moves data between the computer and its
external environment
• System interconnection: System bus, consists
of a number of conducting wires to which all the
other components attach.
Components of CPU
• Control unit: Controls the operation of the CPU and hence the computer
• Arithmetic and logic unit (ALU): Performs the computer’s data processing
functions.
• Registers: Provide storage internal to the CPU.
• CPU interconnection: Some mechanism that provides for communication
among the control unit, ALU, and registers.
Terminologies
• A printed circuit board (PCB) is a rigid, flat board that holds and
interconnects chips and other electronic components
• The main printed circuit board in a computer is called a system board
or motherboard, while smaller ones that plug into the slots in the main board
are called expansion boards
• Prominent elements on the motherboard are the chips
• A chip is a single piece of semiconducting material upon which electronic
circuits and logic gates are fabricated
Computer Components
• Key Concepts
→Data and instructions are stored in a single read–write memory
→The contents of this memory are addressable by location, without
regard to the type of data contained there
→Execution occurs in a sequential fashion (unless explicitly modified)
from one instruction to the next
Interconnection structures
• The collection of paths connecting the various modules is called the
interconnection structure.
• A computer consists of three types of components: processor, memory, and I/O
Bus Interconnection
Control lines
• Memory write: causes data on the bus to be written into the addressed location.
• Memory read: causes data from the addressed location to be placed on the bus.
• I/O write: causes data on the bus to be output to the addressed I/O port.
• I/O read: causes data from the addressed I/O port to be placed on the bus.
• Transfer ACK: indicates that data have been accepted from or placed on the bus.
• Bus request: indicates that a module needs to gain control of the bus.
• Bus grant: indicates that a requesting module has been granted control of the bus.
• Interrupt request: indicates that an interrupt is pending.
• Interrupt ACK: acknowledges that the pending interrupt has been recognized.
• Clock: is used to synchronize operations.
• Reset: initializes all modules
Questions
Hardware and software approaches (figure): a hardwired program is the result of the process of connecting the various components in the desired configuration; data pass through a fixed sequence of arithmetic and logic functions to produce results. In the software approach, instruction codes are fed to an instruction interpreter, which produces control signals that direct a general-purpose arithmetic and logic unit operating on data to produce results.
Computer components (figure): PC = program counter, IR = instruction register, MAR = memory address register, MBR = memory buffer register, I/OAR = input/output address register, I/OBR = input/output buffer register. The I/OAR specifies a particular I/O device; the I/OBR is used for the exchange of data between an I/O module and the CPU. The execution unit exchanges data with memory through the MAR and MBR, and with I/O modules through the I/OAR and I/OBR, via buffers.
Basic instruction cycle (figure): a fetch cycle followed by an execute cycle. Instruction actions fall into four categories: processor–memory, processor–I/O, data processing, and control. Integer format: a sign bit S followed by the magnitude (bits 1–15).
Classes of Interrupts
Program flow of control (figure): (a) no interrupts; (b) interrupts, short I/O wait; (c) interrupts, long I/O wait.
Transfer of control via interrupts (figure): the interrupt occurs after instruction i; control later returns to instruction i + 1.
Instruction cycle with interrupts (figure): fetch next instruction, execute instruction, then, if interrupts are enabled, check for and process any pending interrupt.
Program timing (figures): short I/O wait and long I/O wait, with and without interrupts; with interrupts, the I/O operation proceeds concurrently with processor execution.
Instruction cycle state diagram with interrupts (figure): instruction complete, fetch next instruction; return for string or vector data; interrupt and no-interrupt paths.
Transfer of control with multiple interrupts (figure): user program, interrupt handler X, interrupt handler Y.
Example time sequence of multiple interrupts (figure): printer, communication, and disk interrupt service routines at t = 0, 10, 15, 25, 35, 40.
Computer modules (figure): memory (data words 0 … N - 1, read/write, address), I/O module (M ports, internal and external data, read/write, interrupt signals), and CPU (instructions, data, address, control signals, interrupt signals).
Types of transfers:
• Memory to processor: the processor reads an instruction or a unit of data from memory.
• Processor to memory: the processor writes a unit of data to memory.
• I/O to processor: the processor reads data from an I/O device via an I/O module.
• Processor to I/O: the processor sends data to the I/O device.
• I/O to or from memory: an I/O module is allowed to exchange data directly with memory, without going through the processor, using direct memory access.
Data bus: the data lines provide a path for moving data among system modules; the bus also includes control lines.
COMPUTER ARITHMETIC
Dr. Anala M R
Professor
Information Science and Engineering
RV College of Engineering
Bengaluru – 59
E-mail: [email protected]
Computer Arithmetic
Integer Representation
• Sign-Magnitude Representation
• One’s Complement Representation
• Two’s Complement Representation
• Range Extension
Drawbacks of Sign-Magnitude Representation
• Addition & subtraction require a consideration of both the signs of the numbers
and their relative magnitudes to carry out the required operation.
• There are two representations of 0
• More difficult to test for 0
• Positive Number
• Negative Number
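As a quick illustration of the two representations, the following Python sketch encodes +18 and -18 in 8-bit sign-magnitude and two's complement form; the helper names are illustrative. Note the two sign-magnitude encodings of zero (00000000 and 10000000) versus the single two's complement zero.

# 8-bit sign-magnitude versus two's complement encodings.
def sign_magnitude(n, bits=8):
    sign = 0 if n >= 0 else 1
    return (sign << (bits - 1)) | abs(n)          # bit 7 = sign, bits 6..0 = magnitude

def twos_complement(n, bits=8):
    return n & ((1 << bits) - 1)                  # negatives wrap modulo 2**bits

print(format(sign_magnitude(18), "08b"), format(sign_magnitude(-18), "08b"))
# 00010010 10010010  (and both 00000000 and 10000000 represent zero)
print(format(twos_complement(18), "08b"), format(twos_complement(-18), "08b"))
# 00010010 11101110  (a single representation of zero: 00000000)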
Integer Arithmetic
• Addition
• Subtraction
• Multiplication
• Division
Addition
Subtraction
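A small Python sketch of 8-bit two's complement addition and subtraction may help here, with the usual overflow test (operands of the same sign producing a result of the opposite sign). Subtraction is done, as in hardware, by adding the two's complement of the subtrahend; all names are illustrative.

# 8-bit two's complement addition and subtraction with overflow detection.
BITS = 8
MASK = (1 << BITS) - 1

def to_signed(value):
    # reinterpret an 8-bit pattern as a signed two's complement number
    return value - (1 << BITS) if value & (1 << (BITS - 1)) else value

def add(a, b):
    result = (a + b) & MASK
    # overflow: the operands share a sign but the result's sign differs
    overflow = ((~(a ^ b) & (a ^ result)) & (1 << (BITS - 1))) != 0
    return result, overflow

def sub(a, b):
    return add(a, (~b + 1) & MASK)       # a - b = a + (two's complement of b)

s, v = add(0b01111111, 0b00000001)       # 127 + 1
print(to_signed(s), v)                   # -128 True  (overflow into the sign bit)
s, v = sub(0b00000101, 0b00000010)       # 5 - 2
print(to_signed(s), v)                   # 3 False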
Booth Algorithm
Booth Algorithm (Example: 7 × 3)
Example: 15 × (−13)
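Since the slides step through Booth's algorithm for 7 × 3 and 15 × (−13), here is a sketch of the standard algorithm in Python, using the usual A, Q, and Q−1 registers and one arithmetic right shift per step. The register handling and names are illustrative.

def booth_multiply(multiplicand, multiplier, bits=8):
    # Booth's algorithm: each step inspects (Q0, Q-1), optionally adds +M or -M
    # to A, then arithmetic-shifts the combined A,Q,Q-1 one place to the right.
    mask = (1 << bits) - 1
    m, neg_m = multiplicand & mask, (-multiplicand) & mask
    a, q, q_1 = 0, multiplier & mask, 0
    for _ in range(bits):
        if (q & 1, q_1) == (1, 0):
            a = (a + neg_m) & mask            # A <- A - M
        elif (q & 1, q_1) == (0, 1):
            a = (a + m) & mask                # A <- A + M
        q_1 = q & 1                           # arithmetic shift right of A,Q,Q-1
        q = ((q >> 1) | ((a & 1) << (bits - 1))) & mask
        a = (a >> 1) | (a & (1 << (bits - 1)))   # replicate A's sign bit
    product = (a << bits) | q                 # 2*bits-wide two's complement result
    return product - (1 << (2 * bits)) if product & (1 << (2 * bits - 1)) else product

print(booth_multiply(7, 3))      # 21
print(booth_multiply(15, -13))   # -195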
Division
Division – Examples
Division (7 ÷ 3)
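For the 7 ÷ 3 example, a Python sketch of the classic restoring-division algorithm (shift, trial subtraction, restore on a negative result) is shown below; it assumes unsigned operands, and the names are illustrative.

def restoring_divide(dividend, divisor, bits=8):
    # Unsigned restoring division: shift the A,Q pair left, try A - M, and
    # restore A (quotient bit 0) whenever the trial subtraction goes negative.
    a, q, m = 0, dividend, divisor
    for _ in range(bits):
        a = (a << 1) | ((q >> (bits - 1)) & 1)   # shift A,Q left one place
        q = (q << 1) & ((1 << bits) - 1)
        a -= m                                   # trial subtraction
        if a < 0:
            a += m                               # restore; quotient bit is 0
        else:
            q |= 1                               # quotient bit is 1
    return q, a                                  # (quotient, remainder)

print(restoring_divide(7, 3))    # (2, 1)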