
300 Chapter 7 Memory and Programmable Logic

FIGURE 7.1
Conventional and array logic diagrams for OR gate: (a) conventional symbol, (b) array logic symbol

paths that behave similarly to fuses. In the original state of the device, all the fuses are
intact. Programming the device involves blowing those fuses along the paths that must
be removed in order to obtain the particular configuration of the desired logic function.
In this chapter, we introduce the configuration of PLDs and indicate procedures for their
use in the design of digital systems. We also present CMOS FPGAs, which are configured
by downloading a stream of bits into the device to configure transmission gates to estab-
lish the internal connectivity required by a specified logic function (combinational or
sequential).
A typical PLD may have hundreds to millions of gates interconnected through hun-
dreds to thousands of internal paths. In order to show the internal logic diagram of such
a device in a concise form, it is necessary to employ a special gate symbology applicable
to array logic. Figure 7.1 shows the conventional and array logic symbols for a multiple‐
input OR gate. Instead of having multiple input lines into the gate, we draw a single line
entering the gate. The input lines are drawn perpendicular to this single line and are
connected to the gate through internal fuses. In a similar fashion, we can draw the array
logic for an AND gate. This type of graphical representation for the inputs of gates will
be used throughout the chapter in array logic diagrams.

7.2 RANDOM-ACCESS MEMORY


A memory unit is a collection of storage cells, together with associated circuits needed
to transfer information into and out of a device. The architecture of memory is such that
information can be selectively retrieved from any of its internal locations. The time it
takes to transfer information to or from any desired random location is always the
same—hence the name random‐access memory, abbreviated RAM. In contrast, the time
required to retrieve information that is stored on magnetic tape depends on the location
of the data.
A memory unit stores binary information in groups of bits called words. A word in
memory is an entity of bits that move in and out of storage as a unit. A memory word
is a group of 1’s and 0’s and may represent a number, an instruction, one or more
alphanumeric characters, or any other binary‐coded information. A group of 8 bits is
called a byte. Most computer memories use words that are multiples of 8 bits in length.
Thus, a 16‐bit word contains two bytes, and a 32‐bit word is made up of four bytes. The
capacity of a memory unit is usually stated as the total number of bytes that the unit
can store.

FIGURE 7.2
Block diagram of a memory unit: n data input lines and k address lines enter the unit, Read and Write control inputs select the operation, and n data output lines leave it; the unit stores 2^k words of n bits per word.

Communication between memory and its environment is achieved through data input
and output lines, address selection lines, and control lines that specify the direction of
transfer. A block diagram of a memory unit is shown in Fig. 7.2. The n data input lines
provide the information to be stored in memory, and the n data output lines supply the
information coming out of memory. The k address lines specify the particular word
chosen among the many available. The two control inputs specify the direction of trans-
fer desired: The Write input causes binary data to be transferred into the memory, and
the Read input causes binary data to be transferred out of memory.
The memory unit is specified by the number of words it contains and the number of
bits in each word. The address lines select one particular word. Each word in memory
is assigned an identification number, called an address, starting from 0 up to 2^k - 1,
where k is the number of address lines. The selection of a specific word inside memory
is done by applying the k‐bit address to the address lines. An internal decoder accepts
this address and opens the paths needed to select the word specified. Memories vary
greatly in size and may range from 1,024 words, requiring an address of 10 bits, to 2^32
words, requiring 32 address bits. It is customary to refer to the number of words (or
bytes) in memory with one of the letters K (kilo), M (mega), and G (giga). K is equal to
2^10, M is equal to 2^20, and G is equal to 2^30. Thus, 64K = 2^16, 2M = 2^21, and 4G = 2^32.
Consider, for example, a memory unit with a capacity of 1K words of 16 bits each.
Since 1K = 1,024 = 2^10 and 16 bits constitute two bytes, we can say that the memory
can accommodate 2,048 = 2K bytes. Figure 7.3 shows possible contents of the first
three and the last three words of this memory. Each word contains 16 bits that can be
divided into two bytes. The words are recognized by their decimal address from 0 to
1,023. The equivalent binary address consists of 10 bits. The first address is specified with
ten 0's; the last address is specified with ten 1's, because 1,023 in binary is equal to
1111111111. A word in memory is selected by its binary address. When a word is read or
written, the memory operates on all 16 bits as a single unit.
The 1K * 16 memory of Fig. 7.3 has 10 bits in the address and 16 bits in each word.
As another example, a 64K * 10 memory will have 16 bits in the address (since
64K = 2^16) and each word will consist of 10 bits. The number of address bits needed in

Memory address

Binary          Decimal     Memory content
0000000000      0           1011010101011101
0000000001      1           1010101110001001
0000000010      2           0000110101000110
...             ...         ...
1111111101      1021        1001110100010100
1111111110      1022        0000110100011110
1111111111      1023        1101111000100101

FIGURE 7.3
Contents of a 1024 * 16 memory

a memory is dependent on the total number of words that can be stored in the memory
and is independent of the number of bits in each word. The number of bits in the address
is determined from the relationship 2^k >= m, where m is the total number of words and
k is the number of address bits needed to satisfy the relationship.
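The relationship above is easy to apply mechanically. As a quick sketch (illustrative code, not part of the text; the function name is our own), the smallest k satisfying 2^k >= m can be computed as:

```python
def address_bits(num_words: int) -> int:
    """Smallest k such that 2**k >= num_words (at least 1)."""
    # (num_words - 1).bit_length() avoids floating-point log2 rounding issues.
    return max(1, (num_words - 1).bit_length())

# Examples from the text: 1,024 words need 10 address bits,
# 64K words need 16, and 4G words need 32.
```

For a word count that is not a power of two, such as 1,000 words, the result rounds up to the next power of two (10 bits).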

Write and Read Operations


The two operations that RAM can perform are the write and read operations. As alluded
to earlier, the write signal specifies a transfer‐in operation and the read signal specifies
a transfer‐out operation. On accepting one of these control signals, the internal circuits
inside the memory provide the desired operation.
The steps that must be taken for the purpose of transferring a new word to be stored
into memory are as follows:
1. Apply the binary address of the desired word to the address lines.
2. Apply the data bits that must be stored in memory to the data input lines.
3. Activate the write input.
The memory unit will then take the bits from the input data lines and store them in the
word specified by the address lines.
The steps that must be taken for the purpose of transferring a stored word out of
memory are as follows:
1. Apply the binary address of the desired word to the address lines.
2. Activate the read input.

Table 7.1
Control Inputs to Memory Chip

Memory Enable   Read/Write   Memory Operation
0               X            None
1               0            Write to selected word
1               1            Read from selected word

The memory unit will then take the bits from the word that has been selected by the
address and apply them to the output data lines. The contents of the selected word do
not change after the read operation, i.e., the read operation is nondestructive.
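The write and read sequences described above can be summarized in a small behavioral model (a Python sketch of our own, not the book's notation; class and method names are illustrative):

```python
class MemoryUnit:
    """Behavioral model of a 2**k x n random-access memory."""

    def __init__(self, k: int, n: int):
        self.n = n
        self.words = [0] * (2 ** k)   # all words initially 0

    def write(self, address: int, data: int) -> None:
        # Steps: apply the address, apply the data bits, activate write.
        self.words[address] = data & ((1 << self.n) - 1)

    def read(self, address: int) -> int:
        # Steps: apply the address, activate read.
        # Reading is nondestructive: the stored word is unchanged.
        return self.words[address]

ram = MemoryUnit(k=10, n=16)          # a 1K x 16 memory
ram.write(3, 0b1011010101011101)
```

The model captures only the word-level behavior; timing and control signals are covered later in this section.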
Commercial memory components available in integrated‐circuit chips sometimes
provide the two control inputs for reading and writing in a somewhat different configu-
ration. Instead of having separate read and write inputs to control the two operations,
most integrated circuits provide two other control inputs: One input selects the unit and
the other determines the operation. The memory operations that result from these
control inputs are specified in Table 7.1.
The memory enable (sometimes called the chip select) is used to enable the particu-
lar memory chip in a multichip implementation of a large memory. When the memory
enable is inactive, the memory chip is not selected and no operation is performed. When
the memory enable input is active, the read/write input determines the operation to be
performed.
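The behavior of Table 7.1 amounts to a simple two-input decode, sketched here for illustration (the function name is our own):

```python
def memory_operation(enable: int, read_write: int) -> str:
    """Decode the Memory Enable and Read/Write inputs per Table 7.1."""
    if not enable:
        return "none"    # chip not selected; Read/Write is a don't-care
    return "read" if read_write else "write"
```

Note that when the enable is 0, the read/write input is ignored entirely, which is what allows many chips to share the same read/write line in a multichip memory.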

Memory Description in HDL


Memory is modeled in the Verilog hardware description language (HDL) by an array
of registers. It is declared with a reg keyword, using a two‐dimensional array. The first
number in the array specifies the number of bits in a word (the word length) and the
second gives the number of words in memory (memory depth). For example, a memory
of 1,024 words with 16 bits per word is declared as
reg [15:0] memword [0:1023];

This statement describes a two-dimensional array of 1,024 registers, each containing 16
bits. The second array range in the declaration of memword specifies the total number
of words in memory and is equivalent to the address of the memory. For example,
memword[512] refers to the 16‐bit memory word at address 512.
The operation of a memory unit is illustrated in HDL Example 7.1. The memory has
64 words of four bits each. There are two control inputs: Enable and ReadWrite. The
DataIn and DataOut lines have four bits each. The input Address must have six bits
(since 2^6 = 64). The memory is declared as a two-dimensional array of registers, with
Mem used as an identifier that can be referenced with an index to access any of the
64 words. A memory operation requires that the Enable input be active. The ReadWrite
input determines the type of operation. If ReadWrite is 1, the memory performs a read
operation symbolized by the statement

DataOut ← Mem[Address];

Execution of this statement causes a transfer of four bits from the selected memory word
specified by Address onto the DataOut lines. If ReadWrite is 0, the memory performs a
write operation symbolized by the statement
Mem[Address] ← DataIn;

Execution of this statement causes a transfer from the four‐bit DataIn lines into the
memory word selected by Address. When Enable is equal to 0, the memory is disabled
and the outputs are assumed to be in a high‐impedance state, indicated by the symbol z.
Thus, the memory has three‐state outputs.

HDL Example 7.1

// Read and write operations of memory
// Memory size is 64 words of four bits each.
module memory (Enable, ReadWrite, Address, DataIn, DataOut);
  input Enable, ReadWrite;
  input [3:0] DataIn;
  input [5:0] Address;
  output [3:0] DataOut;
  reg [3:0] DataOut;
  reg [3:0] Mem [0:63];          // 64 x 4 memory
  // Sensitivity list includes every signal read in the block so that
  // the outputs track changes in Address and DataIn as well.
  always @ (Enable or ReadWrite or Address or DataIn)
    if (Enable)
      if (ReadWrite) DataOut = Mem[Address];   // Read
      else Mem[Address] = DataIn;              // Write
    else DataOut = 4'bz;                       // High-impedance state
endmodule

Timing Waveforms
The operation of the memory unit is controlled by an external device such as a central
processing unit (CPU). The CPU is usually synchronized by its own clock. The memory,
however, does not employ an internal clock. Instead, its read and write operations are
specified by control inputs. The access time of memory is the time required to select a
word and read it. The cycle time of memory is the time required to complete a write
operation. The CPU must provide the memory control signals in such a way as to syn-
chronize its internal clocked operations with the read and write operations of memory.
This means that the access time and cycle time of the memory must be within a time
equal to a fixed number of CPU clock cycles.
Suppose as an example that a CPU operates with a clock frequency of 50 MHz, giv-
ing a period of 20 ns for one clock cycle. Suppose also that the CPU communicates with
a memory whose access time and cycle time do not exceed 50 ns. This means that the

FIGURE 7.4
Memory cycle timing waveforms: (a) write cycle, (b) read cycle. Each cycle spans three 20-ns clock periods T1, T2, and T3 against the 50-ns memory cycle time. The waveforms show the clock, the memory address (address valid), the memory enable, the read/write line (initiate writing with the written data latched, or initiate read), and the data input or output (data valid).

write cycle terminates the storage of the selected word within a 50-ns interval and that
the read cycle provides the output data of the selected word within 50 ns or less. (The
two numbers are not always the same.) Since the period of the CPU cycle is 20 ns, it will
be necessary to devote at least two-and-a-half, and possibly three, clock cycles for each
memory request.
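The cycle count follows directly from the two periods; a quick numerical check (illustrative, variable names are our own):

```python
import math

clock_period_ns = 20    # one period of the 50-MHz CPU clock
memory_cycle_ns = 50    # worst-case memory access/cycle time

# Whole CPU clock cycles that must be devoted to one memory request:
cycles_needed = math.ceil(memory_cycle_ns / clock_period_ns)
```

Since 50/20 = 2.5 and control signals change only on clock edges, the CPU rounds up and allots three full clock cycles per memory request, as Fig. 7.4 shows.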
The memory timing shown in Fig. 7.4 is for a CPU with a 50-MHz clock and a memory
with a 50-ns maximum cycle time. The write cycle in part (a) shows three 20-ns cycles: T1,

T2, and T3. For a write operation, the CPU must provide the address and input data to
the memory. This is done at the beginning of T1. (The two lines that cross each other in
the address and data waveforms designate a possible change in value of the multiple
lines.) The memory enable and the read/write signals must be activated after the signals
on the address lines are stable in order to avoid destroying data in other memory words.
The memory enable signal switches to the high level and the read/write signal switches
to the low level to indicate a write operation. The two control signals must stay active
for at least 50 ns. The address and data signals must remain stable for a short time after
the control signals are deactivated. At the completion of the third clock cycle, the
memory write operation is completed and the CPU can access the memory again with the
next T1 cycle.
The read cycle shown in Fig. 7.4(b) has an address for the memory provided by the
CPU. The memory-enable and read/write signals must be at their high level for a read
operation. The memory places the data of the word selected by the address onto the
output data lines within a 50-ns interval (or less) from the time that the memory enable
is activated. The CPU can transfer the data into one of its internal registers during the
negative transition of T3. The next T1 cycle is available for another memory request.

Types of Memories
The mode of access of a memory system is determined by the type of components used.
In a random‐access memory, the word locations may be thought of as being separated
in space, each word occupying one particular location. In a sequential‐access memory,
the information stored in some medium is not immediately accessible, but is available
only at certain intervals of time. A magnetic disk or tape unit is of this type. Each
memory location passes the read and write heads in turn, but information is read out
only when the requested word has been reached. In a random‐access memory, the access
time is always the same regardless of the particular location of the word. In a sequential‐
access memory, the time it takes to access a word depends on the position of the word
with respect to the position of the read head; therefore, the access time is variable.
Integrated circuit RAM units are available in two operating modes: static and
dynamic. Static RAM (SRAM) consists essentially of internal latches that store the
binary information. The stored information remains valid as long as power is applied to
the unit. Dynamic RAM (DRAM) stores the binary information in the form of electric
charges on capacitors provided inside the chip by MOS transistors. The stored charge
on the capacitors tends to discharge with time, and the capacitors must be periodically
recharged by refreshing the dynamic memory. Refreshing is done by cycling through the
words every few milliseconds to restore the decaying charge. DRAM offers reduced
power consumption and larger storage capacity in a single memory chip. SRAM is
easier to use and has shorter read and write cycles.
Memory units that lose stored information when power is turned off are said to be
volatile. CMOS integrated circuit RAMs, both static and dynamic, are of this category, since
the binary cells need external power to maintain the stored information. In contrast, a
nonvolatile memory, such as magnetic disk, retains its stored information after the removal

of power. This type of memory is able to retain information because the data stored on
magnetic components are represented by the direction of magnetization, which is retained
after power is turned off. ROM is another nonvolatile memory. A nonvolatile memory
enables digital computers to store programs that will be needed again after the computer
is turned on. Programs and data that cannot be altered are stored in ROM, while other
large programs are maintained on magnetic disks. The latter programs are transferred into
the computer RAM as needed. Before the power is turned off, the binary information from
the computer RAM is transferred to the disk so that the information will be retained.

7.3 MEMORY DECODING


In addition to requiring storage components in a memory unit, there is a need for decod-
ing circuits to select the memory word specified by the input address. In this section, we
present the internal construction of a RAM and demonstrate the operation of the
decoder. To be able to include the entire memory in one diagram, the memory unit
presented here has a small capacity of 16 bits, arranged in four words of 4 bits each. An
example of a two‐dimensional coincident decoding arrangement is presented to show a
more efficient decoding scheme that is used in large memories. We then give an example
of address multiplexing commonly used in DRAM integrated circuits.

Internal Construction
The internal construction of a RAM of m words and n bits per word consists of m * n
binary storage cells and associated decoding circuits for selecting individual words. The
binary storage cell is the basic building block of a memory unit. The equivalent logic of
a binary cell that stores one bit of information is shown in Fig. 7.5. The storage part of
the cell is modeled by an SR latch with associated gates to form a D latch. Actually, the

FIGURE 7.5
Memory cell: (a) logic diagram, showing an SR latch with associated gates and the Select, Input, Read/Write, and Output connections; (b) block diagram, showing the cell BC with its Select, Input, and Read/Write inputs and one Output.

cell is an electronic circuit with four to six transistors. Nevertheless, it is possible and
convenient to model it in terms of logic symbols. A binary storage cell must be very small
in order to be able to pack as many cells as possible in the small area available in the
integrated circuit chip. The binary cell stores one bit in its internal latch. The select input
enables the cell for reading or writing, and the read/write input determines the operation
of the cell when it is selected. A 1 in the read/write input provides the read operation by
forming a path from the latch to the output terminal. A 0 in the read/write input provides
the write operation by forming a path from the input terminal to the latch.
The logical construction of a small RAM is shown in Fig. 7.6. This RAM consists of
four words of four bits each and has a total of 16 binary cells. The small blocks labeled
BC represent the binary cell with its three inputs and one output, as specified in
Fig. 7.5(b). A memory with four words needs two address lines. The two address inputs
go through a 2 * 4 decoder to select one of the four words. The decoder is enabled with

FIGURE 7.6
Diagram of a 4 * 4 RAM: the input data lines feed four columns of binary cells (BC); the two address inputs drive a 2 * 4 decoder, enabled by the memory-enable input EN, that selects one of the four words (Word 0 through Word 3); the Read/Write line controls every cell, and the output data lines collect the column outputs.

the memory‐enable input. When the memory enable is 0, all outputs of the decoder are
0 and none of the memory words are selected. With the memory select at 1, one of the
four words is selected, dictated by the value in the two address lines. Once a word has
been selected, the read/write input determines the operation. During the read opera-
tion, the four bits of the selected word go through OR gates to the output terminals.
(Note that the OR gates are drawn according to the array logic established in Fig. 7.1.)
During the write operation, the data available in the input lines are transferred into the
four binary cells of the selected word. The binary cells that are not selected are disabled,
and their previous binary values remain unchanged. When the memory select input that
goes into the decoder is equal to 0, none of the words are selected and the contents of
all cells remain unchanged regardless of the value of the read/write input.
Commercial RAMs may have a capacity of thousands of words, and each word may
range from 1 to 64 bits. The logical construction of a large‐capacity memory would be a
direct extension of the configuration shown here. A memory with 2^k words of n bits per
word requires k address lines that go into a k * 2^k decoder. Each one of the decoder
outputs selects one word of n bits for reading or writing.

Coincident Decoding
A decoder with k inputs and 2^k outputs requires 2^k AND gates with k inputs per gate.
The total number of gates and the number of inputs per gate can be reduced by
employing two decoders in a two-dimensional selection scheme. The basic idea in
two-dimensional decoding is to arrange the memory cells in an array that is as close as
possible to square. In this configuration, two k/2-input decoders are used instead of
one k‐input decoder. One decoder performs the row selection and the other the col-
umn selection in a two‐dimensional matrix configuration.
The two‐dimensional selection pattern is demonstrated in Fig. 7.7 for a 1K‐word
memory. Instead of using a single 10 * 1,024 decoder, we use two 5 * 32 decoders.
With the single decoder, we would need 1,024 AND gates with 10 inputs in each. In the
two‐decoder case, we need 64 AND gates with 5 inputs in each. The five most significant
bits of the address go to input X and the five least significant bits go to input Y. Each
word within the memory array is selected by the coincidence of one X line and one Y
line. Thus, each word in memory is selected by the coincidence between 1 of 32 rows and
1 of 32 columns, for a total of 1,024 words. Note that each intersection represents a word
that may have any number of bits.
As an example, consider the word whose address is 404. The 10‐bit binary equivalent
of 404 is 01100 10100. This makes X = 01100 (binary 12) and Y = 10100 (binary 20).
The n‐bit word that is selected lies in the X decoder output number 12 and the Y decoder
output number 20. All the bits of the word are selected for reading or writing.
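The row/column split in the example can be checked numerically. The sketch below (our own illustrative code) splits a 10-bit address into its two 5-bit decoder inputs:

```python
def split_address(address: int, half_bits: int = 5):
    """Split a 2*half_bits-bit address into (row X, column Y) decoder inputs."""
    y = address & ((1 << half_bits) - 1)   # least significant half selects the column
    x = address >> half_bits               # most significant half selects the row
    return x, y

x, y = split_address(404)   # 404 = 01100 10100 in binary
```

For address 404 this yields X = 12 and Y = 20, matching the decoder outputs named in the text. The same split also shows the hardware saving: one 10-input decoder needs 1,024 ten-input AND gates, while the two 5-input decoders together need only 64 five-input gates.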

Address Multiplexing
The SRAM memory cell modeled in Fig. 7.5 typically contains six transistors. In order to
build memories with higher density, it is necessary to reduce the number of transistors in
a cell. The DRAM cell contains a single MOS transistor and a capacitor. The charge stored

FIGURE 7.7
Two-dimensional decoding structure for a 1K-word memory: two 5 * 32 decoders drive the rows (X) and columns (Y) of a 32 * 32 array. For the binary address 01100 10100, X decoder output 12 and Y decoder output 20 coincide to select the word.

on the capacitor discharges with time, and the memory cells must be periodically recharged
by refreshing the memory. Because of their simple cell structure, DRAMs typically have
four times the density of SRAMs. This allows four times as much memory capacity to be
placed on a given size of chip. The cost per bit of DRAM storage is three to four times
less than that of SRAM storage. A further cost savings is realized because of the lower
power requirement of DRAM cells. These advantages make DRAM the preferred tech-
nology for large memories in personal digital computers. DRAM chips are available in
capacities from 64K to 256M bits. Most DRAMs have a 1‐bit word size, so several chips
have to be combined to produce a larger word size.
Because of their large capacity, the address decoding of DRAMs is arranged in a
two‐dimensional array, and larger memories often have multiple arrays. To reduce the
number of pins in the IC package, designers utilize address multiplexing whereby one
set of address input pins accommodates the address components. In a two‐dimensional
array, the address is applied in two parts at different times, with the row address first and
the column address second. Since the same set of pins is used for both parts of the
address, the size of the package is decreased significantly.
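The two-step application of a wide address over a shared set of pins can be sketched as a small model (illustrative Python, not a hardware description; names are our own):

```python
class MultiplexedAddress:
    """Model of RAS/CAS address capture over 8 shared address pins."""

    def __init__(self):
        self.row = None
        self.col = None

    def strobe_row(self, addr8: int) -> None:
        # Row address register is loaded when RAS is asserted (active low).
        self.row = addr8 & 0xFF

    def strobe_col(self, addr8: int) -> None:
        # Column address register is loaded when CAS is asserted (active low).
        self.col = addr8 & 0xFF

    def full_address(self) -> int:
        """Reassemble the 16-bit word address from the two captured halves."""
        return (self.row << 8) | self.col

mux = MultiplexedAddress()
mux.strobe_row(0x12)   # row address applied first
mux.strobe_col(0x34)   # column address applied second
```

Eight pins thus carry a 16-bit address, at the cost of the two strobe signals and the extra timing step.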
We will use a 64K‐word memory to illustrate the address‐multiplexing idea.
A diagram of the decoding configuration is shown in Fig. 7.8. The memory consists of

FIGURE 7.8
Address multiplexing for a 64K DRAM: an 8-bit address bus feeds an 8-bit row address register (loaded by RAS) and an 8-bit column register (loaded by CAS); each register drives an 8 * 256 decoder into a 256 * 256 memory cell array with a Read/Write control and single data-in and data-out lines.

a two‐dimensional array of cells arranged into 256 rows by 256 columns, for a total of
28 * 28 = 216 = 64K words. There is a single data input line, a single data output line,
and a read/write control, as well as an eight‐bit address input and two address strobes,
the latter included for enabling the row and column address into their respective regis-
ters. The row address strobe (RAS) enables the eight‐bit row register, and the column
address strobe (CAS) enables the eight‐bit column register. The bar on top of the name
of the strobe symbol indicates that the registers are enabled on the zero level of the
signal.
The 16‐bit address is applied to the DRAM in two steps using RAS and CAS. Initially,
both strobes are in the 1 state. The 8‐bit row address is applied to the address inputs and
RAS is changed to 0. This loads the row address into the row address register. RAS also
enables the row decoder so that it can decode the row address and select one row of the
array. After a time equivalent to the settling time of the row selection, RAS goes back
to the 1 level. The 8‐bit column address is then applied to the address inputs, and CAS
is driven to the 0 state. This transfers the column address into the column register and

7.5 READ-ONLY MEMORY
A read‐only memory (ROM) is essentially a memory device in which permanent binary
information is stored. The binary information must be specified by the designer and is
then embedded in the unit to form the required interconnection pattern. Once the pat-
tern is established, it stays within the unit even when power is turned off and on again.
A block diagram of a ROM consisting of k inputs and n outputs is shown in Fig. 7.9.
The inputs provide the address for memory, and the outputs give the data bits of the
stored word that is selected by the address. The number of words in a ROM is
determined from the fact that k address input lines are needed to specify 2^k words. Note that
ROM does not have data inputs, because it does not have a write operation. Integrated

FIGURE 7.9
ROM block diagram: k address inputs into a 2^k * n ROM, producing n data outputs

circuit ROM chips have one or more enable inputs and sometimes come with three‐state
outputs to facilitate the construction of large arrays of ROM.
Consider, for example, a 32 * 8 ROM. The unit consists of 32 words of 8 bits each.
There are five input lines that form the binary numbers from 0 through 31 for the
address. Figure 7.10 shows the internal logic construction of this ROM. The five inputs
are decoded into 32 distinct outputs by means of a 5 * 32 decoder. Each output of the
decoder represents a memory address. The 32 outputs of the decoder are connected to
each of the eight OR gates. The diagram shows the array logic convention used in
complex circuits. (See Fig. 7.1.) Each OR gate must be considered as having 32 inputs. Each
output of the decoder is connected to one of the inputs of each OR gate. Since each OR
gate has 32 input connections and there are 8 OR gates, the ROM contains 32 * 8 = 256
internal connections. In general, a 2^k * n ROM will have an internal k * 2^k decoder
and n OR gates. Each OR gate has 2^k inputs, which are connected to each of the outputs
of the decoder.
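Viewed purely as a storage structure, a programmed 2^k * n ROM behaves like a fixed table indexed by address. A minimal behavioral sketch (our own illustrative model, not a hardware description):

```python
class ROM:
    """Behavioral model of a 2**k x n read-only memory."""

    def __init__(self, contents, n: int):
        # Contents are fixed at "programming" time; there is no write operation.
        self.contents = list(contents)
        self.n = n

    def read(self, address: int) -> int:
        # The internal decoder selects one decoder output; the OR-gate
        # array produces the n stored bits of that word.
        return self.contents[address]

# A tiny 4 x 3 ROM for illustration:
rom = ROM([0b101, 0b010, 0b111, 0b000], n=3)
```

The absence of a write method mirrors the absence of data inputs on the real device.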

FIGURE 7.10
Internal logic of a 32 * 8 ROM: inputs I0 through I4 drive a 5 * 32 decoder whose 32 outputs (0 through 31) cross the inputs of eight OR gates producing outputs A7 through A0; every crosspoint is programmable.

The 256 intersections in Fig. 7.10 are programmable. A programmable connection
between two lines is logically equivalent to a switch that can be altered to be either
closed (meaning that the two lines are connected) or open (meaning that the two
lines are disconnected). The programmable intersection between two lines is some-
times called a crosspoint. Various physical devices are used to implement crosspoint
switches. One of the simplest technologies employs a fuse that normally connects the
two points, but is opened or “blown” by the application of a high‐voltage pulse into
the fuse.
The internal binary storage of a ROM is specified by a truth table that shows the
word content in each address. For example, the content of a 32 * 8 ROM may be
specified with a truth table similar to the one shown in Table 7.3. The truth table shows
the five inputs under which are listed all 32 addresses. Each address stores a word of
8 bits, which is listed in the outputs columns. The table shows only the first four and
the last four words in the ROM. The complete table must include the list of all
32 words.
The hardware procedure that programs the ROM blows fuse links in accordance with
a given truth table. For example, programming the ROM according to the truth table
given by Table 7.3 results in the configuration shown in Fig. 7.11. Every 0 listed in the
truth table specifies the absence of a connection, and every 1 listed specifies a path that
is obtained by a connection. For example, the table specifies the eight‐bit word 10110010
for permanent storage at address 3. The four 0’s in the word are programmed by blowing
the fuse links between output 3 of the decoder and the inputs of the OR gates associated
with outputs A6, A3, A2, and A0. The four 1’s in the word are marked with a * to denote
a temporary connection, in place of a dot used for a permanent connection in logic
diagrams. When the input of the ROM is 00011, all the outputs of the decoder are 0
except for output 3, which is at logic 1. The signal equivalent to logic 1 at decoder output
3 propagates through the connections to the OR gate outputs of A7, A5, A4, and A1. The
other four outputs remain at 0. The result is that the stored word 10110010 is applied to
the eight data outputs.

Table 7.3
ROM Truth Table (Partial)

         Inputs                      Outputs
I4  I3  I2  I1  I0     A7  A6  A5  A4  A3  A2  A1  A0
0   0   0   0   0      1   0   1   1   0   1   1   0
0   0   0   0   1      0   0   0   1   1   1   0   1
0   0   0   1   0      1   1   0   0   0   1   0   1
0   0   0   1   1      1   0   1   1   0   0   1   0
          ...                          ...
1   1   1   0   0      0   0   0   0   1   0   0   1
1   1   1   0   1      1   1   1   0   0   0   1   0
1   1   1   1   0      0   1   0   0   1   0   1   0
1   1   1   1   1      0   0   1   1   0   0   1   1

FIGURE 7.11
Programming the ROM according to Table 7.3: the 5 * 32 decoder and eight OR gates of Fig. 7.10, with the crosspoints corresponding to the 1's of the truth table left connected (marked *) and those corresponding to the 0's blown open.

Combinational Circuit Implementation


In Section 4.9, it was shown that a decoder generates the 2^k minterms of the k input
variables. By inserting OR gates to sum the minterms of Boolean functions, we were
able to generate any desired combinational circuit. The ROM is essentially a device that
includes both the decoder and the OR gates within a single device to form a minterm
generator. By choosing connections for those minterms which are included in the func-
tion, the ROM outputs can be programmed to represent the Boolean functions of the
output variables in a combinational circuit.
The internal operation of a ROM can be interpreted in two ways. The first interpreta-
tion is that of a memory unit that contains a fixed pattern of stored words. The second
interpretation is that of a unit which implements a combinational circuit. From this point
of view, each output terminal is considered separately as the output of a Boolean func-
tion expressed as a sum of minterms. For example, the ROM of Fig. 7.11 may be consid-
ered to be a combinational circuit with eight outputs, each a function of the five input
variables. Output A7 can be expressed in sum of minterms as
A7(I4, I3, I2, I1, I0) = Σ(0, 2, 3, …, 29)
(The three dots represent minterms 4 through 27, which are not specified in the figure.)
A connection marked with * in the figure produces a minterm for the sum. All other
crosspoints are not connected and are not included in the sum.
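The sum-of-minterms view can be expressed directly: an output is 1 exactly when the applied address equals one of its programmed minterms. A small Python sketch follows; only the four minterms explicitly named in the text are included, since the full programmed set is not specified:

```python
# A ROM output in sum-of-minterms form: it is 1 exactly when the
# address matches one of the programmed minterms.
def sum_of_minterms(minterms):
    return lambda address: 1 if address in minterms else 0

# Minterms 0, 2, 3, and 29 are explicitly named for output A7; the
# remaining programmed minterms (4 through 27) are left unspecified.
A7 = sum_of_minterms({0, 2, 3, 29})
assert A7(3) == 1 and A7(1) == 0
```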
In practice, when a combinational circuit is designed by means of a ROM, it is not
necessary to design the logic or to show the internal gate connections inside the unit. All
that the designer has to do is specify the particular ROM by its IC number and provide
the applicable truth table. The truth table gives all the information for programming the
ROM. No internal logic diagram is needed to accompany the truth table.

EXAMPLE 7.1
Design a combinational circuit using a ROM. The circuit accepts a three‐bit number and
outputs a binary number equal to the square of the input number.
The first step is to derive the truth table of the combinational circuit. In most cases,
this is all that is needed. In other cases, we can use a partial truth table for the ROM by
utilizing certain properties in the output variables. Table 7.4 is the truth table for the
combinational circuit. Three inputs and six outputs are needed to accommodate all
possible binary numbers. We note that output B0 is always equal to input A0, so there
is no need to generate B0 with a ROM, since it is equal to an input variable. Moreover,
output B1 is always 0, so this output is a known constant. We actually need to generate
only four outputs with the ROM; the other two are readily obtained. The minimum size
of ROM needed must have three inputs and four outputs. Three inputs specify eight
words, so the ROM must be of size 8 × 4. The ROM implementation is shown in
Fig. 7.12. The three inputs specify eight words of four bits each. The truth table in
Fig. 7.12(b) specifies the information needed for programming the ROM. The block
diagram of Fig. 7.12(a) shows the required connections of the combinational circuit.

Table 7.4
Truth Table for Circuit of Example 7.1
Inputs Outputs
A2 A1 A0 B5 B4 B3 B2 B1 B0 Decimal

0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 1 1
0 1 0 0 0 0 1 0 0 4
0 1 1 0 0 1 0 0 1 9
1 0 0 0 1 0 0 0 0 16
1 0 1 0 1 1 0 0 1 25
1 1 0 1 0 0 1 0 0 36
1 1 1 1 1 0 0 0 1 49

[Figure 7.12(a) block diagram: inputs A0, A1, A2 drive an 8 × 4 ROM whose outputs are B2, B3, B4, B5; output B0 is wired directly to A0, and B1 is tied to the constant 0.]

(b) ROM truth table:

A2 A1 A0    B5 B4 B3 B2
 0  0  0     0  0  0  0
 0  0  1     0  0  0  0
 0  1  0     0  0  0  1
 0  1  1     0  0  1  0
 1  0  0     0  1  0  0
 1  0  1     0  1  1  0
 1  1  0     1  0  0  1
 1  1  1     1  1  0  0

FIGURE 7.12
ROM implementation of Example 7.1
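A quick way to check the table in Fig. 7.12(b) is to generate it: the ROM stores only bits B5–B2 of the square, with B1 fixed at 0 and B0 taken directly from A0. A Python sketch (variable and function names are illustrative):

```python
# Generate the 8 x 4 ROM contents of Example 7.1: only B5..B2 of the
# square are stored, since B1 is always 0 and B0 equals A0.
rom = [(a * a) >> 2 for a in range(8)]      # drop the two low bits

def square(a: int) -> int:
    """Reassemble the 6-bit square from the ROM word plus A0."""
    return (rom[a & 0b111] << 2) | (a & 1)  # B1 = 0, B0 = A0

assert [square(a) for a in range(8)] == [0, 1, 4, 9, 16, 25, 36, 49]
assert format(rom[0b110], "04b") == "1001"  # row 110 of Fig. 7.12(b)
```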


Types of ROMs
The required paths in a ROM may be programmed in four different ways. The first
is called mask programming and is done by the semiconductor company during the
last fabrication process of the unit. The procedure for fabricating a ROM requires
that the customer fill out the truth table he or she wishes the ROM to satisfy. The
truth table may be submitted in a special form provided by the manufacturer or in
a specified format on a computer output medium. The manufacturer makes the cor-
responding mask for the paths to produce the 1’s and 0’s according to the customer’s
truth table. This procedure is costly because the vendor charges the customer a
special fee for custom masking the particular ROM. For this reason, mask program-
ming is economical only if a large quantity of the same ROM configuration is to be
ordered.
For small quantities, it is more economical to use a second type of ROM called pro-
grammable read‐only memory, or PROM. When ordered, PROM units contain all the
fuses intact, giving all 1’s in the bits of the stored words. The fuses in the PROM are
blown by the application of a high‐voltage pulse to the device through a special pin.
A blown fuse defines a binary 0 state and an intact fuse gives a binary 1 state. This pro-
cedure allows the user to program the PROM in the laboratory to achieve the desired
relationship between input addresses and stored words. Special instruments called
PROM programmers are available commercially to facilitate the procedure. In any case,
all procedures for programming ROMs are hardware procedures, even though the word
programming is used.
The hardware procedure for programming ROMs or PROMs is irreversible, and once
programmed, the fixed pattern is permanent and cannot be altered. Once a bit pattern
has been established, the unit must be discarded if the bit pattern is to be changed. A
third type of ROM is the erasable PROM, or EPROM, which can be restructured to the
initial state even though it has been programmed previously. When the EPROM is
placed under a special ultraviolet light for a given length of time, the shortwave radiation
discharges the internal floating gates that serve as the programmed connections. After
erasure, the EPROM returns to its initial state and can be reprogrammed to a new set
of values.
The fourth type of ROM is the electrically erasable PROM (EEPROM or E²PROM).
This device is like the EPROM, except that the previously programmed connections can
be erased with an electrical signal instead of ultraviolet light. The advantage is that the
device can be erased without removing it from its socket.
Flash memory devices are similar to EEPROMs, but have additional built‐in circuitry
to selectively program and erase the device in‐circuit, without the need for a special
programmer. They have widespread application in modern technology in cell phones,
digital cameras, set‐top boxes, digital TV, telecommunications, nonvolatile data storage,
and microcontrollers. Their low consumption of power makes them an attractive storage
medium for laptop and notebook computers. Flash memories incorporate additional
circuitry, too, allowing simultaneous erasing of blocks of memory, for example, of size
16 to 64 K bytes. Like EEPROMs, flash memories are subject to fatigue, typically having
about 10⁵ block erase cycles.
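That endurance figure can be pictured as a per-block erase budget. Below is a toy Python model of erase-cycle counting; the class name and limit are illustrative only, and real flash controllers additionally spread wear across blocks (wear leveling):

```python
# Toy model of flash fatigue: each block tolerates a limited number of
# erase cycles (on the order of 10**5) before it can no longer be used.
ERASE_LIMIT = 10**5

class FlashBlock:
    def __init__(self):
        self.erase_count = 0

    def erase(self):
        if self.erase_count >= ERASE_LIMIT:
            raise RuntimeError("block worn out")
        self.erase_count += 1  # controllers track this to level wear

block = FlashBlock()
for _ in range(3):
    block.erase()
assert block.erase_count == 3
```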

[Figure 7.13 shows the basic configuration of the three PLDs:
(a) Programmable read-only memory (PROM): inputs → fixed AND array (decoder) → programmable OR array → outputs
(b) Programmable array logic (PAL): inputs → programmable AND array → fixed OR array → outputs
(c) Programmable logic array (PLA): inputs → programmable AND array → programmable OR array → outputs]

FIGURE 7.13
Basic configuration of three PLDs

Combinational PLDs
The PROM is a combinational programmable logic device (PLD)—an integrated circuit
with programmable gates divided into an AND array and an OR array to provide an
AND–OR sum‐of‐product implementation. There are three major types of combina-
tional PLDs, differing in the placement of the programmable connections in the AND–
OR array. Figure 7.13 shows the configuration of the three PLDs. The PROM has a fixed
AND array constructed as a decoder and a programmable OR array. The programmable
OR gates implement the Boolean functions in sum‐of‐minterms form. The PAL has a
programmable AND array and a fixed OR array. The AND gates are programmed to
provide the product terms for the Boolean functions, which are logically summed in each
OR gate. The most flexible PLD is the PLA, in which both the AND and OR arrays can
be programmed. The product terms in the AND array may be shared by any OR gate
to provide the required sum‐of‐products implementation. The names PAL and PLA
emerged from different vendors during the development of PLDs. The implementation
of combinational circuits with PROM was demonstrated in this section. The design of
combinational circuits with PLA and PAL is presented in the next two sections.
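The PLA's defining feature, product terms shared by any OR gate, can be sketched in a few lines of Python. The two output functions below are hypothetical, chosen only to show one AND term feeding two OR gates:

```python
# PLA sketch: product terms are programmed once in the AND array and
# may be shared by every OR gate. F1 = AB + AC', F2 = AB + BC.
product_terms = {
    "AB":  lambda a, b, c: a & b,
    "AC'": lambda a, b, c: a & (c ^ 1),
    "BC":  lambda a, b, c: b & c,
}
or_array = {              # programmable OR array: terms each output sums
    "F1": ("AB", "AC'"),
    "F2": ("AB", "BC"),   # "AB" is shared with F1 -- the PLA's advantage
}

def pla(a, b, c):
    value = {name: term(a, b, c) for name, term in product_terms.items()}
    return {out: int(any(value[t] for t in terms))
            for out, terms in or_array.items()}

assert pla(1, 1, 0) == {"F1": 1, "F2": 1}  # term AB drives both outputs
```

In a PROM the same functions would each need a full set of minterms; the PLA implements them with three product terms total.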
2   Chapter 1 / Basic Concepts and Computer Evolution

Learning Objectives
After studying this chapter, you should be able to:
■■ Explain the general functions and structure of a digital computer.
■■ Present an overview of the evolution of computer technology from early
digital computers to the latest microprocessors.
■■ Present an overview of the evolution of the x86 architecture.
■■ Define embedded systems and list some of the requirements and constraints
that various embedded systems must meet.

1.1 Organization and Architecture

In describing computers, a distinction is often made between computer architecture
and computer organization. Although it is difficult to give precise definitions
for these terms, a consensus exists about the general areas covered by each. For
example, see [VRAN80], [SIEW82], and [BELL78a]; an interesting alternative view
is presented in [REDD76].
Computer architecture refers to those attributes of a system visible to a pro-
grammer or, put another way, those attributes that have a direct impact on the
logical execution of a program. A term that is often used interchangeably with com-
puter architecture is instruction set architecture (ISA). The ISA defines instruction
formats, instruction opcodes, registers, instruction and data memory; the effect of
executed instructions on the registers and memory; and an algorithm for control-
ling instruction execution. Computer organization refers to the operational units
and their interconnections that realize the architectural specifications. Examples of
architectural attributes include the instruction set, the number of bits used to repre-
sent various data types (e.g., numbers, characters), I/O mechanisms, and techniques
for addressing memory. Organizational attributes include those hardware details
transparent to the programmer, such as control signals; interfaces between the com-
puter and peripherals; and the memory technology used.
For example, it is an architectural design issue whether a computer will have
a multiply instruction. It is an organizational issue whether that instruction will be
implemented by a special multiply unit or by a mechanism that makes repeated
use of the add unit of the system. The organizational decision may be based on the
anticipated frequency of use of the multiply instruction, the relative speed of the
two approaches, and the cost and physical size of a special multiply unit.
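The organizational trade-off can be made concrete: a multiply instruction realized by repeated use of the add unit amounts to a shift-and-add loop. The Python sketch below (for nonnegative integers, and not modeled on any particular machine) performs no arithmetic other than addition:

```python
# A multiply "instruction" organized as repeated use of the add unit:
# classic shift-and-add, so the only arithmetic performed is addition.
def multiply_via_adds(multiplicand: int, multiplier: int) -> int:
    product = 0
    while multiplier:
        if multiplier & 1:
            product += multiplicand  # one pass through the add unit
        multiplicand <<= 1           # shifting is wiring, not arithmetic
        multiplier >>= 1
    return product

assert multiply_via_adds(13, 11) == 143
```

A dedicated multiply unit produces the same architectural result in fewer cycles; the instruction set (architecture) is unchanged either way.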
Historically, and still today, the distinction between architecture and organ-
ization has been an important one. Many computer manufacturers offer a family of
computer models, all with the same architecture but with differences in organization.
Consequently, the different models in the family have different price and perform-
ance characteristics. Furthermore, a particular architecture may span many years
and encompass a number of different computer models, its organization changing
with changing technology. A prominent example of both these phenomena is the
IBM System/370 architecture. This architecture was first introduced in 1970 and
included a number of models. The customer with modest requirements could buy a
cheaper, slower model and, if demand increased, later upgrade to a more expensive,
faster model without having to abandon software that had already been developed.
Over the years, IBM has introduced many new models with improved technology
to replace older models, offering the customer greater speed, lower cost, or both.
These newer models retained the same architecture so that the customer’s soft-
ware investment was protected. Remarkably, the System/370 architecture, with a
few enhancements, has survived to this day as the architecture of IBM’s mainframe
product line.
In a class of computers called microcomputers, the relationship between archi-
tecture and organization is very close. Changes in technology not only influence
organization but also result in the introduction of more powerful and more complex
architectures. Generally, there is less of a requirement for generation-to-generation
compatibility for these smaller machines. Thus, there is more interplay between
organizational and architectural design decisions. An intriguing example of this is
the reduced instruction set computer (RISC), which we examine in Chapter 15.
This book examines both computer organization and computer architecture.
The emphasis is perhaps more on the side of organization. However, because a
computer organization must be designed to implement a particular architectural
specification, a thorough treatment of organization requires a detailed examination
of architecture as well.

1.2 Structure and Function

A computer is a complex system; contemporary computers contain millions of
elementary electronic components. How, then, can one clearly describe them? The
key is to recognize the hierarchical nature of most complex systems, including the
computer [SIMO96]. A hierarchical system is a set of interrelated subsystems, each
of the latter, in turn, hierarchical in structure until we reach some lowest level of
elementary subsystem.
The hierarchical nature of complex systems is essential to both their design
and their description. The designer need only deal with a particular level of the
system at a time. At each level, the system consists of a set of components and
their interrelationships. The behavior at each level depends only on a simplified,
abstracted characterization of the system at the next lower level. At each level, the
designer is concerned with structure and function:
■■ Structure: The way in which the components are interrelated.
■■ Function: The operation of each individual component as part of the structure.
In terms of description, we have two choices: starting at the bottom and build-
ing up to a complete description, or beginning with a top view and decomposing the
system into its subparts. Evidence from a number of fields suggests that the top-down
approach is the clearest and most effective [WEIN75].
The approach taken in this book follows from this viewpoint. The computer
system will be described from the top down. We begin with the major components
of a computer, describing their structure and function, and proceed to successively
lower layers of the hierarchy. The remainder of this section provides a very brief
overview of this plan of attack.

Function
Both the structure and functioning of a computer are, in essence, simple. In general
terms, there are only four basic functions that a computer can perform:
■■ Data processing: Data may take a wide variety of forms, and the range of pro-
cessing requirements is broad. However, we shall see that there are only a few
fundamental methods or types of data processing.
■■ Data storage: Even if the computer is processing data on the fly (i.e., data
come in and get processed, and the results go out immediately), the computer
must temporarily store at least those pieces of data that are being worked on
at any given moment. Thus, there is at least a short-term data storage function.
Equally important, the computer performs a long-term data storage function.
Files of data are stored on the computer for subsequent retrieval and update.
■■ Data movement: The computer’s operating environment consists of devices
that serve as either sources or destinations of data. When data are received
from or delivered to a device that is directly connected to the computer, the
process is known as input–output (I/O), and the device is referred to as a
peripheral. When data are moved over longer distances, to or from a remote
device, the process is known as data communications.
■■ Control: Within the computer, a control unit manages the computer’s
resources and orchestrates the performance of its functional parts in response
to instructions.
The preceding discussion may seem absurdly generalized. It is certainly
possible, even at a top level of computer structure, to differentiate a variety of func-
tions, but to quote [SIEW82]:

There is remarkably little shaping of computer structure to fit the


function to be performed. At the root of this lies the ­general-​­purpose
nature of computers, in which all the functional specialization occurs
at the time of programming and not at the time of design.

Structure
We now look in a general way at the internal structure of a computer. We begin with
a traditional computer with a single processor that employs a microprogrammed
control unit, then examine a typical multicore structure.

simple single-processor computer Figure 1.1 provides a hierarchical view
of the internal structure of a traditional single-processor computer. There are four
main structural components:
■■ Central processing unit (CPU): Controls the operation of the computer and
performs its data processing functions; often simply referred to as processor.
■■ Main memory: Stores data.
[Figure 1.1 depicts three nested views: the COMPUTER (I/O, main memory, system bus, CPU); the CPU (registers, ALU, internal bus, control unit); and the CONTROL UNIT (sequencing logic, control unit registers and decoders, control memory).]

Figure 1.1 The Computer: Top-Level Structure

■■ I/O: Moves data between the computer and its external environment.
■■ System interconnection: Some mechanism that provides for communication
among CPU, main memory, and I/O. A common example of system intercon-
nection is by means of a system bus, consisting of a number of conducting
wires to which all the other components attach.
There may be one or more of each of the aforementioned components. Tra-
ditionally, there has been just a single processor. In recent years, there has been
increasing use of multiple processors in a single computer. Some design issues relat-
ing to multiple processors crop up and are discussed as the text proceeds; Part Five
focuses on such computers.

Each of these components will be examined in some detail in Part Two. How-
ever, for our purposes, the most interesting and in some ways the most complex
component is the CPU. Its major structural components are as follows:
■■ Control unit: Controls the operation of the CPU and hence the computer.
■■ Arithmetic and logic unit (ALU): Performs the computer’s data processing
functions.
■■ Registers: Provides storage internal to the CPU.
■■ CPU interconnection: Some mechanism that provides for communication
among the control unit, ALU, and registers.
Part Three covers these components, where we will see that complexity is added by
the use of parallel and pipelined organizational techniques. Finally, there are sev-
eral approaches to the implementation of the control unit; one common approach is
a microprogrammed implementation. In essence, a microprogrammed control unit
operates by executing microinstructions that define the functionality of the control
unit. With this approach, the structure of the control unit can be depicted, as in
Figure 1.1. This structure is examined in Part Four.
multicore computer structure As was mentioned, contemporary
computers generally have multiple processors. When these processors all reside
on a single chip, the term multicore computer is used, and each processing unit
(consisting of a control unit, ALU, registers, and perhaps cache) is called a core. To
clarify the terminology, this text will use the following definitions.
■■ Central processing unit (CPU): That portion of a computer that fetches and
executes instructions. It consists of an ALU, a control unit, and registers.
In a system with a single processing unit, it is often simply referred to as a
processor.
■■ Core: An individual processing unit on a processor chip. A core may be equiv-
alent in functionality to a CPU on a single-CPU system. Other specialized pro-
cessing units, such as one optimized for vector and matrix operations, are also
referred to as cores.
■■ Processor: A physical piece of silicon containing one or more cores. The
processor is the computer component that interprets and executes instruc-
tions. If a processor contains multiple cores, it is referred to as a multicore
processor.
After about a decade of discussion, there is broad industry consensus on this usage.
Another prominent feature of contemporary computers is the use of multiple
layers of memory, called cache memory, between the processor and main memory.
Chapter 4 is devoted to the topic of cache memory. For our purposes in this section,
we simply note that a cache memory is smaller and faster than main memory and is
used to speed up memory access, by placing in the cache data from main memory,
that is likely to be used in the near future. A greater performance improvement may
be obtained by using multiple levels of cache, with level 1 (L1) closest to the core
and additional levels (L2, L3, and so on) progressively farther from the core. In this
scheme, level n is smaller and faster than level n + 1.
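The payoff of this hierarchy is often summarized by the average memory access time (AMAT), where each miss at one level pays the access time of the next. The Python sketch below uses illustrative hit times and hit rates, not measured values:

```python
# Average memory access time for a multilevel hierarchy:
# AMAT = T(L1) + miss(L1) * (T(L2) + miss(L2) * T(memory)), and so on.
def amat(levels):
    """levels: (hit_time, hit_rate) pairs, fastest level first; the
    last level must have hit_rate 1.0 (main memory always responds)."""
    time, reach = 0.0, 1.0   # reach = fraction of accesses that get here
    for hit_time, hit_rate in levels:
        time += reach * hit_time
        reach *= 1.0 - hit_rate
    return time

# Illustrative numbers: L1 1 ns / 90%, L2 4 ns / 80%, DRAM 60 ns:
assert abs(amat([(1, 0.90), (4, 0.80), (60, 1.0)]) - 2.6) < 1e-9
```

With these numbers the average access costs 2.6 ns rather than the 60 ns of main memory alone, which is why each level being "smaller and faster" matters.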

Figure 1.2 is a simplified view of the principal components of a typical multicore
computer. Most computers, including embedded computers in smartphones
and tablets, plus personal computers, laptops, and workstations, are housed on a
motherboard. Before describing this arrangement, we need to define some terms.
A printed circuit board (PCB) is a rigid, flat board that holds and interconnects
chips and other electronic components. The board is made of layers, typically two
to ten, that interconnect components via copper pathways that are etched into
the board. The main printed circuit board in a computer is called a system board
or motherboard, while smaller ones that plug into the slots in the main board are
called expansion boards.
The most prominent elements on the motherboard are the chips. A chip is
a single piece of semiconducting material, typically silicon, upon which electronic
circuits and logic gates are fabricated. The resulting product is referred to as an
integrated circuit.

[Figure 1.2 shows three nested views: the MOTHERBOARD (main memory chips, processor chip, I/O chips); the PROCESSOR CHIP (eight cores and two L3 cache blocks); and a single CORE (instruction logic, arithmetic and logic unit (ALU), load/store logic, L1 I-cache, L1 data cache, L2 instruction cache, L2 data cache).]

Figure 1.2 Simplified View of Major Elements of a Multicore Computer



The motherboard contains a slot or socket for the processor chip, which typ-
ically contains multiple individual cores, in what is known as a multicore processor.
There are also slots for memory chips, I/O controller chips, and other key computer
components. For desktop computers, expansion slots enable the inclusion of more
components on expansion boards. Thus, a modern motherboard connects only a
few individual chip components, with each chip containing from a few thousand up
to hundreds of millions of transistors.
Figure 1.2 shows a processor chip that contains eight cores and an L3 cache.
Not shown is the logic required to control operations between the cores and the
cache and between the cores and the external circuitry on the motherboard. The
figure indicates that the L3 cache occupies two distinct portions of the chip surface.
However, typically, all cores have access to the entire L3 cache via the aforemen-
tioned control circuits. The processor chip shown in Figure 1.2 does not represent
any specific product, but provides a general idea of how such chips are laid out.
Next, we zoom in on the structure of a single core, which occupies a portion of
the processor chip. In general terms, the functional elements of a core are:
■■ Instruction logic: This includes the tasks involved in fetching instructions,
and decoding each instruction to determine the instruction operation and the
memory locations of any operands.
■■ Arithmetic and logic unit (ALU): Performs the operation specified by an
instruction.
■■ Load/store logic: Manages the transfer of data to and from main memory via
cache.
The core also contains an L1 cache, split between an instruction cache
(I-cache) that is used for the transfer of instructions to and from main memory, and
an L1 data cache, for the transfer of operands and results. Typically, today’s pro-
cessor chips also include an L2 cache as part of the core. In many cases, this cache
is also split between instruction and data caches, although a combined, single L2
cache is also used.
Keep in mind that this representation of the layout of the core is only intended
to give a general idea of internal core structure. In a given product, the functional
elements may not be laid out as the three distinct elements shown in Figure 1.2,
especially if some or all of these functions are implemented as part of a micropro-
grammed control unit.
examples It will be instructive to look at some real-world examples that
illustrate the hierarchical structure of computers. Figure 1.3 is a photograph of the
motherboard for a computer built around two Intel ­Quad-​­Core Xeon processor
chips. Many of the elements labeled on the photograph are discussed subsequently
in this book. Here, we mention the most important, in addition to the processor
sockets:
■■ PCI-Express slots for a high-end display adapter and for additional peripher-
als (Section 3.6 describes PCIe).
■■ Ethernet controller and Ethernet ports for network connections.
■■ USB sockets for peripheral devices.
[Figure 1.3 labels the motherboard's major elements: two quad-core Intel® Xeon® processors with integrated memory controllers, the Intel® 3420 chipset, six-channel DDR3-1333 memory interfaces (up to 48 GB), Serial ATA/300 (SATA) interfaces, internal and external USB 2.0 ports, VGA video output, BIOS, two 10/100/1000Base-T Ethernet ports with Ethernet controller, power and backplane I/O, PCI Express® connectors A–C, and clock.]

Figure 1.3 Motherboard with Two Intel Quad-Core Xeon Processors

Source: Chassis Plans, www.chassis-plans.com

■ Serial ATA (SATA) sockets for connection to disk memory (Section 7.7
discusses Ethernet, USB, and SATA).
■ Interfaces for DDR (double data rate) main memory chips (Section 5.3
discusses DDR).
■ Intel 3420 chipset: an I/O controller for direct memory access operations
between peripheral devices and main memory (Section 7.5 discusses DMA).
Following our top-down strategy, as illustrated in Figures 1.1 and 1.2, we can
now zoom in and look at the internal structure of a processor chip. For variety, we
look at an IBM chip instead of the Intel processor chip. Figure 1.4 is a photograph
of the processor chip for the IBM zEnterprise EC12 mainframe computer. This chip
has 2.75 billion transistors. The superimposed labels indicate how the silicon real
estate of the chip is allocated. We see that this chip has six cores, or processors.
In addition, there are two large areas labeled L3 cache, which are shared by all six
processors. The L3 control logic controls traffic between the L3 cache and the cores
and between the L3 cache and the external environment. Additionally, there is stor-
age control (SC) logic between the cores and the L3 cache. The memory controller
(MC) function controls access to memory external to the chip. The GX I/O bus
controls the interface to the channel adapters accessing the I/O.
Going down one level deeper, we examine the internal structure of a single
core, as shown in the photograph of Figure 1.5. Keep in mind that this is a portion
of the silicon surface area making up a single-processor chip. The main sub-areas
within this core area are the following:
■ ISU (instruction sequence unit): Determines the sequence in which instructions
are executed in what is referred to as a superscalar architecture (Chapter 16).
■ IFU (instruction fetch unit): Logic for fetching instructions.
■ IDU (instruction decode unit): The IDU is fed from the IFU buffers, and is
responsible for the parsing and decoding of all z/Architecture operation codes.
■ LSU (load-store unit): The LSU contains the 96-kB L1 data cache,¹ and man-
ages data traffic between the L2 data cache and the functional execution
units. It is responsible for handling all types of operand accesses of all lengths,
modes, and formats as defined in the z/Architecture.
■ XU (translation unit): This unit translates logical addresses from instructions
into physical addresses in main memory. The XU also contains a translation
lookaside buffer (TLB) used to speed up memory access. TLBs are discussed
in Chapter 8.
■ FXU (fixed-point unit): The FXU executes fixed-point arithmetic operations.
■ BFU (binary floating-point unit): The BFU handles all binary and hexadeci-
mal floating-point operations, as well as fixed-point multiplication operations.
■ DFU (decimal floating-point unit): The DFU handles both fixed-point and
floating-point operations on numbers that are stored as decimal digits.
■ RU (recovery unit): The RU keeps a copy of the complete state of the sys-
tem that includes all registers, collects hardware fault signals, and manages the
hardware recovery actions.
■ COP (dedicated co-processor): The COP is responsible for data compression
and encryption functions for each core.
■ I-cache: This is a 64-kB L1 instruction cache, allowing the IFU to prefetch
instructions before they are needed.
■ L2 control: This is the control logic that manages the traffic through the two
L2 caches.
■ Data-L2: A 1-MB L2 data cache for all memory traffic other than instructions.
■ Instr-L2: A 1-MB L2 instruction cache.
As we progress through the book, the concepts introduced in this section will
become clearer.

¹ kB = kilobyte = 2¹⁰ = 1,024 bytes. Numerical prefixes are explained in a document under the "Other
Useful" tab at ComputerScienceStudent.com.
Part Two: The Computer System

Chapter 3
A Top-Level View of Computer Function and Interconnection
3.1 Computer Components
3.2 Computer Function
Instruction Fetch and Execute
Interrupts
I/O Function
3.3 Interconnection Structures
3.4 Bus Interconnection
3.5 Point-to-Point Interconnect
QPI Physical Layer
QPI Link Layer
QPI Routing Layer
QPI Protocol Layer
3.6 PCI Express
PCI Physical and Logical Architecture
PCIe Physical Layer
PCIe Transaction Layer
PCIe Data Link Layer
3.7 Key Terms, Review Questions, and Problems


Learning Objectives
After studying this chapter, you should be able to:
■■ Understand the basic elements of an instruction cycle and the role of
interrupts.
■■ Describe the concept of interconnection within a computer system.
■■ Assess the relative advantages of point-to-point interconnection compared to
bus interconnection.
■■ Present an overview of QPI.
■■ Present an overview of PCIe.

At a top level, a computer consists of CPU (central processing unit), memory, and
I/O components, with one or more modules of each type. These components are
interconnected in some fashion to achieve the basic function of the computer,
which is to execute programs. Thus, at a top level, we can characterize a computer
system by describing (1) the external behavior of each component, that is, the data
and control signals that it exchanges with other components, and (2) the intercon-
nection structure and the controls required to manage the use of the interconnec-
tion structure.
This top-level view of structure and function is important because of its
explanatory power in understanding the nature of a computer. Equally important is
its use to understand the increasingly complex issues of performance evaluation. A
grasp of the top-level structure and function offers insight into system bottlenecks,
alternate pathways, the magnitude of system failures if a component fails, and the
ease of adding performance enhancements. In many cases, requirements for greater
system power and fail-safe capabilities are being met by changing the design rather
than merely increasing the speed and reliability of individual components.
This chapter focuses on the basic structures used for computer component
interconnection. As background, the chapter begins with a brief examination of the
basic components and their interface requirements. Then a functional overview is
provided. We are then prepared to examine the use of buses to interconnect system
components.

3.1 Computer Components

As discussed in Chapter 1, virtually all contemporary computer designs are based
on concepts developed by John von Neumann at the Institute for Advanced Study,
Princeton. Such a design is referred to as the von Neumann architecture and is based
on three key concepts:
■■ Data and instructions are stored in a single read–write memory.
■■ The contents of this memory are addressable by location, without regard to
the type of data contained there.
■■ Execution occurs in a sequential fashion (unless explicitly modified) from one
instruction to the next.
The reasoning behind these concepts was discussed in Chapter 2 but is worth
summarizing here. There is a small set of basic logic components that can be com-
bined in various ways to store binary data and perform arithmetic and logical
operations on that data. If there is a particular computation to be performed, a con-
figuration of logic components designed specifically for that computation could be
constructed. We can think of the process of connecting the various components in
the desired configuration as a form of programming. The resulting “program” is in
the form of hardware and is termed a hardwired program.
Now consider this alternative. Suppose we construct a general-purpose con-
figuration of arithmetic and logic functions. This set of hardware will perform vari-
ous functions on data depending on control signals applied to the hardware. In the
original case of customized hardware, the system accepts data and produces results
(Figure 3.1a). With general-purpose hardware, the system accepts data and control
signals and produces results. Thus, instead of rewiring the hardware for each new
program, the programmer merely needs to supply a new set of control signals.
How shall control signals be supplied? The answer is simple but subtle. The
entire program is actually a sequence of steps. At each step, some arithmetic or
logical operation is performed on some data. For each step, a new set of control sig-
nals is needed. Let us provide a unique code for each possible set of control signals,

Figure 3.1 Hardware and Software Approaches
(a) Programming in hardware: data enters a fixed sequence of arithmetic and logic functions, and results come out.
(b) Programming in software: instruction codes enter an instruction interpreter, which issues control signals to general-purpose arithmetic and logic functions; those functions transform data into results.

and let us add to the general-purpose hardware a segment that can accept a code
and generate control signals (Figure 3.1b).
Programming is now much easier. Instead of rewiring the hardware for each
new program, all we need to do is provide a new sequence of codes. Each code is, in
effect, an instruction, and part of the hardware interprets each instruction and gen-
erates control signals. To distinguish this new method of programming, a sequence
of codes or instructions is called software.
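The code-driven scheme of Figure 3.1b can be sketched in a few lines of Python. This is a toy illustration only: the codes happen to match opcodes used later in this chapter, but the control-signal names are invented for the example.

```python
# Toy sketch of Figure 3.1b: each instruction code selects a set of
# control signals that drive general-purpose hardware.  The signal
# names here are invented for illustration.

CONTROL_STORE = {
    0b0001: ("LOAD",),          # route a memory word into the accumulator
    0b0101: ("FETCH", "ADD"),   # fetch an operand, then add it
    0b0010: ("STORE",),         # route the accumulator back to memory
}

def control_signals(code):
    """The 'instruction interpreter': map a code to its control signals."""
    return CONTROL_STORE[code]

print(control_signals(0b0101))   # prints ('FETCH', 'ADD')
```

Supplying a new program is then just supplying a new sequence of codes; the hardware (here, the dispatch table) never changes.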
Figure 3.1b indicates two major components of the system: an instruction
interpreter and a module of general-purpose arithmetic and logic functions. These
two constitute the CPU. Several other components are needed to yield a function-
ing computer. Data and instructions must be put into the system. For this we need
some sort of input module. This module contains basic components for accepting
data and instructions in some form and converting them into an internal form
of signals usable by the system. A means of reporting results is needed, and this
is in the form of an output module. Taken together, these are referred to as I/O
components.
One more component is needed. An input device will bring instructions and
data in sequentially. But a program is not invariably executed sequentially; it may
jump around (e.g., the IAS jump instruction). Similarly, operations on data may
require access to more than just one element at a time in a predetermined sequence.
Thus, there must be a place to temporarily store both instructions and data. That
module is called memory, or main memory, to distinguish it from external storage or
peripheral devices. Von Neumann pointed out that the same memory could be used
to store both instructions and data.
Figure 3.2 illustrates these top-level components and suggests the interac-
tions among them. The CPU exchanges data with memory. For this purpose, it typ-
ically makes use of two internal (to the CPU) registers: a memory address register
(MAR), which specifies the address in memory for the next read or write, and a
memory buffer register (MBR), which contains the data to be written into memory
or receives the data read from memory. Similarly, an I/O address register (I/OAR)
specifies a particular I/O device. An I/O buffer register (I/OBR) is used for the
exchange of data between an I/O module and the CPU.
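The role of the MAR and MBR in a memory exchange can be sketched as follows; the addresses and contents are invented for illustration.

```python
# Sketch of a memory read and a memory write mediated by the MAR and MBR.
# Addresses and word contents are invented for illustration.

memory = {300: 0x1940, 940: 0x0003}   # address -> 16-bit word

# Read: the CPU places the address in MAR; the word arrives in MBR.
MAR = 300
MBR = memory[MAR]
assert MBR == 0x1940

# Write: the CPU places the address in MAR and the data in MBR.
MAR, MBR = 941, 0x0005
memory[MAR] = MBR
assert memory[941] == 0x0005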
A memory module consists of a set of locations, defined by sequentially num-
bered addresses. Each location contains a binary number that can be interpreted as
either an instruction or data. An I/O module transfers data from external devices to
CPU and memory, and vice versa. It contains internal buffers for temporarily hold-
ing these data until they can be sent on.
Having looked briefly at these major components, we now turn to an overview
of how these components function together to execute programs.

3.2 Computer Function

The basic function performed by a computer is execution of a program, which consists
of a set of instructions stored in memory. The processor does the actual work by
executing instructions specified in the program. This section provides an overview of

CPU Main memory


0
System 1
bus 2
PC MAR
Instruction
Instruction
Instruction
IR MBR

I/O AR
Data
Execution
unit Data
I/O BR
Data
Data

I/O Module n–2


n–1

PC = Program counter
Buffers IR = Instruction register
MAR = Memory address register
MBR = Memory buffer register
I/O AR = Input/output address register
I/O BR = Input/output buffer register
Figure 3.2 Computer Components: Top-Level View

the key elements of program execution. In its simplest form, instruction processing
consists of two steps: The processor reads (fetches) instructions from memory one
at a time and executes each instruction. Program execution consists of repeating
the process of instruction fetch and instruction execution. The instruction execution
may involve several operations and depends on the nature of the instruction (see, for
example, the lower portion of Figure 2.4).
The processing required for a single instruction is called an instruction cycle.
Using the simplified two-step description given previously, the instruction cycle is
depicted in Figure 3.3. The two steps are referred to as the fetch cycle and the ­execute
cycle. Program execution halts only if the machine is turned off, some sort of unrecov-
erable error occurs, or a program instruction that halts the computer is encountered.

Instruction Fetch and Execute


At the beginning of each instruction cycle, the processor fetches an instruction from
memory. In a typical processor, a register called the program counter (PC) holds the
address of the instruction to be fetched next. Unless told otherwise, the processor

Figure 3.3 Basic Instruction Cycle
START → fetch next instruction (fetch cycle) → execute instruction (execute cycle), repeating until HALT.

always increments the PC after each instruction fetch so that it will fetch the next
instruction in sequence (i.e., the instruction located at the next higher memory
address). So, for example, consider a computer in which each instruction occupies
one 16-bit word of memory. Assume that the program counter is set to memory loca-
tion 300, where the location address refers to a 16-bit word. The processor will next
fetch the instruction at location 300. On succeeding instruction cycles, it will fetch
instructions from locations 301, 302, 303, and so on. This sequence may be altered, as
explained presently.
The fetched instruction is loaded into a register in the processor known as
the instruction register (IR). The instruction contains bits that specify the action
the processor is to take. The processor interprets the instruction and performs the
required action. In general, these actions fall into four categories:
■■ Processor-memory: Data may be transferred from processor to memory or
from memory to processor.
■■ Processor-I/O: Data may be transferred to or from a peripheral device by
transferring between the processor and an I/O module.
■■ Data processing: The processor may perform some arithmetic or logic oper-
ation on data.
■■ Control: An instruction may specify that the sequence of execution be altered.
For example, the processor may fetch an instruction from location 149, which
specifies that the next instruction be from location 182. The processor will
remember this fact by setting the program counter to 182. Thus, on the next
fetch cycle, the instruction will be fetched from location 182 rather than 150.
An instruction’s execution may involve a combination of these actions.
Consider a simple example using a hypothetical machine that includes the
characteristics listed in Figure 3.4. The processor contains a single data register,
called an accumulator (AC). Both instructions and data are 16 bits long. Thus, it is
convenient to organize memory using 16-bit words. The instruction format provides
4 bits for the opcode, so that there can be as many as 2^4 = 16 different opcodes, and
up to 2^12 = 4096 (4K) words of memory can be directly addressed.
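The split between opcode and address fields can be made concrete with a short decoding sketch:

```python
# Decoding the instruction format of Figure 3.4a: bits 0-3 are the opcode,
# bits 4-15 the address.

def decode(word):
    opcode = (word >> 12) & 0xF   # top 4 bits: 2**4 = 16 possible opcodes
    address = word & 0xFFF        # low 12 bits: 2**12 = 4096 addressable words
    return opcode, address

# 1940 (hex) decodes to opcode 1 (load AC) and address 940 (hex).
assert decode(0x1940) == (0x1, 0x940)
```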
Figure 3.5 illustrates a partial program execution, showing the relevant por-
tions of memory and processor registers.1 The program fragment shown adds the
contents of the memory word at address 940 to the contents of the memory word at

1 Hexadecimal notation is used, in which each digit represents 4 bits. This is the most convenient notation
for representing the contents of memory and registers when the word length is a multiple of 4. See
Chapter 9 for a basic refresher on number systems (decimal, binary, hexadecimal).

Figure 3.4 Characteristics of a Hypothetical Machine
(a) Instruction format (16 bits): bits 0–3 hold the opcode, bits 4–15 the address.
(b) Integer format (16 bits): bit 0 holds the sign, bits 1–15 the magnitude.
(c) Internal CPU registers: Program counter (PC) = address of instruction; Instruction register (IR) = instruction being executed; Accumulator (AC) = temporary storage.
(d) Partial list of opcodes: 0001 = Load AC from memory; 0010 = Store AC to memory; 0101 = Add to AC from memory.

Figure 3.5 Example of Program Execution (contents of memory and registers in hexadecimal)
Memory holds 1940 at location 300, 5941 at 301, 2941 at 302, 0003 at 940, and 0002 at 941.
Step 1 (fetch): PC = 300, IR = 1940. Step 2 (execute): PC = 301, AC = 0003.
Step 3 (fetch): PC = 301, IR = 5941. Step 4 (execute): PC = 302, AC = 0005 (3 + 2 = 5).
Step 5 (fetch): PC = 302, IR = 2941. Step 6 (execute): PC = 303, location 941 = 0005.

address 941 and stores the result in the latter location. Three instructions, which can
be described as three fetch and three execute cycles, are required:
1. The PC contains 300, the address of the first instruction. This instruction (the
value 1940 in hexadecimal) is loaded into the instruction register IR, and
the PC is incremented. Note that this process involves the use of a memory
address register and a memory buffer register. For simplicity, these intermedi-
ate registers are ignored.
2. The first 4 bits (first hexadecimal digit) in the IR indicate that the AC is to be
loaded. The remaining 12 bits (three hexadecimal digits) specify the address
(940) from which data are to be loaded.
3. The next instruction (5941) is fetched from location 301, and the PC is
incremented.
4. The old contents of the AC and the contents of location 941 are added, and
the result is stored in the AC.
5. The next instruction (2941) is fetched from location 302, and the PC is
incremented.
6. The contents of the AC are stored in location 941.
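The six steps above can be reproduced with a short fetch–execute simulation of the hypothetical machine (opcodes from Figure 3.4d; addresses and contents in hexadecimal, as in Figure 3.5):

```python
# Sketch of the Figure 3.5 trace: the three instructions at 300-302 add
# the words at 940 and 941 and store the sum back at 941.

memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
          0x940: 0x0003, 0x941: 0x0002}
PC, AC = 0x300, 0

for _ in range(3):                       # three instruction cycles
    IR = memory[PC]                      # fetch cycle
    PC += 1
    opcode, addr = IR >> 12, IR & 0xFFF  # execute cycle begins
    if opcode == 0x1:                    # load AC from memory
        AC = memory[addr]
    elif opcode == 0x5:                  # add to AC from memory
        AC += memory[addr]
    elif opcode == 0x2:                  # store AC to memory
        memory[addr] = AC

assert AC == 5 and memory[0x941] == 5 and PC == 0x303
```

For simplicity the sketch, like the text, ignores the MAR and MBR that would mediate each memory access.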
In this example, three instruction cycles, each consisting of a fetch cycle and an
execute cycle, are needed to add the contents of location 940 to the contents of 941.
With a more complex set of instructions, fewer cycles would be needed. Some older
processors, for example, included instructions that contain more than one memory
address. Thus, the execution cycle for a particular instruction on such processors
could involve more than one reference to memory. Also, instead of memory refer-
ences, an instruction may specify an I/O operation.
For example, the PDP-11 processor includes an instruction, expressed symboli-
cally as ADD B,A, that stores the sum of the contents of memory locations B and A
into memory location A. A single instruction cycle with the following steps occurs:
■ Fetch the ADD instruction.
■ Read the contents of memory location A into the processor.
■ Read the contents of memory location B into the processor. In order that the
contents of A are not lost, the processor must have at least two registers for
storing memory values, rather than a single accumulator.
■ Add the two values.
■ Write the result from the processor to memory location A.
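The data movement in that sequence can be sketched as follows; the addresses and values are invented for illustration.

```python
# Sketch of the ADD B,A execution: two internal registers hold the
# operands so that the contents of A are not lost before the add.

memory = {'A': 10, 'B': 32}

# ... after fetching and decoding the ADD instruction:
reg1 = memory['A']          # read location A into the processor
reg2 = memory['B']          # read location B into a second register
memory['A'] = reg1 + reg2   # add, then write the result back to A

assert memory['A'] == 42 and memory['B'] == 32
```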
Thus, the execution cycle for a particular instruction may involve more than one
reference to memory. Also, instead of memory references, an instruction may specify
an I/O operation. With these additional considerations in mind, Figure 3.6 provides a
more detailed look at the basic instruction cycle of Figure 3.3. The figure is in the form
of a state diagram. For any given instruction cycle, some states may be null and others
may be visited more than once. The states can be described as follows:
Figure 3.6 Instruction Cycle State Diagram
The upper states (instruction fetch, operand fetch, operand store) involve exchanges with memory or I/O; the lower states (instruction address calculation, instruction operation decoding, operand address calculation, data operation) are internal to the processor. Arcs labeled “multiple operands” and “multiple results” loop through the operand states; “instruction complete, fetch next instruction” returns to the start; “return for string or vector data” repeats the operand fetch/store states.

■ Instruction address calculation (iac): Determine the address of the next


instruction to be executed. Usually, this involves adding a fixed number to
the address of the previous instruction. For example, if each instruction is 16
bits long and memory is organized into 16-bit words, then add 1 to the previ-
ous address. If, instead, memory is organized as individually addressable 8-bit
bytes, then add 2 to the previous address.
■ Instruction fetch (if): Read instruction from its memory location into the
processor.
■ Instruction operation decoding (iod): Analyze instruction to determine type
of operation to be performed and operand(s) to be used.
■ Operand address calculation (oac): If the operation involves reference to an
operand in memory or available via I/O, then determine the address of the
operand.
■ Operand fetch (of): Fetch the operand from memory or read it in from I/O.
■ Data operation (do): Perform the operation indicated in the instruction.
■ Operand store (os): Write the result into memory or out to I/O.
States in the upper part of Figure 3.6 involve an exchange between the pro-
cessor and either memory or an I/O module. States in the lower part of the diagram
involve only internal processor operations. The oac state appears twice, because
an instruction may involve a read, a write, or both. However, the action performed
during that state is fundamentally the same in both cases, and so only a single state
identifier is needed.
Also note that the diagram allows for multiple operands and multiple results,
because some instructions on some machines require this. For example, the PDP-11
instruction ADD A,B results in the following sequence of states: iac, if, iod, oac, of,
oac, of, do, oac, os.
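That state sequence follows mechanically from the number of operands and results; a small sketch makes the pattern explicit:

```python
# Sketch deriving the state sequence for an instruction with a given
# number of memory operands and memory results, as in the PDP-11 ADD A,B.

def instruction_states(n_operands, n_results):
    states = ["iac", "if", "iod"]    # address calc, fetch, decode
    for _ in range(n_operands):
        states += ["oac", "of"]      # operand address calc + operand fetch
    states.append("do")              # data operation
    for _ in range(n_results):
        states += ["oac", "os"]      # result address calc + operand store
    return states

assert instruction_states(2, 1) == [
    "iac", "if", "iod", "oac", "of", "oac", "of", "do", "oac", "os"]
```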
Finally, on some machines, a single instruction can specify an operation to be per-
formed on a vector (one-dimensional array) of numbers or a string (one-dimensional

array) of characters. As Figure 3.6 indicates, this would involve repetitive operand
fetch and/or store operations.

Interrupts
Virtually all computers provide a mechanism by which other modules (I/O, memory)
may interrupt the normal processing of the processor. Table 3.1 lists the most com-
mon classes of interrupts. The specific nature of these interrupts is examined later in
this book, especially in Chapters 7 and 14. However, we need to introduce the concept
now to understand more clearly the nature of the instruction cycle and the impli-
cations of interrupts on the interconnection structure. The reader need not be con-
cerned at this stage about the details of the generation and processing of interrupts,
but only focus on the communication between modules that results from interrupts.
Interrupts are provided primarily as a way to improve processing efficiency.
For example, most external devices are much slower than the processor. Suppose
that the processor is transferring data to a printer using the instruction cycle scheme
of Figure 3.3. After each write operation, the processor must pause and remain
idle until the printer catches up. The length of this pause may be on the order of
many hundreds or even thousands of instruction cycles that do not involve memory.
Clearly, this is a very wasteful use of the processor.
Figure 3.7a illustrates this state of affairs. The user program performs a ser-
ies of WRITE calls interleaved with processing. Code segments 1, 2, and 3 refer to
sequences of instructions that do not involve I/O. The WRITE calls are to an I/O
program that is a system utility and that will perform the actual I/O operation. The
I/O program consists of three sections:
■■ A sequence of instructions, labeled 4 in the figure, to prepare for the actual I/O
operation. This may include copying the data to be output into a special buffer
and preparing the parameters for a device command.
■■ The actual I/O command. Without the use of interrupts, once this command
is issued, the program must wait for the I/O device to perform the requested
­function (or periodically poll the device). The program might wait by simply
repeatedly performing a test operation to determine if the I/O operation is done.
■■ A sequence of instructions, labeled 5 in the figure, to complete the operation.
This may include setting a flag indicating the success or failure of the operation.

Table 3.1 Classes of Interrupts
Program: Generated by some condition that occurs as a result of an instruction execution, such as arithmetic overflow, division by zero, attempt to execute an illegal machine instruction, or reference outside a user’s allowed memory space.
Timer: Generated by a timer within the processor. This allows the operating system to perform certain functions on a regular basis.
I/O: Generated by an I/O controller, to signal normal completion of an operation, request service from the processor, or to signal a variety of error conditions.
Hardware Failure: Generated by a failure such as power failure or memory parity error.

Figure 3.7 Program Flow of Control without and with Interrupts
(a) No interrupts: user code segment 1 runs; the WRITE call enters the I/O program (segment 4, the I/O command, then segment 5), and control returns only when the I/O operation finishes; segments 2 and 3 proceed likewise.
(b) Interrupts, short I/O wait: after segment 4 issues the I/O command, control returns to the user program (2a); when the device signals completion, an interrupt transfers control to the interrupt handler (segment 5), after which the user program resumes (2b), and similarly for 3a and 3b.
(c) Interrupts, long I/O wait: as in (b), except the user program reaches the next WRITE before the previous I/O operation completes and must wait.
* = interrupt occurs during course of execution of user program



Because the I/O operation may take a relatively long time to complete, the I/O
program is hung up waiting for the operation to complete; hence, the user program
is stopped at the point of the WRITE call for some considerable period of time.
Interrupts and the Instruction Cycle. With interrupts, the processor can
be engaged in executing other instructions while an I/O operation is in progress.
Consider the flow of control in Figure 3.7b. As before, the user program reaches a
point at which it makes a system call in the form of a WRITE call. The I/O program
that is invoked in this case consists only of the preparation code and the actual I/O
command. After these few instructions have been executed, control returns to the
user program. Meanwhile, the external device is busy accepting data from computer
memory and printing it. This I/O operation is conducted concurrently with the
execution of instructions in the user program.
When the external device becomes ready to be serviced—that is, when it is
ready to accept more data from the processor—the I/O module for that external
device sends an interrupt request signal to the processor. The processor responds by
suspending operation of the current program, branching off to a program to service
that particular I/O device, known as an interrupt handler, and resuming the original
execution after the device is serviced. The points at which such interrupts occur are
indicated by an asterisk in Figure 3.7b.
Let us try to clarify what is happening in Figure 3.7. We have a user program
that contains two WRITE commands. There is a segment of code at the beginning,
then one WRITE command, then a second segment of code, then a second WRITE
command, then a third and final segment of code. The WRITE command invokes
the I/O program provided by the OS. Similarly, the I/O program consists of a seg-
ment of code, followed by an I/O command, followed by another segment of code.
The I/O command invokes a hardware I/O operation.
Figure 3.8 Transfer of Control via Interrupts
The user program executes up to instruction i, at which point an interrupt occurs; control transfers to the interrupt handler; when the handler completes, execution resumes at instruction i + 1 of the user program.

From the point of view of the user program, an interrupt is just that: an interrup-
tion of the normal sequence of execution. When the interrupt processing is completed,
execution resumes (Figure 3.8). Thus, the user program does not have to contain any
special code to accommodate interrupts; the processor and the operating system are
responsible for suspending the user program and then resuming it at the same point.
To accommodate interrupts, an interrupt cycle is added to the instruction
cycle, as shown in Figure 3.9. In the interrupt cycle, the processor checks to see if
any interrupts have occurred, indicated by the presence of an interrupt signal. If no
interrupts are pending, the processor proceeds to the fetch cycle and fetches the
next instruction of the current program. If an interrupt is pending, the processor
does the following:
■■ It suspends execution of the current program being executed and saves its
context. This means saving the address of the next instruction to be executed

Figure 3.9 Instruction Cycle with Interrupts
START → fetch next instruction (fetch cycle) → execute instruction (execute cycle) → if interrupts are enabled, check for an interrupt and process any that is pending (interrupt cycle) before the next fetch; if interrupts are disabled, proceed directly to the next fetch; HALT ends execution.



(current contents of the program counter) and any other data relevant to the
processor’s current activity.
■■ It sets the program counter to the starting address of an interrupt handler routine.
The processor now proceeds to the fetch cycle and fetches the first instruction
in the interrupt handler program, which will service the interrupt. The interrupt
handler program is generally part of the operating system. Typically, this program
determines the nature of the interrupt and performs whatever actions are needed.
In the example we have been using, the handler determines which I/O module gen-
erated the interrupt and may branch to a program that will write more data out to
that I/O module. When the interrupt handler routine is completed, the processor
can resume execution of the user program at the point of interruption.
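The interrupt cycle just described can be sketched in a few lines; the handler address and program counter values are invented for illustration, and the saved “context” is reduced to just the PC.

```python
# Sketch of the interrupt cycle: after an instruction completes, a pending
# interrupt (with interrupts enabled) causes the processor to save the PC
# and branch to the handler.  Addresses are invented for illustration.

HANDLER_ADDR = 0x0F00
interrupts_enabled = True
pending_interrupt = False
PC, saved_PC = 0x300, None

def interrupt_cycle():
    global PC, saved_PC, pending_interrupt
    if interrupts_enabled and pending_interrupt:
        saved_PC = PC             # save context (here, just the PC)
        PC = HANDLER_ADDR         # set PC to the handler's starting address
        pending_interrupt = False

PC += 1                           # one fetch/execute cycle elapses
pending_interrupt = True          # an I/O module raises an interrupt
interrupt_cycle()
assert PC == HANDLER_ADDR and saved_PC == 0x301
```

When the handler finishes, restoring the saved PC resumes the user program at the point of interruption.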
It is clear that there is some overhead involved in this process. Extra instruc-
tions must be executed (in the interrupt handler) to determine the nature of the inter-
rupt and to decide on the appropriate action. Nevertheless, because of the relatively
large amount of time that would be wasted by simply waiting on an I/O operation,
the processor can be employed much more efficiently with the use of interrupts.
To appreciate the gain in efficiency, consider Figure 3.10, which is a timing
diagram based on the flow of control in Figures 3.7a and 3.7b. In this figure, user
program code segments are shaded green, and I/O program code segments are

Time
1 1

4 4
I/O operation
I/O operation;
2a concurrent with
processor waits
processor executing

5 5

2b
2
4
I/O operation
4 3a concurrent with
processor executing
I/O operation;
processor waits 5

5 3b

(b) With interrupts


3

(a) Without interrupts


Figure 3.10 Program Timing: Short I/O Wait

shaded gray. Figure 3.10a shows the case in which interrupts are not used. The pro-
cessor must wait while an I/O operation is performed.
Figures 3.7b and 3.10b assume that the time required for the I/O operation is rela-
tively short: less than the time to complete the execution of instructions between write
operations in the user program. In this case, the segment of code labeled code segment 2
is interrupted. A portion of the code (2a) executes (while the I/O operation is performed)
and then the interrupt occurs (upon the completion of the I/O operation). After the inter-
rupt is serviced, execution resumes with the remainder of code segment 2 (2b).
The more typical case, especially for a slow device such as a printer, is that the
I/O operation will take much more time than executing a sequence of user instruc-
tions. Figure 3.7c indicates this state of affairs. In this case, the user program reaches
the second WRITE call before the I/O operation spawned by the first call is com-
plete. The result is that the user program is hung up at that point. When the preced-
ing I/O operation is completed, this new WRITE call may be processed, and a new
I/O operation may be started. Figure 3.11 shows the timing for this situation with
Figure 3.11 Program Timing: Long I/O Wait
(a) Without interrupts: 1, 4, (processor waits during the I/O operation), 5, 2, 4, (wait), 5, 3.
(b) With interrupts: 1, 4, then 2 runs concurrently with the I/O operation, but the operation outlasts it, so the processor waits; 5, 4, then 3 concurrent with the second operation, another wait, then 5.

and without the use of interrupts. We can see that there is still a gain in efficiency
because part of the time during which the I/O operation is under way overlaps with
the execution of user instructions.
Figure 3.12 shows a revised instruction cycle state diagram that includes inter-
rupt cycle processing.

Multiple Interrupts. The discussion so far has focused only on the occurrence
of a single interrupt. Suppose, however, that multiple interrupts can occur. For
example, a program may be receiving data from a communications line and
printing results. The printer will generate an interrupt every time it completes
a print operation. The communication line controller will generate an interrupt
every time a unit of data arrives. The unit could either be a single character or a
block, depending on the nature of the communications discipline. In any case, it is
possible for a communications interrupt to occur while a printer interrupt is being
processed.
Two approaches can be taken to dealing with multiple interrupts. The first
is to disable interrupts while an interrupt is being processed. A disabled interrupt
simply means that the processor can and will ignore that interrupt request signal.
If an interrupt occurs during this time, it generally remains pending and will be
checked by the processor after the processor has enabled interrupts. Thus, when a
user program is executing and an interrupt occurs, interrupts are disabled immedi-
ately. After the interrupt handler routine completes, interrupts are enabled before
resuming the user program, and the processor checks to see if additional interrupts
have occurred. This approach is nice and simple, as interrupts are handled in strict
sequential order (Figure 3.13a).
The drawback to the preceding approach is that it does not take into account
relative priority or time-critical needs. For example, when input arrives from the
communications line, it may need to be absorbed rapidly to make room for more
input. If the first batch of input has not been processed before the second batch
arrives, data may be lost.
A second approach is to define priorities for interrupts and to allow an interrupt
of higher priority to cause a lower-priority interrupt handler to be itself interrupted
(Figure 3.13b). As an example of this second approach, consider a system with three
I/O devices: a printer, a disk, and a communications line, with increasing priori-
ties of 2, 4, and 5, respectively. Figure 3.14 illustrates a possible sequence. A user
program begins at t = 0. At t = 10, a printer interrupt occurs; user information is
placed on the system stack and execution continues at the printer interrupt service
routine (ISR). While this routine is still executing, at t = 15, a communications inter-
rupt occurs. Because the communications line has higher priority than the printer,
the interrupt is honored. The printer ISR is interrupted, its state is pushed onto the
stack, and execution continues at the communications ISR. While this routine is exe-
cuting, a disk interrupt occurs (t = 20). Because this interrupt is of lower priority, it
is simply held, and the communications ISR runs to completion.
When the communications ISR is complete (t = 25), the previous proces-
sor state is restored, which is the execution of the printer ISR. However, before
even a single instruction in that routine can be executed, the processor honors the
higher-priority disk interrupt and control transfers to the disk ISR. Only when that

Figure 3.12 Instruction Cycle State Diagram, with Interrupts
The diagram of Figure 3.6 is extended with an interrupt check after the operand store: if an interrupt is pending, an interrupt state processes it before the next instruction address calculation; if there is no interrupt, control returns directly to fetch the next instruction.


Interrupt
User program handler X

Interrupt
handler Y

(a) Sequential interrupt processing

Interrupt
User program handler X

Interrupt
handler Y

(b) Nested interrupt processing


Figure 3.13 Transfer of Control with Multiple Interrupts

Chapter 3 / A Top-Level View of Computer Function and Interconnection

[Figure 3.14 Example Time Sequence of Multiple Interrupts: user program starts at t = 0; printer ISR at t = 10; communications ISR from t = 15 to t = 25; disk ISR from t = 25 to t = 35; printer ISR resumes from t = 35 to t = 40; user program resumes at t = 40]

routine is complete (t = 35) is the printer ISR resumed. When that routine com-
pletes (t = 40), control finally returns to the user program.
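The priority discipline just described can be sketched as a small simulation. The device names, priorities, and arrival times below are taken from the Figure 3.14 example; the stack-based dispatcher is a simplification of what the hardware and the ISR prologue/epilogue actually do.

```python
PRIORITY = {"printer": 2, "disk": 4, "comm": 5}          # user program = 0
EVENTS = [(10, "printer"), (15, "comm"), (20, "disk")]   # (arrival time, device)

def simulate(events, priority):
    """Return the order in which ISRs run to completion: a higher-priority
    interrupt preempts the running ISR (its context is pushed on the stack);
    lower-priority interrupts are held as pending."""
    stack = ["user"]          # execution contexts; top of stack is running
    pending, completed = [], []
    for _, dev in sorted(events):
        if priority[dev] > priority.get(stack[-1], 0):
            stack.append(dev)      # preempt the current context
        else:
            pending.append(dev)    # hold until the current ISR finishes
    while stack[-1] != "user" or pending:
        # before resuming the interrupted context, honor any pending
        # interrupt whose priority exceeds that context's priority
        higher = [d for d in pending if priority[d] > priority.get(stack[-1], 0)]
        if higher:
            best = max(higher, key=priority.get)
            pending.remove(best)
            stack.append(best)
        else:
            completed.append(stack.pop())   # current ISR runs to completion
    return completed

print(simulate(EVENTS, PRIORITY))   # ['comm', 'disk', 'printer']
```

The completion order matches the text: the communications ISR finishes first (t = 25), then the disk ISR (t = 35), then the printer ISR (t = 40).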

I/O Function
Thus far, we have discussed the operation of the computer as controlled by the pro-
cessor, and we have looked primarily at the interaction of processor and memory.
The discussion has only alluded to the role of the I/O component. This role is dis-
cussed in detail in Chapter 7, but a brief summary is in order here.
An I/O module (e.g., a disk controller) can exchange data directly with the
processor. Just as the processor can initiate a read or write with memory, desig-
nating the address of a specific location, the processor can also read data from or
write data to an I/O module. In this latter case, the processor identifies a specific
device that is controlled by a particular I/O module. Thus, an instruction sequence
similar in form to that of Figure 3.5 could occur, with I/O instructions rather than
memory-referencing instructions.
In some cases, it is desirable to allow I/O exchanges to occur directly with
memory. In such a case, the processor grants to an I/O module the authority to read
from or write to memory, so that the I/O-memory transfer can occur without tying
up the processor. During such a transfer, the I/O module issues read or write com-
mands to memory, relieving the processor of responsibility for the exchange. This
operation is known as direct memory access (DMA) and is examined in Chapter 7.

3.3 Interconnection Structures

A computer consists of a set of components or modules of three basic types (processor, memory, I/O) that communicate with each other. In effect, a computer is a
network of basic modules. Thus, there must be paths for connecting the modules.
The collection of paths connecting the various modules is called the intercon-
nection structure. The design of this structure will depend on the exchanges that
must be made among modules.
Figure 3.15 suggests the types of exchanges that are needed by indicating the
major forms of input and output for each module type:
■ Memory: Typically, a memory module will consist of N words of equal length.
Each word is assigned a unique numerical address (0, 1, …, N − 1). A word of
data can be read from or written into the memory. The nature of the operation

[Figure 3.15 Computer Modules: read, write, address, and data connections for memory (N words, addresses 0 to N − 1) and I/O modules (M ports), plus interrupt signals from the I/O modules to the CPU; the wide arrows represent multiple signal lines carrying multiple bits of information in parallel, and each narrow arrow represents a single signal line]

is indicated by read and write control signals. The location for the operation is
specified by an address.
■ I/O module: From an internal (to the computer system) point of view, I/O is
functionally similar to memory. There are two operations: read and write. Further, an I/O module may control more than one external device. We can refer
to each of the interfaces to an external device as a port and give each a unique
address (e.g., 0, 1, …, M − 1). In addition, there are external data paths for
the input and output of data with an external device. Finally, an I/O module
may be able to send interrupt signals to the processor.
■ Processor: The processor reads in instructions and data, writes out data after
processing, and uses control signals to control the overall operation of the sys-
tem. It also receives interrupt signals.
The preceding list defines the data to be exchanged. The interconnection
structure must support the following types of transfers:
■ Memory to processor: The processor reads an instruction or a unit of data
from memory.
■ Processor to memory: The processor writes a unit of data to memory.
■ I/O to processor: The processor reads data from an I/O device via an I/O
module.
■ Processor to I/O: The processor sends data to the I/O device.
■ I/O to or from memory: For these two cases, an I/O module is allowed to
exchange data directly with memory, without going through the processor,
using direct memory access.
Over the years, a number of interconnection structures have been tried. By far
the most common are (1) the bus and various multiple-bus structures, and (2) point-
to-point interconnection structures with packetized data transfer. We devote the
remainder of this chapter to a discussion of these structures.

3.4 Bus Interconnection

The bus was the dominant means of computer system component interconnection
for decades. For general-purpose computers, it has gradually given way to various
point-to-point interconnection structures, which now dominate computer system
design. However, bus structures are still commonly used for embedded systems, par-
ticularly microcontrollers. In this section, we give a brief overview of bus structure.
Appendix C provides more detail.
A bus is a communication pathway connecting two or more devices. A key
characteristic of a bus is that it is a shared transmission medium. Multiple devices
connect to the bus, and a signal transmitted by any one device is available for recep-
tion by all other devices attached to the bus. If two devices transmit during the same
time period, their signals will overlap and become garbled. Thus, only one device at
a time can successfully transmit.

Typically, a bus consists of multiple communication pathways, or lines. Each
line is capable of transmitting signals representing binary 1 and binary 0. Over time,
a sequence of binary digits can be transmitted across a single line. Taken together,
several lines of a bus can be used to transmit binary digits simultaneously (in paral-
lel). For example, an 8-bit unit of data can be transmitted over eight bus lines.
Computer systems contain a number of different buses that provide pathways
between components at various levels of the computer system hierarchy. A bus that
connects major computer components (processor, memory, I/O) is called a system
bus. The most common computer interconnection structures are based on the use of
one or more system buses.
A system bus consists, typically, of from about fifty to hundreds of separate
lines. Each line is assigned a particular meaning or function. Although there are
many different bus designs, on any bus the lines can be classified into three func-
tional groups (Figure 3.16): data, address, and control lines. In addition, there may
be power distribution lines that supply power to the attached modules.
The data lines provide a path for moving data among system modules. These
lines, collectively, are called the data bus. The data bus may consist of 32, 64, 128,
or even more separate lines, the number of lines being referred to as the width of
the data bus. Because each line can carry only one bit at a time, the number of
lines determines how many bits can be transferred at a time. The width of the data
bus is a key factor in determining overall system performance. For example, if the
data bus is 32 bits wide and each instruction is 64 bits long, then the processor must
access the memory module twice during each instruction cycle.
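The 32-bit-bus, 64-bit-instruction example generalizes to a one-line calculation; the function below simply restates the paragraph's arithmetic (the numbers are the ones from the text).

```python
import math

def transfers_needed(item_bits: int, bus_width_bits: int) -> int:
    """Bus transfers needed to move one item over a data bus whose width
    is bus_width_bits (each line carries one bit per transfer)."""
    return math.ceil(item_bits / bus_width_bits)

print(transfers_needed(64, 32))    # 2: the text's example, two memory accesses
print(transfers_needed(128, 64))   # 2
```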
The address lines are used to designate the source or destination of the data on
the data bus. For example, if the processor wishes to read a word (8, 16, or 32 bits)
of data from memory, it puts the address of the desired word on the address lines.
Clearly, the width of the address bus determines the maximum possible memory capac-
ity of the system. Furthermore, the address lines are generally also used to address I/O
ports. Typically, the higher-order bits are used to select a particular module on the
bus, and the lower-order bits select a memory location or I/O port within the module.
For example, on an 8-bit address bus, addresses 01111111 and below might reference
locations in a memory module (module 0) with 128 words of memory, and addresses
10000000 and above might refer to devices attached to an I/O module (module 1).
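This module-selection convention can be written out directly: the high-order bit selects the module and the low-order seven bits select the word or port within it. The helper below assumes the text's 8-bit bus with a single module-select bit; real buses use whatever split the system's address map defines.

```python
def decode(addr: int, module_bits: int = 1, addr_bits: int = 8):
    """Split an address into (module, offset): the high-order bits select
    a module on the bus, the low-order bits a location or port within it."""
    offset_bits = addr_bits - module_bits
    return addr >> offset_bits, addr & ((1 << offset_bits) - 1)

print(decode(0b01111111))   # (0, 127): last word of the 128-word memory module
print(decode(0b10000000))   # (1, 0): first port of the I/O module
```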
The control lines are used to control the access to and the use of the data and
address lines. Because the data and address lines are shared by all components,

[Figure 3.16 Bus Interconnection Scheme: CPU, memory, and I/O modules attached to shared control, address, and data lines]



there must be a means of controlling their use. Control signals transmit both com-
mand and timing information among system modules. Timing signals indicate the
validity of data and address information. Command signals specify operations to be
performed. Typical control lines include:
■ Memory write: causes data on the bus to be written into the addressed location.
■ Memory read: causes data from the addressed location to be placed on the bus.
■ I/O write: causes data on the bus to be output to the addressed I/O port.
■ I/O read: causes data from the addressed I/O port to be placed on the bus.
■ Transfer ACK: indicates that data have been accepted from or placed on the
bus.
■ Bus request: indicates that a module needs to gain control of the bus.
■ Bus grant: indicates that a requesting module has been granted control of the
bus.
■ Interrupt request: indicates that an interrupt is pending.
■ Interrupt ACK: acknowledges that the pending interrupt has been recognized.
■ Clock: is used to synchronize operations.
■ Reset: initializes all modules.
The operation of the bus is as follows. If one module wishes to send data to
another, it must do two things: (1) obtain the use of the bus, and (2) transfer data
via the bus. If one module wishes to request data from another module, it must
(1) obtain the use of the bus, and (2) transfer a request to the other module over the
appropriate control and address lines. It must then wait for that second module to
send the data.
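The two-step protocol (obtain the use of the bus, then transfer) can be sketched as a toy arbiter. The class and method names are invented for illustration; real arbitration uses dedicated bus request and bus grant lines together with timing signals.

```python
class Bus:
    """Toy shared bus: a module must be granted ownership (bus request /
    bus grant) before it may drive the shared lines."""
    def __init__(self):
        self.owner = None

    def request(self, module) -> bool:
        if self.owner is None:
            self.owner = module     # bus grant
            return True
        return False                # bus busy: requester must wait

    def transfer(self, module, payload):
        assert self.owner == module, "only the bus owner may transmit"
        return payload              # signal is seen by all attached modules

    def release(self, module):
        if self.owner == module:
            self.owner = None

bus = Bus()
assert bus.request("cpu")           # CPU is granted the bus
assert not bus.request("dma")       # DMA module must wait its turn
bus.transfer("cpu", "read @0x940")
bus.release("cpu")
assert bus.request("dma")           # now the DMA module gets the grant
```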
Go, change the world

COMPUTER ORGANIZATION

Dr. Anala M R
Professor
Information Science and Engineering
RV College of Engineering
Bengaluru – 59
E-mail: [email protected]
Computer Organization vs. Architecture
• Computer organization refers to the operational units
and their interconnections that realize the architectural specifications.
• Organizational attributes include those hardware details
transparent to the programmer, such as control signals; interfaces between the
computer and peripherals; and the memory technology used.

• Computer architecture refers to those attributes of a system visible to a programmer or, put another way, those attributes that have a direct impact on the logical execution of a program.
• Architectural attributes include the instruction set, the number of bits used to repre-
sent various data types (e.g., numbers, characters), I/O mechanisms, and techniques
for addressing memory.
System Characterization

• Structure: The way in which the components are interrelated

• Function: The operation of each individual component as part of the structure


Function

• Data processing- Varieties of data

• Data storage- Short-term data storage function and Long-term data storage

• Data movement- Data movement between devices

→ An input–output (I/O) device is referred to as a peripheral

→ Data movement over long distances is known as data communications

• Control Unit

Structure
• Single Core Architecture
• Multicore Architecture
Single Core Architecture – Structure
• CPU: Controls the operation of the computer
and does data processing functions
• Main memory: Stores data
• I/O: Moves data between the computer and its
external environment
• System interconnection: System bus, consists
of a number of conducting wires to which all the
other components attach.
Components of CPU
• Control unit: Controls the operation of the CPU and hence the computer
• Arithmetic and logic unit (ALU): Performs the computer’s data processing
functions.
• Registers: Provides storage internal to the CPU.
• CPU interconnection: Some mechanism that provides for communication
among the control unit, ALU, and registers.

Multi-Core Architecture -Structure


• Central processing unit (CPU): That portion of a computer that fetches and executes instructions
• Core: An individual processing unit on a processor chip
• Processor: A physical piece of silicon containing one or more cores. A processor with multiple cores is a multicore processor

Terminologies
• A printed circuit board (PCB) is a rigid, flat board that holds and
interconnects chips and other electronic components
• The main printed circuit board in a computer is called a system board
or motherboard, while smaller ones that plug into the slots in the main board
are called expansion boards
• Prominent elements on the motherboard are the chips
• A chip is a single piece of semiconducting material upon which electronic
circuits and logic gates are fabricated

Functional Elements of Core

■ Instruction logic: Fetches and decodes instructions
■ ALU: Performs the operation specified by an instruction
■ Load/store logic: Manages the transfer of data to and from main memory via cache

IBM zEnterprise EC12 mainframe computer


Sub-areas in Core
• ISU (instruction sequence unit)
• IFU (instruction fetch unit)
• IDU (instruction decode unit)
• LSU (load-store unit)
• XU (translation unit): translates logical addresses into physical addresses
• FXU (fixed-point unit)
• BFU (binary floating-point unit)
• FU (decimal floating-point unit)
• RU (recovery unit)- The RU keeps a copy of the complete state of the system that includes
all registers, collects hardware fault signals, and manages the hardware recovery actions.
• COP (dedicated co-processor): Data compression and Encryption
• I-cache: L1 instruction cache
• L2 control: Manages the traffic through the two L2 caches.
• Data-L2
• Instr-L2

Computer Components
• Key Concepts
→Data and instructions are stored in a single read–write memory
→The contents of this memory are addressable by location, without
regard to the type of data contained there
→Execution occurs in a sequential fashion (unless explicitly modified)
from one instruction to the next

Interconnection structures
• The collection of paths connecting the various modules is called the
interconnection structure.
• A computer consists of 3 types of components -processor, memory, I/O

Bus Interconnection
Control lines
• Memory write: causes data on the bus to be written into the addressed location.
• Memory read: causes data from the addressed location to be placed on the bus.
• I/O write: causes data on the bus to be output to the addressed I/O port.
• I/O read: causes data from the addressed I/O port to be placed on the bus.
• Transfer ACK: indicates that data have been accepted from or placed on the bus.
• Bus request: indicates that a module needs to gain control of the bus.
• Bus grant: indicates that a requesting module has been granted control of the bus.
• Interrupt request: indicates that an interrupt is pending.
• Interrupt ACK: acknowledges that the pending interrupt has been recognized.
• Clock: is used to synchronize operations.
• Reset: initializes all modules

Questions


(Advanced) Computer Organization & Architecture

Prof. Dr. Hasan Hüseyin BALIK


(3rd Week)
Outline
2. The computer system
2.1 A Top-Level View of Computer Function
and Interconnection
2.2 Cache Memory
2.3 Internal Memory Technology
2.4 External Memory
2.5 Input/Output
2.1 A Top-Level View of Computer Function and Interconnection
Computer Components

 Contemporary computer designs are based on concepts developed by John von Neumann at the Institute for Advanced Studies, Princeton
 Referred to as the von Neumann architecture and is based on three key concepts:
 Data and instructions are stored in a single read-write memory
 The contents of this memory are addressable by location, without regard to the type of data contained there
 Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next
 Hardwired program
 The result of the process of connecting the various components in the desired configuration
[Figure 3.1 Hardware and Software Approaches: (a) programming in hardware, where data flows through a fixed sequence of arithmetic and logic functions to produce results; (b) programming in software, where instruction codes feed an instruction interpreter that issues control signals to a module of general-purpose arithmetic and logic functions]


Software
• A sequence of codes or instructions
• Part of the hardware interprets each instruction and generates control signals
• Provide a new sequence of codes for each new program instead of rewiring the hardware

Major components:
• CPU: instruction interpreter plus a module of general-purpose arithmetic and logic functions
• I/O components
• Input module: contains basic components for accepting data and instructions and converting them into an internal form of signals usable by the system
• Output module: means of reporting results
[Figure 3.2 Computer Components: Top-Level View; the CPU (PC, IR, MAR, MBR, I/O AR, I/O BR, and execution unit) is connected via the system bus to main memory (locations 0 through n − 1 holding instructions and data) and to I/O modules with buffers. PC = program counter; IR = instruction register; MAR = memory address register; MBR = memory buffer register; I/O AR = input/output address register; I/O BR = input/output buffer register]


Memory address register (MAR): specifies the address in memory for the next read or write
Memory buffer register (MBR): contains the data to be written into memory or receives the data read from memory
I/O address register (I/OAR): specifies a particular I/O device
I/O buffer register (I/OBR): used for the exchange of data between an I/O module and the CPU
[Figure 3.3 Basic Instruction Cycle: START → fetch next instruction (fetch cycle) → execute instruction (execute cycle) → HALT]


Fetch Cycle
 At the beginning of each instruction cycle the processor fetches an instruction from memory
 The program counter (PC) holds the address of the instruction to be fetched next
 The processor increments the PC after each instruction fetch so that it will fetch the next instruction in sequence
 The fetched instruction is loaded into the instruction register (IR)
 The processor interprets the instruction and performs the required action
Action Categories
• Processor-memory: data transferred from processor to memory or from memory to processor
• Processor-I/O: data transferred to or from a peripheral device by transferring between the processor and an I/O module
• Data processing: the processor may perform some arithmetic or logic operation on data
• Control: an instruction may specify that the sequence of execution be altered
[Figure 3.4 Characteristics of a Hypothetical Machine: (a) instruction format, a 4-bit opcode (bits 0–3) and a 12-bit address (bits 4–15); (b) integer format, a sign bit plus 15-bit magnitude; (c) internal CPU registers, program counter (PC) = address of instruction, instruction register (IR) = instruction being executed, accumulator (AC) = temporary storage; (d) partial list of opcodes, 0001 = load AC from memory, 0010 = store AC to memory, 0101 = add to AC from memory]


[Figure 3.5 Example of Program Execution (contents of memory and registers in hexadecimal): memory holds 300 = 1940, 301 = 5941, 302 = 2941, 940 = 0003, 941 = 0002. Steps 1–2: fetch 1940 and load AC with 0003; steps 3–4: fetch 5941 and add the contents of 941, giving AC = 0005 (3 + 2 = 5); steps 5–6: fetch 2941 and store 0005 into location 941; the PC advances from 300 to 303]
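The fetch–execute trace of Figure 3.5 can be replayed with a minimal interpreter for the hypothetical machine of Figure 3.4 (4-bit opcode, 12-bit address, single accumulator). Halting when the PC leaves the loaded region is a simplification for this sketch, not a feature of the machine.

```python
def run(memory: dict, pc: int):
    """Fetch, increment PC, decode, execute: opcodes 1 (load AC),
    2 (store AC), 5 (add to AC) from Figure 3.4's partial opcode list."""
    ac = 0
    while pc in memory:                  # halt when PC leaves the program
        ir = memory[pc]
        pc += 1                          # PC is incremented after each fetch
        opcode, addr = ir >> 12, ir & 0xFFF
        if opcode == 0x1:
            ac = memory[addr]            # load AC from memory
        elif opcode == 0x2:
            memory[addr] = ac            # store AC to memory
        elif opcode == 0x5:
            ac += memory[addr]           # add to AC from memory
        else:
            break                        # unknown opcode: stop
    return memory, ac

mem = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
       0x940: 0x0003, 0x941: 0x0002}
mem, ac = run(mem, 0x300)
print(hex(ac), hex(mem[0x941]))   # 0x5 0x5: 3 + 2 computed and stored at 941
```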
[Figure 3.6 Instruction Cycle State Diagram: instruction address calculation → instruction fetch → instruction operation decoding → operand address calculation → operand fetch (repeated for multiple operands) → data operation → operand address calculation → operand store (repeated for multiple results); then either instruction complete, fetch next instruction, or return for string or vector data]


Classes of Interrupts
• Program: generated by some condition that occurs as a result of an instruction execution, such as arithmetic overflow, division by zero, attempt to execute an illegal machine instruction, or reference outside a user's allowed memory space.
• Timer: generated by a timer within the processor. This allows the operating system to perform certain functions on a regular basis.
• I/O: generated by an I/O controller, to signal normal completion of an operation, to request service from the processor, or to signal a variety of error conditions.
• Hardware failure: generated by a failure such as power failure or memory parity error.
[Figure 3.7 Program Flow of Control Without and With Interrupts: (a) no interrupts, the user program waits at each WRITE call for the I/O operation to complete; (b) interrupts with short I/O wait, execution continues after the I/O command and an interrupt handler runs at completion; (c) interrupts with long I/O wait, the program must still wait part of the time; X marks where an interrupt occurs during execution of the user program]


[Figure 3.8 Transfer of Control via Interrupts: an interrupt after instruction i transfers control to the interrupt handler, which returns to instruction i + 1]


[Figure 3.9 Instruction Cycle with Interrupts: fetch cycle → execute cycle → interrupt cycle; with interrupts enabled, the processor checks for an interrupt and processes it before the next fetch; with interrupts disabled, it proceeds directly to the next fetch, or to HALT]


[Figure 3.10 Program Timing: Short I/O Wait; (a) without interrupts the processor waits for each I/O operation to complete; (b) with interrupts the I/O operation proceeds concurrently with processor execution]


[Figure 3.11 Program Timing: Long I/O Wait; (a) without interrupts the processor waits for each I/O operation; (b) with interrupts the I/O operation overlaps execution, but the processor must still wait for it to finish before the next I/O command]


[Figure 3.12 Instruction Cycle State Diagram, With Interrupts: the cycle of Figure 3.6 extended with an interrupt check and interrupt stage after operand store; if no interrupt is pending, the next instruction is fetched]


[Figure 3.13 Transfer of Control with Multiple Interrupts: (a) sequential interrupt processing; (b) nested interrupt processing]


[Figure 3.14 Example Time Sequence of Multiple Interrupts: user program (t = 0), printer ISR (t = 10), communications ISR (t = 15 to 25), disk ISR (t = 25 to 35), printer ISR resumes (t = 35 to 40)]


I/O Function
 An I/O module can exchange data directly with the processor
 The processor can read data from or write data to an I/O module
 The processor identifies a specific device that is controlled by a particular I/O module
 I/O instructions rather than memory-referencing instructions
 In some cases it is desirable to allow I/O exchanges to occur directly with memory
 The processor grants to an I/O module the authority to read from or write to memory so that the I/O-memory transfer can occur without tying up the processor
 The I/O module issues read or write commands to memory, relieving the processor of responsibility for the exchange
 This operation is known as direct memory access (DMA)
[Figure 3.15 Computer Modules: read, write, address, and data connections for memory (N words) and I/O modules (M ports), plus interrupt signals from the I/O modules to the CPU]


The interconnection structure must support the following types of transfers:
• Memory to processor: the processor reads an instruction or a unit of data from memory
• Processor to memory: the processor writes a unit of data to memory
• I/O to processor: the processor reads data from an I/O device via an I/O module
• Processor to I/O: the processor sends data to the I/O device
• I/O to or from memory: an I/O module is allowed to exchange data directly with memory, without going through the processor, using direct memory access
Data Bus
 Data lines that provide a path for moving data among system modules
 May consist of 32, 64, 128, or more separate lines
 The number of lines is referred to as the width of the data bus
 The number of lines determines how many bits can be transferred at a time
 The width of the data bus is a key factor in determining overall system performance
Address Bus
 Used to designate the source or destination of the data on the data bus
 If the processor wishes to read a word of data from memory, it puts the address of the desired word on the address lines
 Width determines the maximum possible memory capacity of the system
 Also used to address I/O ports
 The higher-order bits are used to select a particular module on the bus and the lower-order bits select a memory location or I/O port within the module

Control Bus
 Used to control the access to and the use of the data and address lines
 Because the data and address lines are shared by all components there must be a means of controlling their use
 Control signals transmit both command and timing information among system modules
 Timing signals indicate the validity of data and address information
 Command signals specify operations to be performed
[Figure 3.16 Bus Interconnection Scheme: CPU, memory, and I/O modules attached to shared control, address, and data lines]



COMPUTER ARITHMETIC


Computer Arithmetic

• The Arithmetic and Logic Unit (ALU)
• Integer Representation
• Integer Arithmetic
• Floating-Point Representation

Arithmetic and Logic Unit



Integer Representation

• Sign-Magnitude Representation
• One’s Complement Representation
• Two’s Complement Representation
• Range Extension

Integer Representation - Sign-Magnitude Representation

Drawbacks
• Addition & subtraction require a consideration of both the signs of the numbers
and their relative magnitudes to carry out the required operation.
• There are two representations of 0
• More difficult to test for 0
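The two-representations-of-zero drawback is easy to demonstrate with a small decoder (the 8-bit width here is an assumption for the example):

```python
def decode_sign_magnitude(s: str) -> int:
    """Sign-magnitude: the leftmost bit is the sign, the rest the magnitude."""
    magnitude = int(s[1:], 2)
    return -magnitude if s[0] == "1" else magnitude

print(decode_sign_magnitude("00010010"))   # +18
print(decode_sign_magnitude("10010010"))   # -18
# two distinct encodings of zero, the drawback noted above:
print(decode_sign_magnitude("00000000"), decode_sign_magnitude("10000000"))  # 0 0
```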


Integer Representation – Two’s Complement Representation

• Positive Number

• Negative Number


Integer Representation – Two’s Complement Representation

• Characteristics of Two’s Complement Representation and Arithmetic


Representations for 4-Bit Integers

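The 4-bit representations referred to above can be generated rather than memorized. A sketch of two's-complement encode and decode, where the leading bit carries weight −2^(n−1):

```python
def to_twos(n: int, bits: int = 4) -> str:
    """Two's-complement bit pattern of n at a fixed width."""
    assert -(1 << (bits - 1)) <= n < (1 << (bits - 1)), "out of range"
    return format(n & ((1 << bits) - 1), f"0{bits}b")

def from_twos(s: str) -> int:
    """Decode: the leading bit has weight -2^(n-1), the rest are positive."""
    val = int(s, 2)
    return val - (1 << len(s)) if s[0] == "1" else val

for n in (7, 1, 0, -1, -8):
    print(f"{n:+d} -> {to_twos(n)}")   # +7 -> 0111 ... -8 -> 1000
print(from_twos("1011"))               # -5
```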

Integer Representation – Range Extension

• An n-bit integer stored in m bits, where m > n
• Expansion of bit length is referred to as range extension
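For a two's-complement number, range extension is done by sign extension (replicating the sign bit), not by zero-filling. A sketch:

```python
def sign_extend(value: int, n: int, m: int) -> int:
    """Re-encode an n-bit two's-complement pattern in m bits (m > n)
    by replicating the sign bit into the new high-order positions."""
    assert m > n
    if (value >> (n - 1)) & 1:                    # negative: sign bit set
        value |= ((1 << (m - n)) - 1) << n        # fill the new bits with 1s
    return value

print(format(sign_extend(0b1010, 4, 8), "08b"))   # 11111010: -6 stays -6
print(format(sign_extend(0b0101, 4, 8), "08b"))   # 00000101: +5 stays +5
```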

Integer Arithmetic
• Addition
• Subtraction
• Multiplication
• Division
Addition

Subtraction

Hardware – Addition and Subtraction


Multiplication – unsigned integers
Multiplication – signed integers

Booth Algorithm
Booth Algorithm (7 * 3)

Booth Algorithm – Hardware Implementation

Right shift: the leftmost bit of A, namely A(n−1), is not only shifted into A(n−2) but also remains in A(n−1).
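Booth's rules (inspect Q0 and Q−1, add or subtract the multiplicand, then arithmetic-right-shift A, Q, Q−1 as a unit, with the A sign bit preserved as described above) can be sketched directly. The 8-bit register width is an assumption for the example.

```python
def booth_multiply(m: int, q: int, bits: int = 8) -> int:
    """Booth's algorithm: multiply two's-complement m by q using the
    A, Q, Q-1 register organization with arithmetic right shifts."""
    mask = (1 << bits) - 1
    A, Q, q_1 = 0, q & mask, 0
    M = m & mask
    for _ in range(bits):
        if (Q & 1, q_1) == (1, 0):
            A = (A - M) & mask                   # A <- A - M
        elif (Q & 1, q_1) == (0, 1):
            A = (A + M) & mask                   # A <- A + M
        # arithmetic right shift of the combined A, Q, Q-1
        q_1 = Q & 1
        Q = ((Q >> 1) | ((A & 1) << (bits - 1))) & mask
        A = (A >> 1) | (A & (1 << (bits - 1)))   # sign bit stays in place
    product = (A << bits) | Q                    # 2n-bit result
    if product & (1 << (2 * bits - 1)):          # interpret as signed
        product -= 1 << (2 * bits)
    return product

print(booth_multiply(7, 3))      # 21
print(booth_multiply(15, -13))   # -195, the slides' worked example
```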

Example – Booth Algorithm (7 * 5)



Example → 15 * (−13)

Division
Division – Examples

Division (7 / 3)
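The 7 / 3 example can be traced with restoring division for unsigned integers; this is one common hardware method, sketched here under the assumption of an 8-bit register width.

```python
def restoring_divide(dividend: int, divisor: int, bits: int = 8):
    """Restoring division: shift the remainder/quotient pair left,
    try subtracting the divisor, and restore when the result goes negative."""
    assert divisor != 0 and 0 <= dividend < (1 << bits)
    A, Q = 0, dividend                           # A holds the partial remainder
    for _ in range(bits):
        A = (A << 1) | ((Q >> (bits - 1)) & 1)   # shift A,Q left one bit
        Q = (Q << 1) & ((1 << bits) - 1)
        A -= divisor                             # trial subtraction
        if A < 0:
            A += divisor                         # restore: quotient bit is 0
        else:
            Q |= 1                               # success: quotient bit is 1
    return Q, A                                  # (quotient, remainder)

print(restoring_divide(7, 3))    # (2, 1): 7 = 3 * 2 + 1
```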
