Unit 5
Peripheral Devices
Input or output devices attached to the computer are also called peripherals.
The display terminal can operate in a single-character mode, in which each character entered
at the keyboard is transmitted to the computer as it is typed. In the block mode, the edited
text is first stored in a local memory inside the terminal and is then transferred to the
computer as one block of data.
Printers provide a permanent record on paper of computer output data.
Magnetic tapes are used mostly for storing files of data.
Magnetic disks have high-speed rotational surfaces coated with magnetic material.
Input-Output Interface
Input-output interface provides a method for transferring information between internal storage
and external I/O devices. Peripherals connected to a computer need special communication
links for interfacing them with the central processing unit. The major differences between
peripherals and the CPU are:
1. Peripherals are electromechanical and electromagnetic devices and their manner of
operation is different from the operation of the CPU and memory, which are electronic
devices. Therefore, a conversion of signal values may be required.
2. The data transfer rate of peripherals is usually slower than the transfer rate of the CPU,
and consequently, a synchronization mechanism may be needed.
3. Data codes and formats in peripherals differ from the word format in the CPU and
memory.
4. The operating modes of peripherals are different from each other and each must be
controlled so as not to disturb the operation of other peripherals connected to the CPU.
A typical communication link between the processor and several peripherals is shown in
Fig.36. The I/O bus consists of data lines, address lines, and control lines. The magnetic disk,
printer, and terminal are employed in practically any general-purpose computer. The interface
selected responds to the function code and proceeds to execute it. The function code is referred
to as an I/O command and is in essence an instruction that is executed in the interface and its
attached peripheral unit.
There are three ways that computer buses can be used to communicate with memory and I/O:
1. Use two separate buses, one for memory and the other for I/O.
2. Use one common bus for both memory and I/O but have separate control lines for each.
3. Use one common bus for memory and I/O with common control lines.
Isolated I/O versus Memory-Mapped I/O
Many computers use one common bus to transfer information between memory or I/O and the
CPU. In the isolated I/O configuration, the CPU has distinct input and output instructions, and
each of these instructions is associated with the address of an interface register. The isolated
I/O method isolates memory and I/O addresses so that memory address values are not affected
by interface address assignment since each has its own address space. The other alternative is
to use the same address space for both memory and I/O.
This is the case in computers that employ only one set of read and write signals and do not
distinguish between memory and I/O addresses. This configuration is referred to as memory-
mapped I/O. In a memory-mapped I/O organization there are no specific input or output
instructions. Computers with memory-mapped I/O can use memory-type instructions to access
I/O data.
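The memory-mapped organization can be sketched in a few lines: the CPU issues one kind of read and one kind of write, and the address alone decides whether memory or an interface register responds. The address boundary and register names below are illustrative assumptions, not part of the text.

```python
# Sketch of memory-mapped I/O: one address space, one set of read/write
# operations; addresses at or above IO_BASE are routed to interface
# registers instead of memory. IO_BASE and the dict layout are assumed.

IO_BASE = 0xFF00  # assumed start of the I/O region of the address space

class Bus:
    def __init__(self):
        self.memory = {}        # sparse main memory
        self.io_registers = {}  # interface registers (ports, status, ...)

    def read(self, address):
        # The CPU issues the same read for memory and I/O;
        # the address alone selects the destination.
        if address >= IO_BASE:
            return self.io_registers.get(address, 0)
        return self.memory.get(address, 0)

    def write(self, address, value):
        if address >= IO_BASE:
            self.io_registers[address] = value
        else:
            self.memory[address] = value

bus = Bus()
bus.write(0x0010, 42)    # ordinary memory store
bus.write(0xFF01, 0x7F)  # the same instruction form reaches an I/O port
print(bus.read(0x0010), bus.read(0xFF01))
```

In an isolated I/O machine the second write would instead be a distinct output instruction; here only the address distinguishes the two.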
An example of an I/O interface unit is shown in block diagram form in Fig.37. It consists of
two data registers called ports, a control register, a status register, bus buffers, and timing and
control circuits. The interface communicates with the CPU through the data bus. The chip
select and register select inputs determine the address assigned to the interface. The I/O read
and write are two control lines that specify an input or output, respectively. The four registers
communicate directly with the I/O device attached to the interface.
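The chip-select and register-select decoding described above can be sketched as follows. The four register names follow the text; the two-bit select encoding itself is an assumption for illustration.

```python
# Sketch of interface register selection: a chip-select input enables
# the interface, and two register-select bits (RS1, RS0) pick one of
# the four internal registers. The bit assignment is an assumption.

class Interface:
    def __init__(self):
        self.select_map = {
            (0, 0): "port_a",   # data register (port A)
            (0, 1): "port_b",   # data register (port B)
            (1, 0): "control",  # control register
            (1, 1): "status",   # status register
        }
        self.values = {name: 0 for name in self.select_map.values()}

    def access(self, chip_select, rs1, rs0, write=None):
        if not chip_select:
            return None                  # interface not addressed
        name = self.select_map[(rs1, rs0)]
        if write is not None:
            self.values[name] = write    # I/O write line asserted
        return self.values[name]         # I/O read line asserted

iface = Interface()
iface.access(1, 1, 0, write=0b0001)  # program the control register
print(iface.access(1, 1, 0))         # read it back
print(iface.access(0, 1, 0))         # chip not selected: no response
```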
The disadvantage of the strobe method is that the source unit that initiates the transfer has no
way of knowing whether the destination unit has actually received the data item that was
placed on the bus. The handshake method solves this problem by introducing a second control
signal that provides a reply to the unit that initiates the transfer, giving a two-wire control
scheme.
Figure 40 shows the data transfer procedure when initiated by the source. The two
handshaking lines are data valid, which is generated by the source unit, and data accepted,
generated by the destination unit. The timing diagram shows the exchange of signals between
the two units. Figure 41 shows the destination-initiated transfer using handshaking lines. Note that the
name of the signal generated by the destination unit has been changed to ready for data to
reflect its new meaning.
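The source-initiated handshake can be sketched as a simple event sequence: the source raises data valid after placing the data, the destination latches the data and answers with data accepted, and both lines return low to complete the cycle. The trace below is a flat simplification of the timing diagram, not a simulation of real signal timing.

```python
# Minimal sketch of the source-initiated two-wire handshake described
# above. Signal names follow the text; the step ordering mirrors the
# timing diagram as a list of events.

def source_initiated_transfer(data):
    trace = []
    bus = {"data": None, "data_valid": 0, "data_accepted": 0}

    bus["data"] = data             # source places data on the bus
    bus["data_valid"] = 1          # ... then asserts data valid
    trace.append("data_valid=1")

    received = bus["data"]         # destination latches the data
    bus["data_accepted"] = 1       # ... and replies with data accepted
    trace.append("data_accepted=1")

    bus["data_valid"] = 0          # source sees the reply, drops valid
    trace.append("data_valid=0")
    bus["data_accepted"] = 0       # destination completes the cycle
    trace.append("data_accepted=0")

    return received, trace

value, events = source_initiated_transfer(0x5A)
print(value, events)
```

The destination-initiated variant of Fig. 41 would simply begin with the ready-for-data line instead.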
Asynchronous Serial Transfer
The transfer of data between two units may be done in parallel or serial. In parallel data
transmission, each bit of the message has its own path and the total message is transmitted at
the same time. This means that an n-bit message must be transmitted through n separate
conductor paths. In serial data transmission, each bit in the message is sent in sequence one at
a time. This method requires the use of one pair of conductors or one conductor and a common
ground. Parallel transmission is faster but requires many wires. It is used for short distances
and where speed is important. Serial transmission is slower but is less expensive since it
requires only one pair of conductors. Serial transmission can be synchronous or asynchronous.
A transmitted character can be detected by the receiver from knowledge of the transmission
rules:
1. When a character is not being sent, the line is kept in the 1-state.
2. The initiation of a character transmission is detected from the start bit, which is
always 0.
3. The character bits always follow the start bit.
4. After the last bit of the character is transmitted, a stop bit is detected when the line
returns to the 1-state for at least one bit time.
An example of serial transmission format is shown in Fig. 42.
Data transfer between the CPU and I/O may be handled by programmed I/O, by interrupt-initiated
I/O, or by direct memory access:
3. Direct memory access (DMA): the interface transfers data into and out of the memory
unit through the memory bus. The CPU initiates the transfer by supplying the interface
with the starting address and the number of words needed to be transferred and then
proceeds to execute other tasks. This method of connection between devices and the
memory is shown in Fig. 45.
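The division of labor in a DMA transfer can be sketched as follows: the CPU contributes only the setup step, and the controller then moves the whole block itself. The class and method names are illustrative, not a real controller's interface.

```python
# Sketch of a DMA transfer as described in the text: the CPU supplies
# only the starting address and word count; the controller then moves
# the block into memory while the CPU is free to do other work.

class DMAController:
    def __init__(self, memory):
        self.memory = memory
        self.address = 0
        self.count = 0

    def setup(self, start_address, word_count):
        # The CPU performs just this step, then continues other tasks.
        self.address = start_address
        self.count = word_count

    def transfer(self, device_words):
        # The controller itself steps through memory, one word at a time.
        for word in device_words[: self.count]:
            self.memory[self.address] = word
            self.address += 1
            self.count -= 1
        return self.count == 0   # terminal count reached

memory = [0] * 16
dma = DMAController(memory)
dma.setup(start_address=4, word_count=3)
done = dma.transfer([11, 22, 33])
print(done, memory[4:7])
```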
Pipelining
Pipelining is a technique of decomposing a sequential process into sub-operations, with
each sub-operation executed in a special dedicated segment that operates concurrently
with all other segments. A pipeline can be visualized as a collection of processing segments
through which binary information flows.
General Considerations
Any operation that can be decomposed into a sequence of sub-operations of about the same
complexity can be implemented by a pipeline processor. The general structure of a four-
segment pipeline is illustrated in Fig. 46. The operands pass through all four segments in a
fixed sequence.
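The overlap in such a pipeline can be made concrete with a small schedule: with k segments and n tasks, a new task enters segment 1 every clock cycle once the pipe is filled, so the whole workload takes k + n − 1 cycles instead of n × k.

```python
# Sketch of the space-time behaviour of a k-segment pipeline: each
# task occupies one segment per cycle, entering one cycle after the
# task ahead of it.

def pipeline_schedule(n_tasks, k_segments):
    # schedule[cycle] lists (task, segment) pairs active in that cycle
    schedule = {}
    for task in range(n_tasks):
        for segment in range(k_segments):
            cycle = task + segment   # each task starts one cycle later
            schedule.setdefault(cycle, []).append((task + 1, segment + 1))
    return schedule

schedule = pipeline_schedule(n_tasks=6, k_segments=4)
print(len(schedule))   # total cycles: 4 + 6 - 1 = 9
print(schedule[3])     # in the fourth cycle all four segments are busy
```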
Instruction-Level Parallelism
Contrary to pipeline techniques, instruction-level parallelism (ILP) is based on the idea of
multiple issue processors (MIP). An MIP has multiple pipelined datapaths for instruction
execution. Each of these pipelines can issue and execute one instruction per cycle. Figure 49
shows the case of a processor having three pipes. For comparison purposes, we also show in
the same figure the sequential and the single pipeline case.
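The comparison among the three organizations can be put in rough numbers. The cycle counts below assume an idealized machine with no stalls, conflicts, or dependencies, which is the same idealization the figure makes.

```python
# Idealized cycle counts: sequential execution, a single k-stage
# pipeline, and an m-issue machine with m parallel pipelines.
import math

def cycles(n_instructions, k_stages, issue_width=1):
    # issue_width instructions enter the pipes together each cycle
    groups = math.ceil(n_instructions / issue_width)
    return k_stages + groups - 1

n, k = 12, 4
print(n * k)                        # sequential: 48 cycles
print(cycles(n, k, issue_width=1))  # single pipeline: 15 cycles
print(cycles(n, k, issue_width=3))  # three pipes: 7 cycles
```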
Arithmetic Pipeline
Pipeline arithmetic units are usually found in very high speed computers. They are used to
implement floating-point operations, multiplication of fixed-point numbers, and similar
computations encountered in scientific problems.
An example of a pipeline unit is one for floating-point addition and subtraction. The inputs
to the floating-point adder pipeline are two normalized floating-point binary numbers,
X = A × 2^a and Y = B × 2^b, where A and B are fractions that represent the mantissas and
a and b are the exponents. The sub-operations that are performed in the four segments are:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
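The four sub-operations can be sketched in code, using decimal floating-point numbers as in the example that follows (value = mantissa × 10^exponent). The sketch ignores rounding, overflow, and the left-shift normalization case; the sample operand values are illustrative.

```python
# The four pipeline sub-operations applied to decimal floating-point
# operands (mantissa m, exponent e, value m * 10**e). A sketch only:
# rounding and the shift-left normalization case are not handled.

def pipelined_fp_add(a_mant, a_exp, b_mant, b_exp):
    # Segment 1: compare (subtract) the exponents
    diff = a_exp - b_exp
    result_exp = max(a_exp, b_exp)

    # Segment 2: align the mantissa of the smaller-exponent operand
    if diff >= 0:
        b_mant /= 10 ** diff
    else:
        a_mant /= 10 ** (-diff)

    # Segment 3: add the mantissas
    result_mant = a_mant + b_mant

    # Segment 4: normalize so the mantissa magnitude is below 1
    while abs(result_mant) >= 1.0:
        result_mant /= 10
        result_exp += 1

    return result_mant, result_exp

# Illustrative operands: 0.9504 * 10**3 + 0.8200 * 10**2
print(pipelined_fp_add(0.9504, 3, 0.8200, 2))
```

In the pipeline, each of these four steps is a separate hardware segment, so four different operand pairs can be in flight at once.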
A numerical example may clarify the sub-operations performed in each segment. For simplicity,
we use decimal numbers, although Fig. 49 refers to binary numbers. Consider two normalized
floating-point numbers X = A × 10^3 and Y = B × 10^2.
The two exponents are subtracted in the first segment to obtain 3 − 2 = 1. The larger
exponent, 3, is chosen as the exponent of the result. The next segment shifts the mantissa of Y
one position to the right to obtain Y = (B/10) × 10^3.
This aligns the two mantissas under the same exponent. The addition of the two mantissas in
segment 3 produces the sum Z = (A + B/10) × 10^3, which the last segment normalizes if its
mantissa magnitude reaches 1.
Suppose that the time delays of the four segments are t1 = 60 ns, t2 = 70 ns, t3 = 100 ns,
t4 = 80 ns, and the interface registers have a delay of tr = 10 ns. The clock cycle is chosen to
be tp = t3 + tr = 110 ns. An equivalent non-pipeline floating-point adder-subtractor will
have a delay time tn = t1 + t2 + t3 + t4 + tr = 320 ns. In this case the pipelined adder has a
speedup of 320/110 = 2.9 over the non-pipelined adder.
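The clock-cycle and speedup figures above follow directly from the segment delays and can be recomputed in a few lines:

```python
# Recomputing the pipeline timing: the clock must cover the slowest
# segment plus the register delay; the speedup compares one full
# non-pipelined pass with one pipeline clock cycle.

segment_delays = [60, 70, 100, 80]  # t1..t4 in ns
register_delay = 10                 # tr in ns

t_p = max(segment_delays) + register_delay  # pipeline clock cycle
t_n = sum(segment_delays) + register_delay  # non-pipelined delay

print(t_p, t_n, round(t_n / t_p, 1))  # 110 320 2.9
```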
Supercomputers
Supercomputers are very powerful, high-performance machines used mostly for scientific
computations. To speed up the operation, the components are packed tightly together to
minimize the distance that the electronic signals have to travel. Supercomputers also use
special techniques for removing the heat from circuits to prevent them from burning up
because of their close proximity.
A supercomputer is a computer system best known for its high computational speed, fast and
large memory systems, and the extensive use of parallel processing.
Delayed Branch
Consider now the operation of the following four instructions:
1. LOAD: R1 ← M[address 1]
2. LOAD: R2 ← M[address 2]
3. ADD: R3 ← R1 + R2
4. STORE: M[address 3] ← R3
If the three-segment pipeline (I: instruction fetch, A: ALU operation, E: execute
instruction) proceeds without interruptions, there will be a data conflict in instruction 3 because the
operand in R2 is not yet available in the A segment. This can be seen from the timing of the
pipeline shown in Fig. 50(a). The E segment in clock cycle 4 is in a process of placing the
memory data into R2. The A segment in clock cycle 4 is using the data from R2, but the value
in R2 will not be the correct value since it has not yet been transferred from memory. It is up
to the compiler to make sure that the instruction following the load instruction uses the data
fetched from memory. It was shown in Fig. 50 that a branch instruction delays the pipeline
operation with NOP instructions until the instruction at the branch address is fetched.
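The compiler-side fix described here can be sketched as a pass that scans the instruction stream and inserts a NOP after any load whose destination register is used by the very next instruction. The (op, dest, sources) tuple encoding is an illustrative assumption.

```python
# Sketch of compiler insertion of a delay slot: after a LOAD whose
# destination is read by the next instruction, insert a NOP so the
# loaded value is in the register before the A segment needs it.

def insert_load_delay_nops(program):
    fixed = []
    for i, (op, dest, sources) in enumerate(program):
        fixed.append((op, dest, sources))
        if op == "LOAD" and i + 1 < len(program):
            _, _, next_sources = program[i + 1]
            if dest in next_sources:        # load-use data conflict
                fixed.append(("NOP", None, ()))
    return fixed

program = [
    ("LOAD",  "R1", ("M1",)),
    ("LOAD",  "R2", ("M2",)),
    ("ADD",   "R3", ("R1", "R2")),  # would read R2 one cycle too early
    ("STORE", "M3", ("R3",)),
]
for instr in insert_load_delay_nops(program):
    print(instr[0])
```

A smarter compiler would instead try to move an unrelated useful instruction into the delay slot rather than a NOP.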