(3+3)
The von Neumann architecture allows instructions and data to be mixed and stored in
the same memory module and the contents of this memory are addressable by location
only. The execution occurs in a sequential fashion.
The von Neumann machine has five basic parts: (i) memory, (ii) ALU, (iii) program
control unit, (iv) input equipment and (v) output equipment.
Disadvantages: The circuit speed can't be increased indefinitely. The most ironic
aspect of the quest for ever-increasing speed is that most of the transistors are idle most of
the time. The traditional structure of a computer with a single CPU that issues
sequential requests over a bus to a memory that responds to one request at a time has
become known as the von Neumann bottleneck. Modern computers predominantly
follow the von Neumann architecture, but use some elements of the Harvard architecture.
6. Device subsystem
External Storage System:
The commonly used secondary memory devices are magnetic disks and magnetic tapes.
Other devices (magnetic drums, bubble memory, CD, DVD, flash disks, etc.)
The important characteristics of those storage devices are
Access mode
Access time
Transfer rate
Capacity
Cost.
Access time
Average time needed to reach a storage location and obtain its contents.
Access time = seek time + transfer time
Seek time: time required to position the read-write head at the storage location.
Transfer time: time required to transfer data to or from the device.
Secondary storage devices are organized into records (blocks). Reading or writing is always done on an
entire record.
Transfer rate
The number of characters or words the device can transfer in one second
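As a quick numeric sketch of the formula above (the seek time and transfer rate below are hypothetical figures, not taken from these notes):

```python
SEEK_TIME_MS = 8.0               # assumed average seek time
TRANSFER_RATE_CPS = 2_000_000    # assumed transfer rate, characters per second

def access_time_ms(record_chars):
    # access time = seek time + transfer time
    return SEEK_TIME_MS + record_chars / TRANSFER_RATE_CPS * 1000

print(access_time_ms(4096))      # 8 ms seek + ~2.05 ms transfer
```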
Advantages
External storage provides additional storage other than that available in a computer.
Data can be transported easily from one place to another.
It is useful to store software and data that is not needed frequently.
External storage also works as a data backup.
This backup may prove useful in events such as fire or theft, because important data is not lost.
Optical memory:
The huge commercial success of CD enabled the development of low cost optical disk storage
technology that has revolutionized computer data storage.
Digitally recorded information is imprinted as a series of microscopic pits on the surface of a
polycarbonate disc.
The pitted surface is then coated with a reflecting layer, usually aluminium or gold.
The shiny surface is protected against dust and scratches by a top coat of acrylic.
The intensity of the reflected laser light changes as the beam encounters a pit. Specifically, if the laser
beam falls on a pit, which has a somewhat rough surface, the light scatters and low intensity is reflected
back to the sensor.
The areas between pits are called lands.
A land is a smooth surface that reflects the beam back at higher intensity.
The change between pits and lands is detected by a photosensor and converted into a digital
signal.
The sensor samples the surface at regular intervals.
Magnetic disk:
Disks that are permanently attached to the unit assembly and cannot be removed by the occasional
user are called hard disks. A drive with a removable disk is called a floppy disk drive.
Magnetic tape:
Tape systems use the same reading and recording techniques as disk systems.
The medium is flexible polyester tape coated with magnetizable material.
Data on tapes are structured as a number of parallel tracks running lengthwise.
Earlier tape systems typically used nine tracks.
This made it possible to store data one byte at a time, with an additional parity bit as the 9th track. The
recording of data in this form is referred to as parallel recording.
RAID (Redundant Array of Independent Disks) Architecture:
Disk storage designers recognize that if one component can only be pushed so far, additional gains in
performance are to be had by using multiple parallel components. In the case of disk storage, this leads to
the development of arrays of disks that operate independently and in parallel. With multiple disks, separate I/O
requests can be handled in parallel as long as the data reside on separate disks. Further, a single I/O request
can be executed in parallel if the block of data to be accessed is distributed across multiple disks. With the
use of multiple disks there is a wide variety of ways in which the data can be organized and in which redundancy
can be added to improve reliability. This could make it difficult to develop database schemes that are usable
on a number of platforms and operating systems. Fortunately, industry has agreed on a standardized scheme
for multiple-disk database design known as RAID.
Or
RAID stands for Redundant Array of Independent Disks
A system of arranging multiple disks for redundancy (or
performance)
Term first coined in 1987 at Berkeley
The idea has been around since the mid-70s
RAID is now an umbrella term for various disk arrangements
Not necessarily redundant
RAID 0
It splits data among two or more disks.
Provides good performance.
Lack of data redundancy means there is no failover
support with this configuration.
In the diagram to the right, the odd blocks are written to
disk 0 and the even blocks to disk 1 such that A1, A2, A3,
A4, would be the order of blocks read if read
sequentially from the beginning.
Used in read only NFS systems and gaming systems.
Prepared By: Sujan Kunwar-5
RAID 1
RAID1 is data mirroring.
Two copies of the data are held on two physical disks, and the data is always identical.
Twice as many disks are required to store the same data when compared to RAID 0.
Array continues to operate so long as at least one drive is functioning.
RAID-1 provides the best performance and the best fault-tolerance in a multi-user system
RAID 2:
This type uses striping across disks with some disks storing error checking and correcting (ECC)
information.
It has no advantage over RAID-3.
RAID 3:
This type uses striping and dedicates one drive to storing parity information.
The embedded error checking (ECC) information is used to detect errors.
Data recovery is accomplished by calculating the exclusive OR (XOR) of the information recorded on
the other drives.
Since an I/O operation addresses all drives at the same time, RAID-3 cannot overlap I/O.
For this reason, RAID-3 is best for single-user systems with long record applications.
RAID-4:
This type uses large stripes, which means you can read records from any single drive.
This allows you to take advantage of overlapped I/O for read operations.
Since all write operations have to update the parity drive, no I/O overlapping is possible.
RAID-4 offers no advantage over RAID-5.
RAID-5:
This type includes a rotating parity array, thus addressing the write limitation in RAID-4.
Thus, all read and write operations can be overlapped.
RAID-5 stores parity information but not redundant data (but parity information can be used to
reconstruct data). RAID-5 requires at least three and usually five disks for the array.
It's best for multi-user systems in which performance is not critical or which do few write operations.
RAID-6:
This type is similar to RAID-5 but includes a second parity scheme that is distributed across different
drives and thus offers extremely high fault- and drive-failure tolerance.
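The parity idea behind RAID 3/4/5 can be sketched in a few lines of Python: the parity block is the bitwise XOR of the data blocks, so any single lost block can be rebuilt by XOR-ing the survivors. The block contents here are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    # XOR corresponding bytes of equally sized blocks
    return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*blocks))

data = [b"\x0fA", b"\x30B", b"\x55C"]        # three data blocks (made up)
parity = xor_blocks(data)

# Simulate losing block 1, then rebuild it from the survivors plus parity:
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```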
[8 Marks]
A special very-high-speed memory called cache is used to increase the speed of processing by
making current programs and data available to the CPU at a rapid rate
CPU logic is usually faster than main memory access time, with the result that processing speed is
limited primarily by the speed of main memory
The cache is used for storing segments of programs currently being executed in the CPU and
temporary data frequently needed in the present calculations
The typical access time ratio between cache and main memory is about 1 to 7~10
Auxiliary memory access time is usually 1000 times that of main memory
Most of the main memory in a general purpose computer is made up of RAM integrated circuits
chips, but a portion of the memory may be constructed with ROM chips
RAM: Random Access Memory
o Integrated RAM chips are available in two possible operating modes: static and dynamic
ROM: Read Only Memory
Read Only Memory (ROM): ROM is used for storing programs that are permanently resident in the computer and for tables of
constants that do not change in value once the production of the computer is completed
The ROM portion of main memory is needed for storing an initial program called the bootstrap loader,
which starts the computer software operating when power is turned on
A RAM chip is better suited for communication with the CPU if it has one or more control inputs that select
the chip when needed.
The block diagram of a RAM chip is shown in the next figure; the capacity of the memory is 128 words of 8 bits (one
byte) per word
Auxiliary Memory
MAGNETIC DISKS
Circular plate constructed of metal or plastic coated with magnetized material.
Both sides of a disk are used, and several disks may be stacked, with read-write heads
available for each surface.
All disks rotate together at high speed
Bits are stored in tracks, which are concentric circles
Tracks are divided into sections called sectors.
Some disk systems use a single read-write head movable to different tracks using a mechanical assembly
Others use multiple read-write heads, one positioned on each track (faster, more expensive).
Addressing is used to specify disk number, surface, track number, and sector within the track.
After the head is positioned at the track, it must wait to synchronize with the sector
Reading of data then starts at the speed of rotation
Hard disks are permanently attached to the unit and cannot be removed. Floppy disks are removable ones, in two sizes: 5.25 and 3.5 inch.
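The (disk, surface, track, sector) addressing described above can be illustrated by flattening such an address into a single linear block number; the geometry constants below are illustrative assumptions, not real drive parameters.

```python
SURFACES, TRACKS, SECTORS = 4, 200, 32   # assumed drive geometry

def block_number(disk, surface, track, sector):
    # Nested positional encoding: the innermost unit is the sector
    return ((disk * SURFACES + surface) * TRACKS + track) * SECTORS + sector

print(block_number(0, 1, 10, 5))
```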
MAGNETIC TAPES
Strip of plastic coated with magnetic recording material.
Bits are recorded as magnetic spots along several parallel tracks (7 to 9 tracks form a character with parity).
Read-write heads are positioned on each track.
Magnetic tape units can be started, stopped, moved forward or in reverse, or rewound.
Data are recorded in records (a number of characters), followed by gaps between records for synchronization.
At the start and end of each record there is an ID bit pattern.
Records are identified by reading the ID bit patterns.
Associative Memory
Accessed by the content of the data rather than by an address.
Also called Content Addressable Memory (CAM).
When a word is written to CAM, no address is needed; the next available unused storage location is
located.
When a word is read from CAM, the content of the word or part of it is specified; the memory locates all
words which match and marks them for reading.
Associative memories are expensive and are used for applications where search time is critical.
HARDWARE ORGANIZATION
Consists of a memory array of m words, each of n bits, an
argument register A and a key register K, each of n bits.
The match register M has m bits, one for each
memory word.
Each word of memory is compared in parallel
with the content of the argument register, and the
corresponding bit in the match register is set. Those
bits set in the match register indicate that their words
have a match.
The key register provides a mask to select the particular
bits in the argument word to be included in the match
or not: 1 means the corresponding bit in the argument
register takes part in the match, and 0 means it does not.
The previous figure shows a CAM memory of m words by n cells per word, and the next figure shows the internal
organization of a single cell.
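The argument, key and match registers described above can be mimicked in software. This is only a toy stand-in for the parallel hardware comparison; the word values and mask are made up.

```python
def cam_match(words, argument, key):
    # A bit of M is set when the word agrees with the argument register
    # on every bit position selected (set to 1) in the key register.
    return [int((w & key) == (argument & key)) for w in words]

memory = [0b1010, 0b1001, 0b1110, 0b0010]
A, K = 0b1010, 0b1100          # compare only the two high-order bits
print(cam_match(memory, A, K))
```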
Cache Memory:
Cache memory is defined as a very-high-speed memory that is used in a computer system to
compensate for the speed differential between the main memory access time and the processor logic.
A very high speed memory called a cache is used to increase the speed of processing by making
current programs and data available to the CPU at a rapid rate.
It is placed between the CPU and the main memory.
The cache memory access time is less than the access time of the main memory by a factor of 5 to
10.
The cache is used for storing program segments currently being executed in the CPU and the data
frequently used in the present calculations.
By making programs and data available at a rapid rate, it is possible to increase the performance
rate of a computer.
Advantages
The average memory access time of a computer system can be improved by use of a cache.
Cache memory has a fast access time.
Very little or no time is wasted when searching for words in the cache.
Cache memory is a fast & small memory.
Program segments and data frequently needed by the CPU are stored in cache memory, and hence
processing is fast.
Associative memories are expensive compared to random-access memories because of the added
logic associated with each cell. The possibility of using a random-access memory for the cache is
investigated. The CPU address of 15 bits is divided into two fields.
The nine least significant bits constitute the index field and the remaining six bits form the tag field.
The figure shows that main memory needs an address that includes both the tag and the index bits.
The number of bits in the index field is equal to the number of address bits required to access the
cache memory.
In the general case, there are 2^k words in cache memory and 2^n words in main memory.
The n-bit memory address is divided into two fields: k bits for the index field and n - k bits for the tag
field.
The direct-mapping cache organization uses the n-bit address to access the main memory and the k-bit index to access the cache. Each word in cache consists of the data word and its associated tag.
When a new word is first brought into the cache, the tag bits are stored alongside the data bits.
When the CPU generates a memory request, the index field is used for the address to access the
cache.
The tag field of the CPU address is compared with the tag in the word read from the cache.
If the two tags match, there is a hit and the desired data word is in cache.
If there is no match, there is a miss and the required word is read from main memory.
It is then stored in the cache together with the new tag, replacing the previous value.
The disadvantage of direct mapping is that the hit ratio can drop considerably if two or more words
whose addresses have the same index but different tags are accessed repeatedly.
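The 15-bit address split described above (9-bit index, 6-bit tag) can be sketched as a small simulation; the addresses and data values are illustrative.

```python
INDEX_BITS = 9                           # low 9 bits: index; high 6 bits: tag

def split(address):
    index = address & ((1 << INDEX_BITS) - 1)
    tag = address >> INDEX_BITS
    return tag, index

cache = [None] * (1 << INDEX_BITS)       # 512 entries of (tag, data)

def read(address, main_memory):
    tag, index = split(address)
    entry = cache[index]
    if entry is not None and entry[0] == tag:
        return entry[1], "hit"
    data = main_memory[address]          # miss: read from main memory
    cache[index] = (tag, data)           # store with the new tag
    return data, "miss"

mem = {0o00000: 1220, 0o02000: 5670}     # made-up octal addresses and data
print(read(0o02000, mem))                # miss on first access
print(read(0o02000, mem))                # hit on repeat
print(read(0o00000, mem))                # same index, different tag: miss
```

The last access shows the direct-mapping weakness: both octal addresses map to index 0, so they keep evicting each other.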
iii. Set-Associative mapping: The disadvantage of direct mapping is that two words with the same index in their address but with
different tag values cannot reside in cache memory at the same time.
A third type of cache organization, called set-associative mapping, is an improvement over the
direct-mapping organization in that each word of
cache can store two or more words of memory
under the same index address.
Each data word is stored together with its tag and
the number of tag-data items in one word of cache
is said to form a set.
An example of a set-associative cache organization
for a set size of two is shown.
Each index address refers to two data words and
their associated tags.
Each tag requires six bits and each data word has
12 bits, so the word length is 2(6 + 12) = 36 bits.
An index address of nine bits can accommodate
512 words.
Thus the size of cache memory is 512 x 36.
It can accommodate 1024 words of main memory
since each word of cache contains two data words.
In general, a set-associative cache of set size k will
accommodate k words of main memory in each word of cache.
When a miss occurs in a set-associative cache and the set is full, it is necessary to replace one of the
tag-data items with new value.
The most common replacement algorithms used are: FIFO and LRU. The FIFO procedure selects for
replacement the item that has been in the set the longest. The LRU algorithm selects for replacement
the items that have been least recently used by the CPU. Both FIFO and LRU can be implemented by
adding a few extra bits in each word of cache.
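A minimal software sketch of a two-way set-associative lookup with LRU replacement, assuming 512 sets as in the example above; the addresses are otherwise illustrative.

```python
from collections import OrderedDict

SETS, WAYS = 512, 2
cache = [OrderedDict() for _ in range(SETS)]   # per set: tag -> data, in LRU order

def access(address, main_memory):
    index, tag = address % SETS, address // SETS
    s = cache[index]
    if tag in s:
        s.move_to_end(tag)                     # mark as most recently used
        return "hit"
    if len(s) == WAYS:
        s.popitem(last=False)                  # evict the least recently used
    s[tag] = main_memory.get(address, 0)
    return "miss"

mem = {}
results = [access(a, mem) for a in (0, 512, 0, 1024, 512)]
print(results)
```

Addresses 0, 512 and 1024 all fall in set 0, so the fourth access fills the two-way set and evicts the least recently used tag.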
[12 Marks]
Instruction Sequencing:
Control Memory
Access is performed as part of a control section sequence while the master clock oscillator
is running.
The control memory addresses are divided into two groups: a task mode and
an executive (interrupt) mode.
Addressing words stored in control memory is via the address select logic for each of the
register groups.
There can be up to five register groups in control memory.
These groups select a register for fetching data for programmed CPU operation, or for
display or storage of data via a maintenance console or equivalent.
During programmed CPU operations, these registers are accessed directly by the CPU logic.
Data routing circuits are used by control memory to interconnect the registers
used in control memory.
Some of the registers contained in a control memory that operates in the task and the
executive modes include the following:
Accumulators
Indexes
Monitor clock status indicating registers
Interrupt data registers
Hardwired Control
In the hardware implementation, the control unit is essentially a combinational or
sequential circuit. The key inputs are the instruction register, the clock, flags and control bus signals.
Each of these individual bits typically has some meaning.
The other inputs are not directly useful to the control unit. The control unit makes use of the opcode to perform different actions for different instructions. This function can be performed by a
decoder, which takes encoded inputs and produces a single active output. The clock portion of the control
unit issues a repetitive sequence of pulses.
PQ = 00 Fetch cycle.
PQ = 01 Indirect cycle.
PQ = 10 Execute cycle.
PQ = 11 Interrupt cycle.
Then the following Boolean expression defines C5:
C5 = P'.Q'.T2 + P'.Q.T2
i.e. the control signal C5 will be asserted during the 2nd time unit of both the fetch and indirect cycles.
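The same Boolean expression can be written out directly as a truth-value sketch, with P, Q and T2 as 0/1 inputs:

```python
# C5 = P'.Q'.T2 + P'.Q.T2: asserted in time unit T2 of the fetch (PQ = 00)
# and indirect (PQ = 01) cycles.
def c5(P, Q, T2):
    return (not P and not Q and T2) or (not P and Q and T2)

assert c5(0, 0, 1) and c5(0, 1, 1)          # fetch and indirect, at T2
assert not c5(1, 0, 1) and not c5(0, 0, 0)  # execute cycle / wrong time unit
```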
Micro-operations:
The operation of a computer executing a program consists of a sequence of instruction cycles. Each
instruction cycle is made up of a number of smaller units. One subdivision that we find convenient is
fetch, indirect, execute and interrupt, with only the fetch and execute cycles always occurring. Each of
the smaller cycles involves a series of steps, each of which involves the processor registers. We refer to
these steps as micro-operations. The figure depicts the relationship among the various concepts we have
been discussing.
Fetch cycle: It causes an instruction to be fetched from memory. The fetch cycle actually consists of
three steps and four micro-operations:
t1: MAR <- (PC)
t2: MBR <- Memory
    PC  <- (PC) + 1
t3: IR  <- (MBR)
The notation (t1, t2, t3) represents successive time units.
Indirect cycle: Once an instruction is fetched, the next step is to fetch the source operand.
t1: MAR <- (IR(address))
t2: MBR <- Memory
t3: IR(address) <- (MBR(address))
Interrupt cycle: At the completion of the execute cycle, a test is made to determine whether any
enabled interrupts have occurred; if so, the interrupt cycle occurs.
t1: MBR <- (PC)
t2: MAR <- save-address
    PC  <- routine-address
t3: Memory <- (MBR)
Execute cycle: The fetch, indirect and interrupt cycles are simple and predictable; each involves a fixed
sequence of micro-operations. This is not true of the execute cycle: for a machine with N different
opcodes, there are N different sequences of micro-operations that can occur. Consider an ADD
instruction:
ADD R1, X
which adds the content of location X to register R1.
t1: MAR <- (IR(address))
t2: MBR <- Memory
t3: R1  <- (R1) + (MBR)
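The fetch-cycle micro-operations above can be mimicked in a few lines; the register widths and memory contents are simplified assumptions.

```python
regs = {"PC": 0x100, "MAR": 0, "MBR": 0, "IR": 0}
memory = {0x100: 0x1234}                # made-up instruction word at PC

def fetch():
    regs["MAR"] = regs["PC"]            # t1: MAR <- (PC)
    regs["MBR"] = memory[regs["MAR"]]   # t2: MBR <- Memory
    regs["PC"] += 1                     #     PC  <- (PC) + 1
    regs["IR"] = regs["MBR"]            # t3: IR  <- (MBR)

fetch()
print(hex(regs["IR"]), hex(regs["PC"]))
```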
Microinstruction sequencing: The two basic tasks performed by a micro-programmed control
unit are as follows:
- Microinstruction sequencing: get the next microinstruction from the control memory.
- Microinstruction execution: generate the control signals needed to execute the
microinstruction.
Based on the current microinstruction, the condition flags and the contents of the instruction register, a control
memory address must be generated for the next microinstruction. A wide variety of techniques have
been used. We can group them into three general categories based on the format of the address
information in the microinstruction:
- Two address fields.
- Single address field.
- Variable format.
A multiplexer is provided that serves as a destination for both address fields plus the instruction register.
Based on the address-selection inputs, the multiplexer transmits either the opcode or one of the two
addresses to the control address register (CAR). The CAR is subsequently decoded to produce the next
microinstruction address.
The sequencing logic module generates the address of the next microinstruction using as inputs the
instruction register, flags, the CAR (for incrementing) and the control buffer register. The module is driven by a
clock that determines the timing of the microinstruction cycle. The control logic module generates the
control signals as a function of some of the bits in the microinstruction.
Applications of microprogramming:
The set of current applications for microprogramming includes:
- Realization of computers: the micro-programmed approach offers a systematic technique for control unit implementation. A related
technique is emulation. Emulation refers to the use of microprogramming on one machine to
execute programs originally written for another.
- Another use of microprogramming is in the area of operating system support.
- Realization of special-purpose devices: a good example of this is a data communications board.
- High-level language support: microprogramming can be used to support monitoring, detection,
isolation and repair of system errors. These features are known as micro-diagnostics and
significantly enhance the system maintenance facility.
- User tailoring: a number of machines provide a writable control store, that is, control memory
implemented in RAM rather than ROM, which allows the user to write microprograms.
This allows the user to tailor the machine to the desired application.
[Figure: the sequencing logic issues a read to the control memory]
a. Micro-instruction sequencing
This means getting the next microinstruction from control memory. Two concerns are involved in the design
of a micro-instruction sequencing technique:
i. Size of the microinstruction
ii. Address-generation time
b. Micro-instruction execution
To get the desired task performed, the microinstruction format is organized into
independent fields, such that alternative actions can be specified by fields that are mutually
exclusive.
Point out the advantages and disadvantages of hardwired and micro-programmed
Control. [2064]
Hardwired Control
Advantages
Hardwired Control Unit is fast because control signals are generated by combinational
circuits.
The delay in generation of control signals depends upon the number of gates.
Disadvantages
The more control signals required by the CPU, the more complex the design of the control unit.
Modifications to control signals are very difficult; they require rearranging the wires
in the hardware circuit.
It is difficult to correct mistakes in the original design or to add new features to an existing design of
the control unit.
Micro-Programmed Control
Advantages
It is possible to alter the contents of the micro-program memory (sometimes called the
control store)
One hardware design can realize many instruction sets
One instruction set can be used throughout different models of hardware
Simplifies design of control unit:
Cheaper
Less error-prone
Disadvantages:
Slower
It is expensive in case of limited hardware resources.
Hardwired Control vs Micro-programmed Control:

Attribute                          Hardwired Control        Micro-programmed Control
Speed                              Fast                     Slow
Cost of implementation             More                     Cheaper
Flexibility                        Not flexible             Flexible; new instructions can easily be added
Ability to handle complex
  instructions                     Difficult                Easier
Decoding                           Complex                  Easy
Applications                       RISC microprocessor      CISC microprocessor
Instruction set size               Small                    Large
Control memory                     Absent                   Present
Chip area required                 Less                     More
Hardwired Control
1. Hardwired control is a control mechanism that generates control signals by using an appropriate finite state machine (FSM).
6. It is not flexible.
7. The design of the control unit is more complex.
8. RISC applications.

Micro-programmed Control
6. It is flexible.
7. The design of the control unit is easier.
8. CISC applications.
Chapter 9
Differentiate between synchronous and asynchronous data transfer method. (6)
Synchronous transmission
The two circuits share a common clock frequency and bits are transmitted continuously.
In long distant serial transmission, each unit is driven by a separate clock of the same
frequency.
In synchronous transmission bits must be transmitted continuously to keep the clock
frequency in both units synchronized with each other.
Asynchronous transmission
Asynchronous transmission employs special bits that are inserted at both ends of the
character code.
Each character consists of three parts: a start bit, the character bits, and stop bits.
Asynchronous transmission sends binary information only when it is available; when there is nothing to send, the line remains idle.
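The start/stop framing described above can be sketched as follows; the 8-bit character and single stop bit are illustrative choices.

```python
# One asynchronously framed character: start bit 0, the character bits,
# then stop bit(s) 1; between characters the line idles at 1.
def frame(char_bits, stop_bits=1):
    return [0] + list(char_bits) + [1] * stop_bits

print(frame([1, 0, 0, 0, 0, 0, 1, 0]))   # 8-bit character, one stop bit
```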
DMA based data transfer
The transfer of data between a fast storage device such as magnetic disk and memory is
often limited by the speed of the CPU.
Removing the CPU from the path and letting the peripheral device manage the memory
buses directly would improve the speed of transfer.
This transfer technique is called direct memory access (DMA).
During DMA transfer, the CPU is idle and has no control of the memory buses.
Two control signals in the CPU facilitate the DMA transfer.
The bus request (BR) input is used by the DMA controller to request the CPU to relinquish
control of the buses.
When this input is active, the CPU terminates the execution of the current instruction. The
CPU activates the bus grant (BG) output to inform the external DMA that the buses are in
the high-impedance state.
The DMA that originated the bus request can now take control of the buses to conduct
memory transfers without processor intervention.
When the DMA terminates the transfer, it disables the bus request line. The CPU disables
the bus grant, takes control of the buses, and returns to its normal operation.
When the DMA takes control of the bus system, it communicates directly with the memory.
Discuss Direct Memory Access is detail.
The transfer of data between a fast storage device such as magnetic disk and memory is
often limited by the speed of the CPU.
Removing the CPU from the path and letting the peripheral device manage the memory
buses directly would improve the speed of transfer.
This transfer technique is called direct memory access (DMA).
During DMA transfer, the CPU is idle and has no control of the memory buses.
Figure below shows two control signals in the CPU that facilitate the DMA transfer.
The bus request (BR) input is used by the DMA controller to request the CPU to relinquish
control of the buses.
The CPU activates the bus grant (BG) output to inform the external DMA that the buses are
in the high-impedance state.
The DMA that originated the bus request can now take control of the buses to conduct
memory transfers without processor intervention.
When the DMA terminates the transfer, it disables the bus request line.
The CPU disables the bus grant, takes control of the buses, and returns to its normal
operation.
When the DMA takes control of the bus system, it communicates directly with the memory.
Figure shows the block diagram of a
typical DMA controller.
The unit communicates with the CPU via
the data bus and control lines.
The registers in the DMA are selected by
the CPU through the address bus by
enabling the DS (DMA select) and RS
(register select) inputs.
The RD (read) and WR (write) inputs are bidirectional.
When the BG (bus grant) input is 0, the CPU can communicate with the DMA registers
through the data bus to read from or write to the DMA registers.
When BG = 1, the CPU has relinquished the buses and the DMA can communicate directly
with the memory by specifying an address in the address bus and activating the RD or WR
control.
The DMA controller has three registers: an address register, a word count register, and a
control register.
The position of the DMA controller among the other components in a computer system is
illustrated in Fig. The CPU communicates with the DMA through the address and data buses as
with any interface unit. The DMA has its own address, which activates the DS and RS lines. The
CPU initializes the DMA through the data bus. Once the DMA receives the start control command,
it can start the transfer between the peripheral device and the memory. When the peripheral
device sends a DMA request, the DMA controller activates the BR line, informing the CPU to
relinquish the buses. The CPU responds with its BG line, informing the DMA that its buses are
disabled. The DMA then puts the current value of its address register into the address bus,
initiates the RD or WR signal, and sends a DMA acknowledge to the peripheral device. Note that
the RD and WR lines in the DMA controller are bidirectional. The direction of transfer depends on
the status of the BG line. When BG = 0, the RD and WR are input lines allowing the CPU to
communicate with the internal DMA registers. When BG = 1, they are output lines from the DMA
controller to the random-access memory to specify the read or write operation for the data. When the peripheral device receives a DMA acknowledge, it puts a
word in the data bus (for write) or receives a word from the data bus (for read). Thus the DMA
controls the read or write operations and supplies the address for the memory. The peripheral unit
can then communicate with memory through the data bus for direct transfer between the two
units while the CPU is momentarily disabled.
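A software caricature of the burst transfer described above: once the bus is granted, the controller moves words and counts down until the word count reaches zero. The address and word-count registers follow the notes; everything else is a simplification.

```python
# BG (bus grant) is assumed to stay high for the whole burst in this sketch.
def dma_transfer(memory, device_words, start_address):
    address = start_address          # DMA address register
    count = len(device_words)        # DMA word-count register
    for word in device_words:
        memory[address] = word       # DMA writes directly to memory
        address += 1
        count -= 1                   # transfer ends when count reaches 0
    return address

mem = {}
end = dma_transfer(mem, [10, 20, 30], 0x200)
print(mem, hex(end))
```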
Because each CISC command must be translated by the processor into tens or
even hundreds of lines of microcode, it tends to run slower than an equivalent
series of simpler commands that do not require so much translation.
RISC (Reduced Instruction Set Computer): A reduced-instruction-set microprocessor reduces the loss of central
processing unit (CPU) time. RISC is a microprocessor that is designed to perform a smaller
number of types of computer instructions so that it can operate at a higher speed. Since
each instruction type that a computer must perform requires additional transistors and
circuitry, a larger list or set of computer instructions tends to make the microprocessor
more complicated and slower in operation.
RISC Characteristics
a. Relatively few instructions
b. Relatively few addressing modes
c. Memory access limited to load and store instructions
d. All operations done within the registers of the CPU
e. Fixed-length, easily decoded instruction format
f. Single-cycle instruction execution
g. Hardwired rather than micro-programmed control
h. Emphasis on software.
Pros
Emphasis on software
Single-clock, reduced-instruction-only execution
Register to register: "LOAD" and "STORE" are independent instructions
Cons
Low cycles per second, large code sizes
Spends more transistors on memory registers
There is still considerable controversy among experts about the ultimate value of
RISC architectures. Its proponents argue that RISC machines are both cheaper and
faster, and are therefore the machines of the future.
However, by making the hardware simpler, RISC architectures put a greater burden
on the software. Is this worth the trouble because conventional microprocessors are
becoming increasingly fast and cheap anyway?
RISC Pipelining:
The simplicity of the instruction set can be utilized to implement an instruction pipeline
using a small number of sub-operations, with each being executed in one clock cycle. All data
manipulation instructions have register-to-register operations. Since all operands are in registers, there is
no need to calculate an effective address or fetch operands from memory. The
instruction cycle can be divided into 3 sub-operations and implemented in 3 segments:
1. I - Instruction fetch.
2. A - ALU operation.
3. E - Execute instruction.
Consider now the operation of the following four instructions:
1. LOAD: R1 <- M[address 1]
2. LOAD: R2 <- M[address 2]
3. ADD: R3 <- R1 + R2
4. STORE: M[address 3] <- R3
If the 3-segment pipeline proceeds without interruption, there will be a data conflict in instruction
three, because the operand in R2 is not yet available in the A segment. This can be seen from
the timing of the pipeline shown in the figure.
The E segment in clock cycle 4 is in the process of placing the memory data into R2. The
A segment in clock cycle 4 is using the data from R2, but the value in R2 will not be the
correct value since it has not yet been transferred from memory. If the compiler cannot find a
useful instruction to put after the load, it inserts a no-operation instruction, thus wasting a
clock cycle. This concept of delaying the use of data loaded from memory is referred to as
delayed load.
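The three-segment timing above, with the compiler-inserted no-op after the second LOAD, can be sketched as a simple timing table (assuming one segment per clock cycle and no other stalls):

```python
# Each instruction enters segment I one cycle after its predecessor and
# flows I -> A -> E, one segment per clock cycle.
program = ["LOAD R1", "LOAD R2", "NOP", "ADD R3", "STORE R3"]

def timing(instrs):
    table = {}
    for i, ins in enumerate(instrs):
        table[ins] = {"I": i + 1, "A": i + 2, "E": i + 3}
    return table

t = timing(program)
# With the NOP in place, ADD's A segment (cycle 5) runs after LOAD R2's
# E segment (cycle 4), so R2 holds valid data when the addition uses it:
assert t["ADD R3"]["A"] > t["LOAD R2"]["E"]
```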
RISC vs CISC
RISC: 1. Emphasis on software
CISC: 1. Emphasis on hardware
Advantages
Increase performance
Better compiler targets
Potentially easier to program
Potentially scalable
o Can add more execution units, allowing more instructions to be packed into the VLIW instruction
o Or remove duplicated execution units, to reduce cost.
Disadvantages