DDCO Module 4
Module 4:
Memory System: Basic Concepts, Semiconductor RAM Memories, Read Only Memories, Speed, Size, and
Cost, Cache Memories Mapping Functions.
Arithmetic: Signed Operand Multiplication, Fast multiplication, Integer Division, Floating-point Numbers
and Operations, IEEE standard for floating point numbers.
Basic Processing Unit: Some Fundamental Concepts, Execution of a Complete Instruction, Multiple Bus
Organization, Hardwired Control.
Memory System
BASIC CONCEPTS
• The maximum size of memory that can be used in any computer is determined by its addressing scheme.
• If MAR is k-bits long then
CMOS Cell
• Transistor pairs (T3, T5) and (T4, T6) form the inverters in the latch (Figure 8.5).
• In state 1, the voltage at point X is kept high by having T3 and T6 ON while T4 and T5 are OFF.
• Thus, if T1 and T2 are turned ON (closed), bit-lines b and b' will have high and low signals respectively.
• Advantages:
1) It has low power consumption because current flows in the cell only when the cell is active.
2) Static RAMs can be accessed quickly. Their access time is a few nanoseconds.
• Disadvantage: SRAMs are said to be volatile memories because their contents are lost when power is
interrupted.
ASYNCHRONOUS DRAM
• Less expensive RAMs can be implemented if simple cells are used.
• Such cells cannot retain their state indefinitely. Hence they are called Dynamic RAM (DRAM).
• The information is stored in a dynamic memory-cell in the form of a charge on a capacitor.
• This charge can be maintained only for tens of milliseconds.
• The contents must be periodically refreshed by restoring this capacitor charge to its full value.
• In order to store information in the cell, the transistor T is turned 'ON' (Figure 8.6).
• The appropriate voltage is applied to the bit-line which charges the capacitor.
• After the transistor is turned off, the capacitor begins to discharge.
• Hence, the information stored in the cell can be retrieved correctly only if it is read before the charge on the capacitor drops below a threshold value.
• During a read-operation,
→ the transistor is turned 'ON'
→ a sense amplifier detects whether the charge on the capacitor is above the threshold value.
➢ If (charge on capacitor) > (threshold value), the bit-line is set to logic value '1'.
➢ If (charge on capacitor) < (threshold value), the bit-line is set to logic value '0'.
• During Read/Write-operation,
→ row-address is applied first.
→ row-address is loaded into the row-latch in response to a signal pulse on the RAS' input of the chip.
(RAS = Row-address Strobe, CAS = Column-address Strobe)
• When a Read-operation is initiated, all cells on the selected row are read and refreshed.
• Shortly after the row-address is loaded, the column-address is
→ applied to the address pins &
→ loaded into the column-latch under control of the CAS' signal.
• The information in the latch is decoded.
• The appropriate group of 8 Sense/Write circuits is selected.
R/W'=1 (read-operation): Output values of the selected circuits are transferred to data-lines D0-D7.
R/W'=0 (write-operation): Information on D0-D7 is transferred to the selected circuits.
• RAS' & CAS' are active-low so that they cause latching of the address when they change from high to
low.
• To ensure that the contents of DRAMs are maintained, each row of cells is accessed periodically.
• A special memory-circuit provides the necessary control signals RAS' & CAS' that govern the timing.
• The processor must take into account the delay in the response of the memory.
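The row/column addressing described above can be sketched in a few lines of Python. This is only an illustration: the field widths (12 row-address bits, 9 column-address bits) and the choice of the high-order bits as the row address are assumptions made for the example, since the notes do not state the chip organization.

```python
# Assumed geometry (not given in the notes): 12 row-address bits and
# 9 column-address bits, i.e. 4096 rows of 512 byte-wide columns.
ROW_BITS = 12
COL_BITS = 9


def split_dram_address(addr):
    """Return (row, column) in the order they are latched: RAS' first, then CAS'."""
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range")
    row = addr >> COL_BITS               # high-order bits, latched by RAS'
    col = addr & ((1 << COL_BITS) - 1)   # low-order bits, latched by CAS'
    return row, col


row, col = split_dram_address(0x12345)
print(row, col)
```

Because the two halves are sent one after the other over the same pins, a chip organized this way would need only 12 address pins instead of 21.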
Fast Page Mode
➢ Transferring the bytes in sequential order is achieved by applying a consecutive sequence of
column-addresses under the control of successive CAS' signals.
➢ This scheme allows transferring a block of data at a faster rate.
➢ This block-transfer capability is called fast page mode.
SYNCHRONOUS DRAM
• The operations are directly synchronized with clock signal (Figure 8.8).
• The address and data connections are buffered by means of registers.
• The output of each sense amplifier is connected to a latch.
• A Read-operation causes the contents of all cells in the selected row to be loaded in these latches.
• Data held in latches that correspond to selected columns are transferred into data-output register.
• Thus, the data become available on the data-output pins.
• First, the row-address is latched under control of the RAS' signal (Figure 8.9).
• The memory typically takes 2 or 3 clock cycles to activate the selected row.
• Then, the column-address is latched under the control of the CAS' signal.
• After a delay of one clock cycle, the first set of data bits is placed on the data-lines.
• SDRAM automatically increments column-address to access next 3 sets of bits in the selected row.
READ ONLY MEMORY (ROM)
• Both SRAM and DRAM chips are volatile, i.e. they lose the stored information if power is turned off.
• Many applications require non-volatile memory, which retains the stored information if power is turned
off.
• For example, the instructions that load the OS software from disk into memory must themselves be held in non-volatile memory.
• Non-volatile memory is used in embedded systems.
• Since the normal operation involves only reading of stored data, a memory of this type is called ROM.
➢ At logic value '0', the transistor (T) is connected to the ground point (P).
The transistor switch is closed & the voltage on the bit-line drops to nearly zero (Figure 8.11).
➢ At logic value '1', the transistor switch is open. The bit-line remains at high voltage.
TYPES OF ROM
• Different types of non-volatile memory are
1) PROM
2) EPROM
3) EEPROM &
4) Flash Memory (Flash Cards & Flash Drives)
FLASH MEMORY
• In EEPROM, it is possible to read & write the contents of a single cell.
• In Flash device, it is possible to read contents of a single cell & write entire contents of a block.
• Prior to writing, the previous contents of the block are erased.
E.g. In an MP3 player, the flash memory stores the data that represents sound.
• Single flash chips cannot provide sufficient storage capacity for embedded systems.
• Advantages:
1) Flash devices have greater density, which leads to higher capacity & lower cost per bit.
2) They require a single power supply voltage & consume less power.
• There are 2 methods for implementing larger memory: 1) Flash Cards & 2) Flash Drives
Flash Cards
➢ One way of constructing a larger module is to mount flash-chips on a small card.
➢ Such flash-cards have a standard interface.
➢ The card is simply plugged into a conveniently accessible slot.
➢ Memory-size of the card can be 8, 32 or 64MB.
➢ Eg: A minute of music can be stored in 1MB of memory. Hence 64MB flash cards can store an
hour of music.
Flash Drives
➢ Larger flash memory modules can be developed to replace hard disk-drives.
➢ The flash drives are designed to fully emulate the hard disk.
➢ The flash drives are solid state electronic devices that have no movable parts.
➢ Advantages:
1) They have shorter seek & access time which results in faster response.
2) They have low power consumption; hence they are attractive for battery-driven
applications.
3) They are insensitive to vibration.
➢ Disadvantages:
1) The capacity of flash drive (<1GB) is less than hard disk (>1GB).
2) It leads to higher cost per bit.
3) Flash memory will weaken after it has been written a number of times (typically at least
1 million times).
SPEED, SIZE AND COST
SET-ASSOCIATIVE MAPPING
• It is the combination of direct and associative mapping. (Figure 8.18).
• The blocks of the cache are grouped into sets.
• The mapping allows a block of the main-memory to reside in any block of the specified set.
• The cache has 2 blocks per set, so memory-blocks 0, 64, 128, ..., 4032 map into cache set '0'.
• They can occupy either of the two block positions within the set.
• 6 bit set field
➢ Determines which set of cache contains the desired block.
• 6 bit tag field
➢ The tag field of the address is compared to the tags of the two blocks of the set.
➢ This comparison is done to check if the desired block is present.
• A cache that contains 1 block per set is a direct-mapped cache.
• A cache that has 'k' blocks per set is called a 'k-way set-associative cache'.
• Each block contains a control-bit called a valid-bit.
• The valid-bit indicates whether the block contains valid data.
• The dirty bit indicates whether the block has been modified during its cache residency.
Valid-bit=0: when power is initially applied to the system.
Valid-bit=1: when the block is loaded from main-memory for the first time.
• If a main-memory block is updated by a source that bypasses the cache (such as DMA) & the block already exists in the
cache, then the valid-bit of the cached copy is cleared to '0'.
• Ensuring that the processor & DMA use the same copies of data is called the Cache Coherence Problem.
• Advantages:
1) Contention problem of direct mapping is solved by having few choices for block placement.
2) The hardware cost is decreased by reducing the size of associative search.
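The address decomposition used by the 2-way set-associative cache above can be sketched as follows. The 6-bit set and tag fields come from the notes; the 4-bit word field (16 words per block) and the list-of-pairs cache model are assumptions made only for this illustration.

```python
WORD_BITS = 4   # assumption: 16 words per block (not stated in the notes)
SET_BITS = 6    # from the notes: 6-bit set field, i.e. 64 sets
TAG_BITS = 6    # from the notes: 6-bit tag field


def split_address(addr):
    """Split a 16-bit address into (tag, set index, word offset)."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_index = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_index, word


def set_of_block(block_number):
    """Main-memory block j maps to set (j mod 64)."""
    return block_number % (1 << SET_BITS)


def lookup(cache, addr):
    """Hit if either block of the selected set is valid and its tag matches."""
    tag, set_index, _ = split_address(addr)
    return any(valid and stored == tag for valid, stored in cache[set_index])


# Hypothetical cache model: 64 sets, each holding two (valid-bit, tag) pairs.
cache = [[(False, 0), (False, 0)] for _ in range(1 << SET_BITS)]
cache[0][1] = (True, 2)          # memory block 128 (tag 2) in set 0, second position

print(lookup(cache, 128 << WORD_BITS))   # first word of block 128
print(lookup(cache, 64 << WORD_BITS))    # block 64 also maps to set 0 but is not loaded
```

Only the two tags of one set are compared on each access, which is why the associative search is cheaper than in a fully associative cache.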
Arithmetic
ADDITION OF POSITIVE NUMBERS
• Consider adding two 1-bit numbers.
The sum of 1 & 1 requires the 2-bit vector 10 to represent the value 2. We say that the sum is 0 and the
carry-out is 1.
ARRAY MULTIPLICATION
• The main component in each cell is a full adder (FA).
The AND gate in each cell determines whether a multiplicand bit mj, is added to the incoming partial-
product bit, based on the value of the multiplier bit qi (Figure 9.6).
SEQUENTIAL CIRCUIT BINARY MULTIPLIER
• Registers A and Q combined hold PPi(partial product)
while the multiplier bit qi generates the signal Add/Noadd.
• The carry-out from the adder is stored in flip-flop C (Figure 9.7).
• Procedure for multiplication:
1) Multiplier is loaded into register Q, Multiplicand is loaded into register M and C & A are cleared
to 0.
2) If q0=1, add M to A and store the sum in A. Then C, A and Q are shifted right one bit-position. If
q0=0, no addition is performed and C, A & Q are shifted right one bit-position.
3) After n cycles, the high-order half of the product is held in register A and the low-order half is
held in register Q.
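The procedure above can be traced with a small simulation. This is a sketch for unsigned n-bit operands; the function name and the use of Python integers to model registers C, A, Q and M are choices made for the example.

```python
def sequential_multiply(multiplicand, multiplier, n):
    """Shift-and-add multiplication of two unsigned n-bit numbers."""
    mask = (1 << n) - 1
    m = multiplicand & mask      # register M
    q = multiplier & mask        # register Q
    a, c = 0, 0                  # register A and carry flip-flop C start at 0
    for _ in range(n):
        if q & 1:                # q0 = 1: add M to A, carry-out goes to C
            total = a + m
            c = total >> n
            a = total & mask
        # Shift C, A and Q right one bit-position as a single unit.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q          # A holds the high half, Q the low half


print(sequential_multiply(13, 11, 4))   # 1101 x 1011
```

After the n-th cycle the multiplier bits have all been shifted out of Q, so the same register pair that held the partial products ends up holding the full 2n-bit product.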
SIGNED OPERAND MULTIPLICATION
BOOTH ALGORITHM
• This algorithm
→ generates a 2n-bit product
→ treats both positive & negative 2's-complement n-bit operands uniformly (Figure 9.9-9.12).
• Attractive feature: This algorithm achieves some efficiency in the number of additions required when the
multiplier has a few large blocks of 1s.
• This algorithm suggests that we can reduce the number of operations required for multiplication by
representing multiplier as a difference between 2 numbers.
For e.g. the multiplier (Q) 14 (001110) can be represented as
  010000 (16)
- 000010 (2)
  001110 (14)
• Therefore, the product P=M*Q can be computed by adding 2^4 times M to the 2's complement of 2^1
times M.
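The recoding idea above can be sketched in Python. The function names are illustrative, and the multiplicand and multiplier are passed as ordinary signed Python integers that are assumed to fit in n-bit 2's-complement form.

```python
def booth_recode(q, n):
    """Booth digits (+1, 0, -1) of an n-bit 2's-complement multiplier, LSB first."""
    digits = []
    prev = 0                          # implied 0 to the right of the LSB
    for i in range(n):
        bit = (q >> i) & 1
        digits.append(prev - bit)     # (q_i, q_i-1) = (0,1) -> +1, (1,0) -> -1, else 0
        prev = bit
    return digits


def booth_multiply(m, q, n):
    """Add or subtract shifted copies of the multiplicand according to the Booth digits."""
    product = 0
    for i, digit in enumerate(booth_recode(q, n)):
        product += digit * (m << i)
    return product


print(booth_recode(14, 6))        # LSB first: -1 at weight 2^1, +1 at weight 2^4
print(booth_multiply(13, -6, 5))
```

For the multiplier 14 only two of the six digits are non-zero, so only one subtraction and one addition of the shifted multiplicand are needed.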
FAST MULTIPLICATION
BIT-PAIR RECODING OF MULTIPLIERS
• This method
→ is derived from the Booth algorithm
→ reduces the number of summands by a factor of 2
• Group the Booth-recoded multiplier bits in pairs. (Figure 9.14 & 9.15).
• The pair (+1 -1) is equivalent to the pair (0 +1).
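The pairing can be expressed numerically: a pair of Booth digits (d1, d0) has the value 2*d1 + d0, so (+1 -1) and (0 +1) both equal +1. The sketch below computes the recoded digit for each pair directly from the multiplier bits; the function names are illustrative and n is assumed to be even (sign-extend the multiplier by one bit if it is not).

```python
def bit_pair_recode(q, n):
    """Digits in {-2, -1, 0, +1, +2}, least significant pair first (n must be even)."""
    digits = []
    for i in range(0, n, 2):
        right = (q >> (i - 1)) & 1 if i > 0 else 0   # bit to the right of the pair
        low = (q >> i) & 1
        high = (q >> (i + 1)) & 1
        digits.append(-2 * high + low + right)       # equals 2*d(i+1) + d(i) in Booth digits
    return digits


def bit_pair_multiply(m, q, n):
    """Sum one summand per pair, each weighted by 4**k."""
    return sum(d * (m << (2 * k)) for k, d in enumerate(bit_pair_recode(q, n)))


print(bit_pair_recode(-6, 6))       # multiplier 111010
print(bit_pair_multiply(13, -6, 6))
```

A 6-bit multiplier yields three summands instead of six, which is the factor-of-2 reduction mentioned above.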
CARRY-SAVE ADDITION OF SUMMANDS
• Consider the array for 4*4 multiplication. (Figure 9.16 & 9.18).
• Instead of letting the carries ripple along the rows, they can be "saved" and introduced into the next
row, at the correct weighted positions.
• In the first row, each full adder takes three bit products as its inputs.
• Multiplication requires the addition of several summands.
• CSA speeds up the addition process.
• Consider the array for 4x4 multiplication shown in fig 9.16.
• First row consisting of just the AND gates that implement the bit products m3q0, m2q0, m1q0 and
m0q0.
• The delay through the carry-save array is somewhat less than delay through the ripple-carry array. This
is because the S and C vector outputs from each row are produced in parallel in one full-adder delay.
• Consider the addition of many summands in fig 9.18.
• Group the summands in threes and perform carry-save addition on each of these groups in parallel to
generate a set of S and C vectors in one full-adder delay.
• Group all of the S and C vectors into threes, and perform carry-save addition on them, generating a
further set of S and C vectors in one more full-adder delay.
• Continue with this process until there are only two vectors remaining.
• These two vectors can then be added in a ripple-carry adder (RCA) or carry-lookahead adder (CLA) to produce the desired product.
• When the number of summands is large, the time saved is proportionally much greater.
• Delay: 1 AND-gate delay + 2 gate delays per CSA level + the CLA gate delay. E.g., multiplying 6-bit numbers requires 15 gate delays, whereas a 6x6
array multiplier requires 6(n-1)-1 = 29 gate delays.
• In general, it takes approximately 1.7 log2(k) - 1.7 levels of CSA steps to reduce k summands to 2 vectors.
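The grouping-in-threes procedure above can be sketched as follows for unsigned operands. The function names are illustrative, and the final `sum` stands in for the RCA or CLA that adds the last two vectors.

```python
def carry_save_add(x, y, z):
    """One CSA level: three vectors in, a sum vector S and a carry vector C out."""
    s = x ^ y ^ z                              # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit carries, moved to the next weight
    return s, c


def reduce_summands(summands):
    """Apply CSA levels until two vectors remain; return them and the level count."""
    vectors = list(summands)
    levels = 0
    while len(vectors) > 2:
        full = len(vectors) - len(vectors) % 3
        nxt = []
        for i in range(0, full, 3):            # groups of three are reduced in parallel
            nxt.extend(carry_save_add(*vectors[i:i + 3]))
        nxt.extend(vectors[full:])             # leftover vectors pass to the next level
        vectors = nxt
        levels += 1
    return vectors, levels


def csa_multiply(m, q, n):
    """Multiply unsigned n-bit numbers by carry-save reduction of the n summands."""
    summands = [(m << i) if (q >> i) & 1 else 0 for i in range(n)]
    vectors, levels = reduce_summands(summands)
    return sum(vectors), levels                # final add models the RCA/CLA stage


print(csa_multiply(45, 63, 6))
```

Six summands need three CSA levels here, which is consistent with the estimate 1.7 log2(6) - 1.7, roughly 2.7, rounded up.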
INTEGER DIVISION
• An n-bit positive-divisor is loaded into register M.
An n-bit positive-dividend is loaded into register Q at the start of the operation.
Register A is set to 0 (Figure 9.21).
• After division operation, the n-bit quotient is in register Q, and
the remainder is in register A.
NON-RESTORING DIVISION
• Procedure:
Step 1: Do the following n times
i) If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise,
shift A and Q left and add M to A (Figure 9.23).
ii) Now, if the sign of A is 0, set q0 to 1; otherwise set q0 to 0.
Step 2: If the sign of A is 1, add M to A (restore).
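The two steps above can be sketched in Python. One simplification is assumed: register A is modeled as a signed Python integer rather than an (n+1)-bit 2's-complement register, so "the sign of A is 1" becomes `a < 0`. The function name is illustrative.

```python
def non_restoring_divide(dividend, divisor, n):
    """Divide an unsigned n-bit dividend by a positive divisor; return (quotient, remainder)."""
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        msb_q = (q >> (n - 1)) & 1           # bit shifted from Q into A
        if a >= 0:
            a = 2 * a + msb_q - m            # shift left, then subtract M
        else:
            a = 2 * a + msb_q + m            # shift left, then add M
        q = (q << 1) & mask
        if a >= 0:
            q |= 1                           # set q0 to 1
    if a < 0:
        a += m                               # Step 2: final restore of the remainder
    return q, a


print(non_restoring_divide(8, 3, 4))
```

Unlike the restoring method, a negative result is not corrected immediately; the correction is folded into the add of the next cycle, so each cycle performs exactly one add or subtract.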
RESTORING DIVISION
• Procedure: Do the following n times
1) Shift A and Q left one binary position (Figure 9.22).
2) Subtract M from A, and place the answer back in A
3) If the sign of A is 1, set q0 to 0 and add M back to A (restore A). If the sign of A is 0, set
q0 to 1 and do no restoring.
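A minimal sketch of the restoring procedure above, again modeling register A as a signed Python integer (an assumption of this example, not something stated in the notes):

```python
def restoring_divide(dividend, divisor, n):
    """Divide an unsigned n-bit dividend by a positive divisor; return (quotient, remainder)."""
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        a = (a << 1) | ((q >> (n - 1)) & 1)   # 1) shift A and Q left one position
        q = (q << 1) & mask
        a -= m                                # 2) subtract M from A
        if a < 0:
            a += m                            # 3) restore A; q0 stays 0
        else:
            q |= 1                            #    no restore; set q0 to 1
    return q, a


print(restoring_divide(8, 3, 4))
```

The quotient ends up in Q and the remainder in A, as stated in the INTEGER DIVISION section.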
FLOATING-POINT NUMBERS & OPERATIONS
IEEE STANDARD FOR FLOATING POINT NUMBERS
• Single precision representation occupies a single 32-bit word.
The scale factor has a range of 2^-126 to 2^+127 (which is approximately equal to 10^±38).
• The 32 bit word is divided into 3 fields: sign(1 bit), exponent(8 bits) and mantissa(23 bits).
• Signed exponent=E.
Unsigned exponent E'=E+127. Thus, E' is in the range 0<E'<255.
• The last 23 bits represent the mantissa. Since binary normalization is used, the MSB of the mantissa is
always equal to 1 and need not be stored. (M represents the fractional part.)
• The 24-bit mantissa provides a precision equivalent to about 7 decimal-digits (Figure 9.24).
• Double precision representation occupies a single 64-bit word, and E' is in the range 1≤E'≤2046.
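The single-precision field layout described above can be checked with Python's standard `struct` module. The helper names are illustrative, and `value_of` handles only normal numbers (0 < E' < 255).

```python
import struct


def decode_single(x):
    """Return (sign, E', M) for x stored as a 32-bit IEEE single-precision number."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31              # 1 bit
    e_prime = (bits >> 23) & 0xFF  # 8-bit excess-127 exponent
    m = bits & 0x7FFFFF            # 23-bit fraction; the leading 1 is implied
    return sign, e_prime, m


def value_of(sign, e_prime, m):
    """Rebuild the value of a normal number: (-1)^S x 1.M x 2^(E' - 127)."""
    return (-1) ** sign * (1 + m / 2**23) * 2.0 ** (e_prime - 127)


fields = decode_single(-6.5)       # -6.5 = -1.625 x 2^2
print(fields, value_of(*fields))
```

For -6.5 the stored exponent is E' = 2 + 127 = 129 and the fraction field holds 0.625, with the leading 1 of 1.625 left implicit.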
BRANCHING INSTRUCTIONS
• Control sequence for an unconditional branch instruction is as follows:
Note:
To execute instructions, the processor must have some means of generating the control-signals. There are
two approaches for this purpose:
1) Hardwired control and 2) Microprogrammed control.
HARDWIRED CONTROL
• Hardwired control is a method of control unit design (Figure 7.11).
• The control-signals are generated by using logic circuits such as gates, flip-flops, decoders etc.
• Decoder/Encoder Block is a combinational-circuit that generates required control-outputs depending
on state of all its inputs.
• Instruction Decoder
➢ It decodes the instruction loaded in the IR.
➢ If IR is an 8-bit register, then the instruction decoder generates 2^8 (256) lines, one for each
instruction.
➢ It has a separate output-line, INS1 through INSm, for each machine instruction.
➢ According to code in the IR, one of the output-lines INS1 through INSm is set to 1, and all
other lines are set to 0.
• Step-Decoder provides a separate signal line for each step in the control sequence.
• Encoder
➢ It gets the input from instruction decoder, step decoder, external inputs and condition codes.
➢ It uses all these inputs to generate individual control-signals: Yin, PCout, Add, End and so on.
➢ For example (Figure 7.12), Zin = T1 + T6.ADD + T4.BR
This signal is asserted
during time-slot T1 for all instructions,
during T6 for an Add instruction,
during T4 for an unconditional branch instruction.
• When RUN=1, the counter is incremented by 1 at the end of every clock cycle. When RUN=0, the counter stops
counting.
• After execution of each instruction, end signal is generated. End signal resets step counter.
• Sequence of operations carried out by this machine is determined by wiring of logic circuits, hence
the name “hardwired”.
• Advantage: Can operate at high speed.
• Disadvantages:
1) Since no. of instructions/control-lines is often in hundreds, the complexity of control unit is
very high.
2) It is costly and difficult to design.
3) The control unit is inflexible because it is difficult to change the design.
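Returning to the encoder, the expression Zin = T1 + T6.ADD + T4.BR and the step counter behaviour can be sketched as plain boolean logic. The function names and the string opcodes are illustrative, and resetting the counter to step 1 on End is an assumption about the reset value.

```python
def z_in(step, instruction):
    """Zin = T1 + T6.ADD + T4.BR, with '+' as OR and '.' as AND."""
    t1, t4, t6 = step == 1, step == 4, step == 6   # step-decoder outputs
    add = instruction == "ADD"                     # instruction-decoder outputs
    br = instruction == "BR"
    return t1 or (t6 and add) or (t4 and br)


def next_step(step, run, end):
    """Step counter: End resets it, RUN=1 advances it, RUN=0 holds it."""
    if end:
        return 1          # assumed reset value: back to time-slot T1
    return step + 1 if run else step


for step in range(1, 8):
    print(step, z_in(step, "ADD"), z_in(step, "BR"))
```

In a real hardwired unit each such expression is a fixed network of gates, which is why changing the control sequence means changing the wiring.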
MICROPROGRAMMED CONTROL
• Microprogramming is a method of control unit design (Figure 7.16).
• Control-signals are generated by a program similar to machine language programs.
• Control Word(CW) is a word whose individual bits represent various control-signals (like Add, PCin).
• Each of the control-steps in control sequence of an instruction defines a unique combination of 1s & 0s
in CW.
• Individual control-words in microroutine are referred to as microinstructions (Figure 7.15).
• A sequence of CWs corresponding to control-sequence of a machine instruction constitutes the
microroutine.
• The microroutines for all instructions in the instruction-set of a computer are stored in a special memory
called the Control Store (CS).
• Control-unit generates control-signals for any instruction by sequentially reading CWs of
corresponding microroutine from CS.
• µPC is used to read CWs sequentially from CS (µPC = Microprogram Counter).
• Every time new instruction is loaded into IR, o/p of Starting Address Generator is loaded into µPC.
• Then, µPC is automatically incremented by clock;
causing successive microinstructions to be read from CS.
Hence, control-signals are delivered to various parts of processor in correct sequence.
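The sequencing described above can be sketched as a loop over a control store. Everything concrete here is hypothetical: the control-store contents, the signal names in each control word, and the starting address for "ADD" are made up for illustration, and each control word is modeled as a tuple of asserted signal names rather than a bit vector.

```python
# Hypothetical control store: one control word (CW) per address.
CONTROL_STORE = [
    ("PCout", "MARin", "Read"),   # 0: hypothetical fetch steps
    ("Zout", "PCin"),             # 1
    ("MDRout", "IRin"),           # 2
    ("Add", "Zin"),               # 3: start of a hypothetical ADD microroutine
    ("Zout", "R1in", "End"),      # 4: End marks the last microinstruction
]

# Hypothetical starting-address generator: opcode -> first CW of its microroutine.
STARTING_ADDRESS = {"ADD": 3}


def run_microroutine(opcode):
    """Load the µPC from the starting-address generator, then step until End."""
    upc = STARTING_ADDRESS[opcode]
    issued = []
    while True:
        cw = CONTROL_STORE[upc]       # read the CW addressed by the µPC
        issued.append(cw)             # these signals go to the processor
        if "End" in cw:
            return issued
        upc += 1                      # µPC incremented on each clock


print(run_microroutine("ADD"))
```

Changing an instruction's behaviour in this model only requires editing the table, which mirrors the flexibility advantage of microprogrammed control.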
Advantages
• It simplifies the design of the control unit. Thus it is both cheaper and less error-prone to implement.
• Control functions are implemented in software rather than hardware.
• The design process is orderly and systematic.
• It is more flexible: it can be changed to accommodate new system specifications or to correct design errors
quickly and cheaply.
• Complex functions such as floating point arithmetic can be realized efficiently.
Disadvantages
• A microprogrammed control unit is somewhat slower than the hardwired control unit, because time is
required to access the microinstructions from the control store (CS).
• The flexibility is achieved at some extra hardware cost due to the control memory and its access
circuitry.