CS3351 Digital Principles and Computer Organization Lecture Notes 1
1. The memory-reference instructions load word (lw) and store word (sw)
2. The arithmetic-logical instructions add, sub, AND, OR, and slt
3. The instructions branch equal (beq) and jump (j)
Much of what is needed to implement these three instruction classes is the same, independent of the
exact class of instruction. For every instruction, the first two steps are identical:
1. Send the program counter (PC) to the memory that contains the code and fetch the
instruction from that memory.
2. Read one or two registers, using fields of the instruction to select the registers to read.
The load word instruction needs to read only one register, but most other instructions
require reading two registers.
After the above two steps, the actions required to complete the instruction depend on
the instruction class.
1. Memory Reference
2. Arithmetic- Logical
3. Branches
After using the ALU, the actions required to complete various instruction classes differ.
A memory-reference instruction will need to access the memory either to read data for
a load or write data for a store.
An arithmetic-logical or load instruction must write the data from the ALU or memory
back into a register.
A branch instruction may need to change the next instruction address based on the
comparison; otherwise, the PC should be incremented by 4 to get the address of the
next instruction.
All instructions start by using the program counter to supply the instruction address to
the instruction memory.
After the instruction is fetched, the register operands used by an instruction are
specified by fields of that instruction.
Once the register operands have been fetched, they can be operated on
1. To compute a memory address (for a load or store),
2. To compute an arithmetic result (for an integer arithmetic-logical instruction),
3. To compare (for a branch).
If the instruction is an arithmetic-logical instruction, the result from the ALU must be
written to a register.
If the operation is a load or store, the ALU result is used as an address to either store
a value from the registers or load a value from memory into the registers.
The result from the ALU or memory is written back into the register file.
Branches require the use of the ALU output to determine the next instruction address,
which comes either from the ALU (where the PC and branch offset are summed) or
from an adder that increments the current PC by 4.
Figure 3.1 above shows most of the flow of data through the processor, but it omits two
important aspects of instruction execution.
1. Data going to a particular unit may come from two different sources.
2. Several of the units must be controlled depending on the type of instruction.
First aspect: Data going to a particular unit may come from two different sources.
The value written into the PC can come from one of two adders.
The data written into the register file can come from either the ALU or the data memory
The second input to the ALU can come from a register or the immediate field of the
instruction.
These data lines cannot simply be wired together; a logic element must be added that
chooses from among the multiple sources and steers one of those sources to its
destination.
This selection is commonly done with a device called a multiplexor, also called a
data selector.
The multiplexor selects from among several inputs based on the setting of its
control lines. The control lines are set based primarily on information taken from the
instruction being executed.
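The selection behaviour of a two-input multiplexor can be sketched in a few lines. This is only an illustrative model, not part of the original notes; the function name mux2 and the example values are assumptions.

```python
def mux2(select: int, in0: int, in1: int) -> int:
    """Two-input multiplexor: pass in0 when select is 0, in1 when select is 1."""
    if select not in (0, 1):
        raise ValueError("select must be 0 or 1")
    return in1 if select else in0


# Example: choosing the second ALU input (hypothetical values).
register_value = 0x10   # value read from the register file
immediate_value = 0x04  # immediate field of the instruction

print(mux2(0, register_value, immediate_value))  # the register value is steered to the ALU
print(mux2(1, register_value, immediate_value))  # the immediate value is steered to the ALU
```

A single control line is enough here because there are only two sources to choose from.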
Second aspect: Several of the units must be controlled depending on the type of
instruction.
The following Figure 3.2 shows the data path with the three required multiplexors added, as
well as control lines for the major functional units.
Control unit:
A control unit, which has the instruction as an input, is used to determine how to set
the control lines for the functional units and two of the multiplexors.
Function of Third multiplexor
The middle multiplexor, whose output returns to the register file, is used to steer the output
of the ALU or the output of the data memory (in the case of a load) for writing into the register
file.
Finally, the bottommost multiplexor is used to determine whether the second ALU input is
from the registers (for an arithmetic-logical instruction or a branch) or from the offset field of
the instruction (for a load or store).
The ALU control lines determine which of the operations mentioned above the ALU
performs.
Instruction memory:
The instruction memory need only provide read access because the data path does
not write instructions.
The instruction memory can be treated as a combinational element because it performs
only read accesses.
The output at any time reflects the contents of the location specified by the address
input.
No read control signal is needed.
Program Counter:
The program counter is a 32-bit register that is written at the end of every clock
cycle.
It does not need a write control signal.
It is the register containing the address of the instruction in the program being
executed.
Adder:
The adder is an ALU wired to always add its two 32-bit inputs and place the sum on its
output.
Fetching Phase:
To execute any instruction, we must start by fetching the instruction from memory.
To prepare for executing the next instruction, we must also increment the program
counter so that it points at the next instruction, 4 bytes later (PC + 4).
Portion of the data path used for fetching instructions and incrementing the program
counter
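The fetch step can be sketched as follows. This is a minimal model under stated assumptions: the instruction memory is represented as a Python dictionary keyed by byte address, and the function name fetch is hypothetical.

```python
def fetch(instruction_memory: dict[int, int], pc: int) -> tuple[int, int]:
    """Read the instruction at address pc and compute the incremented PC."""
    instruction = instruction_memory[pc]  # instruction memory is read-only here
    next_pc = (pc + 4) & 0xFFFFFFFF       # the adder is wired to add 4; PC is 32 bits
    return instruction, next_pc


# Example with two made-up instruction words at addresses 0 and 4.
memory = {0: 0x8D490004, 4: 0x014B4820}
instruction, pc = fetch(memory, 0)
print(hex(instruction), pc)
```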
Register File
A register file is a state element that contains a set of registers that can be read or
written by specifying the number of the register to be accessed.
The register file contains the register state of the computer.
Two elements are needed to implement the R-format ALU operations:
1. Register file
2. ALU
Register file:
The register file contains all the registers and has two read ports and one write port.
The register file always outputs the contents of the registers corresponding to the
Read register inputs on the outputs.
No other control inputs are needed for reading.
A register write must be explicitly indicated by asserting the write control signal.
Each register number input to the register file is 5 bits wide, to specify one of 32
registers.
ALU:
The ALU takes two 32-bit inputs and produces a 32-bit result, as well as a 1-bit signal
indicating whether the result is 0.
The operation to be performed by the ALU is controlled with the ALU operation
signal, which is 4 bits wide.
The Zero detection output is used to implement branches.
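A small software model of such an ALU may help. These notes do not list the 4-bit operation codes, so the encodings below (0000 AND, 0001 OR, 0010 add, 0110 subtract, 0111 set on less than) are an assumption taken from the usual textbook MIPS design; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu(operation: int, a: int, b: int) -> tuple[int, int]:
    """Return (32-bit result, Zero flag) for the given 4-bit operation code."""
    a &= MASK32
    b &= MASK32
    if operation == 0b0000:      # AND
        result = a & b
    elif operation == 0b0001:    # OR
        result = a | b
    elif operation == 0b0010:    # add
        result = (a + b) & MASK32
    elif operation == 0b0110:    # subtract
        result = (a - b) & MASK32
    elif operation == 0b0111:    # set on less than (signed compare)
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("unsupported ALU operation code")
    zero = 1 if result == 0 else 0
    return result, zero


print(alu(0b0110, 9, 9))  # subtracting equal values sets the Zero output
```

The Zero output is what a beq instruction examines after the ALU subtracts the two register operands.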
Consider the MIPS load word and store word instructions, which have the general
form:
1. lw $t1,offset_value($t2)
2. sw $t1,offset_value($t2)
These instructions compute a memory address by adding the base register, which is
$t2, to the 16-bit signed offset field contained in the instruction.
If the instruction is a store, the value to be stored must also be read from the
register file where it resides in $t1.
If the instruction is a load, the value read from memory must be written into the
register file in the specified register, which is $t1.
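The address calculation for lw and sw can be sketched as below. This is an illustrative model only; the function names sign_extend16 and effective_address are assumptions, and the base and offset values are made up.

```python
def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit field to a signed Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base: int, offset16: int) -> int:
    """Memory address = base register + sign-extended 16-bit offset (32-bit wrap)."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


# lw $t1, 8($t2) with $t2 = 0x1000 (hypothetical register contents)
print(hex(effective_address(0x1000, 0x0008)))
# A negative offset: the field 0xFFFC encodes -4
print(hex(effective_address(0x1000, 0xFFFC)))
```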
To implement the MIPS load and store instructions, we need the following four units:
1. Register file
2. ALU
3. Data Memory
4. Sign extension unit
1. Register file:
The register file contains all the registers and has two read ports and one write port.
The register file always outputs the contents of the registers corresponding to the
Read register inputs on the outputs.
No other control inputs are needed for reading.
A register write must be explicitly indicated by asserting the write control signal.
Each register number input is 5 bits wide, to specify one of 32 registers.
2. ALU:
The ALU takes two 32-bit inputs and produces a 32-bit result, as well as a 1-bit signal
indicating whether the result is 0.
The operation to be performed by the ALU is controlled with the ALU operation
signal, which is 4 bits wide.
The Zero detection output is used to implement branches.
3. Data Memory:
The memory unit is a state element with inputs for the address and the write data.
It produces a single output for the read result.
The data memory unit has separate read and write control lines for read and write
operations.
Unlike the register file, the memory unit needs a read signal, because reading the
value of an invalid address can cause problems.
4. Sign extension unit:
The sign extension unit has a 16-bit input that is sign-extended into a 32-bit result.
1. beq - branch equal
2. bne - branch not equal
The beq instruction has three operands:
1. Two operands are registers that are compared for equality.
2. One operand is a 16-bit offset used to compute the branch target address relative to
the branch instruction address.
beq $t1,$t2,offset
The branch target address is the address specified in a branch, which becomes the new
program counter (PC) if the branch is taken.
If the operands are equal, the branch target address becomes the new PC, and
the branch is said to be taken.
If the operands are not equal, the incremented PC replaces the current PC,
and the branch is said to be not taken.
The branch data path performs two operations:
1. Compute the branch target address.
2. Compare the register contents.
To compute the branch target address, the branch data path includes a sign
extension unit and an adder.
To perform the compare, it uses the register file to supply the two register operands
and the ALU to compare them.
The adder computes the branch target as the sum of the
incremented PC and the sign-extended lower 16 bits of the instruction shifted left by 2
bits.
Shift left 2 is simply a routing of the signals between input and output that adds
00 (binary) to the low-order end of the sign-extended offset field.
Control logic is used to decide whether the incremented PC or the branch target should
replace the PC, based on the Zero output of the ALU, as shown in fig 3.3.
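The branch target calculation and the choice of the next PC can be sketched as follows. This is a minimal model, not the notes' own implementation; the function names and the example addresses are assumptions.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit field to a signed Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, offset16: int) -> int:
    """Target = (PC + 4) + (sign-extended offset shifted left by 2)."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & MASK32


def next_pc_beq(pc: int, rs_value: int, rt_value: int, offset16: int) -> int:
    """Select the branch target when the ALU subtraction gives Zero, else PC + 4."""
    zero = 1 if ((rs_value - rt_value) & MASK32) == 0 else 0
    return branch_target(pc, offset16) if zero else (pc + 4) & MASK32


print(hex(next_pc_beq(0x100, 5, 5, 3)))  # operands equal: branch taken
print(hex(next_pc_beq(0x100, 5, 6, 3)))  # operands differ: branch not taken
```

The shift by 2 reflects that the offset counts instructions (words), while the PC holds a byte address.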
By combining the data path components of the individual instruction classes, we can form a
single data path and add the control to complete the implementation.
This single data path executes every instruction in one clock cycle. This means that no
data path resource can be used more than once per instruction.
Any element needed more than once must therefore be duplicated.
To share a data path element between two different instruction classes, we need to
allow multiple connections to the input of an element.
To provide multiple connections, we use a multiplexor and a control signal to
select among the multiple inputs.
For example, consider two different instruction classes:
1. Arithmetic and logical instructions (R-type)
2. Memory instructions
Difference between arithmetic-logical (R-type) instructions and memory instructions
S.NO  R-type instruction                  Memory instruction
1     Gets two operands from registers    Gets one operand from a register and the
      to perform the ALU operation.       other from the sign-extended 16-bit offset
                                          field of the instruction, to do the
                                          address calculation.
2     The ALU result is stored in the     For a load, the value read from data memory
      destination register.               is stored in the destination register.
A single data path must be built for these two instruction classes.
This can be done with a single register file and a single ALU that handle both types of
instructions, together with multiplexors.
To create a data path with only a single register file and a single ALU, we need to
provide two different sources for the second input of the ALU,
because both instruction classes take the first operand from a register but the second
operand differs.
The two instruction classes also obtain their results from different places, so we need to
support two different sources for the data stored into the register file.
This requires two multiplexors: one placed at the ALU input and
another at the data input to the register file.
Fig 3.4 Data path for the memory and R-Type instructions
The simple data path for the core MIPS architecture is obtained by combining these pieces.
It is formed by adding the data path for instruction fetch, the data path
for R-type and memory instructions shown in figure 3.4 above, and the data path for
branches.
The branch instruction uses the main ALU for comparison of the register operands,
so the adder circuit from the branch data path must be kept to compute the branch target.
An additional multiplexor is required to select either the sequentially following
instruction address (PC + 4) or the branch target address to be written into the
PC.
To complete this simple data path, we must add the control unit.
The control unit must be able to take inputs and generate a write signal for each state
element, the selector control for each multiplexor, and the ALU control.
The ALU control is different in a number of ways, and it is useful to design it first,
before designing the rest of the control unit.
The simple data path for the core MIPS architecture, formed by combining the elements
required by different instruction classes, is shown in fig 3.5.
Fig 3.5 Simple data path for the core MIPS architecture by combining elements required
by different instruction classes
Any instruction set can be implemented in many different ways like single-cycle
implementation and multicycle implementation.
In a basic single-cycle implementation, all operations take the same amount of time: a
single cycle.
A multicycle implementation allows faster operations to take less time than slower
ones, so overall performance can be increased.
Load word and store word instructions: the ALU computes the memory address by
addition.
R-type instructions: the ALU performs one of five actions (AND, OR, subtract,
add, or set on less than), depending on the value of the 6-bit funct (or function) field in
the low-order bits of the instruction.
Branch equal: the ALU must perform a subtraction.
We can generate the 4-bit ALU control input using a small control unit.
Its inputs are the function field of the instruction and a 2-bit control field called ALUOp.
ALUOp indicates whether the operation to be performed should be
1. add (00) for loads and stores,
2. subtract (01) for beq, or
3. determined by the operation encoded in the funct field (10).
The output of the ALU control unit is a 4-bit signal that directly controls the ALU by
generating one of the 4-bit combinations.
Figure 3.6 shows how to set the ALU control inputs based on the 2-bit ALUOp
control and the 6-bit function code.
Fig 3.6 The ALU control inputs based on the 2-bit ALUOp control and the 6-bit function
code.
When ALUOp is 00 or 01, the ALU action does not depend on the function code
field.
In that case the value of the function code is a don't care, and the function field is shown
as XXXXXX for the 00 and 01 values.
When the ALUOp value is 10, the function code is used to set the ALU control
input.
Multiple levels of decoding:
1. The main control unit generates the ALUOp bits.
2. The ALUOp bits are used as an input to the ALU control.
3. The ALU control generates the actual signals that control the ALU.
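The second level of decoding can be sketched as a small lookup. The figure with the actual bit patterns is not reproduced in these notes, so the funct codes and 4-bit outputs below are an assumption taken from the standard MIPS encoding; the names alu_control and FUNCT_TO_OPERATION are hypothetical.

```python
# funct field (6 bits) -> 4-bit ALU operation, assuming the standard MIPS encoding
FUNCT_TO_OPERATION = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Map the 2-bit ALUOp and the 6-bit funct field to the 4-bit ALU operation."""
    if alu_op == 0b00:   # lw / sw: always add, funct is a don't care
        return 0b0010
    if alu_op == 0b01:   # beq: always subtract, funct is a don't care
        return 0b0110
    if alu_op == 0b10:   # R-type: decided by the funct field
        return FUNCT_TO_OPERATION[funct & 0x3F]
    raise ValueError("unsupported ALUOp value")


print(format(alu_control(0b10, 0b101010), "04b"))  # R-type slt
```

Keeping this small decoder separate from the main control unit is what keeps the main control unit's truth table short.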
Mapping the 2-bit ALUOp field and the 6-bit funct field
There are several different ways to implement the mapping from the 2-bit ALUOp
field and the 6-bit funct field to the four ALU operation control bits.
Of the 64 possible values of the function field, only a small number are
actually used.
The function field is used only when the ALUOp bits equal 10.
A small piece of logic recognizes this subset of possible values and causes the
correct setting of the ALU control bits.
To design this logic, we first create a truth table for the function code field and the
ALUOp bits.
Truth table: It is a representation of a logical operation by listing all the values of the inputs
and then in each case showing what the resulting outputs should be.
Don’t-care term: An element of a logical function in which the output does not depend on the
values of all the inputs. Don’t-care terms may be specified in different ways.
The truth table for the 4 ALU control bits (called Operation).
Opcode: The field that denotes the operation and format of an instruction.
(a) Instruction format for R-format instructions, which all have an opcode of 0. These
instructions have three register operands: rs, rt, and rd. Fields rs and rt are sources, and rd is
the destination. The ALU function is in the funct field and is decoded by the ALU control
design in the previous section. The R-type instructions that we implement are add, sub, AND,
OR, and slt. The shamt field is used only for shifts.
(b) Instruction format for load (opcode = 35 in decimal) and store (opcode = 43 in decimal) instructions.
The register rs is the base register that is added to the 16-bit address field to form the memory
address. For loads, rt is the destination register for the loaded value. For stores, rt is the
source register whose value should be stored into memory.
(c) Instruction format for branch equal (opcode =4). The registers rs and rt are the source
registers that are compared for equality. The 16-bit address field is sign-extended, shifted, and
added to the PC + 4 to compute the branch target address.
The op field, also called the opcode, is always contained in bits 31:26.
The two registers to be read are always specified by the rs and rt fields, at positions
25:21 and 20:16.
The base register for load and store instructions is always in bit positions 25:21 (rs).
The 16‑bit offset for branch equal, load, and store is always in positions 15:0.
The destination register is in one of two places. For a load it is in bit positions 20:16
(rt); for an R-type instruction it is in bit positions 15:11 (rd).
So we need to add a multiplexor to select which field of the instruction is used to
indicate the register number to be written.
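The field positions listed above can be checked with a short decoding sketch. The function names decode and write_register are hypothetical, and the two example words assume the conventional MIPS register numbers ($t1 = 9, $t2 = 10, $t3 = 11), which are not stated in these notes.

```python
def decode(instruction: int) -> dict[str, int]:
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op": (instruction >> 26) & 0x3F,     # bits 31:26
        "rs": (instruction >> 21) & 0x1F,     # bits 25:21
        "rt": (instruction >> 16) & 0x1F,     # bits 20:16
        "rd": (instruction >> 11) & 0x1F,     # bits 15:11
        "shamt": (instruction >> 6) & 0x1F,   # bits 10:6
        "funct": instruction & 0x3F,          # bits 5:0
        "imm16": instruction & 0xFFFF,        # bits 15:0
    }


def write_register(fields: dict[str, int], reg_dst: int) -> int:
    """Multiplexor on the write register number: rd when reg_dst is 1, rt when 0."""
    return fields["rd"] if reg_dst else fields["rt"]


lw_fields = decode(0x8D490004)   # lw $t1, 4($t2)
add_fields = decode(0x014B4820)  # add $t1, $t2, $t3
print(write_register(lw_fields, 0), write_register(add_fields, 1))
```

Both example instructions write register 9 ($t1), but the register number is taken from a different field in each case, which is why the extra multiplexor is needed.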
Using this information, we can add the instruction labels and extra multiplexor to the
simple datapath as shown in fig 3.7
Fig 3.7 The datapath with all necessary multiplexors and all control lines
identified.
This shows these additions plus the ALU control block, the write signals for state
elements, the read signal for the data memory, and the control signals for the
multiplexors. Since all the multiplexors have two inputs, they each require a single control
line.
Setting of the control lines depends only on the opcode and we have to define whether each
control signal should be 0,1 or don’t care (X) for each of the opcode values.
The below truth table shows how the control signals should be set for each opcode.
R-Format:
The first row of the table corresponds to the R-format instructions (add, sub, AND, OR,
and slt).
For all these instructions, the source register fields are rs and rt, and the destination
register field is rd; this defines how the signals ALUSrc and RegDst are set.
An R-type instruction writes a register (RegWrite = 1), but neither reads nor writes data
memory.
The ALUOp field for R-type instructions is set to 10 to indicate that the ALU control
should be generated from the funct field.
Branch instruction:
The branch instruction is similar to an R-format operation, since it sends the rs and rt
registers to the ALU.
The ALUOp field for branch is set for a subtract (ALUOp = 01), which is used to
test for equality. Notice that the MemtoReg field is irrelevant when the RegWrite signal
is 0: since the register is not being written, the value of the data on the register data
write port is not used.
Thus, the entry MemtoReg in the last two rows of the table is replaced with X for don’t
care. Don’t cares can also be added to RegDst when RegWrite is 0.
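The control settings described above can be written as a lookup keyed by opcode. The truth table itself is not reproduced in these notes, so the values below follow the usual textbook single-cycle MIPS control table and should be treated as an assumption to check against the original figure; None stands for a don't care (X), and the names CONTROL_TABLE and main_control are hypothetical.

```python
X = None  # don't care

# opcode -> control signals, assuming the standard single-cycle MIPS control table
CONTROL_TABLE = {
    0:  {"RegDst": 1, "ALUSrc": 0, "MemtoReg": 0, "RegWrite": 1,
         "MemRead": 0, "MemWrite": 0, "Branch": 0, "ALUOp": 0b10},  # R-format
    35: {"RegDst": 0, "ALUSrc": 1, "MemtoReg": 1, "RegWrite": 1,
         "MemRead": 1, "MemWrite": 0, "Branch": 0, "ALUOp": 0b00},  # lw
    43: {"RegDst": X, "ALUSrc": 1, "MemtoReg": X, "RegWrite": 0,
         "MemRead": 0, "MemWrite": 1, "Branch": 0, "ALUOp": 0b00},  # sw
    4:  {"RegDst": X, "ALUSrc": 0, "MemtoReg": X, "RegWrite": 0,
         "MemRead": 0, "MemWrite": 0, "Branch": 1, "ALUOp": 0b01},  # beq
}


def main_control(opcode: int) -> dict:
    """Return the control signal settings for one of the four supported opcodes."""
    if opcode not in CONTROL_TABLE:
        raise ValueError("opcode not implemented in this sketch")
    return CONTROL_TABLE[opcode]


for opcode in (0, 35, 43, 4):
    print(opcode, main_control(opcode))
```

Note that the don't cares appear exactly where RegWrite is 0, matching the reasoning given for MemtoReg and RegDst.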
3.3.3 Operation of the DataPath:
1. R-type instructions
2. Load and store instructions
3. Branch instructions
For R-type instructions, consider add $t1,$t2,$t3; the remaining four operations (sub,
AND, OR, slt) are handled in the same way. The instruction executes in one clock cycle, as shown in fig 3.9.
The jump instruction looks somewhat like a branch instruction but computes the target
PC differently and is not conditional.
Like a branch, the low-order 2 bits of a jump address are always 00 (binary).
The next lower 26 bits of this 32-bit address come from the 26-bit immediate field in
the instruction.
The upper 4 bits of the address that should replace the PC come from the PC of the
jump instruction plus 4.
Thus, we can implement a jump by storing into the PC the concatenation of
1. The upper 4 bits of the current PC + 4
2. The 26-bit immediate field of the jump instruction
3. The bits 00 (binary)
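The jump address is formed from the upper 4 bits of PC + 4, the 26-bit immediate field, and two low-order zero bits. A minimal sketch, with the function name jump_target and the example values being assumptions:

```python
def jump_target(pc: int, imm26: int) -> int:
    """Concatenate the upper 4 bits of PC + 4, the 26-bit immediate, and 00."""
    upper4 = (pc + 4) & 0xF0000000      # bits 31:28 of the incremented PC
    middle = (imm26 & 0x03FFFFFF) << 2  # the shift supplies the two low-order 0 bits
    return upper4 | middle


print(hex(jump_target(0x10000008, 0x0000001)))
print(hex(jump_target(0x00400000, 0x0100000)))
```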
Fig 3.12 The simple control and datapath are extended to handle the jump instruction
The single-cycle implementation is rarely used in practice for the following reasons:
1. It is inefficient.
2. The clock cycle must have the same length for every instruction.
3. Overall performance is poor because the clock cycle is too long.
UNIT – V
Memory and I/O Organisation
Memory Concepts and Hierarchy – Memory Management – Cache Memories: Mapping and Replacement
Techniques – Virtual Memory – DMA – I/O – Accessing I/O: Parallel and Serial Interface – Interrupt I/O –
Interconnection Standards: USB, SATA
5.1 INTRODUCTION
The memory unit enables us to store data inside the computer. Computer memory adheres to the principle
of locality.
Principle of locality or locality of reference is the tendency of a processor to access the same set of memory
locations repetitively over a short period of time.
Temporal locality: The principle stating that if a data location is referenced then it will tend to be referenced
again soon.
Memory hierarchy is a structure that uses multiple levels of memories; as the distance from the CPU
increases, the size of the memories and the access time both increase.
A memory hierarchy consists of multiple levels of memory with different speeds and sizes. The faster
memories are more expensive per bit than the slower memories and thus smaller.
For each k, the faster, smaller device at level k serves as a cache for the larger, slower
device at level k+1.
Computer programs tend to access the data at level k more often than the data at level
k+1.
The storage at level k+1 can therefore be slower.
Cache memory (CPU memory) is high-speed SRAM that a computer's microprocessor can
access more quickly than it can access regular RAM. This memory is typically integrated
directly into the CPU chip or placed on a separate chip that has a separate bus interconnect
with the CPU.
The data transfer between various levels of memory is done in blocks. The
minimum unit of information is called a block. If the data requested by the processor appears in
some block in the upper level, this is called a hit.
The fraction of memory accesses found in a cache is termed as hit rate or hit ratio.
Miss rate is the fraction of memory accesses not found in a level of the memory hierarchy. Hit
time is the time required to access a level of the memory hierarchy, including the time needed to
determine whether the access is a hit or a miss.
Miss penalty is the time required to fetch a block into a level of the memory hierarchy
from the lower level, including the time to access the block, transmit it from one level to
the other, and insert it in the level that experienced the miss.
Because the upper level is smaller and built using faster memory parts, the hit time will
be much smaller than the time to access the next level in the hierarchy, which is the major
component of the miss penalty.
MEMORY HIERARCHY
Principle of Locality
The locality of reference or the principle of locality is the term applied to situations
where the same value or related storage locations are frequently accessed. There are three basic
types of locality of reference:
Temporal locality: Here a resource that is referenced at one point in time is referenced
again soon afterwards.
Spatial locality: Here the likelihood of referencing a storage location is greater if a storage
location near it has been recently referenced.
Sequential locality: Here storage is accessed sequentially, in descending or ascending
order. The locality of reference leads to the memory hierarchy.
Need for memory hierarchy
The memory hierarchy is shown in Fig 4.1. The entire memory elements of the computer fall
under the following three categories:
Processor Memory:
This is present inside the CPU for high-speed data access. This consists of small set of
registers that act as temporary storage. This is the costliest memory component.
Primary memory:
This memory is directly accessed by the CPU. All data must be brought into main
memory before it can be accessed. Semiconductor chips act as main memory.
Secondary memory:
This is the cheapest, largest and slowest memory component. The data from the
secondary memory is accessed by the CPU only after it is loaded to main memory.
There is a trade-off among the three key characteristics of memory namely-
Cost
Capacity
Access time
Terminologies in memory access
Block or line: The minimum unit of information that could be either present or totally
absent.
Hit: If the requested data is found in the upper levels of memory hierarchy it is called hit.
Miss: If the requested data is not found in the upper levels of memory hierarchy it is called
miss.
Hit rate or Hit ratio: It is the fraction of memory accesses found in the upper level. It is a
performance metric.
Hit Ratio = Hits / (Hits + Misses)
accessed by the computer for storing and retrieving data. It is a cheaper memory with
more memory access time.
CLASSIFICATION OF MEMORY
The memory unit of the computer system, in which instructions and data are stored, is divided into the
following main groups:
Main or Primary memory
Secondary memory
Primary Memory:
Primary memory is the main area in a computer in which data is stored for quick access by the
computer's processor. It is divided into two parts:
PROM (programmable ROM): PROM is one in which the user can load and store "read-only"
programs and data. In PROM the programs or data are stored only the first time, and the
stored data cannot be modified by the user.
EPROM (erasable programmable ROM): EPROM is one in which it is possible to erase
information stored in the chip, and the chip can be reprogrammed to store new
information. When an EPROM is in use, information stored in it can only be "read", and the
information remains in the chip until it is erased.
EEPROM (electrically erasable programmable ROM): EEPROM is a type of
EPROM in which the stored information is erased by using a high-voltage electric pulse. It is
easier to alter information stored in an EEPROM chip.
A memory consists of cells in the form of an array. The basic element of the
semiconductor memory is the cell. Each cell is capable of storing one bit of information. Each
row of cells constitutes a memory word, and all cells of a row are connected to a common
line referred to as a word line. A w×b memory has w words, each word having b
bits.
The basic memory element called cell can be in two states (0 or 1). The data can be written
into the cell and can be read from it.
A 1024 x 1 memory chip has 1024 memory words of size 1 bit each.
The size of the data bus is 1 bit and the size of the address bus is 10 bits. A particular memory location is
identified by the contents of the memory address bus. A decoder is used to decode the memory
address.
The whole memory address bus is used together to decode the address of the specified
location.
During a Read or a Write operation, the row address is applied first. In response to a signal
pulse on the Row Address Strobe (RAS) input of the chip, this part of the address is loaded
into the row address latch.
All cells of this particular row are selected. Shortly after the row address is latched, the column
address is applied to the address pins.
It is loaded into the column address latch with the help of Column Address Strobe (CAS)
signal, similar to RAS.
The information in this latch is decoded and the appropriate Sense/Write circuit is selected.
Each chip has a control input line called Chip Select (CS). A chip can be enabled to accept
data input or to place the data on the output bus by setting its Chip Select input to 1.
The address bus for the 64K memory is 16 bits wide.
The high order two bits of the address are decoded to obtain the four chip select control
signals.
The remaining 14 address bits are connected to the address lines of all the chips.
They are used to access a specific location inside each chip of the selected row.
The R/W inputs of all chips are tied together to provide a common read/write control.
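The address split for this 64K memory can be sketched as follows: the high-order 2 bits pick one of four chip select lines and the low-order 14 bits address a location inside the selected chips. The function name split_64k_address is an assumption.

```python
def split_64k_address(address: int) -> tuple[int, int]:
    """Split a 16-bit address into (chip select number, 14-bit address within chip)."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    chip_select = (address >> 14) & 0b11  # high-order 2 bits, decoded to 4 select lines
    within_chip = address & 0x3FFF        # remaining 14 bits go to every chip
    return chip_select, within_chip


print(split_64k_address(0x4000))
print(split_64k_address(0xC001))
```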
CACHE MEMORY
The cache memory exploits the locality of reference to enhance the speed of the
processor.
Cache memory or CPU memory, is high-speed SRAM that a processor can access more
quickly than a regular RAM. This memory is integrated directly into the CPU chip or placed
on a separate chip that has a separate bus interconnect with the CPU.
The cache memory stores instructions and data that are more frequently used or data
that is likely to be used next. The processor looks first in the cache memory for the data. If it
finds the instructions or data there, it does not have to perform a more time-consuming read from
larger main memory or other data storage devices.
The processor does not need to know the exact location of the cache. It can simply issue
read and write instructions. The cache control circuitry determines whether the requested data
resides in the cache.
Cache and temporal reference: When data is requested by the processor, the data should
be loaded in the cache and should be retained till it is needed again.
Cache and spatial reference: Instead of fetching a single data item, a contiguous block of data is
loaded into the cache.
Terminologies in Cache
Split cache: It has separate data cache and a separate instruction cache. The two caches
work in parallel, one transferring data and the other transferring instructions.
A dual or unified cache: The data and the instructions are stored in the same cache. A
combined cache with a total size equal to the sum of the two split caches will usually have a
better hit rate.
Mapping Function: The correspondence between the main memory blocks and those in
the cache is specified by a mapping function.
Cache Replacement: When the cache is full and a memory word that is not in the cache is
referenced, the cache control hardware must decide which block should be removed to
create space for the new block that contains the referenced word. The collection of rules for
making this decision is the replacement algorithm.
Cache performance:
When the processor needs to read or write a location in main memory, it first checks for
a corresponding entry in the cache. If the processor finds that the memory location is in the
cache, a cache hit is said to have occurred. If the processor does not find the memory location in
the cache, a cache miss has occurred. When a cache miss occurs, the cache replacement is made
by allocating a new entry and copying in data from main memory. The performance of cache
memory is frequently measured in terms of a quantity called hit ratio.
Hit ratio = hits / (hits + misses) = Number of hits / Total accesses to the cache
Miss penalty or cache penalty is the sum of the time to place a block in the cache and the time to deliver
the block to the CPU.
Miss penalty = time for block replacement + time to deliver the block to CPU
Cache performance can be enhanced by using higher cache block size, higher associativity,
reducing miss rate, reducing miss penalty, and reducing the time to hit in the cache. CPU
execution Time of a given task is defined as the time spent by the system executing that task,
including the time spent executing run-time or system services.
CPU execution time=(CPU clock cycles + memory stall cycles (if any))
x Clock cycle time
The memory stall cycles count the clock cycles during which the CPU is
waiting for memory accesses. This depends on the number of cache misses and the cost per miss
(miss penalty).
Memory stall cycles = number of cache misses x miss penalty
= Instruction Count (IC) x (misses / instruction) x miss penalty
= IC x (memory accesses / instruction) x miss rate x miss penalty
= IC x Reads per instruction x Read miss rate x Read miss penalty
+ IC x Writes per instruction x Write miss rate x Write miss penalty
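The formulas above can be tried out with a short Python sketch. The function names and all sample numbers (hit and miss counts, instruction count, miss rate, miss penalty, clock cycle time) are assumptions chosen only for illustration; they are not taken from the notes.

```python
def hit_ratio(hits, misses):
    """Hit ratio = hits / (hits + misses)."""
    return hits / (hits + misses)


def memory_stall_cycles(instr_count, accesses_per_instr, miss_rate, miss_penalty):
    """Stall cycles = IC x (memory accesses / instruction) x miss rate x miss penalty."""
    return instr_count * accesses_per_instr * miss_rate * miss_penalty


def cpu_execution_time(cpu_cycles, stall_cycles, clock_cycle_time):
    """CPU time = (CPU clock cycles + memory stall cycles) x clock cycle time."""
    return (cpu_cycles + stall_cycles) * clock_cycle_time


# Hypothetical workload: 950 hits and 50 misses out of 1000 cache accesses.
ratio = hit_ratio(hits=950, misses=50)

# Hypothetical program: 1,000,000 instructions, 1.5 memory accesses per
# instruction, 5% miss rate, 100-cycle miss penalty.
stalls = memory_stall_cycles(1_000_000, 1.5, 0.05, 100)

# 2,000,000 base CPU cycles at a 1 ns clock cycle (1 GHz).
seconds = cpu_execution_time(2_000_000, stalls, 1e-9)

print(ratio, stalls, seconds)
```

With these assumed numbers the stall cycles (about 7.5 million) exceed the base CPU cycles, which shows why reducing the miss rate or the miss penalty matters so much.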
Cache mapping policies determine how blocks of main memory are loaded into the cache. Main
memory is divided into equal-size partitions called blocks or frames. The cache memory is
divided into fixed-size partitions called lines. During cache mapping, a block of main memory is
copied to the cache, and further accesses are made from the cache rather than from main memory.
Direct Mapping
The simplest technique is direct mapping that maps each block of main memory into only
one possible cache line.
Here, each memory block is assigned to a specific line in the cache.
In direct mapping, the ith block of main memory is placed at the jth
block of cache memory, where j = i % (number of blocks in cache memory).
Consider a 128-block cache memory. Whenever main memory blocks 0, 128, or 256 are
loaded into the cache, they are allotted cache block 0, since j = (0 or 128 or 256) % 128 is
zero.
Contention or collision is resolved by replacing the older contents with the latest contents.
The placement of the block from main memory to the cache is determined from the 16 bit
memory address.
The lower order four bits are used to select one of the 16 words in the block.
The 7 bit block field indicates the cache position where the block has to be stored.
The 5 bit tag field represents which block of main memory resides inside the cache.
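The field layout described above (5-bit tag, 7-bit block, 4-bit word) can be sketched in Python as follows. The function names are assumptions made for this illustration.

```python
WORD_BITS = 4    # 16 words per block
BLOCK_BITS = 7   # 128 blocks in the cache
TAG_BITS = 5     # remaining bits of the 16-bit address


def split_address(addr):
    """Split a 16-bit memory address into (tag, cache block, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    block = (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = addr >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word


def cache_block_for(memory_block, cache_blocks=128):
    """Direct mapping rule: j = i % (number of blocks in cache memory)."""
    return memory_block % cache_blocks


# Main memory blocks 0, 128 and 256 all compete for cache block 0.
print([cache_block_for(i) for i in (0, 128, 256)])   # [0, 0, 0]

# First word of main memory block 256 (address 256 * 16 = 0x1000):
print(split_address(0x1000))                         # (2, 0, 0)
```

The tag value (2 in the last line) is what distinguishes memory block 256 from blocks 0 and 128, which map to the same cache block.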
Associative Mapping:
The associative memory is used to store content and addresses of the memory word.
Any block can go into any line of the cache. The 4 word bits are used to identify which
word in the block is needed, and the remaining 12 bits form the tag that identifies
the main memory block inside the cache.
This enables the placement of any word at any place in the cache memory. It is considered
to be the fastest and the most flexible mapping form.
The tag bits of an address received from the processor are compared to the tag bits of each
block of the cache to check, if the desired block is present. Hence it is known as Associative
Mapping technique.
The cost of an associative-mapped cache is higher than the cost of a direct-mapped cache because of the
need to search all 128 tag patterns to determine whether a block is in the cache.
When a program accesses a memory location that is not in the cache, it is called a cache
miss. The performance impact of a cache miss depends on the latency of fetching the data from
the next cache level or main memory. The cache miss handling is done with the processor
control unit and with a separate controller that initiates the memory access and refills the cache.
The following are the steps taken when a cache miss occurs:
Send the original PC value (PC - 4) to the memory.
Instruct main memory to perform a read and wait for the memory to complete its access.
Write the cache entry, putting the data from memory in the data portion of the entry,
writing the upper bits of the address (from the ALU) into the tag field, and turning the valid
bit on.
Restart the instruction execution at the first step, which will refetch the instruction, this
time finding it in the cache.
Writing to a cache:
Suppose on a store instruction, the data is written into only the data cache (without
changing main memory); then, after the write into the cache, memory would have a
different value from that in the cache. This leads to inconsistency.
The simplest way to keep the main memory and the cache consistent is to always write the
data into both the memory and the cache. This scheme is called write-through.
Write through is a scheme in which writes always update both the cache and the memory,
ensuring that data is always consistent between the two.
With a write-through scheme, every write causes the data to be written to main memory.
These writes will take a long time.
A potential solution to this problem is to use a write buffer.
A write buffer stores the data while it is waiting to be written to memory.
After writing the data into the cache and into the write buffer, the processor can continue
execution.
Write buffer is a queue that holds data while the data are waiting to be
written to memory.
iii) The rate at which writes are generated may also be less than the rate at which the memory
can accept them, and yet stalls may still occur. To reduce the occurrence of such stalls,
processors usually increase the depth of the write buffer beyond a single entry.
iv) Another alternative to a write-through scheme is a scheme called write-back. When a write
occurs, the new value is written only to the block in the cache.
v) The modified block is written to the lower level of the hierarchy when it is replaced.
vi) Write-back schemes can improve performance, especially when processors can generate
writes as fast or faster than the writes can be handled by main memory; a write-back
scheme is, however, more complex to implement than write-through.
Write-back is a scheme that handles writes by updating values only to the block in the
cache, then writing the modified block to the lower level of the hierarchy when the block is
replaced.
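The difference between the two write policies can be illustrated by counting main memory writes for a sequence of stores. The sketch below is a toy model under stated assumptions: a direct-mapped cache with one word per line, stores only (no loads), and a hypothetical function name `count_memory_writes`. Dirty blocks still resident at the end are not counted.

```python
def count_memory_writes(stores, num_lines, policy):
    """Count main memory writes caused by a sequence of store addresses.

    policy is "write-through" or "write-back".
    """
    lines = {}            # line index -> (resident address, dirty flag)
    memory_writes = 0
    for addr in stores:
        index = addr % num_lines
        if policy == "write-through":
            lines[index] = (addr, False)
            memory_writes += 1            # every store also updates memory
        elif policy == "write-back":
            resident = lines.get(index)
            if resident is not None and resident[0] != addr and resident[1]:
                memory_writes += 1        # dirty block written back on replacement
            lines[index] = (addr, True)   # new value lives only in the cache
        else:
            raise ValueError("unknown policy: " + policy)
    return memory_writes


stores = [0, 0, 0, 4, 0]   # addresses 0 and 4 map to the same line when num_lines = 4
print(count_memory_writes(stores, 4, "write-through"))   # 5
print(count_memory_writes(stores, 4, "write-back"))      # 2
```

Repeated stores to the same block cost nothing extra under write-back, which is where its performance advantage comes from.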
When a main memory block needs to be brought into the cache while all the blocks are
occupied, one of them has to be replaced. The selection of the block to be replaced is
made using a cache replacement algorithm. Replacement algorithms are only needed for
associative and set associative techniques. The following are the common replacement
techniques:
Least Recently Used (LRU): This replaces the cache line that has been in the cache the
longest with no references to it.
First-in First-out (FIFO): This replaces the cache line that has been in the cache the
longest.
Least Frequently Used (LFU): This replaces the cache line that has experienced the fewest
references.
Random: This picks a line at random from the candidate lines.
Example 4.1: Program P runs on computer A in 10 seconds. The designer says the clock rate can be
increased significantly, but the total cycle count will also increase by 20%. What clock rate do we
need on computer B for P to run in 6 seconds? (The clock rate on A is 100 MHz.) The new machine is
B. We want CPU Time_B = 6 seconds.
We know that Cycle count_B = 1.2 x Cycle count_A. Calculate Cycle count_A:
CPU Time_A = Cycle count_A / Clock rate_A
10 sec = Cycle count_A / (100 x 10^6 cycles/sec)
Cycle count_A = 1000 x 10^6 cycles
Calculate Clock rate_B:
Clock rate_B = Cycle count_B / CPU Time_B
= (1.2 x 1000 x 10^6 cycles) / 6 sec = 200 x 10^6 cycles/sec = 200 MHz
Example 4.3: Consider an implementation of MIPS ISA with 500 MHz clock and
– each ALU instruction takes 3 clock cycles,
– each branch/jump instruction takes 2 clock cycles,
– each sw instruction takes 4 clock cycles,
– each lw instruction takes 5 clock cycles.
Also, consider a program that during its execution executes:
– x=200 million ALU instructions
– y=55 million branch/jump instructions
– z=25 million sw instructions
– w=20 million lw instructions
Find CPU time. Assume sequentially executing CPU.
Clock cycles for a program = (3x + 2y + 4z + 5w)
= 910 x 10^6 clock cycles
CPU time = Clock cycles for a program / Clock rate
= (910 x 10^6) / (500 x 10^6) = 1.82 sec
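The arithmetic in Examples 4.1 and 4.3 can be checked with a few lines of Python. The variable names are assumptions made for this sketch; the numbers are those given in the examples.

```python
MILLION = 10**6

# Example 4.1: P takes 10 s on A at 100 MHz; B needs 20% more cycles
# and must finish in 6 s.
clock_rate_a = 100 * MILLION            # cycles per second
cycles_a = 10 * clock_rate_a            # cycles = time x clock rate
cycles_b = cycles_a * 120 // 100        # 1.2 x cycles on A
clock_rate_b = cycles_b / 6             # required cycles per second on B
print(clock_rate_b / MILLION, "MHz")    # 200.0 MHz

# Example 4.3: 500 MHz clock, cycles per instruction class as listed.
cycles_per_instr = {"alu": 3, "branch": 2, "sw": 4, "lw": 5}
instr_counts = {
    "alu": 200 * MILLION,
    "branch": 55 * MILLION,
    "sw": 25 * MILLION,
    "lw": 20 * MILLION,
}
total_cycles = sum(cycles_per_instr[k] * instr_counts[k] for k in cycles_per_instr)
cpu_time = total_cycles / (500 * MILLION)
print(total_cycles // MILLION, "million cycles,", cpu_time, "s")
```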
Example 4.4: Consider another implementation of MIPS ISA with 1 GHz clock and
Virtual memory allocates space on the hard disk and uses that part of the disk as
temporary RAM. In other words, it is a
technique that uses main memory as a cache for secondary storage. The motivations for
virtual memory are:
To allow efficient and safe sharing of memory among multiple programs
To remove the programming burdens of a small, limited amount of main memory.
Virtual memory gives users the illusion that the computer has enough primary memory
left to run their programs. Even when a program to be executed is much larger than the
primary memory available, the user never notices that the system needs
a bigger primary store to run it. When the RAM is full, the operating system
reserves a portion of the hard disk and uses it as RAM. The parts of the program that are
not currently being executed are kept in that part of secondary storage, and the
parts that are being executed are first brought into main memory. This is the
theory behind virtual memory.
Terminologies:
Protection is a set of mechanisms for ensuring that multiple processes sharing the
processor, memory, or I/O devices cannot interfere with one another by reading or writing
each other's data.
Virtual memory breaks programs into fixed-size blocks called pages.
Page fault is an event that occurs when an accessed page is not present in main memory.
Virtual address is an address that corresponds to a location in virtual space and is
translated by address mapping to a physical address when memory is accessed.
Address translation or address mapping is the process by which a virtual address is
mapped to an address used to access memory.
Working mechanism
In virtual memory, blocks of memory are mapped from one set of addresses (virtual
addresses) to another set (physical addresses).
The processor generates virtual addresses while the memory is accessed using physical
addresses.
Both the virtual memory and the physical memory are broken into pages, so that a virtual
page is really mapped to a physical page.
It is also possible for a virtual page to be absent from main memory and not be mapped to a
physical address, residing instead on disk.
Physical pages can be shared by having two virtual addresses point to the same physical
address. This capability is used to allow two different programs to share data or code.
Virtual memory also simplifies loading the program for execution by providing relocation.
Relocation maps the virtual addresses used by a program to different physical addresses
before the addresses are used to access memory. This relocation allows us to load the
program anywhere in main memory.
A virtual address is considered as a pair (p,d) where lower order bits give an offset d within
the page and high-order bits specify the page p.
The job of the Memory Management Unit (MMU) is to translate the page number p to a
frame number f.
The physical address is then (f,d), and this is what goes on the memory bus.
For every process, there is a page table, and the page number p is used as an index into this
table for the translation.
The following are the entries in page tables:
1. Validity bit: set to 0 if the corresponding page is not in memory.
4. Referenced bit: set to 1 by hardware when the page is accessed; used by the page
replacement policy.
5. Modified bit (dirty bit): set to 1 by hardware on a write access; used to avoid writing
an unmodified page back to disk when it is swapped out.
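The translation from (p, d) to (f, d) can be sketched as below. This is a minimal model, not a description of any particular MMU: the 4 KB page size, the dictionary layout of a page table entry, and the names `translate` and `PageFault` are all assumptions for illustration.

```python
PAGE_SIZE = 4096                            # assumed page size in bytes (power of two)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1    # 12 offset bits for 4 KB pages


class PageFault(Exception):
    """Raised when the accessed page is not present in main memory."""


def translate(virtual_addr, page_table):
    """Translate a virtual address (p, d) into a physical address (f, d)."""
    p = virtual_addr >> OFFSET_BITS         # high-order bits: page number
    d = virtual_addr & (PAGE_SIZE - 1)      # low-order bits: offset within the page
    entry = page_table[p]                   # page number indexes the page table
    if not entry["valid"]:
        raise PageFault(p)
    entry["referenced"] = True              # referenced bit set on access
    return (entry["frame"] << OFFSET_BITS) | d


# Hypothetical page table: page 0 is in frame 5, page 1 is on disk.
page_table = [
    {"valid": True, "frame": 5, "referenced": False},
    {"valid": False, "frame": None, "referenced": False},
]

print(hex(translate(0x0123, page_table)))   # 0x5123
```

An access to any address in page 1 (for example 0x1000) raises `PageFault`, which corresponds to the operating system having to bring the page in from disk.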
Page replacement is a process of swapping out an existing page from the frame of a main
memory and replacing it with the required page.
Page replacement is done when all the frames of main memory are already occupied and a
page has to be replaced to create a space for the newly referenced page. A good
replacement algorithm will have the least number of page faults.
1. First In First Out (FIFO) page replacement algorithm
It replaces the oldest page, the one that has been present in the main memory for the longest time. It
is implemented by keeping track of all the pages in a queue.
Example 4.5. Find the page faults when the following pages are requested to be loaded in a
page frame of size 3: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1
Page faults= 15
2. Last In First Out (LIFO) page replacement algorithm
It replaces the newest page that arrived at last in the main memory. It is implemented by
keeping track of all the pages in a stack.
3. Least Recently Used (LRU) page replacement algorithm
The new page replaces the least recently used page.
Random replacement algorithm replaces a random page in memory. This eliminates the
overhead cost of tracking page references.
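The page fault count in Example 4.5 can be reproduced with a small simulation. The sketch below counts faults for the queue-based (FIFO) policy and, for comparison, for LRU on the same reference string. The function names are assumptions made for this illustration.

```python
from collections import OrderedDict, deque


def fifo_page_faults(refs, frames):
    """Count page faults when the oldest resident page is always replaced."""
    resident = deque()                  # oldest page at the left
    faults = 0
    for page in refs:
        if page not in resident:
            faults += 1
            if len(resident) == frames:
                resident.popleft()      # evict the page that arrived first
            resident.append(page)
    return faults


def lru_page_faults(refs, frames):
    """Count page faults when the least recently used page is replaced."""
    resident = OrderedDict()            # least recently used page first
    faults = 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
        else:
            faults += 1
            if len(resident) == frames:
                resident.popitem(last=False)
            resident[page] = True
    return faults


REFS = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_page_faults(REFS, 3))   # 15, as in Example 4.5
print(lru_page_faults(REFS, 3))    # 12
```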
Translation Look aside Buffer (TLB)
A translation look aside buffer (TLB) is a memory cache that stores recent translations of
virtual memory to physical addresses for faster retrieval.
The page tables are stored in main memory, so every memory access by a program
takes longer: one memory access is needed to obtain the
physical address from the page table and a second access to get the data. Main memory
is therefore accessed twice for each reference. A TLB exploits the locality of reference and can
reduce the memory access time.
A TLB hit is the condition where the desired entry is found in the translation look aside
buffer.
If this happens, the CPU simply accesses the actual location in main memory.
If the entry is not found in the TLB (a TLB miss), the CPU has to access the page table in
main memory and then access the actual frame in main memory. Therefore, in the case
of a TLB hit, the effective access time is less than in the case of a TLB miss.
If the probability of a TLB hit (the TLB hit rate) is P, then the probability of a TLB miss (the TLB miss
rate) is (1 - P). The effective access time can be defined as
Effective access time = P (t + m) + (1 - P) (t + k.m + m)
where t is the TLB access time, m is the main memory access time, and k is the number of
memory accesses needed to read the page table (the number of page table levels).
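The effective access time formula can be evaluated with a short Python function. The sketch assumes that P is given as a fraction between 0 and 1, that t is the TLB access time, m is the main memory access time, and k is the number of page table accesses on a miss; the sample timings are hypothetical.

```python
def effective_access_time(p, t, m, k=1):
    """Effective access time = P (t + m) + (1 - P) (t + k.m + m)."""
    hit_time = t + m              # TLB lookup plus one memory access
    miss_time = t + k * m + m     # TLB lookup, k page table accesses, then the data
    return p * hit_time + (1 - p) * miss_time


# Hypothetical timings: 20 ns TLB, 100 ns memory, single-level page table.
print(effective_access_time(p=0.8, t=20, m=100, k=1))    # about 140 ns
print(effective_access_time(p=0.98, t=20, m=100, k=1))   # about 122 ns
```

Raising the hit rate from 80% to 98% brings the effective access time close to the hit time of 120 ns, which is why locality of reference matters for the TLB.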
Hardware Level:
The machine should support two modes: supervisor mode and user mode. The mode indicates
whether the currently running process is a user process or a supervisory process. A process
running in supervisor (kernel) mode is an operating system process.
Switching from user mode to system mode is done through system calls that transfer control
to a dedicated location in supervisor code space.
A system call is a special instruction that transfers control from user mode to a
dedicated location in supervisor code space, invoking the exception mechanism
in the process.
PARALLEL BUS ARCHITECTURES
Single bus architectures connect multiple processors with their own cache memory using
shared bus. This is a simple architecture but it suffers from latency and bandwidth issues.
This naturally led to deploying parallel or multiple bus architectures. Multiple bus
multiprocessor systems use several parallel buses to interconnect multiple processors with
multiple memory modules. The following are the connection schemes in multi bus
architectures:
1. Multiple-bus with full bus–memory connection (MBFBMC)
This has all memory modules connected to all buses. By contrast, the multiple-bus with single bus
memory connection has each memory module connected to a specific bus. For N processors
with M memory modules and B buses, the number of connections required is B(N+M) and
the load on each bus will be N+M.
2. Multiple bus with partial bus–memory connection (MBPBMC)
The multiple-bus with partial bus–memory connection, has each memory module
connected to a subset of buses.
3. Multiple bus with class-based memory connection (MBCBMC)
The multiple-bus with class-based memory connection (MBCBMC), has memory modules
grouped into classes whereby each class is connected to a specific subset of buses. A class is
just an arbitrary collection of memory modules.
4. Multiple bus with single bus memory connection (MBSBMC)
Here, only single bus will be connected to single memory, but the processor can access all
the buses. The numbers of connections:
Fig 4.16 b) Multiple bus with single bus memory connection (MBSBMC)
Bus Synchronization:
In a single bus multiprocessor system, bus arbitration is required in order to resolve the
bus contention that takes place when more than one processor competes to access the bus.
A bus can be classified as synchronous or asynchronous. The time for any transaction over a
synchronous bus is known in advance. An asynchronous bus depends on the availability of
data and the readiness of devices to initiate bus transactions.
The processors that want to use the bus submit their requests to bus arbitration logic. The
latter decides, using a certain priority scheme, which processor will be granted access to the
bus during a certain time interval (bus master).
The process of passing bus mastership from one processor to another is called
handshaking, which requires the use of two control signals: bus request and bus grant.
Bus request: indicates that a given processor is requesting mastership of the bus.
Bus grant: indicates that bus mastership is granted.
Bus busy: a third signal, usually used to indicate whether or not the bus is currently being used.
In deciding which processor gains control of the bus, the bus arbitration logic uses a
predefined priority scheme.
Among the priority schemes used are random priority, simple rotating priority, equal
priority, and least recently used (LRU) priority.
After each arbitration cycle, in simple rotating priority, all priority levels are reduced one
place, with the lowest priority processor taking the highest priority. In equal priority, when
two or more requests are made, there is equal chance of any one request being processed.
In the LRU algorithm, the highest priority is given to the processor that has not used the bus
for the longest time.
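Two of the priority schemes above can be sketched as small Python functions. This is only an illustration of the arbitration rules as described; the function names, the representation of requests as a set of processor ids, and the tie-breaking by lowest id are assumptions.

```python
def rotating_priority_grant(requests, order):
    """Simple rotating priority.

    order lists processor ids from highest to lowest priority.
    Returns (granted processor or None, priority order for the next cycle).
    """
    granted = next((p for p in order if p in requests), None)
    # After the arbitration cycle every level drops one place and the
    # lowest-priority processor takes the highest priority.
    new_order = [order[-1]] + order[:-1]
    return granted, new_order


def lru_grant(requests, last_used):
    """LRU priority: grant the requester that has not used the bus for the longest time.

    last_used maps processor id -> cycle number of its last bus use.
    """
    if not requests:
        return None
    return min(sorted(requests), key=lambda p: last_used[p])


granted, order = rotating_priority_grant({1, 3}, [0, 1, 2, 3])
print(granted, order)                                   # 1 [3, 0, 1, 2]

print(lru_grant({0, 1, 3}, {0: 5, 1: 2, 2: 9, 3: 7}))   # 1
```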
The CPU of the computer system communicates with the memory and the I/O devices in
order to transfer data between them. The method of communication of the CPU with
memory and I/O devices is different. The CPU may communicate with the memory either
Programmed I/O refers to data transfers that are initiated by a CPU, under driver
software control, to access registers or memory on a device.
With programmed I/O, data are exchanged between the processor and the I/O module.
The processor executes a program that gives it direct control of the I/O operation, including
sensing device status, sending a read or write command, and transferring the data.
When the processor issues a command to the I/O module, it must wait until the I/O
operation is complete.
If the processor is faster than the I/O module, this is wasteful of processor time. With
interrupt-driven I/O, the processor issues an I/O command, continues to execute other
instructions, and is interrupted by the I/O module when the latter has completed its work.
With both programmed and interrupt I/O, the processor is responsible for extracting data
from main memory for output and storing data in main memory for input.
The alternative is known as direct memory access. In this mode, the I/O module and main
memory exchange data directly, without processor involvement.
When the processor is executing a program and encounters an instruction relating to I/O, it
executes that instruction by issuing a command to the appropriate I/O module.
With programmed I/O, the I/O module will perform the requested action and then set the
appropriate bits in the I/O status register.
The I/O module takes no further action to alert the processor. In
particular, it does not interrupt the processor.
It is the responsibility of the processor to check the status of the I/O module periodically.
If the device is ready for the transfer (read/write),
the processor transfers the data to or from the I/O device as required. As the CPU is faster
than the I/O module, the problem with programmed I/O is that the CPU has to wait a long
time for the I/O module of concern to be ready for either reception or transmission of data.
The CPU, while waiting, must repeatedly check the status of the I/O module, and this
process is known as polling.
As a result, the performance of the entire system is severely degraded.
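Polling can be illustrated with a simulated device. The class `SimulatedDevice` and the function `programmed_io_read` below are hypothetical names for this sketch; a real driver would read a hardware status register instead of calling a Python method.

```python
class SimulatedDevice:
    """Hypothetical I/O module that becomes ready after a number of status checks."""

    def __init__(self, ready_after, data):
        self._remaining = ready_after
        self._data = data

    def read_status(self):
        """Return True once the ready bit in the status register is set."""
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def read_data(self):
        return self._data


def programmed_io_read(device):
    """Poll the status register until the device is ready, then transfer the data.

    Returns (data, number of polls during which the CPU did no useful work).
    """
    wasted_polls = 0
    while not device.read_status():   # the module never interrupts; the CPU must ask
        wasted_polls += 1
    return device.read_data(), wasted_polls


data, wasted = programmed_io_read(SimulatedDevice(ready_after=3, data=0x41))
print(hex(data), wasted)   # 0x41 3
```

Every iteration of the `while` loop is processor time spent waiting, which is the cost that interrupt-driven I/O and DMA try to avoid.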
2. I/O module gets data from peripheral whilst CPU does other work
Direct Memory Access (DMA) means the CPU grants the I/O module authority to read from or write
to memory without CPU involvement.
The DMA module controls the exchange of data between main memory and the I/O device.
Because a DMA device can transfer data directly to and from memory, rather than using
the CPU as an intermediary, it can relieve congestion on the bus.
The CPU is only involved at the beginning and end of the transfer and is interrupted only after
the entire block has been transferred.
The CPU programs the DMA controller by setting its registers so it knows what to transfer
where.
It also issues a command to the disk controller telling it to read data from the disk into its
internal buffer and verify the checksum.
When valid data are in the disk controller's buffer, DMA can begin. The DMA controller
initiates the transfer by issuing a read request over the bus to the disk controller.
This read request looks like any other read request, and the disk controller does not know
whether it came from the CPU or from a DMA controller.
The memory address to write to is on the bus address lines, so when the disk controller
fetches the next word from its internal buffer, it knows where to write it.
The write to memory is another standard bus cycle.
When the write is complete, the disk controller sends an acknowledgement signal to the
DMA controller, also over the bus.
The DMA controller then increments the memory address to use and decrements the byte
count. If the byte count is still greater than 0, steps 2 through 4 are repeated until the count
reaches 0.
At that time, the DMA controller interrupts the CPU to let it know that the transfer is now
complete.
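The transfer loop carried out by the DMA controller can be modelled roughly as follows. This is a simplified simulation, not hardware code: memory and the disk controller's buffer are plain Python lists, the count is kept in words rather than bytes, and the completion interrupt is represented by the return value. The function name `dma_transfer` is an assumption.

```python
def dma_transfer(device_buffer, memory, start_addr):
    """Copy the device buffer into memory one word at a time.

    Returns True to represent the interrupt sent to the CPU on completion.
    """
    address = start_addr              # memory address register set up by the CPU
    count = len(device_buffer)        # count register set up by the CPU
    index = 0
    while count > 0:
        word = device_buffer[index]   # read request to the disk controller
        memory[address] = word        # write to memory, a standard bus cycle
        address += 1                  # increment the memory address to use
        count -= 1                    # decrement the count
        index += 1
    return True                       # count reached 0: interrupt the CPU


memory = [0] * 8
done = dma_transfer([1, 2, 3], memory, start_addr=2)
print(done, memory)   # True [0, 0, 1, 2, 3, 0, 0, 0]
```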
When the operating system starts up, it does not have to copy the disk block to memory; it
The DMA controller requests the disk controller to transfer data from the disk controller's
buffer to the main memory. In the first step, the CPU issues a command to the disk
controller telling it to read data from the disk into its internal buffer.
The peripheral devices and external buffer that operate at relatively low frequencies
communicate with the processor using a serial bus. There are two popular serial buses: Serial
Peripheral Interface (SPI) and Inter-Integrated Circuit (I2C).
Serial Peripheral Interface (SPI)
Serial Peripheral Interface (SPI) is an interface bus designed by Motorola to send data
between microcontrollers and small peripherals such as shift registers, sensors, and SD
cards. It uses separate clock and data lines, along with a select line to choose the device.
A standard SPI connection involves a master connected to slaves using the serial clock
(SCK), Master Out Slave In (MOSI), Master In Slave Out (MISO), and Slave Select
(SS) lines.
The SCK, MOSI, and MISO signals can be shared by slaves while each slave has a unique SS
line.
The SPI interface defines no protocol for data exchange, limiting overhead and allowing for
high speed data streaming.
Clock polarity (CPOL) and clock phase (CPHA) can be specified as '0' or '1' to form four
unique modes to provide flexibility in communication between master and slave.
If CPOL and CPHA are both '0' (defined as Mode 0), data is sampled at the leading rising edge
of the clock. Mode 0 is by far the most common mode for SPI bus slave communication.
If CPOL is '1' and CPHA is '0' (Mode 2), data is sampled at the leading falling edge of the
clock. Likewise, CPOL = '0' and CPHA = '1' (Mode 1) results in data sampled on the
trailing falling edge, and CPOL = '1' with CPHA = '1' (Mode 3) results in data sampled on
the trailing rising edge.
Mode  CPOL  CPHA
0     0     0
1     0     1
2     1     0
3     1     1
Fig 4.22: Modes in SPI
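The relation between CPOL, CPHA, the mode number in Fig 4.22, and the sampling edge can be written as two small functions. The function names are assumptions. The edges for Modes 0 and 2 are those stated above; the edges for Modes 1 and 3 follow standard SPI behaviour.

```python
def spi_mode(cpol, cpha):
    """Mode number as listed in Fig 4.22: CPOL is the high bit, CPHA the low bit."""
    return (cpol << 1) | cpha


def sampling_edge(cpol, cpha):
    """Return the clock edge on which data is sampled."""
    position = "leading" if cpha == 0 else "trailing"
    # With CPOL = 0 the clock idles low, so the leading edge rises and the
    # trailing edge falls; CPOL = 1 inverts both.
    leading_is_rising = (cpol == 0)
    rising = leading_is_rising if cpha == 0 else not leading_is_rising
    return position + " " + ("rising" if rising else "falling")


for cpol in (0, 1):
    for cpha in (0, 1):
        print(spi_mode(cpol, cpha), cpol, cpha, sampling_edge(cpol, cpha))
```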
In addition to the standard 4-wire configuration, the SPI interface has been extended to
include a variety of IO standards including 3-wire for reduced pin count and dual or quad
I/O for higher throughput.
In 3-wire mode, MOSI and MISO lines are combined to a single bidirectional data line.
Transactions are half-duplex to allow for bidirectional communication. Reducing the
number of data lines and operating in half-duplex mode also decreases maximum possible
throughput; many 3-wire devices have low performance requirements and are instead
designed with low pin count in mind.
Multi I/O variants such as dual I/O and quad I/O add additional data lines to the standard
for increased throughput.
Components that utilize multi I/O modes can rival the read speed of parallel devices while
still offering reduced pin counts. This performance increase enables random access and
direct program execution from flash memory (execute-in-place).
Inter-Integrated Circuit (I2C)
An inter-integrated circuit (Inter-IC or I2C) is a multi-master serial bus that connects low-
speed peripherals to a motherboard, mobile phone, embedded system or other electronic
devices.
SDA (Serial Data) – The line on which the master and slave send and receive data.
SCL (Serial Clock) – The line that carries the clock signal.
Devices on an I2C bus are always a master or a slave. The master is the device that always
initiates communication and drives the clock line (SCL). Usually a microcontroller or
microprocessor acts as the master, which needs to read data from or write data to slave
peripherals.
Slave devices always respond to the master and won't initiate any communication by themselves.
Devices like EEPROMs, LCDs, and RTCs act as slave devices. Each slave device has a
unique address so that the master can request data from or write data to it.
The master device uses either a 7-bit or 10-bit address to specify the slave device as its
partner of data communication and it supports bi-directional data transfer.
Working of I2C
In I2C, data is transferred in messages, which are broken up into frames of data. Each
message has an address frame that contains the binary address of the slave, and one or
more data frames that contain the data being transmitted.
The message also includes start and stop conditions, read/write bits, and ACK/NACK bits
between each data frame.
The following are the parts of a message:
1. Start Condition: The SDA line switches from a high voltage level to a low voltage level
before the SCL line switches from high to low.
2. Stop Condition: The SDA line switches from a low voltage level to a high voltage level after
the SCL line switches from low to high.
3. Address Frame: A 7 or 10 bit sequence unique to each slave that identifies the slave when
the master wants to talk to it.
4. Read/Write Bit: A single bit specifying whether the master is sending data to the slave
(low voltage level) or requesting data from it (high voltage level).
5. ACK/NACK Bit: Each frame in a message is followed by an acknowledge/no-acknowledge
bit. If an address frame or data frame was successfully received, an ACK bit is returned to
the sender from the receiving device.
Addressing:
I2C doesn't have slave select lines like SPI, so it needs another way to let the slave know that
data is being sent to it, and not another slave. It does this by addressing. The address frame
is always the first frame after the start bit in a new message.
The master sends the address of the slave it wants to communicate with to every slave
connected to it. Each slave then compares the address sent from the master to its own
address.
If the address matches, it sends a low voltage ACK bit back to the master. If the address
doesn't match, the slave does nothing and the SDA line remains high.
Read/Write Bit
The address frame includes a single bit at the end that informs the slave whether the master
wants to write data to it or receive data from it. If the master wants to send data to the
slave, the read/write bit is a low voltage level. If the master is requesting data from the
slave, the bit is a high voltage level.
Data Frame
After the master detects the ACK bit from the slave, the first data frame is ready to be sent.
The data frame is always 8 bits long, and sent with the most significant bit first.
Each data frame is immediately followed by an ACK/NACK bit to verify that the frame has
been received successfully.
The ACK bit must be received by either the master or the slave (depending on who is
sending the data) before the next data frame can be sent.
After all of the data frames have been sent, the master can send a stop condition to the slave
to halt the transmission.
The stop condition is a voltage transition from low to high on the SDA line after a low to
high transition on the SCL line, with the SCL line remaining high.
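The address frame with its read/write bit can be sketched in Python for the 7-bit addressing case. The function names and the sample slave address 0x50 are assumptions made for this illustration; the bit ordering (most significant bit first) follows the description of frames above.

```python
def address_frame(slave_addr, read):
    """Build the 8 bits sent after the start condition: 7 address bits, then R/W.

    The read/write bit is 0 (low) when the master writes to the slave
    and 1 (high) when the master requests data from it.
    """
    if not 0 <= slave_addr < 128:
        raise ValueError("expected a 7-bit slave address")
    return (slave_addr << 1) | (1 if read else 0)


def frame_bits(byte):
    """Bits of an 8-bit frame in transmission order, most significant bit first."""
    return [(byte >> i) & 1 for i in range(7, -1, -1)]


print(hex(address_frame(0x50, read=False)))      # 0xa0
print(hex(address_frame(0x50, read=True)))       # 0xa1
print(frame_bits(address_frame(0x50, True)))     # [1, 0, 1, 0, 0, 0, 0, 1]
```

After these eight bits the addressed slave pulls SDA low for one bit time to return the ACK.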
3. Each slave compares the address sent from the master to its own address. If the address
matches, the slave returns an ACK bit by pulling the SDA line low for one bit. If the address
from the master does not match the slave's own address, the slave leaves the SDA line high.
4. The master sends or receives the data frame.
5. After each data frame has been transferred, the receiving device returns another ACK bit to
the sender to acknowledge successful receipt of the frame.
6. To stop the data transmission, the master sends a stop condition to the slave by switching
Mass storage refers to various techniques and devices for storing large amounts of data.
Mass storage is distinct from memory, which refers to temporary storage areas within the
computer. Unlike main memory, mass storage devices retain data even when the computer
is turned off.
tape drives
RAID storage
USB storage
flash memory cards
Solid State Devices
Solid-state devices are electronic devices in which electricity flows through solid
semiconductor crystals like silicon, gallium arsenide, and germanium rather than through
vacuum tubes.
They do not involve any moving parts or magnetic materials.
RAM is a solid state device that consists of microchips that store data on non-moving
components, providing for fast retrieval of that data.
Transistors are the most important solid state devices. A transistor contains two p–n
junctions and has three contacts or terminals.
Because transistors require the action of perpendicular electrical fields, their behavior is more
difficult to understand than that of diodes.
The different types of transistors are: bipolar junction transistor (BJT) where the current is
Hard Drives
A hard disk drive is a non-volatile memory hardware device that permanently stores and
retrieves data on a computer.
A hard drive is a secondary storage device that consists of one or more platters to which
data is written using a magnetic head, all inside of an air-sealed casing.
Internal hard disks reside in a drive bay, connect to the motherboard using an ATA, SCSI, or
SATA cable, and are powered by a connection to the power supply unit.
An external hard drive is a portable storage device that can be attached to a computer
through a USB or FireWire connection, or wirelessly.
External hard drives typically have high storage capacities and are often used to back up
computers or serve as a network drive.
Optical Drives
An optical drive is a device in a computer system that allows users to use DVDs, CDs and
Blu-ray optical discs.
The drive contains some lenses that project electromagnetic waves that are responsible for
reading and writing data on optical discs.
An optical disk drive uses a laser to read and write data. A laser in this context means an
electromagnetic wave with a very specific wavelength within or near the visible light
spectrum.
An optical drive that works with all types of discs will have two separate lenses: one for
CD/DVD and one for Blu-ray.
An optical drive has a rotational mechanism to spin the disc. Optical drives were originally
designed to work at a constant linear velocity (CLV), i.e., the disc spins at varying speeds
depending on where the laser beam is reading, so the spiral groove of the disc passes by the
laser at a constant speed.
An optical drive also needs a loading mechanism: either a tray-loading mechanism, where the
disc is placed onto a motorized tray that moves in and out of the computer case, or a
slot-loading mechanism, where the disc is slid into a slot and motorized rollers are used to
move the disc in and out.
Tape Drives
A tape drive is a device that stores computer data on magnetic tape, especially for backup
and archiving purposes.
Tape drives work either by using a traditional helical scan where the recording and
playback heads touch the tape, or linear tape technology, where the heads never actually
RAID 1 writes and reads identical data to pairs of drives. This process is often called data
mirroring, and its primary function is to provide redundancy. If any of the disks in the
array fails, the system can still access data from the remaining disk(s).
3. RAID 5 (Striping with parity):
RAID 5 stripes data blocks across multiple disks like RAID 0; however, it also stores parity
information (a small amount of data that can accurately describe larger amounts of data),
which is used to recover the data in case of disk failure. This level offers both speed (data is
accessed from multiple disks) and redundancy, as parity data is stored across all of the
disks.
4. RAID 6 (Striping with double parity):
RAID 6 is similar to RAID 5; however, it provides increased reliability as it stores an extra
parity block. That effectively means that it is possible for two drives to fail at once without
breaking the array.
5. RAID 10 (Striping + Mirroring):
RAID 10 combines the mirroring of RAID 1 with the striping of RAID 0. In other words, it
combines the redundancy of RAID 1 with the increased performance of RAID 0. It is best
suited to environments where both high performance and data security are required.
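The trade-off between capacity and redundancy in the levels above can be compared with a
small calculation. This is a sketch under stated assumptions: all disks are the same size, the
function name raid_summary is hypothetical, and the "tolerated failures" figure is the
guaranteed worst case (RAID 10 may survive more failures if they fall in different mirror
pairs).

```python
def raid_summary(level, disks, disk_tb):
    """Return (usable capacity in TB, guaranteed tolerated disk failures)."""
    # level: (minimum disks, number of disks' worth of usable data, tolerated failures)
    rules = {
        1: (2, lambda n: n // 2, 1),   # mirrored pairs
        5: (3, lambda n: n - 1, 1),    # one disk's worth of parity
        6: (4, lambda n: n - 2, 2),    # two disks' worth of parity
        10: (4, lambda n: n // 2, 1),  # striped mirrored pairs
    }
    min_disks, data_disks, tolerated = rules[level]
    if disks < min_disks:
        raise ValueError(f"RAID {level} needs at least {min_disks} disks")
    if level in (1, 10) and disks % 2:
        raise ValueError("mirrored levels need an even number of disks")
    return data_disks(disks) * disk_tb, tolerated

# Assumed example: 2 TB disks, two disks for RAID 1 and four for the others.
for level, disks in [(1, 2), (5, 4), (6, 4), (10, 4)]:
    usable, tolerated = raid_summary(level, disks, 2)
    print(f"RAID {level}: {usable} TB usable, survives {tolerated} failure(s)")
```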
When a device is attached to the USB system, it gets assigned a number called its address.
The address is uniquely used by that device while it is connected.
Each device also contains a number of endpoints, which are a collection of sources and
Flash Drives
A flash drive stores data using flash memory. Flash memory is a form of electrically erasable
programmable read-only memory (EEPROM) used to store and retrieve data.
Flash drives are non-volatile, which means they keep their data without power and do not
need a battery backup.
Most computers come equipped with USB ports, which detect inserted flash drives and
install the necessary drivers to make the data retrievable.
Computer users can store and retrieve data once the operating system has detected a
connection to the USB port.
Flash drives have a USB mass storage device classification, which means they do not require
additional drivers.
The computer's operating system recognizes a block-structured logical unit, which means it
can use any file system or block addressing system to read the information on the flash
drive.
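What "block-structured" means for the operating system can be sketched as follows. The
drive is modelled as a plain byte string, and the 512-byte block size and the function name
read_block are assumptions made for illustration; they are not taken from these notes.

```python
BLOCK_SIZE = 512  # assumed block size in bytes

def read_block(device, block_number, block_size=BLOCK_SIZE):
    """Return the bytes of one logical block from a device image."""
    start = block_number * block_size
    if block_number < 0 or start + block_size > len(device):
        raise IndexError("block number out of range")
    return device[start:start + block_size]

# A hypothetical 4-block device in which every block is filled with its own index.
device = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
block = read_block(device, 2)
print(len(block), block[0])
```

A file system is then just a convention for which blocks hold which files, which is why any
file system can be placed on top of the same block interface.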
A flash drive enters emulation mode, or acts as a hard drive, once it has connected to the
USB port. This makes it easier to transfer data between the flash drive and the computer.
Flash memory is known as a solid state storage device, meaning there are no moving parts;
everything is electronic instead of mechanical.
Input Devices
Keyboard
A keyboard has its own processor and circuitry that carries information to and from that
processor.
The mechanical action of the switch causes some vibration, called bounce, which the
processor filters out.
If the key is pressed and held continuously, the processor recognizes it as the equivalent of
pressing a key repeatedly.
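The bounce filtering mentioned above can be sketched in software. This is one common
approach (accept a new key state only after it has been read unchanged for several
consecutive samples), not necessarily the method a given keyboard processor uses; the
function name debounce, the threshold of 3 samples and the sample data are assumptions.

```python
def debounce(samples, stable_count=3):
    """Return the debounced key state (0 = released, 1 = pressed) per sample."""
    state = 0       # debounced state; the key is assumed released at the start
    candidate = 0   # most recent raw value
    run = 0         # how many consecutive samples have shown `candidate`
    output = []
    for sample in samples:
        if sample == candidate:
            run += 1
        else:
            candidate = sample
            run = 1
        if run >= stable_count:
            state = candidate
        output.append(state)
    return output

# Hypothetical raw readings: the contact bounces on the way down and once while held.
raw = [0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0]
clean = debounce(raw)
presses = sum(1 for prev, cur in zip([0] + clean, clean) if (prev, cur) == (0, 1))
print(clean, presses)
```

Although the raw signal rises three times, the filtered signal rises only once, so a single
key press is reported.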
Another type of keyboard has three layers: a top plasticized layer with key positions marked
on the top surface and conducting traces on the other side; a middle layer made of rubber
with holes at the key positions; and a bottom metallic layer with raised bumps at the key
positions.
When a key is pressed, the trace underneath the top layer comes in contact with the bump in
the bottom layer, thus completing an electrical circuit. The current flow is sensed by the
microcontroller.
Fig 4.24: Layers in keyboard
Mouse
The optical mouse shines a bright light down onto the desk from an LED mounted on the
bottom of the mouse.
The light bounces straight back up off the desk into a photocell also mounted under the
mouse, a short distance from the LED.
The photocell has a lens in front of it that magnifies the reflected light, so the mouse can
respond more precisely to your hand movements.
As the mouse is pushed, the pattern of reflected light changes, and the chip inside the
A trackball can also be used as an alternative to a mouse. This device also has buttons
similar to those on a mouse.
It holds a large moving ball on the top. The body of the trackball is not moved. The ball is
rolled with fingers. The position of the cursor on the screen is controlled by rotating the
ball.
The main benefit of the trackball over a mouse is that it needs less desk space, because the
device itself does not move. Trackballs are often built into laptop computers. A desktop
computer can also use a trackball as a separate input device.
A touchpad is a small, flat surface over which the user moves a finger. The user controls
the movement of the cursor on the screen by moving a finger on the touchpad. It is also
known as a track pad.
A touchpad also has one or more buttons near it. These buttons work like mouse buttons.
Touchpads are commonly used with notebook computers.
A joystick consists of a base and a stick. The stick can be moved in several directions to shift
an object anywhere on the computer screen.
A joystick can perform a similar function to a mouse or trackball. It is often considered less
comfortable and efficient. The most common use of a joystick is for playing computer
games.
Scanners
Scanners operate by shining light at the object or document being digitized and directing
the reflected light onto a photosensitive element.
In most scanners, the sensing medium is an electronic, light-sensing integrated circuit
known as a charge-coupled device (CCD).
Light-sensitive photosites arrayed along the CCD convert levels of brightness into
electronic signals that are then processed into a digital image.
A scanner consists of a flat transparent glass bed under which the CCD sensors, lamp,
lenses, filters and also mirrors are fixed.
The document has to be placed on the glass bed. There will also be a cover to close the
scanner.
The lamp brightens up the text to be scanned. Most scanners use a cold cathode fluorescent
lamp (CCFL).
A stepper motor under the scanner moves the scanner head from one end to the other. The
movement is slow and is driven through a belt.
The scanner head consists of the mirrors, lens, CCD sensors and also the filter. The scan
head moves parallel to the glass bed along a fixed path.
As the scan head moves under the glass bed, the light from the lamp hits the document and
is reflected back with the help of mirrors angled to one another.
Depending on the design of the device, there may be either 2-way mirrors or 3-way mirrors.
The mirrors are angled in such a way that the reflected image is directed onto a smaller
surface.
In the end, the image will reach a lens which passes it through a filter and causes the image
The screen is coated with a pattern of phosphor dots that glow when struck by the
electron stream.
The image on the monitor screen is usually made up of at least tens of thousands of such
tiny dots glowing on command from the computer.
The closer together the pixels are, the sharper the image on screen.
The distance between pixels on a computer monitor screen is called its dot pitch and is
measured in millimeters. Most monitors have a dot pitch of 0.28 mm or less.
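To give a feel for the numbers, a dot pitch can be converted into an approximate number of
dots per inch. This is a rough sketch that assumes the dots are evenly spaced along one
axis; the function name dots_per_inch is hypothetical.

```python
MM_PER_INCH = 25.4

def dots_per_inch(dot_pitch_mm):
    """Approximate number of dots per inch for a given dot pitch."""
    return MM_PER_INCH / dot_pitch_mm

dpi = dots_per_inch(0.28)
print(f"0.28 mm dot pitch is roughly {dpi:.0f} dots per inch")
```

A smaller dot pitch gives more dots per inch, which matches the statement that pixels
placed closer together produce a sharper image.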
There are two electromagnets around the collar of the tube which deflect the electron
beam.
The beam scans across the top of the monitor from left to right, is then blanked and moved
back to the left-hand side slightly below the previous trace (on the next scan line), scans
across the second line and so on until the bottom right of the screen is reached.
The beam is again blanked, and moved back to the top left to start again.
This process draws a complete picture, typically 50 to 100 times a second.
The number of times in one second that the electron gun redraws the entire image is called
the refresh rate and is measured in hertz (cycles per second).
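The refresh rate also fixes how quickly the beam must sweep each line. The sketch below
estimates this under simplifying assumptions: the time spent blanked while the beam returns
is ignored, and the figures of 600 scan lines and 75 Hz are illustrative values, not taken
from these notes.

```python
def lines_per_second(scan_lines, refresh_rate_hz):
    """Number of scan lines the beam must draw each second."""
    return scan_lines * refresh_rate_hz

def microseconds_per_line(scan_lines, refresh_rate_hz):
    """Time available to draw one scan line, in microseconds."""
    return 1_000_000 / lines_per_second(scan_lines, refresh_rate_hz)

rate = lines_per_second(600, 75)
line_time = microseconds_per_line(600, 75)
print(f"{rate} lines per second, about {line_time:.1f} microseconds per line")
```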
It is common, particularly in lower priced equipment, for all the odd-numbered lines of an
image to be traced, and then all the even-numbered lines; the circuitry of such an interlaced
display needs to have only half the speed of a non-interlaced display.
An interlaced display, particularly at a relatively low refresh rate, can appear to some
observers to flicker, and may cause eye strain and nausea.
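The difference between the two scanning orders can be sketched as follows. The function
name scan_order and the 6-line screen are assumptions chosen to keep the output short.

```python
def scan_order(total_lines, interlaced=False):
    """Return the passes over the screen, each pass being a list of line numbers."""
    lines = list(range(1, total_lines + 1))
    if not interlaced:
        return [lines]                   # one pass draws every line
    return [lines[0::2], lines[1::2]]    # odd-numbered lines, then even-numbered lines

print(scan_order(6))
print(scan_order(6, interlaced=True))
```

Each interlaced pass draws only half of the lines, so for the same number of passes per
second the circuitry handles half as many lines per second, at the cost of each individual
line being redrawn only on every second pass.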
The intensity or strength of the electron beam is controlled by setting the voltage levels.
The number of electrons that hits the screen determines the light emitted by the screen.
When the voltage is varied in the electron gun, the brightness of the display also varies.
The focusing hardware focuses the beam at all positions on the screen.
The deflection of the electron beam is controlled by electric or magnetic fields.
Two pairs of coils are mounted on the CRT to produce the necessary deflection.
An LED screen is an LCD screen, but instead of having a normal CCFL backlight, it uses light-
emitting diodes (LEDs) as a source of light behind the screen.
An LED is more energy efficient and a lot smaller than a CCFL, enabling a thinner television
screen.
Printers
A printer is an electromechanical device that converts text and graphical documents
from electronic form to physical form.
Printers are external peripheral devices that are connected to computers or laptops
through a cable or wirelessly; they receive input data and print it on paper.
The quality of a printer is judged by features such as color quality, printing speed and
resolution. Modern printers are often multifunction devices, i.e. a combination
of printer, scanner, photocopier, fax, etc.
Broadly, printers are categorized as impact and non-impact printers.
Daisy Wheel Printers
Daisy wheel printers print only characters and symbols and cannot print graphics. They are
generally slow, with a printing speed of about 10 to 75 characters per second.
The heart of these printers is a circular printing element that carries all the letters,
numerals and symbols, each moulded on a petal around the circumference of the wheel.
The printing element rotates rapidly with the help of a servo motor and pauses to allow the
printing hammer to strike the character against the paper.
A dot matrix printer is a popular computer printer that prints text and graphics on paper
by using tiny dots to form the desired shapes.
It uses an array of metal pins, known as the print head, to strike an inked printer ribbon and
produce dots on the paper.
These combinations of dots form the desired shape on the paper.
The key component in the dot matrix printer is the 'print head', which is about one inch long
and contains a column of tiny pins, numbering from 9 to 24.
The print head is driven by several hammers which force each pin to make contact with the
paper at a specific time. These hammers are pulled by small electromagnets which are
energized at a specific time depending on the character to be printed.
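How a character becomes a sequence of pin firings can be sketched as follows. The 5 x 7
glyph for the letter A is a made-up pattern, not a real printer font, and the function name
column_pins is hypothetical. The print head is assumed to move one column at a time, firing
the listed pins at each position.

```python
# Hypothetical 5-column by 7-row glyph; '#' marks a dot to be printed.
GLYPH_A = [
    ".###.",
    "#...#",
    "#...#",
    "#####",
    "#...#",
    "#...#",
    "#...#",
]

def column_pins(glyph):
    """For each column position of the head, list the pins to fire (1 = top pin)."""
    columns = []
    for col in range(len(glyph[0])):
        pins = [row + 1 for row, line in enumerate(glyph) if line[col] == "#"]
        columns.append(pins)
    return columns

for position, pins in enumerate(column_pins(GLYPH_A), start=1):
    print(f"column {position}: fire pins {pins}")
```

In this sketch the list of pins for each column corresponds to the electromagnets that
would be energized at that moment.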
Inkjet printers
Inkjet printers are the most popular printers for homes and small offices, as they combine a
reasonable cost with good print quality.
An inkjet printer is made of the following parts:
i) Print head – It is the heart of the printer and holds a series of nozzles which spray the ink
drops over the paper.
ii) Ink cartridge – It is the part that contains the ink for printing. Generally, monochrome (black
& white) printers contain a black ink cartridge, and a color printer
contains two cartridges – one with black ink and the other with the primary colors (cyan,
magenta and yellow).
iii) Stepper motor – It is housed in the printer to move the print head and ink cartridges back
and forth across the paper.
iv) Stabilizer bar – A stabilizer bar is used in the printer to ensure that the movement of the
print head over the paper is precise and controlled.
v) Belt – A belt is used to attach the print head to the stepper motor.
vi) Paper tray – It is the place where paper is placed to be printed.
vii) Rollers – Printers have a set of rollers that help to pull paper from the tray for
printing.
viii) Paper tray stepper motor – Another stepper motor is used to rotate the rollers in order to
pull the paper into the printer.
ix) Control circuitry – The control circuit takes the input from the computer and, by decoding
the input, controls all mechanical operations of the printer.
Laser Printers
Laser printers are mainly used for large-scale, high-quality printing.
They are among the fastest and most widely used printers available in the market.
A laser printer uses a slightly different approach for printing. It does not use ink like inkjet
printers; instead, it uses a very fine powder known as toner.
The control circuitry is the part of the printer that talks with the computer and receives the
printing data.
A Raster Image Processor (RIP) converts the text and images into a virtual matrix of dots.
The photoconducting drum, which is the key component of the laser printer, has a special
coating which receives an electrostatic charge from a charging roller.
A rapidly switching laser beam scans the charged drum line by line. When the beam flashes
on, it reverses the charge of tiny spots on the drum, corresponding to the dots that are to be
printed black.
As soon as the laser has scanned a line, a stepper motor moves the drum so that the laser can
scan the next line.
A developer roller plays a vital role in applying the toner. It is coated
with charged toner particles.
As the drum touches the developer roller, the charged toner particles cling to the
discharged areas of the drum, reproducing the images and text in reverse.
Meanwhile, a sheet of paper is drawn from the paper tray with the help of a belt. As the paper
passes a charging wire, the wire applies a charge to it that is opposite to the toner's charge.
When the paper meets the drum, due to the opposite charge between the paper and
toner particles, the toner particles are transferred to the paper.
A cleaning blade then cleans the drum so that the whole process can repeat
continuously.
Finally, the paper passes through the fuser, which is a heat and pressure roller that melts the
toner and fixes it onto the paper.
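The line-by-line exposure of the drum described above can be sketched by turning each row
of the dot matrix produced by the RIP into a sequence of laser on and off runs. The tiny
5-dot-wide page and the function name laser_runs are assumptions for illustration; a real
page has thousands of dots per line.

```python
from itertools import groupby

def laser_runs(scan_line):
    """Collapse one row of dots (1 = print black) into (laser_on, run_length) pairs."""
    return [(bool(value), len(list(group))) for value, group in groupby(scan_line)]

# Hypothetical rasterized page: three scan lines, five dots each.
page = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

for number, row in enumerate(page, start=1):
    print(f"scan line {number}: {laser_runs(row)}")
```

After each row is processed, the drum would be advanced by one line before the next row is
scanned.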