Coa Unit-3,4 Notes
Parallel processing
Parallel computing refers to the process of breaking down larger problems into smaller,
independent, often similar parts that can be executed simultaneously by multiple processors
communicating via shared memory.
Parallel processing systems are created to speed up the execution of programs by
breaking a program into several fragments and processing these fragments simultaneously.
Flynn has classified the computer systems based on parallelism in the instructions and in the
data streams. These are:
1. Single instruction stream, single data stream (SISD): this is the traditional CPU
architecture: at any one time only a single instruction is executed, operating on a single data
item.
2. Single instruction stream, multiple data stream (SIMD).
3. Multiple instruction streams, single data stream (MISD).
4. Multiple instruction stream, multiple data stream (MIMD).
Flynn classified computer systems into four types based on parallelism, but only two
of them are relevant to parallel computers: SIMD and MIMD.
SIMD computers consist of n processing units receiving a single stream of
instructions from a central control unit, with each processing unit operating on a different
piece of data. Most SIMD computers operate synchronously using a single global clock. The
block diagram of a SIMD computer is shown below:
MIMD computers consist of n processing units, each with its own stream of
instructions, and each processing unit operates on a different piece of data.
MIMD is the most powerful class of computer system and covers the range of multiprocessor
systems. The block diagram of a MIMD computer is shown.
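The SIMD idea above can be illustrated with a small software analogy (this is an illustrative sketch, not a hardware model; the function name is made up):

```python
# SIMD-style sketch: a single instruction (the operation) is applied to
# every element of a data stream, one element per "processing unit".
# MIMD would instead let each processing unit run its own instruction stream.
def simd_execute(operation, data_stream):
    """Apply one instruction to n data items simultaneously (conceptually)."""
    return [operation(x) for x in data_stream]

data = [1, 2, 3, 4]
print(simd_execute(lambda x: 2 * x, data))  # [2, 4, 6, 8]
```

Each list element plays the role of the data item held by one processing unit; all of them receive the same instruction (`lambda x: 2 * x`).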
Cache Coherence
In a shared memory multiprocessor with a separate cache memory for each processor, it is
possible to have many copies of any one instruction operand: one copy in the main memory
and one in each cache memory. When one copy of an operand is changed, the other copies of
the operand must be changed also. Cache coherence is the discipline that ensures that changes
in the values of shared operands are propagated throughout the system in a timely fashion. It
requires that:
1. Writes to a shared operand are eventually propagated to all copies of that operand.
2. All processors see exactly the same sequence of changes of values for each separate
operand.
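A toy write-invalidate scheme shows how requirement 1 can be met: on a write, every other cache's copy is invalidated, so subsequent reads fetch the new value from main memory. This is a simplified sketch, not a real coherence protocol such as MESI:

```python
# Toy write-invalidate coherence sketch: write-through to main memory,
# plus invalidation of every other cache's copy of the written address.
class CoherentSystem:
    def __init__(self, num_caches):
        self.memory = {}                              # main memory
        self.caches = [dict() for _ in range(num_caches)]

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:                         # cache miss
            cache[addr] = self.memory.get(addr, 0)    # fetch from main memory
        return cache[addr]

    def write(self, cpu, addr, value):
        self.memory[addr] = value                     # write-through
        self.caches[cpu][addr] = value
        for i, cache in enumerate(self.caches):
            if i != cpu:
                cache.pop(addr, None)                 # invalidate other copies

system = CoherentSystem(2)
system.write(0, 0x10, 5)       # CPU 0 writes a shared operand
print(system.read(1, 0x10))    # CPU 1 sees the new value: 5
```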
Vector Processing
Definition: A vector processor is a central processing unit that can execute an operation on a
complete vector of input with a single instruction. More specifically, it is a complete unit of
hardware resources that processes a sequential set of similar data items in memory using a
single instruction.
The figure below represents the typical diagram showing vector processing by a vector
computer:
The functional units of a vector computer are as follows:
i. IPU or instruction processing unit
ii. Vector register
iii. Scalar register
iv. Scalar processor
v. Vector instruction controller
vi. Vector access controller
vii. Vector processor
Both data and instructions are present in memory at the desired memory locations. The
instruction processing unit (IPU) fetches an instruction from memory. Once the instruction is
fetched, the IPU determines whether the fetched instruction is scalar or vector in nature. If it
is scalar, the instruction is transferred to the scalar register and further scalar processing is
performed.
When the instruction is vector in nature, it is fed to the vector instruction controller. The
vector instruction controller first decodes the vector instruction and then determines the
address of the vector operand in memory.
It then signals the vector access controller about the demand for the respective operand, and
the vector access controller fetches the desired operand from memory.
Once the operand is fetched, it is provided to the vector register so that it can be processed by
the vector processor.
At times multiple vector instructions are present; the vector instruction controller then
provides them to the task system. If the task system shows that a vector task is very long, the
processor divides the task into subvectors. These subvectors are fed to the vector processor,
which makes use of several pipelines in order to execute the instruction over the operands
fetched from memory at the same time.
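The vector path just described can be sketched in software: one vector instruction (here, a vector add) operates on whole operand arrays, and a long vector is split into subvectors as the text describes. The subvector length of 4 is an assumed value for illustration:

```python
# Sketch of a vector instruction: one "instruction" (vector add) operates
# on entire operand arrays; long vectors are split into subvectors, which
# a pipelined vector processor would then stream through its pipelines.
def vector_add(a, b, subvector_len=4):
    result = []
    for start in range(0, len(a), subvector_len):       # split into subvectors
        sub_a = a[start:start + subvector_len]
        sub_b = b[start:start + subvector_len]
        result.extend(x + y for x, y in zip(sub_a, sub_b))
    return result

print(vector_add([1] * 6, [2] * 6))  # [3, 3, 3, 3, 3, 3]
```

A scalar processor would instead need one add instruction per element; the vector form needs a single instruction for the whole array.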
Classification of Vector Processor
The several vector pipelines present in a vector computer help in retrieving data from the
registers and storing results in the desired register.
This architecture enables fetching data of size 512 bits from memory to the pipeline.
However, due to high memory access time, the pipelines of a vector computer require a
higher start-up time, as more time is needed to initiate a vector instruction.
Array Processor
Array processors are also known as multiprocessors or vector processors. They perform
computations on large arrays of data and are thus used to improve the performance of the
computer.
A general block diagram of an array processor is shown below. It contains a set of identical
processing elements (PEs), each of which has a local memory M. Each processing element
includes an ALU and registers. The master control unit controls all the operations of the
processing elements. It also decodes the instructions and determines how each instruction is
to be executed.
The main memory is used for storing the program. The control unit is responsible for fetching
the instructions. Vector instructions are sent to all PEs simultaneously and the results are
returned to memory.
Why use the Array Processor
Most array processors operate asynchronously from the host CPU, which improves
the overall capacity of the system.
An array processor has its own local memory, providing extra memory to systems
with low memory.
Memory Interleaving
Abstraction is one of the most important aspects of computing and is a widely
implemented practice in the computational field.
Memory interleaving is more or less an abstraction technique, though it is a bit
different from abstraction in general. It is a technique that divides memory into a number of
modules such that successive words in the address space are placed in different
modules.
Why do we use Memory Interleaving? [Advantages]:
Whenever the processor requests data from main memory, a block (chunk) of data is
transferred to the cache and then to the processor. So whenever a cache miss occurs, the
data has to be fetched from main memory. But main memory is relatively slower
than the cache, so interleaving is used to improve the effective access time of the main
memory.
1. High order interleaving: In high-order memory interleaving, the most significant bits
of the memory address decide the memory bank where a particular location resides;
the least significant bits are sent as addresses to each chip. Consecutive addresses
therefore fall in the same module.
2. Low order interleaving: In low-order interleaving, the least significant bits select the
memory bank (module). Consecutive memory addresses then lie in different memory
modules, so accesses can be overlapped and the effective memory access time is
faster than a single module's cycle time.
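The two schemes differ only in which address bits select the bank. A small sketch, assuming 4 banks and an 8-bit address space (both values chosen just for the example):

```python
# Address-to-bank mapping for interleaved memory (illustrative sizes).
NUM_BANKS = 4
ADDR_BITS = 8                 # 256-word address space, assumed for the example
WORDS_PER_BANK = 2 ** ADDR_BITS // NUM_BANKS   # 64

def low_order_bank(addr):
    return addr % NUM_BANKS            # least significant bits select the bank

def high_order_bank(addr):
    return addr // WORDS_PER_BANK      # most significant bits select the bank

# Low-order: consecutive addresses rotate through the banks.
print([low_order_bank(a) for a in range(8)])        # [0, 1, 2, 3, 0, 1, 2, 3]
# High-order: a whole contiguous region sits in one bank.
print([high_order_bank(a) for a in (0, 63, 64, 128, 255)])  # [0, 0, 1, 2, 3]
```

The low-order pattern is what allows a sequential access stream to keep all four banks busy at once.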
UNIT-4
INPUT-OUTPUT ORGANIZATION
Input-Output Interface
Peripheral Devices
Input or output devices connected to the computer are called peripheral devices. These
devices are designed to read information into or out of the memory unit upon command
from the CPU and are considered part of the computer system. They are also simply called
peripherals.
1. Input peripherals: Allows user input, from the outside world to the computer.
Example: Keyboard, Mouse etc.
2. Output peripherals: Allows information output, from the computer to the outside
world. Example: Printer, Monitor etc.
The binary information received from an external device is usually stored in the
memory unit. Information transferred from the CPU to an external device originates in the
memory unit. The CPU merely processes the information; the source and destination are
always the memory unit. Data transfer between the CPU and I/O devices can be handled in
three modes, which are given below:
1. Programmed I/O
2. Interrupt-initiated I/O
3. Direct Memory Access (DMA)
1) Programmed I/O
The programmed I/O method controls the transfer of data between connected devices and
the computer. Each I/O device connected to the computer is continually checked for inputs.
Once it receives an input signal from a device, it carries out that request until it no longer
receives an input signal. Let's say you want to print a document. When you select print on
your computer, the request is sent through the central processing unit (CPU) and the
communication signal is acknowledged and sent out to the printer.
Advantages:
Programmed I/O is simple to implement.
It requires very little hardware support.
CPU checks status bits periodically.
Disadvantages:
The processor has to wait for a long time for the I/O module to be ready for either
transmission or reception of data.
The performance of the entire system is severely degraded.
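The busy-wait polling that causes this degradation can be sketched as follows (the `Device` class and its methods are hypothetical stand-ins for a status register and a data register):

```python
# Programmed I/O sketch: the CPU polls the device's status bit in a loop
# and transfers one word at a time, wasting cycles while it waits.
class Device:
    def __init__(self, data):
        self._data = list(data)
    def ready(self):
        return len(self._data) > 0     # stands in for the status bit
    def read_word(self):
        return self._data.pop(0)       # stands in for the data register

def programmed_io_read(device, n_words):
    received = []
    while len(received) < n_words:
        while not device.ready():      # CPU busy-waits here (the bottleneck)
            pass
        received.append(device.read_word())
    return received

print(programmed_io_read(Device([7, 8, 9]), 3))  # [7, 8, 9]
```

The inner `while` loop is exactly the waiting that interrupt-driven I/O and DMA are designed to eliminate.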
2) Direct Memory Access (DMA)
The DMA controller provides an interface between the bus and the input-output devices.
Although it transfers data without the intervention of the processor, it is controlled by the
processor.
The DMA controller contains an address unit for generating addresses and selecting the I/O
device for transfer. It also contains a control unit and a data count for keeping count of the
number of blocks transferred and indicating the direction of the transfer. When the
transfer is completed, the DMA controller informs the processor by raising an interrupt.
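A minimal software sketch of that transfer, assuming a callback stands in for the completion interrupt (all names here are illustrative):

```python
# DMA sketch: the processor programs a start address and a word count;
# the controller then copies the block without CPU involvement and
# "raises an interrupt" (here, calls a callback) when done.
def dma_transfer(memory, start_addr, count, data, on_complete):
    for i in range(count):                  # data count: one word per step
        memory[start_addr + i] = data[i]    # address unit generates addresses
    on_complete()                           # completion interrupt to the CPU

memory = [0] * 16
done = []
dma_transfer(memory, 4, 3, [10, 20, 30], lambda: done.append(True))
print(memory[4:7], done)  # [10, 20, 30] [True]
```

While the loop runs, a real CPU would be free to execute other instructions; it is only involved again when the callback fires.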
In any Operating System, it is necessary to have a Dual Mode Operation to ensure the
protection and security of the System from unauthorized or errant users. This Dual Mode
separates the User Mode from the System Mode or Kernel Mode.
What are Privileged Instructions?
The Instructions that can run only in Kernel Mode are called Privileged Instructions.
If a privileged instruction is attempted in user mode, it is not executed; it is treated as an
illegal instruction and trapped to the operating system by the hardware.
It is the responsibility of the operating system to ensure that the Timer is set to interrupt
before transferring control to any user application.
The operating system uses privileged instruction to ensure proper operation.
Examples of privileged instructions are I/O instructions, Context switching, clear
memory, set the timer of the CPU etc.
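The dual-mode check can be sketched as follows (the instruction names, the mode constants, and `TrapError` are all illustrative, not a real ISA):

```python
# Dual-mode sketch: a privileged instruction attempted in user mode is
# trapped (raised as an exception here) instead of being executed.
KERNEL, USER = 0, 1

class TrapError(Exception):
    """Stands in for the hardware trap into the operating system."""

def execute(instruction, mode):
    privileged = {"set_timer", "io_access", "clear_memory"}
    if instruction in privileged and mode != KERNEL:
        raise TrapError("illegal instruction in user mode: " + instruction)
    return "executed " + instruction

print(execute("set_timer", KERNEL))   # executed set_timer
try:
    execute("set_timer", USER)
except TrapError as e:
    print(e)                          # illegal instruction in user mode: set_timer
```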
Interrupt
Memory Hierarchy
The Computer memory hierarchy looks like a pyramid structure which is used to describe the
differences among memory types. It separates the computer storage based on hierarchy.
Level 0: CPU registers
Level 1: Cache memory
Level 2: Main memory or primary memory
Level 3: Magnetic disks or secondary memory
Level 4: Optical disks or magnetic tapes or tertiary memory
In the memory hierarchy, as we move down the levels, cost per bit and speed decrease while
capacity and access time increase. The devices are arranged from fast to slow, that is, from
registers to tertiary memory.
Let us discuss each level in detail:
Level-0 − Registers
The registers are present inside the CPU, so they have the least access time. Registers are the
most expensive and the smallest in size, generally a few kilobytes. They are implemented
using flip-flops.
Level-1 − Cache
Cache memory is used to store the segments of a program that are frequently accessed by the
processor. It is expensive and smaller in size generally in Megabytes and is implemented by
using static RAM.
Level-2 − Primary or Main Memory
It directly communicates with the CPU and with auxiliary memory devices through an I/O
processor. Main memory is less expensive than cache memory and larger in size generally in
Gigabytes. This memory is implemented by using dynamic RAM.
Level-3 − Secondary storage
Secondary storage devices like Magnetic Disk are present at level 3. They are used as backup
storage. They are cheaper than main memory and larger in size generally in a few TB.
Level-4 − Tertiary storage
Tertiary storage devices like magnetic tape are present at level 4. They are used to store
removable files and are the cheapest and largest in size (1-20 TB).
Main Memory
The memory unit that communicates directly with the CPU, auxiliary memory, and cache
memory is called main memory. It acts as the central storage unit of the computer system: a
relatively large and fast memory used to store programs and data during run-time operations.
The primary technology used for the main memory is based on semiconductor integrated
circuits. The integrated circuits for the main memory are classified into two major units:
RAM and ROM.
RAM: Random Access Memory is volatile and holds the programs and data currently in use.
1. DRAM: Dynamic RAM, stores each bit as a charge on a capacitor and must be
refreshed periodically.
2. SRAM: Static RAM, has a six-transistor circuit in each cell and retains data
as long as power is supplied.
3. NVRAM: Non-Volatile RAM, retains its data even when turned off.
Example: Flash memory.
ROM: Read Only Memory, is non-volatile and is more like a permanent storage for
information. It also stores the bootstrap loader program, to load and start the operating system
when computer is turned on. PROM (Programmable ROM), EPROM (Erasable PROM)
and EEPROM (Electrically Erasable PROM) are some commonly used ROMs.
Auxiliary Memory
Magnetic Disks
A magnetic disk is a type of memory constructed using a circular plate of metal or plastic
coated with magnetized materials. Usually, both sides of the disks are used to carry out
read/write operations. However, several disks may be stacked on one spindle with read/write
head available on each surface.
The following image shows the structural representation for a magnetic disk.
o The memory bits are stored in the magnetized surface in spots along the concentric
circles called tracks.
o The concentric circles (tracks) are commonly divided into sections called sectors.
Cache Memory
The data or contents of main memory that are used again and again by the CPU are stored in
the cache memory so that the CPU can access that data in a shorter time.
Whenever the CPU needs to access memory, it first checks the cache memory. If the data is
not found in the cache memory, the CPU moves on to the main memory. It also transfers a
block of recent data into the cache and keeps deleting old data in the cache to
accommodate the new one.
Hit Ratio
The performance of cache memory is measured in terms of a quantity called the hit ratio.
When the CPU refers to memory and finds the word in the cache, it is said to produce a hit. If
the word is not found in the cache, it has to be fetched from main memory, and this counts as
a miss.
The ratio of the number of hits to the total number of CPU references to memory is called the
hit ratio.
Associative Memory
An associative memory can be considered as a memory unit whose stored data can be
identified for access by the content of the data itself rather than by an address or memory
location.
When a word is to be read from an associative memory, the content of
the word, or part of the word, is specified. The words that match the specified content are
located by the memory and marked for reading.
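A content-addressable lookup can be sketched as follows, where a bit mask selects which part of the word takes part in the match (the function and variable names are illustrative):

```python
# Associative (content-addressable) lookup sketch: words are located by
# (partial) content rather than by address. The mask selects which bits
# of each stored word participate in the comparison.
def cam_search(words, key, mask):
    # return the indices of all stored words whose masked bits match the key
    return [i for i, w in enumerate(words) if w & mask == key & mask]

words = [0b1010, 0b1100, 0b1110]
print(cam_search(words, 0b1000, 0b1000))  # match only the MSB: [0, 1, 2]
print(cam_search(words, 0b1010, 0b1111))  # match all four bits: [0]
```

In real associative memory, this comparison happens in parallel across every word at once, which is what makes the lookup fast (and the hardware expensive).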
Cache Mapping:
The process/technique of bringing data of main memory blocks into cache blocks
is known as cache mapping.
The mapping techniques can be classified as :
1. Direct Mapping
2. Associative
3. Set-Associative
Associative Mapping
In associative mapping, both the address and the data of the memory word are stored.
The associative mapping method used by cache memory is very flexible as well as very
fast.
This mapping method is also known as fully associative cache.
Advantages of associative mapping
Associative mapping is fast.
Associative mapping is easy to implement.
Disadvantages of associative mapping
A cache memory implementing associative mapping is expensive, as it requires storing the
address along with the data.
Direct Mapping
In a direct mapping cache, instead of storing the total address information with the data, only
part of the address bits is stored along with the data.
New data has to be stored only in a specified cache location as per the mapping rule for
direct mapping, so no replacement algorithm is needed.
Advantages of direct mapping
Direct mapping is the simplest type of cache memory mapping.
Here only the tag field needs to be matched when searching for a word, which is why
it is the fastest cache.
Direct mapping cache is less expensive compared to associative cache mapping.
Disadvantages of direct mapping
The performance of a direct mapping cache is not as good, since blocks that map to the
same line must repeatedly replace each other's data-tag values (conflict misses).
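A small sketch of direct mapping, using assumed sizes (8 cache lines, so the index is `addr mod 8` and the rest of the address is the tag):

```python
# Direct mapping sketch: the cache line for an address is fixed by the
# index bits; only the tag is stored and compared on lookup.
CACHE_LINES = 8                 # assumed cache size: index = addr mod 8

def split_address(addr):
    index = addr % CACHE_LINES  # low bits pick the one possible line
    tag = addr // CACHE_LINES   # remaining bits are stored as the tag
    return tag, index

cache = [None] * CACHE_LINES    # each entry: (tag, data) or None

def access(addr, data_at):
    tag, index = split_address(addr)
    entry = cache[index]
    if entry is not None and entry[0] == tag:
        return "hit", entry[1]
    cache[index] = (tag, data_at(addr))   # fixed slot: no replacement policy
    return "miss", cache[index][1]

print(access(3, lambda a: a * 10))    # ('miss', 30)
print(access(3, lambda a: a * 10))    # ('hit', 30)
print(access(11, lambda a: a * 10))   # ('miss', 110) -- 11 maps to the same
                                      # line as 3, so it evicts it (conflict)
```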
Set-Associative Mapping
In set-associative cache memory, two or more words can be stored under the same index
address.
Every data word is stored along with its tag, and the number of tag-data words under an
index is said to form a set.
Advantages of Set-Associative mapping
Set-associative cache memory has the highest hit ratio compared to the two previous cache
mappings discussed above. Thus its performance is considerably better.
Disadvantages of Set-Associative mapping
Set-Associative cache memory is very expensive. As the set size increases the cost
increases.
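A 2-way set-associative sketch, with assumed sizes (4 sets, 2 ways) and FIFO eviction chosen purely for illustration:

```python
# 2-way set-associative sketch: each index selects a set that can hold
# two (tag, data) pairs, reducing the conflict misses of direct mapping.
NUM_SETS, WAYS = 4, 2
cache_sets = [[] for _ in range(NUM_SETS)]   # each set: list of (tag, data)

def sa_access(addr, data_at):
    index = addr % NUM_SETS
    tag = addr // NUM_SETS
    cset = cache_sets[index]
    for t, d in cset:
        if t == tag:
            return "hit", d
    if len(cset) == WAYS:
        cset.pop(0)                    # set is full: evict oldest (FIFO)
    cset.append((tag, data_at(addr)))
    return "miss", cset[-1][1]

print(sa_access(1, lambda a: a)[0])    # miss
print(sa_access(5, lambda a: a)[0])    # miss -- same set, second way
print(sa_access(1, lambda a: a)[0])    # hit  -- both words coexist in the set
```

In a direct-mapped cache, addresses 1 and 5 would fight over one line; here the second way absorbs the conflict, which is where the higher hit ratio comes from.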
Virtual memory
Virtual memory is a storage scheme that provides the user an illusion of having a very big
main memory. This is done by treating a part of secondary memory as main memory.
In this scheme, the user can load processes bigger than the available main memory, with the
illusion that enough memory is available to load the process.
Instead of loading one big process in the main memory, the Operating System loads the
different parts of more than one process in the main memory.
Virtual memory has become quite common in modern systems. In this scheme, whenever
some pages need to be loaded into main memory for execution and memory is not available
for that many pages, then instead of stopping the pages from entering main memory, the OS
searches for areas of RAM that have been least recently used or are not referenced, and
copies them into secondary memory to make space for the new pages in main memory.
Since all of this happens automatically, the computer appears to have unlimited RAM.
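The "evict the least recently used page" behaviour described above can be sketched with a simple LRU page-replacement simulation (the frame count and reference string are made-up example inputs):

```python
# LRU page-replacement sketch: main memory is a fixed set of frames;
# when full, the least recently used page is "copied out" to make room.
from collections import OrderedDict

def run_lru(references, num_frames):
    frames = OrderedDict()      # page -> present; order tracks recency
    faults = 0
    for page in references:
        if page in frames:
            frames.move_to_end(page)          # hit: mark as most recent
        else:
            faults += 1                       # page fault
            if len(frames) == num_frames:
                frames.popitem(last=False)    # evict least recently used
            frames[page] = True               # load the new page
    return faults

print(run_lru([1, 2, 3, 1, 4, 2], num_frames=3))  # 5 page faults
```

Tracing the example: pages 1, 2, 3 each fault on first use, the second reference to 1 hits, then 4 evicts 2 (the least recently used), and the final 2 faults again.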