PCA Notes


PCA

Unit 2
Q. Explain vector processing concepts.
Vector processing is a computing technique that can process numerous
data elements at once. It operates on every element of a vector in one
operation, or in parallel, to avoid the overhead of an explicit processing
loop. Vector processing is effective when the simultaneous operations
are independent of one another.
Dig. Of scalar and vector.
The difference between parallel processing and vector processing is
that parallel processing uses multiple processors for separate tasks,
while a vector processor uses a single processor to perform the same
operation on multiple data elements simultaneously.
It is classified into two types based on the formation of vectors and
the presence of vector instructions for processing.
1. Register-to-register architecture.
This architecture is used in modern high-performance processors
because of faster register access. In a register-to-register
architecture, data is fetched from and stored in vector registers.
This architecture is more complex in terms of hardware and
instruction set.
2. Memory-to-memory architecture.
This architecture was used in older, lower-performance processors
because of its slower memory access. In this architecture, data is
fetched from and stored directly in memory. This architecture is
typically simpler.
Advantages
1. Improved performance.
2. Reduced loop overhead.
3. Optimized hardware.
4. Reduction in instruction bandwidth.
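The contrast between a scalar loop and a single vector operation can be sketched in Python (an illustrative analogy only; the function names are hypothetical, and real vector hardware performs the elementwise operation as one instruction rather than a software loop):

```python
# Scalar processing: one element per operation, with loop overhead
# (increment, compare, branch) paid on every iteration.
def scalar_add(a, b):
    result = []
    for i in range(len(a)):          # explicit processing loop
        result.append(a[i] + b[i])   # one element at a time
    return result

# Vector processing: conceptually ONE operation applied to all
# elements; the additions are independent, so hardware can perform
# them in parallel without per-element loop overhead.
def vector_add(a, b):
    return [x + y for x, y in zip(a, b)]  # stands in for one vector instruction

print(scalar_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
print(vector_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```

Both produce the same result; the point is that the vector form has no per-element loop bookkeeping, which is exactly the overhead vector processing avoids.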
Q. Explain the CRAY-1 processor with its characteristics.
The CRAY-1, designed by Seymour Cray, was one of the first
supercomputers and a milestone in the history of high-performance
computing. It was introduced in 1976 by Cray Research.
The characteristics are:
1. Innovative design: it had a unique and iconic cylindrical shape.
2. Speed: it used vector processing, so its performance was high,
with a peak speed of up to 160 megaflops.
3. Memory: it had a maximum memory capacity of 8 megawords,
which is 64 megabytes.
4. High performance: it was a vector-processor supercomputer,
capable of executing a single instruction on multiple data elements.
5. Word length: it had a 64-bit word length, which was considered
quite advanced in the 1970s.

Q. Explain the architecture of the CRAY-1 processor.

The CRAY-1, designed by Seymour Cray, was one of the first
supercomputers and a milestone in the history of high-performance
computing. It was introduced in 1976 by Cray Research.

Dig.
The central processing unit (CPU) is a single integrated processor
consisting of a computation section, a memory section, and an
input/output channel section. The memory is expandable from
0.25 million to 1.0 million 64-bit words.
1. Computation section
The computation section contains instruction buffers, registers, and
functional units, which operate together to execute a program of
instructions stored in memory. It performs arithmetic operations,
where integer arithmetic uses two's complement representation and
floating-point arithmetic uses signed-magnitude representation.
2. Memory section
The memory is expandable from 0.25 million to 1.0 million 64-bit
words. The CRAY-1 memory normally consists of 16 banks of bipolar
1024-bit LSI memory, and three memory options are provided:
262,144 words; 524,288 (262,144 x 2) words; and 1,048,576
(262,144 x 4) words. Each word is 72 bits long and consists of
64 data bits and 8 check bits.
3. I/O section
Input and output communication with the CRAY-1 is over 12 full-duplex
16-bit channels. Each channel has control lines that indicate the
presence of data on the channel (Ready), data received (Resume), or
data transfer complete (Disconnect). The channels are divided into
four groups; a group consists of either six input or six output
channels. If more than one channel in a group is active, the requests
are resolved by priority: the request from the lowest-numbered
channel is served first.

Q. Characteristics of vector processing.

See this answer from Google.

Q. Explain array processing.

A processor that performs computations on a huge array of data is
called an array processor. Other terms used for this processor are
vector processor or multiprocessor, but an array processor performs
only one instruction at a time on an array of data. These processors
work with huge data sets to execute computations, so they are used
to enhance computer performance.
Array Architecture. Dig.
1. The array architecture includes a number of ALUs (arithmetic
logic units), which allows all array elements to be processed together.
2. Each ALU in this processor is provided with local memory; the
combination is known as a Processing Element (PE).
3. A single instruction is issued by the control unit, and it can be
applied to a number of data sets simultaneously.
Working in your words.
There are two types of array processors.
1. Attached array processor
Dig.
This processor is connected to a host computer to improve numerical
computation tasks. It is attached to a general-purpose computer
through an I/O interface, which is connected to main memory and
local memory. It achieves high performance through parallel
processing using multiple functional units.
2. SIMD array processor (Single Instruction, Multiple Data streams)
Dig.
A master control unit controls all the processors and provides the
same instruction to the processing elements to perform operations
on multiple data. Each processing element (PE) has its own local
memory (M).
If the instruction is a scalar or program-control instruction, it
executes within the master control unit; if it is a vector instruction,
it is passed to the processing elements. Main memory is used to
store the program.
Applications
1. Used in medical and astronomy applications.
2. Used in sonar and radar systems.
3. Used in speech enhancement.
https://youtube.com/watch?v=gKYGA7fFad4
Q. Interleaved memory organization
1. It is, more or less, an abstraction technique: it divides the memory
into a number of modules such that successive addresses are placed
in different modules.
2. An instruction pipeline may demand both instructions and
operands from main memory simultaneously, which is not possible
with the traditional way of memory access.
3. By allowing two operands to be fetched from main memory at the
same time, interleaving solves this problem: it enables concurrent
access to multiple memory modules.

https://www.youtube.com/watch?v=uzECa-TZ0cw&list=RDCMUCIs6YfZjrJ29sHd3-qDTGBQ&index=4
There are two types of interleaved memory.
1. High-order interleaving
In high-order memory interleaving, the most significant bits of the
memory address select the memory bank. In this scheme, consecutive
addresses are stored in the same module. As shown in the figure
(number the addresses down a column, e.g. 1, 2, 3, 4 in one module,
then continue in the next module). In the example below, the memory
is divided into 8 modules.
https://www.youtube.com/watch?v=Rpkq9x5qnwU&list=RDCMUCIs6YfZjrJ29sHd3-qDTGBQ&index=1
2.Low order interleaved
In low order memory interleaving, the least significant bits of the
memory address decide memory banks.consecutive address are
stored in a consecutive modules. As shown in fig.(number the
address via rows like 1,2,3,4 in different modules. And next in next
modules.). let us see below is the memory is divided into 8 modules.

https://www.youtube.com/watch?v=8Tf1wWg6tJU&list=RDCMUCIs6YfZjrJ29sHd3-qDTGBQ&index=2
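The two bank-selection schemes can be sketched in Python (a toy model; the module count and memory size are assumed for illustration):

```python
MODULES = 8                     # number of memory modules (banks)
WORDS = 32                      # total addressable words (illustrative)
PER_MODULE = WORDS // MODULES   # words held by each module

# High-order interleaving: the MOST significant address bits pick the
# bank, so consecutive addresses land in the SAME module.
def high_order_bank(addr):
    return addr // PER_MODULE

# Low-order interleaving: the LEAST significant address bits pick the
# bank, so consecutive addresses land in CONSECUTIVE modules.
def low_order_bank(addr):
    return addr % MODULES

# Addresses 0..7: high-order keeps them in banks 0 and 1,
# low-order spreads them across all 8 banks.
print([high_order_bank(a) for a in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
print([low_order_bank(a) for a in range(8)])   # [0, 1, 2, 3, 4, 5, 6, 7]
```

This shows why low-order interleaving helps a pipeline: consecutive accesses hit different banks and can proceed concurrently.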
Q.Write short note on associative memory processors.
https://www.youtube.com/watch?v=bpipcaL0tRw

Unit 1

Q. What is pipelining? Explain.

1. Pipelining is the process of accumulating instructions from the
processor through a pipeline.
2. It allows storing and executing instructions in an orderly process;
it is also known as pipeline processing.
3. A pipeline is a technique in which the execution of multiple
instructions is overlapped.
4. Execution is divided into stages, and these stages are connected
to one another in a pipeline structure. Instructions enter at one end
and exit at the other.
5. It increases the overall throughput.
6. In a pipelined system, each segment consists of an input register,
which holds the data, and a combinational circuit, which processes it.
Dig.

Types of pipeline
Pipelines are divided into 2 categories.
1. Arithmetic pipeline
Arithmetic pipelines are mostly found in computers. They are used
for floating-point operations and multiplication of fixed-point numbers.
The input to the floating-point adder pipeline is given as
X = A * 2^a
Y = B * 2^b
where A and B are mantissas and a and b are exponents.
Floating-point addition and subtraction is done in 4 steps:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize and produce the result.
Registers are used to store the intermediate results of the above
operations.
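The four steps above can be traced with a small Python sketch (a simplified model with unlimited precision; real hardware uses fixed-width binary mantissas):

```python
def fp_add(A, a, B, b):
    """Add X = A*2^a and Y = B*2^b step by step, as the pipeline would."""
    # Step 1: compare the exponents.
    diff = a - b
    # Step 2: align the mantissas by shifting the smaller operand right.
    if diff > 0:
        B, b = B / (2 ** diff), a
    else:
        A, a = A / (2 ** -diff), b
    # Step 3: add (or subtract) the mantissas.
    M = A + B
    # Step 4: normalize to produce the result (mantissa in [0.5, 1)).
    while abs(M) >= 1.0:
        M, a = M / 2, a + 1
    while 0 < abs(M) < 0.5:
        M, a = M * 2, a - 1
    return M, a

# 0.5*2^3 + 0.75*2^2 = 4 + 3 = 7 = 0.875*2^3
print(fp_add(0.5, 3, 0.75, 2))  # (0.875, 3)
```

In a pipeline, each of the four steps is one segment, so four different additions can be in flight at once, one per segment.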
2. Instruction pipeline
Here, a stream of instructions is executed by overlapping the fetch,
decode, and execute phases of the instruction cycle. The instruction
pipeline reads an instruction from memory while previous
instructions are executing in other segments of the pipeline, so
multiple instructions can execute simultaneously.
Advantages
1. Reduces the cycle time of the processor.
2. Increases the throughput of the system.
Disadvantages
1. The design of a pipelined processor is complex.
2. High cost of manufacturing.
3. The latency of an individual instruction is higher.

Q. Describe in detail the classification of pipelined processors.

Pipelined processors are classified based on the following factors:
1. Level of processing (arithmetic, processor, and instruction pipelines)
2. Pipeline configuration (unifunction vs. multifunction pipelines;
static vs. dynamic pipelines)
3. Type of instruction and data (scalar and vector pipelines)

1. Level of processing
a. Arithmetic pipeline: this type of pipeline breaks an arithmetic
operation into multiple steps that are executed one by one in
segments of the ALU (e.g. the 4-stage pipeline in the STAR-100).
Dig.
b. Processor pipeline: in processor pipelining, the same data stream
is processed by a cascade of processors, each performing a specific
task. The data stream passes through the first processor, whose
result is stored in memory; this result is accessible to the next
processor, which processes it and passes it to the third processor,
and so on. This type of pipeline is not very popular, so it is rarely
found in practice.
c. Instruction pipeline: this technique is also known as instruction
look-ahead. Almost all high-performance computers nowadays use
instruction pipelining. The steps are fetch, decode, operand fetch,
and execute.
2. Pipeline configuration
a. Unifunction vs. multifunction
Unifunction: a pipeline with a fixed and dedicated function is called
unifunctional. E.g. the CRAY-1 has 12 unifunctional pipelines.
Multifunction: a pipeline that performs different functions, either at
the same time or at different times. E.g. TI-ASC.
b. Static vs. dynamic pipelines
Static: a static pipeline assumes only one functional configuration at
a time; it may be either unifunctional or multifunctional. A
unifunctional pipeline must be static.
Dynamic: a dynamic pipeline allows multiple functional configurations
at a time. A dynamic pipeline must be multifunctional.
3. Based on the type of instruction and data
a. Scalar pipeline: in a scalar pipeline, each instruction processes
only one piece of data at a time. It is designed for single data
elements, meaning each instruction operates on a single data item.
E.g. IBM 360.
b. Vector pipeline: in a vector pipeline, instructions are designed to
process multiple data elements in parallel. The same instruction is
applied to multiple data elements simultaneously. E.g. STAR-100,
CRAY-1.

Q.3. What are the different performance evaluation metrics in
parallel architecture?

1. Clock period: the CPU of a digital computer is driven by a clock
with a constant cycle time T (in nanoseconds). The clock rate is the
inverse of the cycle time:
f = 1/T (in megahertz)
2. Instruction count: the size of a program is determined by its
instruction count.
Cycles per instruction (CPI): the average number of clock cycles
needed to execute each instruction.
3. Speedup: how much performance improvement we get through
pipelining.
1. Non-pipelined machine:
t1 = n * tn
where n is the number of tasks to be performed, tn is the clock
cycle of the non-pipelined machine, and t1 is the time required to
complete the n tasks.
2. Pipelined machine (k stages):
tk = (k + n - 1) * tp
where tp is the clock cycle (time to complete each sub-operation)
and tk is the time required to complete the n tasks.
The speedup Sk is then calculated as
Sk = t1 / tk
Eg. In your own. And dig.
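A small worked example in Python (the values k = 4, n = 100, tp = 20 ns are chosen for illustration, with tn = k * tp for the equivalent non-pipelined machine):

```python
k, n = 4, 100          # pipeline stages, number of tasks
tp = 20                # pipelined clock cycle (ns per sub-operation)
tn = k * tp            # non-pipelined cycle: one task passes all k stages

t1 = n * tn            # non-pipelined time for n tasks
tk = (k + n - 1) * tp  # pipelined: k-1 cycles to fill, then 1 result/cycle

Sk = t1 / tk
print(t1, tk, round(Sk, 3))  # 8000 2060 3.883
# As n grows, Sk approaches the ideal speedup of k = 4.
```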
4. Efficiency
The efficiency of a pipeline can be measured as the ratio of the busy
time span to the total time span, including the idle time.
Let c be the clock period of an m-stage pipeline processing n tasks;
the efficiency E can be written as:
E = (n . m . c) / (m . [m . c + (n - 1) . c])
E = n / (m + (n - 1))
As n → ∞, E approaches 1.
5. Throughput
The throughput of a pipeline can be defined as the number of results
we get per unit time.
It can be written as: T = (n / [m + (n - 1)]) / c = E / c
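Continuing with illustrative values (m = 4 stages, n = 100 tasks, clock period c = 20 ns), the efficiency and throughput formulas can be checked in Python:

```python
m, n, c = 4, 100, 20             # stages, tasks, clock period (ns)

E = n / (m + n - 1)              # efficiency: busy span / total span
T = n / ((m + n - 1) * c)        # throughput: results per ns

print(round(E, 4))               # 0.9709
print(round(T, 6))               # 0.048544
assert abs(T - E / c) < 1e-12    # the identity T = E / c holds
```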

Unit 3
Q. Explain the principle of multithreading.
Q. What are the different latency-hiding techniques? Explain any
one in detail.
These mechanisms are used to increase scalability and
programmability.
There are three main latency-hiding mechanisms or techniques:
1. Pre-fetching
2. Coherent caches
3. Multiple-context processors

Pre-fetching brings instructions or data close to the processor
before they are actually needed.
Coherent caches provide hardware support to reduce cache misses.
Multiple-context processors allow a processor to switch from one
context to another.

Pre-fetching technique.
1. This technique reduces latency by bringing instructions or data
closer to the processor before they are actually needed.
2. It uses knowledge about the expected misses in the program.
3. It uses this knowledge to move the corresponding instructions or
data closer to the processor before they are actually needed.
This technique is divided into two types:
1. Hardware-controlled pre-fetching
2. Software-controlled pre-fetching
Hardware-controlled pre-fetching is done using two schemes:
1. Long cache lines: this introduces the problem of false sharing.
2. Instruction look-ahead: this is limited by branch-prediction
accuracy and the finite look-ahead buffer size.
Software-controlled pre-fetching
In this approach, an explicit "pre-fetch" instruction is issued for
data that is known to be remote. In this technique, pre-fetching is
done selectively.
Binding pre-fetching policy: it is the responsibility of the fetching
process to ensure that no other processor updates a pre-fetched
value before it is actually used.
Non-binding pre-fetching policy: the cache-coherence protocol
invalidates pre-fetched values if they are updated before use. The
pre-fetched values remain visible to the cache-coherence protocol,
and the data is kept consistent until the processor actually reads
the values.
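As a software analogy (hypothetical, not real hardware pre-fetching), the idea of issuing a pre-fetch for the next data block while the current one is still being used can be sketched in Python:

```python
import threading
import queue
import time

def slow_fetch(block_id):
    """Stands in for a remote/memory access with visible latency."""
    time.sleep(0.01)
    return block_id * 10

def consume(blocks):
    """Issue a pre-fetch for block i+1 while processing block i."""
    prefetched = queue.Queue(maxsize=1)
    results = []

    def prefetch(bid):  # plays the role of an explicit pre-fetch instruction
        prefetched.put(slow_fetch(bid))

    threading.Thread(target=prefetch, args=(blocks[0],)).start()
    for i in range(len(blocks)):
        data = prefetched.get()              # usually ready by now
        if i + 1 < len(blocks):              # overlap the next fetch with use
            threading.Thread(target=prefetch, args=(blocks[i + 1],)).start()
        results.append(data + 1)             # "use" the data
    return results

print(consume([1, 2, 3]))  # [11, 21, 31]
```

The fetch latency of block i+1 is hidden behind the processing of block i, which is exactly the goal of the technique.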

Q. What is the cache coherency problem? How is this problem
resolved?

The cache coherency problem arises in multiprocessor or multi-core
computing systems. It occurs when multiple processors have their
own caches and store copies of the same data in those caches. The
problem arises when one processor updates or modifies the data
while other processors still hold outdated copies. This inconsistency
can lead to incorrect program behaviour and unexpected output.
Dig.

This problem can be resolved, and consistency maintained, using
two methods:

1. Write-invalidate: when the local cache copy is modified, the
write-invalidate policy invalidates all remote copies in other caches
(the invalidated items are sometimes called dirty).

2. Write-update (write-broadcast): when the local cache copy is
modified, the write-update policy broadcasts the updated value to
all caches at the time of modification.
Let's take the example given below.
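A toy Python simulation of the write-invalidate policy (an illustrative sketch with made-up names, not a real coherence protocol implementation):

```python
class Cache:
    def __init__(self):
        self.lines = {}                  # address -> (value, valid?)

    def read(self, memory, addr):
        if addr in self.lines and self.lines[addr][1]:
            return self.lines[addr][0]   # hit on a valid local copy
        value = memory[addr]             # miss: fetch from main memory
        self.lines[addr] = (value, True)
        return value

def write_invalidate(caches, memory, writer, addr, value):
    """Writer updates its copy; all remote copies are invalidated."""
    memory[addr] = value
    caches[writer].lines[addr] = (value, True)
    for i, c in enumerate(caches):
        if i != writer and addr in c.lines:
            val, _ = c.lines[addr]
            c.lines[addr] = (val, False)    # mark the remote copy invalid

memory = {0x10: 5}
caches = [Cache(), Cache()]
print(caches[0].read(memory, 0x10), caches[1].read(memory, 0x10))  # 5 5
write_invalidate(caches, memory, writer=0, addr=0x10, value=9)
print(caches[1].read(memory, 0x10))  # 9 (invalid copy forces a re-fetch)
```

Without the invalidation step, cache 1 would keep serving the stale value 5, which is exactly the coherency problem described above.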

Unit 4

Q.20. What are multiple-context processors? Explain in detail.

A multiple-context processor supports multiple hardware threads.
It is a component of parallel computing responsible for managing
and executing multiple threads and processes. The term context
refers to the state information of a particular execution context,
which includes the registers, program counter, and other relevant
information. Multiple-context processors allow parallel execution of
multiple tasks, which can significantly improve performance and
efficiency in a computer system. Many kinds of processors use
multiple contexts:
1. Multicore processor
It enables concurrent execution of multiple tasks by distributing
them across multiple cores. E.g. Intel Core i7, AMD Ryzen series.
2. Manycore processor
It is similar to a multicore processor but uses a much larger number
of cores. It is designed for high-throughput parallel processing and
handles a large number of parallel tasks simultaneously. E.g. Intel
Xeon.
3. GPU (graphics processing unit)
The GPU was originally designed for graphics rendering but has
evolved into a parallel processor due to its numerous cores. It is
well suited for parallel tasks such as scientific simulation and
machine learning. E.g. NVIDIA GeForce, AMD Radeon.
4. Accelerators
These are specialized hardware designed to accelerate specific
types of computation. E.g. NVIDIA Tesla (used for AI and parallel
processing).
5. DSP (digital signal processor)
A specialized microprocessor designed for efficient digital signal
processing. It optimizes the mathematical operations commonly
used in signal-processing applications. E.g. Texas Instruments
C6000 series.
6. Vector processor
A processor designed to perform operations on vectors or arrays of
data with a single instruction. It is well suited for scientific and
mathematical computations involving large datasets. E.g. CRAY-1
supercomputer.

Q.14. What are cluster computers? Explain the types and attributes
of cluster computers.

1. Cluster computers are a type of computing system consisting of
multiple interconnected computers working together as a single
system. These clusters are designed to work collaboratively to
execute tasks or applications and to enhance performance,
reliability, and scalability compared with a single standalone
computer.
Attributes of Cluster Computers:
1. Parallel Processing
2. High Performance
3. Scalability: nodes can be added or removed easily to
accommodate changing computational needs without major
disruption.
4. Fault Tolerance: if one node fails, others can continue operating,
ensuring system reliability.
5. Load Balancing

Types of Cluster Computers:

1. High-Performance Computing (HPC) Clusters:
HPC clusters use computer clusters and supercomputers to solve
advanced computational problems……..(write in your words)
2. High-Availability Clusters:
HA clusters are designed to maintain redundant nodes that can
act as backup systems in case any failure occurs. They provide
consistent computing services such as business activities,
complicated databases, customer services like e-commerce websites,
and network file distribution. They are designed to give
uninterrupted data availability to customers.
3.Load-Balancing Clusters
Load-balancing clusters distribute incoming network traffic or
tasks across multiple nodes to ensure efficient resource utilization
and prevent overload on any single node. They are commonly used
in web servers, content delivery networks (CDNs), and networking
infrastructure.
4.Failover Clusters
Also known as disaster recovery clusters, these systems
replicate data and services across multiple nodes. If one node fails,
another takes over seamlessly to ensure uninterrupted service.
Failover clusters are crucial for systems requiring continuous
operation, such as banking, healthcare, and emergency services.
Q.15. What are the different parallel programming models? Explain
the data parallel model in detail.
Parallel programming models provide abstractions and
methodologies to express parallelism in software. These models are
not specific to any particular type of machine or memory
architecture; in fact, they can be implemented on any underlying
hardware.
1. Shared memory (without threads): multiple processes share a
common address space but do not use threads.
2. Threads: multiple threads within a single process share the same
memory space.
3. Distributed memory / message passing: each process executes in
a separate memory space and communicates through message
passing.
4. Data parallel: the data set is divided into portions, and each task
performs the same processing on a different portion.
5. Hybrid: combines multiple programming models, such as the
distributed-memory and shared-memory models.
6. Single Program Multiple Data (SPMD): all processes execute the
same program but operate on different data.
7. Multiple Program Multiple Data (MPMD): different processes
execute distinct programs on different datasets.

Data parallel model

1. The data parallel model is also referred to as the Partitioned
Global Address Space (PGAS) model.
2. The address space is treated globally.
3. In this model, the data set is divided into portions, and the same
operations are performed on each portion.
4. Most of the work and operations are performed on a data set.
5. The data set is typically organized into a common structure, such
as an array or cube.
6. A set of tasks works on the same data structure, each task on a
different partition.
7. On shared-memory architectures, all tasks have access to the
data structure through global memory.
8. On distributed-memory architectures, the global data structure
can be split logically and/or physically across tasks.
Explain dig.
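The points above can be sketched in Python: the data set (an array) is partitioned into portions, and the same operation is applied to each portion by a set of tasks (the thread pool and the chunking scheme are illustrative choices):

```python
from concurrent.futures import ThreadPoolExecutor

def square_chunk(chunk):
    """The SAME operation, applied to one portion of the data set."""
    return [x * x for x in chunk]

def data_parallel_map(data, workers=4):
    # Partition the global array into roughly equal portions, one per task.
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Each task runs the same operation on its own portion.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(square_chunk, chunks)
    # Recombine the partial results into the global result.
    return [x for part in results for x in part]

print(data_parallel_map(list(range(8))))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

On a shared-memory machine the chunks are views into one global array; on a distributed-memory machine each portion would live in a separate address space and be communicated explicitly.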
