6105 Computer Architecture Final
a. Data sequencing.
b. CPU’s
c. Instruction execution
d. Printer
Ans: b
2. The beginning of the architecture of the Itanium processor took place at ___.
a. Intel
b. Microsoft
c. Hewlett-Packard
d. Dell
Ans: c
3. Ease-of-use and extensive graphic capabilities are the important characteristics of ___.
a. Servers
b. Desktop computers
c. Minicomputers
d. Micro-computers
Ans: b
a. Memory addressing
b. Operation field
c. Address field
d. Addressing mode
Ans: a
5. In dynamic scheduling, the hardware ___ the instruction execution to reduce stalling of the
pipeline.
a. Rearranges
b. Bypasses
c. Forwards
d. Unhides
Ans: a
a. ROM
b. EROM
c. RAM
Ans: c
a. Register
b. Hard disk
c. RAM
Ans: a
b. Exponential
c. Square root
Ans: d
a. Decimal
b. Octal
c. BCD
d. Hexadecimal
Ans: c
10. Which of the following memories is used to speed up computer processing?
a. Cache memory
b. RAM
c. ROM
Ans: a
a. Multidirectional
b. Bidirectional
c. Unidirectional
Ans: c
12. Which of the following circuits is used to store one bit of data?
a. Flip Flop
b. Decoder
d. Register
Ans: a
13. Which of the following is a way in which the components of a computer are connected to each
other?
a. Computer parts
b. Computer architecture
c. Computer hardware
Ans : b
14. Which of the following circuits converts binary data into decimal?
a. Decoder
b. Encoder
c. Code converter
d. Multiplexer
Ans: c
a. Logical address
b. Physical address
c. Memory address
Ans: b
a. 1's complement
b. 2's complement
c. 3's complement
Ans: b
17. Which of the following computer bus connects the CPU to a memory on the system board?
a. Expansion bus
b. Width bus
c. System bus
Ans: c
18. Which of the following register can interact with the secondary storage?
a. PC
b. MAR
c. MDR
d. IR
Ans: b
19. In which of the following forms does the computer store its data in memory?
a. Hexadecimal form
b. Octal form
c. Binary form
d. Decimal form
Ans: c
Ans: b
a. Instruction Pointer
b. Data Counter
c. Memory pointer
Ans: a
22. Which of the following registers keeps track of the instructions in the program stored in
memory?
a. Accumulator
b. Address Register
c. Program Counter
d. Index Register
Ans: c
23. Which of the following building blocks can be used to implement any combinational logic circuit?
a. AND
b. OR
c. NAND
Ans: d
24. Which of the following is the circuit board on which chips and processor are placed?
a. Master circuit
b. Motherboard
c. Big board
Ans: b
Ans: a
26. In which of the following terms is the performance of cache memory measured?
a. Chat ratio
b. Hit ratio
c. Copy ratio
d. Data ratio
Ans: b
Ans: a
a. MMA
b. DMA
c. CAD
d. CAM
Ans: b
29. The ___________ model refers to both the style and method of problem description.
31. ___________ execution is characterised by the rule that an operation is activated as soon as all
the needed input data is available.
Ans: Data-driven
32. _________ was the first mechanical device, invented by Blaise Pascal.
Ans: Pascaline
33. ___________ was a new version of the EDVAC, which was built by von Neumann.
34. The fourth generation of computers was marked by use of Integrated Circuits (ICs) in place of
transistors. (True/ False)
Ans: False
Ans: True
36. All threads of a process share its virtual address space and system resources. (True/ False)
Ans: True
37. When the scheduler selects a process for execution, its state is changed from ready-to-run to
the wait state. (True/ False)
Ans: False
39. During selection, the ranks of all competing clients are computed and the client with the highest
rank is scheduled for service. (True/ False)
Ans: True
40. In ___________ all processing units execute the same instruction at any given clock cycle.
41. In which system a single data stream is fed into multiple processing units?
43. Parallel computers offer the potential to concentrate computational resources on important
computational problems. (True/ False)
Ans: True
Ans: False
46. Parallelism at the instruction level is also called middle-grained parallelism. (True/ False)
Ans: False
47. Data parallelism is regular, whereas functional parallelism, with the execution of loop-level
parallelism, is usually irregular. (True/ False)
Ans: True
48. The ____________ had the ability to integrate the functions of a computer’s Central Processing
Unit (CPU) on a single integrated circuit.
Ans: Microprocessor
49. _______________ computers are used to support typical applications like business data
processing and large-scale scientific computing.
Ans: Main-frame
Ans: True
Ans: 16
55. The designer should never plan for the technology changes that would lead to the success of
the computer. (True/False)
Ans: False
58. The ability of the servers to expand its processors and disks is known as
____________________.
Ans: Scalability
Ans: Pipelining
60. ____________________ states that an item referenced in recent times is likely to be
accessed again in the near future.
61. ____________________ states that items near the locations of recently used items are likely
to be referenced close together in time.
63. ____________________ is the product of the transistor switching and the switching rate.
65. The bits of the instruction are divided into groups called ___________.
Ans: Fields
66. ______________ use an implied accumulator (AC) register for all data manipulation.
67. Selection of operands during program execution does not depend on the addressing mode of
the instruction. (True/ False)
Ans: False
68. Hardware-accessible units of memory larger than one cell are called words. (True/ False)
Ans: True
71. The _______________ is a special CPU register that contains an index value.
72. In an improved instruction execution cycle, we can introduce a third cycle known as the _____.
a. MAR
b. MBR
74. When processed in the CPU, the instructions are fetched from ______________ locations and
implemented.
75. The ______________ and ______________ are identical in their use but sometimes they are
used to denote different addressing modes.
76. One of the key features of the MIPS architecture is the ____________.
77. Two separate 32-bit registers called ____________ and ___________ are provided for the
integer multiplication and division instructions.
Ans: HI, LO
78. An implementation technique by which the execution of multiple instructions can be overlapped
is called _________.
Ans: Pipelining
82. ____________ pipelines perform only one pre-defined fixed function at specific times, in a
forward direction from one stage to the next.
Ans: Linear
83. ________________ pipelines can perform more than one operation at a time as they have the
provision to be reconfigured to execute variable functions at different times.
Ans: Non-Linear
85. _______________ are the situations that stop the next instruction in the instruction stream from
being executed during its designated clock cycle.
89. Pipelining has a major effect on changing the relative timing of instructions by overlapping their
execution. (True/False)
Ans: True
Ans: 6
91. __________ cause a greater performance failure for a pipeline than _____________________.
92. If the PC is changed by the branch to its target address, then it is known as ______________
branch; else it is known as ___________.
93. The problem posed due to data hazards can be solved with a simple hardware technique called
___________________ .
Ans: Forwarding
95. ______________ is the method of holding or deleting any instructions after the branch until the
branch destination is known.
96. __________________ technique simply allows the hardware to continue as if the branch were
not executed.
98. _____________ instruction takes a source label and stores its address into the destination
register.
Ans: LD
99. ____________ stores the source register’s value plus an immediate value offset and stores it in
the destination register.
Ans: LDR
100. A ___________ hazard causes the pipeline performance to degrade the ideal performance.
Ans: Stall
Ans: Dynamically
104. In ________________ the result is written into the register file or stored into memory.
105. In Decode Instruction/Register Fetch operation, the _______________ and the __________
are determined and the register file is accessed to read the registers.
106. While processing operates instructions, RISC pipelines have to cope only with ____________.
Ans: True
110. In traditional pipeline implementations, load and store instructions are processed by the
____________________.
111. The consistency of instruction completion with that of sequential instruction execution is
specified by _______________.
112. A processor that endorses weak memory consistency does not allow reordering of memory
accesses. (True/False)
Ans: False
Ans: Renaming
114. In reservation stations, the instruction issue does not follow the FIFO order. (True/ False).
Ans: True
116. To commence with the execution of the SUBD, we need to separate the issue method into 2
parts: firstly ___________ and secondly ___________.
Ans: Check for structural hazards, wait for the absence of data hazards
118. The ________________ stage follows the read operands stage similar to the DLX pipeline.
Ans: EX
119. When the pipeline executes ____________________ before ADDD, it violates the
interdependence between instructions leading to wrong execution.
121. The source operand for ADDD is ____________, and is similar to destination register of
SUBD.
Ans: F8
Ans: Busy
124. The ________________ could hold 3 operations for the FP adder and 2 for the FP multiplier.
125. The ____________ and ______________ are used to store the data/ addresses that come
from or go to memory.
Ans: ID
127. The ______ field helps check the addresses of the known branch instructions.
Ans: Lookup
129. ____________ makes use of dynamic data dependences to select when to carry out
instructions.
Ans: false
135. ________ is a flow altering instruction that is required to be handled in a special manner in
pipelined processors.
Ans: Branch
136. Wasteful work done by pipeline for a considerable time is called the _______________.
137. A number of processors such as _________ and __________ make use of delayed execution
for procedure calls as well as branching.
138. If any valuable instruction cannot be moved into delay slot, ___________ is placed.
140. In case of Fixed Branch Prediction, the branch is presumed to be either _________ or
________________.
141. Static strategy makes use of _______________ for predicting whether the branch is taken.
148. A ___________ memory is an intermediate memory between two memories having large
difference between their speeds of operation.
Ans: Cache
149. If the processor finds the word it is referencing in the cache, the access is said to
produce a _________________.
Ans: Hit
150. When both instructions and data are stored into the same cache, the design is called the
______________ cache design.
Ans: Unified
152. The translation of main memory address to the cache memory address is known as ________.
Ans: Mapping
Ans: Associative
Ans: Split
157. Average memory access time = Hit time + Miss rate x _____________.
Ans: m, 1
163. __________________ are assigned in the high-order interleaved memory in each memory
module.
Ans: Consistency
165. The two categories of consistency models are ______ and _________
167. They also take advantage of ____________ in huge scientific as well as multimedia
applications.
168. _______ is a modern shared-memory multiprocessor version of the CDC Cyber 205 _______.
Ans: ETA-10
169. The memory-memory vector processors can prove to be much more efficient in case the vectors
are sufficiently long. (True/False)
Ans: True
170. The scalar registers are linked to the functional units with the help of a pair of _____________.
Ans: Crossbars
Ans: False
173. The instance of larger vectors is dealt with by a method called _______________.
175. List two factors which enable a program to run successfully in vector mode.
176. There does not exist any variation in the capability of compilers to decide if a loop can be
vectorised. (True/False)
Ans: False
Ans: Instructions
Ans: True
182. Flynn classified computing architectures into SISD, MISD, SIMD and ___________________ .
183. SIMD is known as ____________ because its basic unit of organisation is the vector.
184. Superscalar SISD machines use one property of the instruction stream by the name of
___________.
187. The two parallel computers manufacturers of coarse-grained architecture are ____________ .
Ans: Cm1
Ans: Cm5
192. ________ operations get the scalar inputs present in scalar registers.
Ans: Vector
193. _____________ specifies the count of instructions completed per unit time.
195. Linear pipelines perform only one pre-defined fixed function at specific times in a forward
direction. (True/False)
Ans: True
Ans: MIMD
197. In UMA machines, each memory word cannot be read as quickly as any other memory word.
(True/ False).
Ans: False
199. ___________ are unavoidable in scalable parallel systems which use some form of distributed
memory.
Ans: False
211. _______________ was a key problem in the data-parallel architectures since they aimed at
massively parallel systems.
222. Early multiprocessors applied __________ which becomes a bottleneck in scalable parallel
computers.
223. ____________ signifies those system attributes which are visible to a developer.
224. _______________ signifies the operational units, and the connections between them, that
realise the specification of the architecture.
228. To read data from ______________ , a laser beam is directed on the surface of a spinning
disk.
Ans: CD-ROM
231. A ____________ is issued to test various status conditions in the interface and the peripheral.
234. ______________ refers to consistent reporting when information is lost because of failure.
Ans: Throughput
240. The multithreading system has a concept of priorities and therefore, is also called
___________ or ___________.
241. It takes much more time to create a new thread in an existing process than to create a brand-
new process. (True/False)
Ans: False
242. The number of switches is proportional to the number of remote reads. (True/ False)
Ans: True
Ans: Scalable
245. When an application is run, each of the processes contains at least one _________________ .
Ans: Thread
246. The combination of languages and the computer architecture in a common foundation or
paradigm is called ____________.
247. Computational model uses mathematical language to describe a system (True/ False)
Ans: True
248. The instruction set together with the resources needed for their execution is called the ______.
249. The memory is a collection of storage cells, each of which can be in one of two different states
(True/False).
Ans: True
251. Once a node is activated and the nodal operation is performed, this is called ____________ .
252. ___________ combined dataflow ideas with sequential thread execution to define a hybrid
computation model.
Ans: Iannucci
253. P-RISC explores the possibility of constructing a multithreaded architecture around a CISC
processor. (True/ False)
Ans: False
Ans: True
Ans:
Process: A process has executable code, a virtual address space, open handles to system
objects, a unique process identifier, a security context, minimum and maximum working set sizes,
and at least one thread of execution.
Thread: A thread is the entity within a process that can be scheduled for execution. All threads of a
process share its system resources and virtual address space. Additionally, each thread maintains
exception handlers, thread local storage, a scheduling priority, a unique thread identifier, and a set
of structures the system will utilise to save the thread context until it is scheduled. The thread
context includes the thread's set of machine registers, a thread environment block, the kernel stack
and a user stack in the address space of the thread's process. Threads can also have their own
security context, which is valuable in impersonating clients.
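The sharing described above can be demonstrated with a short sketch (illustrative only, written in Python rather than at the OS-API level): all threads of one process share its address space, so a list mutated by several threads ends up holding every thread's contribution.

```python
# Hypothetical example: threads of a single process share its address space,
# so a list written by one thread is visible to all the others.
import threading

shared = []                  # lives in the process's (shared) address space
lock = threading.Lock()

def worker(tid):
    with lock:               # serialise access to the shared list
        shared.append(tid)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))        # every thread wrote into the same list
```

Each thread also keeps its own private state (stack, registers, thread-local storage), which is why only the shared list needs locking here.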
Ans: A parallel computer is a set of processors that are able to work cooperatively to solve a
computational problem. This definition broadly includes parallel supercomputers that have more
than hundreds of processors, networks of workstations, embedded systems and multiple processor
workstations. The following are the various types of parallelism:
Bit-level parallelism:
Bit-level parallelism is a form of parallel computing based on increasing the processor word size.
From the advent of very-large-scale integration (VLSI) computer chip fabrication technology in the
1970s until about 1986, advancements in computer architecture were driven by increasing bit-level
parallelism.
Instruction-level parallelism:
Data parallelism:
Data parallelism is parallelism inherent in program loops. It centres on distributing the data across
various computing nodes to be processed in parallel. Parallelising loops frequently leads to similar
(not necessarily identical) operation sequences or functions being performed on elements of a large
data structure. Many scientific and engineering applications exhibit data parallelism.
Ans:
Q4. What is ISA ( Instruction Set Architecture)? Explain classified into the following
categories.
Ans: The Instruction Set Architecture (ISA) is the part of the processor that is visible to the
programmer or compiler writer. The ISA acts as the boundary between software and hardware. It
includes the native data types, instructions, registers, addressing modes, memory architecture,
interrupt and exception handling, and external I/O.
1. Complex instruction set computer (CISC) – It consists of a variety expert instructions and may
just not be frequently used in practical programs.
2. Reduced instruction set computer (RISC) – This executes only the instructions that are
commonly used in programs and thus, makes the processor simpler. The extraordinary operations
are executed as subroutines, where the extra processor execution time is offset by their infrequent
use.
3. Very long instruction word (VLIW) – In this, the processor receives many instructions encoded
and retrieved in one instruction word.
Q5. Difference between CISC (Complex instruction set computer) and RISC (Reduced
instruction set computer) ?
Ans:
RISC CISC
It is a Reduced Instruction Set Computer. It is a Complex Instruction Set Computer.
It emphasizes on software to optimize the It emphasizes on hardware to optimize the
instruction set. instruction set.
It is a hard wired unit of programming in the RISC Microprogramming unit in CISC Processor.
Ans:
Desktop computing
Desktop computers have the largest market in terms of costs. It varies from low-end systems to
very high-end heavily configured computer systems. Throughout this range the cost and the
competence also varies in terms of performance. This blend of the performance and the price
concerns most to the customers in the market and thus, to the computer designers. Consequently,
the latest, the highest-performance microprocessors and cost-reduced microprocessors are largely
sold in the category of the desktop systems.
Servers
The existence and popularity of servers emerged with the evolution of the desktop computers. The
role of servers expanded to provide more reliable usage, storage and access of data. It helped the
users provide with large scale computing services. The web-based services accelerated this trend
of servers tremendously. These servers are successful in replacing the traditional main-frame
computers and have become the backbone of large enterprises to help the users with high memory
storage.
Embedded computers
a) Linear pipelines: These pipelines perform only one pre-defined fixed functions at specific times
in a forward direction from one stage to next stage. A linear pipeline can be visualised as a
collection of processing segments, where each segment completes a part of an instruction. The
result obtained from the processing in each segment is transferred to the next segment in the
pipeline. As in these pipelines, repeated evaluations of the same function are performed with
different data for some specified period of time, these pipelines are also called static pipelines.
b) Non-linear pipelines: These pipelines can perform more than one operation at a time as they
have the provision to be reconfigured to execute variable functions at different times. As these
pipelines can execute different functions at different times, they are called dynamic pipelines.
Implied Mode: The operands in this mode are specified implicitly in the explanation of the
instruction. For example, the instruction ‘‘complement accumulator’’ is considered as an implied
mode instruction as the description of the instruction implies the operand in the accumulator
register. In fact, all register references instructions that use an accumulator are implied mode
instructions. Zero-address introductions are implied mode instructions.
Immediate Mode: The operand in this mode is stated in the instruction itself, i.e. there is an
operand field rather than an address field in the immediate mode instruction. The operand field
contains the actual operand to be used in union with the operation specific in the instruction
Register Mode: In this mode, the operands are in registers that reside within the CPU. The register
required is chosen from a register field in the instruction.
Register Indirect Mode: In this mode, the instruction specifies a register in the CPU that contains
the address of the operand and not the operand itself. Usage of register indirect mode instruction
necessitates the placing of memory address of the operand in the processor register with a
previous instruction.
Auto-increment or Auto-decrement Mode: After execution of every instruction from the data in
memory it is necessary to increment or decrement the register. This is done by using the increment
or decrement instruction. Given upon its sheer necessity, some computers use special mode that
increments or decrements the content of the registers automatically
Indirect Addressing Mode: Unlike direct address mode, in this mode, the address field gives the
address where the effective address is stored in memory. The instruction from memory is fetched
through control to read the address part in order to access memory again to read the effective
address
Relative Address Mode: This mode is applied often with branch type instruction where the branch
address position is relative to the instruction word address. As such in this mode, the program
counter contents are added to the address element of the instruction so as to acquire the effectual
address whose location in memory is relative to the address of the following instruction. Since the
relative address can be specified with the smaller number of bits than those required to design the
entire memory address, it results in a shorter address field in the instruction format.
Ans:
a) Instruction fetch cycle (IF): In the first step, the address of the instruction to be fetched is taken
from memory into Instruction Register (IR) and is stored in PC register.
b) Instruction decode fetch cycle (ID): The fetched instruction is decoded and instruction is send
into two temporary registers. Decoding and reading of registers is done in parallel.
c) Instruction execution cycle (EX): In this cycle, the result is written into the register file.
d) Memory access completion cycle (MEM): In this cycle, the address of the operand calculated
during the prior cycle is used to access memory. In case of load and store instructions, either data
returns from memory and is placed in the Load Memory Data (LMD) register or is written into
e) Register write back cycle (WB): During this stage, both single cycle and two cycle instructions
write their results into the register file.
Ans:
Pipelining Hazards
Hazards are the situations that stop the next instruction in the instruction stream from being
executed during its designated clock cycle. Hazards reduce the performance from the ideal
speedup gained by pipelining. In general, there are three major categories of hazards that can
affect normal operation of a pipeline.
1. Structural hazards (also called resource conflicts): They occur from resource conflicts
when the hardware cannot support all possible combinations of instructions in simultaneous
overlapped execution. These are caused by multiple accesses to memory performed by
segments. In most cases this problem can be resolved by using separate instruction and
data memories.
2. Data hazards (also called data dependency): They occur when an instruction depends on
the result of a previous instruction in a way that is exposed by the overlapping of instructions
in the pipeline. This happens arise when an instruction requires the previous output and
output is not yet present.
3. Control hazards (also called branch difficulties): Branch difficulties arise from branch
and other instructions that change the content of PC (Program Counter)
Ans: A dynamic scheduling is the hardware based scheduling. In this approach, the hardware
rearranges the instruction execution to reduce the stalls. Dynamic scheduling reduces the stalls and
simultaneously maintains the data flow & exceptions in the instruction execution.
1. Dynamic scheduling is helpful in situations where the data dependencies between the
instructions are not known during the time of compilation.
3. It permits code compiled by one pipeline in mind to execute efficiently on some other pipeline.
Unconditional Branch
This type of branch is considered as the simplest one. It is used to transfer control to a particular
target. Let us discuss an example as follows: branch target Target address specification can be
performed in any of the following ways:
absolute address or
PC-relative address
Conditional Branch
Here if a particular condition meets its requirements, then only the jump is conducted. For instance,
a branch may be needed when two values are equal. These types of conditional branches can be
managed in any of the following fundamental ways:
Set-then-Jump: This design separate the testing for condition as well as branching. A condition
code register is used for attaining communication among the instructions for condition as well as
branching.
Test-and-Jump: Many of the processors merge the testing as well as branching into a particular
instruction. MIPS processor is used to demonstrate the rule included in this approach.
Ans:
Branch Processing helps in instruction execution. It receives branch instructions and resolves the
conditional branches as early as possible. For resolving it uses static and dynamic branch
prediction. Effective processing of branches has become a cornerstone of increased performance in
ILPprocessors. No wonder, therefore, that in the pursuit of more performance, predominantly in the
past few years, computer architects have developed a confusing variety of branch processing
schemes.
Ans:
A cache memory is an intermediate memory between two memories having large difference
between their speeds of operation. Cache memory is located between main memory and CPU. It
may also be inserted between CPU and RAM to hold the most frequently used data and
instructions. Communicating with devices with a cache memory in between enhances the
performance of a system significantly. Locality of reference is a common observation that at a
particular time interval, references to memory acts limited for some localised memory areas. Its
Ans:
Associative memories are very costly as compared to RAM due to the additional logic association
with all cells. Generally there are 2j words in main memory and 2k words in cache memory. The j-bit
memory address is separated by 2 fields. k bits are used for index field. j-k bits are long-fields. The
direct mapping cache organization utilizes k-bit indexes to access the cache memory and j-bit
address for main memory. Cache words contain data and related tags. Every memory block is
assigned to a particular line of cache in direct mapping. But if a line already contains memory block
when new block is to be added then the old memory block is removed.
1. Vector processors take advantage of data parallelism in huge scientific as well as multimedia
applications.
2. The moment a vector instruction begins functioning, just the register buses and the functional
unit feeding it require to be powered. Power can be turned off for Fetch unit, decode unit, Re-order
Buffer (ROB) etc. This leads to reduction in power usage.
3. Vector processors are able to function on one whole vector in a single instruction. Therefore,
vector processors lessen the fetch and decode bandwidth because of less number of instructions
fetched.
4. In vector processing, the size of programs is small, because it needs fewer numbers of
instructions.
5. Vector memory access does not cause any wastage just like cache access. Each data item
requested by the processor is utilised in actual terms.
6. Vector instructions also don’t reveal a lot of branches by implementing a loop in a single
instruction.
Ans: The major components of the vector unit of a vector register machine are as given below:
1. Vector registers: There are many vector registers that can perform different vector operations in
an overlapped manner. Every vector register is a fixed-length bank that consists of one vector with
multiple elements and each element is 64-bit in length. There are also many read and write ports. A
pair of crossbars connects these ports to the inputs/ outputs of functional unit.
3. Vector functional units: These units are generally floating-point units that are completely
pipelined. They are able to initiate a new operation on each clock cycle. They comprise all operation
units that are utilised by the vector instructions.
4. Vector load and store unit: This unit can also be pipelined and perform an overlapped but
independent transfer to or from the vector registers.
5. Control unit: This unit decodes and coordinates among functional units. It can detect data
hazards as well as structural hazards. Data hazards are the conflicts in register accesses while
functional hazards are the conflicts in functional units.
Ans:
The various types of vector instructions for a register-register vector processor are:
(a) Vector-scalar instructions:Using these instructions, a scalar operand can be combined with a
vector one. If A and B are vector registers and f is a function that performs some operation on each
element of a single or two vector operands, a vector-scalar operand can be defined as follows: Ai: =
f (scalar, Bi)
(b) Vector-vector instructions: Using these instructions, one or two vector operands are fetched
from respective vector registers and produce results in another vector register. If A, B, and C are
three vector registers, a vector-vector operand can be defined as follows: Ai: = f (Bi, Ci)
(c) Vector-memory instructions: These instructions correspond to vector load or vector store. The
vector load can be defined as follows: A: = f (M) where M is a memory register The vector store can
be defined as follows: M: = f (A)
(e) Masking instructions: These instructions use a mask vector to expand or compress a vector
as defined below: V = f (A x VM) where V is a mask vector
(f) Vector reduction instructions: These instructions accept one or two vectors as input and
produce a scalar as output.
Ans:
Vector Length
we have not stated anything about the real vector size. We just supposed that the size of the vector
register is similar to the size of the vector we hold. But this may not turn out to be always true.
Particularly, we have two cases in our hands:
One in which the vector size is less than the vector register size, and
The second in which the vector size is larger than the vector register size.
To be more concrete, we assume 64-element vector registers, as offered by the Cray systems. Let us
look at the easier of the two problems first.
If the vector size is less than 64, we must let the system know that it should not operate on all
64 elements in the vector registers. This is done using the Vector Length (VL) register, which holds
the appropriate vector length. All vector operations are then performed on the first VL elements (in
other words, elements in the range 0 to VL - 1).
The VL register handles smaller vector sizes, but it does not help with larger ones. For instance,
suppose we have 200-element vectors (i.e., N = 200); how can vector instructions be used to add
two such vectors? The case of larger vectors is handled by a technique called strip mining.
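The strip-mining loop can be sketched in Python (a minimal model, assuming a maximum vector length MVL of 64 and simulating each vector instruction with a scalar loop):

```python
# Strip mining sketch: add two N-element vectors using vector
# operations of at most MVL elements (MVL = 64, as on Cray systems).

MVL = 64

def strip_mined_add(x, y):
    n = len(x)
    result = [0] * n
    low = 0
    vl = n % MVL          # first (possibly short) strip
    if vl == 0:
        vl = MVL
    while low < n:
        # one "vector instruction" operating on VL elements
        for i in range(low, low + vl):
            result[i] = x[i] + y[i]
        low += vl
        vl = MVL          # all remaining strips are full length
    return result
```

For N = 200 this performs four vector operations of lengths 8, 64, 64, and 64.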
Vector Stride
To understand vector stride, we first have to know how elements are stored in memory. Consider
vectors first. Because vectors are one-dimensional arrays, storing a vector in memory is
straightforward: the vector elements are stored as sequential words. If we wish to fetch 40
elements, 40 contiguous words have to be read from memory. Such elements are said to have a
stride of one, the stride being the distance in memory between successive elements accessed.
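As an illustration of stride, consider fetching one column of a row-major matrix; the elements are not contiguous, so they must be fetched a fixed number of words apart (a Python sketch, not actual load hardware):

```python
# Stride sketch: a 4x5 matrix stored row-major as a flat "memory".
rows, cols = 4, 5
mem = [r * cols + c for r in range(rows) for c in range(cols)]

def strided_load(memory, start, stride, count):
    # fetch `count` elements starting at `start`, `stride` words apart
    return [memory[start + i * stride] for i in range(count)]

# column 2 of the matrix: the stride equals the row length (5 words)
col2 = strided_load(mem, 2, cols, rows)
```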
Q20. What do you understand by pipelining? State steps in which instructions are executed
in pipelining.
Ans:
Pipelining is also called virtual parallelism, as it provides an essence of parallelism only at the
instruction level. In pipelining, the CPU executes each instruction in a series of the following small
common steps:
1. Instruction fetching
2. Instruction decoding
3. Operand fetching
4. Instruction execution
Consider a pipeline with five processing units, where each unit is assumed to take 1 cycle to finish
its execution as described in the following steps:
a) Instruction fetch cycle: In the first step, the instruction at the address held in the PC register is
fetched from memory into the Instruction Register (IR).
b) Instruction decode/register fetch cycle: The fetched instruction is decoded and the source
registers are read into two temporary registers. Decoding and register reading are done in parallel.
c) Execution/effective address cycle: The ALU operates on the operands prepared in the previous
cycle, either performing the specified operation, computing the effective memory address, or
evaluating the branch condition, and places the result in the ALU output register.
d) Memory access/branch completion cycle: In this cycle, the address calculated during the prior
cycle is used to access memory. In the case of load and store instructions, data either returns
from memory and is placed in the Load Memory Data (LMD) register or is written into memory. In
the case of a branch instruction, the PC is replaced with the branch destination address held in the
ALU output register.
e) Write-back cycle: In the last cycle, the result is written into the register file.
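Under these idealised assumptions (five stages, one cycle each, no stalls), n instructions complete in 5 + (n - 1) cycles instead of 5n; a small sketch:

```python
# Idealised pipeline timing: k one-cycle stages, no stalls.

def pipelined_cycles(n_instructions, stages=5):
    # the first instruction needs `stages` cycles; each later one
    # completes exactly one cycle after its predecessor
    return stages + (n_instructions - 1)

def unpipelined_cycles(n_instructions, stages=5):
    # without overlap, every instruction pays the full latency
    return stages * n_instructions
```

For 100 instructions this gives 104 cycles versus 500, approaching a 5x speedup as n grows.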
Ans:
Multiprocessor
Multiprocessors are systems with multiple CPUs that are capable of independently executing
different tasks in parallel. They have the following main features:
They share resources such as I/O devices, system utilities, program libraries, and databases.
They run under an integrated operating system that provides interaction among the processors and
their programs at the task, file, and job levels, and also at the data-element level.
Multi-computer
A multi-computer consists of numerous von Neumann computers connected by an
interconnection network. Every computer on the network executes its own program, accessing its
local memory and sending/receiving messages over the network. Typically, both memory and
I/O are distributed among the processors, so each individual processor-memory-I/O module in a
multi-computer forms a node and is essentially a separate, stand-alone autonomous computer.
A multi-computer is thus a group of MIMD computers with physically distributed memory.
Ans:
UMA (Uniform Memory Access): In this category, every processor has the same access time to
every memory module; hence each memory word can be read as quickly as any other memory
word. If this is not physically the case, the faster references are slowed down to match the slower
ones, so that programmers cannot notice the difference.
NUMA (Non-Uniform Memory Access): These machines are intended to avoid the memory-access
disadvantage of Uniform Memory Access machines. The logically shared memory is distributed
among all the processing nodes of NUMA machines, giving rise to distributed shared-memory
architectures. Programming NUMA machines relies on the global address space (shared memory)
principle, while programming distributed-memory multi-computers relies on the message-passing
paradigm.
COMA (Cache Only Memory Access): A COMA machine also behaves non-uniformly, but in a
different way. It avoids the effects of the static memory allocation of NUMA and Cache-Coherent
Non-Uniform Memory Architecture (CC-NUMA) machines by excluding main memory blocks from
the local memory of the nodes: only cache memory is present in this architecture. Main memory
does not exist at all, neither in the form of the distributed memory of NUMA and CC-NUMA
machines nor in the form of UMA's central shared memory.
Ans:
Multithreading
Multithreading is the ability of a processor to do multiple things at one time. The Windows
operating system, for example, uses API (Application Programming Interface) calls to manipulate
threads in multithreaded applications.
Multithreading is needed to create an application that can perform more than one task at once.
For example, GUI (Graphical User Interface) programs can perform more than one task at a time,
such as editing a document while printing it. A multithreaded system can therefore carry out
several such tasks concurrently.
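As a minimal illustration (using Python's threading module rather than the Windows API mentioned above), two tasks making progress concurrently:

```python
# Minimal multithreading sketch: two tasks run concurrently and
# record their progress in a shared, lock-protected list.
import threading

results = []
lock = threading.Lock()

def task(name, count):
    for i in range(count):
        with lock:                 # protect the shared list
            results.append((name, i))

t1 = threading.Thread(target=task, args=("edit", 3))
t2 = threading.Thread(target=task, args=("print", 3))
t1.start(); t2.start()
t1.join(); t2.join()
```

Both threads complete all their steps; only the interleaving order varies from run to run.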
Ans:
Dataflow Architecture
In a traditional computer design, the processor executes instructions that are stored in memory in
a particular sequence. Within each processor, the instructions execute in serial order and are
therefore slow. There are four possible ways of executing instructions:
1. Control-flow method: In this mechanism, an instruction is executed when the previous one in a
defined sequence has been executed. This is the traditional way.
2. Demand-driven method: In this mechanism, an instruction is executed when its results are
required by another instruction.
3. Pattern-driven method: In this mechanism, an instruction is executed when particular data
patterns are matched.
4. Dataflow method: In the dataflow method, an instruction is executed when the operands it
requires become available.
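The dataflow firing rule can be sketched with a toy interpreter (illustrative only, not a real dataflow machine): an instruction node executes as soon as all its operands are available, regardless of program order:

```python
# Toy dataflow sketch: nodes fire when all their operands are present.

def run_dataflow(nodes, inputs):
    """nodes: {name: (fn, [operand names])}; inputs: initial values."""
    values = dict(inputs)
    pending = dict(nodes)
    while pending:
        # find every node whose operands have all arrived
        fired = [n for n, (fn, ops) in pending.items()
                 if all(o in values for o in ops)]
        if not fired:
            raise RuntimeError("deadlock: no node can fire")
        for n in fired:
            fn, ops = pending.pop(n)
            values[n] = fn(*[values[o] for o in ops])
    return values

# (a + b) * (a - b), expressed as a dataflow graph; "sum" and
# "diff" may fire in either order, or in parallel on real hardware
graph = {
    "sum":  (lambda x, y: x + y, ["a", "b"]),
    "diff": (lambda x, y: x - y, ["a", "b"]),
    "prod": (lambda x, y: x * y, ["sum", "diff"]),
}
out = run_dataflow(graph, {"a": 7, "b": 3})
```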
Ans: "DMA, which stands for Direct Memory Access, is a feature of computer systems that allows
an input/output device to receive or send data directly from or to the main memory, bypassing the
CPU to boost memory operations. The process is performed by a chip known as the DMA
controller."