CH 2 Vector Processing

This document discusses vector processing and the Cray-1 vector processor. It begins by defining vector processing as performing the same operation on multiple data elements simultaneously using a single processor. It then describes the classification of vector processors based on register-to-register and memory-to-memory architectures. The document then summarizes the key features of the pioneering Cray-1 supercomputer, including its innovative cylindrical design, high performance through vector processing, word length, memory capacity, and reliability. It concludes by outlining the architecture of the Cray-1, describing its computation, memory, and input/output sections.

Uploaded by

digvijay dhole

Chapter 2 Vector Processing

Chapter 2 Vector Processors

Vector processing concepts, pipelined vector processors, Cray-1 type vector processor,
architecture of Cray-1, characteristics of Cray-1, instruction formats of the Cray-1,
characteristics of vector processing, array processors, introduction to associative memory
processors, interleaved memory organization.

❖ Vector processing concepts


• Vector processing is a computing technique that processes many data elements at
once. A single vector instruction operates on every element of a vector, in pipelined
or parallel fashion, which avoids the overhead of a processing loop. However, the
element operations must be independent of one another for vector processing to be effective.

• Vector processing vs. array and parallel processing


Arrays are groups of data elements stored close to one another in
memory. They are frequently used to represent datasets that can be processed in parallel, while the
term “vector processing” describes the simultaneous processing of many data elements
using specialized hardware. The distinction between array processing and vector
processing is that vector processing uses a single processor to execute the same
operation on numerous data items concurrently, whereas array processing uses several
processors to work on individual array elements.

The difference between parallel processing and vector processing is that parallel
processing involves multiple processors working on separate tasks simultaneously. In
contrast, vector processing involves a single processor performing the same operation
on multiple data elements simultaneously.

Prepared by: Priyanka More



• Classification of Vector Processor


Vector processors are classified according to how vector operands are formed and
where the vector instructions fetch them from. On this basis,
vector processors are classified as follows:

i. Register to Register Architecture


- This architecture is widely used in vector computers. Operands and previous results
are fetched from main memory indirectly, by way of vector
registers.
- The several vector pipelines present in the vector computer retrieve the
data from the registers and store the results in the desired register. These
vector registers are programmable through user instructions.
- This means that data is fetched from, and stored in, the register named by the
register address in the instruction. Each vector register has a fixed length,
just as a register in a normal processing unit does.
- Examples of supercomputers using the register to register architecture are
the Cray-1 and Fujitsu vector machines.
ii. Memory to Memory Architecture
- In the memory to memory architecture, the operands and the results are fetched
directly from memory instead of passing through registers. The address of the
data to be accessed must therefore be present in the vector
instruction.
- This architecture enables the fetching of data of size 512 bits from memory to the
pipeline. However, because of the high memory access time, the pipelines of the vector
computer require a longer startup time, since more time is needed to initiate the
vector instruction.

- Some examples of supercomputers that possess memory to memory architecture are
the CDC Cyber 205 and other CDC machines.
• Advantages of Vector Processor
- A vector processor uses vector instructions, which improve the code density of
a program.
- The sequential arrangement of data lets the hardware handle the data more
efficiently.
- It reduces the instruction bandwidth required.

❖ Cray-1 type vector processor


The CRAY-1, designed by Seymour Cray, was one of the first supercomputers and a
significant milestone in the history of high-performance computing. It was introduced
in 1976 by Cray Research, Inc. Here are some key features and characteristics of the
CRAY-1:
i. Innovative Design: The CRAY-1 was known for its unique and iconic cylindrical shape,
which housed its processing components. This design allowed for efficient cooling and
short interconnection paths, enhancing its overall performance.
ii. High Performance: It was a vector processor supercomputer, capable of executing a
single instruction on multiple data elements, making it exceptionally powerful for
scientific and engineering calculations.
iii. Speed: The CRAY-1 was incredibly fast for its time, with a peak performance of around
160 megaflops (million floating-point operations per second). This speed made it a
preferred choice for tasks like weather modeling, nuclear simulations, and fluid
dynamics.
iv. Word Length: It had a 64-bit word length, which was considered quite advanced during
the 1970s and allowed for precise and efficient numerical computations.
v. Memory: The CRAY-1 had a maximum memory capacity of 1 megaword (8
megabytes), which was substantial for its era.
vi. Reliability: Seymour Cray's emphasis on simplicity in design contributed to the CRAY-
1's reliability. Its limited instruction set and straightforward architecture reduced the
likelihood of errors and system failures.

vii. Legacy: The CRAY-1 paved the way for future supercomputers and established
Seymour Cray as a pioneer in high-performance computing. Its success led to the
development of subsequent Cray supercomputers.

❖ Architecture of Cray-1

- The accompanying figure represents the basic organization of a CRAY-1 system.


- The central processor unit (CPU) is a single integrated processing unit consisting
of a computation section, a memory section, and an input/ output section.
- The memory is expandable from 0.25 million 64-bit words to a maximum of 1.0
million words.
- The 12 input channels and 12 output channels in the input/output section connect to
a maintenance control unit (MCU), a mass storage subsystem, and a variety of front-
end systems or peripheral equipment. The MCU provides for system initialization
and for monitoring system performance.

- The mass storage subsystem provides secondary storage and consists of one to eight
Cray Research DCU-2 Disk Controllers, each with one to four DD-19 Disk Storage
Units. Each DD-19 has a capacity of 2.424 x 10^9 bits, so that a maximum mass
storage configuration could hold 9.7 x 10^9 8-bit characters.
- I/O channels can be connected to independent processors referred to as front-end
computers or I/O stations, or can be connected to peripheral equipment according to
the requirements of the individual installation.
- At least one front-end system is considered standard to collect data and present it to
the CRAY-1 for processing and to receive output from the CRAY-1 for distribution
to slower devices.
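The maximum mass storage figure can be checked with a few lines of arithmetic. The sketch below is in Python, chosen here only for illustration; the constant and function names are not from the CRAY-1 documentation. It multiplies out eight controllers, four drives per controller, and 2.424 x 10^9 bits per drive.

```python
# Check of the maximum mass storage capacity, using only numbers from the text.
MAX_CONTROLLERS = 8              # DCU-2 disk controllers
MAX_DRIVES_PER_CONTROLLER = 4    # DD-19 units per controller
BITS_PER_DRIVE = 2.424e9         # capacity of one DD-19, in bits
BITS_PER_CHARACTER = 8


def max_storage_characters() -> float:
    """Capacity of a fully configured subsystem, in 8-bit characters."""
    total_bits = MAX_CONTROLLERS * MAX_DRIVES_PER_CONTROLLER * BITS_PER_DRIVE
    return total_bits / BITS_PER_CHARACTER


if __name__ == "__main__":
    print(f"{max_storage_characters():.3e} characters")
```

The result is 9.696 x 10^9 characters, which rounds to the 9.7 x 10^9 quoted in the text.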
• Computation section
- The computation section contains instruction buffers, registers and functional units
which operate together to execute a program of instructions stored in memory.
- Arithmetic operations are either integer or floating point. Integer arithmetic is
performed in two's complement mode. Floating point quantities have signed-
magnitude representation.
- The CRAY-1 executes 128 operation codes as either 16-bit (one parcel) or 32-bit
(two-parcel) instructions. Operation codes provide for both scalar and vector
processing.
- Floating point instructions provide for addition, subtraction, multiplication, and
reciprocal approximation. The reciprocal approximation instruction allows for the
computation of a floating divide operation using a multiple instruction sequence.
- Integer or fixed point operations are provided as follows: integer addition, integer
subtraction, and integer multiplication. An integer multiply operation produces a
24-bit result; additions and subtractions produce either 24-bit or 64-bit results. No
integer divide instruction is provided and the operation is accomplished through a
software algorithm using floating point hardware.
- The instruction set includes Boolean operations for OR, AND, and exclusive OR
and for a mask-controlled merge operation. Shift operations allow the manipulation
of either 64-bit or 128-bit operands to produce 64-bit results. With the exception of
24-bit integer arithmetic, all operations are implemented in vector as well as scalar
instructions. The integer product is a scalar instruction designed for index
calculation. Full indexing capability allows the programmer to index throughout
memory in either scalar or vector modes. The index may be positive or negative in
either mode. This allows matrix operations in vector mode to be performed on rows
or the diagonal as well as conventional column-oriented operations.
- Each functional unit implements an algorithm or a portion of the instruction set.
Units are independent and are fully segmented. This means that a new set of
operands for unrelated computation may enter a functional unit each clock period.
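One common way to build a divide out of a reciprocal approximation is Newton-Raphson refinement, and the Python sketch below illustrates that idea. It is not the actual CRAY-1 instruction sequence: `rough_reciprocal` is a hypothetical stand-in for a hardware reciprocal approximation unit, and the iteration count is an assumption chosen to reach double precision from a very crude starting estimate.

```python
import math


def rough_reciprocal(b: float) -> float:
    """Hypothetical stand-in for a hardware reciprocal approximation.

    Uses only the exponent of b, so the estimate is a power of two that is
    within a factor of two of the true reciprocal.
    """
    if b == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    _mantissa, exponent = math.frexp(abs(b))   # abs(b) = mantissa * 2**exponent
    estimate = math.ldexp(1.0, -exponent)      # 2**(-exponent)
    return math.copysign(estimate, b)


def divide(a: float, b: float, iterations: int = 6) -> float:
    """Compute a / b as a * (1 / b), refining 1 / b with Newton-Raphson steps."""
    x = rough_reciprocal(b)
    for _ in range(iterations):
        x = x * (2.0 - b * x)   # each step roughly squares the relative error
    return a * x


if __name__ == "__main__":
    print(divide(1.0, 3.0))
    print(divide(10.0, -4.0))
```

Only multiplication and subtraction are used after the initial estimate, which is why a machine without a divide unit can still divide with a short sequence of instructions.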
• Memory section
- The memory for the CRAY-1 normally consists of 16 banks of bipolar 1024-bit
LSI memory. Three memory size options are available: 262,144 words, 524,288
words, or 1,048,576 words. Each word is 72 bits long and consists of 64 data bits
and 8 check bits. The banks are independent of each other.
- Sequentially addressed words reside in sequential banks. The memory cycle time is
four clock periods (50 nsec). The access time, that is, the time required to fetch an
operand from memory to a scalar register, is 11 clock periods (137.5 nsec). There is
no inherent memory degradation for 16-bank memories of less than one million
words.
- The maximum transfer rate for B, T, and V registers is one word per clock period.
For A and S registers, it is one word per two clock periods. Transfers of instructions
to the instruction buffers occur at a rate of 16 parcels (four words) per clock period.
Thus, the high speed of memory supports the requirements of scientific applications
while its low cycle time is well suited to random access applications. The phased
memory banks allow high communication rates through the I/O section and provide
low read/store times for vector registers.
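The timing figures in this section all follow from the clock period implied by "four clock periods (50 nsec)", that is, 12.5 nsec. The Python sketch below converts clock periods to nanoseconds, maps word addresses to the 16 banks, and derives the peak register transfer rate. The function names are illustrative and not part of any CRAY-1 documentation.

```python
CLOCK_PERIOD_NS = 50.0 / 4   # four clock periods are stated to take 50 nsec
NUM_BANKS = 16


def clock_periods_to_ns(periods: int) -> float:
    """Convert a count of clock periods to nanoseconds."""
    return periods * CLOCK_PERIOD_NS


def bank_of(word_address: int) -> int:
    """Sequentially addressed words reside in sequential banks."""
    return word_address % NUM_BANKS


def peak_words_per_second() -> float:
    """One word per clock period, the stated maximum for B, T, and V registers."""
    return 1e9 / CLOCK_PERIOD_NS


if __name__ == "__main__":
    print(clock_periods_to_ns(4))     # memory cycle time
    print(clock_periods_to_ns(11))    # scalar access time
    print([bank_of(a) for a in range(14, 19)])
    print(peak_words_per_second())
```

At this clock rate, 11 clock periods come to 137.5 nsec and the peak transfer rate is 80 million words per second.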
• I/O section
- Input and output communication with the CRAY-1 is over 12 full duplex 16-bit
channels. Associated with each channel are control lines that indicate the presence
of data on the channel (ready), data received (resume), or transfer complete
(disconnect).
- The channels are divided into four channel groups. A channel group consists of
either six input paths or six output paths. The four channel groups are scanned
sequentially for I/O requests at a rate of one channel group per clock period. The
channel group will be reinterrogated four clock periods later whether any I/O request
is pending in the channel or not. If more than one channel of the channel group is
active, the requests are resolved on a priority basis. The request from the lowest
numbered channel is serviced first.
• Vector Processing
- All operands processed by the CRAY-1 are held in registers prior to their being
processed by the functional units and are received by registers after processing. In
general, the sequence of operations is to load one or more vector registers from
memory and pass them to functional units. Results from this operation are received
by another vector register and may be processed additionally in another operation
or returned to memory if the results are to be retained.
- The contents of a V register are transferred to or from memory by specifying a first
word address in memory, an increment for the memory address, and a length. The
transfer proceeds beginning with the first element of the V register and
incrementing by one in the V register at a rate of up to one word per clock period
depending on memory conflicts.
- A result may be received by a V register and re-entered as an operand to another
vector computation in the same clock period. This mechanism allows for "chaining"
two or more vector operations together. Chain operation allows the CRAY-1 to
produce more than one result per clock period. Chain operation is detected
automatically by the CRAY-1 and is not explicitly specified by the programmer,
although the programmer may reorder certain code segments in order to enable
chain operation.
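The transfer described above (first word address, address increment, and length) can be sketched in Python as follows. Memory is modelled as a flat list, the register size of 64 elements is an assumption not stated in this section, and the function names are hypothetical.

```python
from typing import List

V_REGISTER_LENGTH = 64   # assumed number of elements in one V register


def _addresses(memory_size: int, first_address: int,
               increment: int, length: int) -> List[int]:
    """Memory addresses touched by a transfer, validated against memory bounds."""
    if not 0 <= length <= V_REGISTER_LENGTH:
        raise ValueError("length exceeds the V register size")
    addresses = [first_address + k * increment for k in range(length)]
    if any(a < 0 or a >= memory_size for a in addresses):
        raise IndexError("transfer runs outside memory")
    return addresses


def load_vector(memory: List[float], first_address: int,
                increment: int, length: int) -> List[float]:
    """Fill elements 0, 1, 2, ... of a V register from strided memory words."""
    return [memory[a] for a in _addresses(len(memory), first_address, increment, length)]


def store_vector(memory: List[float], first_address: int,
                 increment: int, register: List[float]) -> None:
    """Write elements 0, 1, 2, ... of a V register to strided memory words."""
    addresses = _addresses(len(memory), first_address, increment, len(register))
    for address, value in zip(addresses, register):
        memory[address] = value


if __name__ == "__main__":
    matrix = [float(i) for i in range(16)]     # 4 x 4 matrix stored row by row
    print(load_vector(matrix, 4, 1, 4))        # second row
    print(load_vector(matrix, 2, 4, 4))        # third column
    print(load_vector(matrix, 0, 5, 4))        # main diagonal
```

Choosing the increment is what lets the same mechanism walk a row, a column, or a diagonal of a matrix; a negative increment walks memory backwards.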

❖ Characteristics of Cray-1

❖ Instruction formats of the Cray-1

❖ Characteristics of Vector Processing


i. A vector is an ordered set of elements. A vector operand contains an ordered set of
n elements, where n is called the length of the vector. Each element in a vector is a
scalar quantity, which may be a floating point number, an integer, a logical value, or
a character (byte).
ii. In vector processing, successive pairs of elements are processed each clock
period. Dual vector pipes and dual sets of vector functional units allow two pairs
of elements to be processed during the same clock period. As each pair of operations
is completed, the results are delivered to the appropriate elements of the result
register. The operation continues until the number of elements processed is equal to
the count specified by the vector length register.
For example: C (1:50) = A (1:50) + B (1:50)
This vector instruction includes the initial addresses of the two source operands,
one destination operand, the length of the vectors and the operation to be performed.
iii. Vector instructions are classified into four basic types:
f1: V → V        f2: V → S
f3: V × V → V    f4: V × S → V
where V indicates a vector operand and S indicates a scalar operand. The operations f1
and f2 are unary operations such as vector square root, vector sine, vector complement,
vector summation and so on. On the other hand, operations f3 and f4 are binary
operations such as vector add, vector multiply, vector-scalar add and so on.
iv. In vector processing, identical processes are repeatedly invoked many times, each
of which can be subdivided into subprocesses.
v. In vector processing, successive operands are fed through the pipeline segments and
require as few buffers and local controls as possible. This parallel vector processing
allows the generation of more than two results per clock period. The parallel vector
operations are automatically initiated either when successive vector instructions use
different functional units and different vector registers, or when successive vector
instructions use the result stream from one vector register as the operand of another
operation using different functional units. This process is known as chaining.
vi. Because of the startup delay in a pipeline, a vector processor performs better with
longer vectors.
vii. Vector processing is usually faster and more efficient than scalar processing
because it reduces the overhead associated with maintenance of the loop control
variables.
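The four instruction types listed in item iii can be written as small Python functions. This is only a software sketch of the operand shapes: the function names mirror the labels f1 to f4 from the text, while the helper names and the example data are assumptions, and an ordinary Python list stands in for a vector register.

```python
import math
import operator
from typing import Callable, List

Vector = List[float]


def f1(op: Callable[[float], float], v: Vector) -> Vector:
    """V -> V: unary operation applied to every element."""
    return [op(x) for x in v]


def f2(op: Callable[[Vector], float], v: Vector) -> float:
    """V -> S: reduction of a vector to a scalar."""
    return op(v)


def f3(op: Callable[[float, float], float], a: Vector, b: Vector) -> Vector:
    """V x V -> V: binary operation on corresponding elements."""
    if len(a) != len(b):
        raise ValueError("vector operands must have the same length")
    return [op(x, y) for x, y in zip(a, b)]


def f4(op: Callable[[float, float], float], v: Vector, s: float) -> Vector:
    """V x S -> V: binary operation between each element and one scalar."""
    return [op(x, s) for x in v]


if __name__ == "__main__":
    A = [float(i) for i in range(1, 51)]
    B = [2.0 * i for i in range(1, 51)]
    C = f3(operator.add, A, B)            # C(1:50) = A(1:50) + B(1:50)
    print(C[:3], f2(sum, C))
    print(f1(math.sqrt, [4.0, 9.0]), f4(operator.mul, [1.0, 2.0], 3.0))
```

In each call the loop over elements is hidden inside one function, which is the software analogue of a single vector instruction replacing a loop with its own control variables.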

❖ Array processors

• A processor that performs computations on a large array of data is
called an array processor. Such processors are sometimes also called vector processors
or multiprocessors. An array processor performs only a single instruction at a time on an array
of data. These processors work with huge data sets, so they
are mainly used for enhancing the performance of computers.

• Array Processor Architecture


- The architecture of this processor is shown below.

- An array processor includes a number of ALUs (Arithmetic Logic Units), which
allow all the array elements to be processed together.
- Each ALU in the processor is provided with local memory; an ALU together with
its local memory is known as a Processing Element or PE.
- By using this processor, a single instruction is issued through a control unit & that
instruction is simply applied to a number of data sets simultaneously.
- By using a single instruction, a similar operation is performed on an array of data
which makes it suitable for vector computations.
• Working of Array Processor
- An array processor has an architecture mainly designed for processing arrays of
numbers. This processor architecture contains a number of processors that work
simultaneously, each handling one array element, so that a single operation is
applied to all the array elements in parallel.
- To get the same effect within a conventional processor, the operation should be
applied to every array element sequentially and much more slowly.
- This processor is a self-contained unit connected to the main computer through an
internal bus or an I/O port.
- This processor increases the overall speed of instruction processing. It
operates asynchronously from the host CPU to improve the overall
system capacity. It is a very powerful tool for handling problems with a
high level of parallelism.
• Types of Array Processor
There are two types of array processor
i. Attached array Processor

- The attached array processor, an auxiliary processor, is shown below.

- This processor is attached to a computer to enhance the performance of
the machine in numerical computational tasks.
- This processor is connected to the general-purpose computer through an I/O
interface and a local memory interface, which connects the main memory and the
local memory.
- This processor achieves high performance through parallel processing by multiple
functional units.
ii. SIMD array Processor
- A SIMD (‘Single Instruction and Multiple Data Stream’) processor is a computer
with several processing units that operate in parallel.
- These processing units perform the same operation in synchrony under the
supervision of the common control unit (CCU).
- The SIMD processor includes a set of identical PEs (processing elements), where
each PE has a local memory.


- This processor includes a master control unit and main memory. The master control
unit controls the operation of the processing elements. It also
decodes each instruction and determines how the instruction is executed.
- If the instruction is a program control or scalar instruction, it is executed directly in the
master control unit; vector instructions are broadcast to the processing elements.
Main memory is mainly used to store the program, while every
processing unit uses operands that are stored in its local memory.
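The SIMD organization described above can be mimicked in a few lines of Python. This is a toy model under stated assumptions: the class names `ProcessingElement` and `ControlUnit`, the three-slot local memory, and the instruction format are all hypothetical, and the Python loop visits the PEs one after another where real hardware would run them in lockstep.

```python
import operator
from typing import Callable, List


class ProcessingElement:
    """One PE: an ALU together with its own local memory."""

    def __init__(self, local_memory: List[float]) -> None:
        self.local_memory = list(local_memory)

    def execute(self, op: Callable[[float, float], float],
                dst: int, src_a: int, src_b: int) -> None:
        # Operands come from, and the result goes to, this PE's local memory.
        self.local_memory[dst] = op(self.local_memory[src_a],
                                    self.local_memory[src_b])


class ControlUnit:
    """Decodes one instruction and broadcasts it to every PE."""

    def __init__(self, elements: List[ProcessingElement]) -> None:
        self.elements = elements

    def broadcast(self, op: Callable[[float, float], float],
                  dst: int, src_a: int, src_b: int) -> None:
        for pe in self.elements:   # conceptually simultaneous
            pe.execute(op, dst, src_a, src_b)


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0, 40.0]
    # Slot 0 holds a[i], slot 1 holds b[i], slot 2 receives the result.
    pes = [ProcessingElement([x, y, 0.0]) for x, y in zip(a, b)]
    ControlUnit(pes).broadcast(operator.add, dst=2, src_a=0, src_b=1)
    print([pe.local_memory[2] for pe in pes])
```

One `broadcast` call plays the role of a single instruction applied to all array elements, with each element held in a different PE's local memory.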
• Applications of array processors
i. This processor is used in medical and astronomy applications.
ii. These are very helpful in speech enhancement.
iii. These are used in sonar and radar systems.
iv. These are applicable in anti-jamming, seismic exploration & wireless
communication.
v. This processor is connected to a general-purpose computer to improve the
computer’s performance within arithmetic computational tasks. So it attains
high performance through parallel processing by several functional units.

❖ Introduction to Associative memory processors


- Refer to the PPT.
❖ Interleaved Memory organization
• Memory interleaving is more or less an abstraction technique. It is a technique that
divides memory into a number of modules such that successive words in the address
space are placed in different modules.
• An instruction pipeline may demand both instructions and operands from the main
memory simultaneously, which is not possible with the traditional way of memory
access.
• Similarly, an arithmetic pipeline necessitates the simultaneous fetching of two
operands from the main memory. Memory interleaving solves this
problem.
• It enables concurrent access to many memory modules. The modular memory approach
enables the CPU to commence memory access with one module while other modules
are engaged in reading or writing operations with the CPU.
• This figure shows non-interleaved memory organization

• The following figure shows 2-way interleaved memory organization, where memory is
divided into two logical blocks:

• Why do we use Memory Interleaving?


- Main memory (random-access memory, RAM) is often made up of a collection of
DRAM memory chips, with several chips grouped to form a memory bank. The
memory banks can then be laid out to interleave using a memory controller that
supports interleaving.
- Interleaved memory addresses are assigned to each memory bank in turn. In a
system with two interleaved memory banks (assuming word-addressable memory),
for example, if logical address 32 belongs to bank 0, then logical address 33 belongs to
bank 1, logical address 34 belongs to bank 0, and so on. When there are n banks
and memory location i sits in bank i mod n, the memory is said to be n-way
interleaved.
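- For example, if we have four memory banks (four-way interleaved memory), each
with 256 bytes, the block-oriented method (no interleaving) assigns addresses 0 to
255 to the first bank, 256 to 511 to the second bank, and so on. With interleaving,
address 0 is in the first bank, 1 in the second, 2 in the third, 3 in the fourth, and
4 in the first bank again. This lets an
instruction pipeline request both instructions and operands from the main
memory at the same time, which is not possible with regular memory access.
- As a result, the CPU can access different banks in turn without waiting for a
previous access to complete. Several memory banks supply data in turn.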
- Memory interleaving is a memory speed-up technique. It is a procedure that
improves the system's efficiency, speed, and dependability.
• Types of interleaved Memory
There are two types of interleaved memory:
1. High order interleaving:
- In high order memory interleaving, the most significant bits of the memory address
select the memory bank in which a particular location resides, whereas in low order
interleaving the least significant bits of the memory address select the memory
bank.
- The remaining, least significant bits are sent as the address to each chip. One problem is that
consecutive addresses tend to be in the same chip, so the maximum rate of data
transfer is limited by the memory cycle time. High order interleaving is also known as memory banking.

2. Low order interleaving:

- The least significant bits select the memory bank (module) in low-order
interleaving. In this, consecutive memory addresses are in different memory
modules, allowing memory access faster than the cycle time.
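The two schemes differ only in which address bits select the bank, which the Python sketch below makes concrete. It assumes the number of banks and the memory size are powers of two; the function names and the 16-word, 4-bank example are illustrative choices, not taken from the text.

```python
from typing import Tuple


def high_order(address: int, address_bits: int, bank_bits: int) -> Tuple[int, int]:
    """Return (bank, offset) when the most significant bits select the bank."""
    offset_bits = address_bits - bank_bits
    bank = address >> offset_bits
    offset = address & ((1 << offset_bits) - 1)
    return bank, offset


def low_order(address: int, bank_bits: int) -> Tuple[int, int]:
    """Return (bank, offset) when the least significant bits select the bank."""
    bank = address & ((1 << bank_bits) - 1)
    offset = address >> bank_bits
    return bank, offset


if __name__ == "__main__":
    # 16 words of memory (4-bit addresses) split into 4 banks (2 bank bits).
    for address in range(8):
        print(address, high_order(address, 4, 2), low_order(address, 2))
```

For addresses 0 to 3 the high order scheme stays in bank 0, while the low order scheme visits banks 0, 1, 2, and 3 in turn, which is why only the latter lets consecutive accesses overlap.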

• Benefits of Interleaved Memory


i. An instruction pipeline may require both an instruction and its operands at the same time
from main memory, which is not possible in the traditional method of memory
access. Similarly, an arithmetic pipeline requires two operands to be fetched
simultaneously from the main memory. Memory
interleaving overcomes this problem.
ii. It allows simultaneous access to different modules of memory. The modular
memory technique allows the CPU to initiate memory access with one module
while others are busy with read or write operations. So, we can say that
interleaved memory honors every memory request independently of the state of the
other modules.
iii. For this reason, interleaved memory makes a system more responsive
and faster than non-interleaved memory. Additionally, with simultaneous memory access,
the CPU processing time decreases and throughput increases. Interleaved
memory is useful in systems with pipelining and vector processing.
iv. In an interleaved memory, consecutive memory addresses are spread across
different memory modules. Say, in a byte-addressable 4-way interleaved memory, if
byte 0 is in the first module, then byte 1 will be in the 2nd module, byte 2 will be in
the 3rd module, byte 3 will be in the 4th module, and again byte 4 will fall in the
first module, and this goes on.
v. In an n-way interleaved memory, main memory is divided into n banks and the
system can access n operands or instructions simultaneously from n different memory
banks. This kind of memory access can reduce the memory access time by a factor
close to the number of memory banks. With this interleaving, memory
location i is found in bank i mod n.
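The claim that access time falls by a factor close to the number of banks can be explored with a simplified timing model in Python. The model is an assumption made for illustration: at most one request is issued per clock period, a bank stays busy for `cycle_time` clock periods once an access starts, and a request to a busy bank waits. The function name and the numbers in the example are hypothetical.

```python
from typing import Iterable


def access_time(addresses: Iterable[int], num_banks: int, cycle_time: int) -> int:
    """Clock periods needed to complete all accesses, issued in order."""
    bank_free_at = [0] * num_banks   # time at which each bank can start again
    clock = 0                        # earliest time the next request may issue
    finish = 0
    for address in addresses:
        bank = address % num_banks   # location i is found in bank i mod n
        start = max(clock, bank_free_at[bank])
        bank_free_at[bank] = start + cycle_time
        finish = max(finish, start + cycle_time)
        clock = start + 1            # at most one new request per clock period
    return finish


if __name__ == "__main__":
    sequential = range(16)
    print(access_time(sequential, num_banks=1, cycle_time=4))
    print(access_time(sequential, num_banks=4, cycle_time=4))
    print(access_time(range(0, 64, 4), num_banks=4, cycle_time=4))
```

In this model, 16 sequential accesses take 64 clock periods with one bank and 19 with four banks, a speed-up of about 3.4. Accesses with a stride of 4 all land in the same bank and take 64 clock periods again, so the benefit depends on the access pattern.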
