SIMD Architecture
Submitted By: Nancy Mahajan, Roll No. RB1801A22, Reg. No. 10809333, B.Tech. CSE
Submitted to:-
ACKNOWLEDGEMENT
First and foremost, I want to thank my electrical sciences teacher, Mr. Vijay Garg, for giving me a term paper on SIMD architecture. Such term papers enhance our capabilities and mental ability and keep us up to date on the subject. Secondly, I would like to thank the entire library staff of Lovely Professional University for helping me prepare the project. Lastly, I would like to thank my parents for their cooperation and help throughout the project.
TABLE OF CONTENTS
S.No. TOPIC
1 Introduction
2 SIMD Operations
3 Types of SIMD Architecture
4 Advantages
5 Disadvantages
6 Bibliography
Introduction
SIMD (Single-Instruction Stream, Multiple-Data Stream) architectures are essential in parallel computing. Their ability to manipulate large vectors and matrices in minimal time has created strong demand in areas such as weather data processing and cancer radiation research. The power of this type of architecture is clearest when the number of processing elements equals the size of the vector. In that case, componentwise addition and multiplication of all vector elements can be done simultaneously, in a single step. Even when the vector is larger than the number of processing elements available, the speedup over a sequential algorithm is large, because each step still handles as many elements as there are processing elements.
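The effect of the number of processing elements on the step count can be illustrated with a small sketch. This is only a sequential Python model of the idea, not real parallel hardware; the function name simd_add and the parameter num_pes are names chosen here for illustration.

```python
def simd_add(a, b, num_pes):
    """Componentwise addition, modelled as lockstep steps of num_pes elements each."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if num_pes < 1:
        raise ValueError("at least one processing element is required")
    result = []
    steps = 0
    for start in range(0, len(a), num_pes):
        # One SIMD step: each processing element adds its own pair of elements.
        stop = start + num_pes
        result.extend(x + y for x, y in zip(a[start:stop], b[start:stop]))
        steps += 1
    return result, steps


a = list(range(8))
b = [10] * 8
# Eight elements on eight PEs need one step; on four PEs they need two.
print(simd_add(a, b, num_pes=8))
print(simd_add(a, b, num_pes=4))
```

A purely sequential loop would need eight additions one after another for the same vectors, so the step count falls roughly in proportion to the number of processing elements.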
SIMD ARCHITECTURE
The processor array is a set of identical, synchronized processing elements capable of simultaneously performing the same operation on different data. Each processor in the array has a small amount of local memory where the distributed data resides while it is being processed in parallel. The processor array is connected to the memory bus of the front end so that the front end can randomly access the local processor memories as if they were another memory. Thus, the front end can issue special commands that cause parts of the memory to be operated on simultaneously or cause data to move around in the memory.
A program can be developed and executed on the front end using a traditional serial programming language. The application program is executed by the front end in the usual serial way, but it issues commands to the processor array to carry out SIMD operations in parallel. This similarity between serial and data-parallel programming is one of the strong points of data parallelism. Explicit synchronization is unnecessary because the processors run in lockstep: at any moment, each processor either does nothing or performs exactly the same operation as the others. In SIMD architecture, parallelism is exploited by applying simultaneous operations across large sets of data. This paradigm is most useful for problems with large amounts of data that must be updated on a wholesale basis, and it is especially powerful in many regular numerical calculations.
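The relationship between the front end and the processor array can be sketched as follows. This is a simplified model under stated assumptions: the class ProcessorArray and its methods broadcast and read are hypothetical names, each processor holds a single value as its local memory, and the lockstep execution is simulated with an ordinary loop.

```python
class ProcessorArray:
    """Toy model of a processor array; one local-memory cell per processor."""

    def __init__(self, data):
        self.local = list(data)

    def broadcast(self, op, mask=None):
        # Lockstep step: every processor either applies the same operation
        # to its own local data or does nothing (when masked off).
        if mask is None:
            mask = [True] * len(self.local)
        if len(mask) != len(self.local):
            raise ValueError("mask must have one entry per processor")
        self.local = [op(v) if active else v
                      for v, active in zip(self.local, mask)]

    def read(self, index):
        # The front end can access any local memory as ordinary memory.
        return self.local[index]


# The front-end program is plain serial code that issues array commands.
array = ProcessorArray([1, 2, 3, 4])
array.broadcast(lambda v: v * 2)                  # all processors double
active = [v > 4 for v in array.local]             # only some take part next
array.broadcast(lambda v: v + 100, mask=active)
print(array.local)
print(array.read(0))
```

The front-end code contains no locks or barriers; ordering follows from the fact that each broadcast finishes on all processors before the next one is issued.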
There are two main configurations that have been used in SIMD machines. In the first scheme, each processor has its own local memory. Processors can communicate with each other through the interconnection network. If the interconnection network does not provide a direct connection between a given pair of processors, then this pair can exchange data via an intermediate processor. The ILLIAC IV used such an interconnection scheme. The interconnection network in the ILLIAC IV allowed each processor to communicate directly with four neighboring processors in an 8 × 8 matrix pattern, such that the i-th processor can communicate directly with the (i - 1)th, (i + 1)th, (i - 8)th, and (i + 8)th processors. In the second SIMD scheme, processors and memory modules communicate with each other via the interconnection network. Two processors can transfer data between each other via intermediate memory module(s) or possibly via intermediate processor(s). The BSP (Burroughs Scientific Processor) used the second SIMD scheme.
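The neighbor pattern of the first scheme can be sketched in a few lines. The sketch assumes 64 processors numbered 0 to 63 and assumes that indices wrap around modulo 64 at the edges; the text above gives only the four offsets, so the wraparound and the function names neighbors and hops are assumptions made for illustration.

```python
from collections import deque

NUM_PES = 64   # 8 x 8 array
ROW = 8


def neighbors(i):
    """Processors directly connected to processor i (wraparound assumed)."""
    return [(i - 1) % NUM_PES, (i + 1) % NUM_PES,
            (i - ROW) % NUM_PES, (i + ROW) % NUM_PES]


def hops(src, dst):
    """Fewest links needed to move data from src to dst, found by
    breadth-first search through intermediate processors."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == dst:
            return distance
        for nb in neighbors(node):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, distance + 1))
    raise ValueError("destination not reachable")


print(neighbors(10))   # direct neighbors of processor 10
print(hops(0, 8))      # directly connected
print(hops(0, 9))      # needs one intermediate processor
```

A pair such as 0 and 9 has no direct link, so the data passes through an intermediate processor (for example 0 to 1 to 9), which is the indirect exchange described above.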
SIMD operations
The basic unit of SIMD is the vector, which is why SIMD computing is also known as vector processing. A vector is nothing more than a row of individual numbers, or scalars.
A regular CPU operates on scalars, one at a time. (A superscalar CPU operates on multiple scalars at once, but it can perform a different operation on each of them.) A vector processor, on the other hand, lines up a whole row of these scalars, all of the same type, and operates on them as a unit. These vectors are represented in what is called packed data format. Data are grouped into bytes (8 bits) or words (16 bits) and packed into a vector to be operated on. One of the biggest issues in designing a SIMD implementation is how many data elements it will be able to operate on in parallel. If you want to do single-precision (32-bit) floating-point calculations in parallel, you can use a 4-element, 128-bit vector to do four-way single-precision floating-point, or a 2-element, 64-bit vector to do two-way single-precision floating-point. So the width of the vector, together with the size of each element, dictates how many elements of a given data type you can work with at once.
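The relationship between vector width, element size and element count is simple division, as the following sketch shows. The function name lanes is chosen here for illustration, and the standard struct module is used only to show what four packed single-precision values look like in memory.

```python
import struct


def lanes(vector_bits, element_bits):
    """Number of elements of a given size that fit in one packed vector."""
    if element_bits <= 0 or vector_bits % element_bits != 0:
        raise ValueError("vector width must be a multiple of the element size")
    return vector_bits // element_bits


print(lanes(128, 32))  # four-way single-precision floating point
print(lanes(64, 32))   # two-way single-precision floating point
print(lanes(128, 16))  # 16-bit words in a 128-bit vector
print(lanes(128, 8))   # bytes in a 128-bit vector

# Four 32-bit floats packed side by side occupy 16 bytes, i.e. 128 bits.
packed = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
print(len(packed) * 8)
```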
True SIMD
(overview)
The two types of true SIMD organization differ only in how the memory modules, M, are connected to the arithmetic units, D. The arithmetic units are called the processing elements (PEs). In the distributed-memory organization, each memory module is uniquely associated with a particular arithmetic unit. The synchronized PEs are controlled by one control unit (CU). Each PE is basically an arithmetic logic unit with attached working registers and local memory for storage of distributed data. The CU decodes the instructions and determines where they should be executed. Scalar and control instructions are executed in the CU, whereas vector instructions are broadcast to the PEs. In shared-memory SIMD machines, the local memories attached to the PEs are replaced by memory modules shared by all PEs through an alignment network. This configuration allows the individual PEs to share their memory without going through the CU.
The true SIMD architecture contains a single control unit (CU) with multiple processing elements (PEs) acting as arithmetic units (AUs). In this arrangement, the arithmetic units are slaves to the control unit. The AUs cannot fetch or interpret any instructions. They are merely units capable of addition, subtraction, multiplication, and division. Each AU has access only to its own memory. Consequently, if an AU needs information held by a different AU, it must put in a request to the CU, and the CU must manage the transfer of that information. The advantage of this type of architecture is the ease of adding more memory and AUs to the computer. The disadvantage is the time spent by the CU managing all memory exchanges.
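The division of work between the control unit and the arithmetic units can be sketched as below. This is a minimal model under stated assumptions: the class names, the "scalar"/"vector" instruction tags, the single-cell local memory and the transfers counter are all hypothetical and serve only to make the roles visible.

```python
import operator


class ArithmeticUnit:
    """An AU cannot fetch or decode instructions; it only applies what it is told."""

    def __init__(self, value):
        self.memory = value  # private local memory (one cell for brevity)

    def apply(self, op, operand):
        self.memory = op(self.memory, operand)


class ControlUnit:
    def __init__(self, values):
        self.aus = [ArithmeticUnit(v) for v in values]
        self.scalar_register = 0
        self.transfers = 0  # exchanges the CU has had to manage

    def execute(self, kind, op, operand):
        if kind == "scalar":
            # Scalar and control instructions run in the CU itself.
            self.scalar_register = op(self.scalar_register, operand)
        elif kind == "vector":
            # Vector instructions are broadcast to every AU.
            for au in self.aus:
                au.apply(op, operand)
        else:
            raise ValueError("unknown instruction kind: " + kind)

    def transfer(self, src, dst):
        # An AU cannot read another AU's memory; the CU moves the data.
        self.aus[dst].memory = self.aus[src].memory
        self.transfers += 1


cu = ControlUnit([1, 2, 3, 4])
cu.execute("vector", operator.mul, 10)
cu.execute("scalar", operator.add, 5)
cu.transfer(src=0, dst=3)
print([au.memory for au in cu.aus], cu.scalar_register, cu.transfers)
```

Every exchange between AUs increments the CU's counter, which mirrors the stated disadvantage: the more data the AUs need from one another, the more time the CU spends managing transfers.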
Pipelined SIMD
Pipelined SIMD architecture is composed of a pipeline of arithmetic units with shared memory. The pipeline takes different streams of instructions and performs all the operations of an arithmetic unit. The pipeline works on a first-in, first-out basis. The length of a pipeline varies from machine to machine. To take advantage of the pipeline, the data to be evaluated must be stored in different memory modules so that the pipeline can be fed with this information as fast as possible. The advantage of this architecture is the speed and efficiency of data processing, provided that this condition is met. It is also possible for a single processor to perform the same instruction on a large set of data items. In this case, parallelism is achieved by pipelining: one set of operands starts through the pipeline, and before the computation on this set is finished, another set of operands starts flowing through the pipeline.
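The overlap of operand sets in a pipeline can be simulated cycle by cycle. The sketch below is an idealized model: it assumes that every stage takes exactly one cycle and that a new operand set is available in every cycle (the condition on memory modules mentioned above). The function name run_pipeline and the three example stages are chosen here for illustration.

```python
EMPTY = object()  # marks a pipeline stage that holds no data this cycle


def run_pipeline(operand_sets, stages):
    """Feed operand sets through the stages in first-in, first-out order.

    Returns the results and the number of cycles used.
    """
    if not stages:
        raise ValueError("the pipeline needs at least one stage")
    slots = [EMPTY] * len(stages)
    pending = list(operand_sets)
    results = []
    cycles = 0
    while pending or any(s is not EMPTY for s in slots):
        if pending:
            slots[0] = pending.pop(0)      # a new set enters the first stage
        next_slots = [EMPTY] * len(stages)
        for i, value in enumerate(slots):
            if value is EMPTY:
                continue
            out = stages[i](value)          # every occupied stage works at once
            if i == len(stages) - 1:
                results.append(out)
            else:
                next_slots[i + 1] = out
        slots = next_slots
        cycles += 1
    return results, cycles


stages = [lambda pair: pair[0] * pair[1],  # stage 1: multiply the operands
          lambda v: v + 1,                 # stage 2: add a constant
          lambda v: v * 2]                 # stage 3: scale
results, cycles = run_pipeline([(1, 2), (3, 4), (5, 6), (7, 8)], stages)
print(results, cycles)
```

With four operand sets and three stages, the model finishes in 4 + 3 - 1 = 6 cycles, whereas pushing each set through all three stages before starting the next would take 4 × 3 = 12 cycles.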
Advantages of SIMD
The main advantage of SIMD is that processing multiple data elements at the same time, with a single instruction, can dramatically improve performance. For example, processing 12 data items could take 12 instructions with scalar processing, but would require only three instructions if four data elements are processed per instruction using SIMD. While the exact increase in code speed that you observe depends on many factors, you can achieve a dramatic performance boost if SIMD techniques can be used. Not everything is suitable for SIMD processing, and not all parts of an application need to be SIMD accelerated to realize significant improvements. SIMD offers greater flexibility and opportunities for better performance in video, audio and communications tasks, which are increasingly important for applications. SIMD provides a cornerstone for robust and powerful
SIMD can provide a substantial boost in performance and capability for an application that makes significant use of 3D graphics, image processing, audio compression or other calculation-intensive functions. Other features of a program may be accelerated by recoding them to take advantage of the parallelism and additional operations of SIMD. Apple is adding SIMD capabilities to Core Graphics, QuickDraw and QuickTime. An application that calls them today will see improvements from SIMD without any changes. SIMD also offers the potential to create new applications that take advantage of its features and power. To take advantage of SIMD, an application must be reprogrammed or at least recompiled; however, you do not need to rewrite the entire application. SIMD typically works best for the 10% of the application that consumes 80% of the CPU time. These functions typically have heavy computational and data loads, two areas where SIMD excels.
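The claim that only part of an application needs to be accelerated can be made concrete with a small calculation. The sketch below uses Amdahl's law, which is not part of the discussion above and is brought in here only as a standard way to estimate the overall effect; it also assumes an ideal four-fold speedup for the accelerated part, taken from the 12-instruction versus 3-instruction example.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: whole-program speedup when only a fraction of the
    run time is accelerated by kernel_speedup."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)


scalar_instructions = 12
simd_instructions = scalar_instructions // 4               # four items per instruction
kernel_speedup = scalar_instructions / simd_instructions   # ideal case: 4x

# Accelerate only the code that accounts for 80% of the CPU time.
print(round(overall_speedup(0.80, kernel_speedup), 2))
```

Under these idealized assumptions the whole program runs about 2.5 times faster, even though only the hot functions were rewritten. Real gains depend on memory behavior and on how well the data fits the packed format.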
Disadvantages
In a SIMD machine, a single array control unit (ACU) provides the instruction stream for all of the array processors, so the system is under-utilized whenever a program requires only a few PEs. To alleviate this problem, multiple-SIMD (MSIMD) machines were designed. They consist of multiple control units, each with its own program memory. The N PEs are controlled by U control units that divide the machine into U independent virtual SIMD machines of various sizes. U is usually much smaller than N and determines the maximum number of SIMD programs that can operate simultaneously. The distribution of the PEs among the ACUs can be either static or dynamic.
The MSIMD machine architecture has several advantages over normal SIMD machines, including:
Efficiency: If a program requires only a subset of the available PEs, the remaining PEs can be used for other programs.
Multiple users: Up to U different users can execute different SIMD programs on the machine simultaneously.
Fault detection: A program runs on two independent machine partitions, and errors are detected by result comparison.
Fault tolerance: A faulty PE only affects one of the multiple SIMD machines, and other machines can still operate correctly.
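The partitioning idea and the fault-detection property can be sketched together. This is a toy model under stated assumptions: PEs are identified by integers, the allocation is static and contiguous, a faulty PE is modelled as returning None, and the function names partition_pes and run_on_partition are chosen here for illustration.

```python
def partition_pes(num_pes, sizes):
    """Statically split PEs 0..num_pes-1 into one block per control unit."""
    if sum(sizes) > num_pes:
        raise ValueError("requested more PEs than the machine has")
    blocks, start = [], 0
    for size in sizes:
        blocks.append(range(start, start + size))
        start += size
    idle = num_pes - start
    return blocks, idle


def run_on_partition(pe_ids, data, op, faulty=frozenset()):
    """Each PE of the partition processes one data item; faulty PEs give None."""
    return [None if pe in faulty else op(x) for pe, x in zip(pe_ids, data)]


# N = 16 PEs, U = 2 control units, each given a virtual machine of 4 PEs.
(block_a, block_b), idle = partition_pes(16, [4, 4])
print(list(block_a), list(block_b), idle)

# Fault detection: run the same program on both partitions and compare.
data = [1, 2, 3, 4]
square = lambda x: x * x
result_a = run_on_partition(block_a, data, square)
result_b = run_on_partition(block_b, data, square, faulty={5})
print(result_a, result_b, result_a != result_b)
```

The mismatch between the two result lists reveals the fault, and because PE 5 belongs only to the second block, the first virtual machine still produces correct results.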
Bibliography
http://carbon.cudenver.edu/csprojects/CSC5809S01/Simd/archi.html
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=944733
http://arstechnica.com/old/content/2000/03/simd.ars