
International Journal of Computer Applications (0975-8887), Volume 20, No. 4, April 2011

Architecture of SIMD Type Vector Processor

Mohammad Suaib
National Institute of Technology Hamirpur, India

Abel Palaty
National Institute of Technology Hamirpur, India

Kumar Sambhav Pandey
National Institute of Technology Hamirpur, India

ABSTRACT
Throughput and performance are major constraints in designing system-level models. Because a vector processor uses deeply pipelined functional units, operations on successive elements of a vector overlap in time, but the elements still enter the pipeline one by one. Vector processing can be improved by adding parallelism to the execution of these operations so that they are performed simultaneously. This paper presents the design of a SIMD-Vector processor that implements this parallelism on short vectors of 4 words. The operation on all four words is performed simultaneously, i.e., in one clock cycle, which reduces the clock cycles per instruction (CPI). Implementing parallelism in vector processing requires parallel issue and execution of vector instructions. A vector processor operates on a vector, while a superscalar processor issues multiple instructions at a time; here, parallel pipelines are implemented and then extended to support vector data. The SIMD-Vector processor operates on a short vector, say a 4-word vector, in a superscalar fashion: 4 words are fetched at a time and then executed in parallel. This requires redundant functional units; for example, adding two vectors requires multiple adders. We have designed the architecture of a SIMD type vector processor and explain all of its design parameters.

Keywords
SIMD type vector processor, vertical and horizontal parallelism, ILP.

1. INTRODUCTION
Parallel processing is a necessity for today's architectures, because it reduces the execution time of a program. The execution time of any program is determined by three factors: first, the number of instructions executed; second, the number of clock cycles needed to execute each instruction; and third, the length of each clock cycle. Here we try to reduce the number of clock cycles by introducing a new processor, the SIMD type vector processor. Superscalar and VLIW architectures improve performance by reducing the cycles per instruction (CPI), and this architecture takes advantage of both superscalar and vector processors. The SIMD-Vector architecture supports in-order issue with out-of-order completion. All vector instructions are issued in order from the instruction cache. After structural and data hazards are checked, the vector instructions execute out of order, and a reorder buffer writes the results back in order, so the output sequence is correct.

Technology has changed rapidly and significantly in the past few years, and multimedia applications have become mainstream computing for microprocessors. In this scenario we can improve the performance of the processor by exploiting data-level parallelism (DLP) and instruction-level parallelism (ILP). To exploit DLP, instructions are executed in single instruction, multiple data (SIMD) fashion, and SIMD processing has been adopted into general-purpose processors [2]. Multimedia applications have a lot of inherent parallelism, which SIMD instructions can exploit at low cost and energy overhead. The theoretical performance gain is therefore large, but in practice it is limited: adding more processing units to the SIMD-Vector architecture significantly increases the hardware cost and the complexity of the processor. As a result, we work on short vectors. The SIMD-Vector architecture supports instructions of vector length 4, and we assume that all instructions are vector instructions of length four. The architecture has 4 execution units, and the four vector elements are processed on four different processing units in parallel in one clock cycle. This reduces the clock cycles needed to run multimedia applications. To reduce system complexity, vector chaining is not used. If an instruction has a vector length of less than four, the available vector elements are sent to the execution units and the remaining lanes are tied to ground. Short-vector implementations introduce a large parallelization overhead, such as loop handling and address generation [1]. Examples of SIMD extensions include IBM's VMX, AMD's 3DNow!, Intel's SSE and Motorola's AltiVec; in such processors, vector processing can be embedded while taking advantage of a 4-way superscalar processor. The SIMD-Vector architecture aims to bring new levels of performance and energy efficiency.

The paper is organized as follows. Section 2 introduces the motivation for this work. Section 3 describes the SIMD-Vector architecture. Section 4 compares SIMD-Vector with conventional vector architectures. Section 5 presents the evaluation results, Section 6 concludes the work, and Section 7 gives future work.
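The three factors listed above combine multiplicatively. As a worked illustration (the standard processor-performance relation, not a result from this paper's evaluation), the execution time T of a program is the product of the instruction count IC, the CPI and the clock period t_clk. Applied to the 64-element vector addition loop used in Section 2, and assuming one cycle per scalar element operation, one cycle per 4-element SIMD-Vector operation, and ignoring pipeline fill and loop overhead, the cycle count drops by the vector length:

```latex
\begin{aligned}
T &= IC \times \mathrm{CPI} \times t_{\mathrm{clk}} \\
\text{cycles}_{\text{scalar}} &= 64 \times 1 = 64 \\
\text{cycles}_{\text{SIMD-Vector}} &= \frac{64}{4} \times 1 = 16
\end{aligned}
```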

2. MOTIVATION
A vector ISA packages multiple homogeneous, independent operations into a single short instruction, which has the following benefits.
Compact code: a single short vector instruction can describe N operations, so the code is compact.
Reduction in instruction bandwidth: because a single vector instruction comprises N operations, fewer instructions must be fetched, which reduces the instruction bandwidth requirement.
In the proposed scheme, throughput and performance are enhanced by introducing parallelism, which is done by incorporating superscalar issue into vector processing.

Hardware reduction: the N operations in a vector instruction are homogeneous, which saves hardware in the decode and issue stages. The opcode is decoded once, and all N operations can be issued as a group to the same functional unit. In our proposed scheme, this is taken as the basic design constraint.

SIMD extensions and vector architectures are quite similar. The principal difference lies in how instruction control is implemented and how the execution units communicate with the memory unit. With pipelining, a vector processor can overlap computation, load and store operations on vector elements, so the vector length may be long and variable. This kind of parallelism is called vertical parallelism, and instruction latency is at least one cycle per vector element. A SIMD extension, in contrast, duplicates the execution units to perform parallel execution; this type of parallelization is called horizontal parallelism. Because hardware cost limits how many execution units can be added, the vector length must be fixed and short.

(a) Scalar form:
    for (int a = 0; a < 64; a++) {
        z[a] = x[a] + y[a];
    }

(b) SIMD-Vector form (pseudocode; z[a+3:a] denotes the four elements z[a] through z[a+3]):
    for (int a = 0; a < 64; a += 4) {
        z[a+3:a] = x[a+3:a] + y[a+3:a];
    }

In the scalar form there are 64 iterations, each with one clock cycle of instruction latency. In the SIMD-Vector form, the four element operations of one vector instruction execute simultaneously in one clock cycle, so only 16 iterations are needed and the total latency is just over 16 cycles.

3. SIMD TYPE VECTOR PROCESSOR

In this section we describe the architecture of the SIMD-Vector processor, its pipelining, and the working of the proposed architecture.

3.1 Proposed Architecture

In the proposed architecture, the SIMD unit is the functional unit that performs vector operations; it is similar to a conventional SIMD unit. An architectural overview of the proposed scheme is given in Figure 1. Because a vector instruction has only four vector elements, the SIMD unit executes one vector instruction at a time, operating on all four elements concurrently. Handling long vector operations requires a smart compiler to vectorize the instructions, and all vectorized instructions should have length 4. We add a unit called the vector code cache (VCC) to handle long vector operations. We restrict the VCC to 1 KB, which can store 256 operations with 32-bit instruction encoding (1024 bytes / 4 bytes); this is enough for most multimedia applications. The loop controller generates the loop control signals needed to complete long vector operations, given that 4 operations can be done in one clock cycle. Providing a memory location for every vector element is very tedious with a conventional memory system, so an address generator unit is needed to support strided memory locations for vector elements [3]. This address generator unit is connected to the vector register file and to memory via the load-store unit. All remaining units are conventional. Figure 2 shows the SIMD unit, which has 4 execution units that can execute 4 operations in parallel in one clock cycle.

We have defined several parameters for the SIMD type vector processor, listed in Table 1. Each vector register holds 4 vector elements of 32 bits each, so the width of the vector register file (VRF) is 4 × 32 = 128 bits. The SIMD unit is likewise 128 bits wide, as is the memory (load-store) unit. This type of architecture is well supported by the AltiVec (VMX) ISA [4] and Intel's SSE ISA. Our proposed architecture supports instructions of vector length 4.

Table 1. Architectural parameters
Parameter | Explanation                      | Value
BS        | Bit size of SIMD unit            | 128
BVE       | Bit size of vector element       | 32
LV        | Vector length (elements)         | 4
BVRF      | Bit size of vector register file | 128
BLS       | Bit size of load-store unit      | 128

3.2 Pipelining in SIMD Type Vector Processor

Figure 3 shows how pipelining is exploited in the SIMD-Vector architecture: clock cycles are plotted on the x-axis and vector instructions (VI) on the y-axis, for a five-stage pipeline. The pipeline structure shows that four functional units operate simultaneously on the 4 vector elements in one clock cycle.

Fig 1: Proposed Architecture of SIMD type Vector Processor (blocks: I-cache, loop controller, SIMD unit, address generator, LD/ST unit, VRF, D-cache, data bus)

Fig 2: SIMD unit (processing elements PE1 to PE4, each with its own registers and memory port)

3.3 Working of SIMD Type Vector Processor

In SIMD-Vector, a superscalar implementation is adapted to operate on vector data instead of scalar data. Implementing parallel operations on a vector requires redundant functional units. The behavior of SIMD-Vector is shown in Figure 4.

Fig 3: Pipelining in SIMD type Vector Processor

4. COMPARISON WITH OTHER ARCHITECTURES

In this section we compare the SIMD-Vector architecture with SIMD extensions and with vector architectures. The proposed architecture takes advantage of both SIMD and vector processors. The SIMD-Vector VRF is much narrower than the VRFs of the vector architectures implemented in recent single-chip processors [5,6].

Fig 4: Working of SIMD-Vector processor

Table 2. Architecture comparison
Feature             | SIMD-Vector                  | SIMD                    | Vector
Vector length       | 4                            |                         | >=64
Memory access       | Automatic address generation | Sequential access       | Strided access
Instruction latency | 1 cycle per vector element   | 1 cycle per instruction | 1 cycle per element
Parallelism         | Combined                     | Horizontal              | Vertical

5. EVALUATION
The proposed SIMD-Vector architecture can enhance system performance. We analyzed instruction counts for several multimedia operations (fast Fourier transform, matrix multiplication, finite impulse response filter and infinite impulse response filter) on scalar, SIMD and SIMD-Vector architectures. The results are shown in Figure 5, which shows that the SIMD-Vector architecture requires noticeably fewer instructions.

Fig 5: Comparison of instruction counts (instruction count for FFT, MAT, FIR and IIR on scalar, SIMD and SIMD-Vector architectures; y-axis from 0 to 0.8)

6. CONCLUSION
The SIMD-Vector processor implements parallelism on short vectors of four words. The operation on these words is performed simultaneously, i.e., in one cycle, which reduces the clock cycles per instruction (CPI). Parallelism in vector processing requires superscalar issue of vector instructions. This paper presents the architecture of the proposed processor, which can be exploited in many multimedia applications.

7. FUTURE WORK
In the future, the parallelism can be extended to support longer vectors with more words. This increases the hardware, since more parallelism requires more functional units.

8. REFERENCES
[1] Shin, J., Hall, M.W., Chame, J.: Superword-Level Parallelism in the Presence of Control Flow. In: CGO 2005, pp. 165-175 (2005).
[2] Lee, R.: Multimedia Extensions for General-Purpose Processors. In: SIPS 1997, pp. 9-23 (1997).
[3] Talla, D.: Architectural Techniques to Accelerate Multimedia Applications on General-Purpose Processors. Ph.D. Thesis, The University of Texas at Austin (2001).
[4] Diefendorff, K., et al.: AltiVec Extension to PowerPC Accelerates Media Processing. IEEE Micro 20(2), 85-95 (2000).
[5] Corbal, J., Espasa, R., Valero, M.: Exploiting a New Level of DLP in Multimedia Applications. In: MICRO 1999 (1999).
[6] Kozyrakis, C.E., Patterson, D.A.: Scalable Vector Processors for Embedded Systems. IEEE Micro 23(6), 36-45 (2003).
[7] Yeager, K.: The MIPS R10000 Superscalar Microprocessor. IEEE Micro 16(2), 28-41 (1996).
[8] Smith, J.E., Sohi, G.S.: The Microarchitecture of Superscalar Processors. Proceedings of the IEEE 83(12), 1609-1624 (1995).
[9] Open SystemC Initiative (OSCI), www.systemc.org.
