
An FPGA Based SIMD Architecture Implemented with 2D Systolic Architecture for Image Processing

Pankaj Kumar
Lecturer, Department of Computer Application, Sahara Arts & Management Academy, Bakshi Ka Talab, Lucknow

Abstract: Image processing is widely used in many applications, including medical imaging, industrial manufacturing, and security systems. It is gaining importance in a growing range of application areas, such as autonomous vehicles, and it requires substantial computational power in order to operate in real time. Recent advances have popularized the use of images in many branches of human activity, among them robotics, biomedical applications, industrial process control and environmental control. Each of these settings demands its own variety of processes, methods and hardware. Parallel computer architectures have been developed to pipeline successive image processing functions. Such an architecture allows many image processing operations to run in parallel and can therefore achieve a much higher data throughput than traditional sequential computing systems. Parallel architectures suggested for highly efficient image processing include processors of the SIMD and MIMD types, multiprocessor systems, and pipelined processors. The main objective of this paper is to present the implementation of an image processor, an SIMD processor built with large FPGAs. The paper describes why a 2D systolic architecture is well suited to this design and, finally, how image processing is carried out on this parallel processor.

Keywords: FPGA, SIMD, MIMD, 2D Systolic Array, Pipelined processor

General Terms: Image Processing, Parallel Processor, Multiprocessor

1 Introduction

Image processing may be defined as the science of modifying and analyzing continuous-tone pictures. Its purpose is to improve pictorial information for human interpretation and to process scene data for analysis by computers. Image processing is sometimes confused with computer graphics because the methods used in the two fields overlap, but the two are different [13]. In computer graphics, a computer is used to create a picture. Image processing, on the other hand, applies techniques to modify or interpret existing pictures, such as photographs and TV scans.

The speed-up obtained from parallel image processing depends on the degree of parallelism. That is why the PIPT (Parallel Image Processing Toolkit) was developed [12]. The details of parallelization are hidden from users of the PIPT through a registration/call-back mechanism, which provides an opaque transport mechanism for moving parameterized data between nodes. The main work of the PIPT is data distribution and message passing. The PIPT uses a manager/worker scheme in which a manager process reads an image file from disk, partitions it into equally sized slices, and sends the pieces to worker processes. The worker programs invoke a specified image processing routine to process their slices and then send the processed slices back to the manager. The manager reassembles the processed slices to create the final output image.

Applications use the PIPT by calling image processing routines, which are in turn built up from abstract computation kernel functions. The kernel functions interface to an opaque transport layer which transparently effects parallel execution. The transport mechanism makes calls to an MPI (Message Passing Interface) library for its parallel communication operations [12]. Although the PIPT provides for parallel execution of image processing routines, parallelism is encapsulated at a low level of the system so that users of the PIPT do not need to be concerned with parallel programming. Additionally, MPI functions allow the PIPT to create its own message passing space that is guaranteed not to conflict with any other messages from the user's application.

2 Cellular Arrays for Pixel Parallelism

Over the last few years, advances in programmable logic devices have resulted in the commercialization of field programmable gate arrays (FPGAs), which allow large numbers of programmable logic elements to be placed on a single chip [10]. The size and speed of these circuits improve at the same rate as those of microprocessors, since they rely on the same technology. FPGAs offer the possibility of constructing re-programmable, reconfigurable arrays that compute certain problems efficiently. Today an FPGA can implement a programmable, maximally parallel cellular array.
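Returning briefly to the PIPT manager/worker scheme described in the Introduction, the partition, process and reassemble flow can be sketched as below. This is only an illustration under stated assumptions: it is not PIPT code, a thread pool stands in for the MPI transport layer, and the names `partition_rows`, `invert` and `manager` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_rows(image, n_workers):
    """Manager step: split the image (a list of rows) into equally sized row slices."""
    if len(image) % n_workers != 0:
        raise ValueError("row count must be divisible by n_workers")
    rows_per_slice = len(image) // n_workers
    return [image[i:i + rows_per_slice]
            for i in range(0, len(image), rows_per_slice)]


def invert(slice_rows):
    """Worker routine (hypothetical example): invert 8-bit grey levels."""
    return [[255 - pixel for pixel in row] for row in slice_rows]


def manager(image, routine, n_workers):
    """Distribute slices to workers, then reassemble the processed slices."""
    slices = partition_rows(image, n_workers)
    # A thread pool is an assumption standing in for MPI message passing.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        processed = list(pool.map(routine, slices))  # map keeps slice order
    return [row for part in processed for row in part]


image = [[0, 10, 20], [30, 40, 50], [60, 70, 80], [90, 100, 110]]
result = manager(image, invert, n_workers=2)
```

The caller only names the routine to apply; how the slices travel to the workers stays inside `manager`, which mirrors the way the PIPT hides parallelization from its users.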

Fig 1 Cellular Array for Pixel Parallelism (Instruction in time)

Cellular arrays are a natural model for parallel image processing [6]. They consist of an array of cells in two, three or more dimensions. Each cell is associated with an image pixel and has dedicated connections to its local neighborhood. This high-bandwidth local communication is ideal for implementing neighborhood functions: all pixels are processed in parallel, and the entire image is updated in one instruction cycle. An FPGA can implement a programmable, maximally parallel cellular array, but it can do so efficiently only for a small number of cells.

3 Image Pipeline for Instruction Level Parallelism

The image pipeline is related to the cellular architecture, but here data is provided to the processing cell as a continuous stream of pixels [6]. Pixels are usually supplied one sample at a time, in raster scan order. This arrangement is well suited to real-time systems where data arrives directly from a serial I/O sensor. Since pixels are processed sequentially, the main way to speed up an image pipeline is to execute multiple instructions in parallel. As shown in Figure 2, instructions can be implemented either in parallel (increasing the pipeline width) or in series (increasing the pipeline depth). Unlike in cellular architectures, access to a local neighborhood within an image pipeline must be carefully considered. All instructions in a pipeline execute at the same time, so it may be difficult to provide data to all instructions at the right time.

Fig 2 Image Pipeline for Instruction Parallelism (Pixel in time)
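The image pipeline of Section 3 can be imitated in software with chained generators. The sketch below is an assumption-laden illustration rather than the hardware design: each generator plays the role of one pipeline stage, stages chained in series correspond to pipeline depth, and a three-entry shift register shows why neighborhood access needs buffering. The stage names are hypothetical, and the window deliberately ignores row boundaries to stay short.

```python
from collections import deque


def raster_scan(image):
    """Emit pixels one sample at a time, row by row (raster scan order)."""
    for row in image:
        for pixel in row:
            yield pixel


def gain_stage(pixels, gain):
    """Pipeline stage: multiply every pixel in the stream by a constant."""
    for pixel in pixels:
        yield pixel * gain


def threshold_stage(pixels, level):
    """Pipeline stage: map each pixel to 1 if it reaches the level, else 0."""
    for pixel in pixels:
        yield 1 if pixel >= level else 0


def window3_sum(pixels):
    """Sum of the current pixel and the two that preceded it in the stream.

    The shift register holds earlier samples so that a neighbourhood is
    available when the current pixel arrives. Simplification: the window
    runs across row ends instead of restarting on every row.
    """
    taps = deque([0, 0, 0], maxlen=3)
    for pixel in pixels:
        taps.append(pixel)
        yield sum(taps)


image = [[1, 2, 3], [4, 5, 6]]
# Two stages in series: a deeper pipeline.
deep = list(threshold_stage(gain_stage(raster_scan(image), 2), 8))
# Neighbourhood access through buffered samples.
windowed = list(window3_sum(raster_scan(image)))
```

Increasing the pipeline width would correspond to feeding the same stream into several stages side by side, which is not shown here.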

4 SIMD Based Architecture

To implement image processing on a parallel computer architecture, I have chosen an SIMD (Single Instruction, Multiple Data) architecture with a 2D torus connection topology, which includes the 2D systolic architecture (Fig 3). The processing elements (PEs) of the SIMD architecture are interconnected in a 2D torus, and the same address and control signals are used by every PE. The interconnected set of all PEs is called the processing matrix [11].
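A small software model may help to picture how a single instruction acts on every PE of a 2D torus. The sketch is illustrative only and assumes one pixel per PE; `shift` and `simd_apply` are hypothetical helper names, not part of the described hardware.

```python
def shift(grid, direction):
    """Return, for every PE, the value held by its neighbour in `direction`.

    Indices wrap around at the edges, which is what makes the grid a torus.
    """
    rows, cols = len(grid), len(grid[0])
    d_row, d_col = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}[direction]
    return [[grid[(r + d_row) % rows][(c + d_col) % cols] for c in range(cols)]
            for r in range(rows)]


def simd_apply(op, *grids):
    """Apply the same operation in every PE position (single instruction, multiple data)."""
    return [[op(*values) for values in zip(*rows)] for rows in zip(*grids)]


pixels = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

# One "instruction": every PE adds up the values of its four neighbours.
neighbour_sum = simd_apply(
    lambda n, s, e, w: n + s + e + w,
    shift(pixels, "N"), shift(pixels, "S"),
    shift(pixels, "E"), shift(pixels, "W"),
)
```

Every position runs the same operation on its own data, so the whole image is updated by one instruction, as in the cellular model.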

Fig 3 SIMD Based parallel architecture for image processing

Three distinct memories are used in this architecture: the instruction memory, the data memory and the PE memory (register file). The combination of instruction memory and data memory is called the stream memory [1]. The SIMD controller is responsible for reading data (e.g. pixel values) from the stream memory and transferring it to the register files, and vice versa. The architecture is composed of three basic components: the SIMD controller, the processing matrix and the I/O controller. These three components are connected by one shared global bus and two control buses, one for program control and one for PE memory control and address signals. All processing elements can communicate with each other by means of inter-PE communication.

4.1 I/O Controller

The I/O controller manages off-board communication and initiates memory transfers. It is responsible for the following operations:

- Communicating with the host.
- Exchanging status information with the SIMD controller.
- Managing data transfer between the host and the board.

Data transfers between the host and the board use the global bus to send addresses and data to the PEs and the SIMD controller.

4.2 SIMD Controller

The SIMD controller is the control unit of the image processor. It reads a program from its instruction memory and uses its data memory for storing global information. Once an instruction is decoded, data and control signals are sent to the PEs through the global bus and a dedicated control bus. The global bus may be used to send both control and data signals.

Fig 4 SIMD Controller

The SIMD processor includes an image sensor, which senses the image to be processed (Fig 4) [2, 11]. The SIMD controller also provides addresses and control signals to every memory during both program execution and I/O memory transfers. If configured accordingly, it exchanges status information with the I/O controller.

4.3 Stream Memory

The stream memory unit is the connection between external memory and I/O on one side and the PEs on the other. It takes data from external memory and sends it to the PEs via the SIMD controller. It is the combination of the data memory and the instruction memory.
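The interplay of controller, stream memory and register files described in Sections 4.2 and 4.3 can be sketched as a tiny simulator. Everything specific here is an assumption made for illustration: the three opcodes `LOAD`, `ADDI` and `STORE`, the register file size, and the convention that the data memory holds one word per PE are not taken from the paper.

```python
class SIMDController:
    """Toy model: fetch an instruction, decode it, broadcast it to all PEs."""

    def __init__(self, instruction_memory, data_memory, n_regs=4):
        # Instruction memory plus data memory form the stream memory.
        self.instruction_memory = instruction_memory
        self.data_memory = data_memory  # assumption: one word per PE
        # One register file (PE memory) per processing element.
        self.register_files = [[0] * n_regs for _ in data_memory]

    def run(self):
        for opcode, *args in self.instruction_memory:  # fetch
            if opcode == "LOAD":  # stream memory -> register files
                (reg,) = args
                for pe, regs in enumerate(self.register_files):
                    regs[reg] = self.data_memory[pe]
            elif opcode == "ADDI":  # same instruction executed in every PE
                reg, constant = args
                for regs in self.register_files:
                    regs[reg] += constant
            elif opcode == "STORE":  # register files -> stream memory
                (reg,) = args
                for pe, regs in enumerate(self.register_files):
                    self.data_memory[pe] = regs[reg]
            else:
                raise ValueError(f"unknown opcode {opcode!r}")
        return self.data_memory


program = [("LOAD", 0), ("ADDI", 0, 10), ("STORE", 0)]
controller = SIMDController(program, data_memory=[1, 2, 3, 4])
brightened = controller.run()
```

The loop over `register_files` is sequential only because this is a software model; in the described hardware all PEs receive the decoded instruction at once over the global and control buses.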

4.4 Processing Matrix

The PE matrix is a set of identical PEs interconnected in a 2D grid topology. Each PE is connected to its four neighbors in the North, South, East and West directions (Figure 5).

Fig 5 Detailed view of PE

Each PE has a local memory addressed by the SIMD controller. The processing matrix is implemented by a 2*2 FPGA matrix. Conceptually, an FPGA represents a sub-matrix of the global PE matrix [11]. The data and control signals of an FPGA are shared among all PEs of its sub-matrix.

5 Implementation

The architecture has a 2*2 FPGA matrix, and each of the memory blocks is an SRAM module of 64K*32 bits. The processing elements as well as the SIMD controller are implemented with SRAM technology. The I/O controller is implemented with EPROM.

Table 1 Width of each Bus

Bus | No. of bits
Global Shared Bus | 32
Control Bus between I/O and SIMD Controller | 23
Three Address Buses | 18 bits each
Control Bus between SIMD controller and PE matrix | 10
2D Grid connection | 32
Configuration Signal | 8

The SIMD controller is implemented by an FPGA. This implies that the decoding and execution of instructions may differ from one application to another. Once an instruction is decoded, data and control signals are sent to the FPGAs through the global bus and a dedicated control bus. The processing matrix is implemented by a 2*2 matrix of FPGAs. Each FPGA is connected to its North, South, East and West neighbors and to a local memory. Each North, South, East or West connection is 32 bits wide (Table 1). The data and control signals of an FPGA are shared among all PEs of its sub-matrix.

The PCI local bus standard is used to implement the I/O interface. The design of the interface is simplified by the PCI controller, a powerful and flexible controller supporting several levels of interface sophistication. Two clock signals are available on board: the first is provided by the PCI controller, and the second is an on-board crystal clock. Two or more boards of this architecture can be connected by means of 124-pin connectors. In that case the north and south connections of the processing matrix are routed to the other boards.

6 Result

The parallel computer based on the SIMD architecture is dedicated to image processing, but this architecture has also been found well suited to pattern recognition and neural networks. The image processor has an SIMD architecture with a 2D interconnection network that is well suited to implementing 2D systolic networks. The described architecture can be reconfigured. Since the processing matrix plays a central role in this architecture, new architectures can be derived by changing the design of the processing matrix.

7 Discussion

Nowadays parallel processing concepts are used in a variety of applications where speed and accuracy matter. Parallel processing improves performance by executing more than one task at a time, which is why a parallel processing architecture is used here. In this architecture pixel-level parallelism is achieved, which is why only one instruction is issued at a time. An SIMD (single instruction, multiple data stream) parallel computer architecture is therefore used. Because the architecture is dedicated to image processing, it can be called an image processor. The speed as well as the accuracy of the described architecture is found to be better than that of architectures based on serial computing.

8 Conclusion

The demand for image processing in demanding applications such as robotics, biomedical applications and industrial processes has grown rapidly in recent years, so it has become necessary to design an architecture that is well suited to these types of application. In this paper, I adopted a suitable SIMD parallel computer architecture for image processing. The parameters I considered are the number of ALUs, the number of PEs, the I/O controller and the SIMD controller. A parallel approach is used in the design so that data throughput can be maximized. The architecture is implemented on FPGAs, which allow large numbers of programmable logic elements to be placed on a single chip.

Acknowledgement

I would like to express my gratitude to Prof. D. N. Kakkar, Director, Sahara Arts & Management Academy, for his sincere support and motivation. I cannot forget the cooperation of Mr. Brijesh Khendelwal, HOD, Computer Science. I would also like to thank all my colleagues, who gave me a lot of encouragement in writing this paper.

References

[1] Hamed Fatemi, Henk Corporaal, Twan Basten, Richard Kleihorst, and Pieter Jonker, "Designing Area and Performance Constrained SIMD/VLIW Image Processing Architectures".
[2] Antonio Gentile, José L. Cruz-Rivera, D. Scott Wills, Leugim Bustelo, José J. Figueroa, Javier E. Fonseca-Camacho, Wilfredo E. Lugo-Beauchamp, Ricardo Olivieri, Marlyn Quiñones-Cerpa, Alexis H. Rivera-Ríos, Iomar Vargas-González, and Michelle Viera-Vera, "Real-Time Image Processing on a Focal Plane SIMD Array".
[3] Alok Choudhary and Sanjay Ranka, Syracuse University, "Parallel Processing for Computer Vision and Image Understanding".
[4] F. J. Seinstra, D. Koelma, and J. M. Geusebroek, "Bridging the Gap between Computing and Imaging: Towards Effortless Parallel Image Processing", Intelligent Sensory Information Systems.
[5] Andriy Lutsyk, Oleksiy Lutsyk, and Olexandr Pelenskyy, "Parallel Image Processing on Configurable Computing Architecture".
[6] Reid B. Porter, "Image Processing", pp. 4-5.
[7] Winner E. Alexander, "Parallel image processing with block data parallel architecture", IBM J. Res. Develop., vol. 44, no. 5, Sept. 2000.
[8] Mirosław Gajer, "Parallel Image Processing on the Texas Instruments Multiprocessor System", Pro Dialog 11 (2000), pp. 13-29, NAKOM Publishers, Poznań, Poland.
[9] Thomas Bräunl, "Tutorial in Data Parallel Image Processing", Australian Journal of Information Processing Systems (AJIIPS), vol. 6, no. 3, 2001, pp. 164-174 (11).
[10] James Greco, "Parallel Image Processing and Computer Vision Architecture", University of Florida, 2005, pp. 15-16.
[11] Jocelyn Cloutier, Eric Cosatto, Steven Pigeon, Francois R. Boyer, and Patrice Y. Simard, "An FPGA based image processor for image processing and neural networks".
[12] J. M. Squyres, A. Lumsdaine, and R. L. Stevenson, "A toolkit for parallel image processing", pp. 1-2.
[13] Baker, "A survey of computer Science", pp. 32-33.
[14] Syeda Mohisna Afroze, "Systolic Architecture", Advanced Logic Synthesis.
