
Implementation of One Dimensional CNN Array on FPGA - A Design Based on Verilog HDL

Alireza Fasih, Jean C. Chedjou, Kyandoghere Kyamakya
Transportation Informatics Group, University of Klagenfurt, Klagenfurt, Austria
[email protected], [email protected], [email protected]

Abstract— In this paper an FPGA-based implementation of a 1D-CNN with a 3×1 template and an 8×1 length is described. The Cellular Neural Network (CNN) is a parallel processing technology that has mainly been used for image processing. This system is a reduced version of the Hopfield Neural Network. Because of the local connections between a cell and its neighbors, the implementation of this technology is easier than in the case of Hopfield Neural Networks. There are various options for implementing a CNN on a chip; the best solution is ASIC technology, and the next best is an emulation on top of a digital reconfigurable chip such as an FPGA. Designing and developing universal CNN-based machines with these technologies is possible. Since FPGAs are COTS components and their capacity grows quickly, a simple and economical architecture is obtained by designing a CNN emulation on FPGA chips. Such a digital design on an FPGA does, however, involve a tradeoff between speed and area. One key target is therefore to reach the best performance for this emulation architecture under the mentioned constraints.

Keywords— Cellular Neural Network; CNN Emulation on FPGA; Simulation; HDL.

I. INTRODUCTION

This paper briefly introduces the design of a digital emulation of a CNN based on hardware description languages. The Cellular Neural Network was introduced by Chua and Yang from the University of California at Berkeley in 1988 [4]. This type of neural network is a reduced version of the Hopfield Neural Network. One of the most important features of the CNN is its local connectivity: each cell is connected only to its neighboring cells. Due to this local connection between a cell and its neighbors, a hardware implementation of this type of neural network is easily realizable [5]. By digitizing the analog behavior of this system (i.e. emulating it on a digital platform), one is able to realize this system on an FPGA. Because of the local connectivity and the processing around each cell, the global system works like a parallel processor [6].

The behavior of a CNN system is based on the settings of the template values; changing these template values changes the CNN behavior. This is a very useful feature for realizing a universal machine by using the CNN [7]. A test-bed for CNN emulation on an FPGA evaluation board is a relatively cheap option, and the required development time should be significantly low.

II. CNN DIGITAL EMULATION/MODELLING

The CNN mathematical model for each cell is a first-order equation like Eq-1 shown below.

Eq-1 (State Equation):

C \frac{dv_{xij}(t)}{dt} = -\frac{1}{R_x} v_{xij}(t) + \sum_{C(k,l) \in N_r(i,j)} A(i,j;k,l) \, v_{ykl}(t) + \sum_{C(k,l) \in N_r(i,j)} B(i,j;k,l) \, v_{ukl} + I, \qquad 1 \le i \le M, \; 1 \le j \le N

For modeling this equation in HDL, we must simplify the nonlinear terms. The simplified equation takes the form of Eq-2:

Eq-2:

\dot{x} = -x + A * v_y + B * v_u + I

In this equation, 'A' is the template for the feedback operator and 'B' is the template for control.

Eq-3 (Output Equation):

v_{yij}(t) = \frac{1}{2} \left( \left| v_{xij}(t) + 1 \right| - \left| v_{xij}(t) - 1 \right| \right), \qquad 1 \le i \le M, \; 1 \le j \le N

The output equation is a piecewise-linear sigmoid function that limits the output state value; in some references the sigmoid function is denoted by f(.).

According to Eq-4, we are able to normalize the input data:

Eq-4:

U = Pixel_value * 2 - 1

Converting Eq-2 to a discrete-time model is possible. A discrete-time CNN can easily be mapped onto an FPGA by defining digital integrators, multipliers, adders and other digital operators. After defining these fundamental operators in the FPGA and wiring them together, a dynamic model of the CNN is possible [7]. A single CNN cell model is like a first-order differential equation; therefore, solving this equation with this architecture is possible.
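As an illustration (this step is not spelled out in the paper), a forward-Euler discretization of Eq-2 with step size h leads to the update

x[n+1] = x[n] + h \left( -x[n] + A * v_y[n] + B * v_u + I \right)

and for h = 1 this reduces to x[n+1] = A * v_y[n] + B * v_u + I, which resembles the per-clock register update used in the integrator described below.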

Figure 1. Simplified CNN cell dynamic model.

The convolution module loads the template values and the input data and returns the sum of their pairwise products. The block diagram in Fig. 2 shows the convolution operator I/O. Out-of-bound values in the CNN array are set to zero. Cloning this module according to the diagram is simple in HDL code.

Figure 2. Convolution module I/O block diagram: inputs a1, a2, a3 and b1, b2, b3; output C(a, b).

The module of Fig. 2 operates according to Eq-5 (cf. the conv2 module in the appendix):

Eq-5:

C(a, b) = a_1 b_1 + a_2 b_2 + a_3 b_3
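As a small numerical illustration (the values here are chosen only for the example), with a = [0.5, +1, -1] and b = [+1, -1, +1], Eq-5 gives

C(a, b) = 0.5 \cdot (+1) + 1 \cdot (-1) + (-1) \cdot (+1) = -1.5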

The architecture of this module consists of two main parts: hard-wired units and behavioral sequential units. In the hard-wired part we must define the relationships between the CNN cells and their neighbors through the template values; these connections are based on convolution operators. With array cloning of the convolution modules in HDL we are able to extend the size of the CNN module, and with this approach we realize a growth of the CNN structure up to a simple 8×1 width. In the CNN structure there are two types of templates: control templates and feedback templates. Due to the architecture of the feedback path we have to define a memory block for appropriate handling of this path, and another main memory unit is defined for the integrator components. In this system all convolution units work concurrently, so the results of TA*Y and TB*U are immediately accessible (see Fig. 1).

This convolution module is common to both TA*Y and TB*U. The code below shows the conv2 instantiation that computes the convolution between the control template and the input data at cell 7:

conv2 ccn7 (M0[7], VB1, VB2, VB3, 16'd0, u7, u6);

The convolution of the feedback template with the output state is instantiated in the same way:

conv2 fcn7 (M1[7], VA1, VA2, VA3, 16'd0, Y[7], Y[6]);
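The paper instantiates one conv2 per cell explicitly. As a sketch of the array-cloning idea only, the same wiring could also be expressed with a generate loop, roughly as follows; the wrapper module conv_row, its packed buses and the neighbor ordering (right neighbor, the cell itself, left neighbor, inferred from the ccn7 example above) are assumptions and not the paper's code.

// Sketch only: clones the 1x3 convolution module across all 8 cells.
module conv_row (M0_flat, VB1, VB2, VB3, u_flat);

  input signed [15:0] VB1, VB2, VB3;  // control template (Q3.12)
  input [127:0] u_flat;               // 8 packed 16-bit cell inputs
  output [143:0] M0_flat;             // 8 packed 18-bit convolution results

  wire signed [15:0] u  [0:7];
  wire signed [17:0] M0 [0:7];

  genvar i;
  generate
    for (i = 0; i < 8; i = i + 1) begin : cells
      assign u[i] = u_flat[i*16 +: 16];
      assign M0_flat[i*18 +: 18] = M0[i];

      // out-of-bound neighbors are tied to 16'd0, as in the paper
      if (i == 7)
        conv2 ccn (M0[i], VB1, VB2, VB3, 16'd0, u[i], u[i-1]);
      else if (i == 0)
        conv2 ccn (M0[i], VB1, VB2, VB3, u[i+1], u[i], 16'd0);
      else
        conv2 ccn (M0[i], VB1, VB2, VB3, u[i+1], u[i], u[i-1]);
    end
  endgenerate

endmodule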

In the top-level CNN module, the Verilog code defines 16-bit 2's-complement variables for loading the data and the templates into this module: one sign bit, three bits for the integer part and twelve bits for the fractional part. Therefore, we are able to load values into this module in the range [-7, +7]. For the 12-bit fixed-point fraction, the accuracy is 1/2^12. The input data range is limited to [-1, +1]; -1 means black and +1 means white. For image processing purposes this means that the image values must be rescaled to this range, i.e. the value of each gray pixel must lie in [-1, +1].
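To illustrate this format (the examples here are added for clarity and are not from the paper), a real value v is stored as its rounded multiple of 2^12 in 16-bit two's complement:

\hat{v} = \mathrm{round}(v \cdot 2^{12}): \quad +1 \mapsto \mathtt{16'h1000}, \; -1 \mapsto \mathtt{16'hF000}, \; +0.5 \mapsto \mathtt{16'h0800}, \; +7 \mapsto \mathtt{16'h7000}

with a resolution of 2^{-12} ≈ 0.000244. The constants 16'h1_000 and 16'hf_000 used in the sigmoid code below follow this encoding.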
More details on the convolution module are presented in the appendix of this paper. The other main units needed for this design are the integrator and the linear sigmoid function. To implement an integrator in HDL we need a register. In the previous steps we determined the convolutions with the feedback and control templates; in the integrator unit we must sum the result in each new cycle with the previous value of the register. The convolution module output is defined as 18 bits wide, and according to the lengths of M0 and M1 the integrator register should be 32 bits wide. The following code, obtained after synthesis, behaves like eight integrators working concurrently.
always @(posedge clk)
begin
  for (j = 0; j <= 7; j = j + 1)
  begin
    res[j] = S2[j];
  end
  // ... (rest of the clocked block omitted)

In this code the term S2 is the sum of C(TA, Y) and C(TB, U) from the previous cycle.

The best method to design the sigmoid function is to use an if-then rule. The code below shows how this unit operates; the values are limited by this procedure to the range between -1 and +1.
for (j = 0; j <= 7; j = j + 1)
begin
  if (res[j] > 32'sh00000_000)   // > 0
  begin
    Y[j] = 16'h1_000;            // +1
    res[j] = 32'h00001_000;
  end
  if (res[j] == 32'sh00000_000)  // = 0
  begin
    Y[j] = 16'h0_000;            //  0
    res[j] = 32'h00000_000;
  end
  if (res[j] < 32'sh00000_000)   // < 0
  begin
    Y[j] = 16'hf_000;            // -1
    res[j] = 32'hfffff_000;
  end
end

In these units the "res" vector is a temporary register for simulating the integrator, and the "Y" variable is a memory for storing the CNN output state. We used two levels of memory to design the complete, simple 8×1 CNN.

Figure 3 shows the digital architecture of the CNN cell introduced in this paper. According to this architecture the system is synchronous: the integrator, the sigmoid function and the loading of the state variable are triggered by the rising clock edge.

Figure 3. Digital architecture of the CNN.

III. SIMULATION RESULTS

To simulate this system we used the ModelSim 6 software. The results are very close to those obtained from a simulation in Matlab.

For example, by setting TA = [0.5, +1, -1] and TB = Bias = 0, the following waveform appears. The convergence time for this case in this architecture is 200 ns (8 clock cycles).

u = [+1, -1, -1, +1, +1, -1, -1, +1]
TA = [0.5, +1, -1]
TB = Bias = 0

The output for this template is

xt = [+1, -1, +1, -1, +1, -1, +1, -1]

The convergence time for the case below in this architecture is 150 ns (6 clock cycles).

u = [+1, -1, -1, +1, +1, -1, -1, +1]
TA = [-1, +2, +1]
TB = Bias = 0

The output for this template is

xt = [+1, +1, +1, +1, -1, +1, -1, +1]

In an advanced mode there is another way to simulate the HDL code: in Matlab 2006a we were able to establish a connection between the ModelSim simulator and Simulink. Over a TCP/IP connection between Simulink in Matlab and the ModelSim simulator we tested this CNN code on images. The CNN module must be applied to each image line separately.
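As an illustration only, a minimal ModelSim test bench for the first example could look like the sketch below. The top-level module name and its port list are not given in the paper, so the instantiation is left as a commented placeholder and all names below (tb_cnn1d_8, cnn1d_8, the port names) are assumptions.

`timescale 1ns / 1ps

module tb_cnn1d_8;

  reg clk = 1'b0;
  // Q3.12 two's-complement stimulus: +1 -> 16'h1000, -1 -> 16'hF000, +0.5 -> 16'h0800
  reg signed [15:0] u0, u1, u2, u3, u4, u5, u6, u7;  // u = [+1,-1,-1,+1,+1,-1,-1,+1]
  reg signed [15:0] VA1, VA2, VA3;                   // feedback template TA
  reg signed [15:0] VB1, VB2, VB3;                   // control template TB
  reg signed [15:0] bias;

  always #12.5 clk = ~clk;  // 25 ns period, so 8 cycles = 200 ns as reported in the paper

  // cnn1d_8 dut (.clk(clk), .u0(u0), /* ... */);    // hypothetical instantiation

  initial begin
    u0 = 16'h1000; u1 = 16'hF000; u2 = 16'hF000; u3 = 16'h1000;
    u4 = 16'h1000; u5 = 16'hF000; u6 = 16'hF000; u7 = 16'h1000;
    VA1 = 16'h0800; VA2 = 16'h1000; VA3 = 16'hF000; // TA = [0.5, +1, -1]
    VB1 = 16'h0000; VB2 = 16'h0000; VB3 = 16'h0000; // TB = 0
    bias = 16'h0000;
    #200;  // expected convergence time (8 clock cycles)
    $display("expected output: xt = [+1,-1,+1,-1,+1,-1,+1,-1]");
    $finish;
  end

endmodule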

Figure 4. Hole counting sample (TA = [-1, +2, +1], TB = Bias = 0): (left) input image, (right) output result.

Figure 5. Filter sample (TA = [+1, -1, +1], TB = [0, 1, 0], Bias = 0): (left) input image, (right) output result.

IV. CONCLUSION

In this paper we introduced a new model for the simulation and digital implementation/emulation of a digital CNN. The model was simulated and validated with several templates. Future work aims at improving this module in order to realize a large-scale CNN-based universal machine system.

REFERENCES

[1] Martinez, J. J., F. J. Toledo, et al., "New emulated discrete model of CNN architecture for FPGA and DSP applications," Lecture Notes in Computer Science, pp. 33-40.
[2] T. Roska, A. Rodriguez-Vazquez, "Review of CMOS implementations of the CNN universal machine-type visual microprocessors," IEEE Int. Symp. on Circuits and Systems (ISCAS 2000), Geneva, Switzerland, vol. 2, pp. 120-123, 2000.
[3] Chua, L. O. and L. Yang, "Cellular neural networks: applications," IEEE Transactions on Circuits and Systems, vol. 35, no. 10, pp. 1273-1290, 1988.
[4] Chua, L. O. and L. Yang, "Cellular neural networks: theory," IEEE Transactions on Circuits and Systems, vol. 35, no. 10, pp. 1257-1272, 1988.
[5] J. Zhao, Q. Ren, J. Wang, and H. Meng, "A New Approach for Image Restoration Based on CNN Processor," ISNN 2007, Part III, LNCS 4493, pp. 821-827, 2007.
[6] Toledo, F. J., J. J. Martínez, et al., "Image processing with CNN in a FPGA-based augmented reality system for visually impaired people," 8th Int. Work-Conference on Artificial and Natural Neural Networks (IWANN 2005), pp. 906-912, 2005.
[7] Martinez-Alvarez, J. J., F. J. Garrigos-Guerrero, et al., "High Performance Implementation of an FPGA-Based Sequential DT-CNN."
[8] Eric Y. Chou, Bing J. Sheu, Topzy H. Wu, Robert C. Chang, "VLSI Design of Densely-Connected Array Processors," Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors (ICCD '95).
[9] Lai, K. and P. Leong, "Implementation of Time-Multiplexed CNN Building Block Cell," Proc. MicroNeuro '96, pp. 80-85.
[10] Sadeghi-Emamchaie, S., G. A. Jullien, et al., "Digital arithmetic using analog arrays," Proceedings of the 8th Great Lakes Symposium on VLSI, pp. 202-205, 1998.
[11] Espejo, S., A. Rodriguez-Vazquez, et al., "Smart-pixel cellular neural networks in analog current-mode CMOS technology," IEEE Journal of Solid-State Circuits, vol. 29, no. 8, pp. 895-905, 1994.

APPENDIX

// 1x3 convolution module
module conv2 (conv, VA1, VA2, VA3, Y1, Y2, Y3);

  output [17:0] conv;      // 18-bit signed result

  input [15:0] VA1;        // template coefficients (Q3.12)
  input [15:0] VA2;
  input [15:0] VA3;

  input [15:0] Y1;         // neighborhood data (Q3.12)
  input [15:0] Y2;
  input [15:0] Y3;

  wire signed [17:0] conv;
  wire signed [15:0] out1;
  wire signed [15:0] out2;
  wire signed [15:0] out3;

  signe_mul MUL1 (out1, VA1, Y1);
  signe_mul MUL2 (out2, VA2, Y2);
  signe_mul MUL3 (out3, VA3, Y3);

  assign conv = out1 + out2 + out3;

endmodule

// result range [-7, +7], 12-bit fixed-point fraction
module signe_mul (out, a, b);

  output [15:0] out;
  // the operands are declared signed so that the product is a two's-complement multiply
  input signed [15:0] a;
  input signed [15:0] b;

  wire signed [15:0] out;
  wire signed [31:0] mul_out;

  assign mul_out = a * b;
  // drop 12 of the 24 fractional bits of the product to return to the 12-fractional-bit format,
  // keeping the sign bit
  assign out = {mul_out[31], mul_out[26:12]};

endmodule
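A note on the fixed-point scaling in signe_mul (this arithmetic is spelled out here for clarity; only the final assignment appears in the paper): two operands with 12 fractional bits, a = A/2^12 and b = B/2^12, multiply to a 32-bit product with 24 fractional bits,

\left( \frac{A}{2^{12}} \right) \left( \frac{B}{2^{12}} \right) = \frac{A \cdot B}{2^{24}}

so discarding the twelve least significant fractional bits (selecting mul_out[26:12] together with the sign bit mul_out[31]) returns the result to the 12-fractional-bit format used throughout the design.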
