A PROGRAMMABLE VLSI ARCHITECTURE BASED ON MULTILAYER CNN PARADIGMS FOR REAL-TIME VISUAL PROCESSING†

International Journal of Circuit Theory and Applications, 24, 357-367 (1996)

LUIGI RAFFO, SILVIO P. SABATINI AND GIACOMO M. BISIO

Department of Biophysical and Electronic Engineering, University of Genova, Via all'Opera Pia 11A, I-16145 Genova, Italy
SUMMARY
A new digital VLSI architecture is presented for the implementation of discrete-time multilayer CNNs. At the functional level, the architecture is organized as 12 layers of 64 × 64 cells which interact as specified by a set of 3D generalized templates. At the structural level, the application of cloning templates occurs in a set of processing units programmed by instruction masks, generated on the basis of the algorithm to be emulated. It is demonstrated that this architecture is applicable to multilayer algorithms for visual processing and also to standard CNNs, including those that use sequences of templates or that work in parallel. Simulations evidence the high efficiency of this implementation.
1. INTRODUCTION
Analogue CNNs have proved to be very effective in various image-processing tasks that can be related to local interactions among processing units arranged in a two-dimensional grid.1,2 However, for machine vision processing and other related intelligent assignments it is necessary to combine several different elementary tasks defined by a series of templates to be used in sequence or concurrently.3 Hence the need for programmable architectures emerges strongly.
Even though CNNs are especially tailored for analogue processing, since they can be directly mapped onto a grid of simple analogue processors, and a number of papers have been devoted to the study of specific analogue building blocks for CNN implementation and to programmable solutions based on operational amplifiers, several practical constraints on the efficacy of analogue CNN implementations are posed by area occupation, input-output interfacing, status memorization and flexibility. Moreover, these solutions offer only limited programmability of system parameters, thus preventing a global reconfiguration of the structure of the elaboration (i.e. the network structure for CNNs). Hence architectural solutions able to map several CNN paradigms onto the same computational substrate are strongly desirable to fully exploit the potentialities of CNNs in real applications.
In this paper, starting from a generalized reformulation of cell dynamics for multilayer CNNs, we present a reconfigurable digital VLSI architecture able to fulfil both these demands for programmability and the requirements of higher efficiency with respect to commercial DSPs or hardware accelerators. The architecture will be motivated in relation both to standard CNN templates and to a specific algorithm4 based on a multilayer cortical-like computational model of preattentive visual processing.
† Part of this research has been reported in the Proceedings of the 1994 IEEE International Workshop on Cellular Neural Networks and Their Applications held in Rome.
2. MULTILAYER CNN PARADIGMS

The dynamics of a delay-type discrete-time CNN can be written as

$$x_{ij}(n+1) = \sum_{(k,l)\in N_r(i,j)} \left[ A(i,j;k,l)\, y_{kl}(n) + A'(i,j;k,l)\, y_{kl}(n-\tau) + B(i,j;k,l)\, u_{kl}(n) + B'(i,j;k,l)\, u_{kl}(n-\tau) \right] + I \qquad (1)$$

$$y_{ij}(n) = f(x_{ij}(n)), \qquad x_{ij}(0) = x_0$$
where x, y, u and I denote cell state, output, input and bias respectively; N_r is the r-neighbourhood of the cell (i, j); A, A', B, B' are the cloning templates; τ is the memory duration time; and f is the non-linear output function. It is noteworthy that if the input u is constant during the iteration, the delayed B-template (B') is null.
N-dimensional generalizations (multilayer CNNs) of discrete-time CNNs can also be formulated on the basis of the multilayer generalization introduced by Chua and Yang.5 In multilayer CNNs each cell is characterized by several variables instead of only one state variable as in the single-layer case. We can observe that, whatever its level of complexity, a discrete-time multilayer CNN can be computationally described by an ensemble of nodes interacting locally to reach the prescribed computation.
The set of L nodes associated with each cell can be viewed as the components of a column vector v_ij representing the whole set of inputs, outputs and delayed outputs related to the corresponding location (i, j) on the cell layer. Each set of nodes interacts only with neighbouring sets through a generalized 3D template D that spans the L layers:

$$v^{\alpha}_{ij}(n+1) = g^{\alpha}\!\left(\sum_{\beta=1}^{L}\ \sum_{(k,l)\in N_r(i,j)} D^{\alpha\beta}(i,j;k,l)\, v^{\beta}_{kl}(n)\right) \qquad (2)$$

where α and β index the components of the column vector (i.e. the layer), (i, j) indexes the elements of a layer and g^α is the output function associated with layer α.
For each component of a vector v_ij one can recognize in the corresponding section of the 3D template the conventional control operators B and B' and the feedback operators A and A'. For example, the CNN described in equation (1) with null delayed templates can be implemented assuming L = 3: v^1 = I, v^2 = u, v^3 = y; D^{11} = D^{22} = D^{31} = 1 for (i, j) = (k, l) and null otherwise; D^{32} = B, D^{33} = A; D^{12} = D^{13} = D^{21} = D^{23} = 0; g^1(x) = x, g^2(x) = x, g^3(·) = f(·). If delay templates are present, more components of v should be considered, one for each previous output (and/or input) present in the algorithm. For example, for τ = 1: v^1 = I, v^2 = u, v^3 = y, v^4 = y(t − 1); D^{11} = D^{22} = D^{31} = D^{43} = 1 for (i, j) = (k, l) and null otherwise; D^{32} = B, D^{33} = A, D^{34} = A'; D^{12} = D^{13} = D^{21} = D^{23} = D^{14} = D^{24} = D^{41} = D^{42} = D^{44} = 0; g^1(x) = g^2(x) = x, g^3(·) = f(·), g^4(x) = x.
It is worth noting that in this way D specifies not only the strength of connections among the cells of the CNN but also the interconnection structure of the CNN itself, thus allowing us to achieve the higher degree of programmability required. This is the form of computation to which we shall refer for devising architectural solutions.
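The following sketch (ours; the data layout is hypothetical) makes the generalized update of equation (2) explicit: the L node values of each cell are stored as a field v[α, i, j] and driven by a 3D template D of 2D masks, so that a network is programmed purely by filling D.

```python
import numpy as np

def multilayer_step(v, D, g):
    """One synchronous update of all L layers (equation (2)).

    v : array (L, N, N)     node values of every layer at every location
    D : array (L, L, 3, 3)  generalized 3D template: D[a, b] is the 2D
                            mask with which layer b drives layer a
    g : list of L per-layer output functions g^alpha
    """
    L = v.shape[0]
    v_new = np.empty_like(v)
    for a in range(L):
        acc = np.zeros_like(v[0])
        for b in range(L):
            mask = D[a, b]
            if not mask.any():        # sparse template: skip null sections
                continue
            for di in (-1, 0, 1):     # r = 1 neighbourhood
                for dj in (-1, 0, 1):
                    w = mask[1 + di, 1 + dj]
                    if w:
                        acc += w * np.roll(np.roll(v[b], di, axis=0),
                                           dj, axis=1)
        v_new[a] = g[a](acc)
    return v_new
```

With the L = 3 mapping above (0-based indices: v[0] = I, v[1] = u, v[2] = y), setting D[2, 2] = A, D[2, 1] = B and centre-only identities for D[0, 0], D[1, 1] and D[2, 0] reproduces equation (1); note also how skipping all-null masks anticipates the projection of the sparse 3D template onto a reduced set of 2D masks exploited in Section 3.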
3. ARCHITECTURAL SPECIFICATION
3.1. Derivation

The architectural specification has been derived through a high-level transformation of the original specification: unrolling of the innermost loop of the elaboration, to extract the implicit parallelism contained in the original specification, and subsequent loop folding,6 to exploit this parallelism by means of pipelining. This leads to an architectural specification based on a limited set (one per layer) of processing units able to evaluate the new state of a cell through a few iterations without reloading already processed input data. Two main blocks characterize the architecture: the storage block, in which the vectors v are stored, and the processing block, which updates each vector according to the cloning templates D. In this respect each template can be viewed as a 3D array. Since many elements of a cloning template are null, in order to make the implementation more efficient, the 3D template can be projected onto a reduced number of 2D masks (see Figure 1).
3.2. Implementation
Limits on VLSI technology, power consumption and speed of computation pose some constraints on the number of layers and cells, on the dimension of the instruction masks and on the number of bits used to represent weights. The trade-off between performance and available resources depends on the target application domain, specified later in Section 4.2. On this basis we consider 12 layers of 64 × 64 cells, interactions among first neighbours only (i.e. a 3 × 3 × 12 cloning template) and weight magnitudes specified with 3 bits as powers of 2 (the successive non-linear block takes care of scaling). With this choice a compact memorization of weights is achieved and weight multiplications occur through arithmetic shifts.
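Since weight magnitudes are powers of two, each multiplication reduces to an arithmetic shift plus, for negative weights, the invert-and-increment negation performed in the data processors (Figure 4, described below). A minimal behavioural model of this operation follows; the shift direction is our assumption, since the hardware only requires consistency with the rescaling applied in the non-linear block.

```python
def apply_weight(datum8, negative, shift):
    """Multiply an 8 bit two's-complement datum by a weight of
    magnitude 2**shift (3 bit exponent), sign applied by bitwise
    inversion plus an increment, as in the data processor.
    Left shift is our assumption about the shift direction."""
    v = datum8 - 256 if datum8 & 0x80 else datum8   # decode 8 bit datum
    if negative:
        v = ~v + 1                                  # invert, then add one
    return v << shift                               # arithmetic shift

# A weight of -4 applied to the datum 0x05 (= 5): expect -20.
assert apply_weight(0x05, negative=True, shift=2) == -20
```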
The architectural schema of our system is illustrated in Figure 2.
The storage block is based on a single-port RAM. The current/previous outputs of cellular neural network elements are stored in 64 × 64 locations of 96 bits, functionally subdivided into 12 groups of 8 bits to implement 12 layers (L1, ..., L12).
The processing block is composed of 12 processing units (one for each layer), 12 pairs of 16 bit row buffers and 12 instruction mask sets that play the role of cloning templates. The behaviour of each processing unit is controlled by its set of masks, whose elements determine the weight sign and magnitude and the number that identifies the layer in which to read the output. Specifically, each element of a mask is composed of two fields: the first specifies the weight (null flag, sign and magnitude); the second addresses the layer in which the mask has to act.

Figure 1. A pictorial view of the generalized 3D CNN. On the left side, all the nodes contributing to the output of the marked cell in layer L3 are evidenced; on the right side the positive weights of the corresponding 3D cloning template are represented, while below the related 2D projection masks are shown

Figure 3. The first and last processing units are depicted. From the bus they receive the same datum referred to a complete column of cells. The data processor extracts and manipulates the data according to the actual mask element (see text). When a datum becomes available at the end of the adder cascade, it is stored in the buffer and then moved to the RAM
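A possible software model of one mask element is sketched below; the paper fixes the 3 bit magnitude field and the 12 selectable layers, while the exact bit packing shown here is our assumption.

```python
from dataclasses import dataclass

@dataclass
class MaskElement:
    """One element of an instruction mask."""
    null:  bool   # True -> this element gives no contribution
    sign:  bool   # True -> negative weight
    mag:   int    # 3 bit weight magnitude, as a power of two
    layer: int    # which of the 12 layers supplies the datum (4 bits)

def pack(e: MaskElement) -> int:
    """Pack an element into a 9 bit word [null | sign | mag(3) | layer(4)];
    the field ordering and overall width are our assumption."""
    return (e.null << 8) | (e.sign << 7) | ((e.mag & 0x7) << 4) | (e.layer & 0xF)
```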
At each iteration the actual value of the vector v is moved from the RAM (scanned row by row) towards all the processing units (see Figure 3). In each processing unit: (i) each data processor (see Figure 4) extracts a portion of the datum according to the content of the element of its masks, then shifts and complements the result if requested; (ii) a cascade of adders adds it to the partial sum coming from the preceding rows, stored in the buffer.
At each iteration three rows of data need to be available (the preceding, the actual and the next). The data belonging to these rows are sent to the processing unit, row by row, element by element. When all the data of a row have been transferred to the processing unit, the convolution between the row considered and the first row of the mask is available in the buffer. The content of the buffer is the starting value for the convolution of the second row of the mask with the actual row, and so on for all the masks. When a row is completely processed, it cannot be moved directly to the RAM, because its values are still needed for the next row. Hence a second buffer is needed to store it while the next row is processed. When the processing of the next row is completed, the content of the second buffer is moved to the RAM through a non-linear block implemented by a clipping function with a programmable slope (Figure 5).
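A behavioural model of this non-linear block might look as follows; the ±1 saturation levels are our reading of Figure 5, and a power-of-two slope would again reduce the multiplication to an arithmetic shift in hardware.

```python
import numpy as np

def clip_nonlinearity(x, slope):
    """Clipping function with programmable slope (Figure 5): linear
    with gain `slope` around zero, saturating beyond that. The +/-1
    saturation levels are assumed from the figure."""
    return np.clip(slope * x, -1.0, 1.0)
```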
Figure 4. A data processor is depicted. The datum from the 96 bit bus is subdivided into 12 blocks connected to an 8 bit bus through three-state buffers. The 8 bits are inverted or buffered (according to the sign of the weight) and arithmetically shifted. To complete the two's complementation, the resulting value is incremented by one if the weight is negative
Figure 5. The clipping function with programmable slope applied by the non-linear block (input axis from -4 to 4)
This schema limits the number of transfers to the RAM, storing only the values useful for the next iteration and avoiding the physical duplication of the storage block.
Thanks to the horizontal pipeline schema and the parallelization of fetches from memory with mask computation, the number of clock periods needed for an iteration update is nine, since a convolution mask needs three buffer updates, each lasting three clock periods (for data processing and sum).
4. APPLICATIONS
Figure 6. Possible utilizations of the architecture for CNN algorithms: (a) different CNNs performing different computations on the same input; (b) several CNNs working in parallel on different inputs; (c) a delay-type CNN
4.1. Standard CNN algorithms

Our architecture is able to implement both several CNNs working in parallel and delay-type CNNs, as sketched in Figure 6 and detailed in the following two examples. A delay-template CNN can be implemented using a layer for each previous state of the elaboration we are interested in; with 12 layers available, this architecture can implement a CNN with τ ≤ 10.
Edge detection. Many cloning templates for edge detection have been presented.7 Figure 7 shows the result of the implementation of a 3 × 3 cloning template A with circular symmetry (2 in the middle, −0.25 for the neighbours). This operator is mapped on the architecture according to the example of Section 2.
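For illustration, this template can be built and iterated with the dtcnn_step sketch of Section 2; the zero control template, zero bias, random initial state and iteration count are our assumptions, not values taken from Reference 7.

```python
import numpy as np

# Feedback template with circular symmetry: 2 in the centre,
# -0.25 for each of the eight neighbours.
A_edge = np.full((3, 3), -0.25)
A_edge[1, 1] = 2.0
B_zero = np.zeros((3, 3))

# Hypothetical run, reusing dtcnn_step from the Section 2 sketch; the
# random binary initial state stands in for a test image in {-1, 1}.
x = np.random.choice([-1.0, 1.0], size=(64, 64))
u = np.zeros_like(x)
for _ in range(20):                     # iterate until the output settles
    x = dtcnn_step(x, u, A_edge, B_zero, I=0.0)
edges = np.clip(x, -1, 1)               # resulting edge map
```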
Connected component detection. We present in Figure 8 the results of the simulation of the CNN
proposed in Reference 8 for connected component detection (see caption).
4.2. A cortical-like algorithm for preattentive visual processing

Problem description. Many machine vision tasks are based on the recurrent application of simple and uniform operators on a large set of data representing the image. These applications usually require real-time performances that cannot be achieved by software implementations. In particular, solving visual tasks requires one (i) to extract elementary information from the image data (e.g. contrast, contrast differences, etc.) and (ii) to merge such information into a global unifying percept. Both operations resort to point and local interactions within restricted portions of the image. For an efficient hardware design it is important to have a structure based on simple modules locally connected so as to limit communication overhead. To this end, by studying biological solutions for vision processes, and especially those evolved in the visual cortex,9,10 one can derive the following set of computational paradigms.4
1. Local feature extraction. Each cell analyses the input image by performing a weighted sum over the
portion of the image around the current pixel.
2. Topology preservation. Adjacent locations on the visual cortex (i.e. the output port) correspond to
adjacent locations in the image, thus preserving the topographic organization of the image.
3. 3D mapping of local information. The 3D structure representing the cortex is composed of layers, organized hierarchically. Each cell in a layer gains its properties both through feedforward connections from cells in the previous layer and through horizontal and vertical, locally confined recurrent paths. These computations, together with topology preservation, ensure a direct correspondence between the morphology of connections and the detection of spatial relations among featural elements.

Figure 7. (a) Test image. (b) Output of the edge-detection CNN using the template of Reference 7

Figure 8. (a) Test image. (b) Output of a connected component detection CNN with the template of Reference 8. (c) Same as (b) using a delay-type template A' with τ = 3. A' is mapped on the architecture by considering three additional layers in which the outputs of the previous ones are copied at each step, realizing a memory of the last four output values
Algorithm specification. The fundamental module of the model is a 'column', i.e. an ensemble of orientation-selective cells present in the simple, complex and hypercomplex layers at the same location (see Figure 9(a)). Each layer can be described as being composed of a number of (e.g. four) sublayers, each of which can be described as a 2D regular grid of cells selective to the same oriented featural element. The simple layer is the input layer and provides computational primitives to the complex layer to extract oriented featural elements: the excitation e_s(i, j, θ) reflects the dominant featural element among those detected by a convolution with different kernels. The excitation of a neuron in the complex layer belonging to column (i, j), with orientation preference θ, is the result of four contributions: direct excitation z_s = g(e_s) from the corresponding position in the simple layer, where g(·) is a sigmoidal transfer function; feedforward inhibition from a set M(θ) of simple cells; recurrent inhibition from a set N_c(i, j, θ) of complex neurons; positive feedback z_h from the corresponding neuron in the hypercomplex layer. The excitation of neurons in the hypercomplex layer results from two contributions: the feedforward actions from a set L(i, j, θ) of neurons in the complex layer and the cross-orientation inhibition from a set N_h(i, j, θ) of neurons in the hypercomplex layer (see Figure 9(b)).
Summarizing, the algorithm can be described by the following system of equations:
$$e_s(i,j,\theta) = \max_{p=1,\dots,4}\left|\sum_{m,n} w_p(m,n,i,j,\theta)\, I(m,n)\right| \qquad (3)$$

$$z_c^{k+1}(i,j,\theta) = g\!\left(z_s(i,j,\theta) - \sum_{M(\theta)} w_{sc}\, z_s - \sum_{N_c(i,j,\theta)} w_{cc}\, z_c^{k} + w_{hc}\, z_h^{k}(i,j,\theta)\right) \qquad (4)$$

$$z_h^{k+1}(i,j,\theta) = g\!\left(\sum_{L(i,j,\theta)} w_{ch}\, z_c^{k} - \sum_{N_h(i,j,\theta)} w_{hh}\, z_h^{k}\right) \qquad (5)$$

with z_s = g(e_s),
Figure 9. (a) Artistic view of columns: the fundamental module of the neural computational model for visual processing (s = simple; c = complex; h = hypercomplex). The arrows evidence feedforward and recurrent interactions occurring among layers. (b) Feedforward, inhibitory and recurrent connection schemata among cells
where I(m, n) denotes the intensity of a pixel at point (m, n) in the image plane; w_p(m, n, i, j, θ) with p = 1, 2, 3, 4 are the kernels of different contrast selectivity that describe the receptive field profile of the neuron belonging to column (i, j); w_sc, w_cc, w_hc, w_hh and w_ch denote the weights of connection from simple to complex (feedforward), from complex to complex (intralayer), from hypercomplex to complex (feedback), from hypercomplex to hypercomplex and from complex to hypercomplex respectively; and k is the iteration index.
It is worth noting that the feedforward inhibition schema M(θ) does not depend on the position of the neuron considered in the layer: a complex neuron selective to θ is inhibited by the two simple neurons (with similar orientation preferences) belonging to the same column. N_c(i, j, θ) depends on the orientation preference θ of the target neuron; more precisely, a neuron selective to θ receives inhibitory inputs from two complex neurons (selective to θ + π/2) that belong to the two closest columns lying along an axis orthogonal to θ.
The set L(i, j, θ) depends on the orientation preference of the target neuron. More specifically, the connection schema can be defined as follows: if the target neuron is selective to θ, then the complex neurons that provide the input are selective to θ and belong to neighbouring columns that lie on an axis oriented along θ. Typical values for the number of columns involved in the interaction range from three to seven, but three is sufficient for most applications.
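To fix ideas, one iteration of equations (4) and (5) could be simulated as below; the fixed index shifts standing in for the interaction sets M(θ), N_c, N_h and L(i, j, θ), and the scalar weights, are our simplifications of the connection schemata just described.

```python
import numpy as np

def sigmoid(x):
    """Sigmoidal transfer function g(.)."""
    return 1.0 / (1.0 + np.exp(-x))

def cortical_step(z_s, z_c, z_h, w_sc, w_cc, w_hc, w_hh, w_ch):
    """One iteration of equations (4) and (5).

    z_s, z_c, z_h : arrays (4, N, N), one slice per orientation
                    (0, 45, 90, 135 degrees).
    """
    n_th = z_s.shape[0]
    z_c_new = np.empty_like(z_c)
    z_h_new = np.empty_like(z_h)
    for t in range(n_th):
        orth = (t + 2) % n_th            # orientation rotated by 90 deg
        # M(theta): same-column simple cells with the two adjacent
        # orientation preferences.
        m_inh = z_s[(t - 1) % n_th] + z_s[(t + 1) % n_th]
        # N_c: cross-oriented complex cells of the two closest columns
        # along the axis orthogonal to theta (vertical shifts stand in
        # for the true orientation-dependent geometry).
        n_c = np.roll(z_c[orth], 1, axis=0) + np.roll(z_c[orth], -1, axis=0)
        z_c_new[t] = sigmoid(z_s[t] - w_sc * m_inh
                             - w_cc * n_c + w_hc * z_h[t])
        # L(i, j, theta): iso-oriented complex cells of the target column
        # and its two neighbours along theta (horizontal shifts again a
        # stand-in for the oriented axis).
        l_ff = (z_c[t] + np.roll(z_c[t], 1, axis=1)
                + np.roll(z_c[t], -1, axis=1))
        # N_h: cross-orientation inhibition within the hypercomplex layer.
        z_h_new[t] = sigmoid(w_ch * l_ff - w_hh * z_h[orth])
    return z_c_new, z_h_new
```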
Architectural mapping. The functionality of the complex and hypercomplex layers can be mapped on the architecture presented here, while the functionality of the simple cells has to be implemented by a specific convolution block.11 This block performs convolutions with four pairs of orthogonal filters (oriented along the 0°, 45°, 90° and 135° directions), by four-pixel steps, and provides as output for each orientation the maximum of the absolute value of the convolution pairs (see equation (3)) on an array of 64 × 64 elements. In this way, with an input image of 256 × 256 pixels, the resulting convolution is an array of 64 × 64 elements for each orientation. It is noteworthy that the frequency selectivity of the masks of the convolution blocks will determine the capability of the whole network to be sensitive to particular textures.
The outputs of the convolution stage z_s are stored as excitation inputs in the four sublayers of the simple layer. The statuses of complex and hypercomplex cells are stored in the corresponding quartets of layers. The values in the complex and hypercomplex layers are updated according to the programmed rules and the values stored in all the layers. This occurs by setting the generalized template D of equation (2) according to the explicit algorithmic formulation of equations (4) and (5).
Simulation results, performance and implementation perspectives. We have tested this implementation on a natural textured image.12 The simulations presented here concern texture segregation on natural images. In Figure 10 the test image and the contents of the four hypercomplex layers of the architecture at convergence are presented. The image is subdivided into four square areas that represent the resulting images for the four types of orientation-selective cells along 0°, 45°, 90° and 135°. The luminous intensity of a pixel codes the activity of the corresponding neuron: if the pixel is light, the neuron is active; if the pixel is dark, the neuron is inhibited; if the pixel has an intermediate value, the corresponding neuron is silent (i.e. the neuron is not selective to the stimulus present in its receptive field). Taking into account the number of elements per layer (64 × 64), the number of masks per layer (four) and the number of iterations (10), and assuming a clock frequency of 50 MHz, a complete texture segregation of 256 × 256 pixel images is estimated to be obtained in about 30 ms, allowing one to process images at a commercial camera frame rate (25 images/second). The VLSI design of this architecture is being pursued using a standard cell approach with an appropriate customized memory module generator. On the basis of a similar implementation13 it is estimated that 15 mm × 15 mm of silicon in a 0.5 µm technology will be necessary.

Figure 10. (a) 256 × 256 pixel test image. (b) Outputs of the four hypercomplex layers for the four angles (0°, 45°, 90° and 135°), evidencing the presence of textural features of the corresponding orientation
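As a back-of-the-envelope check of the 30 ms estimate (our own reading of the cycle counts, not a calculation from the paper):

```python
cells      = 64 * 64    # elements per layer
masks      = 4          # instruction masks per layer
cycles     = 9          # clock periods per update (Section 3.2)
iterations = 10
f_clk      = 50e6       # Hz

t = cells * masks * cycles * iterations / f_clk
print(f"{t * 1e3:.1f} ms per frame")    # -> 29.5 ms, consistent with ~30 ms
```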
5. CONCLUSIONS
We have considered a digital VLSI architecture for the implementation of multilayer CNNs. This
architecture combines programmability with high efficiency. This has been achieved with the following
strategy: (i) an elementary recursive algorithm has been defined as the building block of every multilayer CNN by introducing 3D generalized templates that fit well a direct VLSI mapping; (ii) such sparse 3D templates are projected onto a small set of 2D templates; (iii) the recursive operations of the whole algorithm are sequenced with high efficiency using programmable dedicated architectural resources.
In comparison with general-purpose CNN implementations such as the CNN universal machine,14 the following major differences can be evidenced: (i) the issue of programmability for this reconfigurable digital architecture has been explored in a specific application context, though this architectural approach could be extended to other domains of application; (ii) a fully digital solution has been pursued.
ACKNOWLEDGEMENTS
This work was supported in part by CEC ESPRIT-BRA Project CORMORANT. The authors wish to thank
Dr. Paolo Faraboschi and Dr. Giovanni Nateri for useful suggestions.
REFERENCES
1. J. Vandewalle and T. Roska, 'Guest editorial: special issue on cellular neural networks', Int. J. Cir. Theor. Appl., 20, 449-451 (1992).
2. Proc. 1994 IEEE Int. Workshop on Cellular Neural Networks and Their Applications (CNNA-94), IEEE, New York, 1994.
3. K. Halonen, V. Porra and T. Roska, 'Programmable analogue VLSI CNN with local digital logic', Int. J. Cir. Theor. Appl., 20, 573-582 (1992).
4. G. Indiveri, L. Raffo, S. Sabatini and G. Bisio, 'A neuromorphic architecture for cortical multi-layer integration of early visual tasks', Machine Vision Appl., in press.
5. L. Chua and L. Yang, 'Cellular neural networks: theory', IEEE Trans. Circuits and Systems, CAS-35, 1257-1272 (1988).
6. G. Goossens, J. Rabaey, J. Vandewalle and H. De Man, 'Loop optimization in register transfer scheduling for DSP systems', Proc. 26th ACM/IEEE Design Automation Conf., IEEE, New York, 1989.
7. L. Chua and C. Wu, 'On the universe of stable cellular neural networks', Int. J. Cir. Theor. Appl., 20, 497-518 (1992).
8. T. Roska and L. Chua, 'Cellular neural networks with non-linear and delay-type template elements and non-uniform grids', Int. J. Cir. Theor. Appl., 20, 469-482 (1992).
9. D. Van Essen, C. Anderson and D. Felleman, 'Information processing in the primate visual system: an integrated systems perspective', Science, 255, 419-423 (1992).
10. S. Grossberg, E. Mingolla and D. Todorovic, 'A neural network architecture for preattentive vision', IEEE Trans. Biomed. Eng., BE-36, 65-83 (1989).
11. L. Raffo, S. Sabatini, G. Indiveri, G. Nateri and G. Bisio, 'A memory-based recurrent neural architecture for chips emulating cortical visual processing', IEICE Trans. Electron., E77-C (1994).
12. P. Brodatz, Textures: A Photographic Album for Artists and Designers, Dover, New York, 1966.
13. M. Valle, G. Nateri, D. Caviglia, G. Bisio and L. Briozzo, 'An ASIC design for real-time image processing in industrial applications', Proc. EDTC'95, 1995, pp. 385-390.
14. T. Roska and L. Chua, 'The CNN universal machine: an analogic array computer', IEEE Trans. Circuits and Systems II, CAS-40, 163-173 (1993).