Chapter 9
Convolutional Networks

Convolutional networks (LeCun, 1989), also known as convolutional neural networks, or CNNs, are a specialized kind of neural network for processing data that has a known grid-like topology. Examples include time-series data, which can be thought of as a 1-D grid taking samples at regular time intervals, and image data, which can be thought of as a 2-D grid of pixels. Convolutional networks have been tremendously successful in practical applications. The name "convolutional neural network" indicates that the network employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation. Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.

In this chapter, we first describe what convolution is. Next, we explain the motivation behind using convolution in a neural network. We then describe an operation called pooling, which almost all convolutional networks employ. Usually, the operation used in a convolutional neural network does not correspond precisely to the definition of convolution as used in other fields, such as engineering or pure mathematics. We describe several variants on the convolution function that are widely used in practice for neural networks. We also show how convolution may be applied to many kinds of data, with different numbers of dimensions. We then discuss means of making convolution more efficient. Convolutional networks stand out as an example of neuroscientific principles influencing deep learning. We discuss these neuroscientific principles, then conclude with comments about the role convolutional networks have played in the history of deep learning. One topic this chapter does not address is how to choose the architecture of your convolutional network. The goal of this chapter is to describe the kinds of tools that convolutional networks provide, while chapter 11 describes general guidelines for choosing which tools to use in which circumstances. Research into convolutional network architectures proceeds so rapidly that a new best architecture for a given benchmark is announced every few weeks to months, rendering it impractical to describe the best architecture in print. Nonetheless, the best architectures have consistently been composed of the building blocks described here.

9.1 The Convolution Operation

In its most general form, convolution is an operation on two functions of a real-valued argument. To motivate the definition of convolution, we start with examples of two functions we might use.

Suppose we are tracking the location of a spaceship with a laser sensor. Our laser sensor provides a single output x(t), the position of the spaceship at time t. Both x and t are real valued; that is, we can get a different reading from the laser sensor at any instant in time.

Now suppose that our laser sensor is somewhat noisy. To obtain a less noisy estimate of the spaceship's position, we would like to average several measurements. Of course, more recent measurements are more relevant, so we will want this to be a weighted average that gives more weight to recent measurements. We can do this with a weighting function w(a), where a is the age of a measurement. If we apply such a weighted average operation at every moment, we obtain a new function s providing a smoothed estimate of the position of the spaceship:

    s(t) = \int x(a) w(t - a) \, da.    (9.1)

This operation is called convolution.
The convolution operation is typically denoted with an asterisk:

    s(t) = (x * w)(t).    (9.2)

In our example, w needs to be a valid probability density function, or the output will not be a weighted average. Also, w needs to be 0 for all negative arguments, or it will look into the future, which is presumably beyond our capabilities. These limitations are particular to our example, though. In general, convolution is defined for any functions for which the above integral is defined and may be used for other purposes besides taking weighted averages.

In convolutional network terminology, the first argument (in this example, the function x) to the convolution is often referred to as the input, and the second argument (in this example, the function w) as the kernel. The output is sometimes referred to as the feature map.

In practice, when we work with data on a computer, time is discretized and the sensor provides data at regular intervals, so the time index t takes on only integer values. If we assume that x and w are defined only on integer t, we can define the discrete convolution:

    s(t) = (x * w)(t) = \sum_{a = -\infty}^{\infty} x(a) w(t - a).    (9.3)
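To make the smoothing example concrete, here is a minimal NumPy sketch (not part of the text; the trajectory, noise level, window length, and decay rate of the weighting function are invented for illustration) that applies the discrete convolution of equation 9.3 to noisy position readings using a causal, normalized weighting function w:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy laser readings x(t): a smooth trajectory plus Gaussian noise.
t = np.arange(200)
true_position = 0.02 * t + np.sin(0.05 * t)
x = true_position + rng.normal(scale=1.0, size=t.size)

# Weighting function w(a): larger weight for recent measurements (small age a),
# zero for negative ages so the estimate never looks into the future.
ages = np.arange(20)
w = np.exp(-0.3 * ages)
w /= w.sum()                  # normalize so each output is a weighted average

# Discrete convolution s(t) = sum_a x(a) w(t - a); mode="valid" keeps only the
# positions where the full window of past readings is available.
s = np.convolve(x, w, mode="valid")

print("raw error std:     ", np.std(x - true_position).round(3))
print("smoothed error std:", np.std(s - true_position[len(w) - 1:]).round(3))

Because w is normalized and zero for negative ages, each output is a weighted average of the current and recent readings only, and the smoothed estimate is visibly less noisy than the raw signal.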
In machine learning applications, the input is usually a multidimensional array of data, and the kernel is usually a multidimensional array of parameters that are adapted by the learning algorithm. We will refer to these multidimensional arrays as tensors. Because each element of the input and kernel must be explicitly stored separately, we usually assume that these functions are zero everywhere but in the finite set of points for which we store the values. This means that in practice, we can implement the infinite summation as a summation over a finite number of array elements.

Finally, we often use convolutions over more than one axis at a time. For example, if we use a two-dimensional image I as our input, we probably also want to use a two-dimensional kernel K:

    S(i, j) = (I * K)(i, j) = \sum_m \sum_n I(m, n) K(i - m, j - n).    (9.4)

Convolution is commutative, meaning we can equivalently write

    S(i, j) = (K * I)(i, j) = \sum_m \sum_n I(i - m, j - n) K(m, n).    (9.5)

Usually the latter formula is more straightforward to implement in a machine learning library, because there is less variation in the range of valid values of m and n.

The commutative property of convolution arises because we have flipped the kernel relative to the input, in the sense that as m increases, the index into the input increases, but the index into the kernel decreases. The only reason to flip the kernel is to obtain the commutative property. While the commutative property is useful for writing proofs, it is not usually an important property of a neural network implementation. Instead, many neural network libraries implement a related function called the cross-correlation, which is the same as convolution but without flipping the kernel:

    S(i, j) = (K * I)(i, j) = \sum_m \sum_n I(i + m, j + n) K(m, n).    (9.6)

Many machine learning libraries implement cross-correlation but call it convolution. In this text we follow this convention of calling both operations convolution and specify whether we mean to flip the kernel or not in contexts where kernel flipping is relevant. In the context of machine learning, the learning algorithm will learn the appropriate values of the kernel in the appropriate place, so an algorithm based on convolution with kernel flipping will learn a kernel that is flipped relative to the kernel learned by an algorithm without the flipping. It is also rare for convolution to be used alone in machine learning; instead convolution is used simultaneously with other functions, and the combination of these functions does not commute regardless of whether the convolution operation flips its kernel or not. See figure 9.1 for an example of convolution (without kernel flipping) applied to a 2-D tensor.

Discrete convolution can be viewed as multiplication by a matrix, but the matrix has several entries constrained to be equal to other entries. For example, for univariate discrete convolution, each row of the matrix is constrained to be equal to the row above shifted by one element. This is known as a Toeplitz matrix. In two dimensions, a doubly block circulant matrix corresponds to convolution. In addition to these constraints that several elements be equal to each other, convolution usually corresponds to a very sparse matrix (a matrix whose entries are mostly equal to zero). This is because the kernel is usually much smaller than the input image.
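The relationship between these two operations is easy to check numerically. The following sketch (plain NumPy, written for this note rather than taken from any library) implements "valid" cross-correlation and "valid" convolution with explicit kernel flipping, and confirms that the two agree once the kernel is flipped along both axes:

import numpy as np

def cross_correlate2d(I, K):
    """'Valid' cross-correlation (eq. 9.6): S[i, j] = sum_{m,n} I[i+m, j+n] * K[m, n]."""
    kh, kw = K.shape
    H, W = I.shape[0] - kh + 1, I.shape[1] - kw + 1
    S = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            S[i, j] = np.sum(I[i:i + kh, j:j + kw] * K)
    return S

def convolve2d(I, K):
    """'Valid' convolution with kernel flipping (eq. 9.5 shifted to valid output positions)."""
    kh, kw = K.shape
    H, W = I.shape[0] - kh + 1, I.shape[1] - kw + 1
    S = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            for m in range(kh):
                for n in range(kw):
                    # As the kernel index m increases, the input index decreases: kernel flipping.
                    S[i, j] += I[i + kh - 1 - m, j + kw - 1 - n] * K[m, n]
    return S

rng = np.random.default_rng(0)
I = rng.normal(size=(5, 6))
K = rng.normal(size=(3, 3))

# Convolution equals cross-correlation with the kernel flipped along both axes.
assert np.allclose(convolve2d(I, K), cross_correlate2d(I, K[::-1, ::-1]))
print(convolve2d(I, K).shape)   # (3, 4): only positions where the kernel fits entirely

Reversing the kernel with K[::-1, ::-1] is all that separates the two operations, which is why a library can implement either one and still call it convolution, as the text notes.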
Any neural network algorithm that works with matrix multiplication and does not depend on specific properties of the matrix structure should work with convolution, without requiring any further changes to the neural network. Typical convolutional neural networks do make use of further specializations in order to deal with large inputs efficiently, but these are not strictly necessary from a theoretical perspective.

Figure 9.1: An example of 2-D convolution without kernel flipping. We restrict the output to only positions where the kernel lies entirely within the image, called "valid" convolution in some contexts. We draw boxes with arrows to indicate how the upper-left element of the output tensor is formed by applying the kernel to the corresponding upper-left region of the input tensor.

9.2 Motivation

Convolution leverages three important ideas that can help improve a machine learning system: sparse interactions, parameter sharing and equivariant representations. Moreover, convolution provides a means for working with inputs of variable size. We now describe each of these ideas in turn.

Traditional neural network layers use matrix multiplication by a matrix of parameters with a separate parameter describing the interaction between each input unit and each output unit. This means that every output unit interacts with every input unit. Convolutional networks, however, typically have sparse interactions (also referred to as sparse connectivity or sparse weights). This is accomplished by making the kernel smaller than the input. For example, when processing an image, the input image might have thousands or millions of pixels, but we can detect small, meaningful features such as edges with kernels that occupy only tens or hundreds of pixels. This means that we need to store fewer parameters, which both reduces the memory requirements of the model and improves its statistical efficiency. It also means that computing the output requires fewer operations. These improvements in efficiency are usually quite large. If there are m inputs and n outputs, then matrix multiplication requires m × n parameters, and the algorithms used in practice have O(m × n) runtime (per example). If we limit the number of connections each output may have to k, then the sparsely connected approach requires only k × n parameters and O(k × n) runtime. For many practical applications, it is possible to obtain good performance on the machine learning task while keeping k several orders of magnitude smaller than m. For graphical demonstrations of sparse connectivity, see figure 9.2 and figure 9.3. In a deep convolutional network, units in the deeper layers may indirectly interact with a larger portion of the input, as shown in figure 9.4. This allows the network to efficiently describe complicated interactions between many variables by constructing such interactions from simple building blocks that each describe only sparse interactions.
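A small sketch with illustrative sizes (the numbers below are invented, not from the text) makes the comparison concrete: the dense layer stores m × n parameters while the sparsely connected layer stores only k × n, and perturbing a single input of a width-3 convolution changes only three outputs:

import numpy as np

m, n, k = 1_000_000, 1_000_000, 9    # illustrative sizes: inputs, outputs, kernel width

dense_params = m * n                  # one parameter per input-output pair
sparse_params = k * n                 # each output connects to only k inputs
print(f"dense: {dense_params:.1e} parameters, sparse: {sparse_params:.1e} parameters")

# Sparse interactions in action: change one input of a width-3 convolution
# and observe that at most 3 outputs change.
x = np.zeros(20)
w = np.array([1.0, 2.0, 3.0])
x_perturbed = x.copy()
x_perturbed[10] += 1.0
diff = np.convolve(x_perturbed, w, mode="valid") - np.convolve(x, w, mode="valid")
print(np.count_nonzero(diff))         # 3: only three outputs are affected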
Figure 9.2: Sparse connectivity, viewed from below. We highlight one input unit, x3, and the output units in s that are affected by this unit. (Top) When s is formed by convolution with a kernel of width 3, only three outputs are affected by x3. (Bottom) When s is formed by matrix multiplication, connectivity is no longer sparse, so all the outputs are affected by x3.

Figure 9.3: Sparse connectivity, viewed from above. We highlight one output unit, s3, and the input units in x that affect this unit. These units are known as the receptive field of s3. (Top) When s is formed by convolution with a kernel of width 3, only three inputs affect s3. (Bottom) When s is formed by matrix multiplication, connectivity is no longer sparse, so all the inputs affect s3.

Figure 9.4: The receptive field of the units in the deeper layers of a convolutional network is larger than the receptive field of the units in the shallow layers. This effect increases if the network includes architectural features like strided convolution (figure 9.12) or pooling (section 9.3). This means that even though direct connections in a convolutional net are very sparse, units in the deeper layers can be indirectly connected to all or most of the input image.

Parameter sharing refers to using the same parameter for more than one function in a model. In a traditional neural net, each element of the weight matrix is used exactly once when computing the output of a layer. It is multiplied by one element of the input and then never revisited. As a synonym for parameter sharing, one can say that a network has tied weights, because the value of the weight applied to one input is tied to the value of a weight applied elsewhere. In a convolutional neural net, each member of the kernel is used at every position of the input (except perhaps some of the boundary pixels, depending on the design decisions regarding the boundary). The parameter sharing used by the convolution operation means that rather than learning a separate set of parameters for every location, we learn only one set. This does not affect the runtime of forward propagation (it is still O(k × n)), but it does further reduce the storage requirements of the model to k parameters. Recall that k is usually several orders of magnitude smaller than m. Since m and n are usually roughly the same size, k is practically insignificant compared to m × n. Convolution is thus dramatically more efficient than dense matrix multiplication in terms of the memory requirements and statistical efficiency. For a graphical depiction of how parameter sharing works, see figure 9.5.

Figure 9.5: Parameter sharing. Black arrows indicate the connections that use a particular parameter in two different models. (Top) The black arrows indicate uses of the central element of a 3-element kernel in a convolutional model. Because of parameter sharing, this single parameter is used at all input locations. (Bottom) The single black arrow indicates the use of the central element of the weight matrix in a fully connected model. This model has no parameter sharing, so the parameter is used only once.
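The tied-weight view can be checked directly. The sketch below (plain NumPy, for illustration only) builds the sparse banded matrix whose rows all reuse the same three kernel values, so the layer effectively stores k = 3 parameters however long the input is, and verifies that multiplying by this matrix reproduces the "valid" convolution:

import numpy as np

def conv_as_matrix(kernel, input_len):
    """Build the sparse, banded matrix whose rows reuse the same kernel entries
    (the tied weights of figure 9.5 and the Toeplitz structure of section 9.1)."""
    k = kernel.size
    out_len = input_len - k + 1
    W = np.zeros((out_len, input_len))
    for i in range(out_len):
        # Each row is the previous row shifted by one element; all rows share
        # the same k parameters.
        W[i, i:i + k] = kernel[::-1]    # reversed so the product matches np.convolve
    return W

kernel = np.array([1.0, -2.0, 0.5])
x = np.random.default_rng(0).normal(size=12)

W = conv_as_matrix(kernel, x.size)
assert np.allclose(W @ x, np.convolve(x, kernel, mode="valid"))
print(f"{W.size} matrix entries, but only {kernel.size} shared parameters")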
As an example of both of these first two principles in action, figure 9.6 shows how sparse connectivity and parameter sharing can dramatically improve the efficiency of a linear function for detecting edges in an image.

Figure 9.6: Efficiency of edge detection. The image on the right was formed by taking each pixel in the original image and subtracting the value of its neighboring pixel on the left. This shows the strength of all the vertically oriented edges in the input image, which can be a useful operation for object detection. Both images are 280 pixels tall. The input image is 320 pixels wide, while the output image is 319 pixels wide. This transformation can be described by a convolution kernel containing two elements, and requires 319 × 280 × 3 = 267,960 floating-point operations (two multiplications and one addition per output pixel) to compute using convolution. To describe the same transformation with a matrix multiplication would take 320 × 280 × 319 × 280, or over eight billion, entries in the matrix, making convolution four billion times more efficient for representing this transformation. The straightforward matrix multiplication algorithm performs over sixteen billion floating-point operations, making convolution roughly 60,000 times more efficient computationally. Of course, most of the entries of the matrix would be zero. If we stored only the nonzero entries of the matrix, then both matrix multiplication and convolution would require the same number of floating-point operations to compute. The matrix would still need to contain 2 × 319 × 280 = 178,640 entries. Convolution is an extremely efficient way of describing transformations that apply the same linear transformation of a small local region across the entire input. Photo credit: Paula Goodfellow.

In the case of convolution, the particular form of parameter sharing causes the layer to have a property called equivariance to translation. To say a function is equivariant means that if the input changes, the output changes in the same way. Specifically, a function f(x) is equivariant to a function g if f(g(x)) = g(f(x)). In the case of convolution, if we let g be any function that translates the input, that is, shifts it, then the convolution function is equivariant to g. For example, let I be a function giving image brightness at integer coordinates. Let g be a function mapping one image function to another image function, such that I' = g(I) is the image function with I'(x, y) = I(x - 1, y). This shifts every pixel of I one unit to the right. If we apply this transformation to I, then apply convolution, the result will be the same as if we applied convolution to I, then applied the transformation g to the output.
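Both claims are easy to verify numerically. The sketch below uses a synthetic random image in place of the photograph, applies the horizontal difference that figure 9.6 describes (each pixel minus its left neighbor), and then checks translation equivariance by shifting the image one pixel to the right:

import numpy as np

def vertical_edges(I):
    """Difference of horizontally adjacent pixels: the two-element kernel of
    figure 9.6 applied at every 'valid' position."""
    return I[:, 1:] - I[:, :-1]

rng = np.random.default_rng(0)
I = rng.normal(size=(280, 320))      # synthetic stand-in for the 280 x 320 photo

E = vertical_edges(I)
print(I.shape, "->", E.shape)        # (280, 320) -> (280, 319)

# Storage comparison from the caption: the same linear map written as a dense
# matrix would need (319 * 280) x (320 * 280), roughly eight billion, entries.
print("output pixels:", E.size, "  dense matrix entries:", E.size * I.size)

# Equivariance to translation: shift the image one pixel to the right, then take
# edges; away from the wrapped-around boundary column this equals taking edges
# first and then shifting.
I_shifted = np.roll(I, 1, axis=1)
assert np.allclose(vertical_edges(I_shifted)[:, 1:], E[:, :-1])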