0% found this document useful (0 votes)

834 views12 pages

Convolutional Neural Networks (LeNet) - DeepLearning 0.1 Documentation

This document provides an overview of convolutional neural networks (CNNs), including their biological inspiration from the visual cortex, key concepts like sparse connectivity, shared weights, and convolutional layers. It explains how CNNs exploit spatially local correlations in images by enforcing local connectivity between layers and parameter sharing. The document includes code to implement a basic convolutional layer in Theano that filters an input image.

Uploaded by

Sumit Patel

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

834 views12 pages

Convolutional Neural Networks (LeNet) - DeepLearning 0.1 Documentation

Uploaded by

Sumit Patel

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 12

Convolutional Neural Networks (LeNet)

Note
This section assumes the reader has already read through Classifying MNIST digits
using Logistic Regression and Multilayer Perceptron. Additionally, it uses the
following new Theano functions and concepts: T.tanh, shared variables, basic
arithmetic ops, T.grad, floatX, pool , conv2d, dimshuffle. If you intend to run the
code on GPU also read GPU.
To run this example on a GPU, you need a good GPU. It needs at least 1GB of GPU
RAM. More may be required if your monitor is connected to the GPU.
When the GPU is connected to the monitor, there is a limit of a few seconds for
each GPU function call. This is needed as current GPUs cant be used for the
monitor while doing computation. Without this limit, the screen would freeze for too
long and make it look as if the computer froze. This example hits this limit with
medium-quality GPUs. When the GPU isnt connected to a monitor, there is no time
limit. You can lower the batch size to fix the time out problem.
Note
The code for this section is available for download here and the 3wolfmoon image

Motivation
Convolutional Neural Networks (CNN) are biologically-inspired variants of MLPs.
From Hubel and Wiesels early work on the cats visual cortex [Hubel68], we know
the visual cortex contains a complex arrangement of cells. These cells are sensitive
to small sub-regions of the visual field, called a receptive field. The sub-regions are
tiled to cover the entire visual field. These cells act as local filters over the input
space and are well-suited to exploit the strong spatially local correlation present in
natural images.
Additionally, two basic cell types have been identified: Simple cells respond
maximally to specific edge-like patterns within their receptive field. Complex cells
have larger receptive fields and are locally invariant to the exact position of the
pattern.
The animal visual cortex being the most powerful visual processing system in
existence, it seems natural to emulate its behavior. Hence, many neurally-inspired
models can be found in the literature. To name a few: the NeoCognitron
[Fukushima], HMAX [Serre07] and LeNet-5 [LeCun98], which will be the focus of this
tutorial.

Sparse Connectivity
CNNs exploit spatially-local correlation by enforcing a local connectivity pattern
between neurons of adjacent layers. In other words, the inputs of hidden units in

layer m are from a subset of units in layer m-1, units that have spatially contiguous
receptive fields. We can illustrate this graphically as follows:

Imagine that layer m-1 is the input retina. In the above figure, units in layer m have
receptive fields of width 3 in the input retina and are thus only connected to 3
adjacent neurons in the retina layer. Units in layer m+1 have a similar connectivity
with the layer below. We say that their receptive field with respect to the layer below
is also 3, but their receptive field with respect to the input is larger (5). Each unit is
unresponsive to variations outside of its receptive field with respect to the retina.
The architecture thus ensures that the learnt filters produce the strongest
response to a spatially local input pattern.
However, as shown above, stacking many such layers leads to (non-linear) filters
that become increasingly global (i.e. responsive to a larger region of pixel space).
For example, the unit in hidden layer m+1 can encode a non-linear feature of width 5
(in terms of pixel space).

Shared Weights
In addition, in CNNs, each filter is replicated across the entire visual field. These
replicated units share the same parameterization (weight vector and bias) and form
a feature map.

In the above figure, we show 3 hidden units belonging to the same feature map.
Weights of the same color are sharedconstrained to be identical. Gradient descent
can still be used to learn such shared parameters, with only a small change to the
original algorithm. The gradient of a shared weight is simply the sum of the
gradients of the parameters being shared.
Replicating units in this way allows for features to be detected regardless of their
position in the visual field. Additionally, weight sharing increases learning efficiency
by greatly reducing the number of free parameters being learnt. The constraints on
the model enable CNNs to achieve better generalization on vision problems.

Details and Notation

A feature map is obtained by repeated application of a function across sub-regions
of the entire image, in other words, by convolution of the input image with a linear
filter, adding a bias term and then applying a non-linear function. If we denote the k-

th feature map at a given layer as

and bias , then the feature map

, whose filters are determined by the weights

is obtained as follows (for
non-linearities):

Note
Recall the following definition of convolution for a 1D signal.
.
This can be extended to 2D as follows:
.

To form a richer representation of the data, each hidden layer is composed of

multiple feature maps,
. The weights of a hidden layer can be
represented in a 4D tensor containing elements for every combination of destination
feature map, source feature map, source vertical position, and source horizontal
position. The biases can be represented as a vector containing one element for
every destination feature map. We illustrate this graphically as follows:

Figure 1: example of a convolutional layer

The figure shows two layers of a CNN. Layer m-1 contains four feature maps.
Hidden layer m contains two feature maps ( and ). Pixels (neuron outputs) in
and (outlined as blue and red squares) are computed from pixels of layer (m-1)
which fall within their 2x2 receptive field in the layer below (shown as colored
rectangles). Notice how the receptive field spans all four input feature maps. The
weights
and
of and are thus 3D weight tensors. The leading dimension
indexes the input feature maps, while the other two refer to the pixel coordinates.
Putting it all together,
denotes the weight connecting each pixel of the k-th
feature map at layer m, with the pixel at coordinates (i,j) of the l-th feature map of
layer (m-1).

The Convolution Operator

ConvOp is the main workhorse for implementing a convolutional layer in Theano.

ConvOp is used by theano.tensor.signal.conv2d, which takes two symbolic
inputs:
a 4D tensor corresponding to a mini-batch of input images. The shape of the
tensor is as follows: [mini-batch size, number of input feature maps, image
height, image width].
a 4D tensor corresponding to the weight matrix . The shape of the tensor is:
[number of feature maps at layer m, number of feature maps at layer m-1, filter
height, filter width]
Below is the Theano code for implementing a convolutional layer similar to the one
of Figure 1. The input consists of 3 features maps (an RGB color image) of size
120x160. We use two convolutional filters with 9x9 receptive fields.
import theano
from theano import tensor as T
from theano.tensor.nnet import conv2d
import numpy
rng = numpy.random.RandomState(23455)

# instantiate 4D tensor for input

input = T.tensor4(name='input')
# initialize shared variable for weights.
w_shp = (2, 3, 9, 9)
w_bound = numpy.sqrt(3 * 9 * 9)
W = theano.shared( numpy.asarray(
rng.uniform(
low=-1.0 / w_bound,
high=1.0 / w_bound,
size=w_shp),
dtype=input.dtype), name ='W')
# initialize shared variable for bias (1D tensor) with random values
# IMPORTANT: biases are usually initialized to zero. However in this
# particular application, we simply apply the convolutional layer to
# an image without learning the parameters. We therefore initialize
# them to random values to "simulate" learning.
b_shp = (2,)
b = theano.shared(numpy.asarray(
rng.uniform(low=-.5, high=.5, size=b_shp),
dtype=input.dtype), name ='b')
# build symbolic expression that computes the convolution of input with filters in w
conv_out = conv2d(input, W)

# build symbolic expression to add bias and apply activation function, i.e. produce neural net
# A few words on ``dimshuffle`` :
# ``dimshuffle`` is a powerful tool in reshaping a tensor;
# what it allows you to do is to shuffle dimension around
# but also to insert new ones along which the tensor will be
# broadcastable;
# dimshuffle('x', 2, 'x', 0, 1)
# This will work on 3d tensors with no broadcastable
# dimensions. The first dimension will be broadcastable,
# then we will have the third dimension of the input tensor as
# the second of the resulting tensor, etc. If the tensor has
# shape (20, 30, 40), the resulting tensor will have dimensions
# (1, 40, 1, 20, 30). (AxBxC tensor is mapped to 1xCx1xAxB tensor)
# More examples:
#
dimshuffle('x') -> make a 0d (scalar) into a 1d vector
#
dimshuffle(0, 1) -> identity
#
dimshuffle(1, 0) -> inverts the first and second dimensions
#
dimshuffle('x', 0) -> make a row out of a 1d vector (N to 1xN)
#
dimshuffle(0, 'x') -> make a column out of a 1d vector (N to Nx1)
#
dimshuffle(2, 0, 1) -> AxBxC to CxAxB
#
dimshuffle(0, 'x', 1) -> AxB to Ax1xB
#
dimshuffle(1, 'x', 0) -> AxB to Bx1xA

output = T.nnet.sigmoid(conv_out + b.dimshuffle('x', 0, 'x', 'x'))

# create theano function to compute filtered images

f = theano.function([input], output)

Lets have a little bit of fun with this...

import numpy
import pylab
from PIL import Image

# open random image of dimensions 639x516

img = Image.open(open('doc/images/3wolfmoon.jpg'))
# dimensions are (height, width, channel)
img = numpy.asarray(img, dtype='float64') / 256.
# put image in 4D tensor of shape (1, 3, height, width)
img_ = img.transpose(2, 0, 1).reshape(1, 3, 639, 516)
filtered_img = f(img_)
# plot original image and first and second components of output
pylab.subplot(1, 3, 1); pylab.axis('off'); pylab.imshow(img)
pylab.gray();
# recall that the convOp output (filtered image) is actually a "minibatch",
# of size 1 here, so we take index 0 in the first dimension:
pylab.subplot(1, 3, 2); pylab.axis('off'); pylab.imshow(filtered_img[0, 0, :, :])
pylab.subplot(1, 3, 3); pylab.axis('off'); pylab.imshow(filtered_img[0, 1, :, :])
pylab.show()

This should generate the following output.

Notice that a randomly initialized filter acts very much like an edge detector!
Note that we use the same weight initialization formula as with the MLP. Weights
are sampled randomly from a uniform distribution in the range [-1/fan-in, 1/fan-in],
where fan-in is the number of inputs to a hidden unit. For MLPs, this was the number
of units in the layer below. For CNNs however, we have to take into account the
number of input feature maps and the size of the receptive fields.

MaxPooling
Another important concept of CNNs is max-pooling, which is a form of non-linear
down-sampling. Max-pooling partitions the input image into a set of non-overlapping
rectangles and, for each such sub-region, outputs the maximum value.
Max-pooling is useful in vision for two reasons:
1. By eliminating non-maximal values, it reduces computation for upper layers.
2. It provides a form of translation invariance. Imagine cascading a maxpooling layer with a convolutional layer. There are 8 directions in which one
can translate the input image by a single pixel. If max-pooling is done over a
2x2 region, 3 out of these 8 possible configurations will produce exactly the

same output at the convolutional layer. For max-pooling over a 3x3 window,
this jumps to 5/8.
Since it provides additional robustness to position, max-pooling is a smart
way of reducing the dimensionality of intermediate representations.
Max-pooling is done in Theano by way of
theano.tensor.signal.pool.pool_2d. This function takes as input an N
dimensional tensor (where N >= 2) and a downscaling factor and performs maxpooling over the 2 trailing dimensions of the tensor.
An example is worth a thousand words:
from theano.tensor.signal import pool
input = T.dtensor4('input')
maxpool_shape = (2, 2)
pool_out = pool.pool_2d(input, maxpool_shape, ignore_border=True)
f = theano.function([input],pool_out)
invals = numpy.random.RandomState(1).rand(3, 2, 5, 5)
print 'With ignore_border set to True:'
print 'invals[0, 0, :, :] =\n', invals[0, 0, :, :]
print 'output[0, 0, :, :] =\n', f(invals)[0, 0, :, :]
pool_out = pool.pool_2d(input, maxpool_shape, ignore_border=False)
f = theano.function([input],pool_out)
print 'With ignore_border set to False:'
print 'invals[1, 0, :, :] =\n ', invals[1, 0, :, :]
print 'output[1, 0, :, :] =\n ', f(invals)[1, 0, :, :]

This should generate the following output:

With ignore_border set to True:
invals[0, 0, :, :] =
[[ 4.17022005e-01 7.20324493e-01
[ 9.23385948e-02 1.86260211e-01
[ 4.19194514e-01 6.85219500e-01
[ 6.70467510e-01 4.17304802e-01
[ 8.00744569e-01 9.68261576e-01
output[0, 0, :, :] =
[[ 0.72032449 0.39676747]
[ 0.6852195 0.87811744]]

1.14374817e-04
3.45560727e-01
2.04452250e-01
5.58689828e-01
3.13424178e-01

With ignore_border set to False:

invals[1, 0, :, :] =
[[ 0.01936696 0.67883553 0.21162812 0.26554666
[ 0.05336255 0.57411761 0.14672857 0.58930554
[ 0.10233443 0.41405599 0.69440016 0.41417927
[ 0.53589641 0.66379465 0.51488911 0.94459476
[ 0.90340192 0.1374747 0.13927635 0.80739129
output[1, 0, :, :] =
[[ 0.67883553 0.58930554 0.69975836]
[ 0.66379465 0.94459476 0.58655504]
[ 0.90340192 0.80739129 0.39767684]]

3.02332573e-01 1.46755891e-01
3.96767474e-01 5.38816734e-01
8.78117436e-01 2.73875932e-02
1.40386939e-01 1.98101489e-01
6.92322616e-01 8.76389152e-01

0.49157316]
0.69975836]
0.04995346]
0.58655504]
0.39767684]]

Note that compared to most Theano code, the max_pool_2doperation is a little

special. It requires the downscaling factor ds(tuple of length 2 containing
downscaling factors for image width and height) to be known at graph build time.
This may change in the near future.

The Full Model: LeNet

Sparse, convolutional layers and max-pooling are at the heart of the LeNet family of
models. While the exact details of the model will vary greatly, the figure below shows

a graphical depiction of a LeNet model.

The lower-layers are composed to alternating convolution and max-pooling layers.

The upper-layers however are fully-connected and correspond to a traditional MLP
(hidden layer + logistic regression). The input to the first fully-connected layer is the
set of all features maps at the layer below.
From an implementation point of view, this means lower-layers operate on 4D
tensors. These are then flattened to a 2D matrix of rasterized feature maps, to be
compatible with our previous MLP implementation.
Note
Note that the term convolution could corresponds to different mathematical
operations:
1. theano.tensor.nnet.conv2d, which is the most common one in almost all of the
recent published convolutional models. In this operation, each output feature map
is connected to each input feature map by a different 2D filter, and its value is the
sum of the individual convolution of all inputs through the corresponding filter.
2. The convolution used in the original LeNet model: In this work, each output feature
map is only connected to a subset of input feature maps.
3. The convolution used in signal processing: theano.tensor.signal.conv.conv2d,
which works only on single channel inputs.
Here, we use the first operation, so this models differ slightly from the original
LeNet paper. One reason to use 2. would be to reduce the amount of computation
needed, but modern hardware makes it as fast to have the full connection pattern.
Another reason would be to slightly reduce the number of free parameters, but we
have other regularization techniques at our disposal.

Putting it All Together

We now have all we need to implement a LeNet model in Theano. We start with the
LeNetConvPoolLayer class, which implements a {convolution + max-pooling} layer.
class LeNetConvPoolLayer(object):
"""Pool Layer of a convolutional network """
def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
"""
Allocate a LeNetConvPoolLayer with shared variable internal parameters.

:type rng: numpy.random.RandomState

:param rng: a random number generator used to initialize weights
:type input: theano.tensor.dtensor4
:param input: symbolic image tensor, of shape image_shape

:type filter_shape: tuple or list of length 4

:param filter_shape: (number of filters, num input feature maps,
filter height, filter width)
:type image_shape: tuple or list of length 4
:param image_shape: (batch size, num input feature maps,
image height, image width)
:type poolsize: tuple or list of length 2
:param poolsize: the downsampling (pooling) factor (#rows, #cols)
"""
assert image_shape[1] == filter_shape[1]
self.input = input

# there are "num input feature maps * filter height * filter width"
# inputs to each hidden unit
fan_in = numpy.prod(filter_shape[1:])
# each unit in the lower layer receives a gradient from:
# "num output feature maps * filter height * filter width" /
# pooling size
fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) //
numpy.prod(poolsize))
# initialize weights with random weights
W_bound = numpy.sqrt(6. / (fan_in + fan_out))
self.W = theano.shared(
numpy.asarray(
rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
dtype=theano.config.floatX
),
borrow=True
)
# the bias is a 1D tensor -- one bias per output feature map
b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
self.b = theano.shared(value=b_values, borrow=True)
# convolve input feature maps with filters
conv_out = conv2d(
input=input,
filters=self.W,
filter_shape=filter_shape,
input_shape=image_shape
)
# pool each feature map individually, using maxpooling
pooled_out = pool.pool_2d(
input=conv_out,
ds=poolsize,
ignore_border=True
)
# add the bias term. Since the bias is a vector (1D array), we first
# reshape it to a tensor of shape (1, n_filters, 1, 1). Each bias will
# thus be broadcasted across mini-batches and feature map
# width & height
self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))
# store parameters of this layer
self.params = [self.W, self.b]
# keep track of model input
self.input = input

Notice that when initializing the weight values, the fan-in is determined by the size of
the receptive fields and the number of input feature maps.
Finally, using the LogisticRegression class defined in Classifying MNIST digits using
Logistic Regression and the HiddenLayer class defined in Multilayer Perceptron , we
can instantiate the network as follows.
x = T.matrix('x') # the data is presented as rasterized images
y = T.ivector('y') # the labels are presented as 1D vector of
# [int] labels

######################
# BUILD ACTUAL MODEL #
######################
print('... building the model')
# Reshape matrix of rasterized images of shape (batch_size, 28 * 28)
# to a 4D tensor, compatible with our LeNetConvPoolLayer
# (28, 28) is the size of MNIST images.
layer0_input = x.reshape((batch_size, 1, 28, 28))
# Construct the first convolutional pooling layer:
# filtering reduces the image size to (28-5+1 , 28-5+1) = (24, 24)
# maxpooling reduces this further to (24/2, 24/2) = (12, 12)
# 4D output tensor is thus of shape (batch_size, nkerns[0], 12, 12)
layer0 = LeNetConvPoolLayer(
rng,
input=layer0_input,
image_shape=(batch_size, 1, 28, 28),
filter_shape=(nkerns[0], 1, 5, 5),
poolsize=(2, 2)
)
# Construct the second convolutional pooling layer
# filtering reduces the image size to (12-5+1, 12-5+1) = (8, 8)
# maxpooling reduces this further to (8/2, 8/2) = (4, 4)
# 4D output tensor is thus of shape (batch_size, nkerns[1], 4, 4)
layer1 = LeNetConvPoolLayer(
rng,
input=layer0.output,
image_shape=(batch_size, nkerns[0], 12, 12),
filter_shape=(nkerns[1], nkerns[0], 5, 5),
poolsize=(2, 2)
)
# the HiddenLayer being fully-connected, it operates on 2D matrices of
# shape (batch_size, num_pixels) (i.e matrix of rasterized images).
# This will generate a matrix of shape (batch_size, nkerns[1] * 4 * 4),
# or (500, 50 * 4 * 4) = (500, 800) with the default values.
layer2_input = layer1.output.flatten(2)
# construct a fully-connected sigmoidal layer
layer2 = HiddenLayer(
rng,
input=layer2_input,
n_in=nkerns[1] * 4 * 4,
n_out=500,
activation=T.tanh
)
# classify the values of the fully-connected sigmoidal layer
layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)
# the cost we minimize during training is the NLL of the model
cost = layer3.negative_log_likelihood(y)
# create a function to compute the mistakes that are made by the model
test_model = theano.function(
[index],
layer3.errors(y),
givens={
x: test_set_x[index * batch_size: (index + 1) * batch_size],
y: test_set_y[index * batch_size: (index + 1) * batch_size]
}
)
validate_model = theano.function(
[index],
layer3.errors(y),
givens={
x: valid_set_x[index * batch_size: (index + 1) * batch_size],
y: valid_set_y[index * batch_size: (index + 1) * batch_size]
}
)

# create a list of all model parameters to be fit by gradient descent

params = layer3.params + layer2.params + layer1.params + layer0.params
# create a list of gradients for all model parameters
grads = T.grad(cost, params)

# train_model is a function that updates the model parameters by

# SGD Since this model has many parameters, it would be tedious to
# manually create an update rule for each model parameter. We thus
# create the updates list by automatically looping over all
# (params[i], grads[i]) pairs.
updates = [
(param_i, param_i - learning_rate * grad_i)
for param_i, grad_i in zip(params, grads)
]
train_model = theano.function(
[index],
cost,
updates=updates,
givens={
x: train_set_x[index * batch_size: (index + 1) * batch_size],
y: train_set_y[index * batch_size: (index + 1) * batch_size]
}
)

We leave out the code that performs the actual training and early-stopping, since it is
exactly the same as with an MLP. The interested reader can nevertheless access the
code in the code folder of DeepLearningTutorials.

Running the Code

The user can then run the code by calling:
python code/convolutional_mlp.py

The following output was obtained with the default parameters on a Core i7-2600K
CPU clocked at 3.40GHz and using flags floatX=float32:
Optimization complete.
Best validation score of 0.910000 % obtained at iteration 17800,with test
performance 0.920000 %
The code for file convolutional_mlp.py ran for 380.28m

Using a GeForce GTX 285, we obtained the following:

Optimization complete.
Best validation score of 0.910000 % obtained at iteration 15500,with test
performance 0.930000 %
The code for file convolutional_mlp.py ran for 46.76m

And similarly on a GeForce GTX 480:

Optimization complete.
Best validation score of 0.910000 % obtained at iteration 16400,with test
performance 0.930000 %
The code for file convolutional_mlp.py ran for 32.52m

Note that the discrepancies in validation and test error (as well as iteration count)
are due to different implementations of the rounding mechanism in hardware. They
can be safely ignored.

Tips and Tricks

Choosing Hyperparameters

CNNs are especially tricky to train, as they add even more hyper-parameters than a
standard MLP. While the usual rules of thumb for learning rates and regularization
constants still apply, the following should be kept in mind when optimizing CNNs.
Number of filters
When choosing the number of filters per layer, keep in mind that computing the
activations of a single convolutional filter is much more expensive than with
traditional MLPs !
Assume layer
contains
feature maps and
pixel positions (i.e.,
number of positions times number of feature maps), and there are
filters at layer
of shape
. Then computing a feature map (applying an
filter at all
pixel positions where the filter can be applied) costs
. The total cost is
times that. Things may be
more complicated if not all features at one level are connected to all features at the
previous one.
For a standard MLP, the cost would only be
where there are
different
neurons at level . As such, the number of filters used in CNNs is typically much
smaller than the number of hidden units in MLPs and depends on the size of the
feature maps (itself a function of input image size and filter shapes).
Since feature map size decreases with depth, layers near the input layer will tend to
have fewer filters while layers higher up can have much more. In fact, to equalize
computation at each layer, the product of the number of features and the number of
pixel positions is typically picked to be roughly constant across layers. To preserve
the information about the input would require keeping the total number of
activations (number of feature maps times number of pixel positions) to be nondecreasing from one layer to the next (of course we could hope to get away with
less when we are doing supervised learning). The number of feature maps directly
controls capacity and so that depends on the number of available examples and the
complexity of the task.
Filter Shape
Common filter shapes found in the litterature vary greatly, usually based on the
dataset. Best results on MNIST-sized images (28x28) are usually in the 5x5 range on
the first layer, while natural image datasets (often with hundreds of pixels in each
dimension) tend to use larger first-layer filters of shape 12x12 or 15x15.
The trick is thus to find the right level of granularity (i.e. filter shapes) in order to
create abstractions at the proper scale, given a particular dataset.
Max Pooling Shape
Typical values are 2x2 or no max-pooling. Very large input images may warrant 4x4
pooling in the lower-layers. Keep in mind however, that this will reduce the

dimension of the signal by a factor of 16, and may result in throwing away too much
information.
Footnotes
[1] For clarity, we use the word unit or neuron to refer to the artificial neuron and cell to refer to the biological
neuron.

Tips
If you want to try this model on a new dataset, here are a few tips that can help you
get better results:
Whitening the data (e.g. with PCA)
Decay the learning rate in each epoch

Convolutional Neural Networks
No ratings yet
Convolutional Neural Networks
71 pages
Lecture2.2 UnimodalRepresentations Part1 PDF
No ratings yet
Lecture2.2 UnimodalRepresentations Part1 PDF
92 pages
Module11 - NNandDeep Learning
No ratings yet
Module11 - NNandDeep Learning
84 pages
6-DeepVisualLearning L6
No ratings yet
6-DeepVisualLearning L6
82 pages
UNIT 2 Study Materials 1
No ratings yet
UNIT 2 Study Materials 1
42 pages
MLP and CNN
No ratings yet
MLP and CNN
56 pages
DL Unit-Ii
No ratings yet
DL Unit-Ii
34 pages
ML 2
No ratings yet
ML 2
70 pages
PNAL9 CNNs
No ratings yet
PNAL9 CNNs
61 pages
Deep Learning UNIT-4
No ratings yet
Deep Learning UNIT-4
34 pages
CNNs
No ratings yet
CNNs
88 pages
Convolutional Networks
No ratings yet
Convolutional Networks
37 pages
CNNS and Classification Networks
No ratings yet
CNNS and Classification Networks
115 pages
Intro To CNN
No ratings yet
Intro To CNN
93 pages
Module11 - NNandDeep Learning
No ratings yet
Module11 - NNandDeep Learning
84 pages
Unit 3
No ratings yet
Unit 3
105 pages
CNN2
No ratings yet
CNN2
70 pages
Chapter14 CNN
No ratings yet
Chapter14 CNN
54 pages
Lecture - 07 (Convolutional Neural Networks)
No ratings yet
Lecture - 07 (Convolutional Neural Networks)
57 pages
Introduction To Deep Learning
No ratings yet
Introduction To Deep Learning
47 pages
JNTUK R20 UNIT-IV DEEP LEARNING TECHNIQUES (WWW - Jntumaterials.co - In)
No ratings yet
JNTUK R20 UNIT-IV DEEP LEARNING TECHNIQUES (WWW - Jntumaterials.co - In)
26 pages
Deep Learning
No ratings yet
Deep Learning
90 pages
Lec6 RNN Attention Search
No ratings yet
Lec6 RNN Attention Search
62 pages
CNN 2
No ratings yet
CNN 2
47 pages
Deep Learning Convolutional Neural Networks: Non-Trivial Human Percep3on
No ratings yet
Deep Learning Convolutional Neural Networks: Non-Trivial Human Percep3on
32 pages
Unit - 2
No ratings yet
Unit - 2
31 pages
Nria20-Dl - Unit-3 Notes-Final
No ratings yet
Nria20-Dl - Unit-3 Notes-Final
23 pages
An Introduction To Convolutional Neural Networks: Abstract
No ratings yet
An Introduction To Convolutional Neural Networks: Abstract
11 pages
Unit 3 CNN 2024
No ratings yet
Unit 3 CNN 2024
58 pages
Introduction To Physiological Psychology 2nd Edition Charles F Levinthal PDF Download
No ratings yet
Introduction To Physiological Psychology 2nd Edition Charles F Levinthal PDF Download
41 pages
Lecture 3
No ratings yet
Lecture 3
48 pages
Unit - 4 DL
No ratings yet
Unit - 4 DL
19 pages
Assignment 5 - Implementing Image Classification Using Deep Learning
No ratings yet
Assignment 5 - Implementing Image Classification Using Deep Learning
8 pages
CS601 Machine Learning Unit 3
No ratings yet
CS601 Machine Learning Unit 3
47 pages
Convolutional Neural Network
100% (1)
Convolutional Neural Network
78 pages
Module 5
No ratings yet
Module 5
20 pages
CVlecture 5
No ratings yet
CVlecture 5
56 pages
DL-Unit-3 Final
No ratings yet
DL-Unit-3 Final
25 pages
Introduction To Convolutional Neural Networks
No ratings yet
Introduction To Convolutional Neural Networks
4 pages
Convolutional Neural Networks
No ratings yet
Convolutional Neural Networks
8 pages
Deep Learning Image Classification
No ratings yet
Deep Learning Image Classification
11 pages
JNTUK R20 UNIT-IV DEEP LEARNING TECHNIQUES-www - Jntumaterials.co - in
No ratings yet
JNTUK R20 UNIT-IV DEEP LEARNING TECHNIQUES-www - Jntumaterials.co - in
26 pages
BMM 2018 - Deep Learning Tutorial
No ratings yet
BMM 2018 - Deep Learning Tutorial
47 pages
Unit - 2
No ratings yet
Unit - 2
51 pages
Convolutional NN
No ratings yet
Convolutional NN
34 pages
L09-10 DL and CNN
No ratings yet
L09-10 DL and CNN
56 pages
AD3501-DL-Unit 2
No ratings yet
AD3501-DL-Unit 2
33 pages
UNIT-III DeepLearning Notes
No ratings yet
UNIT-III DeepLearning Notes
30 pages
Convolutional Neural Network
No ratings yet
Convolutional Neural Network
11 pages
What Is A Convolutional Neural Network-Unit3
No ratings yet
What Is A Convolutional Neural Network-Unit3
12 pages
Deep Learning
No ratings yet
Deep Learning
17 pages
Unit Iv DL
No ratings yet
Unit Iv DL
26 pages
Convolutional Neural Network (CNN)
No ratings yet
Convolutional Neural Network (CNN)
9 pages
CNN Notes Unit-3
No ratings yet
CNN Notes Unit-3
12 pages
DL Unit4
No ratings yet
DL Unit4
31 pages
Convolutional Neural Network - Wikipedia
No ratings yet
Convolutional Neural Network - Wikipedia
21 pages
Fundamentals of Computer Vision 1st Edition Wesley E. Snyder - The Ebook With Rich Content Is Ready For You To Download
No ratings yet
Fundamentals of Computer Vision 1st Edition Wesley E. Snyder - The Ebook With Rich Content Is Ready For You To Download
60 pages
Instant Download Tutorials in Visual Cognition 1st Edition Veronika Coltheart (Ed.) PDF All Chapter
100% (4)
Instant Download Tutorials in Visual Cognition 1st Edition Veronika Coltheart (Ed.) PDF All Chapter
75 pages
The Cat Primary Visual Cortex 1st Edition Bertram Payne
No ratings yet
The Cat Primary Visual Cortex 1st Edition Bertram Payne
51 pages
Convolutional Neural Network
No ratings yet
Convolutional Neural Network
35 pages
Brain and Behavior-Sensory
No ratings yet
Brain and Behavior-Sensory
40 pages
Ad3501-Dl-Unit 2 Notes
No ratings yet
Ad3501-Dl-Unit 2 Notes
29 pages
Lila R. Gleitman An Invitation To Cognitive Science Visual Cognition MIT Press 1995
No ratings yet
Lila R. Gleitman An Invitation To Cognitive Science Visual Cognition MIT Press 1995
805 pages
(Bradford Books) Manfred Fahle, Tomaso Poggio - Perceptual Learning-MIT Press (2002)
No ratings yet
(Bradford Books) Manfred Fahle, Tomaso Poggio - Perceptual Learning-MIT Press (2002)
476 pages
Reality Truth Ai
No ratings yet
Reality Truth Ai
37 pages
DL Unit-4
No ratings yet
DL Unit-4
26 pages
Seeley 2013
No ratings yet
Seeley 2013
18 pages
Test Bank For Foundations of Behavioral Neuroscience 8th Edition Carlson
100% (51)
Test Bank For Foundations of Behavioral Neuroscience 8th Edition Carlson
31 pages
Mount Castle 1957
No ratings yet
Mount Castle 1957
27 pages
Predictive Processing A Canonical Cortical Computation
No ratings yet
Predictive Processing A Canonical Cortical Computation
12 pages
Bio Psychology Lecture Notes Chapter 6 The Visual System
33% (3)
Bio Psychology Lecture Notes Chapter 6 The Visual System
7 pages
Chapter 5
No ratings yet
Chapter 5
38 pages
What Is Convolutional Neural Network
No ratings yet
What Is Convolutional Neural Network
16 pages
4 Neural Network
No ratings yet
4 Neural Network
74 pages
Somatosensory Cortical Map Changes Following Digit Amputation in Adult Monkeys
No ratings yet
Somatosensory Cortical Map Changes Following Digit Amputation in Adult Monkeys
15 pages
Letvin Ieee 1959 PDF
No ratings yet
Letvin Ieee 1959 PDF
12 pages
BIOPSYCH Memareview
No ratings yet
BIOPSYCH Memareview
7 pages
Two Point Threshold
No ratings yet
Two Point Threshold
4 pages
PSY280 Midterm Keyterms
No ratings yet
PSY280 Midterm Keyterms
13 pages
Engineering Applications of Artificial Intelligence: Saijie Fan, Wei Liang, Derui Ding, Hui Yu
No ratings yet
Engineering Applications of Artificial Intelligence: Saijie Fan, Wei Liang, Derui Ding, Hui Yu
11 pages
50 Most Important CNN Interview Questions
No ratings yet
50 Most Important CNN Interview Questions
18 pages
Psychology 114 - Sensation and Perception
No ratings yet
Psychology 114 - Sensation and Perception
2 pages
Visual System (Sensory System) Part 2
No ratings yet
Visual System (Sensory System) Part 2
8 pages
Detecting Faces in Images: A Survey
100% (17)
Detecting Faces in Images: A Survey
25 pages
Physiological:biological Psychology Notes
No ratings yet
Physiological:biological Psychology Notes
36 pages
Types of Camera Lenses
100% (1)
Types of Camera Lenses
12 pages
Psych 1XX3 Form Perception II Lecture Notes
No ratings yet
Psych 1XX3 Form Perception II Lecture Notes
11 pages
Goldstone, R. L. (1998) - Perceptual Learning. Annual Review of Psychology, 49 (1), 585-612.
No ratings yet
Goldstone, R. L. (1998) - Perceptual Learning. Annual Review of Psychology, 49 (1), 585-612.
28 pages
Binocular Vision: Rahul Bhola, MD
No ratings yet
Binocular Vision: Rahul Bhola, MD
17 pages
Line Drawing Algorithm: Mastering Techniques for Precision Image Rendering
From Everand
Line Drawing Algorithm: Mastering Techniques for Precision Image Rendering
Fouad Sabry
No ratings yet