Convolutional Neural Networks in Python: Master Data Science and Machine Learning With Modern Deep Learning in Python, Theano, and TensorFlow (Machine Learning in Python) by LazyProgrammer
Chapter 1: Introduction
This book is all about how to use deep learning for computer
vision using convolutional neural networks. These are the
state of the art when it comes to image classification and
they beat vanilla deep networks at tasks like MNIST.
All the materials used in this book are FREE. You can
download and install Python, Numpy, Scipy, Theano, and
TensorFlow with pip or easy_install.
Lastly, my goal is to show you that convolutional networks
aren’t magical and they don’t require expert-level math to
figure out.
y = softmax(relu(X.dot(W1)).dot(W2))
Except we replace the first "dot product" with a convolution:
y = softmax(relu(conv(X, W1)).dot(W2))
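To make that concrete, here is a minimal Numpy/Scipy sketch of this forward pass (the shapes and names here are hypothetical, just to show the pattern):

import numpy as np
from scipy.signal import convolve2d

def relu(a):
    return a * (a > 0)

def softmax(a):
    expA = np.exp(a - a.max())
    return expA / expA.sum()

X = np.random.randn(28, 28)      # a single input image
W1 = np.random.randn(5, 5)       # a convolution filter replaces the first weight matrix
W2 = np.random.randn(24*24, 10)  # dense weights: (28 - 5 + 1)^2 inputs, 10 classes

Z = relu(convolve2d(X, W1, mode='valid'))  # the first "dot product" is now a convolution
y = softmax(Z.flatten().dot(W2))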
Predict
Data Preprocessing
Note that these are MATLAB binary data files, so we’ll need
to use the Scipy library to load them, which I’m sure you
have heard of if you’re familiar with the Numpy stack.
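For example, here is a quick sketch of loading one of these files with Scipy (the file name and keys match the SVHN data we use later in this book):

from scipy.io import loadmat

train = loadmat('../large_files/train_32x32.mat')
X = train['X']  # images, shape (32, 32, 3, N)
y = train['y']  # labels, shape (N, 1), with values 1..10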
Chapter 2: Convolution
So what is convolution?
Think of your favorite audio effect (suppose that’s the
“echo”). An echo is simply the same sound bouncing back at
you in the future, but with less volume. We’ll see how we
can do that mathematically later.
        --------
x(t)--->| h(t) |--->y(t)
        --------
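As a preview, here is a minimal Numpy sketch of an echo as a convolution (the delay and volume here are arbitrary choices):

import numpy as np

fs = 16000                # sample rate in Hz
x = np.random.randn(fs)   # one second of some signal
delay = int(0.25 * fs)    # the echo arrives 0.25 seconds later

h = np.zeros(delay + 1)   # impulse response
h[0] = 1.0                # the original sound...
h[delay] = 0.5            # ...plus a quieter copy in the future

y = np.convolve(x, h)     # applying the effect is just a convolution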
For a 2-dimensional signal x and filter w, the convolution is defined as:

y(n, m) = sum over i, j of w(i, j) * x(n - i, m - j)

You can see from this formula that this just does both convolutions independently in each direction. I've got some pseudocode here to demonstrate how you might write this in code, but notice there's a problem: if i > n or j > m, we'll go out of bounds.
y = np.zeros(x.shape)
for n in xrange(x.shape[0]):
    for m in xrange(x.shape[1]):
        for i in xrange(w.shape[0]):
            for j in xrange(w.shape[1]):
                y[n,m] += w[i,j]*x[n-i,m-j]
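One simple fix (a sketch) is to skip the out-of-range terms, which is the same as treating everything outside the input as zero:

y = np.zeros(x.shape)
for n in xrange(x.shape[0]):
    for m in xrange(x.shape[1]):
        for i in xrange(w.shape[0]):
            for j in xrange(w.shape[1]):
                # only accumulate when the index lands inside the input
                if n - i >= 0 and m - j >= 0:
                    y[n,m] += w[i,j]*x[n-i,m-j]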
Gaussian Blur
If you’ve ever done image editing with applications like
Photoshop or GIMP you are probably familiar with the blur
filter. Sometimes it’s called a Gaussian blur, and you’ll see
why in a minute.
If you just want to see the code that's already been written, check out the file https://github.com/lazyprogrammer/machine_learning_examples/blob/master/cnn_class/blur.py on GitHub.
The idea is the same as we did with the sound echo. We’re
going to take a signal and spread it out.
But this time instead of having predefined delays we are
going to spread out the signal in the shape of a 2-
dimensional Gaussian.
# a 20x20 filter with Gaussian-shaped weights, centered at (9.5, 9.5)
W = np.zeros((20, 20))
for i in xrange(20):
    for j in xrange(20):
        dist = (i - 9.5)**2 + (j - 9.5)**2
        W[i, j] = np.exp(-dist / 50.)  # the variance here is one reasonable choice
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from scipy.signal import convolve2d

img = mpimg.imread('lena.png')
plt.imshow(img)
plt.show()

# make it B&W
bw = img.mean(axis=2)
plt.imshow(bw, cmap='gray')
plt.show()
W = np.zeros((20, 20))
for i in xrange(20):
    for j in xrange(20):
        W[i, j] = np.exp(-((i - 9.5)**2 + (j - 9.5)**2) / 50.)
plt.imshow(W, cmap='gray')  # take a look at the filter itself
plt.show()
out = convolve2d(bw, W)
plt.imshow(out, cmap='gray')
plt.show()

# what's that weird black stuff on the edges? let's check the size of the output
# (the default mode is 'full', so the output is bigger than the input)
print out.shape

# we can also just make the output the same size as the input
out = convolve2d(bw, W, mode='same')
plt.imshow(out, cmap='gray')
plt.show()
print out.shape
Edge Detection
Edge detection is another important operation in computer
vision. If you just want to see the code that's already been written, check out the file https://github.com/lazyprogrammer/machine_learning_examples/blob/master/cnn_class/edge.py on GitHub.
Hx = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
], dtype=np.float32)

Hy = np.array([
    [-1, -2, -1],
    [0, 0, 0],
    [1, 2, 1],
], dtype=np.float32)
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from scipy.signal import convolve2d

img = mpimg.imread('lena.png')

# make it B&W
bw = img.mean(axis=2)
# Sobel operator - approximate gradient in X dir
Hx = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
], dtype=np.float32)
# Sobel operator - approximate gradient in Y dir
Hy = np.array([
    [-1, -2, -1],
    [0, 0, 0],
    [1, 2, 1],
], dtype=np.float32)
Gx = convolve2d(bw, Hx)
plt.imshow(Gx, cmap='gray')
plt.show()
Gy = convolve2d(bw, Hy)
plt.imshow(Gy, cmap='gray')
plt.show()
# Gradient magnitude
G = np.sqrt(Gx*Gx + Gy*Gy)
plt.imshow(G, cmap='gray')
plt.show()
The Takeaway
Downsampling
Another important operation we'll need before we build the convolutional neural network is downsampling. Remember our audio sample where we did an echo - that was a 16kHz sample. Why 16kHz? By the Nyquist theorem, a 16kHz sampling rate captures frequencies up to 8kHz, which is adequate for representing voices.
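In code, the simplest form of downsampling is just keeping every k-th sample; for images, the max-pooling we'll use in the network keeps the largest value in each small window. Here is a minimal Numpy sketch of both (the data here is hypothetical):

import numpy as np

x = np.random.randn(16000)  # one second of "audio" at 16kHz
x_down = x[::2]             # naive downsampling to 8kHz: keep every 2nd sample

def maxpool_2x2(img):
    # keep the max of each non-overlapping 2x2 block
    H, W = img.shape
    img = img[:H//2*2, :W//2*2]  # crop so both dimensions are even
    return img.reshape(H//2, 2, W//2, 2).max(axis=(1, 3))

img = np.random.randn(28, 28)
small = maxpool_2x2(img)    # shape (14, 14)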
Z = conv(X, W1)
Y = softmax(Z.dot(W2))
As stated previously, you could then train this simply by
doing gradient descent.
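To make "just do gradient descent" concrete, here is a minimal Theano sketch of training this two-weight model (the shapes and learning rate are hypothetical; the full program later in this chapter does the real thing, with pooling and momentum):

import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d

lr = np.float32(0.0001)
W1 = theano.shared(np.random.randn(20, 3, 5, 5), 'W1')   # 20 filters over 3 channels
W2 = theano.shared(np.random.randn(20*28*28, 10), 'W2')  # dense weights to 10 classes

X = T.tensor4('X')       # input batch: (N, 3, 32, 32)
Targ = T.matrix('Targ')  # one-hot targets

Z = conv2d(X, W1)        # (N, 20, 28, 28)
Z = Z * (Z > 0)          # relu
pY = T.nnet.softmax(Z.flatten(ndim=2).dot(W2))
cost = -(Targ * T.log(pY)).sum()

train = theano.function(
    inputs=[X, Targ],
    updates=[(w, w - lr*T.grad(cost, w)) for w in [W1, W2]],
)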
So in the first layer, you take the image, and keep all the
colors and the original shape, meaning you don’t flatten it.
(i.e. it remains (3 x W x H))
Finally, you flatten these features into a vector and you put
it into a regular, fully connected neural network like the
ones we’ve been talking about.
Schematically, it would look like this:

input image -> convpool -> convpool -> flatten -> fully-connected -> softmax output

The basic pattern is: alternating convolution and pooling layers to extract features, followed by a regular fully-connected network to classify them.
Technicalities
W = W - learning_rate * dJ/dW

Look familiar? That's because it's the same "backpropagation" (gradient descent) equation from plain neural networks!
pooled_out = downsample.max_pool_2d(
    input=conv_out,
    ds=poolsize,
    ignore_border=True
)
def rearrange(X):
    # input is (32, 32, 3, N); output is (N, 3, 32, 32)
    N = X.shape[-1]
    out = np.zeros((N, 3, 32, 32), dtype=np.float32)
    for i in xrange(N):
        for j in xrange(3):
            out[i, j, :, :] = X[:, :, j, i]
    return out / 255
W1_shape = (20, 3, 5, 5) # (num_feature_maps, num_color_channels, filter_width, filter_height)
W1_init = np.random.randn(*W1_shape)
b1_init = np.zeros(W1_shape[0])
W2_shape = (50, 20, 5, 5) # (num_feature_maps, old_num_feature_maps, filter_width, filter_height)
W2_init = np.random.randn(*W2_shape)
b2_init = np.zeros(W2_shape[0])
W3_init = np.random.randn(W2_shape[0]*5*5, M)
b3_init = np.zeros(M)
W4_init = np.random.randn(M, K)
b4_init = np.zeros(K)
Note that the bias is the same size as the number of feature
maps.
Since that image was 5x5 and had 50 feature maps, the new flattened dimension will be 50x5x5 = 1250.
Now that we have all the initial weights and operations we
need, we can compute the output of the neural network. So
we do the convpool twice, and then notice this flatten()
operation before I do the dot product. That’s because Z2,
after convpooling, will still be an image.
# forward pass
Z1 = convpool(X, W1, b1)
Z2 = convpool(Z1, W2, b2)
Z3 = relu(Z2.flatten(ndim=2).dot(W3) + b3)
pY = T.nnet.softmax(Z3.dot(W4) + b4)
But if you call flatten() by itself it'll turn the output into a 1-D array, which we don't want. Luckily, Theano's flatten() takes a parameter that lets us control how much to flatten: ndim=2 means the output will have 2 dimensions - the first dimension is kept, and all the remaining dimensions are collapsed into the second.
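For example (a tiny sketch):

import theano.tensor as T

Z = T.tensor4('Z')         # e.g. shape (N, 50, 5, 5) after the second convpool
Zflat = Z.flatten(ndim=2)  # shape (N, 50*5*5) = (N, 1250)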
import numpy as np
import theano
import theano.tensor as T
import matplotlib.pyplot as plt
from datetime import datetime
from scipy.io import loadmat
from theano.tensor.nnet import conv2d
from theano.tensor.signal import downsample

def error_rate(p, t):
    return np.mean(p != t)

def relu(a):
    return a * (a > 0)

def y2indicator(y):
    N = len(y)
    ind = np.zeros((N, 10), dtype=np.float32)
    for i in xrange(N):
        ind[i, y[i]] = 1
    return ind
def convpool(X, W, b, poolsize=(2, 2)):
    conv_out = conv2d(input=X, filters=W)
    pooled_out = downsample.max_pool_2d(
        input=conv_out,
        ds=poolsize,
        ignore_border=True
    )
    # add the bias (one per feature map) and apply the nonlinearity
    return relu(pooled_out + b.dimshuffle('x', 0, 'x', 'x'))
def init_filter(shape, poolsz):
    # scale the random weights by the fan-in plus the (pooled) fan-out
    w = np.random.randn(*shape) / np.sqrt(np.prod(shape[1:]) + shape[0]*np.prod(shape[2:]) / np.prod(poolsz))
    return w.astype(np.float32)
def rearrange(X):
    # input is (32, 32, 3, N); output is (N, 3, 32, 32)
    N = X.shape[-1]
    out = np.zeros((N, 3, 32, 32), dtype=np.float32)
    for i in xrange(N):
        for j in xrange(3):
            out[i, j, :, :] = X[:, :, j, i]
    return out / 255
def main():
    train = loadmat('../large_files/train_32x32.mat')
    test = loadmat('../large_files/test_32x32.mat')

    Xtrain = rearrange(train['X'])
    Ytrain = train['y'].flatten() - 1
    del train
    Ytrain_ind = y2indicator(Ytrain)

    Xtest = rearrange(test['X'])
    Ytest = test['y'].flatten() - 1
    del test
    Ytest_ind = y2indicator(Ytest)
    max_iter = 8
    print_period = 10
    lr = np.float32(0.00001)
    reg = np.float32(0.01)
    mu = np.float32(0.99)
    N = Xtrain.shape[0]
    batch_sz = 500
    n_batches = N / batch_sz
    M = 500
    K = 10
    poolsz = (2, 2)
    W1_shape = (20, 3, 5, 5) # (num_feature_maps, num_color_channels, filter_width, filter_height)
    W1_init = init_filter(W1_shape, poolsz)
    b1_init = np.zeros(W1_shape[0], dtype=np.float32)
    # after conv: 32 - 5 + 1 = 28; after downsample: 28 / 2 = 14
    W2_shape = (50, 20, 5, 5) # (num_feature_maps, old_num_feature_maps, filter_width, filter_height)
    W2_init = init_filter(W2_shape, poolsz)
    b2_init = np.zeros(W2_shape[0], dtype=np.float32)
    # after conv: 14 - 5 + 1 = 10; after downsample: 10 / 2 = 5
    W3_init = np.random.randn(W2_shape[0]*5*5, M) / np.sqrt(W2_shape[0]*5*5 + M)
    b3_init = np.zeros(M, dtype=np.float32)
    W4_init = np.random.randn(M, K) / np.sqrt(M + K)
    b4_init = np.zeros(K, dtype=np.float32)
    X = T.tensor4('X', dtype='float32')
    Y = T.matrix('T')
    W1 = theano.shared(W1_init, 'W1')
    b1 = theano.shared(b1_init, 'b1')
    W2 = theano.shared(W2_init, 'W2')
    b2 = theano.shared(b2_init, 'b2')
    W3 = theano.shared(W3_init.astype(np.float32), 'W3')
    b3 = theano.shared(b3_init, 'b3')
    W4 = theano.shared(W4_init.astype(np.float32), 'W4')
    b4 = theano.shared(b4_init, 'b4')
    # momentum changes
    dW1 = theano.shared(np.zeros(W1_init.shape, dtype=np.float32), 'dW1')
    db1 = theano.shared(np.zeros(b1_init.shape, dtype=np.float32), 'db1')
    dW2 = theano.shared(np.zeros(W2_init.shape, dtype=np.float32), 'dW2')
    db2 = theano.shared(np.zeros(b2_init.shape, dtype=np.float32), 'db2')
    dW3 = theano.shared(np.zeros(W3_init.shape, dtype=np.float32), 'dW3')
    db3 = theano.shared(np.zeros(b3_init.shape, dtype=np.float32), 'db3')
    dW4 = theano.shared(np.zeros(W4_init.shape, dtype=np.float32), 'dW4')
    db4 = theano.shared(np.zeros(b4_init.shape, dtype=np.float32), 'db4')
    # forward pass
    Z1 = convpool(X, W1, b1)
    Z2 = convpool(Z1, W2, b2)
    Z3 = relu(Z2.flatten(ndim=2).dot(W3) + b3)
    pY = T.nnet.softmax(Z3.dot(W4) + b4)
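    # The cost, prediction, and update expressions were elided above; this is a
    # sketch of how they would be defined, following the same momentum pattern
    # the book uses in its earlier (non-convolutional) Theano examples:
    params = [W1, b1, W2, b2, W3, b3, W4, b4]
    reg_cost = reg * sum((p * p).sum() for p in params)
    cost = -(Y * T.log(pY)).sum() + reg_cost
    prediction = T.argmax(pY, axis=1)

    # momentum: update the velocity, then take a step
    update_dW1 = mu*dW1 - lr*T.grad(cost, W1)
    update_W1 = W1 + update_dW1
    update_db1 = mu*db1 - lr*T.grad(cost, b1)
    update_b1 = b1 + update_db1
    update_dW2 = mu*dW2 - lr*T.grad(cost, W2)
    update_W2 = W2 + update_dW2
    update_db2 = mu*db2 - lr*T.grad(cost, b2)
    update_b2 = b2 + update_db2
    update_dW3 = mu*dW3 - lr*T.grad(cost, W3)
    update_W3 = W3 + update_dW3
    update_db3 = mu*db3 - lr*T.grad(cost, b3)
    update_b3 = b3 + update_db3
    update_dW4 = mu*dW4 - lr*T.grad(cost, W4)
    update_W4 = W4 + update_dW4
    update_db4 = mu*db4 - lr*T.grad(cost, b4)
    update_b4 = b4 + update_db4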
    train = theano.function(
        inputs=[X, Y],
        updates=[
            (W1, update_W1),
            (b1, update_b1),
            (W2, update_W2),
            (b2, update_b2),
            (W3, update_W3),
            (b3, update_b3),
            (W4, update_W4),
            (b4, update_b4),
            (dW1, update_dW1),
            (db1, update_db1),
            (dW2, update_dW2),
            (db2, update_db2),
            (dW3, update_dW3),
            (db3, update_db3),
            (dW4, update_dW4),
            (db4, update_db4),
        ],
    )
    # create another function for this because we want it over the whole dataset
    get_prediction = theano.function(
        inputs=[X, Y],
        outputs=[cost, prediction],
    )
    t0 = datetime.now()
    LL = []
    for i in xrange(max_iter):
        for j in xrange(n_batches):
            Xbatch = Xtrain[j*batch_sz:(j*batch_sz + batch_sz),]
            Ybatch = Ytrain_ind[j*batch_sz:(j*batch_sz + batch_sz),]
            train(Xbatch, Ybatch)
            if j % print_period == 0:
                cost_val, prediction_val = get_prediction(Xtest, Ytest_ind)
                err = error_rate(prediction_val, Ytest)
                print "Cost / err at iteration i=%d, j=%d: %.3f / %.3f" % (i, j, cost_val, err)
                LL.append(cost_val)
    print "Elapsed time:", (datetime.now() - t0)
    plt.plot(LL)
    plt.show()
if __name__ == '__main__':
main()
Conclusion
I really hope you had as much fun reading this book as I did
making it.
https://www.udemy.com/deep-learning-convolutional-neural-networks-theano-tensorflow

https://udemy.com/data-science-deep-learning-in-python
Are you comfortable with this material, and do you want to take your deep learning skillset to the next level? Then my follow-up Udemy course on deep learning is for you. As in this book, I take you through the basics of Theano and TensorFlow - creating functions, variables, and expressions - and build up neural networks from scratch. I teach you about ways to accelerate the learning process, including batch gradient descent, momentum, and adaptive learning rates. I also show you, live, how to create a GPU instance on Amazon AWS EC2, and prove to you that training a neural network with GPU optimization can be orders of magnitude faster than on your CPU.
https://www.udemy.com/unsupervised-deep-learning-in-python

https://udemy.com/data-science-logistic-regression-in-python
My Facebook page, https://facebook.com/lazyprogrammer.me (don't forget to hit "like"!)