Data Science RR Itec-Deep Learning
33.1 Introduction
1. Deep learning is a machine learning technique that teaches computers to do what comes
naturally to humans: learn by example.
2. Most deep learning methods use neural network architectures, which is why deep learning
models are often referred to as deep neural networks.
3. The term “deep” usually refers to the number of hidden layers in the neural network.
Traditional neural networks contain only 2-3 hidden layers, while deep networks can have as
many as required (a short Keras sketch after the application examples below illustrates this).
4. Deep learning models are trained by using large sets of labelled data and neural network
architectures that learn features directly from the data without the need for manual feature
extraction.
1. Deep learning applications are used in industries from automated driving to medical devices.
3. Aerospace and Defence: Deep learning is used to identify objects from satellites that
locate areas of interest, and identify safe or unsafe zones for troops.
4. Medical Research: Cancer researchers are using deep learning to automatically detect
cancer cells. Teams at UCLA built an advanced microscope that yields a high-dimensional
data set used to train a deep learning application to accurately identify cancer cells.
5. Industrial Automation: Deep learning is helping to improve worker safety around heavy
machinery by automatically detecting when people or objects are within an unsafe distance
of machines.
6. Electronics: Deep learning is being used in automated hearing and speech translation. For
example, home assistance devices that respond to your voice and know your preferences
are powered by deep learning applications.
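The idea in point 3 can be sketched directly in Keras (the library used later in these notes): a deeper model is simply one with more hidden layers stacked on top of each other. The layer sizes and input shape below are illustrative, not taken from any example in this material.
from keras.models import Sequential
from keras.layers import Dense
# A "traditional" shallow network: 2 hidden layers
shallow = Sequential()
shallow.add(Dense(32, activation='relu', input_shape=(10,)))
shallow.add(Dense(32, activation='relu'))
shallow.add(Dense(1))
# A deep network: the same pattern with more hidden layers stacked
deep = Sequential()
deep.add(Dense(32, activation='relu', input_shape=(10,)))
for _ in range(5):              # add as many hidden layers as required
    deep.add(Dense(32, activation='relu'))
deep.add(Dense(1))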
a. Number of children
import numpy as np
input_data = np.array([2, 3])                          # example input (value assumed)
hidden_layer_values = np.array([(input_data * np.array([1, 1])).sum(), (input_data * np.array([-1, 1])).sum()])
print(hidden_layer_values) #[5, 1]
output = (hidden_layer_values * np.array([2, -1])).sum()
print(output)
a. Linear
b. Non-linear
import numpy as np
input_data = np.array([2, 3])   # example input (value assumed)
weights = {'node_0': np.array([1, 1]), 'node_1': np.array([-1, 1]), 'output': np.array([2, -1])}
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = np.tanh(node_0_input)      # tanh activation
node_1_input = (input_data * weights['node_1']).sum()
node_1_output = np.tanh(node_1_input)
hidden_layer_outputs = np.array([node_0_output, node_1_output])
output = (hidden_layer_outputs * weights['output']).sum()
print(output)
33.4 Deeper networks
import numpy as np
# Example input and weights (only the node_2 weights appear in the notes; the rest are illustrative)
input_data = np.array([3, 5])
weights = {'node_0': np.array([2, 4]), 'node_1': np.array([4, -5]),
           'node_2': np.array([-1, 1]), 'node_3': np.array([2, 2]),
           'output': np.array([2, 7])}
# Hidden layer 1
node_0_output = (input_data * weights['node_0']).sum()
node_1_output = (input_data * weights['node_1']).sum()
hidden_layer1_output = np.array([node_0_output, node_1_output])
hidden_layer1_output_relu = np.maximum(hidden_layer1_output, 0)
# Hidden layer 2
node_2_output = (hidden_layer1_output_relu * weights['node_2']).sum()
node_3_output = (hidden_layer1_output_relu * weights['node_3']).sum()
hidden_layer2_output = np.array([node_2_output, node_3_output])
hidden_layer2_output_relu = np.maximum(hidden_layer2_output, 0)
# Output layer
output = (hidden_layer2_output_relu * weights['output']).sum()
output_relu = np.maximum(output, 0)
print(output_relu)
https://stackoverflow.com/questions/32109319/how-to-implement-the-relu-function-in-numpy
33.5 Need for optimization
2. Hence, to get a good-quality (optimized) model, choosing the right weights plays the main role.
import numpy as np

def relu(my_input):
    return max(0, my_input)

def predict_with_network(input_data_row, weights):   # wrapper function name assumed
    node_0_input = (input_data_row * weights['node_0']).sum()
    node_0_output = relu(node_0_input)
    node_1_input = (input_data_row * weights['node_1']).sum()
    node_1_output = relu(node_1_input)
    input_to_final_layer = (np.array([node_0_output, node_1_output]) * weights['output']).sum()
    model_output = relu(input_to_final_layer)
    return model_output
# Step 3: Use the functions above to predict and compare errors for two weight sets
# Sample input and weights (values assumed for illustration)
input_data = np.array([0, 3])
weights_0 = {'node_0': np.array([2, 1]), 'node_1': np.array([1, 2]), 'output': np.array([1, 1])}
target_actual = 3
error_0 = predict_with_network(input_data, weights_0) - target_actual
# Create weights that cause the network to make a perfect prediction (3): weights_1
weights_1 = {'node_0': np.array([2, 1]), 'node_1': np.array([1, 2]), 'output': np.array([1, 0])}
error_1 = predict_with_network(input_data, weights_1) - target_actual
print(error_0); print(error_1)
33.6 Gradient descent
1. How many
# Step 1 of 2: Calculate the slope/gradient and make one weight update
import numpy as np
# Define weights, a sample input and a target (values assumed for illustration)
weights = np.array([1, 2])
input_data = np.array([3, 4])
target = 6
learning_rate = 0.01
preds = (weights * input_data).sum()
error = preds - target
print(error)
gradient = 2 * input_data * error                 # slope of the squared error w.r.t. the weights
weights_updated = weights - learning_rate * gradient
preds_updated = (weights_updated * input_data).sum()
error_updated = preds_updated - target
print(error_updated)
import numpy as np
weights = np.array([0, 2, 1])
input_data = np.array([1, 2, 3])
target = 0
preds = (weights * input_data).sum()
error = preds - target
slope = 2 * input_data * error          # gradient of the squared error w.r.t. the weights
print(slope)
#################################################
learning_rate = 0.01
print(error)
weights_updated = weights - learning_rate * slope
preds_updated = (weights_updated * input_data).sum()
error_updated = preds_updated - target
print(error_updated)
#################################################
def get_error(input_data, target, weights):
    error = (input_data * weights).sum() - target
    return error

def get_slope(input_data, target, weights):
    error = get_error(input_data, target, weights)
    slope = 2 * input_data * error
    return slope

def get_mse(input_data, target, weights):
    errors = get_error(input_data, target, weights)
    mse = np.mean(errors**2)
    return mse

n_updates = 20
mse_hist = []
for i in range(n_updates):
    slope = get_slope(input_data, target, weights)
    weights = weights - learning_rate * slope
    mse = get_mse(input_data, target, weights)
    mse_hist.append(mse)
import matplotlib.pyplot as plt
plt.plot(mse_hist)
plt.xlabel('Iterations')
plt.ylabel('Mean Squared Error')
plt.show()  # Notice that the mean squared error decreases as the number of iterations goes up.
33.7 Backpropagation
1. Update the weights using the error and iterate until the predictions match the actual target data.
2. Try to understand the process; however, in practice you will generally use a library that implements
forward and backward propagation.
4. If a variable z depends on the variable y, and y in turn depends on the variable x, then z
depends on x, and the chain rule can be written as: dz/dx = (dz/dy) * (dy/dx). A small numeric
sketch of this rule follows the questions below.
a. If you have gone through 4 iterations of calculating slopes (using backward propagation)
and then updated weights, how many times must you have done forward propagation?
i. 0
ii. 1
iii. 4
iv. 8
b. If your predictions were all exactly right, and your errors were all exactly 0, the slope of
the loss function with respect to your predictions would also be 0. In that circumstance,
which of the following statements would be correct?
ii. The updates to all weights in the network would be dependent on the activation
functions.
iii. The updates to all weights in the network would be proportional to values from
the input data.
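As a small numeric sketch of the chain rule above (the one-weight network and all values are assumed purely for illustration), the slope of the loss with respect to a weight is the product of the local derivatives along the path from the weight to the loss:
# Tiny network: x -> y = w1 * x -> loss z = (y - target)**2   (values assumed)
x, w1, target = 2.0, 0.5, 3.0
y = w1 * x                    # forward pass
z = (y - target) ** 2         # squared-error loss
# Chain rule: dz/dw1 = dz/dy * dy/dw1
dz_dy = 2 * (y - target)      # = -4.0
dy_dw1 = x                    # =  2.0
dz_dw1 = dz_dy * dy_dw1       # = -8.0
# Gradient descent steps against the slope, so here w1 is increased
w1_updated = w1 - 0.01 * dz_dw1
print(dz_dw1, w1_updated)     # -8.0 0.58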
33.8 Creating a Keras Regression Model
1. Specify Architecture
v. Define output
2. Compile
i. Define optimizer
3. Fit
i. Applying backpropagation
4. Predict
import pandas as pd
df = pd.read_csv("hourly_wages.csv")
df   # inspect the data
predictors = df[df.columns[[1,2,3,4,5,6,7,8,9]]].values   # columns 1-9 are the predictor variables
target = df[df.columns[0]].values                          # column 0 (the hourly wage) is the target
# Import necessary modules
import keras
from keras.models import Sequential
from keras.layers import Dense
n_cols = predictors.shape[1]
# Specify the model: one hidden layer and a single output node for the wage
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(n_cols,)))
model.add(Dense(1))
a. To compile the model, you need to specify the optimizer and loss function
model.compile(optimizer='adam', loss='mean_squared_error')  # accuracy is not meaningful for regression, so only the loss is tracked
model.fit(predictors, target)
5. Step 4 of 4: predict
model.predict(predictors)
33.9 Creating Keras Classification Models
4. The output layer has a separate node for each possible outcome and uses the ‘softmax’
activation (a quick numeric sketch of softmax appears after the process list below).
Process:
2. You will use predictors such as age, fare and where each passenger embarked from to
predict who will survive.
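As a quick numeric sketch of what the softmax output layer does (the two node scores below are assumed values): it turns the raw scores of the output nodes into probabilities that sum to 1, one per possible outcome.
import numpy as np
scores = np.array([1.2, 0.3])                  # raw outputs of the two output nodes
probs = np.exp(scores) / np.exp(scores).sum()  # softmax
print(probs)                                   # approximately [0.71 0.29]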
# Understand the data
import pandas as pd
from keras.utils import to_categorical
df = pd.read_csv("titanic_all_numeric_train.csv")
predictors = df.drop(['survived'], axis=1).values   # every column except the target
target = to_categorical(df.survived)                # one-hot encode the two outcomes
test_data = pd.read_csv("titanic_all_numeric_test.csv").values   # assumed to contain only the predictor columns
# Build the model
import keras
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
n_cols = predictors.shape[1]
# Add the first (hidden) layer and the output layer (hidden-layer size assumed)
model.add(Dense(32, activation='relu', input_shape=(n_cols,)))
model.add(Dense(2, activation='softmax'))
# Compile and fit the model
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(predictors, target)
1. Save
2. Reload
3. Make predictions
from keras.models import load_model
model.save('model_file.h5')                 # 1. Save the trained model
my_model = load_model('model_file.h5')      # 2. Reload it from disk
predictions = my_model.predict(test_data)   # 3. Make predictions on the test data
predicted_prob_true = predictions[:, 1]     # probability of the positive class
print(predicted_prob_true)
33.11 Understanding Model Optimization
c. Updates too small (if learning rate is low) or too large (if learning rate is high)
2. Scenario: Try to optimize a model at a very low learning rate, a very high learning rate,
and a "just right" learning rate. Look at the results after running this exercise,
remembering that a low value for the loss function is good.
input_shape = (10,)    # 10 predictor columns, as in the classification example
type(input_shape)      # a tuple
input_shape

def get_new_model():
    # Return a fresh, uncompiled model so every learning rate starts from the same point
    model = Sequential()
    model.add(Dense(100, activation='relu', input_shape=input_shape))
    model.add(Dense(2, activation='softmax'))
    return model
from keras.optimizers import SGD
lr_to_test = [0.000001, 0.01, 1]     # very low, "just right", very high (values illustrative)
for lr in lr_to_test:
    print('\n\nTesting model with learning rate: %f\n' % lr)
    model = get_new_model()
    my_optimizer = SGD(lr=lr)
    model.compile(optimizer=my_optimizer, loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(predictors, target)
3. Which of the following could prevent a model from showing an improved loss in its first
few epochs?
33.12 Model Validation
n_cols = predictors.shape[1]
input_shape = (n_cols,)
model = Sequential()
model.add(Dense(100, activation='relu', input_shape=input_shape))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
# Hold out part of the data for validation while fitting (the 30% split is illustrative)
model.fit(predictors, target, validation_split=0.3)
# Import EarlyStopping
from keras.callbacks import EarlyStopping
n_cols = predictors.shape[1]
input_shape = (n_cols,)
# Specify the model
model = Sequential()
model.add(Dense(100, activation='relu', input_shape=input_shape))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Define early_stopping_monitor and pass it to fit() so training stops once the
# validation loss fails to improve for 2 consecutive epochs
early_stopping_monitor = EarlyStopping(patience=2)
model.fit(predictors, target, validation_split=0.3, epochs=30,
          callbacks=[early_stopping_monitor])   # epoch count and split are illustrative
# Define early_stopping_monitor for the wider-vs-narrower comparison
early_stopping_monitor = EarlyStopping(patience=2)
# Create the smaller model: model_1
model_1 = Sequential()
model_1.add(Dense(10, activation='relu', input_shape=input_shape))
model_1.add(Dense(2, activation='softmax'))
# Compile model_1
model_1.compile(optimizer='adam', loss='categorical_crossentropy',
                metrics=['accuracy'])
# Create the new (wider) model: model_2
model_2 = Sequential()
model_2.add(Dense(100, activation='relu', input_shape=input_shape))
model_2.add(Dense(2, activation='softmax'))
# Compile model_2
model_2.compile(optimizer='adam', loss='categorical_crossentropy',
                metrics=['accuracy'])
# Fit model_1 and model_2, keeping the training histories (epoch count and split are illustrative)
model_1_training = model_1.fit(predictors, target, epochs=15, validation_split=0.2,
                               callbacks=[early_stopping_monitor], verbose=False)
model_2_training = model_2.fit(predictors, target, epochs=15, validation_split=0.2,
                               callbacks=[early_stopping_monitor], verbose=False)
# Plot the validation loss of both models
plt.plot(model_1_training.history['val_loss'], 'r', model_2_training.history['val_loss'], 'b')
plt.xlabel('Epochs')
plt.ylabel('Validation score')
plt.show()
5. Note: model_2 (the blue line in the graph) has lower validation loss, so it is the better model.
a. In the above exercise you have seen how to experiment with wider networks. In this
exercise, you will try a deeper network (more hidden layers).
input_shape = (n_cols,)
# Create model_1: the shallower network (hidden-layer width assumed to match model_2)
model_1 = Sequential()
model_1.add(Dense(50, activation='relu', input_shape=input_shape))
model_1.add(Dense(2, activation='softmax'))
# Compile model_1
model_1.compile(optimizer='adam', loss='categorical_crossentropy',
                metrics=['accuracy'])
# Create model_2: the deeper network with two hidden layers
model_2 = Sequential()
model_2.add(Dense(50, activation='relu', input_shape=input_shape))
model_2.add(Dense(50, activation='relu'))
model_2.add(Dense(2, activation='softmax'))
# Compile model_2
model_2.compile(optimizer='adam', loss='categorical_crossentropy',
                metrics=['accuracy'])
# Fit model 1 and model 2, keeping the training histories for the plot below
model_1_training = model_1.fit(predictors, target, epochs=15, validation_split=0.2,
                               callbacks=[early_stopping_monitor], verbose=False)
model_2_training = model_2.fit(predictors, target, epochs=15, validation_split=0.2,
                               callbacks=[early_stopping_monitor], verbose=False)
# Create the plot comparing validation loss for the two models
plt.plot(model_1_training.history['val_loss'], 'r', model_2_training.history['val_loss'], 'b')
plt.xlabel('Epochs')
plt.ylabel('Validation score')
plt.show()
33.13 Model Capacity
Hidden Layers   Nodes Per Layer   Mean Squared Error   Next Step
1               100               5.4                  Increase Capacity
1               250               4.8                  Increase Capacity
2               250               4.4                  Increase Capacity
3               250               4.5                  Decrease Capacity
3               200               4.3                  Done
2. If we do not check the capacity step by step as shown above, there is a chance of overfitting the model; a minimal sketch of this tuning loop follows.
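The sketch below assumes predictors and target are already loaded as in the earlier regression example; the epoch count, validation split and the helper function validation_mse are illustrative, not part of the notes.
from keras.models import Sequential
from keras.layers import Dense

def validation_mse(n_hidden_layers, nodes_per_layer):
    # Build, train and score one candidate architecture
    model = Sequential()
    model.add(Dense(nodes_per_layer, activation='relu', input_shape=(predictors.shape[1],)))
    for _ in range(n_hidden_layers - 1):
        model.add(Dense(nodes_per_layer, activation='relu'))
    model.add(Dense(1))                                   # single regression output node
    model.compile(optimizer='adam', loss='mean_squared_error')
    history = model.fit(predictors, target, validation_split=0.3,
                        epochs=20, verbose=False)
    return history.history['val_loss'][-1]                # validation MSE after training

# Walk through the capacity steps from the table and compare validation MSE
for layers, nodes in [(1, 100), (1, 250), (2, 250), (3, 250), (3, 200)]:
    print(layers, nodes, validation_mse(layers, nodes))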
### Level 06 of 08: Project on Deep Learning ###
import numpy as np
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'      # silence TensorFlow info/warning logs
os.environ['CUDA_VISIBLE_DEVICES'] = ''       # force CPU-only execution
# keras imports for the dataset and building our neural network
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.utils import np_utils
# Load the MNIST digits and plot the first nine training images
(X_train, y_train), (X_test, y_test) = mnist.load_data()
fig = plt.figure()
for i in range(9):
    plt.subplot(3, 3, i+1)
    plt.imshow(X_train[i], cmap='gray', interpolation='none')
    plt.tight_layout()
    plt.title("Class {}".format(y_train[i]))
    plt.xticks([])
    plt.yticks([])
fig
# In order to train our neural network to classify images, we first have to unroll the
# height × width pixel format into one big vector - the input vector. So its length
# must be 28 * 28 = 784. But let's graph the distribution of our pixel values first.
fig = plt.figure()
plt.subplot(2, 1, 1)
plt.imshow(X_train[0], cmap='gray', interpolation='none')
plt.title("Class {}".format(y_train[0]))
plt.xticks([])
plt.yticks([])
plt.subplot(2, 1, 2)
plt.hist(X_train[0].reshape(784))
plt.title("Pixel Value Distribution")
fig
# Note that the pixel values range from 0 to 255: the background majority is close to 0,
# and the values close to 255 represent the digit.
# Normalizing the input data helps to speed up the training. Also, it reduces the chance of
# getting stuck in local optima, since we're using stochastic gradient descent to find the
# optimal weights for the network.
# Let's reshape our inputs to a single vector and normalize the pixel values to lie
# between 0 and 1.
print("X_train shape", X_train.shape)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(np.unique(y_train, return_counts=True))
# Let's encode our categories - digits from 0 to 9 - using one-hot encoding. The result is a
# vector with a length equal to the number of categories. The vector is all zeroes except in
# the position for the respective category. Thus a '5' will be represented by [0,0,0,0,0,1,0,0,0,0].
n_classes = 10
Y_train = np_utils.to_categorical(y_train, n_classes)
Y_test = np_utils.to_categorical(y_test, n_classes)
1. Our pixel vector serves as the input. Then, two hidden 512-node layers, with enough
model complexity for recognizing digits. For the multi-class classification we add another
densely-connected (or fully-connected) layer for the 10 different output classes. For this
network architecture we can use the Keras Sequential Model. We can stack layers using
the .add() method.
2. When adding the first layer in the Sequential Model we need to specify the input shape
so Keras can create the appropriate matrices. For all remaining layers the shape is
inferred automatically.
3. In order to introduce nonlinearities into the network and elevate it beyond the capabilities
of a simple perceptron we also add activation functions to the hidden layers. The
differentiation for the training via backpropagation is happening behind the scenes
without having to implement the details.
4. We also add dropout as a way to prevent overfitting. Here we randomly keep some
network weights fixed when we would normally update them so that the network doesn't
rely too much on very few nodes.
5. The last layer consists of connections for our 10 classes and the softmax activation
which is standard for multi-class targets.
model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))
# Compile the model (optimizer choice assumed) and train it, keeping the history for plotting
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
history = model.fit(X_train, Y_train,
                    batch_size=128, epochs=8,
                    verbose=2,
                    validation_data=(X_test, Y_test))
model_name = 'keras_mnist.h5'
model_path = os.path.join(os.getcwd(), model_name)   # save in the current working directory
model.save(model_path)
fig = plt.figure()
plt.subplot(2,1,1)
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.subplot(2,1,2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.tight_layout()
fig
Note: This learning curve looks quite good! We see that the loss on the training set is
decreasing rapidly for the first two epochs. This shows the network is learning to classify
the digits pretty fast. For the test set the loss does not decrease as fast but stays roughly
within the same range as the training loss. This means our model generalizes well to
unseen data.
Step 5 of 5: Evaluate the Model Performance
# Reload the saved model and check its predictions on the test set
from keras.models import load_model
mnist_model = load_model(model_path)
predicted_classes = mnist_model.predict_classes(X_test)
correct_indices = np.nonzero(predicted_classes == y_test)[0]
incorrect_indices = np.nonzero(predicted_classes != y_test)[0]
print(len(correct_indices), "classified correctly")
print(len(incorrect_indices), "classified incorrectly")
# Plot 9 correctly and 9 incorrectly classified test digits
plt.figure()
for i, correct in enumerate(correct_indices[:9]):
    plt.subplot(6, 3, i+1)
    plt.imshow(X_test[correct].reshape(28, 28), cmap='gray')
    plt.xticks([])
    plt.yticks([])
for i, incorrect in enumerate(incorrect_indices[:9]):
    plt.subplot(6, 3, i+10)
    plt.imshow(X_test[incorrect].reshape(28, 28), cmap='gray')
    plt.xticks([])
    plt.yticks([])
# Import matplotlib
import matplotlib.pyplot as plt
# Load an image as an array and display it (the file name is assumed for illustration)
data = plt.imread('bricks.png')
plt.imshow(data)
plt.show()
# Paint a 40x40 red square in the top-left corner: set the red channel to 1
# and the green and blue channels to 0
data[:40, :40, 0] = 1
data[:40, :40, 1] = 0
data[:40, :40, 2] = 0
plt.imshow(data)
plt.show()
import numpy as np
# Example labels and categories (values assumed for illustration)
labels = np.array(['shirt', 'dress', 'shoe', 'shirt'])
categories = np.array(['shirt', 'dress', 'shoe'])
n_categories = 3
one_hot_encoding_labels = np.zeros((len(labels), n_categories))
for ii in range(len(labels)):
    # Find the location of this label in the categories variable
    jj = np.where(categories == labels[ii])
    # Set that location in this row to 1
    one_hot_encoding_labels[ii, jj] = 1
one_hot_encoding_labels