
This is a companion notebook for the book Deep Learning with Python, Second Edition. For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode. If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.

This notebook was generated for TensorFlow 2.6.

Fundamentals of machine learning


Generalization: The goal of machine learning
Underfitting and overfitting

Noisy training data

Ambiguous features

Rare features and spurious correlations


Adding white-noise channels or all-zeros channels to MNIST
from tensorflow.keras.datasets import mnist
import numpy as np

(train_images, train_labels), _ = mnist.load_data()


train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255

train_images_with_noise_channels = np.concatenate(
    [train_images, np.random.random((len(train_images), 784))], axis=1)

train_images_with_zeros_channels = np.concatenate(
    [train_images, np.zeros((len(train_images), 784))], axis=1)

Training the same model on MNIST data with noise channels or all-zero channels
from tensorflow import keras
from tensorflow.keras import layers

def get_model():
    model = keras.Sequential([
        layers.Dense(512, activation="relu"),
        layers.Dense(10, activation="softmax")
    ])
    model.compile(optimizer="rmsprop",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = get_model()
history_noise = model.fit(
    train_images_with_noise_channels, train_labels,
    epochs=10,
    batch_size=128,
    validation_split=0.2)

model = get_model()
history_zeros = model.fit(
    train_images_with_zeros_channels, train_labels,
    epochs=10,
    batch_size=128,
    validation_split=0.2)

Plotting a validation accuracy comparison


import matplotlib.pyplot as plt
val_acc_noise = history_noise.history["val_accuracy"]
val_acc_zeros = history_zeros.history["val_accuracy"]
epochs = range(1, 11)
plt.plot(epochs, val_acc_noise, "b-",
         label="Validation accuracy with noise channels")
plt.plot(epochs, val_acc_zeros, "b--",
         label="Validation accuracy with zeros channels")
plt.title("Effect of noise channels on validation accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()

The nature of generalization in deep learning


Fitting an MNIST model with randomly shuffled labels
(train_images, train_labels), _ = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255

random_train_labels = train_labels.copy()  # shuffle a copy so the original labels stay intact
np.random.shuffle(random_train_labels)

model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_images, random_train_labels,
          epochs=100,
          batch_size=128,
          validation_split=0.2)

The manifold hypothesis

Interpolation as a source of generalization

Why deep learning works

Training data is paramount

Evaluating machine-learning models


Training, validation, and test sets

Simple hold-out validation
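The notebook omits the book's pseudocode for this section. The block below is a minimal sketch of hold-out validation, assuming hypothetical data and labels arrays and reusing the get_model() factory defined above.

num_validation_samples = 10000

# Shuffle, then carve a validation set off the front of the data.
indices = np.random.permutation(len(data))
data, labels = data[indices], labels[indices]

val_data = data[:num_validation_samples]
val_labels = labels[:num_validation_samples]
training_data = data[num_validation_samples:]
training_labels = labels[num_validation_samples:]

model = get_model()
model.fit(training_data, training_labels, epochs=10, batch_size=128)
# Evaluate on the held-out data; evaluate() returns [loss, accuracy].
validation_score = model.evaluate(val_data, val_labels)[1]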

K-fold validation
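Likewise omitted from the notebook; here is a minimal K-fold validation sketch, under the same assumptions (hypothetical data and labels arrays, get_model() from above).

k = 3
num_validation_samples = len(data) // k
validation_scores = []

for fold in range(k):
    # The fold-th slice is the validation set; everything else is training data.
    val_slice = slice(fold * num_validation_samples, (fold + 1) * num_validation_samples)
    val_data, val_labels = data[val_slice], labels[val_slice]
    training_data = np.concatenate([data[:val_slice.start], data[val_slice.stop:]], axis=0)
    training_labels = np.concatenate([labels[:val_slice.start], labels[val_slice.stop:]], axis=0)

    model = get_model()
    model.fit(training_data, training_labels, epochs=10, batch_size=128)
    validation_scores.append(model.evaluate(val_data, val_labels)[1])

# The final validation score is the average of the per-fold scores.
validation_score = np.mean(validation_scores)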

Iterated K-fold validation with shuffling
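Iterated K-fold validation repeats the K-fold loop several times, reshuffling the data before each run, and averages every score obtained. A sketch reusing k, num_validation_samples, and the assumed data and labels arrays from the K-fold example above:

num_iterations = 5
all_scores = []

for iteration in range(num_iterations):
    # Reshuffle before each round of K-fold validation.
    indices = np.random.permutation(len(data))
    data, labels = data[indices], labels[indices]
    for fold in range(k):
        val_slice = slice(fold * num_validation_samples, (fold + 1) * num_validation_samples)
        val_data, val_labels = data[val_slice], labels[val_slice]
        training_data = np.concatenate([data[:val_slice.start], data[val_slice.stop:]], axis=0)
        training_labels = np.concatenate([labels[:val_slice.start], labels[val_slice.stop:]], axis=0)
        model = get_model()
        model.fit(training_data, training_labels, epochs=10, batch_size=128)
        all_scores.append(model.evaluate(val_data, val_labels)[1])

validation_score = np.mean(all_scores)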

Beating a common-sense baseline
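One simple common-sense baseline (not part of the notebook) is the accuracy of always predicting the most frequent class. A quick check on the integer MNIST labels loaded above:

# Accuracy of a trivial classifier that always predicts the majority class.
counts = np.bincount(train_labels)
baseline_accuracy = counts.max() / len(train_labels)
print(f"Common-sense baseline accuracy: {baseline_accuracy:.3f}")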

Things to keep in mind about model evaluation

Improving model fit


Tuning key gradient descent parameters
Training an MNIST model with an excessively high learning rate
(train_images, train_labels), _ = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255

model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])
model.compile(optimizer=keras.optimizers.RMSprop(1.),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_images, train_labels,
          epochs=10,
          batch_size=128,
          validation_split=0.2)

The same model with a more appropriate learning rate


model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])
model.compile(optimizer=keras.optimizers.RMSprop(1e-2),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_images, train_labels,
          epochs=10,
          batch_size=128,
          validation_split=0.2)

Leveraging better architecture priors
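The notebook has no code here; as an illustration only, the hypothetical model below encodes a better architecture prior for image data (translation invariance via convolutions) than the densely connected stacks used so far. It expects the flattened MNIST inputs from above and reshapes them back into 28x28 images.

conv_model = keras.Sequential([
    layers.Reshape((28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu"),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
conv_model.compile(optimizer="rmsprop",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])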

Increasing model capacity


A simple logistic regression on MNIST
model = keras.Sequential([layers.Dense(10, activation="softmax")])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history_small_model = model.fit(
    train_images, train_labels,
    epochs=20,
    batch_size=128,
    validation_split=0.2)

import matplotlib.pyplot as plt


val_loss = history_small_model.history["val_loss"]
epochs = range(1, 21)
plt.plot(epochs, val_loss, "b--",
         label="Validation loss")
plt.title("Effect of insufficient model capacity on validation loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()

model = keras.Sequential([
    layers.Dense(96, activation="relu"),
    layers.Dense(96, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history_large_model = model.fit(
    train_images, train_labels,
    epochs=20,
    batch_size=128,
    validation_split=0.2)

Improving generalization
Dataset curation

Feature engineering
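The book illustrates feature engineering with the example of reading the time from a clock image: rather than raw pixels, feed the coordinates of the hand tips, or better yet their angles. A hypothetical helper (not in the notebook) for that last step:

import math

def clock_hand_angle(x, y):
    # Convert a hand-tip (x, y) coordinate, measured from the clock center,
    # into an angle: a far more useful feature than raw pixel values.
    return math.atan2(y, x)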

Using early stopping
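The notebook leaves this section code-free. In Keras, early stopping is typically handled with the EarlyStopping callback; a sketch applied to the MNIST setup above, with arbitrarily chosen settings:

callbacks = [
    keras.callbacks.EarlyStopping(
        monitor="val_accuracy",
        patience=2,
        restore_best_weights=True),
]
model = get_model()
model.fit(train_images, train_labels,
          epochs=100,
          batch_size=128,
          validation_split=0.2,
          callbacks=callbacks)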

Regularizing your model

Reducing the network's size


Original model
from tensorflow.keras.datasets import imdb
(train_data, train_labels), _ = imdb.load_data(num_words=10000)

def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

train_data = vectorize_sequences(train_data)

model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_original = model.fit(train_data, train_labels,
                             epochs=20, batch_size=512, validation_split=0.4)

Version of the model with lower capacity


model = keras.Sequential([
    layers.Dense(4, activation="relu"),
    layers.Dense(4, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_smaller_model = model.fit(
    train_data, train_labels,
    epochs=20, batch_size=512, validation_split=0.4)

Version of the model with higher capacity


model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_larger_model = model.fit(
    train_data, train_labels,
    epochs=20, batch_size=512, validation_split=0.4)

Adding weight regularization


Adding L2 weight regularization to the model
from tensorflow.keras import regularizers
model = keras.Sequential([
    layers.Dense(16,
                 kernel_regularizer=regularizers.l2(0.002),
                 activation="relu"),
    layers.Dense(16,
                 kernel_regularizer=regularizers.l2(0.002),
                 activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_l2_reg = model.fit(
    train_data, train_labels,
    epochs=20, batch_size=512, validation_split=0.4)

Different weight regularizers available in Keras


from tensorflow.keras import regularizers
regularizers.l1(0.001)
regularizers.l1_l2(l1=0.001, l2=0.001)

Adding dropout
Adding dropout to the IMDB model
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_dropout = model.fit(
    train_data, train_labels,
    epochs=20, batch_size=512, validation_split=0.4)

Summary
