This Python script implements a single-layer perceptron (a basic neural
network) using TensorFlow 2 to classify handwritten digits from the MNIST
dataset. Below is a detailed explanation of the code, broken down into its main
components:
1. Importing Libraries
import tensorflow as tf
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt
tensorflow: The main library for building and training neural networks.
mnist: A dataset of 28x28 grayscale images of handwritten digits (0–9),
included in TensorFlow’s Keras API.
matplotlib.pyplot: Used to visualize the training progress by plotting the
loss (cost) over epochs.
2. Loading and Preprocessing the MNIST Dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
Loading the dataset:
o mnist.load_data() retrieves the MNIST dataset, splitting it into:
x_train, y_train: Training data (60,000 images and their
labels).
x_test, y_test: Test data (10,000 images and their labels).
o Each image is a 28x28 pixel array, and each label is an integer (0–9)
representing the digit.
Preprocessing the input data (x_train, x_test):
o reshape(-1, 784): Flattens each 28x28 image into a 1D array of 784
values (28 × 28 = 784).
o .astype('float32'): Converts the pixel values to 32-bit floats (TensorFlow's
default dtype) so that the division below produces fractional values.
o / 255.0: Normalizes pixel values from [0, 255] to [0, 1] to improve
training.
Preprocessing the labels (y_train, y_test):
o to_categorical(y_train, 10): Converts integer labels (e.g., 5) into
one-hot encoded vectors (e.g., [0, 0, 0, 0, 0, 1, 0, 0, 0, 0] for digit
5). The 10 specifies the number of classes (digits 0–9).
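As a quick illustration (a minimal sketch, separate from the original script, reusing the names defined above):
import tensorflow as tf
print(tf.keras.utils.to_categorical([5], 10))
# [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]]
# After preprocessing, the arrays have these shapes:
# x_train: (60000, 784)   y_train: (60000, 10)
# x_test:  (10000, 784)   y_test:  (10000, 10)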
3. Defining Training Parameters
training_epochs = 25
learning_rate = 0.01
batch_size = 100
display_step = 1
training_epochs: Number of times the model will iterate over the entire
training dataset (25 epochs).
learning_rate: Step size for gradient descent optimization (0.01).
batch_size: Number of samples processed before updating the model’s
weights (100 images per batch).
display_step: Frequency (in epochs) for printing training progress (every
epoch).
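A short worked example of what these parameters imply (a sketch, not part of the original script; it reuses the variables defined above):
steps_per_epoch = 60000 // batch_size                # 600 weight updates per epoch
total_updates = training_epochs * steps_per_epoch    # 25 * 600 = 15,000 updates overall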
4. Building the Model
model = tf.keras.Sequential([
tf.keras.layers.Dense(10, activation='softmax', input_shape=(784,))
])
The model is a single-layer perceptron built using Keras’ Sequential API:
o tf.keras.layers.Dense: A fully connected layer with:
10 units (one for each digit class, 0–9).
activation='softmax': Applies the softmax function to output
probabilities for each class, summing to 1.
input_shape=(784,): Specifies that each input is a 784-dimensional vector (flattened image).
This is a simple neural network with no hidden layers, directly mapping
the 784 input features to 10 output classes.
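The layer's size can be confirmed with model.summary() once the model is defined; the parameter count follows directly from the shapes described above (a small sketch for illustration):
model.summary()
# Dense layer: 784 inputs * 10 units + 10 biases = 7,850 trainable parameters
# Conceptually the layer computes softmax(x @ W + b), with W of shape (784, 10) and b of shape (10,).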
5. Compiling the Model
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
optimizer=tf.keras.optimizers.SGD(...): Uses stochastic gradient descent with the
specified learning_rate (0.01) to update the weights.
loss='categorical_crossentropy': The loss function, suitable for multi-class
classification with one-hot encoded labels. It measures the difference
between predicted and actual class probabilities.
metrics=['accuracy']: Tracks classification accuracy during training and
evaluation.
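With one-hot labels, categorical cross-entropy reduces to the negative log of the probability the model assigns to the true class. A hand-computed illustration (NumPy only; the probability values are made up for the example):
import numpy as np
y_true = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])   # one-hot label for digit 5
y_pred = np.array([0.02, 0.01, 0.05, 0.02, 0.05, 0.70, 0.05, 0.03, 0.04, 0.03])
loss = -np.sum(y_true * np.log(y_pred))              # = -log(0.70) ≈ 0.357
print(loss)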
6. Training the Model
avg_set = []
epoch_set = []
for epoch in range(training_epochs):
    avg_cost = 0.
    total_batch = int(x_train.shape[0] / batch_size)
    dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(batch_size)
    for batch_xs, batch_ys in dataset:
        loss, _ = model.train_on_batch(batch_xs, batch_ys)
        avg_cost += loss / total_batch
    if epoch % display_step == 0:
        print(f"Epoch: {epoch+1:04d} cost={avg_cost:.9f}")
        avg_set.append(avg_cost)
        epoch_set.append(epoch+1)
Initialization:
o avg_set and epoch_set: Lists to store the average loss and epoch
numbers for plotting.
o total_batch: Calculates the number of batches (60,000 training
samples / 100 = 600 batches).
Creating a dataset:
o tf.data.Dataset.from_tensor_slices((x_train, y_train)): Creates a
TensorFlow dataset from the training data.
o .batch(batch_size): Groups the data into batches of 100 samples.
Training loop:
o Iterates over training_epochs (25 times).
o For each epoch:
Loops through batches using the dataset.
model.train_on_batch(batch_xs, batch_ys): Performs one
gradient update on a single batch and returns the loss and
accuracy; the loop keeps the loss and discards the accuracy.
Accumulates the average loss (avg_cost) by summing each
batch loss divided by total_batch.
o Every display_step (1) epoch, prints the epoch number and
average loss, and stores them in epoch_set and avg_set.
Output:
o Prints progress like Epoch: 0001 cost=0.123456789 to show how
the loss decreases over epochs.
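The same training could be written more compactly with model.fit, which also records the per-epoch loss for plotting. A rough equivalent sketch (note that fit shuffles the data each epoch, which the manual loop above does not):
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=training_epochs,
                    verbose=1)
avg_set = history.history['loss']
epoch_set = list(range(1, training_epochs + 1))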
7. Evaluating the Model
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"MODEL accuracy: {test_accuracy:.4f}")
model.evaluate(x_test, y_test, verbose=0): Evaluates the model on the
test dataset, returning the test loss and accuracy.
verbose=0: Suppresses detailed output during evaluation.
Prints the test accuracy (e.g., MODEL accuracy: 0.9265), indicating the
percentage of correctly classified test images.
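Individual predictions can also be inspected with model.predict; a small sketch (argmax recovers the predicted digit from the softmax probabilities and undoes the one-hot encoding for the true labels):
import numpy as np
probs = model.predict(x_test[:5])        # shape (5, 10): class probabilities
predicted = np.argmax(probs, axis=1)     # predicted digit for each image
actual = np.argmax(y_test[:5], axis=1)   # true digit for each image
print(predicted, actual)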
8. Plotting Training Progress
plt.plot(epoch_set, avg_set, 'o', label='Logistic Regression Training phase')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend()
plt.show()
Uses Matplotlib to plot the training loss (avg_set) against epochs
(epoch_set).
'o': Plots data points as circles.
Labels the axes (cost for y-axis, epoch for x-axis) and adds a legend.
Displays a graph showing how the loss decreases over time, indicating
the model is learning.
What the Code Does
The script trains a single-layer perceptron to classify MNIST digits (0–9)
using a fully connected layer with softmax activation.
It preprocesses the data by flattening and normalizing images and
converting labels to one-hot encoding.
The model is trained for 25 epochs using SGD with a learning rate of 0.01
and batch size of 100.
It tracks and plots the training loss and evaluates the model’s accuracy
on the test set (typically ~90–92% for this simple model).
The model is equivalent to logistic regression for multi-class
classification, as it has no hidden layers and uses softmax with
categorical cross-entropy loss.
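In formula terms, the model performs multinomial (softmax) logistic regression: for an input vector x, the probability assigned to class j is

P(y = j | x) = exp(w_j · x + b_j) / sum_k exp(w_k · x + b_k),

where w_j and b_j are the j-th column of the Dense layer's weight matrix and bias vector, and the sum runs over the 10 classes. Training minimizes the categorical cross-entropy between these probabilities and the one-hot labels.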