MNIST Classification Report

The document outlines a project focused on classifying handwritten digits using a convolutional neural network (CNN) on the MNIST dataset, which consists of 70,000 labeled grayscale images. It details the methodology including data preparation, model architecture, training strategy, and evaluation using 5-fold cross-validation, ultimately achieving robust performance. The project was developed in Google Colab, emphasizing the effectiveness of deep learning techniques in image classification tasks.

ASSIGNMENT-1

Harshyara Bukkapatnam
ENG21CS0085
7th Semester B

November 6, 2024

Sequence Networks and GAN
Prof. Arjun KrishnaMurthy

CNN for MNIST Handwritten Digit Classification

Dataset

The MNIST handwritten digit classification problem is built on a standard dataset used in computer vision and deep learning. Although the dataset is effectively solved, it can be used as the basis for learning and practicing how to develop, evaluate, and use convolutional deep learning neural networks for image classification from scratch. This includes how to develop a robust test harness for estimating the performance of the model, how to explore improvements to the model, and how to save the model and later load it to make predictions on new data.

MNIST is a widely used dataset for the handwritten digit classification task. It consists of 70,000 labelled 28x28 pixel grayscale images of handwritten digits, split into 60,000 training images and 10,000 test images. There are 10 classes, one for each of the 10 digits. The task at hand is to train a model using the 60,000 training images and subsequently test its classification accuracy on the 10,000 test images.

The copy of the dataset used here is the MNIST digits classification dataset provided by Keras, a deep learning API written in Python.
Model Methodology

The methodology for this project involves constructing and evaluating a convolutional neural network (CNN) to classify handwritten digits from the MNIST dataset. The process is structured as follows:

Data Preparation:

• Dataset Loading and Preprocessing: The MNIST dataset is loaded from tensorflow.keras.datasets and consists of 28x28 grayscale images with digit labels (0–9). Images are reshaped to a single channel for CNN compatibility, and labels are one-hot encoded for categorical classification.

• Normalization: Pixel values, originally in the range [0, 255], are scaled to [0, 1] to enhance convergence during training.

Model Architecture:

• CNN Structure: A CNN model is constructed using Sequential from tensorflow.keras, with layers designed for feature extraction and classification:

o Convolutional Layers: Two convolutional layers (32 and 64 filters, respectively) apply a 3x3 kernel with ReLU activation and he_uniform initialization, each followed by batch normalization and max pooling.

o Dropout Layers: Dropout (0.2 and 0.3) is added to mitigate overfitting by randomly disabling neurons during training.

o Dense Layers: A fully connected dense layer with 100 neurons, followed by batch normalization and dropout (0.5), is applied before the final output layer.

o Output Layer: A dense layer with 10 neurons and softmax activation outputs class probabilities.

• Compilation: The model is compiled using Stochastic Gradient Descent (SGD) with a learning rate of 0.01 and momentum of 0.9, with categorical cross-entropy as the loss function.

Evaluation Approach (k-Fold Cross-Validation):

• Cross-Validation: The model is evaluated using 5-fold cross-validation, where the dataset is divided into five subsets. For each fold, four subsets are used for training and one for testing. This approach provides a robust estimate of model performance by assessing variability across different splits.

• Training and Testing: Within each fold, the model trains for 10 epochs with a batch size of 32. Accuracy is recorded for both training and validation data, enabling performance comparison across folds.

Diagnostics and Performance Summary:

• Learning Curves: Training and validation losses and accuracies are plotted for each fold to visualize model learning and identify potential overfitting or underfitting.

• Accuracy Summary: Final performance is summarized by calculating the mean and standard deviation of accuracies across all folds, offering a consolidated view of the model's generalization ability.

Development Environment
Google Colab was used as the development environment for this project,
providing a cloud-based Jupyter notebook interface with pre-installed
libraries for deep learning, such as TensorFlow and Keras. It enables
seamless access to GPU acceleration, enhancing model training efficiency
on the MNIST dataset. Additionally, Colab's collaborative features
facilitate code sharing and documentation, streamlining the development
and testing process.

Principle
The principle behind this CNN model is to classify handwritten digits by
progressively learning spatial hierarchies of features through
convolutional layers. The model leverages convolution to capture local
patterns, like edges and textures, which are essential for recognizing digit
shapes. Max pooling layers down-sample these features, reducing
computational complexity while preserving key information.

Regularization techniques such as dropout prevent overfitting by adding noise to the network, enhancing generalization. Finally, the model uses softmax activation to output class probabilities, enabling accurate digit classification.
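For reference, the softmax activation converts the raw outputs z_1, ..., z_10 of the final layer into class probabilities:

p_i = e^{z_i} / \sum_{j=1}^{10} e^{z_j},   i = 1, ..., 10

so the ten outputs are non-negative and sum to one, and the predicted digit is the class with the highest probability.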

Developing a Model
Importing Libraries
To develop the convolutional neural network (CNN) model for digit
classification, we first import essential libraries:

• NumPy: Used for efficient numerical operations, particularly matrix manipulations, which are crucial in deep learning tasks.

• Matplotlib: A plotting library used to visualize learning curves and diagnostic plots, aiding in model evaluation and performance analysis.

• Scikit-Learn's KFold: Provides k-fold cross-validation to estimate model performance by training and testing on different data splits.

• TensorFlow and Keras Modules:

o Datasets (mnist): Loads the MNIST dataset, a widely used collection of handwritten digits, ideal for testing classification models.

o Utils (to_categorical): Converts class labels to one-hot encoded format, necessary for multi-class classification.

o Models (Sequential): Facilitates the construction of a layer-by-layer neural network model.

o Layers (Conv2D, MaxPooling2D, Dense, Flatten): Composes the CNN architecture, with Conv2D for feature extraction, MaxPooling2D for down-sampling, Dense for fully connected layers, and Flatten for reshaping data.

o Optimizers (SGD): Implements Stochastic Gradient Descent with learning rate and momentum adjustments, optimizing model convergence.

These libraries collectively provide the tools to preprocess data, build the
CNN model, train with cross-validation, and evaluate performance.
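A minimal import block corresponding to these libraries might look like the following (assuming a TensorFlow 2.x environment; Dropout and BatchNormalization are added here because the architecture described later uses them):

import numpy as np
from matplotlib import pyplot as plt
from sklearn.model_selection import KFold
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout, BatchNormalization
from tensorflow.keras.optimizers import SGD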

Data Loading and Preparation


• Dataset Loading: The model uses the MNIST dataset, loaded through tensorflow.keras.datasets. This dataset contains 28x28 grayscale images of handwritten digits (0–9), separated into training and test sets.

• Reshaping the Dataset: Each image is reshaped to include a single channel (28x28x1) to suit the CNN model's input requirements. This format allows the convolutional layers to process spatial relationships effectively within each image.

• One-Hot Encoding: Target labels (digit classes) are one-hot encoded, converting each label into a vector representation. This encoding is crucial for categorical classification, where the model predicts a probability for each class.

• Normalization: Pixel values, initially ranging from [0, 255], are normalized to [0, 1] by dividing by 255.0. Normalization aids in stabilizing and accelerating the training process, allowing the model to converge more efficiently by reducing variations in pixel intensity.

By performing these steps, the dataset is prepared for optimal performance within the CNN model, enhancing both accuracy and training speed.
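A sketch of these preparation steps, assuming the imports shown earlier; the helper names load_dataset and prep_pixels are illustrative choices, not names taken from the report:

def load_dataset():
    # Load the Keras-provided train/test split of MNIST
    (trainX, trainY), (testX, testY) = mnist.load_data()
    # Reshape to a single grayscale channel: (samples, 28, 28, 1)
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    # One-hot encode the 10 digit classes
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY

def prep_pixels(train, test):
    # Convert to floats and scale pixel values from [0, 255] to [0, 1]
    return train.astype('float32') / 255.0, test.astype('float32') / 255.0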

Model Architecture
The model is a convolutional neural network (CNN) designed to classify
images of handwritten digits from the MNIST dataset. The architecture is
structured to progressively capture features at multiple levels of
abstraction through several key layers:

• Convolutional Layers: The model begins with two convolutional layers. The first layer has 32 filters and the second has 64 filters, both with a 3x3 kernel size and ReLU activation. These layers learn spatial features in the image, such as edges and textures, crucial for digit recognition.

• Batch Normalization: Each convolutional layer is followed by batch normalization to stabilize and accelerate training by normalizing the inputs to each layer, which helps improve model accuracy.

• Max Pooling Layers: Max pooling layers follow each batch-normalized convolutional layer to down-sample feature maps, reducing the computational load and focusing on the most significant features.

• Dropout Layers: Dropout layers are included after each max pooling layer to reduce overfitting. A dropout rate of 0.2 is applied after the first convolutional block and 0.3 after the second.

• Fully Connected Layers: After flattening the feature maps, the model includes a dense layer with 100 units and ReLU activation to learn complex combinations of features before the output layer.

• Output Layer: A dense output layer with 10 neurons and softmax activation provides class probabilities, corresponding to the ten digit classes (0–9).

Model Compilation
The model is compiled with the following configurations:

• Optimizer: Stochastic Gradient Descent (SGD) with a learning rate of 0.01 and a momentum of 0.9, which helps the model converge faster by incorporating previous gradient information.

• Loss Function: Categorical cross-entropy is used as the loss function, appropriate for multi-class classification tasks.

• Metrics: Model accuracy is tracked as the primary metric, providing a straightforward evaluation of performance during training and validation.
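Putting the architecture and compilation settings of the two sections above together, a sketch might look as follows; the function name define_model and the exact placement of batch normalization and dropout follow the descriptions in this report but are otherwise assumptions:

def define_model():
    model = Sequential()
    # Block 1: 32 filters, 3x3 kernel, ReLU, he_uniform initialization
    model.add(Conv2D(32, (3, 3), activation='relu',
                     kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    model.add(Dropout(0.2))
    # Block 2: 64 filters, 3x3 kernel
    model.add(Conv2D(64, (3, 3), activation='relu',
                     kernel_initializer='he_uniform'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    model.add(Dropout(0.3))
    # Classifier head: dense layer with 100 units, then 10-way softmax output
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))
    # Compile with SGD (learning rate 0.01, momentum 0.9) and categorical cross-entropy
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model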

Model Training Strategy

The model is trained over multiple epochs with cross-validation to ensure robust generalization. This strategy helps verify that the model performs consistently across different data splits, improving its reliability on unseen data.

This structured approach to model development enhances the model's ability to effectively capture, retain, and generalize essential image features, optimizing it for high accuracy in handwritten digit classification.
k-Fold Cross-Validation
To ensure a robust evaluation of model performance, a 5-fold cross-
validation approach was applied. Cross-validation splits the dataset into
five equal parts, or folds, and iteratively trains and tests the model across
these subsets. In each iteration, the model trains on four of the folds and
tests on the remaining one, which varies with each fold. This method
provides a more reliable performance estimate by reducing the impact of
random sampling variations.

Evaluation Function
The evaluate_model function was implemented to automate this cross-
validation process. It initializes a KFold object, shuffling the dataset with a
fixed random state for reproducibility. Within each fold, the model is
defined, trained for 10 epochs with a batch size of 32, and evaluated on the
validation fold. The function then appends the accuracy score and training
history of each fold to respective lists, scores and histories, for
performance analysis.
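A sketch of evaluate_model along these lines; the fold count, epochs, and batch size come from the report, while the argument names and the specific seed value are assumptions:

def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = [], []
    # Shuffle with a fixed random state for reproducibility (seed value assumed)
    kfold = KFold(n_folds, shuffle=True, random_state=1)
    for train_ix, test_ix in kfold.split(dataX):
        model = define_model()
        # Split into training folds and the held-out validation fold
        trainX, trainY = dataX[train_ix], dataY[train_ix]
        testX, testY = dataX[test_ix], dataY[test_ix]
        history = model.fit(trainX, trainY, epochs=10, batch_size=32,
                            validation_data=(testX, testY), verbose=0)
        # Record accuracy on the validation fold
        _, acc = model.evaluate(testX, testY, verbose=0)
        print('> %.3f' % (acc * 100.0))
        scores.append(acc)
        histories.append(history)
    return scores, histories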

Model Performance Tracking


For each fold, the model’s accuracy is printed to provide insights into the
network’s performance. Final results are stored, allowing for later
summarization of the model's average accuracy and standard deviation
across all folds. This approach enhances the reliability of performance
estimates and helps assess model consistency across different subsets of
the data.

Diagnostic Learning Curves


To assess the model's training dynamics and identify potential overfitting
or underfitting, we plotted diagnostic learning curves based on cross-
entropy loss and classification accuracy. For each fold in the cross-
validation process, the model's training and validation losses are
visualized, allowing for a comparison of generalization performance.
Similarly, training and validation accuracies are plotted, highlighting how
well the model learns across epochs. This visual analysis provides insights
into model stability and convergence behavior.
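A plotting sketch for these learning curves; the function name summarize_diagnostics is an assumption, and it expects the histories list returned by evaluate_model:

def summarize_diagnostics(histories):
    for history in histories:
        # Cross-entropy loss: training vs. validation for each fold
        plt.subplot(2, 1, 1)
        plt.title('Cross-Entropy Loss')
        plt.plot(history.history['loss'], color='blue', label='train')
        plt.plot(history.history['val_loss'], color='orange', label='test')
        # Classification accuracy: training vs. validation for each fold
        plt.subplot(2, 1, 2)
        plt.title('Classification Accuracy')
        plt.plot(history.history['accuracy'], color='blue', label='train')
        plt.plot(history.history['val_accuracy'], color='orange', label='test')
    plt.tight_layout()
    plt.show()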

Performance Summary
After training the model across multiple folds, we compute an overall
performance summary by calculating the mean and standard deviation of
accuracy scores. This evaluation offers a comprehensive view of the
model’s effectiveness and consistency. A boxplot visualizes the distribution
of accuracy scores across folds, emphasizing the model's generalization
capability and performance stability. This summarization provides a
reliable measure of the model's robustness on unseen data.
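A sketch of this summary step; the function name summarize_performance is an assumption:

def summarize_performance(scores):
    # Mean and standard deviation of per-fold accuracy
    print('Accuracy: mean=%.3f std=%.3f, n=%d'
          % (np.mean(scores) * 100, np.std(scores) * 100, len(scores)))
    # Boxplot of accuracy scores across folds
    plt.boxplot(scores)
    plt.show()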
Final Model Training

In this step, the model is trained on the entire training dataset using the
previously defined CNN architecture. The training process consists of
fitting the model to the trainX and trainY data for 10 epochs, with a batch
size of 32. During training, the model adjusts its weights to minimize the
loss function and improve its ability to classify handwritten digits from the
MNIST dataset.

Model Saving

After training, the model is saved as final_model.h5 using the model.save() function. This allows the trained model to be easily loaded and used for inference or further evaluation without needing to retrain. The saved model captures the learned parameters, ensuring reproducibility and facilitating deployment in real-world applications.

Running the Final Model

The final model is trained and saved by invoking the run_final_model() function, which handles the complete training process and model storage, as sketched below.
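A sketch of run_final_model that reuses the load_dataset, prep_pixels, and define_model helpers sketched earlier (those helper names are assumptions; the epoch count, batch size, and file name come from the report):

def run_final_model():
    # Prepare the full training and test sets
    trainX, trainY, testX, testY = load_dataset()
    trainX, testX = prep_pixels(trainX, testX)
    # Train on all 60,000 training images
    model = define_model()
    model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=0)
    # Persist the trained model for later inference
    model.save('final_model.h5')

run_final_model()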

Execution
Loading and Evaluating the Final Model
To assess the performance of the trained model, the final version is loaded
using TensorFlow's load_model function. The model, saved as
'final_model.h5', is then evaluated on the test dataset (testX, testY) to
gauge its accuracy. This step provides a final validation of the model's
ability to generalize on unseen data, with the test accuracy printed as the
output.
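A sketch of this evaluation step, assuming testX and testY have been prepared as described earlier:

from tensorflow.keras.models import load_model

# Reload the saved model and measure accuracy on the held-out test set
model = load_model('final_model.h5')
_, acc = model.evaluate(testX, testY, verbose=0)
print('Test accuracy: %.3f' % (acc * 100.0))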

Image Loading and Preprocessing


The function load_and_prep_image takes an image file as input and preprocesses it for prediction. The image is loaded in grayscale with a target size of 28x28 pixels, consistent with the MNIST dataset. It is then converted to an array and reshaped into a format suitable for the CNN model (a single sample with one channel). Pixel values are normalized to the range [0, 1] to match the preprocessing done during model training. Finally, the preprocessed image is displayed for verification, and debugging information is printed to ensure correct formatting.
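A sketch of load_and_prep_image; the use of the Keras load_img and img_to_array utilities is an assumption consistent with the grayscale loading and resizing described above:

from tensorflow.keras.preprocessing.image import load_img, img_to_array

def load_and_prep_image(filename):
    # Load as 28x28 grayscale to match the MNIST input format
    img = load_img(filename, color_mode='grayscale', target_size=(28, 28))
    # Convert to an array with batch and channel dimensions: (1, 28, 28, 1)
    img = img_to_array(img).reshape(1, 28, 28, 1)
    # Scale pixel values to [0, 1], matching the training preprocessing
    img = img.astype('float32') / 255.0
    # Display the prepared image and print debugging information
    plt.imshow(img[0, :, :, 0], cmap='gray')
    plt.show()
    print('Prepared image shape:', img.shape)
    return img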

Digit Prediction
The predict_digit function prepares the input image using load_and_prep_image and passes it through the trained CNN model. The model predicts the class (digit) by outputting a probability distribution over all 10 classes. The predicted digit is identified by selecting the class with the highest probability using np.argmax(). The predicted digit, along with its confidence scores (the probability distribution), is printed to the console for evaluation.
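A sketch of predict_digit, assuming the model loaded in the previous step is in scope; the example call uses the image file referenced in the next section:

def predict_digit(filename):
    img = load_and_prep_image(filename)
    # Probability distribution over the 10 digit classes
    probs = model.predict(img)
    digit = np.argmax(probs)
    print('Predicted digit:', digit)
    print('Confidence scores:', probs[0])
    return digit

# Example usage with the test image from this report
predict_digit('digit_image.png')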

Testing the Prediction


The function is tested using the image file 'digit_image.png'. Upon
execution, the model loads, preprocesses the image, makes a prediction,
and outputs the predicted digit along with the confidence level. This
ensures the system functions correctly for digit recognition from new
images.
Conclusion
In conclusion, the CNN model successfully classifies handwritten digits
from the MNIST dataset, demonstrating effective use of convolutional
layers for feature extraction and regularization techniques to prevent
overfitting. Through 5-fold cross-validation, the model achieves robust
performance with reliable accuracy. The approach highlights the power of
deep learning in image classification tasks and showcases the efficiency of
using Google Colab as a development environment for training and
evaluation. Future work could explore further optimization and the
application of more complex architectures for even higher performance.

Reference:
Google Colab Link:
https://colab.research.google.com/drive/1tp8z4wC8olSFZHYByPkjune6FRbw21Iv?usp=sharing
