ALEXNET: THE BREAKTHROUGH CNN
ARCHITECTURE
DISCUSSIONS ABOUT ALEXNET
GANGAHARIDI NARAYAN KASHYAP ANIK PAUL RUPAM PAUL
AGENDA
HISTORICAL CONTEXT & MOTIVATION
ALEXNET ARCHITECTURE OVERVIEW
DETAILED LOOK AT CONVOLUTIONAL
LAYERS
POOLING, NORMALIZATION &
REGULARIZATION
TRAINING METHODOLOGY
IMPACT AND LEGACY OF ALEXNET
Historical Context & Motivation
Background:
Introduced in 2012, AlexNet won the ImageNet Large Scale Visual
Recognition Challenge (ILSVRC) with a top-5 error rate of 15.3%.
Developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey
Hinton from the University of Toronto.
Motivation:
Conventional computer-vision methods had reached a plateau in
accuracy and efficiency.
There was a clear need for models that could leverage GPUs,
large-scale datasets, and deep architectures.
Image classification problem:
1000 classes
650K neurons
62M parameters
Training: 1.2M images
Validation: 50K images
Test: 150K images
ARCHITECTURE
5 CONVOLUTIONAL LAYERS:
• EARLY LAYERS CAPTURE LOW-LEVEL FEATURES LIKE EDGES AND
TEXTURES.
• LATER LAYERS CAPTURE COMPLEX PATTERNS AND ABSTRACT
FEATURES.
3 FULLY CONNECTED LAYERS:
THESE LAYERS INTEGRATE THE FEATURES EXTRACTED BY THE
CONVOLUTIONAL LAYERS FOR THE FINAL CLASSIFICATION.
KEY INNOVATIONS:
• USE OF RELU ACTIVATIONS FOR FASTER TRAINING.
• LOCAL RESPONSE NORMALIZATION (LRN) TO IMPROVE
GENERALIZATION.
• OVERLAPPING MAX POOLING FOR SPATIAL DOWNSAMPLING.
• PARALLEL TRAINING ACROSS TWO GPUS.
Fig. AlexNet
block diagram
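The parameter figure quoted earlier (~62M) can be sanity-checked from these layer shapes. A plain-Python tally is sketched below; the fan-in values reflect the two-GPU split described in the original paper:

```python
# Back-of-the-envelope parameter count for AlexNet (layer shapes from the
# original paper; some fan-ins are halved because of the two-GPU split).
layers = [
    # (name, number of filters/units, fan-in per filter/unit)
    ("conv1", 96,   11 * 11 * 3),
    ("conv2", 256,  5 * 5 * 48),    # each GPU sees 48 of conv1's 96 maps
    ("conv3", 384,  3 * 3 * 256),   # cross-GPU connection
    ("conv4", 384,  3 * 3 * 192),
    ("conv5", 256,  3 * 3 * 192),
    ("fc6",   4096, 6 * 6 * 256),   # flattened conv5 output after pooling
    ("fc7",   4096, 4096),
    ("fc8",   1000, 4096),          # 1000 ImageNet classes
]

total = sum(n * fan_in + n for name, n, fan_in in layers)  # weights + biases
print(f"{total:,}")  # roughly 61 million, consistent with the ~62M figure
```

Almost all of the parameters sit in the fully connected layers (fc6 alone accounts for more than half), which is why dropout is applied there.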
TRAINING METHODOLOGY
Training Details:
Optimizer: Stochastic Gradient Descent (SGD)
Overcoming Hardware Limitations:
The network was split across two GPUs to manage its high
computational and memory requirements.
Epochs & Data:
Training on millions of images from the ImageNet
dataset.
Fig. 2: Two examples of the neural network
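The paper gives an explicit SGD update rule with momentum 0.9, weight decay 0.0005, and an initial learning rate of 0.01. A minimal NumPy sketch of one step, using toy weight and gradient values:

```python
import numpy as np

# One SGD step with momentum and weight decay, following the update rule
# in the AlexNet paper: v <- 0.9*v - wd*lr*w - lr*grad;  w <- w + v.
lr, momentum, weight_decay = 0.01, 0.9, 0.0005

w = np.array([0.5, -0.3])      # toy weights
grad = np.array([0.2, 0.1])    # toy mini-batch gradient
v = np.zeros_like(w)           # momentum buffer, persists across steps

v = momentum * v - weight_decay * lr * w - lr * grad
w = w + v
print(w)
```

In practice the learning rate was divided by 10 whenever the validation error stopped improving.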
POOLING, NORMALIZATION &
REGULARIZATION
POOLING MECHANISM:
OVERLAPPING MAX POOLING, WHICH HELPS DOWNSAMPLE THE
FEATURE MAPS WHILE PRESERVING CRITICAL FEATURES.
NORMALIZATION:
LOCAL RESPONSE NORMALIZATION (LRN) ADDS LATERAL INHIBITION,
SIMULATING A KIND OF COMPETITION BETWEEN NEURONS.
REGULARIZATION:
DROPOUT: A PROBABILITY-BASED TECHNIQUE TO DROP NEURONS
DURING TRAINING THAT HELPS REDUCE OVERFITTING
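Both ideas can be sketched in a few lines of NumPy: overlapping max pooling uses a 3×3 window with stride 2 (window larger than stride, as in AlexNet), and dropout zeroes neurons with probability 0.5. The rescaling shown is the "inverted dropout" variant common today; the original paper instead scaled activations at test time.

```python
import numpy as np

def max_pool(x, size=3, stride=2):
    """Overlapping max pooling (window larger than stride), as in AlexNet."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
pooled = max_pool(x)
print(pooled.shape)  # (2, 2) -- adjacent 3x3 windows share a row/column

# Dropout at p = 0.5, the rate AlexNet used in its first two FC layers.
rng = np.random.default_rng(0)
acts = np.ones(8)
mask = rng.random(8) < 0.5          # keep each neuron with probability 0.5
dropped = acts * mask / 0.5         # inverted-dropout rescaling
```

Because the pooling windows overlap, neighboring outputs share inputs, which the authors observed made the network slightly harder to overfit.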
•The CNN is split across two GPUs, which communicate only at certain layers; some convolutional layers are
restricted to processing data only on their own GPU.
•Conv1 uses 96 filters of size 11×11×3 with stride 4, followed by ReLU, response normalization, and max pooling.
•Conv2 has 256 filters of size 5×5×48, also followed by ReLU, response normalization, and max pooling.
•Conv3–5 use 3×3 filters (384, 384, and 256 filters respectively); Conv3 connects across both GPUs, while Conv4 and
Conv5 are GPU-local; Conv5 is followed by max pooling.
•Three fully connected layers with 4096 neurons each follow, all using ReLU, and are fully connected across all
previous layer outputs.
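The spatial sizes implied by these hyperparameters can be checked with the standard convolution output-size formula. The input is taken as 227×227 below (the paper states 224×224, but 227 is the size that makes the stride-4 arithmetic come out exactly):

```python
def out_size(n, k, stride=1, pad=0):
    # Standard conv/pool output-size formula: floor((n + 2p - k) / s) + 1
    return (n + 2 * pad - k) // stride + 1

n = 227
n = out_size(n, 11, stride=4)   # conv1 -> 55
n = out_size(n, 3, stride=2)    # pool  -> 27
n = out_size(n, 5, pad=2)       # conv2 -> 27
n = out_size(n, 3, stride=2)    # pool  -> 13
n = out_size(n, 3, pad=1)       # conv3 -> 13
n = out_size(n, 3, pad=1)       # conv4 -> 13
n = out_size(n, 3, pad=1)       # conv5 -> 13
n = out_size(n, 3, stride=2)    # pool  -> 6
print(n, n * n * 256)           # 6x6x256 = 9216 inputs to the first FC layer
```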
THE LASTING LEGACY OF
ALEXNET IN AI
1. Foundation for Advanced Networks: AlexNet paved the way for
deeper CNN architectures like VGG and ResNet, becoming a cornerstone
in modern deep learning.
2. Broader Adoption of Deep Learning: Its success accelerated the use
of deep learning across various fields, including computer vision and NLP.
3. Ongoing Research Benchmark: AlexNet remains a key reference for
evaluating and developing new AI models and techniques.