Deep Learning Approaches To Face Expression Classification - 111124
• Djoko Purwanto
• Artificial Intelligence and Health Technology Research Center
• Institut Teknologi Sepuluh Nopember (ITS)
DEEP LEARNING & IMAGE CLASSIFICATION
Deep Learning for Image Classification
Deep learning, particularly through Convolutional Neural Networks (CNNs), has
revolutionized image classification by automatically learning features from raw data, leading
to higher accuracy and efficiency than traditional machine learning methods.
Key Points
Feature Learning: Deep learning models can automatically learn and extract features from
images, eliminating the need for manual feature engineering.
Accuracy: Deep learning models, especially CNNs, have achieved state-of-the-art
performance in various image classification tasks.
Scalability: These models can handle large datasets and complex image classification
problems more effectively than traditional methods.
Transfer Learning: Pre-trained deep learning models can be fine-tuned for specific tasks,
making them versatile and efficient.
Image Classification
Image classification using deep learning involves training a model to recognize and categorize images
into predefined classes.
Brief Overview
Dataset: Collect a large set of labeled images, each tagged with the correct category.
Preprocessing: Resize, normalize, and augment the images to prepare them for training.
Model Selection: Choose a deep learning model suited to image data, such as a CNN.
Training: Train the model with the preprocessed images to learn features and patterns,
minimizing errors through forward and backpropagation.
Evaluation: Test the model on separate images to check its accuracy and performance using
metrics like accuracy, precision, recall, and F1-score.
Fine-Tuning: Adjust the model’s parameters and architecture to improve performance,
possibly using transfer learning.
Deployment: Once the model performs well, deploy it to classify new images in real-time
applications.
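The metrics named in the Evaluation step can be sketched in plain Python as follows. This is a minimal illustration, not the program from the slides: the function name `evaluate` and the label lists are made up for the example, and the class names are taken from the three classes used later in this deck.

```python
# Per-class precision, recall, and F1 plus overall accuracy, in plain Python.
CLASSES = ["Angry", "Happy", "Sad"]

def evaluate(y_true, y_pred, classes=CLASSES):
    """Return (accuracy, {class: (precision, recall, f1)})."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    report = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = (precision, recall, f1)
    return accuracy, report

# Made-up labels for illustration only (not results from the slides).
y_true = ["Angry", "Angry", "Happy", "Happy", "Sad", "Sad"]
y_pred = ["Angry", "Sad", "Happy", "Happy", "Sad", "Angry"]
accuracy, report = evaluate(y_true, y_pred)
```

Reporting the metrics per class, rather than accuracy alone, makes it easier to see when one expression is confused with another.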
FACE EXPRESSION CLASSIFICATION
Face expression classification identifies emotions through facial cues, focusing on three classes: Angry,
Happy, and Sad. Angry expressions show furrowed brows and narrowed eyes, indicating tension;
Happy expressions feature raised mouth corners and bright eyes, conveying joy; and Sad expressions
are marked by downturned lips and drooping eyelids, reflecting sorrow. Deep learning techniques can
be used to analyze these expressions for applications such as customer service, mental health, and
interactive technologies.
Diagram: Image input -> Model (parameters) -> Predicted output [ Happy ]
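The last step, turning the model's raw output into a predicted label such as [ Happy ], is commonly done with a softmax followed by an argmax. The sketch below assumes the model emits one score per class in the order Angry, Happy, Sad; that ordering, the function names, and the example scores are illustrative assumptions, not details from the slides.

```python
import math

CLASSES = ["Angry", "Happy", "Sad"]  # class order is an assumption

def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(scores, classes=CLASSES):
    """Return the most probable class name and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]

label, confidence = predict_label([0.3, 2.1, -0.5])  # made-up scores
```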
Dataset
The dataset comprises a diverse collection of facial expression images sourced from various online
platforms, aimed at enhancing the study of emotion recognition through deep learning. It includes
labeled images representing key emotional states: Angry, Happy, and Sad.
Custom Model
A custom deep learning model is specifically designed and tailored to meet the unique
requirements of a particular task or dataset. Unlike pre-trained models, which are trained on large,
generic datasets, custom models are built from scratch or fine-tuned to address a specific problem.
Key Aspects
Architecture Design: You can design the architecture of the neural network to suit your specific
needs. This includes choosing the number of layers, types of layers (e.g., convolutional, recurrent),
and the connections between them.
Training from Scratch: Custom models can be trained from scratch using your own dataset. This is
useful when you have a unique dataset that doesn’t match the data used to train pre-existing
models.
Transfer Learning: Often, custom models are built using transfer learning, where a pre-trained
model is adapted to a new task. This involves taking a model trained on a large dataset and fine-
tuning it with your specific data.
Hyperparameter Tuning: Custom models allow for extensive hyperparameter tuning to
optimize performance. This includes adjusting learning rates, batch sizes, and other
parameters to achieve the best results.
Specialized Layers and Functions: You can incorporate specialized layers or custom functions
that are not available in standard models. This might include custom loss functions, activation
functions, or other unique components.
Application-Specific: Custom models are tailored to specific applications, such as medical
image analysis, natural language processing, or autonomous driving, ensuring they perform
optimally for the intended use case.
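To make the Architecture Design point concrete, the sketch below describes a small CNN as a list of layers and computes each layer's output shape and parameter count. The architecture is hypothetical (64x64 RGB input, two convolution and pooling stages, three output classes) and is not the model shown in the slides; convolutions are assumed to use stride 1 with "same" padding.

```python
# Hypothetical small CNN for 64x64 RGB inputs and 3 output classes.
# Each convolution is assumed to keep height and width ("same" padding).
LAYERS = [
    ("conv", {"filters": 16, "kernel": 3}),
    ("pool", {"size": 2}),
    ("conv", {"filters": 32, "kernel": 3}),
    ("pool", {"size": 2}),
    ("flatten", {}),
    ("dense", {"units": 64}),
    ("dense", {"units": 3}),
]

def summarize(layers, input_shape=(64, 64, 3)):
    """Return [(layer_type, output_shape, parameter_count), ...]."""
    shape, rows = input_shape, []
    for kind, cfg in layers:
        if kind == "conv":
            h, w, c = shape
            k, f = cfg["kernel"], cfg["filters"]
            params = k * k * c * f + f  # weights plus one bias per filter
            shape = (h, w, f)
        elif kind == "pool":
            h, w, c = shape
            params = 0  # pooling has no trainable parameters
            shape = (h // cfg["size"], w // cfg["size"], c)
        elif kind == "flatten":
            h, w, c = shape
            params = 0
            shape = (h * w * c,)
        elif kind == "dense":
            (n,) = shape
            params = n * cfg["units"] + cfg["units"]  # weights plus biases
            shape = (cfg["units"],)
        else:
            raise ValueError(f"unknown layer type: {kind}")
        rows.append((kind, shape, params))
    return rows

summary = summarize(LAYERS)
total_params = sum(p for _, _, p in summary)
```

In this sketch most parameters sit in the first dense layer, which is one reason pooling before flattening matters.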
Custom Model Example
Custom Model using Transfer Learning
Deployment
Deployment of face expression classification models involves integrating the trained model into
a real-world application, ensuring it can process input data (like images or video) and return
accurate emotion predictions. This process includes setting up the deployment environment,
optimizing the model for performance, and conducting thorough testing to ensure reliability.
Diagram: Image input -> Face Detection -> Face Expression Classification -> Predicted output: Face [ Happy ]
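The two-stage flow (detect faces, then classify each face) can be sketched as a small pipeline. Everything here is illustrative: `fake_detector` and `fake_classifier` are stand-ins invented so the example runs, and a real deployment would call a trained face detector and the trained expression model in their place.

```python
def crop(image, box):
    """Cut a (top, left, height, width) box out of a 2D list image."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]

def classify_faces(image, detect_faces, classify):
    """Run detection, then classify each detected face crop."""
    results = []
    for box in detect_faces(image):
        results.append((box, classify(crop(image, box))))
    return results

# Stand-ins so the sketch runs; a real system would call trained models here.
def fake_detector(image):
    return [(0, 0, 2, 2)]  # pretend one face sits in the top-left corner

def fake_classifier(face):
    mean = sum(map(sum, face)) / (len(face) * len(face[0]))
    return "Happy" if mean > 0.5 else "Sad"

image = [[0.9, 0.8, 0.1],
         [0.7, 0.9, 0.2],
         [0.1, 0.2, 0.3]]
predictions = classify_faces(image, fake_detector, fake_classifier)
```

Passing the detector and classifier in as functions keeps the pipeline unchanged when either model is swapped, for example from the custom model to the transfer learning model.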
DEEP LEARNING ELEMENTS
Rescaling
Definition: Adjusting the scale of data or images to fit a specific range or size.
In Data Processing:
o Normalization: Scaling data values to a common range (e.g., 0 to 1).
o Standardization: Transforming data to have a mean of 0 and a standard deviation of 1.
In Image Processing:
o Resizing: Changing the dimensions of an image (e.g., increasing or decreasing width and
height).
o Interpolation: Estimating pixel values during resizing using methods like nearest neighbor
or bilinear interpolation.
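The operations above can be sketched in plain Python. The function names are chosen for this example; `normalize` assumes the values are not all equal, and `standardize` uses the population standard deviation. Dividing by 255 is a common way to bring 8-bit pixel values into the range 0 to 1.

```python
def normalize(values):
    """Min-max scale values into the range 0 to 1 (values must not all be equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift and scale values to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

def resize_nearest(image, new_h, new_w):
    """Resize a 2D list image using nearest-neighbor interpolation."""
    old_h, old_w = len(image), len(image[0])
    return [[image[r * old_h // new_h][c * old_w // new_w]
             for c in range(new_w)]
            for r in range(new_h)]

pixels = [0, 64, 128, 255]
scaled = [p / 255 for p in pixels]              # 8-bit pixels mapped to 0..1
small = resize_nearest([[1, 2], [3, 4]], 4, 4)  # upscale a 2x2 image to 4x4
```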
Batch Normalization
Definition: A technique used in neural networks to normalize the inputs of each layer, improving
training speed and stability.
Purpose:
o Reduces internal covariate shift by normalizing layer inputs.
o Helps mitigate issues related to vanishing/exploding gradients.
How It Works:
o Normalizes the output of a layer by subtracting the batch mean and dividing by the batch
standard deviation.
o Applies learnable parameters (scale and shift) to allow the model to retain the ability to
represent the original distribution.
Implementation:
o Typically inserted after the linear transformation and before the activation function.
o Can be applied to both fully connected and convolutional layers.
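The "How It Works" steps can be sketched for a single feature across one batch. This is a training-time illustration only: `gamma` and `beta` stand for the learnable scale and shift and are fixed by hand here, and `eps` is a small constant assumed for numerical safety.

```python
def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch, then scale and shift.

    batch: list of activations for a single feature, one per example.
    gamma, beta: the learnable scale and shift (fixed here for illustration).
    eps: small constant that avoids division by zero.
    """
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in batch]

activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)                    # mean 0, variance close to 1
shifted = batch_norm(activations, gamma=2.0, beta=1.0)  # mean moved to 1
```

At inference time, standard implementations replace the batch mean and variance with running averages collected during training.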
Convolutional 2D
Definition: A core operation in Convolutional Neural Networks (CNNs) for processing 2D data,
primarily images.
Convolution Operation:
o Involves sliding a filter (kernel) over the input image.
o Performs element-wise multiplication and sums the results to produce an output value.
Input and Output:
o Input: 2D array (grayscale) or 3D array (color images).
o Output: Feature map that highlights detected features.
Stride and Padding:
o Stride: Determines how far the filter moves (e.g., stride of 1 moves one pixel at a time).
o Padding: Adds extra pixels around the image to control output size.
Multiple Filters: Uses various filters to capture different features, resulting in multiple feature maps.
Applications: Image classification, object detection, and image segmentation.
Benefits: Captures spatial hierarchies of features, making CNNs effective for visual data tasks.
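The sliding-window operation, including stride and zero padding, can be sketched for a single-channel image and a single filter. As in most deep learning frameworks, the kernel is applied without flipping (strictly a cross-correlation). The function name and example values are chosen for this illustration.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over a 2D list `image` and return the feature map."""
    if padding:
        width = len(image[0]) + 2 * padding
        border = [[0] * width for _ in range(padding)]
        middle = [[0] * padding + list(row) + [0] * padding for row in image]
        image = border + middle + border  # zero padding on all four sides
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the window by the kernel, then sum.
            row.append(sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        output.append(row)
    return output

image = [[1, 2, 3, 0],
         [4, 5, 6, 0],
         [7, 8, 9, 0],
         [0, 0, 0, 0]]
kernel = [[1, 0],
          [0, -1]]  # responds to differences along the diagonal
feature_map = conv2d(image, kernel)         # 3x3 output
strided = conv2d(image, kernel, stride=2)   # 2x2 output
```

For an input of size n, kernel size k, padding p, and stride s, the output size along each dimension is floor((n + 2p - k) / s) + 1, which matches the shapes above.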
Pooling
Definition: A downsampling operation used in Convolutional Neural Networks (CNNs) to reduce the
spatial dimensions of feature maps.
Purpose:
o Decreases the number of parameters and computations in the network.
o Helps prevent overfitting by providing an abstracted representation of the input.
Types of Pooling:
o Max Pooling: Takes the maximum value from a defined window (e.g., 2x2) of the feature map.
o Average Pooling: Computes the average value from the defined window.
o Global Average Pooling: Averages all values in the feature map, resulting in a single value per
feature map.
Stride and Window Size:
o Stride: Determines how far the pooling window moves (e.g., a stride of 2 moves the window two pixels at a time).
o Window Size: Defines the dimensions of the pooling operation (e.g., 2x2, 3x3).
Applications: Commonly used in CNN architectures for image classification and object detection.
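The three pooling types can be sketched on a single feature map. The function names and the example values are chosen for this illustration.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample a 2D list with max or average pooling."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    reduce = max if mode == "max" else (lambda w: sum(w) / len(w))
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values under the current window position.
            window = [feature_map[i * stride + a][j * stride + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce(window))
        output.append(row)
    return output

def global_average_pool(feature_map):
    """Reduce a whole feature map to a single average value."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 2],
        [7, 2, 9, 1],
        [3, 4, 5, 6]]
max_pooled = pool2d(fmap, mode="max")    # [[6, 4], [7, 9]]
avg_pooled = pool2d(fmap, mode="avg")    # [[3.75, 2.25], [4.0, 5.25]]
global_avg = global_average_pool(fmap)   # 3.8125
```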
Flatten
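A flatten step reshapes the multi-dimensional output of the convolution and pooling layers into a one-dimensional vector so that it can feed fully connected layers. A minimal sketch, assuming the feature volume is stored as nested lists in height x width x channels order:

```python
def flatten(volume):
    """Flatten nested lists (height x width x channels) into one flat list."""
    return [v for row in volume for position in row for v in position]

# A 2x2 feature volume with 2 channels per position (made-up values).
volume = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8]]]
vector = flatten(volume)  # length 2 * 2 * 2 = 8
```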
Dropout
Definition: A regularization technique used in neural networks to prevent overfitting by randomly
setting a fraction of the neurons to zero during training.
Purpose:
o Reduces reliance on specific neurons, encouraging the network to learn more robust features.
o Improves generalization to unseen data.
How It Works:
o During each training iteration, a specified percentage (e.g., 20%) of neurons are randomly
“dropped out” (set to zero).
o The remaining neurons continue to learn and update their weights.
Implementation:
o Typically applied after activation functions in fully connected layers or convolutional layers.
o The dropout rate is a hyperparameter that can be tuned.
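The mechanism can be sketched as follows. This uses the "inverted dropout" variant found in common frameworks, which rescales the surviving activations by 1 / (1 - rate) during training so that nothing needs to change at inference; that rescaling is an addition to the description above, and the function name and seed are chosen for the example.

```python
import random

def dropout(activations, rate=0.2, training=True, rng=random):
    """Inverted dropout: zero a fraction of values and rescale the rest."""
    if not training or rate == 0.0:
        return list(activations)  # no neurons are dropped at inference time
    keep = 1.0 - rate
    # Each value survives with probability `keep`; survivors are scaled up.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [1.0] * 10
train_out = dropout(hidden, rate=0.2, rng=rng)        # some values become 0.0
test_out = dropout(hidden, rate=0.2, training=False)  # unchanged
```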
ReLU Activation
Definition: ReLU (Rectified Linear Unit) is an activation
function used in neural networks that outputs the input
directly if it is positive; otherwise, it outputs zero.
Purpose:
o Introduces non-linearity into the model, allowing it to
learn complex patterns.
o Helps mitigate the vanishing gradient problem
commonly seen with sigmoid or tanh functions.
Characteristics:
o Sparsity: Activates only a portion of neurons, leading
to a sparse representation.
o Computational Efficiency: Simple to compute, making
it cheaper than exponential-based functions such as
sigmoid or tanh.
Implementation: Widely used in hidden layers of deep
learning models, especially in convolutional neural networks
(CNNs).
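The definition above, together with the gradient that explains the vanishing-gradient benefit, can be sketched in a few lines. Treating the gradient at exactly zero as 0 is a convention assumed here.

```python
def relu(x):
    """Return x if it is positive, otherwise zero."""
    return x if x > 0 else 0.0

def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise (0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0

inputs = [-2.0, -0.5, 0.0, 1.5, 3.0]
outputs = [relu(x) for x in inputs]     # [0.0, 0.0, 0.0, 1.5, 3.0]
grads = [relu_grad(x) for x in inputs]  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

Because the gradient is exactly 1 for positive inputs, it does not shrink as it passes through active units, unlike the gradients of sigmoid or tanh.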
PROGRAMMING
Face Expression Classification using Custom Model
Face Expression Classification using Transfer Learning
The transfer learning program is similar to the custom model program described earlier; the
primary difference is the model architecture.
Performance Evaluation
Inference using Existing Model
Custom Model
Model from Transfer Learning
The program closely resembles the one described earlier. The key difference is that the
model file declared in the main function must be changed.
AMOUNT OF DATA FOR IMAGE CLASSIFICATION
1. Minimum Dataset Size
Small Datasets: For simple tasks or when using transfer learning with pre-trained models, you might
get away with as few as 100-1,000 images per class.
Moderate Datasets: For more complex tasks, aim for 1,000-10,000 images per class.
2. Ideal Dataset Size
Large Datasets: For robust performance, especially with deep learning models, having 10,000+ images
per class is ideal. Some successful models use hundreds of thousands of images.
3. Considerations
Class Imbalance: Ensure that you have a balanced number of images across classes to avoid bias.
Data Augmentation: Techniques like rotation, flipping, and scaling can effectively increase your
dataset size without needing more images.
Quality Over Quantity: High-quality, well-labeled images are more beneficial than a large number
of poorly labeled ones.
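The Data Augmentation point can be sketched with two simple transformations on an image stored as a 2D list. The function names are chosen for this example. For face images, a horizontal flip usually keeps the expression plausible, whereas large rotations may not, so which transformations are appropriate is a judgment call for the dataset at hand.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def augment(image):
    """Return the original image plus simple transformed copies."""
    return [image, flip_horizontal(image), rotate_90_clockwise(image)]

image = [[1, 2],
         [3, 4]]
variants = augment(image)  # one original image becomes three training samples
```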
4. Benchmarking
Look at similar projects or datasets in your domain to gauge what has worked well for others.
5. Experimentation
Start with a smaller dataset and gradually increase it while monitoring model performance to find the
sweet spot for your specific application.
THANK YOU