Group Project - ML2
GROUP PROJECT
SUBJECT: MACHINE LEARNING 2
Name of project: Convolutional Neural Network
Đà Nẵng, 02/2025
Table of contents
1. Introduction
1.1 Overview of Artificial Intelligence
1.2 Evolution of Neural Networks
1.3 Introduction to Convolutional Neural Networks (CNNs)
1.4 Importance of AI-CNN in Modern Technology
2. Fundamentals of Artificial Intelligence
2.1 Definition and Scope of AI
3.2 Types of Neural Networks
3.2.1 Feedforward Neural Networks
3.2.2 Recurrent Neural Networks (RNNs)
3.2.3 Convolutional Neural Networks (CNNs)
2.3 Key Concepts in AI
2.3.1 Machine Learning
2.3.2 Deep Learning
2.3.3 Reinforcement Learning
2.4 Applications of AI in Various Industries
3. Neural Networks: The Building Blocks of AI
3.1 Basic Structure of Neural Networks
3.2 Types of Neural Networks
3.2.1 Feedforward Neural Networks
3.2.2 Recurrent Neural Networks (RNNs)
3.2.3 Convolutional Neural Networks (CNNs)
3.3 Training Neural Networks
3.3.1 Backpropagation
3.3.2 Gradient Descent
3.3.3 Overfitting and Regularization
4. Convolutional Neural Networks (CNNs)
4.1 Architecture of CNNs
4.1.1 Convolutional Layers
4.1.2 Pooling Layers
4.1.3 Fully Connected Layers
4.2 Key Components of CNNs
4.2.1 Filters and Kernels
Le Minh Hieu – Le Tuan Thang
Convolutional Neural Network
1. Introduction
1.1 Overview of Artificial Intelligence
Artificial Intelligence (AI) is a dynamic and interdisciplinary branch of computer
science dedicated to developing systems that can perform tasks typically requiring human
intelligence. At its core, AI strives to mimic cognitive functions such as learning, reasoning,
problem-solving, understanding natural language, and perception. By leveraging algorithms
and vast datasets, AI systems can identify patterns, make decisions, and even improve over
time through experience—a process known as machine learning.
The transformative impact of AI is evident across numerous sectors. In healthcare,
AI-powered diagnostics and predictive analytics enhance patient care and streamline
treatment planning. In finance, algorithms assist in fraud detection, risk management, and
automated trading, while in transportation, autonomous vehicles rely on AI for navigation,
safety, and efficiency. Moreover, the entertainment industry utilizes AI to personalize content
recommendations, create realistic visual effects, and even generate music or art, pushing the
boundaries of creative expression.
AI continues to evolve, incorporating advances in neural networks, deep learning, and
reinforcement learning, all of which contribute to more sophisticated and capable systems. As
these technologies mature, ethical considerations such as transparency, fairness, and
medical image analysis and autonomous driving. Their ability to capture and recognize
patterns makes them indispensable in fields like facial recognition, handwriting analysis, and
even artistic style transfer. The combination of convolutional, pooling, and fully connected
layers allows CNNs to extract meaningful representations, making them a powerful tool for
deep learning applications in visual and spatial data analysis.
1.4 Importance of AI-CNN in Modern Technology
AI-driven Convolutional Neural Networks (AI-CNNs) have revolutionized the way
machines interpret and process visual data, leading to groundbreaking advancements across
various industries. By mimicking the way the human visual system perceives and understands
images, CNNs have enabled significant improvements in tasks such as facial recognition,
object detection, and image classification. From unlocking smartphones using facial
authentication to enhancing security surveillance systems, CNNs have become an integral
part of modern technological solutions.
Beyond personal devices, AI-CNNs play a crucial role in autonomous vehicles, where
they help identify pedestrians, traffic signals, and obstacles with remarkable accuracy. In
healthcare, CNN-powered models assist in medical imaging, detecting diseases such as
cancer and diabetic retinopathy with precision comparable to human experts. Additionally,
industries like agriculture leverage CNNs for crop monitoring and disease detection, while
retail companies use them for automated inventory management and customer behavior
analysis.
The ability of CNNs to learn complex patterns and features from vast amounts of data
has solidified their place as a fundamental tool in artificial intelligence. Their adaptability and
efficiency in extracting meaningful insights from visual information continue to drive
innovation, making them a cornerstone of modern AI applications. As research and
development in deep learning progress, CNNs are expected to further enhance automation,
efficiency, and decision-making across diverse domains.
The scope of AI extends across multiple domains, ranging from simple rule-based
automation to advanced neural networks capable of self-learning and adaptation. In the
business sector, AI enhances customer service through chatbots, streamlines operations with
predictive analytics, and personalizes user experiences with recommendation systems. In
healthcare, AI-driven algorithms assist in diagnosing diseases, predicting patient outcomes,
and optimizing treatment plans. Similarly, AI plays a critical role in finance, cybersecurity,
robotics, and autonomous systems, revolutionizing industries with its ability to process vast
amounts of data efficiently.
As AI continues to evolve, its applications expand into more complex and
interdisciplinary fields, including artificial general intelligence (AGI), which aims to create
machines capable of performing any intellectual task a human can. The advancements in AI
research and development continue to shape the future of technology, influencing how
humans interact with machines and how industries innovate to solve real-world challenges.
Feedforward neural networks are the simplest type of artificial neural network, where
information moves in a single direction from the input layer to the output layer without any
cycles or loops. They consist of an input layer, one or more hidden layers, and an output
layer. Each neuron in a layer is connected to neurons in the next layer, with each connection
assigned a weight that determines the importance of the input. These networks are widely
used for classification and regression tasks, such as image recognition and stock price
prediction. While they are relatively easy to implement and train, they do not retain memory
of previous inputs, making them unsuitable for tasks requiring sequential dependencies.
3.2.2 Recurrent Neural Networks (RNNs)
Machine Learning (ML) is a core subset of Artificial Intelligence (AI) that focuses on
developing algorithms enabling computers to learn from data and improve their performance
over time without being explicitly programmed. Unlike traditional software development,
where developers specify rules and conditions, ML systems identify patterns and
relationships in data to make predictions, classifications, or decisions. This ability makes ML
particularly valuable in dynamic environments where explicit programming for all possible
scenarios is impractical.
ML algorithms are typically categorized into three main types:
Supervised Learning: In this approach, algorithms are trained on labeled datasets,
where each input has a corresponding correct output. The system learns by comparing
its predictions with the actual outcomes and adjusting its parameters accordingly.
Applications include image recognition, spam filtering, and credit scoring.
Unsupervised Learning: Here, algorithms work with unlabeled data and aim to
discover hidden patterns or structures. Clustering and dimensionality reduction
techniques are common examples. This approach is often used in market
segmentation, anomaly detection, and recommendation systems.
Deep learning is a specialized subset of machine learning (ML) that utilizes artificial
neural networks with multiple layers (hence the term "deep") to automatically learn and
model complex patterns in data. These deep neural networks are loosely inspired by the way
the human brain processes information, and they extract hierarchical features from
raw inputs. Deep learning has gained widespread popularity due to its ability to handle large
volumes of high-dimensional data with minimal manual feature engineering.
One of the key advantages of deep learning is its remarkable performance in tasks
such as image classification, natural language processing, and speech recognition.
Convolutional Neural Networks (CNNs), for example, have revolutionized computer vision
by enabling accurate object detection and facial recognition. Similarly, Recurrent Neural
Networks (RNNs) and Transformer-based models have significantly improved machine
translation and voice assistants.
The success of deep learning is largely driven by advancements in hardware (such as
GPUs and TPUs), the availability of large datasets, and sophisticated optimization
techniques. However, deep learning models often require substantial computational resources
and extensive training data, which can be a limitation in some applications.
Despite these challenges, deep learning continues to drive breakthroughs in artificial
intelligence, powering technologies like autonomous vehicles, medical diagnostics, and
personalized recommendations.
Healthcare
AI is playing a crucial role in advancing medical research, diagnostics, and patient
care. Machine learning models can analyze vast amounts of medical data to assist in early
disease detection, such as identifying cancerous tumors in medical imaging with high
accuracy. AI-powered drug discovery accelerates the development of new medications by
predicting molecular interactions, reducing the time and cost required for clinical trials.
Additionally, AI-driven chatbots and virtual health assistants provide personalized
recommendations and support telemedicine services, improving patient engagement and
accessibility to healthcare.
Finance
The financial sector leverages AI for risk management, fraud detection, and
algorithmic trading. AI-powered fraud detection systems analyze transaction patterns in real
time, identifying anomalies and preventing fraudulent activities. In investment banking and
asset management, AI algorithms optimize portfolio strategies and execute trades at high
speeds, enhancing market efficiency. Personalized financial assistants use AI to offer
customized budgeting, loan recommendations, and credit scoring, helping individuals make
informed financial decisions.
Transportation
AI is driving advancements in autonomous vehicles, smart traffic management, and
logistics optimization. Self-driving cars, powered by deep learning and computer vision, aim
to improve road safety and reduce human error. AI-based traffic management systems
analyze real-time data to optimize traffic flow, reducing congestion and emissions in urban
areas. In logistics, AI enhances route planning, demand forecasting, and supply chain
efficiency, leading to cost savings and faster deliveries.
Entertainment
The entertainment industry relies heavily on AI for content recommendation, creation,
and personalization. Streaming platforms like Netflix, Spotify, and YouTube use AI
algorithms to analyze user preferences and provide tailored content suggestions. AI-driven
content generation tools assist in scriptwriting, music composition, and video editing,
enabling creators to produce high-quality content more efficiently. Virtual influencers and
AI-generated characters are also becoming increasingly popular, redefining digital
entertainment experiences.
AI's influence continues to expand, with applications emerging in retail (personalized
shopping), manufacturing (predictive maintenance), agriculture (crop monitoring, automated
harvesting), and many other sectors. As AI technology evolves, its potential to enhance
efficiency, decision-making, and innovation across industries is likely to keep growing.
FNNs are trained using supervised learning, where labeled input data is used to
adjust the network's weights. The training process involves:
1. Forward Propagation: The input data is passed through the network, layer by layer,
until an output is produced.
2. Loss Calculation: A loss function measures the difference between the predicted
output and the actual target.
3. Backpropagation: The error is propagated backward through the network to update
the weights using an optimization algorithm such as Gradient Descent or its variants
(e.g., Adam, RMSprop).
4. Iteration: The network iterates through multiple training cycles (epochs) to minimize
the loss and improve accuracy.
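The four training steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the report's own implementation: the network size (one hidden layer of three sigmoid neurons), the squared-error loss, the learning rate, the AND dataset, and all names such as `train_fnn` are assumptions chosen to keep the example small.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_fnn(data, hidden=3, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network with per-sample gradient descent.

    Returns the mean loss of every epoch so progress can be inspected.
    """
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):                       # 4. iterate over epochs
        total = 0.0
        for x, y in data:
            # 1. Forward propagation: input -> hidden -> output
            h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
                 for j in range(hidden)]
            y_hat = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
            # 2. Loss calculation (squared error for this sample)
            total += 0.5 * (y_hat - y) ** 2
            # 3. Backpropagation: error signals for output and hidden layer
            d_out = (y_hat - y) * y_hat * (1.0 - y_hat)
            d_hid = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            # Gradient descent update of every weight and bias
            for j in range(hidden):
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_hid[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_hid[j] * x[i]
            b2 -= lr * d_out
        history.append(total / len(data))
    return history

# Hypothetical toy dataset: the logical AND of two inputs.
and_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0),
            ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
losses = train_fnn(and_data)
print(f"first epoch loss {losses[0]:.4f}, last epoch loss {losses[-1]:.4f}")
```

Plain gradient descent is used here for brevity; Adam or RMSprop would replace only the update lines.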
Applications of Feedforward Neural Networks
FNNs are widely used for:
Classification: Identifying patterns and categorizing data, such as email spam
detection or image recognition.
Regression: Predicting continuous values, such as stock prices or house prices.
Function Approximation: Modeling complex mathematical functions and decision
boundaries.
While FNNs are effective for many tasks, they have limitations in handling sequential
data or capturing temporal dependencies. More advanced architectures, such as Recurrent
Neural Networks (RNNs) and Convolutional Neural Networks (CNNs), address these
challenges in specific applications. Nonetheless, feedforward neural networks remain a
foundational model in deep learning and serve as the building blocks for more complex
architectures.
3.2.2 Recurrent Neural Networks (RNNs)
Recurrent Neural Networks (RNNs) are a class of neural networks specifically
designed to handle sequential data by maintaining a hidden state that captures information
about previous inputs. Unlike Feedforward Neural Networks (FNNs), which process each
input independently, RNNs introduce temporal dependencies, making them well-suited for
tasks involving time series, language modeling, and speech recognition.
Structure of RNNs
An RNN consists of:
1. Input Layer: Receives sequential data, such as words in a sentence or time steps in a
signal.
2. Hidden Layer with Recurrent Connections: Each neuron not only processes input
from the current time step but also retains information from previous time steps via
recurrent connections. The hidden state is updated at each step using:
$h_t = f(W_x x_t + W_h h_{t-1} + b)$, where:
o $h_t$ is the hidden state at time step $t$,
o $x_t$ is the current input,
o $W_x$ and $W_h$ are weight matrices,
o $b$ is the bias,
o $f$ is the activation function (commonly tanh or ReLU).
3. Output Layer: Generates predictions based on the final hidden state or outputs at
each time step, depending on the task.
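As a rough illustration of the hidden-state update, the sketch below applies h_t = tanh(W_x x_t + W_h h_{t-1} + b) over a short sequence. The weight values, the hidden size of 2, and the helper names (`matvec`, `rnn_step`, `rnn_forward`) are assumptions made for the example, not values from the report.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrent update: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    from_input = matvec(W_x, x_t)
    from_state = matvec(W_h, h_prev)
    return [math.tanh(a + r + bias)
            for a, r, bias in zip(from_input, from_state, b)]

def rnn_forward(xs, h0, W_x, W_h, b):
    """Run the cell over a whole sequence and return every hidden state."""
    h = h0
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_x, W_h, b)   # the same weights are reused at each step
        states.append(h)
    return states

# Hypothetical parameters: 1 input feature, 2 hidden units.
W_x = [[0.5], [-0.3]]
W_h = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.1]
states = rnn_forward([[1.0], [0.5], [-1.0]], [0.0, 0.0], W_x, W_h, b)
for t, h in enumerate(states, start=1):
    print(t, [round(v, 4) for v in h])
```

The last entry of `states` would feed the output layer for sequence-level tasks, while per-step tasks would use every entry.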
Training RNNs
RNNs are trained using Backpropagation Through Time (BPTT), a variation of
backpropagation that accounts for dependencies across time steps. However, training RNNs
can be challenging due to:
Vanishing Gradient Problem: Gradients shrink over long sequences, making it hard
for the model to remember distant dependencies.
Exploding Gradient Problem: Large gradients can cause instability in weight
updates.
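A small numeric sketch can make both problems concrete. It assumes a deliberately simplified scalar, linear recurrence h_t = w * h_{t-1}, for which the gradient of the last state with respect to the first is w multiplied by itself once per time step; real RNNs multiply Jacobian matrices instead, but the effect is analogous. The function name and the chosen values are illustrative only.

```python
def gradient_factor(w, steps):
    """Gradient of h_T with respect to h_0 for the recurrence h_t = w * h_{t-1}."""
    factor = 1.0
    for _ in range(steps):
        factor *= w          # one multiplication per time step, as in BPTT
    return factor

vanishing = gradient_factor(0.5, 50)   # |w| < 1: shrinks towards zero
exploding = gradient_factor(1.5, 50)   # |w| > 1: grows without bound
print(f"w=0.5 over 50 steps: {vanishing:.3e}")
print(f"w=1.5 over 50 steps: {exploding:.3e}")
```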
To address these issues, advanced RNN architectures have been developed, including:
6. Output Layer: Produces the final output, such as class labels in image classification
tasks.
Key Operations in CNNs
Convolution: The core operation in CNNs, where a small filter slides over the input
data and computes dot products, detecting patterns such as edges, corners, and
textures.
Stride & Padding: Stride controls how much the filter moves per step, while padding
ensures spatial dimensions remain consistent.
Pooling: Reduces spatial dimensions while retaining essential features, helping
prevent overfitting.
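The convolution, stride, and padding operations described above can be sketched without any library. The function below is a minimal illustration (it computes cross-correlation, which is what deep learning frameworks usually call convolution); the example image and the edge-detecting kernel are assumptions chosen so the result is easy to check by hand.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over a zero-padded `image` and return the feature map."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Zero padding: surround the image with a border of zeros.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # Dot product between the kernel and the patch under it.
            for u in range(kh):
                for v in range(kw):
                    acc += padded[i * stride + u][j * stride + v] * kernel[u][v]
            row.append(acc)
        output.append(row)
    return output

# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1], [-1, 1]]           # responds to left-to-right increases
edges = conv2d(image, kernel)          # stride 1, no padding -> 3x3 map
strided = conv2d(image, kernel, stride=2, padding=1)
print(edges)
print(len(strided), "x", len(strided[0]))
```

The middle column of `edges` is the only non-zero one, which is where the filter sits across the edge.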
Advantages of CNNs
Translation Invariance: Detects objects largely regardless of their position in the image, because the same filters are applied at every location and pooling tolerates small shifts.
Automatic Feature Extraction: Learns hierarchical features without manual
engineering.
Parameter Efficiency: Uses fewer parameters than fully connected networks,
improving scalability.
Applications of CNNs
CNNs are widely used in:
Image Classification: Assigning a label to an image (e.g., recognizing cats vs. dogs),
including medical imaging (e.g., tumor detection).
Facial Recognition: Identifying individuals in photos and videos.
Autonomous Vehicles: Detecting obstacles, lane markings, and pedestrians.
Medical Diagnosis: Analyzing X-rays, MRIs, and CT scans.
Video Analysis: Action recognition and scene understanding in videos.
CNNs have revolutionized computer vision and remain the foundation for modern
deep learning architectures, including advanced models like ResNet, VGG, Inception, and
EfficientNet. With their ability to learn complex patterns from raw data, CNNs continue to
power breakthroughs in AI-driven visual understanding.
3.3 Training Neural Networks
3.3.1 Backpropagation
Backpropagation (short for "backward propagation of errors") is the fundamental
algorithm used to train artificial neural networks. It applies the chain rule to compute the
gradient of the loss function with respect to every weight, which lets the network learn from
its errors by adjusting the weights to minimize the loss.
2. Compute Loss: The loss function $L$ (e.g., Mean Squared Error, Cross-Entropy) is
evaluated based on the model’s predictions.
3. Compute Gradients: The gradient of the loss function with respect to each weight $w$
is calculated:
$\frac{\partial L}{\partial w}$
4. Update Weights: Weights are adjusted using the update rule:
$w := w - \eta \frac{\partial L}{\partial w}$
where:
o $\eta$ is the learning rate, determining the step size.
o $\frac{\partial L}{\partial w}$ is the gradient that points in the direction of the steepest increase in loss.
5. Repeat Until Convergence: Steps 2–4 are repeated iteratively until the loss reaches a
minimum or stops improving.
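The update rule w := w - η ∂L/∂w can be tried on a one-dimensional problem. The sketch below assumes a hypothetical loss L(w) = (w - 3)^2, whose gradient is 2(w - 3) and whose minimum is at w = 3; the function name, learning rate, and stopping tolerance are illustrative choices.

```python
def gradient_descent(grad, w0, lr=0.1, max_steps=1000, tol=1e-10):
    """Repeat w := w - lr * dL/dw until the gradient is (almost) zero."""
    w = w0
    for _ in range(max_steps):
        g = grad(w)            # compute the gradient at the current weight
        if abs(g) < tol:       # converged: the loss has stopped improving
            break
        w = w - lr * g         # step against the gradient
    return w

def loss_gradient(w):
    """Gradient of the hypothetical loss L(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w_final = gradient_descent(loss_gradient, w0=0.0)
print(f"w after training: {w_final:.6f}")
```

With this loss, a learning rate above 1.0 would make each step overshoot by more than it corrects, so the iterates diverge; that is the practical meaning of a step size that is too large.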
Overfitting occurs when a model learns the training data too well, capturing not only
the underlying patterns but also noise and outliers. As a result, the model performs
exceptionally well on training data but fails to generalize to unseen data, leading to poor
performance on the test set. Overfitting is a common issue in deep learning, especially when
dealing with complex models and limited training data.
Causes of Overfitting
Insufficient Training Data: When the dataset is too small, the model memorizes
specific examples rather than learning general patterns.
Excessive Model Complexity: Deep networks with too many parameters can fit the
training data perfectly but fail to generalize.
Lack of Regularization: Without constraints, the model can assign high weights to
specific features, making it sensitive to minor variations in input data.
Regularization Techniques
Regularization methods help prevent overfitting by limiting the model’s complexity
or modifying the training process to encourage generalization.
1. Dropout
o Randomly disables a fraction of neurons during training to prevent the
network from relying too much on specific features.
o Helps in creating a more robust model that generalizes better.
2. Weight Decay (L2 Regularization)
o Adds a penalty term to the loss function based on the magnitude of the
model’s weights.
o Encourages the network to learn smaller weights, reducing sensitivity to noise.
3. Early Stopping
o Monitors validation performance and stops training when the model starts to
overfit.
o Prevents unnecessary training that could lead to memorization of the data.
4. Data Augmentation
o Increases dataset diversity by applying transformations like rotation, scaling,
and flipping (for images) or text modifications (for NLP tasks).
o Helps the model learn more generalizable patterns.
5. Batch Normalization
o Normalizes activations within each mini-batch, stabilizing training and
reducing dependency on specific input distributions.
6. Ensemble Methods
o Combines predictions from multiple models (e.g., bagging, boosting) to
improve generalization and reduce variance.
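Three of the techniques above (dropout, weight decay, and early stopping) are simple enough to sketch directly. The helpers below are illustrative assumptions rather than a library API: `dropout` uses the common "inverted" scaling so that expected activations stay unchanged, `l2_penalty` is the term added to the loss, and `early_stopping_epoch` uses a hypothetical `patience` setting.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate`."""
    if not training or rate == 0.0:
        return list(activations)          # no dropout at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def l2_penalty(weights, lam):
    """Weight decay term added to the loss: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)

def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses."""
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch      # no improvement for `patience` epochs
    return len(val_losses) - 1, best_epoch

rng = random.Random(0)
dropped = dropout([1.0] * 10, rate=0.5, rng=rng)
penalty = l2_penalty([1.0, -2.0], lam=0.5)
stop = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.66, 0.7, 0.5], patience=2)
print(dropped, penalty, stop)
```

In the early-stopping example, training halts at epoch 4 and the weights from epoch 2 would be restored, so the later improvement at epoch 6 is never seen; a larger `patience` trades compute for that possibility.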
Filter Size: Determines how much of the input is covered at once (e.g., 3×3 or 5×5).
Stride: Defines how much the filter moves per step (higher stride reduces feature map
size).
Padding:
o Same Padding (Zero Padding): Preserves spatial dimensions (at stride 1) by adding a
border of zeros around the input.
o Valid Padding: No padding, resulting in a smaller output feature map.
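How filter size, stride, and padding interact can be checked with the standard output-size formula, floor((n + 2p - k) / s) + 1 for an input of width n, filter width k, padding p, and stride s. The short sketch below only evaluates that formula; the function names and the 32-pixel input are assumptions for illustration.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Spatial size of the feature map along one dimension."""
    return (n + 2 * padding - k) // stride + 1

def same_padding(k):
    """Padding that preserves the size at stride 1 (odd filter sizes)."""
    return (k - 1) // 2

valid_3x3 = conv_output_size(32, 3)                            # valid padding
same_3x3 = conv_output_size(32, 3, padding=same_padding(3))    # same padding
strided_5x5 = conv_output_size(32, 5, stride=2, padding=2)     # stride halves the size
print(valid_3x3, same_3x3, strided_5x5)
```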
o This helps retain the most prominent features, such as edges or textures, and
reduces the dimensionality.
Example:
For a 2x2 average pooling operation on the region
[ 1 3 ]
[ 2 4 ]
the output would be 2.5, the average value of the 2x2 region. Max pooling over the same
region would instead output 4, the largest value.
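To make the difference between the two pooling modes concrete, the sketch below pools the 2x2 region with entries 1, 3, 2, 4 both ways and then applies the same function to a larger feature map. The `pool2d` helper and the 4x4 values are assumptions for illustration; the window moves with a stride equal to its size, which is the usual non-overlapping setup.

```python
def pool2d(matrix, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window."""
    output = []
    for r in range(0, len(matrix) - size + 1, size):
        row = []
        for c in range(0, len(matrix[0]) - size + 1, size):
            window = [matrix[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(window))                  # keep the strongest response
            else:
                row.append(sum(window) / len(window))    # average pooling
        output.append(row)
    return output

region = [[1, 3],
          [2, 4]]
max_out = pool2d(region, mode="max")
avg_out = pool2d(region, mode="average")

feature_map = [[1, 3, 2, 0],
               [2, 4, 1, 1],
               [0, 1, 5, 6],
               [2, 3, 7, 8]]
max_map = pool2d(feature_map, mode="max")
avg_map = pool2d(feature_map, mode="average")
print(max_out, avg_out, max_map, avg_map)
```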
Fully connected layers are crucial for producing high-level abstractions in deep learning
models, enabling the integration of features learned in previous layers to make final
predictions or decisions.
Proper selection of stride and padding is critical in CNN design. Using padding allows
deeper networks to maintain spatial information, while adjusting stride helps control
computational efficiency and feature extraction granularity.
Choosing the right activation function depends on the problem at hand. ReLU is
widely used in hidden layers due to its efficiency, while sigmoid (for binary or multi-label
outputs) and softmax (for mutually exclusive classes) are preferred in output layers for
classification tasks.
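For reference, the three activation functions named here can be written in a few lines. This is a minimal sketch using the standard definitions; the softmax subtracts the largest input before exponentiating, a common trick to avoid overflow that does not change the result.

```python
import math

def relu(z):
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, z)

def sigmoid(z):
    """Squashes a score into (0, 1); suited to a single yes/no output."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turns a list of scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]   # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(relu(-2.0), relu(3.0), sigmoid(0.0), [round(p, 4) for p in probs])
```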
4.3 Training CNNs
4.3.1 Data Augmentation
Data augmentation is a technique used to artificially expand the training dataset by
applying various transformations to the existing data. This enhances the model’s ability to
generalize by making it more robust to variations in real-world data. By introducing modified
versions of the input samples, data augmentation helps prevent overfitting, especially when
the original dataset is limited in size.
Common Data Augmentation Techniques
For Image Data:
1. Geometric Transformations
o Rotation: Randomly rotating images within a specified range (e.g., ±30°).
o Scaling: Enlarging or shrinking images while maintaining aspect ratio.
o Translation: Shifting images horizontally or vertically.
o Flipping: Horizontally or vertically mirroring images to simulate different
viewpoints.
2. Color and Lighting Adjustments
o Brightness Adjustment: Modifying the brightness of images.
o Contrast Adjustment: Enhancing or reducing image contrast.
o Color Jittering: Randomly altering hue, saturation, or intensity of colors.
3. Noise and Distortions
o Gaussian Noise: Adding random noise to simulate real-world variations.
o Blurring: Applying Gaussian blur to simulate focus variations.
o Cutout/Masking: Randomly masking parts of an image to force the model to
focus on less obvious features.
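A few of the image transformations listed above can be sketched on a grayscale image stored as a nested list of pixel values in [0, 255]. The helper names and parameter choices are assumptions for illustration; arbitrary-angle rotation and scaling need interpolation and are left out to keep the example short.

```python
import random

def hflip(img):
    """Horizontal flip: mirror every row."""
    return [row[::-1] for row in img]

def translate(img, dx, fill=0):
    """Shift right by dx pixels (left if negative), filling vacated pixels."""
    width = len(img[0])
    shifted = []
    for row in img:
        if dx >= 0:
            shifted.append([fill] * min(dx, width) + row[:max(width - dx, 0)])
        else:
            shifted.append(row[min(-dx, width):] + [fill] * min(-dx, width))
    return shifted

def adjust_brightness(img, delta):
    """Add delta to every pixel and clip to the valid range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel and clip."""
    return [[min(255, max(0, p + rng.gauss(0, sigma))) for p in row]
            for row in img]

image = [[1, 2], [3, 4]]
flipped = hflip(image)
shifted = translate(image, 1)
brighter = adjust_brightness([[250, 10]], 10)
noisy = add_gaussian_noise([[100, 150], [200, 250]], 5.0, random.Random(0))
print(flipped, shifted, brighter)
```

During training, such functions would typically be applied at random to each batch, so the model rarely sees exactly the same image twice.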
For Text Data:
Synonym Replacement: Replacing words with their synonyms.
Back Translation: Translating a sentence to another language and back.
Sentence Shuffling: Rearranging words or phrases while maintaining meaning.
For Audio Data:
Time Stretching: Speeding up or slowing down the audio.
Transfer learning is widely used in AI applications where labeled data is limited, making it a
powerful tool for building high-performance models efficiently.
4.3.3 Hyperparameter Tuning
Hyperparameter tuning is the process of selecting the best values for hyperparameters
to optimize a model’s performance. Unlike model parameters (such as weights and biases)
that are learned during training, hyperparameters are set before training begins and directly
affect how the model learns. Proper tuning is crucial for achieving high accuracy and
preventing issues like underfitting or overfitting.
Common Hyperparameters
1. Learning Rate (α)
o Controls how much the model updates weights during training.
o Too high → Model may not converge.
o Too low → Training may be slow or stuck in local minima.
2. Batch Size
o Determines how many samples are processed before updating the model’s
parameters.
o Small batch size: More updates, higher variance, better generalization but
slower training.
o Large batch size: Faster training but may lead to poor generalization.
3. Number of Layers & Neurons
o More layers and neurons allow the model to learn complex features but
increase the risk of overfitting.
o A balance between depth and regularization is necessary.
4. Dropout Rate
o The probability of randomly disabling neurons during training to prevent
overfitting.
5. Weight Decay (L2 Regularization)
o Adds a penalty to large weights, encouraging simpler models that generalize
better.
Hyperparameter Tuning Techniques
1. Grid Search
o Tests all possible combinations of hyperparameter values within a predefined
range.
o Computationally expensive but guarantees finding the best combination within
the grid.
2. Random Search
o Randomly samples hyperparameter values from a given range.
o Often more efficient than grid search, because every trial tests a new value of each
hyperparameter instead of revisiting the same few grid points.
3. Bayesian Optimization
o Uses probabilistic models to find the optimal hyperparameters with fewer
evaluations.
o More efficient than brute-force methods like grid search.
4. Hyperband
o A resource-efficient method that adaptively allocates more computation to
promising hyperparameter configurations while discarding poor ones early.
5. Automated Machine Learning (AutoML)
o Tools like Google AutoML and Optuna automate hyperparameter tuning using
advanced search algorithms.
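The difference between grid search and random search can be sketched with a stand-in objective. In the example below, `toy_validation_loss` is a hypothetical function that plays the role of "train the model and return its validation loss"; in practice each call would be a full training run. The search ranges and all names are assumptions for illustration.

```python
import itertools
import math
import random

def toy_validation_loss(lr, batch_size):
    """Hypothetical stand-in for a training run; lowest at lr=0.01, batch_size=32."""
    return (math.log10(lr) + 2.0) ** 2 + ((batch_size - 32) / 32.0) ** 2

def grid_search(objective, grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def random_search(objective, sample, trials, seed=0):
    """Evaluate `trials` randomly sampled configurations."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(trials):
        params = sample(rng)
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def sample_params(rng):
    # Learning rates are sampled on a log scale, a common convention.
    return {"lr": 10 ** rng.uniform(-4, 0),
            "batch_size": rng.choice([16, 32, 64, 128])}

grid = {"lr": [0.001, 0.01, 0.1], "batch_size": [16, 32, 64]}
grid_best, grid_score = grid_search(toy_validation_loss, grid)
rand_best, rand_score = random_search(toy_validation_loss, sample_params, trials=20)
print(grid_best, grid_score)
print(rand_best, rand_score)
```

Grid search here costs 9 evaluations and can only return values that were listed in advance, while random search with the same budget would try a different learning rate on every trial.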
Best Practices for Hyperparameter Tuning
Start with reasonable default values based on prior research.
Use a validation set to evaluate different hyperparameter combinations.
Combine manual tuning with automated search methods for efficiency.
Use early stopping to avoid spending compute on suboptimal configurations.
Hyperparameter tuning is essential for maximizing a model’s performance, ensuring a
balance between learning speed, accuracy, and generalization.
6. Applications of CNNs in AI
threats. These networks process both spatial and temporal information—often through
specialized architectures like 3D CNNs or two-stream networks—thereby capturing dynamic
motion patterns that are critical for accurate activity recognition.
In the context of sports analytics, CNNs offer powerful tools for dissecting game footage by
tracking players' movements, identifying strategic plays, and highlighting key moments
during competitions. This not only helps in improving team strategies and player
performance but also enriches the viewer experience by providing in-depth analysis and
real-time insights.
Beyond these applications, the adaptability of CNNs makes them suitable for a wide
range of video analysis tasks, including gesture recognition in interactive systems and
behavior monitoring in various settings. As CNN technology continues to evolve, its impact
on video analysis and action recognition is expected to expand, driving further innovation in
both surveillance and sports analytics, as well as in emerging fields that rely on detailed
motion analysis.
During the training process, CNNs typically rely on advanced hardware like GPUs,
TPUs, or even distributed computing systems to handle the intensive computations. This
reliance can result in lengthy training times and significant energy consumption, posing
challenges for rapid prototyping and experimentation, especially in environments with
limited resources. Similarly, when it comes to real-time applications such as mobile devices
or embedded systems, the heavy computational load can introduce latency and increase
power usage, hindering performance.
To address these challenges, researchers are actively exploring various optimization
techniques, including model compression, quantization, and pruning, as well as the design of
more efficient network architectures. These methods aim to reduce the computational
footprint of CNNs without sacrificing accuracy, thereby making them more accessible for
deployment in resource-constrained settings. Despite these advances, balancing model
performance with computational efficiency remains a critical area of ongoing research,
underscoring the need for continued innovation in both hardware and algorithm design.
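Two of the techniques named above, pruning and quantization, can be illustrated on a small list of weights. The sketch below uses plain Python and makes several assumptions: magnitude-based pruning and symmetric 8-bit linear quantization are only one variant of each technique, and the function names and weight values are invented for the example.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)   # number of weights to remove
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        if abs(w) <= threshold and removed < k:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned


def quantize_int8(weights):
    """Symmetric linear quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q, scale):
    """Map the stored integers back to approximate floating-point weights."""
    return [v * scale for v in q]


# Hypothetical weights of one small layer.
weights = [0.02, -0.75, 0.40, -0.01, 1.27, 0.10]
pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
print(pruned)
print(q)
```

After both steps, half of the weights are exactly zero and the rest are stored as small integers plus one shared scale factor. The restored values differ from the originals by at most half a quantization step, which is the accuracy cost that such methods try to keep small.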
7.3 VGGNet
VGGNet, developed by the Visual Geometry Group at the University of Oxford, is
known for its simple, uniform design. It stacks small 3x3 convolutional filters in
multiple layers, so that a stack covers the same receptive field as a single larger filter
while using fewer weights and adding more non-linearities. The most well-known variants,
VGG16 and VGG19, contain 16 and 19 weight layers, respectively. The full network is
nevertheless computationally expensive and requires significant memory because of its large
number of parameters (about 138 million for VGG16). Even so, VGGNet has been widely used for
transfer learning in various computer vision applications.
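The benefit of stacking small filters can be checked with simple arithmetic. The following sketch compares two stacked 3x3 convolutions with a single 5x5 convolution; the helper names and the channel width of 64 are assumptions for illustration, and biases are ignored.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Weight count of one 2D convolution layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


C = 64  # hypothetical channel width, kept constant through the stack

two_3x3 = 2 * conv_weights(3, C, C)   # two stacked 3x3 layers
one_5x5 = conv_weights(5, C, C)       # one 5x5 layer

print("receptive field of two 3x3 layers:", stacked_receptive_field(3, 2))
print("weights, two 3x3 layers:", two_3x3)
print("weights, one 5x5 layer: ", one_5x5)
```

Both options see a 5x5 region of the input, but the stacked version needs 18 weights per channel pair instead of 25, and it applies an activation function twice rather than once.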
7.4 GoogLeNet (Inception)
GoogLeNet, developed by Google researchers in 2014, introduced the Inception
module, a novel approach that applies multiple convolutional filter sizes in parallel within
the same layer and concatenates their outputs. This multi-scale feature extraction technique
helps the network capture fine and coarse details simultaneously. The architecture also
reduces computational costs by using 1x1 convolutions to reduce the number of channels
before applying larger filters. With 22 layers, GoogLeNet achieved high accuracy while
maintaining efficiency, setting a new standard for deep network designs.
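The saving from the 1x1 reduction can be estimated by counting multiply-accumulate operations (MACs). The sketch below is a back-of-the-envelope calculation in plain Python; the feature-map size and channel counts are illustrative assumptions, and padding is assumed to keep the output the same size as the input.

```python
def conv_macs(height, width, kernel_size, in_channels, out_channels):
    """Multiply-accumulates of one convolution with same-size output."""
    return height * width * kernel_size * kernel_size * in_channels * out_channels


# Assumed sizes for one 5x5 branch of an Inception-style module.
H, W = 28, 28
C_IN, C_REDUCED, C_OUT = 192, 16, 32

# Option 1: apply the 5x5 convolution directly to all input channels.
direct = conv_macs(H, W, 5, C_IN, C_OUT)

# Option 2: reduce the channels with a 1x1 convolution, then apply the 5x5.
reduced = conv_macs(H, W, 1, C_IN, C_REDUCED) + conv_macs(H, W, 5, C_REDUCED, C_OUT)

print("direct 5x5:     ", direct)
print("1x1 then 5x5:   ", reduced)
print("reduction factor:", round(direct / reduced, 1))
```

Under these assumed sizes the bottleneck version needs roughly a tenth of the operations, which is why the module can afford several filter sizes side by side.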
7.5 ResNet
ResNet (Residual Network), introduced by Microsoft Research in 2015, addressed
one of the key challenges in deep learning: very deep networks are hard to train, in part
because of the vanishing gradient problem. By incorporating residual connections, or "skip
connections," each block adds its input to its output, so gradients can flow directly past
the block's layers. This enables the training of very deep networks, such as ResNet-50 and
ResNet-152. Because a block only has to learn a residual on top of its input, it can easily
fall back to an identity mapping, which improves convergence and accuracy. ResNet's
innovation has influenced many subsequent architectures and remains widely used in deep
learning research and applications.
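A residual block computes y = x + F(x). The following plain-Python sketch is a simplified stand-in: it uses small dense layers instead of the convolutions and batch normalization of the real ResNet, and all names and values are assumptions for illustration.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def linear(v, weights):
    """Dense layer without bias; `weights` is a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = x + F(x), where F is linear -> ReLU -> linear."""
    fx = linear(relu(linear(x, w1)), w2)
    return [a + b for a, b in zip(x, fx)]


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all-zero weights, F(x) = 0 and the block passes x through unchanged.
identity_out = residual_block(x, zeros, zeros)

# Stacking many such blocks still preserves the signal.
deep_out = x
for _ in range(50):
    deep_out = residual_block(deep_out, zeros, zeros)

print(identity_out, deep_out)
```

A plain (non-residual) stack with the same zero weights would output all zeros. With the skip connection the identity mapping is the default, and each block only needs to learn a correction to it.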
7.6 DenseNet
DenseNet (Densely Connected Convolutional Network) builds upon the idea of
residual connections by introducing dense connectivity. Whereas ResNet adds a block's input
to its output, DenseNet concatenates feature maps: within a dense block, each layer receives
the outputs of all preceding layers as input, in a feed-forward fashion. This architecture
promotes feature reuse, reduces redundancy, and improves gradient flow, making training
more efficient. DenseNet requires fewer parameters than comparable traditional deep
networks while achieving high performance in image classification tasks.
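The effect of concatenating feature maps can be seen by tracking channel counts through one dense block. In the sketch below, the function names are hypothetical and the initial channel count, growth rate, and block depth are example values chosen for illustration.

```python
def dense_block_input_channels(initial_channels, growth_rate, num_layers):
    """Channels seen by each layer when all earlier outputs are concatenated."""
    return [initial_channels + i * growth_rate for i in range(num_layers)]


def dense_block_output_channels(initial_channels, growth_rate, num_layers):
    """Channels leaving the block: the input plus every layer's new maps."""
    return initial_channels + num_layers * growth_rate


def direct_connections(num_layers):
    """Direct connections in a dense block of L layers: L * (L + 1) / 2."""
    return num_layers * (num_layers + 1) // 2


# Example values: 64 input channels, each layer adds 32 feature maps.
inputs = dense_block_input_channels(64, 32, 6)
outputs = dense_block_output_channels(64, 32, 6)

print("input channels per layer:", inputs)
print("output channels of block:", outputs)
print("direct connections:", direct_connections(6))
```

Each layer contributes only a small, fixed number of new feature maps (the growth rate) and reuses everything computed before it, which is how the architecture keeps its parameter count low.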
7.7 EfficientNet
EfficientNet, introduced by Google Brain in 2019, focuses on optimizing network
scaling. Unlike traditional architectures that scale depth, width, or resolution arbitrarily,
EfficientNet introduces a compound scaling method that scales all three dimensions
together, using a single compound coefficient to keep them in balance. This approach enables
the model to achieve state-of-the-art accuracy with fewer parameters and lower
computational costs. EfficientNet has been widely adopted for real-world applications where
efficiency and performance are critical, such as mobile vision tasks and embedded systems.
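Compound scaling can be written as depth = alpha^phi, width = beta^phi, and resolution = gamma^phi, where phi is the compound coefficient. The sketch below uses the base constants reported in the EfficientNet paper (alpha = 1.2, beta = 1.1, gamma = 1.15); the function name and the simple cost model are assumptions for illustration.

```python
# Base constants from the EfficientNet paper, chosen so that
# ALPHA * BETA**2 * GAMMA**2 is approximately 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Return (depth, width, resolution, relative cost) multipliers for phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # Convolution cost grows linearly with depth and quadratically with
    # width and resolution.
    cost = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, cost


for phi in range(4):
    d, w, r, cost = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, cost x{cost:.2f}")
```

Each increase of phi by one roughly doubles the computational cost while growing all three dimensions together, instead of enlarging only one of them.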
governments employ AI-driven facial recognition for crime prevention and forensic
investigations. However, concerns regarding privacy, bias, and ethical implications have led
to increased scrutiny and calls for regulation. Researchers are working on developing more
transparent and fair facial recognition systems to balance security with individual rights.
10.5 Case Study: CNNs in Agriculture
In precision agriculture, CNNs help optimize farming practices by analyzing drone
imagery to monitor crop health, detect pests, and assess soil conditions. AI-driven
agricultural platforms use CNN-based models to identify plant diseases early, enabling
targeted interventions that reduce pesticide use and increase crop yield. Initiatives like PEAT
and PlantVillage have developed mobile applications that allow farmers to diagnose plant
conditions simply by taking a picture, making AI-powered agriculture accessible even in
rural areas. These advancements contribute to sustainable farming practices, helping address
food security challenges in an era of climate change.
CNNs continue to revolutionize industries by providing highly accurate, efficient, and
scalable AI-driven solutions across diverse real-world applications. As deep learning research
progresses, the impact of CNNs is expected to grow, unlocking new possibilities for
intelligent automation and innovation.
11. Conclusion
11.1 Summary of Key Points
This report has provided a comprehensive overview of Artificial Intelligence (AI) and
Convolutional Neural Networks (CNNs), covering their fundamentals, architectures,
applications, challenges, and future directions. CNNs have become a cornerstone of modern
AI, enabling breakthroughs in various fields, including healthcare, autonomous systems,
security, and creative industries. From early architectures like LeNet and AlexNet to more
advanced models like ResNet and EfficientNet, and alongside newer Transformer-based
models, CNNs have evolved significantly, leading to state-of-the-art performance in
numerous real-world applications.
Additionally, the integration of CNNs with emerging technologies such as quantum
computing, federated learning, and Edge AI highlights their growing importance in the AI
landscape.
11.2 The Impact of AI-CNN on Society
AI-CNN has had a profound impact on society, transforming industries and improving
quality of life through innovations in medical diagnostics, autonomous vehicles, security
systems, and personalized technology. However, alongside these advancements come
important ethical and societal concerns, including issues related to privacy, bias,
transparency, and job displacement. The widespread use of facial recognition and
surveillance systems, for example, raises significant questions about data security and
individual rights. Furthermore, the increasing automation of jobs necessitates discussions on
retraining workforces and ensuring equitable AI-driven development. As AI-CNN continues
to shape the modern world, addressing these challenges proactively will be essential to
maximizing its benefits while mitigating risks.
12. References
12.1 Academic Papers and Journals
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with
deep convolutional neural networks. Advances in Neural Information Processing
Systems, 25, 1097-1105.
12.2 Books and Textbooks
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
Chollet, F. (2017). Deep Learning with Python. Manning Publications.
12.3 Online Resources and Tutorials
TensorFlow Tutorials: https://www.tensorflow.org/tutorials
PyTorch Tutorials: https://pytorch.org/tutorials/
Deep Learning Specialization by Andrew Ng: https://www.coursera.org/specializations/deep-learning