Task 1:
Introduction
Machine learning has become a transformative technology with applications in computer
vision, NLP, and healthcare. A key application is image classification, which categorizes
images based on visual content, with real-world uses in autonomous vehicles, medical
imaging, and object recognition. This coursework focuses on image classification using a
dataset containing objects like parachutes, oil boxes, and trucks. We develop a
Convolutional Neural Network (CNN) to accurately classify images, leveraging CNNs' ability
to extract hierarchical features. We also explore data augmentation and model
enhancements to improve performance. The report covers the problem background, CNN
model design, experimental evaluation, and key findings, demonstrating a practical
application of machine learning.
Approach
A Convolutional Neural Network (CNN) was chosen for image classification due to its ability
to automatically learn spatial hierarchies of features from raw pixel data, making it highly
effective for object recognition.
Feature Extraction: CNNs eliminate the need for manual feature engineering by
learning relevant features directly from images.
Input Size (224x224x3): Standardized size for consistency and efficient feature
extraction; RGB channels retained for colour information.
Convolutional Layers:
o Conv1 (32 filters, 3x3, ReLU): Captures low-level features (edges, textures).
o Conv2 (64 filters, 3x3, ReLU): Extracts higher-level features (object parts).
Optimizer (Adam): Efficient learning with adaptive rates and momentum for faster
convergence.
Table 1. Layer type, its parameters, and the shape of its output
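A minimal sketch of this architecture in Keras is shown below. Only the input size, the two convolutional layers, and the Adam optimizer are specified above; the pooling layers, the dense-layer width (128), and the number of output classes (10) are assumptions added to make the sketch runnable, not the exact coursework code.

```python
# Sketch of the CNN described above, using TensorFlow/Keras.
# Pooling layers, the dense width (128) and num_classes (10) are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 10  # assumed number of classes in the dataset

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),                 # RGB input, 224x224
    layers.Conv2D(32, (3, 3), activation='relu'),      # Conv1: low-level features
    layers.MaxPooling2D((2, 2)),                       # assumed pooling layer
    layers.Conv2D(64, (3, 3), activation='relu'),      # Conv2: higher-level features
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),              # assumed dense width
    layers.Dense(num_classes, activation='softmax'),   # class probabilities
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()  # prints the layer/parameter/output-shape table (cf. Table 1)
```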
Evaluation protocol:
1. Dataset Splitting: The dataset was already split into training (for learning) and
validation (for tuning hyperparameters) sets, ensuring performance is assessed on unseen
data so that overfitting can be detected.
2. Performance Metrics: The following metrics were used to evaluate the model:
Accuracy: The proportion of correctly classified images; the primary metric for assessing
overall performance.
Loss: Tracks how well predictions align with true labels.
Confusion Matrix: Identifies misclassified classes.
Classification Report: Provides precision, recall, and F1-score for a detailed class-
wise evaluation.
3. Learning Curves
Training vs. Validation Loss
These curves reveal overfitting (validation loss increasing while training loss decreases) or
underfitting (both losses remaining high). A short sketch of how the metrics above can be computed is given below.
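The sketch below shows one way the classification report and confusion matrix could be produced with scikit-learn; the function name and argument names are placeholders, not the coursework code itself.

```python
# Sketch of the evaluation protocol using scikit-learn; the trained CNN and the
# validation arrays are passed in as placeholders (names are assumptions).
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

def evaluate(model, val_images, val_labels):
    """Print the classification report and confusion matrix for a trained model."""
    probs = model.predict(val_images)                 # per-image class probabilities
    preds = np.argmax(probs, axis=1)                  # predicted class indices
    print(classification_report(val_labels, preds))   # precision, recall, F1 per class
    print(confusion_matrix(val_labels, preds))        # which classes get confused

# Learning curves: Keras stores per-epoch metrics in history.history after
# history = model.fit(...), e.g. history.history['loss'] vs.
# history.history['val_loss'] for the training/validation loss curves.
```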
Experiment
Figure 1 - Accuracy and loss of each epoch for both training and validation
Key Observations:
The model achieved high training accuracy but struggled to generalize to the
validation set, as evidenced by the increasing validation loss and stagnant validation
accuracy.
Overfitting became evident after the second epoch, as the model memorized the
training data instead of learning generalizable features.
2. Classification Report
Key Observations:
The model performed best on class n03888257 (F1-score: 0.68) and worst on
class n03000684 (F1-score: 0.34).
The low precision and recall for some classes (e.g., n03000684) suggest that the
model struggled to distinguish these classes from others.
Discussion of Findings
Overfitting: The model exhibited clear signs of overfitting, as evidenced by the high
training accuracy and low validation accuracy. This suggests that the model
memorized the training data instead of learning generalizable features.
Class Imbalance: The variability in precision, recall, and F1-scores across classes
indicates potential class imbalance or insufficient representation of certain classes in
the training data.
Model Complexity: The CNN architecture, while effective, may be too complex for
the dataset, leading to overfitting. Simplifying the model or adding regularization
techniques (e.g., dropout, weight decay) could improve generalization.
Figure 5 – Confusion matrix across classes
Enhancements (a combined code sketch of all four follows this list):
1. Data Augmentation
Justification: Expands dataset diversity, improving generalization by exposing the model to varied
input conditions.
Impact: Reduced overfitting, improved validation accuracy, and enhanced robustness to image
variations.
2. Dropout Layers
Implementation: Added a Dropout layer (0.5) after the dense layer, randomly deactivating 50% of
neurons during training.
3. L2 Regularization
4. Early Stopping
Implementation: Used Early Stopping (monitoring validation loss) with patience = 3 and best weight
restoration.
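The sketch below combines the four enhancements in Keras. The dropout rate (0.5), the early-stopping settings (patience = 3, best-weight restoration), and the use of L2 regularization and augmentation come from the report; the specific augmentation transforms, the L2 strength (1e-4), and the dense-layer width are assumptions.

```python
# Sketch of the enhanced model: data augmentation, Dropout(0.5), L2 weight
# decay, and early stopping. Augmentation transforms, L2 strength (1e-4) and
# dense width (128) are assumptions; dropout rate and early-stopping settings
# are as stated in the report.
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers, callbacks

num_classes = 10  # assumed

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    # 1. Data augmentation (active only during training)
    layers.RandomFlip('horizontal'),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    # 3. L2 regularization on the dense layer (strength assumed)
    layers.Dense(128, activation='relu',
                 kernel_regularizer=regularizers.l2(1e-4)),
    # 2. Dropout after the dense layer
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# 4. Early stopping on validation loss with patience = 3
early_stop = callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                     restore_best_weights=True)
# model.fit(train_ds, validation_data=val_ds, epochs=30, callbacks=[early_stop])
```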
Figure 6 - Accuracy and loss of each epoch for both training and validation of enhanced method
The results from the improved CNN model show significant changes compared to the previous model.
The previous model achieved a training accuracy of 97.26% but a validation accuracy of only 55.95%,
with a large gap between training and validation metrics, indicating severe overfitting.
The improved model achieved a lower training accuracy (65.23%) but a higher validation accuracy
(58.37%), with a smaller gap between training and validation metrics. This suggests that the
regularization techniques (dropout, L2 regularization) and data augmentation effectively reduced
overfitting.
Figure 7 – Loss vs. Epoch curve for improved version
2. Classification Report
Precision: Ranged from 0.44 (n03425413) to 0.94 (n02102040), showing improved precision
for some classes compared to the previous model.
Recall: Ranged from 0.36 (n02102040) to 0.81 (n03888257), indicating better recall for
certain classes.
F1-score: Ranged from 0.41 (n03000684) to 0.73 (n01440764), reflecting a better balance
between precision and recall for most classes.
Overall Accuracy: The model achieved an accuracy of 58% on the validation set, slightly
higher than the previous model's 56%.
The previous model had an F1-score range of 0.34 to 0.68, while the improved model
achieved a range of 0.41 to 0.73, indicating better overall performance.
The improved model showed higher precision and recall for several classes, such
as n01440764 (F1-score: 0.73) and n03888257 (F1-score: 0.69), compared to the previous
model.
Figure 8 – Accuracy vs Epoch curve for improved version
Discussion of Findings
Reduced Overfitting: The improved model showed a smaller gap between training and
validation metrics, indicating that the regularization techniques and data augmentation
effectively reduced overfitting.
Better Generalization: The improved model achieved higher validation accuracy (58.37%)
compared to the previous model (55.95%), demonstrating better generalization to unseen
data.
Class-Specific Performance: The improved model showed higher precision, recall, and F1-
scores for several classes, indicating that it learned more robust features and performed
better on challenging classes.
Trade-offs: While the improved model achieved better generalization, it required more epochs
to converge and had a lower training accuracy, reflecting the trade-off between fitting the
training data closely and generalizing to unseen data.
Future work could focus on further fine-tuning hyperparameters, exploring advanced augmentation
techniques, or using more complex architectures to achieve even better results.
Task 2:
Introduction
This task involves solving a Gridworld problem using two reinforcement learning methods: Value
Iteration and Q-Learning. The Gridworld contains walls (w), obstacles (o), and a goal (g). The agent
starts at the top-left corner and must navigate to the goal while avoiding obstacles. The goal is to find
an optimal policy that maximizes cumulative rewards.
Value Iteration: A model-based method that computes the optimal value function and derives
the policy. It requires knowledge of the environment's dynamics.
Q-Learning: A model-free method that learns the optimal policy by updating a Q-table through
exploration and exploitation. It does not require prior knowledge of the environment.
Both methods are applied to the Gridworld, and their performance is evaluated based on their ability
to find the optimal policy.
Methods
1. Value Iteration
How It Works: Value Iteration is an iterative algorithm that computes the optimal value
function V*(s) for each state s. The value function represents the expected cumulative reward
when starting from state s and following the optimal policy. The algorithm updates the value function
using the Bellman Optimality Equation.
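For reference, the backup applied at each sweep is the standard Bellman optimality update, where P and R denote the Gridworld's transition probabilities and rewards and γ the discount factor:

```latex
V_{k+1}(s) \;=\; \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[\, R(s, a, s') + \gamma\, V_k(s') \,\bigr]
```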
Implementation Details:
2. Iteration: The value function is updated iteratively until convergence (when the change
in V is below a small threshold ε = 10⁻⁷).
3. Policy Extraction: Once the value function converges, the optimal policy π is derived by
selecting the action that maximizes the expected cumulative reward for each state.
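A minimal sketch of this loop is given below; the transition model P (mapping each state-action pair to (probability, next state, reward) tuples) and the state/action counts are placeholders for the Gridworld defined in the task, not the coursework code.

```python
# Sketch of Value Iteration; P[s][a] is assumed to yield (prob, next_state,
# reward) tuples for the Gridworld, and n_states / n_actions are placeholders.
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.9, eps=1e-7):
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup over all actions
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                 for a in range(n_actions)]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:          # convergence threshold ε = 1e-7
            break
    # Policy extraction: choose the action maximizing expected return
    policy = np.array([
        int(np.argmax([sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                       for a in range(n_actions)]))
        for s in range(n_states)
    ])
    return V, policy
```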
Justification:
2. Q-Learning
How It Works: Q-Learning is a model-free RL method that learns the optimal policy by iteratively
updating a Q-table. The Q-table Q(s, a) represents the expected cumulative reward for taking
action a in state s and following the optimal policy thereafter. The algorithm uses the Bellman
equation to update the Q-values.
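For reference, the update applied after each transition (s, a, r, s') is the standard Q-learning rule, with learning rate α:

```latex
Q(s, a) \;\leftarrow\; Q(s, a) + \alpha \,\bigl[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,\bigr]
```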
Implementation Details:
4. Policy Extraction: After training, the optimal policy π is derived by selecting the action with
the highest Q-value for each state.
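A minimal sketch of tabular Q-Learning with epsilon-greedy action selection is shown below. The environment interface (reset()/step() returning the next state, reward, and done flag), the learning rate α, and the episode count are assumptions, since the report does not specify them.

```python
# Sketch of tabular Q-Learning with an epsilon-greedy policy; `env` is a
# placeholder exposing reset() -> state and step(a) -> (next_state, reward,
# done). The learning rate alpha and episode count are assumptions.
import numpy as np

def q_learning(env, n_states, n_actions, episodes=1000,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else exploit
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            # Q-learning (Bellman) update
            Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
            s = s2
    policy = np.argmax(Q, axis=1)   # greedy policy extraction
    return Q, policy
```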
Justification:
Q-Learning is chosen because it is a model-free method that does not require prior
knowledge of the environment's dynamics. It learns the optimal policy through trial and error,
making it suitable for environments where the transition probabilities and rewards are
unknown.
The epsilon-greedy policy ensures a balance between exploration and exploitation, allowing
the agent to discover the optimal policy while avoiding suboptimal solutions.
Experiments
1. Value Iteration
Convergence: The Value Iteration algorithm converged in 250 iterations with a convergence
time of 0.0836 seconds.
Policy: The derived policy was visualized on the Gridworld, showing the optimal path from the
start to the goal while avoiding obstacles and walls.
Performance: The average reward over 100 episodes was -125.00, indicating that the policy
needs further refinement to improve performance.
2. Q-Learning
Policy: The derived policy was visualized on the Gridworld, showing the optimal path from the
start to the goal while avoiding obstacles and walls.
Performance: The average reward over 100 episodes was 67.00, demonstrating better
performance compared to Value Iteration.
Table 2 – Method comparison
Analysis:
Convergence Time: Value Iteration converged significantly faster than Q-Learning. This is
expected because Value Iteration is a model-based method that directly computes the optimal
value function using known environment dynamics.
Policy Quality: Q-Learning achieved a higher average reward (67.00) compared to Value
Iteration (-125.00). This suggests that Q-Learning's exploration strategy (epsilon-greedy)
allowed it to discover a more effective policy.
Exploration: Q-Learning's ability to explore the environment and learn from interactions
made it more robust in finding an optimal policy, especially in complex or unknown
environments.
Performance: The negative average reward for Value Iteration indicates that the derived
policy may not be optimal or that the reward structure needs adjustment. In contrast, Q-
Learning's positive average reward demonstrates its effectiveness in maximizing cumulative
rewards.
Discussion
Value Iteration converged quickly (250 iterations, 0.0836s) but yielded a suboptimal policy with an
average reward of -125. The policy graph suggests ineffective obstacle avoidance, likely due to
misaligned rewards or transition probabilities. Adjusting the reward function, such as increasing
penalties for moving toward obstacles, could improve performance.
Q-Learning, though slower, achieved a significantly higher average reward of 67.00. Its policy graph
shows better pathfinding, with the agent reliably avoiding obstacles and reaching the goal. This
improvement stems from Q-Learning’s model-free learning and epsilon-greedy exploration, allowing
for better policy discovery. The cumulative reward graph confirms steady learning progress. In
summary, Q-Learning outperformed Value Iteration in this task, achieving better rewards through
exploration and adaptability. While Value Iteration's speed makes it suitable for well-defined problems,
its reliance on accurate modelling can limit performance, as seen in the suboptimal policy graph.
Future work could refine the reward structure for Value Iteration and optimize Q-Learning's
hyperparameters for further improvements.
Part C:
To investigate the effects of the discount factor (γ) and exploration rate (ε) on the performance of Q-
learning, a systematic experimental approach was implemented. The goal was to analyse how
different combinations of these parameters influence the agent's learning process, policy quality, and
overall performance.
1. Parameter Selection
Discount Factor (γ): Five values of γ were tested: [0.1, 0.5, 0.7, 0.9, 0.99]. These values
represent a range from short-term to long-term reward focus.
Exploration Rate (ε): Five values of ε were tested: [0.1, 0.3, 0.5, 0.7, 0.9]. These values
represent a range from low to high exploration.
2. Q-learning Implementation
For each combination of γ and ε, the Q-learning algorithm was executed with the same initialization
and training loop; a sketch of this sweep is given after the metrics list below.
1. Metrics Calculation:
o Success Rate: The percentage of episodes where the agent reached the goal.
o Average Episode Length: The mean number of steps taken per episode.
o Final Q-value Variance: The variance of the Q-values at the end of training, indicating
the stability of the learned policy.
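The sketch below outlines the sweep over all (γ, ε) pairs and the metric computation. The `run_q_learning(gamma, epsilon)` callable is a hypothetical wrapper around the Q-learning sketch above that also records per-episode success flags and lengths; it is not part of the original coursework code.

```python
# Sketch of the gamma/epsilon sweep; run_q_learning(gamma, epsilon) is a
# hypothetical wrapper assumed to return (Q, successes, lengths) for one run,
# where successes and lengths are per-episode arrays.
import itertools
import numpy as np

def sweep(run_q_learning):
    gammas   = [0.1, 0.5, 0.7, 0.9, 0.99]
    epsilons = [0.1, 0.3, 0.5, 0.7, 0.9]
    results = {}
    for gamma, epsilon in itertools.product(gammas, epsilons):
        Q, successes, lengths = run_q_learning(gamma, epsilon)
        results[(gamma, epsilon)] = {
            'success_rate': float(np.mean(successes)),      # fraction of goal-reaching episodes
            'avg_episode_length': float(np.mean(lengths)),  # mean steps per episode
            'final_q_variance': float(np.var(Q)),           # stability of the learned Q-values
        }
    return results
```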
3. Visualisation
Success Rate vs. Episodes: The cumulative success rate over episodes was plotted for each
combination of γ and ε to observe how quickly the agent learned to reach the goal.
Average Episode Length vs. Episodes: The average episode length over episodes was
plotted to analyse how efficiently the agent navigated the Gridworld.
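A minimal plotting sketch for these two curves follows; the per-episode success flags and episode lengths are passed in as placeholder arrays recorded during training.

```python
# Sketch of the two diagnostic plots; `successes` and `lengths` are assumed to
# be per-episode arrays (success flag and step count) recorded during training.
import numpy as np
import matplotlib.pyplot as plt

def plot_learning_curves(successes, lengths):
    episodes = np.arange(1, len(successes) + 1)

    # Cumulative success rate vs. episodes
    plt.figure()
    plt.plot(episodes, np.cumsum(successes) / episodes)
    plt.xlabel('Episode'); plt.ylabel('Cumulative success rate')

    # Running-average episode length vs. episodes
    plt.figure()
    plt.plot(episodes, np.cumsum(lengths) / episodes)
    plt.xlabel('Episode'); plt.ylabel('Average episode length')
    plt.show()
```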
Table 3 - numeric results for effects analysis
The discount factor determines the importance of future rewards. A higher γ values future rewards
more heavily, while a lower γ focuses on immediate rewards.
Low γ (0.1): Poor performance across all ε values, with average rewards ranging from -
175.89 to -131.58 and success rates below 0.11%. A low γ causes the agent to prioritize
immediate rewards, which is ineffective in environments where long-term planning is required
to reach the goal. The agent fails to learn a meaningful policy.
Moderate γ (0.5, 0.7, 0.9): Significant improvement in performance, especially for low ε
values. For example, with γ=0.5 and ε=0.1, the average reward is 25.47, and the success rate
is 79.78%. A moderate γ balances immediate and future rewards, enabling the agent to learn
effective policies. The agent can navigate the Gridworld efficiently, as shown by the higher
success rates and lower episode lengths.
High γ (0.99): Performance is similar to that with moderate γ values, with high success rates (e.g., 80.34% for γ=0.99 and
ε=0.1) and low episode lengths. A high γ emphasizes long-term rewards, which is beneficial in
this environment. However, the performance is comparable to moderate γ values, suggesting
diminishing returns for very high γ.
Table 4 – numeric results for effects analysis
Low ε (0.1): High success rates (e.g., 79.78% for γ=0.5 and ε=0.1) and low episode lengths
(e.g., 35.09 steps). A low ε prioritizes exploitation, allowing the agent to follow the best-known
policy. This works well when the agent has already learned a good policy but may fail if the
initial policy is poor.
Moderate ε (0.3, 0.5): Mixed results. For γ=0.5, ε=0.3 yields a success rate of 44.18%, while
ε=0.5 yields 14.76%. Moderate ε balances exploration and exploitation. While it helps the
agent discover better policies, excessive exploration (e.g., ε=0.5) can reduce performance by
diverting the agent from optimal paths.
High ε (0.7, 0.9): Poor performance, with success rates close to 0% and high episode
lengths. High ε prioritizes exploration, causing the agent to take random actions frequently.
This prevents the agent from converging to an optimal policy, as it spends too much time
exploring suboptimal paths.
Low γ and High ε: The worst performance is observed, as the agent focuses on immediate
rewards and explores excessively, failing to learn a meaningful policy.
Moderate/High γ and Low ε: The best performance is achieved, as the agent balances long-
term rewards with exploitation of the learned policy.
Discussion
The results demonstrate that the discount factor (γ) and exploration rate (ε) significantly impact the
performance of Q-learning in the Gridworld environment. There is a clear trade-off between
exploration and exploitation; while some exploration is necessary to discover good policies, excessive
exploration prevents the agent from converging to an optimal policy. For similar environments, it is
recommended to use a moderate to high γ (e.g., 0.7 to 0.9) and a low ε (e.g., 0.1 to 0.3) to achieve
the best performance. In conclusion, the results highlight the importance of carefully selecting γ and ε
in Q-learning, with moderate to high γ and low ε generally yielding the best results in this Gridworld
scenario. Future improvements could involve fine-tuning hyperparameters, implementing epsilon
decay, exploring advanced strategies like Boltzmann exploration, and refining the reward structure.
This study provided key insights into optimizing Q-learning performance.