
Introduction to Deep Q-Network (DQN)
Deep Q-Network (DQN) is a reinforcement learning algorithm that uses deep
neural networks to approximate the optimal action-value function. This powerful
technique enables agents to learn complex behaviors by directly mapping
observations to actions, without requiring extensive feature engineering.

by Divyansh Pandit
Reinforcement Learning Fundamentals

1. Reinforcement learning is a type of machine learning where an agent learns to make decisions by interacting with its environment and receiving rewards or penalties for its actions.

2. The key components of a reinforcement learning problem are the agent, the environment, the actions the agent can take, the states of the environment, and the rewards the agent receives.

3. The agent's goal is to learn a policy - a mapping from states to actions - that maximizes the cumulative reward it receives over time.
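
As a concrete illustration of this interaction loop, a minimal sketch is shown below. The gymnasium package, the CartPole-v1 environment, and the purely random policy are illustrative assumptions, not part of the original slides.

```python
# Minimal agent-environment interaction loop (sketch).
import gymnasium as gym

env = gym.make("CartPole-v1")
state, _ = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()           # placeholder policy: act at random
    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward                        # cumulative reward the agent tries to maximize
    done = terminated or truncated

print("episode return:", total_reward)
```

A learned policy would replace the random action choice with one driven by the agent's current estimate of which actions yield the most long-term reward.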
Markov Decision Processes

Markov Decision Processes (MDPs) are a mathematical framework for modeling sequential decision-making problems. They describe the relationship between an agent's actions, the environment's responses, and the rewards or consequences that result.

MDPs are characterized by a set of states, a set of actions, transition probabilities, and reward functions. The agent's goal is to learn a policy that maximizes the expected long-term reward.
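
In symbols (a standard formulation with discount factor $\gamma \in [0, 1)$; the notation is conventional rather than taken from the slides), the discounted return from time $t$ is

\[
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1},
\]

and the agent seeks a policy $\pi^{*} = \arg\max_{\pi} \mathbb{E}_{\pi}\left[ G_t \right]$, i.e. the policy that maximizes the expected long-term reward.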
Q-Learning and its Limitations
1 Q-Learning Basics
Q-Learning is a model-free reinforcement learning algorithm that learns an optimal action-value function, known as the Q-function, to determine the best action to take in a given state (a minimal tabular sketch follows this list).

2 Limitations of Q-Learning
While effective in simple environments, Q-Learning struggles to scale to complex, high-dimensional state spaces due to the curse of dimensionality. It can also be unstable and prone to divergence when used with function approximation.

3 Need for Representation Learning
To overcome the limitations of Q-Learning, there is a need for representation learning techniques that can efficiently extract relevant features from high-dimensional state spaces and learn a compact, yet powerful, Q-function approximation.
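
The tabular update behind these ideas can be sketched as follows. The learning rate, discount factor, exploration rate, and grid-world sizes are illustrative assumptions, not values from the slides.

```python
# Tabular Q-learning update (sketch):
# Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
import numpy as np

n_states, n_actions = 16, 4            # e.g. a tiny grid world (illustrative)
alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = np.zeros((n_states, n_actions))    # one table entry per state-action pair
rng = np.random.default_rng(0)

def select_action(state):
    # epsilon-greedy: explore with probability epsilon, otherwise act greedily
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state, done):
    target = reward if done else reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
```

The Q table grows with the number of states and actions, which is exactly why this approach breaks down in high-dimensional state spaces and motivates learned representations.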
Deep Neural Networks for Q-Function Approximation
Reinforcement learning algorithms like Q-Learning can struggle to handle complex, high-dimensional state spaces. Deep neural networks offer a powerful solution by learning to approximate the Q-function - a mapping from states and actions to expected future rewards.

By training a deep neural network to output the Q-values for each possible action in a given state, the system can generalize and make accurate predictions even in very large state spaces.
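
A minimal network of this kind might look as follows. PyTorch is assumed here, and the layer sizes, state dimension, and action count are arbitrary placeholders.

```python
# Q-network sketch: maps a state vector to one Q-value per discrete action.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),   # one output head per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)              # shape: (batch, n_actions)

# Greedy action selection from the predicted Q-values:
q_net = QNetwork(state_dim=4, n_actions=2)
state = torch.randn(1, 4)                   # a dummy observation
action = q_net(state).argmax(dim=1).item()
```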
DQN Architecture and Training Process
The Deep Q-Network (DQN) architecture combines a deep neural network with
the principles of Q-learning, a reinforcement learning technique. The neural
network is used to approximate the Q-function, which estimates the expected
future reward for each possible action in a given state.

The training process for DQN involves repeatedly sampling experiences from a
replay buffer and using them to update the neural network parameters. This
stabilizes the learning process and allows the network to learn from diverse
experiences.
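
A minimal replay buffer of the kind described above could look like this. The capacity and batch size are illustrative assumptions.

```python
# Experience replay buffer sketch: store transitions, sample random mini-batches.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences drop out when full

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int = 32):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly at random from this buffer is what breaks the temporal correlation between consecutive transitions during training.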
Experience Replay and Target Network

Experience Replay
DQN uses experience replay, where it stores past experiences in a buffer and samples from them during training. This helps stabilize learning by reducing correlations between samples.

Target Network
DQN also uses a separate target network that is periodically updated from the main Q-network. This helps prevent oscillations and instabilities during training.

Stable Learning
The combination of experience replay and the target network is a key innovation that makes DQN more stable and effective than basic Q-learning, especially when dealing with complex environments.
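
Putting the two ideas together, one DQN update step might look roughly like the sketch below. It builds on the QNetwork and ReplayBuffer sketches above, and the hyperparameters are placeholders rather than values from the slides.

```python
# One DQN training step (sketch): TD targets come from a frozen target network.
import copy
import numpy as np
import torch
import torch.nn.functional as F

gamma, sync_every = 0.99, 1_000
online_net = QNetwork(state_dim=4, n_actions=2)   # from the earlier sketch
target_net = copy.deepcopy(online_net)            # periodically synced copy
optimizer = torch.optim.Adam(online_net.parameters(), lr=1e-4)

def train_step(batch, step: int):
    states, actions, rewards, next_states, dones = batch
    states = torch.as_tensor(np.asarray(states), dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(np.asarray(next_states), dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    # Q(s, a) for the actions actually taken
    q_values = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # TD target uses the *target* network, which changes only every sync_every steps
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q

    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % sync_every == 0:
        target_net.load_state_dict(online_net.state_dict())
```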
Handling Continuous State and Action Spaces

1 Discretization
Convert continuous spaces into discrete grids

2 Function Approximation
Use neural networks to represent the Q-function

3 Parameterization
Use low-dimensional parameters to represent complex spaces

Traditional Q-learning methods struggle when faced with continuous state and action spaces, as they rely on discretizing these spaces into tables. DQN addresses the state side by using function approximation techniques, such as deep neural networks, to represent the Q-function directly over continuous state inputs, which lets it handle complex, high-dimensional environments. For continuous action spaces, standard DQN still requires a discrete (or discretized) set of actions, since the network outputs one Q-value per action.
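
As a small illustration of the discretization option for actions, the sketch below maps a continuous action dimension onto a fixed set of bins so DQN can output one Q-value per bin. The torque range and the number of bins are arbitrary assumptions.

```python
# Discretizing a continuous action dimension so DQN can pick one Q-value per bin.
import numpy as np

action_bins = np.linspace(-2.0, 2.0, num=11)   # e.g. 11 torque levels in [-2, 2]

def index_to_action(index: int) -> float:
    # DQN chooses a bin index (argmax over Q-values); map it back to a real-valued action
    return float(action_bins[index])
```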
Improvements and Variants of DQN

Double DQN
Addresses the overestimation bias in standard DQN by using two separate Q-networks to select and evaluate actions (a sketch of the Double DQN target follows this list).

Dueling DQN
Separates the Q-function into value and advantage streams, allowing the model to
better represent the underlying value of states.

Prioritized Experience Replay
Improves sample efficiency by prioritizing transitions with high temporal-difference error during training, focusing on important experiences.
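
For example, the Double DQN target differs from the standard DQN target only in how the next action is chosen. The fragment below is a sketch that reuses the online_net, target_net, and batch tensors from the training-step sketch above; it would replace the target computation inside that step.

```python
# Double DQN target (sketch): the online network selects the next action,
# the target network evaluates it, which reduces overestimation bias.
with torch.no_grad():
    best_actions = online_net(next_states).argmax(dim=1, keepdim=True)    # selection
    next_q = target_net(next_states).gather(1, best_actions).squeeze(1)   # evaluation
    targets = rewards + gamma * (1.0 - dones) * next_q
```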
Applications and Limitations of DQN

1 Applications of DQN
DQN has been successfully applied to a wide range of tasks, including Atari game playing, robotics control, and resource allocation in communication networks.

2 Sample Efficiency
Because experience replay lets DQN reuse past transitions, it is more sample-efficient than many on-policy reinforcement learning algorithms, which helps when environment interaction is costly, though it can still require a large number of interactions to train.

3 Handling Complex Environments
DQN's ability to approximate complex Q-functions using deep neural networks allows it to tackle problems with large state and action spaces.

4 Limitations of DQN
DQN can be unstable during training due to the non-stationarity of the target Q-function, and it may struggle in environments with sparse rewards.
