ML Assignment
1. Bellman equation
2. Linear quadratic regulation
3. Q Learning
4. DNN
5. CNN
The Bellman equation, named after Richard E. Bellman, is a fundamental concept in dynamic
programming. It's a recursive equation that helps us make optimal decisions in situations
where we need to consider both immediate rewards and future consequences.
Imagine an agent navigating an environment, such as a maze. The Bellman equation tells the
agent that the value of being in a current state (s) equals the immediate reward (R) received
for taking a specific action (a) in that state, plus the discounted value (γ · V(s')) of the next
state (s') that results from taking that action. Taking the best available action gives the
optimal value: V(s) = max over actions a of [ R(s, a) + γ · V(s') ].
The key idea is that the optimal decision considers both the immediate reward of an action
and the long-term value of the resulting state. The Bellman equation helps us iteratively
evaluate the value of each state, allowing the agent to find the sequence of actions that leads
to the maximum long-term reward.
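To make the recursion concrete, here is a minimal value-iteration sketch in Python. The three-state chain, its rewards, and the discount factor are made-up assumptions purely for illustration; the point is the repeated application of the Bellman update V(s) = max_a [ R(s, a) + γ · V(s') ].

```python
# Minimal value-iteration sketch for the Bellman equation (illustrative only).
# The states, actions, rewards and transitions below are made up for this example.

GAMMA = 0.9  # discount factor

# next_state[s][a] -> s', reward[s][a] -> R for a tiny 3-state chain
next_state = {0: {"stay": 0, "go": 1},
              1: {"stay": 1, "go": 2},
              2: {"stay": 2, "go": 2}}
reward = {0: {"stay": 0.0, "go": 1.0},
          1: {"stay": 0.0, "go": 5.0},
          2: {"stay": 0.0, "go": 0.0}}

V = {s: 0.0 for s in next_state}  # initial value estimates

for _ in range(100):  # repeatedly apply the Bellman update until the values settle
    for s in next_state:
        # V(s) = max_a [ R(s, a) + gamma * V(s') ]
        V[s] = max(reward[s][a] + GAMMA * V[next_state[s][a]]
                   for a in next_state[s])

print(V)
```

Running the loop, the value of the earlier states rises because they inherit, discounted by γ, the reward reachable from later states.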
Linear Quadratic Regulation (LQR) is a powerful technique in control theory for finding optimal
control strategies for linear systems. It achieves this by minimizing a quadratic cost function that
penalizes both deviations of the system's state from a desired equilibrium and the effort required to
control the system.
The System: The system is described by linear differential equations in state-space form,
representing the relationship between the system's state, control inputs, and their evolution
over time.
The Cost Function: The quadratic cost penalizes two things:
o Deviations from the desired state: a positive semi-definite matrix (Q) weights the
importance of keeping each state variable close to its desired value.
o Control effort: a positive definite matrix (R) weights the importance of minimizing
control inputs (e.g., minimizing energy consumption or actuator wear).
Finding the Optimal Control: LQR solves an optimization problem to find a state-feedback
controller. This controller uses all the system's state variables (full state feedback) to
compute the control input that minimizes the cost function over time.
Guaranteed Stability: If the system is controllable and observable, the LQR controller
guarantees closed-loop stability.
Tuning Flexibility: The weighting matrices (Q and R) allow you to tailor the controller's
behaviour by prioritizing specific state variables or control efforts.
LQR does have limitations:
Linearity Assumption: It only applies to linear systems, which may not always be realistic.
Full State Feedback: It requires access to all state variables, which may not be feasible in
practice.
Despite these limitations, LQR remains a valuable tool for control engineers due to its effectiveness
and ease of implementation for linear systems.
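As a sketch of how this looks in practice (assuming NumPy and SciPy are available), the snippet below solves the continuous-time algebraic Riccati equation for a toy double-integrator system and forms the state-feedback gain K. The model and the Q, R weights are arbitrary illustrative choices, not part of the text above.

```python
# Continuous-time LQR sketch: u = -K x minimizes the quadratic cost.
# The double-integrator model and the Q, R weights are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_continuous_are

# State-space model x_dot = A x + B u (double integrator: position, velocity)
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

# Weighting matrices: Q penalizes state deviation, R penalizes control effort
Q = np.diag([1.0, 0.1])   # positive semi-definite
R = np.array([[0.5]])     # positive definite

# Solve the algebraic Riccati equation for P, then form the optimal
# state-feedback gain K = R^-1 B^T P
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

print("LQR gain K:", K)
# The closed-loop matrix A - B K should have eigenvalues in the left half-plane
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```

Increasing the entries of Q makes the controller fight state deviations harder, while increasing R makes it more reluctant to spend control effort.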
Q-Values: At the core of Q-learning is the concept of Q-values. A Q-value represents the
expected future reward an agent can get by taking a specific action (a) in a particular state
(s). The agent maintains a Q-table (or Q-function) that stores these Q-values for all possible
state-action pairs.
Exploration vs. Exploitation: The agent balances exploration (trying new actions) and
exploitation (taking the currently believed best action). This is often achieved through an
epsilon-greedy policy. With a certain probability (epsilon), the agent explores by trying a
random action, and with probability (1-epsilon), it exploits by taking the action with the
highest Q-value in the current state.
Bellman Equation: Q-learning updates the Q-values based on the Bellman equation. This
equation considers the immediate reward received after taking an action, along with the
discounted future reward expected from the resulting state.
Through this iterative process of exploration, reward collection, and Q-value updates, the agent
gradually learns which actions to take in different states to achieve the maximum cumulative reward
over time.
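The following tabular sketch in Python ties these pieces together: an epsilon-greedy action choice, an interaction with the environment, and the Bellman-based Q-value update. The toy chain environment, learning rate, and episode count are assumptions chosen only for illustration.

```python
# Tabular Q-learning sketch with an epsilon-greedy policy (illustrative toy chain).
import random

N_STATES, ACTIONS = 5, [0, 1]        # actions: 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(s, a):
    """Toy environment: reaching the right end (state N_STATES - 1) gives reward 1."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    done = s_next == N_STATES - 1
    return s_next, reward, done

Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]  # the Q-table

for _ in range(500):                 # episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy: explore with probability EPSILON, otherwise exploit
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[s][act])
        s_next, r, done = step(s, a)
        # Q-learning (off-policy) update derived from the Bellman equation:
        # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
        s = s_next

print(Q)  # Q-values should come to favour action 1 (move right) in every state
```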
Q-learning has several strengths:
Model-Free: It doesn't require a detailed model of the environment, only the ability to
interact with it and receive rewards.
Off-Policy Learning: It can learn from experience even if the data comes from a different
policy than the one it's currently following.
Versatility: Q-learning can be applied to various scenarios where an agent interacts with an
environment to learn optimal behaviour.
However, there are also challenges to consider:
Convergence: Learning can be slow, and convergence to the optimal policy is only guaranteed
under conditions (sufficient exploration of all state-action pairs and a suitable learning rate)
that can be hard to meet in practice.
Despite these challenges, Q-learning remains a powerful tool for training agents in reinforcement
learning problems.
Deep Neural Networks (DNNs) are a type of artificial neural network inspired by the
structure and function of the human brain. Unlike simpler neural networks, DNNs have
multiple hidden layers between the input and output layers. These hidden layers allow DNNs
to learn complex patterns and relationships in data, making them highly effective for a variety
of tasks.
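As a minimal illustration (assuming PyTorch is installed), the sketch below stacks two hidden layers between an input layer and an output layer; the layer sizes are arbitrary choices.

```python
# Minimal multi-layer perceptron sketch (assumes PyTorch is installed).
# The layer sizes and the choice of two hidden layers are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128),   # input layer -> first hidden layer
    nn.ReLU(),
    nn.Linear(128, 64),    # second hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),     # output layer (e.g., 10 classes)
)

x = torch.randn(32, 784)   # a batch of 32 random "inputs"
logits = model(x)          # forward pass through all layers
print(logits.shape)        # torch.Size([32, 10])
```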
Despite the heavy data and computational demands of training them, DNNs are a powerful tool
at the forefront of artificial intelligence, with ongoing research pushing the boundaries of
their capabilities.
Convolutional Neural Networks (CNNs) are a powerful type of deep learning architecture particularly
adept at image recognition and processing tasks. Their structure, inspired by the human visual cortex,
allows them to excel at finding patterns and relationships within grid-like data like images.
Convolutional Layers: These layers apply filters to extract features from the input image. By
moving these filters across the image, the network can identify edges, shapes, and other visual
elements at various scales.
Pooling Layers: These layers downsample the data, reducing its dimensionality and
computational cost while preserving important features.
Fully Connected Layers: In the final stages, these layers take the extracted features and
classify the image or make predictions based on the learned patterns.
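A minimal sketch of this convolution, pooling, and fully connected pipeline is shown below, assuming PyTorch is installed; the 28x28 single-channel input and the layer sizes are illustrative assumptions.

```python
# Minimal CNN sketch (assumes PyTorch is installed):
# convolution -> pooling -> fully connected classification.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer: extracts local features
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling layer: downsamples 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # fully connected layer: classifies the features
)

x = torch.randn(8, 1, 28, 28)   # a batch of 8 fake grayscale images
print(model(x).shape)           # torch.Size([8, 10])
```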
Advantages of CNNs:
Highly effective for visual tasks: Their architecture is specifically designed to exploit the
spatial relationships within images.
Automatic feature extraction: CNNs can learn features directly from data, eliminating the
need for manual feature engineering.
Transfer learning: Pre-trained CNN models can be fine-tuned for new tasks, leveraging their
learned knowledge as a starting point.
Limitations of CNNs:
Computational Cost: Training large CNNs can be computationally expensive and require
significant data.
Interpretability: Understanding how CNNs arrive at their decisions can be challenging,
limiting their use in some applications.
Overall, CNNs are a cornerstone of deep learning for visual tasks, with ongoing research expanding
their capabilities and applications.