
GB PANT DSEU OKHLA-1 CAMPUS

(Govt. of NCT of Delhi)


Okhla Industrial Estate Phase-III, New Delhi-110020

B. Tech CSE 6th Semester


Machine Learning
Subject Code: BT-CS-ES602
Assignment-01

Submitted to: Dr. Amit R Khaparde
Submitted by: Simran Tomar (41721016)
Qn(A): Write a short note on:

1. Bellman equation
2. Linear quadratic regulation
3. Q-Learning
4. DNN
5. CNN

Ans 1. The Bellman Equation: Optimizing Decisions Over Time

The Bellman equation, named after Richard E. Bellman, is a fundamental concept in dynamic
programming. It's a recursive equation that helps us make optimal decisions in situations
where we need to consider both immediate rewards and future consequences.

Imagine an agent navigating an environment, like a maze. The Bellman equation tells the
agent that the value of being in a current state (s) is equal to:

 The immediate reward (R) received by taking a specific action (a) in that state.
 Plus, the discounted value (γ * V(s')) of the next state (s') that results from taking that
action.
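Putting these two pieces together, the optimality form of the equation can be written as

V(s) = \max_a \left[ R(s, a) + \gamma \, V(s') \right]

where the maximum is taken over the actions available in state s (in a stochastic environment, the second term becomes an expectation over the possible next states s').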

Here's a breakdown of the symbols:

 V(s): The value of being in state ‘s’.
 R(s, a): The reward for taking action ‘a’ in state ‘s’.
 s': The next state reached after taking action ‘a’ in state ‘s’.
 γ (gamma): A discount factor (between 0 and 1) that balances the importance of immediate rewards vs. future rewards. A higher gamma gives more weight to future rewards.

The key idea is that the optimal decision considers both the immediate reward of an action
and the long-term value of the resulting state. The Bellman equation helps us iteratively
evaluate the value of each state, allowing the agent to find the sequence of actions that leads
to the maximum long-term reward.
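To make this concrete, here is a minimal Python sketch of that iterative evaluation (value iteration) on a small, entirely hypothetical environment; the states, actions, rewards, and discount factor are made up purely for illustration.

# Value iteration on a tiny, hypothetical deterministic environment.
# Each state maps to its actions; each action gives (reward, next_state).
mdp = {
    "A": {"left": (0.0, "A"), "right": (1.0, "B")},
    "B": {"left": (0.0, "A"), "right": (2.0, "C")},
    "C": {"stay": (0.0, "C")},   # absorbing state
}
gamma = 0.9                      # discount factor
V = {s: 0.0 for s in mdp}        # initial value estimates

for _ in range(500):             # repeat the Bellman update until values settle
    new_V = {s: max(r + gamma * V[s2] for r, s2 in actions.values())
             for s, actions in mdp.items()}
    if max(abs(new_V[s] - V[s]) for s in V) < 1e-6:
        V = new_V
        break
    V = new_V

print(V)                         # converged state values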

The Bellman equation is widely used in various applications, especially in reinforcement learning, where agents learn through trial and error to make optimal decisions in complex environments.
Ans 2. Linear Quadratic Regulation (LQR)

Linear Quadratic Regulation (LQR) is a powerful technique in control theory for finding optimal
control strategies for linear systems. It achieves this by minimizing a quadratic cost function that
penalizes both deviations of the system's state from a desired equilibrium and the effort required to
control the system.

Here's how it works:

 The System: The system is described by linear differential equations in state-space form,
representing the relationship between the system's state, control inputs, and their evolution
over time.

 The Cost Function: A quadratic function penalizes two things:

o Deviations from the desired state: This is captured by a positive semi-definite matrix
(Q) that weights the importance of keeping each state variable close to its desired
value.

o Control effort: The control effort required to manipulate the system is also penalized
using another positive definite matrix (R) that weights the importance of minimizing
control inputs (e.g., minimizing energy consumption or actuator wear).

 Finding the Optimal Control: LQR solves an optimization problem to find a state-feedback
controller. This controller uses all the system's state variables (full state feedback) to
compute the control input that minimizes the cost function over time.
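Written out, for a continuous-time linear system \dot{x} = A x + B u, the cost being minimized has the standard form

J = \int_0^{\infty} \left( x^\top Q x + u^\top R u \right) dt

and the optimal controller is the state-feedback law u = -K x, where the gain matrix K is obtained by solving an algebraic Riccati equation.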

LQR offers several advantages:

 Systematic Design: It provides a structured approach to designing controllers for linear systems.

 Guaranteed Stability: Provided the system is stabilizable and every unstable mode is penalized by the cost (detectable through Q), the LQR controller guarantees closed-loop stability.

 Tuning Flexibility: The weighting matrices (Q and R) allow you to tailor the controller's
behaviour by prioritizing specific state variables or control efforts.

However, LQR also has limitations:

 Linearity Assumption: It only applies to linear systems, which may not always be realistic.

 Full State Feedback: It requires access to all state variables, which may not be feasible in
practice.

Despite these limitations, LQR remains a valuable tool for control engineers due to its effectiveness
and ease of implementation for linear systems.
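As a small numerical sketch (assuming NumPy and SciPy are installed), the snippet below computes an LQR gain for a hypothetical double-integrator system; the matrices A, B, Q, and R are illustrative choices only.

import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical double integrator: state x = [position, velocity], input u = force
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

Q = np.diag([10.0, 1.0])   # positive semi-definite state weighting
R = np.array([[0.1]])      # positive definite control-effort weighting

# Solve the continuous-time algebraic Riccati equation and form the gain K = R^-1 B^T P
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

print("LQR gain K:", K)
print("Closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))   # should have negative real parts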

Ans 3. Q-Learning: Learning Through Trial and Reward


Q-learning is a fundamental algorithm in the field of reinforcement learning. Unlike supervised
learning where you have labelled data, reinforcement learning deals with situations where an agent
learns through trial and error in an environment. Q-learning helps the agent discover the best course
of action to take in a given situation to maximize long-term rewards.

Here's what makes Q-learning work:

 Q-Values: At the core is the concept of Q-values. A Q-value represents the expected cumulative (discounted) future reward an agent can obtain by taking a specific action (a) in a particular state (s) and acting optimally afterwards. The agent maintains a Q-table (or Q-function) that stores these Q-values for all possible state-action pairs.

 Exploration vs. Exploitation: The agent balances exploration (trying new actions) and
exploitation (taking the currently believed best action). This is often achieved through an
epsilon-greedy policy. With a certain probability (epsilon), the agent explores by trying a
random action, and with probability (1-epsilon), it exploits by taking the action with the
highest Q-value in the current state.

 Bellman Equation: Q-learning updates the Q-values based on the Bellman equation. This
equation considers the immediate reward received after taking an action, along with the
discounted future reward expected from the resulting state.
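Concretely, after taking action a in state s and observing reward r and next state s', the Q-value is updated as

Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]

where \alpha is the learning rate and \gamma is the discount factor.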

Through this iterative process of exploration, reward collection, and Q-value updates, the agent
gradually learns which actions to take in different states to achieve the maximum cumulative reward
over time.
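A minimal tabular sketch of this loop in Python is shown below; the environment interface (reset(), step(), and an actions list) and the hyperparameter values are assumptions made purely for illustration.

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # env is assumed to expose: reset() -> state, step(action) -> (next_state, reward, done),
    # and a list of possible actions env.actions.
    Q = defaultdict(float)                      # Q[(state, action)], defaults to 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Update toward the immediate reward plus the discounted best next Q-value
            best_next = max(Q[(next_state, a)] for a in env.actions)
            target = reward if done else reward + gamma * best_next
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q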

Here are some key features of Q-learning:

 Model-Free: It doesn't require a detailed model of the environment, only the ability to
interact with it and receive rewards.

 Off-Policy Learning: It can learn from experience even if the data comes from a different
policy than the one it's currently following.

 Versatility: Q-learning can be applied to various scenarios where an agent interacts with an
environment to learn optimal behaviour.
However, there are also challenges to consider:

 Exploration-Exploitation Trade-off: Finding the right balance between exploring new possibilities and exploiting known good actions is crucial.

 Convergence: Learning can be slow, and convergence to the optimal policy is only guaranteed under strict conditions (every state-action pair must be visited sufficiently often and the learning rate must decay appropriately), which rarely hold exactly in practice.

Despite these challenges, Q-learning remains a powerful tool for training agents in reinforcement
learning problems.

Ans 4. Deep Neural Networks (DNNs): Learning Like the Brain

Deep Neural Networks (DNNs) are a type of artificial neural network inspired by the
structure and function of the human brain. Unlike simpler neural networks, DNNs have
multiple hidden layers between the input and output layers. These hidden layers allow DNNs
to learn complex patterns and relationships in data, making them highly effective for a variety
of tasks.

Here's a breakdown of how DNNs work:

 Structure: DNNs are composed of interconnected artificial neurons arranged in layers. Each neuron receives inputs from the previous layer, computes a weighted sum of them, applies an activation function, and sends its output to the next layer.
 Learning: DNNs learn through a process called backpropagation. During training, the
network is presented with data and calculates its output. The difference between the
desired output and the actual output (error) is then propagated backward through the
network. The weights and biases of the neurons are adjusted to minimize this error
iteratively.
 Strength: The hidden layers allow DNNs to extract features from the data at
increasing levels of complexity. This enables them to model intricate relationships
that might be missed by simpler models.
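The training process described above can be sketched in a few lines of PyTorch (assuming PyTorch is installed; the layer sizes, data, and hyperparameters are arbitrary illustrations):

import torch
import torch.nn as nn

# A small feed-forward DNN: two hidden layers between the input and output layers
model = nn.Sequential(
    nn.Linear(4, 16), nn.ReLU(),    # hidden layer 1
    nn.Linear(16, 16), nn.ReLU(),   # hidden layer 2
    nn.Linear(16, 1),               # output layer
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 4)              # dummy inputs: 32 samples, 4 features each
y = torch.randn(32, 1)              # dummy targets

for epoch in range(100):
    prediction = model(x)           # forward pass
    loss = loss_fn(prediction, y)   # error between actual and desired output
    optimizer.zero_grad()
    loss.backward()                 # backpropagation: compute gradients of the error
    optimizer.step()                # adjust weights and biases to reduce the error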

DNNs are widely used in various applications due to their capabilities:


 Image Recognition: DNNs excel at recognizing objects and patterns in images,
powering applications like facial recognition and self-driving cars.
 Natural Language Processing: DNNs can understand and generate human language,
enabling tasks like machine translation and chatbots.
 Recommender Systems: DNNs personalize recommendations on e-commerce
platforms and streaming services by analysing user behaviour and preferences.

However, DNNs also come with challenges:

 Computational Cost: Training DNNs requires significant computational power and large amounts of data.
 Interpretability: Understanding how DNNs arrive at their decisions can be difficult,
limiting their use in safety-critical applications.

Despite these challenges, DNNs are a powerful tool at the forefront of artificial intelligence,
with ongoing research pushing the boundaries of their capabilities.

Ans 5. Convolutional Neural Networks (CNNs): Masters of Visual Recognition

Convolutional Neural Networks (CNNs) are a powerful type of deep learning architecture particularly
adept at image recognition and processing tasks. Their structure, inspired by the human visual cortex,
allows them to excel at finding patterns and relationships within grid-like data like images.

Key Features of CNNs:

 Convolutional Layers: These layers apply filters to extract features from the input image. By
moving these filters across the image, the network can identify edges, shapes, and other visual
elements at various scales.
 Pooling Layers: These layers downsample the data, reducing its dimensionality and computational cost while preserving important features.
 Fully Connected Layers: In the final stages, these layers take the extracted features and
classify the image or make predictions based on the learned patterns.

Fig: A CNN sequence to classify handwritten digits
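For instance, a minimal PyTorch sketch of such a sequence for 28x28 grayscale digit images might look as follows (the channel counts and layer sizes are illustrative assumptions):

import torch.nn as nn

# Convolution + pooling blocks extract features; a fully connected layer classifies
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),    # 1 input channel -> 8 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 28x28 -> 14x14
    nn.Conv2d(8, 16, kernel_size=3, padding=1),   # 8 -> 16 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                    # 10 classes: digits 0-9
)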


Applications of CNNs:

 Image Recognition: Classifying objects in images, facial recognition, medical image analysis.
 Computer Vision: Tasks like object detection, image segmentation, and self-driving car
perception.
 Video Analysis: Action recognition in videos, anomaly detection in surveillance footage.

Advantages of CNNs:

 Highly effective for visual tasks: Their architecture is specifically designed to exploit the
spatial relationships within images.
 Automatic feature extraction: CNNs can learn features directly from data, eliminating the
need for manual feature engineering.
 Transfer learning: Pre-trained CNN models can be fine-tuned for new tasks, leveraging their
learned knowledge as a starting point.

Limitations of CNNs:

 Computational Cost: Training large CNNs can be computationally expensive and require
significant data.
 Interpretability: Understanding how CNNs arrive at their decisions can be challenging,
limiting their use in some applications.

Overall, CNNs are a cornerstone of deep learning for visual tasks, with ongoing research expanding
their capabilities and applications.
