Answer Key
Code No:8LC02
Part - A    Max. Marks: 20
BCLL  CO(s)  Marks
Propositions: Statements that are either true or false, such as "It is raining" or
"The sky is blue."
Logical Operators: Connectives such as NOT (¬), AND (∧), OR (∨), implication (→), and biconditional (↔) that combine propositions into compound sentences.
Truth Tables: Used to represent all possible truth values of propositions under logical operators (a small sketch follows below).
Syntax and Semantics: Syntax specifies which sentences of the language are well formed; semantics defines the truth of each sentence in every possible model.
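As a quick illustration of the truth-table idea above, the following minimal Python sketch (not part of the original answer) enumerates every truth assignment for the compound proposition (p ∧ q) → r:

```python
# Enumerate all truth assignments for (p AND q) -> r.
from itertools import product

def implies(a, b):
    # Material implication: a -> b is false only when a is true and b is false.
    return (not a) or b

print(" p      q      r      (p and q) -> r")
for p, q, r in product([True, False], repeat=3):
    print(f"{p!s:6} {q!s:6} {r!s:6} {implies(p and q, r)}")
```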
In an image classification task (e.g., cat vs. dog detection), CNNs process the image through successive layers to identify key features such as edges, textures, and shapes. Based on these features, the fully connected layer classifies the image into the correct category.
6. Explain how Deep Reinforcement Learning is applied to master Atari games. L2 CO6 [2M]
Through exploration and trial and error, game agents learn efficient action-selection strategies and demonstrate impressive capabilities in gameplay. For example, in complex mazes, agents can find optimal paths while avoiding traps and enemies.
The Atari game acts as the environment, where the agent interacts by taking
actions (e.g., moving left, right, shooting).
The agent receives observations (e.g., game screen pixels), performs
actions, and receives rewards (e.g., game score).
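A minimal, self-contained sketch of this agent-environment loop is given below. The ToyAtariEnv class, its actions, and its reward scheme are hypothetical stand-ins for a real Atari emulator; in Deep RL the random action choice would be replaced by a learned policy (for example, a Q-network over screen pixels).

```python
import random

class ToyAtariEnv:
    """Hypothetical environment: the 'state' is just an integer score."""
    ACTIONS = ["left", "right", "shoot"]

    def reset(self):
        self.score = 0
        return self.score                       # initial observation

    def step(self, action):
        reward = 1 if action == "shoot" else 0  # toy reward signal
        self.score += reward
        done = self.score >= 5                  # episode ends at score 5
        return self.score, reward, done         # observation, reward, done

env = ToyAtariEnv()
obs, done = env.reset(), False
while not done:
    action = random.choice(ToyAtariEnv.ACTIONS)  # placeholder for a learned policy
    obs, reward, done = env.step(action)
    print(f"action={action}, observation={obs}, reward={reward}")
```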
9. Compare and contrast value iteration and policy iteration in solving decision-making problems under uncertainty. L1 CO4 [2M]
Reinforcement Learning (RL) algorithms such as value iteration and policy iteration are fundamental techniques used to solve Markov Decision Processes (MDPs) and derive optimal policies. Both methods aim to find the optimal policy, but they employ distinct strategies: value iteration repeatedly applies the Bellman optimality update to the value function and extracts the policy only at the end, whereas policy iteration alternates between evaluating the current policy and improving it greedily. Policy iteration typically converges in fewer iterations, but each iteration is more expensive because it requires a full policy evaluation (see the sketch below).
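The sketch below contrasts the two methods on a deliberately tiny, made-up two-state MDP (its states, actions, transition probabilities, and rewards are illustrative only); both routines converge to the same optimal policy, but they reach it in the different ways described above.

```python
import numpy as np

# P[s][a] = list of (probability, next_state, reward)
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 5.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 1.0)], "go": [(1.0, 0, 0.0)]},
}
states, actions, gamma = [0, 1], ["stay", "go"], 0.9

def q_value(V, s, a):
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def value_iteration(theta=1e-8):
    V = np.zeros(len(states))
    while True:
        # Bellman optimality backup over all states
        V_new = np.array([max(q_value(V, s, a) for a in actions) for s in states])
        if np.max(np.abs(V_new - V)) < theta:
            break
        V = V_new
    # extract the greedy policy only at the end
    return {s: max(actions, key=lambda a: q_value(V, s, a)) for s in states}

def policy_iteration():
    policy = {s: actions[0] for s in states}
    while True:
        # policy evaluation (iterative, under the fixed current policy)
        V = np.zeros(len(states))
        for _ in range(1000):
            V = np.array([q_value(V, s, policy[s]) for s in states])
        # policy improvement (greedy with respect to the evaluated values)
        new_policy = {s: max(actions, key=lambda a: q_value(V, s, a)) for s in states}
        if new_policy == policy:
            return policy
        policy = new_policy

print(value_iteration(), policy_iteration())
```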
Part – B    Max. Marks: 50
BCLL  CO(s)  Marks
11. a) Explain the classification of environments based on their nature and complexity. L3 CO1 [5M]
Environment types: environments can be classified as fully observable vs. partially observable, deterministic vs. stochastic, episodic vs. sequential, static vs. dynamic, discrete vs. continuous, and single-agent vs. multi-agent. For example, chess is fully observable, deterministic, sequential, static, discrete, and multi-agent, whereas taxi driving is partially observable, stochastic, sequential, dynamic, continuous, and multi-agent.
12. a) What is a Constraint Satisfaction Problem (CSP)? Explain with a suitable example. L4 CO2 [5M]
Let us start with games with two players, whom we’ll refer to as MAX and
MIN for obvious reasons. MAX is the first to move, and then they take turns
until the game is finished. At the conclusion of the game, the victorious
player receives points, while the loser receives penalties. A game can be
formalized as a type of search problem that has the following elements:
S0: The initial state of the game, which describes how it is set up at
the start.
Player(s): Defines which player has the move in state s.
Actions(s): Returns the set of legal moves in state s.
Result(s, a): The transition model, which defines the state that results from applying move a in state s.
Terminal-Test(s): A terminal test that returns true if the game is over and false otherwise. Terminal states are those in which the game has come to an end.
Utility(s, p): A utility function (also known as a payoff function or objective function) determines the final numeric value for a game that concludes in the terminal state s for player p. In chess the outcome is a win, a loss, or a draw, with values of +1, 0, or 1/2. Backgammon's payoffs range from 0 to +192, and some games have an even wider range of possible outcomes. A zero-sum game is defined (somewhat confusingly) as one in which the total payoff to all players is the same for every instance of the game. Chess is zero-sum because each game has a payoff of 0 + 1, 1 + 0, or 1/2 + 1/2. "Constant-sum" would have been a better name, but zero-sum is the traditional term and makes sense if each player is charged an entry fee of 1/2.
The game tree for tic-tac-toe is relatively small, with fewer than 9! = 362,880 terminal nodes. However, because chess has more than 10^40 nodes, the
the actual world. But, no matter how big the game tree is, MAX’s goal is to
find a solid move. A tree that is superimposed on the whole game tree and
examines enough nodes to allow a player to identify what move to make is
referred to as a search tree.
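The minimax sketch below follows the game formalization above (Player, Actions, Result, Terminal-Test, Utility). The concrete game is a hypothetical take-away game (remove 1 or 2 sticks; whoever takes the last stick wins), chosen only because its game tree is tiny; the same structure applies to tic-tac-toe or chess.

```python
def player(s):            # whose move it is: state s = (sticks_left, to_move)
    return s[1]

def actions(s):           # legal moves: take 1 or 2 sticks (if available)
    return [a for a in (1, 2) if a <= s[0]]

def result(s, a):         # transition model
    return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")

def terminal_test(s):     # the game is over when no sticks remain
    return s[0] == 0

def utility(s):           # the player who just moved took the last stick and wins
    return -1 if s[1] == "MAX" else +1   # +1 = win for MAX, -1 = win for MIN

def minimax(s):
    if terminal_test(s):
        return utility(s)
    values = [minimax(result(s, a)) for a in actions(s)]
    return max(values) if player(s) == "MAX" else min(values)

s0 = (4, "MAX")           # initial state: 4 sticks, MAX to move
best = max(actions(s0), key=lambda a: minimax(result(s0, a)))
print("MAX's best opening move:", best, "game value:", minimax(s0))
```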
13. a) Explain the concept of logical agents and how they are used in AI. L5 CO3 [5M]
b) Describe the process of inference in first-order logic and give an example. L3 CO3 [5M]
14. a) What are utility functions, and how do they help in decision-making under uncertainty? Provide an example of their use. L4 CO4 [5M]
b) Describe the process of value iteration and explain how it is used to find optimal policies in Markov Decision Processes (MDP). L3 CO4 [5M]
15. a) Explain the architecture of Convolutional Neural Networks (CNNs) in detail, including the roles of convolutional layers, activation functions, and pooling layers. L5 CO5 [5M]
CNN Architecture:
The Convolutional layer applies filters to the input image to extract features,
the Pooling layer downsamples the image to reduce computation, and the
fully connected layer makes the final prediction. The network learns the
optimal filters through backpropagation and gradient descent.
How Do Convolutional Layers Work?
Convolutional Neural Networks (also called convnets) are neural networks that share their parameters. Imagine an image represented as a cuboid with a width and height (the spatial dimensions of the image) and a depth (the channels, since images generally have red, green, and blue channels).
Now imagine taking a small patch of this image and running a small neural network, called a filter or kernel, on it, producing, say, K outputs stacked vertically. Sliding that same network across the whole image yields another "image" with a different width, height, and depth. Instead of just the R, G, and B channels we now have more channels but a smaller width and height. This operation is called convolution. If the patch size were the same as the image size, it would be a regular neural network; because the patch is small, we have far fewer weights.
[Figure: the convolution operation. Image source: Deep Learning Udacity]
Output Layer: The output from the fully connected layers is fed into a logistic function for classification, such as sigmoid for binary tasks or softmax for multi-class tasks, which converts the score for each class into a probability (a small sketch of this layer stack follows below).
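The sketch below (assuming PyTorch) strings together the layers described above: convolution, ReLU activation, and pooling, followed by a fully connected classifier. The 3×32×32 input size and the two-class cat/dog setup are illustrative assumptions, not part of the original answer.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # filters extract edges/textures
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsample 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # fully connected layer

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)  # softmax is applied inside the loss (CrossEntropyLoss)

model = SmallCNN()
dummy = torch.randn(1, 3, 32, 32)   # one fake RGB image
print(model(dummy).shape)           # torch.Size([1, 2])
```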
Advantages of CNNs:
Good at detecting patterns and features in images, videos, and audio
signals.
Robust, to a degree, to translation, rotation, and scaling of the input.
End-to-end training, no need for manual feature extraction.
Can handle large amounts of data and achieve high accuracy.
Disadvantages of CNNs:
Computationally expensive to train and require a lot of memory.
Prone to overfitting if there is not enough data or proper regularization is not used.
Requires large amounts of labeled data.
Interpretability is limited: it is hard to understand what the network has learned.
b) Discuss the shortcomings of traditional feature selection techniques and explain how Convolutional Neural Networks overcome these challenges. L4 CO5 [5M]
Challenges:
Hardware Solutions:
Optimized Architectures:
Transfer Learning:
Regularization:
Explainability:
Model Compression:
Traditional Q-learning suffers when the state space is large because it requires maintaining a Q-table that maps every state-action pair to a value. For large or continuous state spaces this becomes impractical due to the enormous memory needed to store the table and the lack of generalization between similar states, which makes learning prohibitively slow (see the sketch below).
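For reference, the sketch below shows the tabular Q-learning update on a hypothetical gridworld; the table needs one entry per (state, action) pair, which is exactly the scaling problem described above.

```python
from collections import defaultdict
import random

alpha, gamma, epsilon = 0.1, 0.9, 0.1
actions = ["up", "down", "left", "right"]
Q = defaultdict(float)                      # Q[(state, action)] -> value

def choose_action(state):
    if random.random() < epsilon:           # explore
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])  # exploit

def q_update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# A 10x10 grid already needs 100 states x 4 actions = 400 table entries;
# raw Atari frames (e.g., 210x160 pixels) make the table astronomically large.
q_update(state=(0, 0), action="right", reward=0.0, next_state=(0, 1))
print(len(Q), "table entries touched so far")
```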
b) Discuss the role of Policy Gradient methods in reinforcement learning. How do these methods directly optimize the policy without the need for a value function? L4 CO6 [5M]
Policy Representation:
o The policy π_θ(s) is a probability distribution over actions given the state s. For discrete action spaces, this could be a softmax over possible actions, while for continuous action spaces, it could be a Gaussian distribution parameterized by the mean and standard deviation.
The objective is to maximize the expected return J(θ) = E_{π_θ}[R], where R is the total return (cumulative reward) received by the agent over an episode, and E_{π_θ} denotes the expectation under the current policy π_θ.
To improve the policy, we need to adjust its parameters θ. The core idea behind Policy Gradient methods is to compute the gradient of the expected return with respect to the policy parameters θ and update θ in the direction that maximizes the return.
Using the policy gradient theorem, the gradient of J(θ) with respect to θ can be approximated as:
∇_θ J(θ) = E_{π_θ}[ ∇_θ log π_θ(s, a) · Q(s, a) ]
Where π_θ(s, a) is the probability of selecting action a in state s under the current policy, and Q(s, a) is the expected return after taking action a in state s.
4. Policy Update
The parameters θ are updated with gradient ascent using a learning rate α, typically as:
θ ← θ + α ∇_θ J(θ)
This update increases the probability of selecting actions that lead to higher
returns and decreases the probability of selecting actions that lead to lower
returns.
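The following sketch applies the update above (REINFORCE) to a hypothetical two-armed bandit, i.e. a single-state problem, with a softmax policy π_θ. The reward probabilities are made up, and the gradient of log π_θ(a) for a softmax policy is one_hot(a) − probs.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                        # one preference parameter per action
alpha = 0.1
true_reward_prob = [0.2, 0.8]              # hypothetical environment

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for episode in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)                        # sample an action from pi_theta
    R = float(rng.random() < true_reward_prob[a])     # Bernoulli reward = return
    grad_log_pi = -probs                              # grad of log pi(a) for softmax:
    grad_log_pi[a] += 1.0                             #   one_hot(a) - probs
    theta += alpha * grad_log_pi * R                  # theta <- theta + alpha * grad * R

print("learned action probabilities:", softmax(theta))  # should favour arm 1
```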
17. a) Compare and contrast goal-based agents and utility-based agents. L4 CO1 [4M]
Goal-based Agents:
Purpose: These agents operate by achieving specific goals. They select
actions that they believe will bring them closer to their predefined
goals.
Decision Process: The agent compares its current state to the goal state
and plans a sequence of actions to reach that goal. It can only assess
whether it has achieved its goal or not.
Evaluation: A goal-based agent's performance is often binary—it either
achieves the goal or it does not. There is no nuance in terms of partial
success or failure.
Example: A robot whose goal is to reach a particular location. It will
take actions to move towards that location without considering how
"good" the path or actions are, just whether the goal is achieved.
Utility-based Agents:
Purpose: These agents act to maximize a utility function that assigns a numeric degree of preference ("happiness") to each state, rather than merely checking whether a goal is reached.
Decision Process: The agent evaluates the expected utility of the outcomes of its actions and chooses the action with the highest expected utility, which allows trade-offs between conflicting goals and degrees of success.
Example: A navigation robot that not only reaches the destination but prefers the fastest or safest route among several that achieve the goal (a small sketch contrasting the two decision rules follows).
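The sketch below contrasts the two decision rules on a made-up route-selection example: the goal-based agent accepts any option that reaches the goal, while the utility-based agent ranks options by a numeric utility. The routes and numbers are illustrative assumptions.

```python
routes = {                      # hypothetical routes to the same destination
    "highway": {"reaches_goal": True, "time_min": 30, "safety": 0.5},
    "back_road": {"reaches_goal": True, "time_min": 35, "safety": 0.95},
    "dead_end": {"reaches_goal": False, "time_min": 10, "safety": 0.99},
}

def goal_based_choice(options):
    # any option that achieves the goal is acceptable; no further preference
    return next(name for name, o in options.items() if o["reaches_goal"])

def utility(o):
    # a made-up utility trading off goal achievement, speed, and safety
    return (100 if o["reaches_goal"] else 0) - o["time_min"] + 50 * o["safety"]

def utility_based_choice(options):
    return max(options, key=lambda name: utility(options[name]))

print("goal-based pick:   ", goal_based_choice(routes))     # first goal-achieving route
print("utility-based pick:", utility_based_choice(routes))  # highest-utility route
```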
b) Mention one example of a case study involving a game, such as Tic-tac-toe. L3 CO2 [3M]
AI Approach: Tic-tac-toe is a classic case study in adversarial search. Its game tree is small enough (fewer than 9! terminal nodes) for the minimax algorithm to search exhaustively, so the agent can always choose an optimal move and never lose.
c) Define logical agents and explain their role in artificial intelligence. L4 CO3 [3M]
“Logical AI: The idea is that an agent can represent knowledge of its world, its
goals and the current situation by sentences in logic and decide what to do by
inferring that a certain action or course of action is appropriate to achieve its
goals.”
Logical agents use formal logic (such as propositional logic, first-order logic,
etc.) to reason about the world. This reasoning capability is essential in AI for
problem-solving, decision-making, and even understanding the environment.
Knowledge Representation: Logical agents store what they know about the world as sentences in a formal language (a knowledge base) that can be queried and updated.
Automation of Decision-Making: By applying inference rules to the knowledge base, the agent derives new facts and decides which action to take without human intervention (see the sketch after this answer).
One of the essential roles of logical agents is ensuring that the system's
knowledge remains consistent. They maintain logical coherence by applying
formal rules and eliminating contradictions.
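As a concrete illustration of rule application, the sketch below performs forward chaining with modus ponens over a small, made-up propositional knowledge base; a real logical agent would use the same idea with a richer language such as first-order logic.

```python
# Horn-clause rules: (set of premises, conclusion)
rules = [
    ({"rains"}, "ground_wet"),          # rains -> ground_wet
    ({"ground_wet"}, "slippery"),       # ground_wet -> slippery
]
facts = {"rains"}

def forward_chain(facts, rules):
    inferred = set(facts)
    changed = True
    while changed:                      # apply modus ponens until no new facts appear
        changed = False
        for premises, conclusion in rules:
            if premises <= inferred and conclusion not in inferred:
                inferred.add(conclusion)
                changed = True
    return inferred

print(forward_chain(facts, rules))      # {'rains', 'ground_wet', 'slippery'}
```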
18. a) Explain Bayes' Rule and discuss its application in reasoning under uncertainty. L5 CO4 [4M]
Mathematical Derivation of Bayes' Rule:
Bayes' Rule is derived from the definition of conditional probability. Let's start
with the definition:
P(A|B) = P(A∩B) / P(B)
This equation states that the probability of event A given event B is equal to the probability of both events happening (the intersection of A and B) divided by the probability of event B.
Similarly, we can write the conditional probability of event B given event A:
P(B|A) = P(A∩B) / P(A)
By rearranging this equation, we get:
P(A∩B) = P(B|A) · P(A)
Similarly, rearranging the first equation gives P(A∩B) = P(A|B) · P(B). Since both expressions are equal to P(A∩B), we can set them equal to each other:
P(A|B) · P(B) = P(B|A) · P(A)
To get P(A|B), we divide both sides by P(B):
P(A|B) = P(B|A) · P(A) / P(B)
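A short worked example of Bayes' Rule applied to reasoning under uncertainty; the prior and test accuracies below are hypothetical numbers chosen for illustration.

```python
# P(disease | positive test) via Bayes' Rule.
p_disease = 0.01                      # prior P(A): 1% of the population has the disease
p_pos_given_disease = 0.95            # likelihood P(B|A): test sensitivity
p_pos_given_healthy = 0.05            # false-positive rate P(B|not A)

# Total probability of a positive test, P(B)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior P(A|B) = P(B|A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # about 0.161
```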
b) Discuss how neurons in human vision inspire the design of Convolutional Neural Networks. CO5 [3M]
c) What is a Markov Decision Process (MDP)? Discuss the key components of MDPs used in reinforcement learning. L4 CO6 [3M]
Components of an MDP
An MDP is defined by a tuple (S, A, P, R, γ) with the following components (a small code sketch follows the list):
States (S): A finite set of states representing all possible situations in
which the agent can find itself. Each state encapsulates all the relevant
information needed to make a decision.
Actions (A): A finite set of actions available to the agent. At each
state, the agent can choose from these actions to influence its environment.
Transition Probability (P): A probability function P(s′∣s,a) that defines
the probability of transitioning from state s to state s′ after taking action a.
This encapsulates the dynamics of the environment.
Reward Function (R): A reward function R(s,a,s′) that provides a
scalar reward received after transitioning from state s to state s′ due to
action a. This reward guides the agent towards desirable outcomes.
Discount Factor (γ): A discount factor γ∈[0,1) that determines the
importance of future rewards. A discount factor close to 1 makes the agent
prioritize long-term rewards, while a factor close to 0 makes it focus on
immediate rewards.
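The sketch below writes the (S, A, P, R, γ) tuple down as plain Python data structures for a hypothetical two-state MDP; the states, transition probabilities, and rewards are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class MDP:
    states: list     # S: all possible situations
    actions: list    # A: choices available to the agent
    P: dict          # P[(s, a)] -> {s': probability}, the transition model
    R: dict          # R[(s, a, s')] -> scalar reward
    gamma: float     # discount factor in [0, 1)

mdp = MDP(
    states=["cool", "overheated"],
    actions=["slow", "fast"],
    P={
        ("cool", "slow"): {"cool": 1.0},
        ("cool", "fast"): {"cool": 0.5, "overheated": 0.5},
        ("overheated", "slow"): {"cool": 0.2, "overheated": 0.8},
        ("overheated", "fast"): {"overheated": 1.0},
    },
    R={
        ("cool", "slow", "cool"): 1.0,
        ("cool", "fast", "cool"): 2.0,
        ("cool", "fast", "overheated"): 2.0,
        ("overheated", "slow", "cool"): 0.0,
        ("overheated", "slow", "overheated"): 0.0,
        ("overheated", "fast", "overheated"): -10.0,
    },
    gamma=0.9,
)
print(mdp.states, mdp.actions, mdp.gamma)
```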
-- 00 -- 00 –