
Sreenidhi Institute of Science and Technology

(An Autonomous Institution)

ADVANCED ARTIFICIAL INTELLIGENCE AND DEEP LEARNING (AIML) - KEY

Code No:8LC02

Part - A Max.Marks:20

ANSWER ALL QUESTIONS

BCLL  CO(s)  Marks

1 Define Artificial Intelligence and mention two AI problems. L1 CO1 [2M]

A) Artificial intelligence (AI) is the theory and development of computer systems capable of performing tasks that historically required human intelligence, such as recognizing speech, making decisions, and identifying patterns. AI is an umbrella term that encompasses a wide variety of technologies, including machine learning, deep learning, and natural language processing (NLP). A few major problems associated with Artificial Intelligence are: 1) data-related issues, 2) ethical concerns, 3) regulatory and legal issues, 4) bias, and 5) transparency.

2 Define adversarial search in the context of game playing. L1 CO2 [2M]

Adversarial search is an approach suited to competitive environments, where two or more agents have conflicting goals. It is typically employed in two-player zero-sum games, in which what is good for one player is a misfortune for the other, so there is no win-win outcome. In artificial intelligence, adversarial search plays a vital role in decision-making, particularly in competitive environments associated with games and strategic interactions.

 Game-playing: Adversarial search finds significant application in game-playing scenarios, including renowned games like chess, Go, and poker. These games lend themselves to adversarial search because the state of the game can be represented in a straightforward way and the agents are limited to a small number of actions whose effects are governed by precise rules.

3 Briefly explain propositional logic. L2 CO3 [2M]

Propositional logic, also known as sentential logic or propositional calculus, is a branch of logic that deals with statements (called propositions) that are either true or false. It focuses on the relationships between these propositions and how their truth values can be combined and manipulated using logical operators.

Propositions: Statements that are either true or false, such as "It is raining" or
"The sky is blue."
Logical Operators:

 AND (∧): True if both propositions are true.


 OR (∨): True if at least one proposition is true.
 NOT (¬): Negates the truth value of a proposition.
 IMPLIES (→): True unless the first proposition is true and the second is
false.
 IF AND ONLY IF (↔): True if both propositions have the same truth
value.

Truth Tables: Used to represent all possible truth values of propositions under
logical operators.
Syntax and Semantics:

 Syntax: Rules governing how propositions and operators are combined to


form valid expressions.
 Semantics: Interpretation of these expressions to determine their truth
values.

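For illustration only (not part of the original key), a minimal Python sketch that enumerates a truth table for a compound proposition; the proposition (P AND Q) -> R is an assumed example.

from itertools import product

def implies(a, b):
    # IMPLIES is false only when the antecedent is true and the consequent is false
    return (not a) or b

# Truth table for the example proposition (P AND Q) -> R
print("P      Q      R      (P AND Q) -> R")
for p, q, r in product([True, False], repeat=3):
    print(f"{str(p):6} {str(q):6} {str(r):6} {implies(p and q, r)}")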

4 Define uncertainty in the context of AI and provide an example. L1 CO4 [2M]
Uncertainty refers to the lack of complete or perfect information about the
environment, outcomes, or decisions. It arises when the AI system cannot
determine the exact state of the world or predict outcomes with absolute certainty
due to incomplete, ambiguous, or noisy data.

An AI system in autonomous driving faces uncertainty when it tries to identify an


object in low visibility conditions, such as fog or heavy rain. The system might
detect something ahead but be uncertain whether it's a pedestrian, an animal, or just
a shadow. This uncertainty affects the AI's ability to make confident decisions
about actions like braking or steering.
5 Explain the concept of Convolutional Neural Networks (CNNs). L2 CO5 [2M]
Convolutional Neural Networks (CNNs) are a class of deep learning models
specifically designed for processing structured grid data, such as images. They are
widely used in computer vision tasks like image classification, object detection, and
segmentation.

In an image classification task e.g., cat vs. dog detection, CNNs process the image
through layers to identify key features like edges, textures, and shapes. Based on
these features, the fully connected layer classifies the image into the correct
category.

6 Explain how Deep Reinforcement Learning is applied to master Atari games. L2 CO6 [2M]
Using Deep Reinforcement Learning, game agents can learn efficient action-selection strategies, demonstrating impressive capabilities in gameplay. For example, in complex mazes, agents can find optimal paths, avoiding traps and enemies, through exploration and trial-and-error.

Example:- Environment and Agent:

 The Atari game acts as the environment, where the agent interacts by taking
actions (e.g., moving left, right, shooting).
 The agent receives observations (e.g., game screen pixels), performs
actions, and receives rewards (e.g., game score).
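A hedged sketch of the agent-environment loop described above; the DummyEnv and DummyAgent classes and their Gym-style interface are assumptions for illustration, not a specific library or the original answer's code.

import random

class DummyEnv:
    """Stand-in for an Atari-like environment (assumed interface)."""
    def reset(self):
        self.t = 0
        return 0                                   # observation (e.g., screen pixels)
    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0       # toy reward signal (e.g., game score)
        done = self.t >= 10
        return self.t, reward, done                # observation, reward, episode-finished flag

class DummyAgent:
    def select_action(self, observation):
        return random.choice([0, 1])               # e.g., move left / shoot
    def learn(self, observation, reward):
        pass                                       # learning update would go here

def run_episode(env, agent):
    observation = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = agent.select_action(observation)
        observation, reward, done = env.step(action)
        agent.learn(observation, reward)
        total_reward += reward
    return total_reward

print(run_episode(DummyEnv(), DummyAgent()))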

7 What is problem formulation in AI? Why is it important? L2 CO1 [2M]
In artificial intelligence (AI) and machine learning, an agent is an entity that
perceives its environment, processes information and acts upon that environment
to achieve specific goals. The process by which an agent formulates a problem is
critical, as it lays the foundation for the agent's decision-making and problem-
solving capabilities.

Problem Formulation for a Package Delivery by an Autonomous Drone


We will demonstrate how to formulate the problem of package delivery by an
autonomous drone, implementing the concepts in Python code. The drone needs
to navigate from an initial location to a customer's location while avoiding no-fly
zones and managing its battery life.
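The key mentions a Python implementation but does not include it. The following is a minimal, hypothetical sketch of such a formulation; all names and values (grid cells, no-fly zones, battery cost) are assumptions for illustration, not the original implementation.

# Hypothetical problem formulation for the delivery drone (names and numbers are illustrative).
DRONE_START = (0, 0)                 # initial state: grid cell where the drone begins
CUSTOMER = (4, 3)                    # goal state: the customer's location
NO_FLY_ZONES = {(2, 2), (2, 3)}      # cells the drone may not enter
BATTERY = 20                         # total battery units; each move costs one unit

def actions(state):
    """Legal moves: up/down/left/right cells that avoid no-fly zones."""
    x, y = state
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [c for c in candidates if c not in NO_FLY_ZONES]

def goal_test(state):
    return state == CUSTOMER

def step_cost(state, next_state):
    return 1   # uniform battery cost per move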

8 Differentiate between forward chaining and backward chaining. L1 CO3 [2M]

Forward chaining and backward chaining are the two main reasoning strategies used by an inference engine, which applies logical rules to a knowledge base to deduce new information. Let's explore the key differences between these two strategies.
Forward Chaining
Forward Chaining is a bottom-up reasoning technique where the process starts
with known facts and applies logical rules to infer new facts, eventually leading
to a conclusion. The system moves forward, analyzing all facts and conditions to
arrive at a final decision or outcome.
Ex:-
 Fact: “He is running.”
 Rule: “If he is running, he sweats.”
 Conclusion: “He is sweating.”

 Backward Chaining works in the opposite direction by starting with


a goal or hypothesis and working backward to determine the facts that must be
true for the goal to be achieved. It is a top-down approach that evaluates
whether the goal can be reached by verifying the necessary facts and rules.
 Ex:-
 Goal: “He is sweating.”
 Rule: “If he is running, he sweats.”
 Conclusion: “He is running.”
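For illustration only, a minimal Python sketch of forward chaining on the running/sweating example above; the (premise, conclusion) rule representation is an assumption made for this sketch.

# Minimal forward-chaining sketch: rules are (premise, conclusion) pairs.
rules = [("he is running", "he sweats")]
facts = {"he is running"}

changed = True
while changed:                       # keep applying rules until no new facts appear
    changed = False
    for premise, conclusion in rules:
        if premise in facts and conclusion not in facts:
            facts.add(conclusion)    # infer the new fact
            changed = True

print(facts)   # {'he is running', 'he sweats'}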

9 Compare and contrast value iteration and policy iteration in solving decision-making problems under uncertainty. L1 CO4 [2M]

Reinforcement Learning (RL) algorithms such as value iteration and policy iteration are fundamental techniques used to solve Markov Decision Processes (MDPs) and derive optimal policies. While both methods aim to find the optimal policy, they employ distinct strategies to achieve this goal. The differences between value iteration and policy iteration are summarized below:

Aspect       | Value Iteration                                        | Policy Iteration
Methodology  | Iteratively updates value functions until convergence  | Alternates between policy evaluation and improvement
Goal         | Converges to the optimal value function                | Converges to the optimal policy
Execution    | Directly computes value functions                      | Evaluates and improves policies sequentially
Complexity   | Typically simpler to implement and understand          | Involves more steps and computations
Convergence  | May converge faster in some scenarios                  | Generally converges slower but yields better policies
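For illustration only, a minimal Python sketch of value iteration on a tiny assumed MDP; the transition probabilities and rewards below are hypothetical numbers, not part of the original key.

# Minimal value-iteration sketch on a hypothetical two-state MDP.
states = ["s1", "s2"]
actions = ["a", "b"]
gamma = 0.9
# P[(s, a)] = list of (probability, next_state, reward) triples (assumed values)
P = {
    ("s1", "a"): [(1.0, "s2", 5.0)],
    ("s1", "b"): [(1.0, "s1", 0.0)],
    ("s2", "a"): [(1.0, "s1", 1.0)],
    ("s2", "b"): [(1.0, "s2", 2.0)],
}

V = {s: 0.0 for s in states}
for _ in range(100):   # repeat the Bellman optimality update until (approximately) converged
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
                for a in actions)
         for s in states}

# Extract a greedy policy from the converged value function
policy = {s: max(actions, key=lambda a: sum(p * (r + gamma * V[s2])
                                            for p, s2, r in P[(s, a)]))
          for s in states}
print(V, policy)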

10 What is problem formulation in AI? Why is it important? L2 CO1 [2M]
In artificial intelligence (AI) and machine learning, an agent is an entity that
perceives its environment, processes information and acts upon that environment
to achieve specific goals. The process by which an agent formulates a problem is
critical, as it lays the foundation for the agent's decision-making and problem-
solving capabilities.

Problem Formulation for a Package Delivery by an Autonomous Drone


We will demonstrate how to formulate the problem of package delivery by an
autonomous drone, implementing the concepts in Python code. The drone needs
to navigate from an initial location to a customer's location while avoiding no-fly
zones and managing its battery life.

Part – B Max.Marks:50

ANSWER ANY FIVE QUESTIONS. EACH QUESTION CARRIES 10 MARKS.

BCLL  CO(s)  Marks

11. a) Explain the classification of environments based on their nature and complexity. L3 CO1 [5M]

An environment in artificial intelligence is the surrounding of the agent. The


agent takes input from the environment through sensors and delivers the
output to the environment through actuators. There are several types of
environments:
 Fully Observable vs Partially Observable
 Deterministic vs Stochastic
 Competitive vs Collaborative
 Single-agent vs Multi-agent
 Static vs Dynamic
 Discrete vs Continuous
 Episodic vs Sequential
 Known vs Unknown

Environment types

1. Fully Observable vs Partially Observable


 When an agent's sensors can sense or access the complete state of the environment at each point in time, the environment is said to be fully observable; otherwise it is partially observable.
 Maintaining a fully observable environment is easy as there is no need to keep track of the history of the surroundings.
 An environment is called unobservable when the agent has no sensors at all.
 Examples:
o Chess – the board is fully observable, and so are the
opponent’s moves.
o Driving – the environment is partially observable because
what’s around the corner is not known.
2. Deterministic vs Stochastic
 When the agent's current state and chosen action completely determine the next state of the environment, the environment is said to be deterministic.
 A stochastic environment is random in nature: the next state is not unique and cannot be completely determined by the agent.
 Examples:
o Chess – there would be only a few possible moves for a chess
piece at the current state and these moves can be determined.
o Self-Driving Cars- the actions of a self-driving car are not
unique, it varies time to time.
3. Competitive vs Collaborative
 An agent is said to be in a competitive environment when it competes
against another agent to optimize the output.
 The game of chess is competitive as the agents compete with each
other to win the game which is the output.
 An agent is said to be in a collaborative environment when multiple
agents cooperate to produce the desired output.
 When multiple self-driving cars are found on the roads, they
cooperate with each other to avoid collisions and reach their destination
which is the output desired.
4. Single-agent vs Multi-agent
 An environment consisting of only one agent is said to be a single-
agent environment.
 A person left alone in a maze is an example of the single-agent
system.
 An environment involving more than one agent is a multi-agent
environment.
 The game of football is multi-agent as it involves 11 players in each
team.
5. Dynamic vs Static
 An environment that keeps changing while the agent is deliberating or acting is said to be dynamic.
 A roller coaster ride is dynamic as it is set in motion and the
environment keeps changing every instant.
 An idle environment with no change in its state is called a static
environment.
 An empty house is static as there’s no change in the surroundings
when an agent enters.
6. Discrete vs Continuous
 If an environment consists of a finite number of actions that can be
deliberated in the environment to obtain the output, it is said to be a
discrete environment.
 The game of chess is discrete as it has only a finite number of moves.
The number of moves might vary with every game, but still, it’s finite.
 An environment in which the actions performed cannot be counted, i.e. is not discrete, is said to be continuous.
 Self-driving cars are an example of continuous environments as their
actions are driving, parking, etc. which cannot be numbered.
7.Episodic vs Sequential
 In an Episodic task environment, each of the agent’s actions is divided
into atomic incidents or episodes. There is no dependency between current
and previous incidents. In each incident, an agent receives input from the
environment and then performs the corresponding action.
 Example: Consider an example of Pick and Place robot, which is used
to detect defective parts from the conveyor belts. Here, every time
robot(agent) will make the decision on the current part i.e. there is no
dependency between current and previous decisions.
 In a Sequential environment, the previous decisions can affect all
future decisions. The next action of the agent depends on what action he
has taken previously and what action he is supposed to take in the future.
 Example:
o Checkers- Where the previous move can affect all the
following moves.
8. Known vs Unknown
 In a known environment, the outcomes of all possible actions are given. In an unknown environment, the agent first has to gain knowledge about how the environment works before it can make good decisions.

b) Describe the Classical Planning problem with an example scenario. L3 CO1 [5M]

Classical Planning in AI
Classical planning is a key area of Artificial Intelligence concerned with finding a sequence of actions that will achieve a specific goal from a given initial state. It provides methods and algorithms that allow intelligent systems to systematically explore possible actions and their outcomes until a plan leading from the initial state to the desired result is found.
Classical planning methods are used in many areas, such as robotics, manufacturing, transportation, supply chain management, and project management. Planning based on logical reasoning underpins current approaches to automated decision making and to improving the efficiency of resource allocation.
Importance of Classical Planning in AI
Classical planning is important, first, because its techniques apply to problems in many different domains, and second, because it handles complex situations in a logically coherent manner. By modelling the problem domain, specifying the initial and goal states, and defining how states may be transformed, classical planning algorithms can search systematically through the space of possible solutions for a viable plan.
Domain-Independent Planning
An essential feature of classical planning is domain independence. Classical planning algorithms and techniques are designed to apply to different problems without requiring domain-specific knowledge or heuristics. Such domain-independent planning enables the creation of general-purpose planners that can solve problems across many domains, increasing the power and versatility of classical planning.
The main objective of domain-independent planning is to develop planning algorithms and systems that can reason about and solve problems in many different domains without extensive adjustment or parameter tuning. This is achieved by concentrating on the fundamentals of planning (state representations, action models, and a search strategy) rather than on domain-specific rules or heuristics.
Domain independence in classical planning is supported by domain-independent modelling languages such as PDDL (Planning Domain Definition Language) and STRIPS (Stanford Research Institute Problem Solver).
Planning Domain Modelling Languages
Domain modelling languages are used to describe planning problems. They provide a formal way to specify the initial state, the goal state, and the actions or operators permitted for transitioning between states.
1. PDDL (Planning Domain Definition Language):
PDDL is the de facto standard for describing planning domains and problems. It is an extensible language that allows one to identify objects, their properties, and the relationships between them, and to specify the preconditions and effects of actions. PDDL has evolved through several versions, each adding more features and capabilities.
2. STRIPS (Stanford Research Institute Problem Solver):
STRIPS is an earlier and notable domain modelling language that served as the basis for many later planning languages. It introduced the idea of representing states as sets of logical propositions and actions as operators that modify those propositions.
Classical Planning Techniques
Classical planning assumes a static world, deterministic transitions between states, and a fully observable environment. The purpose is to search for a series of actions (i.e., a plan) that transforms the initial state, step by step, into the goal state while satisfying the given conditions and constraints.
Classical planning algorithms can be broadly categorized into two main approaches:
1. State-Space Search:
These algorithms explore the state space by repeatedly generating successor states until a goal state is reached. Examples include breadth-first search, depth-first search, and informed methods such as A* and greedy best-first search. State-space search algorithms usually maintain a frontier of unexplored states and systematically expand them until a solution is found or the search space is exhausted (a minimal Python sketch of this approach appears after this list).
2. Plan-Space Search:
These algorithms operate in the space of partial plans, with the emphasis on constructing and refining partial plans rather than individual states. Approaches in this family include partial-order planning, hierarchical task network (HTN) planning, and planners such as UCPOP and VHPOP that search the plan space. Plan-space search algorithms work by repeatedly extending and modifying partial plans, adding constraints and resolving inconsistencies until a complete and consistent plan is obtained.
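As promised above, a minimal Python sketch of breadth-first state-space search; the successor function and goal test here are placeholders standing in for whatever a PDDL/STRIPS model would supply, and the toy integer "state" is an assumption for illustration.

from collections import deque

def bfs_plan(initial_state, goal_test, successors):
    """Breadth-first state-space search: returns a list of actions or None."""
    frontier = deque([(initial_state, [])])
    explored = {initial_state}
    while frontier:
        state, plan = frontier.popleft()
        if goal_test(state):
            return plan
        for action, next_state in successors(state):
            if next_state not in explored:
                explored.add(next_state)
                frontier.append((next_state, plan + [action]))
    return None

# Toy usage: reach "state" 3 from 0 by incrementing or decrementing an integer.
plan = bfs_plan(0, lambda s: s == 3,
                lambda s: [("inc", s + 1), ("dec", s - 1)])
print(plan)   # ['inc', 'inc', 'inc']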

12. a) What is a Constraint Satisfaction Problem (CSP)? Explain with a suitable example. L4 CO2 [5M]

Constraint Satisfaction Problems (CSP) play a crucial role in artificial


intelligence (AI) as they help solve various problems that require decision-
making under certain constraints. CSPs represent a class of problems where
the goal is to find a solution that satisfies a set of constraints. These problems
are commonly encountered in fields like scheduling, planning, resource
allocation, and configuration.

Components of Constraint Satisfaction Problems


CSPs are composed of three key elements:
1. Variables: Variables are the things that need to be determined. In a CSP they are the objects that must have values assigned to them in order to satisfy a particular set of constraints. Variables may be Boolean, integer, or categorical; in a Sudoku puzzle, for instance, each variable could stand for one of the cells that must be filled with a number.
2. Domains: The range of potential values that a variable can have is
represented by domains. Depending on the issue, a domain may be finite
or limitless. For instance, in Sudoku, the set of numbers from 1 to 9 can
serve as the domain of a variable representing a problem cell.
3. Constraints: The guidelines that control how variables relate to one
another are known as constraints. Constraints in a CSP define the ranges
of possible values for variables. Unary constraints, binary constraints, and
higher-order constraints are only a few examples of the various sorts of
constraints. For instance, in a sudoku problem, the restrictions might be
that each row, column, and 3×3 box can only have one instance of each
number from 1 to 9.
CSP Algorithms: Solving Constraint Satisfaction Problems Efficiently
Constraint Satisfaction Problems (CSPs) rely on various algorithms to
explore and optimize the search space, ensuring that solutions meet the
specified constraints. Here’s a breakdown of the most commonly used CSP
algorithms:
1. Backtracking Algorithm
The backtracking algorithm is a depth-first search method used to
systematically explore possible solutions in CSPs. It operates by assigning
values to variables and backtracks if any assignment violates a constraint.
How it works:
 The algorithm selects a variable and assigns it a value.
 It recursively assigns values to subsequent variables.
 If a conflict arises (i.e., a variable cannot be assigned a valid value),
the algorithm backtracks to the previous variable and tries a different
value.
 The process continues until either a valid solution is found or all
possibilities have been exhausted.
This method is widely used due to its simplicity but can be inefficient for large problems with many variables (a minimal Python sketch of backtracking appears at the end of this answer).
2. Forward-Checking Algorithm
The forward-checking algorithm is an enhancement of the backtracking
algorithm that aims to reduce the search space by applying local
consistency checks.
How it works:
 For each unassigned variable, the algorithm keeps track of remaining
valid values.
 Once a variable is assigned a value, local constraints are applied to
neighboring variables, eliminating inconsistent values from their
domains.
 If a neighbor has no valid values left after forward-checking, the
algorithm backtracks.
This method is more efficient than pure backtracking because it prevents
some conflicts before they happen, reducing unnecessary computations.
3. Constraint Propagation Algorithms
Constraint propagation algorithms further reduce the search space by
enforcing local consistency across all variables.
How it works:
 Constraints are propagated between related variables.
 Inconsistent values are eliminated from variable domains by
leveraging information gained from other variables.
 These algorithms refine the search space by making inferences,
removing values that would lead to conflicts.
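Referring back to the backtracking algorithm in point 1 above, here is a minimal Python sketch on a toy map-colouring CSP; the regions, colours, and adjacency data are assumed for illustration.

# Toy CSP: colour three regions so that adjacent regions differ (assumed data).
variables = ["A", "B", "C"]
domains = {v: ["red", "green", "blue"] for v in variables}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

def consistent(var, value, assignment):
    # A value is consistent if no assigned neighbour already uses it.
    return all(assignment.get(n) != value for n in neighbors[var])

def backtrack(assignment):
    if len(assignment) == len(variables):
        return assignment                          # all variables assigned
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if consistent(var, value, assignment):
            assignment[var] = value
            result = backtrack(assignment)
            if result is not None:
                return result
            del assignment[var]                    # undo and try the next value
    return None                                    # triggers backtracking in the caller

print(backtrack({}))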

b) Explain the optimal decision-making process in multiplayer games. L5 CO2 [5M]

Optimal Decision Making in Games

Let us start with games with two players, whom we’ll refer to as MAX and
MIN for obvious reasons. MAX is the first to move, and then they take turns
until the game is finished. At the conclusion of the game, the victorious
player receives points, while the loser receives penalties. A game can be
formalized as a type of search problem that has the following elements:
 S0: The initial state of the game, which describes how it is set up at
the start.
 Player (s): Defines which player in a state has the move.
 Actions (s): Returns a state’s set of legal moves.
 Result (s, a): A transition model that defines a move’s outcome.
 Terminal-Test (s): A terminal test that returns true if the game is over
but false otherwise. Terminal states are those in which the game has come
to a conclusion.
 Utility (s, p): A utility function (also known as a payout function or
objective function ) determines the final numeric value for a game that
concludes in the terminal state s for player p. The result in chess is a win,
a loss, or a draw, with values of +1, 0, or 1/2. Backgammon’s payoffs
range from 0 to +192, but certain games have a greater range of possible
outcomes. A zero-sum game is defined (confusingly) as one in which the
total reward to all players is the same for each game instance. Chess is a
zero-sum game because each game has a payoff of 0 + 1, 1 + 0, or 1/2 +
1/2. “Constant-sum” would have been a preferable name, but zero-sum is the usual term and makes sense if each participant is charged an entry fee of 1/2.
The game tree for tic-tac-toe is relatively small, with fewer than 9! = 362,880 terminal nodes. However, because there are over 10^40 nodes in chess, the game tree is better viewed as a theoretical construct that cannot be realized in
the actual world. But, no matter how big the game tree is, MAX’s goal is to
find a solid move. A tree that is superimposed on the whole game tree and
examines enough nodes to allow a player to identify what move to make is
referred to as a search tree.

A sequence of actions leading to a goal state—a terminal state that is a win—


would be the best solution in a typical search problem. MIN has something to
say about it in an adversarial search. MAX must therefore devise a contingent
strategy that specifies M A X’s initial state move, then MAX’s movements in
the states resulting from every conceivable MIN response, then MAX’s
moves in the states resulting from every possible MIN reaction to those
moves, and so on. This is quite similar to the AND-OR search method, with
MAX acting as OR and MIN acting as AND. When playing an infallible
opponent, an optimal strategy produces results that are at least as good as any other strategy. We'll start by demonstrating how to find the best plan.
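For illustration only, a minimal Python sketch of minimax decision making following the game formalization above (Actions, Result, Terminal-Test, Utility); the toy stick-taking game used here is an assumption made so the sketch can run end to end.

class NimLikeGame:
    """Toy game: a pile of sticks, each player removes 1 or 2; taking the last stick wins.
    A state is (sticks_left, player_to_move)."""
    def actions(self, state):
        sticks, _ = state
        return [n for n in (1, 2) if n <= sticks]
    def result(self, state, action):
        sticks, player = state
        return (sticks - action, "MIN" if player == "MAX" else "MAX")
    def terminal_test(self, state):
        return state[0] == 0
    def utility(self, state, player):
        # The player who just moved took the last stick and wins.
        winner = "MIN" if state[1] == "MAX" else "MAX"
        return 1 if winner == player else -1

def minimax_value(game, state, maximizing):
    if game.terminal_test(state):
        return game.utility(state, "MAX")
    values = [minimax_value(game, game.result(state, a), not maximizing)
              for a in game.actions(state)]
    return max(values) if maximizing else min(values)

def minimax_decision(game, state):
    # MAX picks the action whose resulting state has the highest minimax value.
    return max(game.actions(state),
               key=lambda a: minimax_value(game, game.result(state, a), maximizing=False))

game = NimLikeGame()
print(minimax_decision(game, (4, "MAX")))   # prints 1: taking one stick leaves MIN in a losing position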

13. a) Explain the concept of logical agents and how they are used in AI. L5 CO3 [5M]

b) Describe the process of inference in first-order logic and give an example. L3 CO3 [5M]

14. a) What are utility functions, and how do they help in decision-making under uncertainty? Provide an example of their use. L4 CO4 [5M]

b) Describe the process of value iteration and explain how it is used to find optimal policies in Markov Decision Processes (MDP). L3 CO4 [5M]

15. a) Explain the architecture of Convolutional Neural Networks (CNNs) in detail, including the roles of convolutional layers, activation functions, and pooling layers. L5 CO5 [5M]

CNN Architecture:-

A Convolutional Neural Network consists of multiple layers: the input layer, convolutional layers, pooling layers, and fully connected layers.

(Figure: simple CNN architecture.)

The Convolutional layer applies filters to the input image to extract features,
the Pooling layer downsamples the image to reduce computation, and the
fully connected layer makes the final prediction. The network learns the
optimal filters through backpropagation and gradient descent.
How Do Convolutional Layers Work?
Convolutional Neural Networks (convnets) are neural networks that share their parameters. Imagine you have an image. It can be represented as a cuboid having a length and width (the spatial dimensions of the image) and a height (the channels, as images generally have red, green, and blue channels).

Now imagine taking a small patch of this image and running a small neural
network, called a filter or kernel on it, with say, K outputs and representing
them vertically. Now slide that neural network across the whole image, as a
result, we will get another image with different widths, heights, and depths.
Instead of just R, G, and B channels now we have more channels but lesser
width and height. This operation is called Convolution. If the patch size is the
same as that of the image it will be a regular neural network. Because of this
small patch, we have fewer weights.

Mathematical Overview of Convolution


Now let’s talk about a bit of mathematics that is involved in the whole
convolution process.
 Convolution layers consist of a set of learnable filters (or kernels)
having small widths and heights and the same depth as that of input
volume (3 if the input layer is image input).
 For example, suppose we run a convolution on an image with dimensions 34x34x3. The possible filter sizes are a×a×3, where ‘a’ can be 3, 5, or 7, but is small compared to the image dimension.
 During the forward pass, we slide each filter across the whole input
volume step by step where each step is called stride (which can have a
value of 2, 3, or even 4 for high-dimensional images) and compute the dot
product between the kernel weights and patch from input volume.
 As we slide our filters we’ll get a 2-D output for each filter and we’ll
stack them together as a result, we’ll get output volume having a depth
equal to the number of filters. The network will learn all the filters.
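As an illustration of the sliding-window dot product just described (not part of the original key), a small NumPy sketch for a single channel with stride 1 and no padding; the example image and kernel values are assumptions.

import numpy as np

def conv2d_single_channel(image, kernel, stride=1):
    """Valid convolution: slide `kernel` over `image`, taking dot products with each patch."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)   # dot product of patch and kernel weights
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0.], [0., -1.]])         # simple 2x2 edge-like filter (assumed)
print(conv2d_single_channel(image, kernel).shape)   # (4, 4) output feature map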
Layers Used to Build ConvNets
A complete Convolutional Neural Network architecture is also known as a convnet. A convnet is a sequence of layers, and every layer transforms one volume to another through a differentiable function.
Types of layers:
Let's take an example by running a convnet on an image of dimension 32 x 32 x 3.
 Input Layers: It’s the layer in which we give input to our model. In
CNN, Generally, the input will be an image or a sequence of images. This
layer holds the raw input of the image with width 32, height 32, and depth
3.
 Convolutional Layers: This layer is used to extract features from the input. It applies a set of learnable filters, known as kernels, to the input images. The filters/kernels are small matrices, usually of 2×2, 3×3, or 5×5 shape. Each filter slides over the input image data and computes the dot product between the kernel weights and the corresponding input image patch. The output of this layer is referred to as feature maps. If we use a total of 12 filters for this layer, we get an output volume of dimension 32 x 32 x 12.
 Activation Layer: By adding an activation function to the output of the preceding layer, activation layers add nonlinearity to the network. An element-wise activation function is applied to the output of the convolution layer. Common activation functions are ReLU: max(0, x), Tanh, Leaky ReLU, etc. The volume remains unchanged, hence the output volume will have dimensions 32 x 32 x 12.
 Pooling Layer: This layer is periodically inserted into the convnet; its main function is to reduce the size of the volume, which makes computation faster, reduces memory use, and also helps prevent overfitting. Two common types of pooling layers are max pooling and average pooling. If we use max pooling with 2 x 2 filters and stride 2, the resultant volume will be of dimension 16x16x12.


 Flattening: The resulting feature maps are flattened into a one-dimensional vector after the convolution and pooling layers so they can be passed into a fully connected layer for classification or regression.
 Fully Connected Layers: These take the input from the previous layer and compute the final classification or regression output.

 Output Layer: The output from the fully connected layers is fed into an activation function such as sigmoid or softmax for classification tasks, which converts the raw score for each class into a probability.
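For illustration only, a minimal Keras sketch that assembles the layers above for the running 32x32x3 example (12 filters, 2x2 max pooling); it assumes TensorFlow is installed, and the 10-class softmax head is an assumed choice, not part of the original key.

import tensorflow as tf

# Minimal CNN matching the running example: 32x32x3 input, 12 filters, 2x2 pooling.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(12, (3, 3), padding="same", activation="relu"),  # -> 32x32x12
    tf.keras.layers.MaxPooling2D((2, 2)),                                   # -> 16x16x12
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),   # assumed 10-class output head
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()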
Advantages of CNNs:
 Good at detecting patterns and features in images, videos, and audio
signals.
 Robust to translation, rotation, and scaling invariance.
 End-to-end training, no need for manual feature extraction.
 Can handle large amounts of data and achieve high accuracy.
Disadvantages of CNNs:
 Computationally expensive to train and require a lot of memory.
 Can be prone to overfitting if not enough data or proper regularization
is used.
 Requires large amounts of labeled data.
 Interpretability is limited, it’s hard to understand what the network
has learned.
b) Discuss the shortcomings of traditional feature selection techniques and explain how Convolutional Neural Networks overcome these challenges. L4 CO5 [5M]

Traditional Feature Selection Techniques

Feature selection is the process of selecting a subset of the most relevant features from the data to improve model performance, reduce overfitting, and lower computational cost. Traditional methods (filter, wrapper, and embedded approaches) rely on manually engineered features and a separate selection step, which scales poorly to high-dimensional data such as images and depends heavily on domain expertise. Convolutional Neural Networks overcome these shortcomings in the following ways:
1. Feature Extraction with Convolutions

 CNNs use convolutional layers to automatically learn spatially


meaningful patterns (features) from the input data.
 During training, filters (kernels) in convolutional layers detect local
patterns, such as edges, textures, or more complex structures,
depending on the layer's depth.
 Filters act as feature selectors by emphasizing important patterns and
ignoring irrelevant information.

2. Hierarchical Feature Learning

 CNNs learn features in a hierarchical manner:


o Lower layers extract simple features like edges and textures.
o Middle layers detect more complex features like shapes or parts
of objects.
o Higher layers focus on task-specific features like object classes.
 This automatic feature learning reduces the need for manual feature
engineering.

3. Pooling Layers for Dimensionality Reduction

 Pooling layers (e.g., max pooling, average pooling) reduce the


dimensionality of feature maps while retaining the most significant
features.
 By summarizing regions of the feature map, pooling eliminates
redundant or irrelevant information, acting as a form of feature
selection.

4. Regularization and Sparsity

 Techniques like dropout and weight regularization (L2) encourage the


network to focus on the most important features, effectively performing
implicit feature selection by preventing overfitting.
5. Attention Mechanisms (Advanced CNNs)

 In modern CNN architectures, attention mechanisms (e.g., SE-Net or


Transformer blocks) emphasize important regions of an image or
feature map while suppressing less relevant areas, refining the feature
selection process.

Challenges of CNNs and how they are addressed:-

Hardware Solutions:

 Utilize GPUs, TPUs, or cloud-based infrastructure for faster training


and inference.

Optimized Architectures:

 Employ efficient architectures like MobileNet, EfficientNet, or


NASNet.

Transfer Learning:

 Use pre-trained models to reduce data and computational requirements.

Regularization:

 Techniques like dropout, L2 regularization, and data augmentation


reduce overfitting.

Explainability:

 Apply interpretability methods like Grad-CAM or SHAP to make


CNNs more transparent.

Model Compression:

 Use pruning, quantization, and knowledge distillation for lightweight


deployment.



16. a) What is a Deep Q-Network (DQN)? Explain how DQNs address the challenges of traditional Q-learning by using deep neural networks to approximate the Q-function. L4 CO6 [5M]

A “Deep Q-Network” (DQN) refers to a type of artificial intelligence algorithm that uses both deep learning and reinforcement learning techniques to make decisions and solve complex problems. This type of AI is commonly used in applications such as robotics, gaming, and autonomous vehicles, where the AI needs to learn how to navigate and make decisions in a simulated or real-world environment.
For business people, understanding Deep Q Network is important because it
represents the cutting-edge of AI technology. By utilizing deep learning and
reinforcement learning, a DQN can learn from its environment and make
decisions in a way that mimics human cognition.
This means that businesses can use DQNs to optimize processes, make
complex decisions, and even automate tasks that previously required human
intervention. As AI continues to advance, understanding and utilizing DQNs
can give businesses a competitive edge in a wide range of industries.

1. The Challenge of Large State Spaces

Traditional Q-learning suffers when the state space is large because it requires
maintaining a Q-table that maps every state-action pair to a value. For large or
continuous state spaces, this becomes impractical due to:

 Memory limitations: Storing a large Q-table becomes infeasible.


 Exploration difficulties: It's hard to explore all possible states and state-
action pairs.

DQN Solution: Deep Neural Networks for Function Approximation

 Deep Q-Networks use deep neural networks to approximate the Q-


function, which maps states (or state-action pairs) to Q-values
(expected rewards). Instead of maintaining a Q-table, DQNs use a
neural network to generalize across states, enabling the model to handle
large, continuous, and high-dimensional state spaces (e.g., images).
o Neural Network: The network takes a state (or state-action pair)
as input and outputs a Q-value for each possible action.
o Function Approximation: The Q-function is approximated by
the output of the neural network, which learns the optimal Q-
values during training. This allows the model to generalize from
past experiences and make predictions about new, unseen states.

2. The Challenge of Overestimation Bias (and Instability)

Traditional Q-learning, when using a table to store Q-values, faces


overestimation bias in the Q-value updates, especially when the action-value
function is being updated using the greedy policy (i.e., always choosing the
action with the highest Q-value). This can lead to instability and poor
performance.

DQN Solution: Target Networks

 Target Networks: DQNs address this issue by maintaining two separate


networks: a main Q-network and a target Q-network.
o The target network is a copy of the Q-network that is
periodically updated with the parameters of the main Q-
network. This helps mitigate the instability caused by frequent
updates to the Q-values.
o By using a target network for calculating the Q-value targets
during training (instead of using the current Q-network), DQNs
prevent the target values from fluctuating too rapidly, leading to
more stable training.

3. The Challenge of Temporal Correlation in Experience

In traditional Q-learning, experience is often assumed to be independent,


which is unrealistic in many RL settings, especially when using high-
dimensional inputs (like images). When using Q-learning with a table,
consecutive experiences are correlated (the agent takes sequential actions in
the same environment). This correlation can cause instability during training
due to non-independence of samples.

DQN Solution: Experience Replay

 Experience Replay: To address this, DQNs store previous experiences


(state, action, reward, next state) in a replay buffer and sample random
mini-batches from this buffer during training. This breaks the temporal
correlation between consecutive experiences and ensures that the model
is trained on more diverse, uncorrelated experiences.
o This random sampling allows the network to learn from a
broader set of experiences and improves training stability.

4. The Challenge of Slow Learning and High Variance

Traditional Q-learning can learn slowly, especially when the environment is


complex, because it updates Q-values using a single-step lookahead and may
have high variance in updates when using greedy policies.

DQN Solution: Double Q-Learning and Improved Target Calculation

 Double Q-Learning (not initially part of DQN but used in later


advancements) helps reduce overestimation bias by decoupling the
action selection and Q-value estimation processes:
o One Q-network is used to select actions, and the other is used to
compute the Q-value for those actions. This reduces bias in the
Q-value estimates.
o Improved Target Calculation: During training, DQNs calculate target Q-values using the Bellman equation:

y = r + γ · max_a Q′(s′, a)

Here, Q′ is the target network, and max_a Q′(s′, a) is used as the target for the next state. By using the target network for the Q-value estimation, DQNs reduce high variance and improve the stability of training.
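For illustration only, a minimal Python/NumPy sketch of two ingredients described above: a replay buffer and the Bellman target y = r + γ · max_a Q′(s′, a) computed with a frozen target network. The `target_q` function, the number of actions, and the sample transition are assumptions, not a full DQN implementation.

import random
from collections import deque
import numpy as np

replay_buffer = deque(maxlen=10_000)

def store(transition):
    # transition = (state, action, reward, next_state, done)
    replay_buffer.append(transition)

def compute_targets(batch, target_q, gamma=0.99):
    """`target_q(state)` is assumed to return a vector of Q-values from the frozen target network."""
    targets = []
    for state, action, reward, next_state, done in batch:
        if done:
            y = reward                                   # no bootstrap at episode end
        else:
            y = reward + gamma * np.max(target_q(next_state))
        targets.append(y)
    return np.array(targets)

# Usage with a dummy target network (assumed 4 actions) and one stored transition:
dummy_target_q = lambda s: np.zeros(4)
store((0, 1, 1.0, 1, False))
batch = random.sample(list(replay_buffer), k=1)
print(compute_targets(batch, dummy_target_q))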


5. The Challenge of Sparse Rewards

In many reinforcement learning environments, rewards are sparse or delayed,


making it difficult for an agent to associate actions with outcomes (e.g., in
Atari games). Traditional Q-learning struggles with these delayed or sparse
rewards because it updates Q-values based on immediate feedback.

DQN Solution: Neural Network Representation for Better Generalization

 Neural Network Representation: The neural network in DQNs helps


generalize the Q-function over the entire state space, allowing the agent
to learn more efficiently from sparse or delayed rewards. The network
learns complex patterns in the data, allowing the agent to associate
actions with long-term outcomes, even if those outcomes are not
immediately visible.

6. The Challenge of Exploration vs. Exploitation

Traditional Q-learning faces the exploration-exploitation tradeoff: it needs to


explore new actions to learn better policies, but excessive exploration can
delay convergence. A simple epsilon-greedy policy is typically used, where the
agent explores randomly with probability epsilon.

DQN Solution: Exploration with Decaying Epsilon and Exploration Strategies

 Decaying Epsilon: In DQN, the epsilon in the epsilon-greedy policy


(which controls exploration) is typically decayed over time, starting
with a high exploration rate and gradually reducing it as the agent
learns and exploits its knowledge.
o As the agent gains more knowledge, it can gradually shift
towards exploitation (choosing the action with the highest Q-
value). This balances exploration and exploitation, especially
during the early stages of training when the agent knows little
about the environment.

b) Discuss the role of Policy Gradient methods in reinforcement learning. How do these methods directly optimize the policy without the need for a value function? L4 CO6 [5M]

Policy Gradient methods directly optimize the policy without needing to


explicitly compute or rely on a value function (though some variations like
Actor-Critic methods do use a value function for additional guidance). Here's a
breakdown of how the optimization process works:

1. Defining the Policy

In Policy Gradient methods, the policy π_θ(s) is typically parameterized by θ, where θ represents the parameters of the policy (often the weights of a neural network).

 Policy Representation:
o The policy π_θ(s) is a probability distribution over actions given the state s. For discrete action spaces, this could be a softmax over possible actions, while for continuous action spaces, it could be a Gaussian distribution parameterized by the mean and standard deviation.

2. Objective: Maximize Expected Reward

The goal of Policy Gradient methods is to maximize the expected cumulative reward over time. The objective function J(θ) to be maximized is the expected return under the policy:

J(θ) = E_{π_θ}[R]

where R is the total return or cumulative reward received by the agent over an episode, and E_{π_θ} denotes the expectation under the current policy π_θ.

3. Computing the Gradient of the Objective

To improve the policy, we need to adjust its parameters θ. The core idea behind Policy Gradient methods is to compute the gradient of the expected return with respect to the policy parameters θ and update θ in the direction that maximizes the return.

Using the policy gradient theorem, the gradient of J(θ) with respect to θ can be approximated as:

∇_θ J(θ) = E_{π_θ}[ ∇_θ log π_θ(s, a) · Q(s, a) ]

Where:

 π_θ(s, a) is the probability of taking action a in state s under the policy π_θ.
 Q(s, a) is the action-value function, representing the expected cumulative reward from taking action a in state s.

The gradient ∇_θ log π_θ(s, a) is called the log-likelihood (score) term of the policy. It tells us how sensitive the action probabilities are to changes in the policy parameters, which is important for learning which actions are beneficial in a given state.

By updating the policy parameters in the direction of this gradient, we effectively improve the policy, making the agent more likely to choose actions that lead to higher rewards.

4. Policy Update

The parameters θ are updated using a learning rate α, typically as:

θ ← θ + α · ∇_θ J(θ)

This update increases the probability of selecting actions that lead to higher returns and decreases the probability of selecting actions that lead to lower returns.
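For illustration only, a minimal NumPy sketch of a REINFORCE-style update for a state-independent softmax policy over three actions; the learning rate, return value, and the lack of state conditioning are simplifying assumptions, not a full training loop.

import numpy as np

def softmax(logits):
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

theta = np.zeros(3)            # policy parameters (logits over 3 actions)
alpha = 0.1                    # learning rate (assumed)

def update(action, ret):
    """One policy-gradient step: theta += alpha * G * grad_theta log pi(a)."""
    global theta
    pi = softmax(theta)
    grad_log_pi = -pi
    grad_log_pi[action] += 1.0          # for a softmax, grad log pi(a) = one_hot(a) - pi
    theta = theta + alpha * ret * grad_log_pi

update(action=2, ret=1.0)      # an action followed by a positive return becomes more probable
print(softmax(theta))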

17. a) Compare and contrast goal-based agents and utility-based agents. L4 CO1 [4M]
Goal-based Agents:
 Purpose: These agents operate by achieving specific goals. They select
actions that they believe will bring them closer to their predefined
goals.
 Decision Process: The agent compares its current state to the goal state
and plans a sequence of actions to reach that goal. It can only assess
whether it has achieved its goal or not.
 Evaluation: A goal-based agent's performance is often binary—it either
achieves the goal or it does not. There is no nuance in terms of partial
success or failure.
 Example: A robot whose goal is to reach a particular location. It will
take actions to move towards that location without considering how
"good" the path or actions are, just whether the goal is achieved.

Utility-based Agents:

 Purpose: These agents are designed to maximize some notion of


"utility," meaning they select actions that maximize their satisfaction or
well-being according to a utility function.
 Decision Process: A utility-based agent evaluates actions based on how
much they increase its overall utility, even if that doesn't lead directly
to a specific goal. It may choose actions that are suboptimal in terms of
immediate goal achievement if they provide greater long-term utility.
 Evaluation: Unlike goal-based agents, utility-based agents can assess
varying degrees of success. They can consider partial achievements and
balance competing desires or goals.
 Example: An autonomous car deciding on a route to its destination. If it
faces a choice between two routes, it might choose the one with the
least traffic, even if it takes a bit longer, because it maximizes comfort
(utility).

b) Mention one example of a case study involving a game, such as Tic-tac-toe. L3 CO2 [3M]

Case study: IBM's Deep Blue, the chess system that defeated world champion Garry Kasparov in 1997.
AI Approach:

 Search Algorithms: Deep Blue used brute-force search (checking as


many potential future moves as possible) and evaluation functions to
determine the best move at any given point. It evaluated around 200
million positions per second during its games.
 Minimax Algorithm: The system used a form of the Minimax algorithm
combined with Alpha-Beta pruning to explore the game tree and
eliminate less promising moves, allowing it to focus computational
resources on the most strategic positions.
 Opening Book: It also incorporated a database of expert human-
generated opening moves to give it a strong start in the game.
 Endgame Databases: Deep Blue had access to tables of precomputed
endgame solutions, where it could guarantee a win or draw from any
position with a small number of pieces left on the board.

c) Define logical agents and explain their role in artificial intelligence. L4 CO3 [3M]
“Logical AI: The idea is that an agent can represent knowledge of its world, its
goals and the current situation by sentences in logic and decide what to do by
inferring that a certain action or course of action is appropriate to achieve its
goals.”

Role of Logical Agents in Artificial Intelligence:

Reasoning and Problem Solving:

Logical agents use formal logic (such as propositional logic, first-order logic,
etc.) to reason about the world. This reasoning capability is essential in AI for
problem-solving, decision-making, and even understanding the environment.

Knowledge Representation:

Logical agents are key to representing knowledge in a formal, structured way


that machines can process. In AI, knowledge representation refers to how
knowledge about the world is encoded into the system so that it can be
reasoned about and manipulated.

Automation of Decision-Making:

Logical agents automate decision-making by evaluating possible actions based


on logical reasoning. They consider the available options and select the best
action that aligns with predefined goals, rules, and constraints.

Consistency and Coherence:

One of the essential roles of logical agents is ensuring that the system's
knowledge remains consistent. They maintain logical coherence by applying
formal rules and eliminating contradictions.

Problem-Solving in Complex Domains:

Logical agents can be applied to complex domains such as automated


planning, robotics, medical diagnosis, and legal reasoning. In these cases,
logical agents help the system reason about different possible actions or
scenarios, choose the most appropriate solution, and plan the steps needed to
achieve a goal.

18. a) Explain Bayes' Rule and discuss its application in reasoning under uncertainty. L5 CO4 [4M]
Mathematical Derivation of Bayes' Rule:-

Bayes' Rule is derived from the definition of conditional probability. Start with the definition:

P(A|B) = P(A ∩ B) / P(B)

This states that the probability of event A given event B equals the probability of both events happening (the intersection of A and B) divided by the probability of event B.
Similarly, we can write the conditional probability of event B given event A:

P(B|A) = P(A ∩ B) / P(A)

Rearranging this equation gives:

P(A ∩ B) = P(B|A) · P(A)

We now have two expressions for P(A ∩ B); since both are equal to P(A ∩ B), we can set them equal to each other:

P(A|B) · P(B) = P(B|A) · P(A)

Dividing both sides by P(B) yields Bayes' Rule:

P(A|B) = P(B|A) · P(A) / P(B)

In reasoning under uncertainty, Bayes' Rule lets an AI system update the probability of a hypothesis (for example, a disease, a fault, or a cause) after observing evidence, by combining the prior P(A) with the likelihood P(B|A) of the evidence.
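A small numeric illustration of the rule in Python; the disease and test probabilities below are assumed values chosen only for the example.

# Assumed numbers: P(disease)=0.01, P(positive | disease)=0.95, P(positive | no disease)=0.05
p_a = 0.01            # prior P(A)
p_b_given_a = 0.95    # likelihood P(B|A)
p_b_given_not_a = 0.05

# Total probability P(B), then Bayes' Rule for the posterior P(A|B)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))   # about 0.161: a positive test still leaves the disease unlikely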

b) Discuss how neurons in human vision inspire the design of Convolutional Neural Networks. CO5 [3M]

How neurons work in human vision:

Receptive Fields: Individual neurons in the visual cortex respond to stimuli


in a specific area of the visual field, called the receptive field. These neurons
become more specialized as we move up through the visual processing
hierarchy.
Simple Cells: Early-stage neurons in the visual cortex (simple cells) are
sensitive to basic features such as edges, orientations, and colors. These
neurons respond strongly to specific orientations or edges in a particular part of
the visual field.
Complex Cells: Higher-level neurons (complex cells) aggregate information
from simple cells in a larger receptive field, recognizing patterns such as
movement or more complex shapes.
Hierarchical Processing: As information flows through the visual cortex, it
gets processed at multiple levels. Lower layers focus on simple features (like
edges), while higher layers integrate these features into more complex
representations (like faces or objects).

c) What is a Markov Decision Process (MDP)? Discuss the key components of MDPs used in reinforcement learning. L4 CO6 [3M]

A Markov Decision Process is a mathematical framework used to describe an


environment in decision-making scenarios where outcomes are partly random
and partly under the control of a decision-maker. MDPs provide a formalism
for modeling decision-making in situations where outcomes are uncertain,
making them essential for Reinforcement Learning.

Components of an MDP
An MDP is defined by a tuple (S, A, P, R, γ), where:
 States (S): A finite set of states representing all possible situations in
which the agent can find itself. Each state encapsulates all the relevant
information needed to make a decision.
 Actions (A): A finite set of actions available to the agent. At each
state, the agent can choose from these actions to influence its environment.
 Transition Probability (P): A probability function P(s′∣s,a) that defines
the probability of transitioning from state s to state s′ after taking action a.
This encapsulates the dynamics of the environment.
 Reward Function (R): A reward function R(s,a,s′) that provides a
scalar reward received after transitioning from state s to state s′ due to
action a. This reward guides the agent towards desirable outcomes.
 Discount Factor (γ): A discount factor γ∈[0,1) that determines the
importance of future rewards. A discount factor close to 1 makes the agent
prioritize long-term rewards, while a factor close to 0 makes it focus on
immediate rewards.
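For illustration only, a minimal encoding of the (S, A, P, R, γ) tuple as plain Python data structures; the states, probabilities, and rewards are assumed values, not part of the original key.

# Tiny MDP encoded directly as the tuple (S, A, P, R, gamma); values are illustrative.
S = ["s0", "s1"]
A = ["stay", "move"]
gamma = 0.9

# P[(s, a)] maps each next state to its transition probability
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "move"): {"s1": 0.8, "s0": 0.2},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 1.0},
}

# R[(s, a, s_next)] is the scalar reward for that transition (unlisted transitions give 0)
R = {
    ("s0", "move", "s1"): 5.0,
    ("s1", "move", "s0"): 1.0,
}
reward = lambda s, a, s2: R.get((s, a, s2), 0.0)
print(sum(P[("s0", "move")].values()))   # probabilities over next states sum to 1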

-- 00 -- 00 –
