
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

B.E/ B.TECH. DEGREE EXAMINATIONS, APRIL/MAY 2023


CS3491- ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
ANSWER KEY
PART A
1 Define artificial intelligence.
Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially
computer systems.

2 What is informed search?


Informed search uses problem-specific knowledge beyond the definition of the problem itself to find
solutions more efficiently.

3 Define uncertainty.
Uncertainty refers to the lack of complete certainty, or the existence of multiple possible outcomes,
making it difficult to predict a specific outcome.

4 State Bayes rule.


Bayes' rule states: P(A|B) = [P(B|A) * P(A)] / P(B), where P(A|B) is the conditional probability of A
given B.

5 Outline the difference between supervised learning and unsupervised learning.


Supervised learning uses labeled data to train a model, while unsupervised learning uses unlabeled
data to find patterns.

6 What is a heuristic?
A heuristic is a technique designed for solving a problem more quickly when classic methods are too
slow, or for finding an approximate solution when classic methods fail to find any exact solution.

7 Define machine learning


Machine learning is a subset of AI that allows systems to learn from data, identify patterns, and
make decisions with minimal human intervention.

8 What is the significance of Gaussian mixture models?


Gaussian mixture models are used for clustering and density estimation, allowing the representation
of complex data distributions as a combination of simpler Gaussian distributions.

9 Draw the architecture of a multilayer perceptron.


A multilayer perceptron consists of an input layer, one or more hidden layers, and an output layer,
with each layer fully connected to the next.

10 Name any two activation functions.


Sigmoid and ReLU (Rectified Linear Unit) are two common activation functions.
PART B
11. (a) Outline the uninformed search strategies like breadth-first search and depth-first search with
examples.
 Breadth-First Search (BFS):
o Outline:

 BFS systematically explores a graph or tree by visiting all the neighbor nodes at the current
depth level before moving to the nodes at the next depth level.
 It uses a queue data structure to maintain the order of visited nodes.
 It starts at the root node and expands all the successor nodes at the current level before
moving to the next level.
 BFS is complete (if a solution exists, it will find it) and optimal (it finds the shortest path) in
unweighted graphs.
o Example:

 Finding the shortest path between two cities on a map where all roads have equal length.
 Web crawlers use BFS to explore all links on a webpage.
 In a social network, finding all friends of a person within a certain degree of separation.
 Depth-First Search (DFS):
o Outline:

 DFS explores as far as possible along each branch before backtracking.


 It uses a stack data structure or recursion to implement the search.
 It starts at the root node and explores one branch to its deepest point before backtracking and
exploring other branches.
 DFS is memory-efficient but not guaranteed to find the shortest path. It may get stuck in
infinite loops if not implemented carefully.
o Example:

 Solving mazes, where you follow one path until you hit a dead end and then backtrack.
 Finding a path in a decision tree.
 Detecting cycles in a graph.
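A minimal Python sketch of both strategies on a small, made-up adjacency-list graph (the graph and node names are assumptions for illustration, not part of the original answer key):

from collections import deque

graph = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [], "E": ["F"], "F": [],
}

def bfs(start):
    """Visit nodes level by level using a FIFO queue."""
    visited, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order

def dfs(start):
    """Explore one branch to its deepest point before backtracking (stack-based)."""
    visited, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push neighbours in reverse so the left-most branch is explored first.
        stack.extend(reversed(graph[node]))
    return order

print(bfs("A"))  # ['A', 'B', 'C', 'D', 'E', 'F'] - level by level
print(dfs("A"))  # ['A', 'B', 'D', 'E', 'F', 'C'] - one branch at a time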
11. (b) State the constraint satisfaction problem. Outline local search for constraint satisfaction
problem with an example.
 Constraint Satisfaction Problem (CSP):
o State:
 A CSP is a problem that involves finding a set of values for a set of variables that
satisfy a set of constraints.
 It consists of:
 Variables: A set of variables {X1, X2, ..., Xn}.
 Domains: A set of possible values for each variable {D1, D2, ..., Dn}.
 Constraints: A set of rules that specify which combinations of values are
allowed.
 Local Search for CSP:
o Outline:

 Local search starts with an initial assignment of values to variables and iteratively
improves the assignment by changing the value of one variable at a time.
 The goal is to minimize the number of violated constraints.
 Algorithms like min-conflicts are used: a conflicted variable is chosen (typically at
random), and its value is changed to the value that minimizes the number of conflicts.
 Local search is not guaranteed to find the optimal solution but can find good solutions
in a reasonable amount of time.
o Example:

 The N-Queens problem: Placing N queens on an NxN chessboard such that no two
queens threaten each other.
 Sudoku puzzles: Filling in a 9x9 grid with digits such that each row, column, and 3x3
subgrid contains the digits 1 to 9 without repetition.
 Scheduling problems: Assigning tasks to resources while satisfying constraints on
resource availability and task dependencies.
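A minimal sketch of min-conflicts local search applied to the N-Queens CSP (the one-queen-per-column representation and the parameter values are assumptions for illustration):

import random

def conflicts(board, col, row):
    """Number of queens in other columns attacking square (row, col)."""
    return sum(
        1 for c, r in enumerate(board)
        if c != col and (r == row or abs(r - row) == abs(c - col))
    )

def min_conflicts(n=8, max_steps=10_000):
    board = [random.randrange(n) for _ in range(n)]   # random initial assignment
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(board, c, board[c]) > 0]
        if not conflicted:
            return board                              # no violated constraints
        col = random.choice(conflicted)               # pick a conflicted variable
        # reassign it to the row that minimises the number of conflicts
        board[col] = min(range(n), key=lambda row: conflicts(board, col, row))
    return None                                       # no solution found within max_steps

print(min_conflicts(8))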
12. (a) (i) Elaborate on unconditional probability and conditional probability with an example.
 Unconditional Probability (Marginal Probability):
o Elaboration:

 The probability of an event occurring without any prior knowledge or conditions.


 It's denoted as P(A), where A is an event.
 It represents the likelihood of an event occurring in a sample space.
 For equally likely outcomes, it can be calculated by dividing the number of favorable
outcomes by the total number of possible outcomes.
o Example:

 The probability of drawing a king from a standard deck of 52 cards is 4/52 (or 1/13).
 The probability of flipping a fair coin and getting heads is 1/2.
 Conditional Probability:
o Elaboration:

 The probability of an event occurring given that another event has already occurred.
 It's denoted as P(A|B), the probability of1 event A given event B.
 It represents the likelihood of A occurring within the subset of outcomes where B has
occurred.
 It can be calculated using the formula: P(A|B) = P(A and B) / P(B).
o Example:

 The probability of drawing a king from a deck of cards (event A) given that the card is
a face card (event B) is 4/12 (or 1/3).
 The probability that it will rain (event A) given that it is cloudy (event B).
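A small worked check of the card example using the formula P(A|B) = P(A and B) / P(B) (illustration only):

p_king = 4 / 52                      # unconditional probability of drawing a king
p_face = 12 / 52                     # P(B): the card is a face card (J, Q, K)
p_king_and_face = 4 / 52             # every king is a face card
p_king_given_face = p_king_and_face / p_face
print(p_king)             # ~ 0.0769 (1/13)
print(p_king_given_face)  # ~ 0.3333 (1/3)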
12. (a) (ii) What is a Bayesian network? Explain the steps followed to construct a Bayesian network
with an example.
 Bayesian Network:
o Definition:

 A probabilistic graphical model that represents a set of random variables and their
conditional dependencies via a directed acyclic graph (DAG).
 Nodes represent random variables, and edges represent conditional dependencies.
 Each node has a conditional probability table (CPT) that specifies the probability
distribution of the variable given its parents.
 Bayesian networks are used for reasoning under uncertainty and for modeling
complex systems.
 Steps to Construct:
1. Identify Variables:
 Determine the relevant variables for the problem.
 Define the domain of each variable (the set of possible values).
2. Determine Dependencies:
 Identify the direct dependencies between variables based on domain knowledge or
data.
 Use a causal perspective to determine the direction of dependencies.
3. Construct DAG:
 Draw a DAG where nodes represent variables and edges represent dependencies.
 Ensure that the graph is acyclic (no directed cycles).
4. Specify Conditional Probability Tables (CPTs):
 For each node, define the conditional probability of that node given its parents.
 The CPT specifies the probability of each value of the node for each combination of
values of its parents.
 Example:
o A simple Bayesian network for "Student Grades":

 Variables: Difficulty of Course (D), Student Intelligence (I), Grade (G), SAT Score
(S), Letter of Recommendation (L).
 Dependencies: D -> G, I -> G, I -> S, G -> L.
 DAG: Draw a directed graph with nodes D, I, G, S, L and edges D->G, I->G, I->S, G->L.
 CPTs: Define the conditional probabilities P(D), P(I), P(G|D,I), P(S|I), P(L|G).
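A minimal sketch of how the CPTs of the student network could be stored and combined in Python; every probability value below is made up purely for illustration:

P_D = {"easy": 0.6, "hard": 0.4}                       # P(D)
P_I = {"low": 0.7, "high": 0.3}                        # P(I)
P_G = {                                                # P(G | D, I)
    ("easy", "low"):  {"A": 0.30, "B": 0.40, "C": 0.30},
    ("easy", "high"): {"A": 0.90, "B": 0.08, "C": 0.02},
    ("hard", "low"):  {"A": 0.05, "B": 0.25, "C": 0.70},
    ("hard", "high"): {"A": 0.50, "B": 0.30, "C": 0.20},
}
P_S = {"low":  {"good": 0.05, "poor": 0.95},           # P(S | I)
       "high": {"good": 0.80, "poor": 0.20}}
P_L = {"A": {"strong": 0.90, "weak": 0.10},            # P(L | G)
       "B": {"strong": 0.60, "weak": 0.40},
       "C": {"strong": 0.10, "weak": 0.90}}

# The joint probability of one full assignment factorises along the DAG:
# P(d, i, g, s, l) = P(d) P(i) P(g|d,i) P(s|i) P(l|g)
p = P_D["hard"] * P_I["high"] * P_G[("hard", "high")]["A"] * P_S["high"]["good"] * P_L["A"]["strong"]
print(p)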

12. (b) What do you mean by inference in Bayesian networks? Outline inference by enumeration with
an example.
 Inference in Bayesian Networks:
o In Bayesian networks, inference refers to the process of calculating the probability distribution
of one or more variables (query variables) given evidence about other variables (evidence
variables). Essentially, it's about using the network to answer probabilistic queries.
o The goal is to determine the posterior probability distribution of the query variables, which
represents our updated belief about those variables after considering the evidence.
o Inference allows us to reason under uncertainty by leveraging the probabilistic relationships
encoded in the network.
 Inference by Enumeration:
o Inference by enumeration is a straightforward, though computationally expensive, method for
calculating posterior probabilities.
o It involves summing over all possible combinations of values for the hidden variables
(variables that are neither query nor evidence variables).
o The process:

1. Joint Probability: Express the query in terms of the joint probability distribution of
all variables in the network.
2. Summation: Sum (or marginalize) out the hidden variables from the joint probability
distribution.
3. Normalization: Normalize the resulting distribution to obtain the posterior
probability of the query variables given the evidence.
o Example:

 Let's use a simplified version of the "Sprinkler" network:


 Variables: Rain (R), Sprinkler (S), Grass Wet (W).
 We want to find P(Rain | Grass Wet = true).
 Using enumeration:
 P(R | W = true) = P(R, W = true) / P(W = true)
 P(R, W = true) = Σ_S P(R, S, W = true) (sum over all possible values of S)
 P(W = true) = Σ_R Σ_S P(R, S, W = true) (sum over all possible values of R and S)
 We would then use the conditional probability tables (CPTs) of the network to
calculate the joint probabilities and perform the summations.
 Limitations:
 Enumeration becomes computationally intractable for networks with many variables,
as the number of combinations grows exponentially.
 Therefore, more efficient inference algorithms are often used in practice.
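A minimal sketch of inference by enumeration for the simplified Sprinkler network; the CPT values are assumed for illustration only:

P_R = {True: 0.2, False: 0.8}                                   # P(Rain)
P_S = {True: {True: 0.01, False: 0.99},                         # P(Sprinkler | Rain)
       False: {True: 0.40, False: 0.60}}
P_W = {(True, True): 0.99, (True, False): 0.80,                 # P(Wet=true | Rain, Sprinkler)
       (False, True): 0.90, (False, False): 0.00}

def joint(r, s, w=True):
    """P(R=r, S=s, W=w) by the chain rule along the DAG."""
    p_w_true = P_W[(r, s)]
    p_w = p_w_true if w else 1 - p_w_true
    return P_R[r] * P_S[r][s] * p_w

# Sum out the hidden variable S, then normalise over R.
p_r_and_w = {r: sum(joint(r, s) for s in (True, False)) for r in (True, False)}
p_w = sum(p_r_and_w.values())
print("P(Rain=true | Wet=true) =", round(p_r_and_w[True] / p_w, 3))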

13. (a) Elaborate on logistic regression with an example. Explain the process of computing coefficients.
 Elaboration on Logistic Regression:
o Logistic regression is a statistical model used for binary classification. Unlike linear
regression, which predicts continuous values, logistic regression predicts the probability of a
binary outcome (e.g., yes/no, true/false).
o It uses the sigmoid function (also known as the logistic function) to transform the linear
combination of input features into a probability value between 0 and 1.
o The output of the logistic regression model represents the probability that a given input
belongs to a particular class.
o It is widely used in various applications, including medical diagnosis, spam detection, and
credit risk assessment.
 Example:
o Predicting whether a student will pass or fail an exam based on the number of hours they
studied.
o Input feature: Hours studied.

o Output: Probability of passing (1) or failing (0).

 Process of Computing Coefficients:


o Maximum Likelihood Estimation (MLE): The coefficients (weights) of the logistic
regression model are typically estimated using MLE. MLE aims to find the values of the
coefficients that maximize the likelihood of observing the given data.
o Cost Function: The cost function used in logistic regression is the logistic loss (also known
as cross-entropy loss). This function measures the difference between the predicted
probabilities and the actual outcomes.
o Gradient Descent: Gradient descent is an iterative optimization algorithm used to minimize
the cost function. It calculates the gradient of the cost function with respect to the coefficients
and updates the coefficients in the direction that reduces the cost.
o Iterative Process: The gradient descent algorithm iteratively updates the coefficients until
the cost function converges to a minimum or a maximum number of iterations is reached.
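A minimal sketch of fitting the coefficients by gradient descent on the cross-entropy loss for the hours-studied example; the data values and learning rate are assumptions for illustration:

import numpy as np

hours = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)   # hours studied (assumed)
passed = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)  # 1 = pass, 0 = fail (assumed)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):                       # iterate until (approximate) convergence
    p = sigmoid(w * hours + b)              # predicted probability of passing
    grad_w = np.mean((p - passed) * hours)  # gradient of the cross-entropy loss w.r.t. w
    grad_b = np.mean(p - passed)            # gradient w.r.t. b
    w -= lr * grad_w                        # move opposite to the gradient
    b -= lr * grad_b

print("coefficients:", w, b)
print("P(pass | 5 hours) =", sigmoid(w * 5 + b))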
13. (b) What is a classification tree? Explain the steps to construct a classification tree. List and
explain about the different procedures used.
 Classification Tree:
o A classification tree is a decision tree where the target variable is categorical. It is used to
predict the class label of an input based on its features.
o It partitions the data into subsets based on feature values, creating a tree-like structure where
each internal node represents a feature, each branch represents a decision rule, and each leaf
node represents a class label.
 Steps to Construct:
1. Select Best Attribute: Choose the attribute that best splits the data based on a chosen metric
(e.g., information gain, Gini index).
2. Create Nodes: Create a decision node for the selected attribute and branches for each of its
values.
3. Partition Data: Divide the data into subsets based on the branches.
4. Repeat: Recursively repeat steps 1-3 for each subset until a stopping criterion is met (e.g., all
data points in a subset belong to the same class, or a maximum tree depth is reached).
 Different Procedures:
o Information Gain (ID3):

 Measures the reduction in entropy (impurity) of the data after splitting on an attribute.
 Selects the attribute with the highest information gain as the splitting attribute.
o Gini Index (CART):

 Measures the impurity of a data partition.


 Selects the attribute that minimizes the Gini index as the splitting attribute.
o Chi-Square (CHAID):

 Measures the statistical significance of the differences between sub-nodes.


 Selects the attribute with the most significant chi-square value as the splitting
attribute.
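A minimal sketch of the impurity measures used to choose the splitting attribute (entropy/information gain and the Gini index); the labels are a made-up example:

from collections import Counter
from math import log2

def entropy(labels):
    """Entropy used by information gain (ID3)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index used by CART."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Reduction in entropy after splitting `parent` into `children` subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

labels = ["yes", "yes", "no", "no", "yes", "no"]
split  = [["yes", "yes", "yes"], ["no", "no", "no"]]   # a perfect split
print(entropy(labels), gini(labels))                   # 1.0 and 0.5
print(information_gain(labels, split))                 # 1.0 (maximum possible gain here)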
14. (a) (i) What is bagging and boosting? Give example.
 Bagging (Bootstrap Aggregating):
o Bagging is an ensemble method that creates multiple subsets of the original data (with
replacement) and trains a model on each subset.
o The final prediction is an average (for regression) or majority vote (for classification) of the
individual models.
o Bagging reduces variance and improves the stability of the model.

o Example: Random Forest, where multiple decision trees are trained on bootstrapped
samples.
 Boosting:
o Boosting is an ensemble method that trains models sequentially, where each model focuses
on correcting the errors of the previous ones.
o It assigns weights to data points, and misclassified points get higher weights in subsequent
models.
o Boosting reduces bias and improves the accuracy of the model.

o Example: AdaBoost, Gradient Boosting Machines (GBM).
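If scikit-learn is available, both ensemble families can be tried side by side; the synthetic dataset and parameter values below are assumptions for illustration, not part of the original answer:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: many trees trained on bootstrapped samples, combined by majority vote.
bagging_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
# Boosting: learners trained sequentially, each focusing on previous errors.
boosting_model = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

print("Bagging (Random Forest) accuracy:", bagging_model.score(X_test, y_test))
print("Boosting (AdaBoost) accuracy:", boosting_model.score(X_test, y_test))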

14. (a) (ii) Outline the steps in the AdaBoost algorithm with an example.
 AdaBoost Algorithm:
1. Initialize Weights: Assign equal weights to all data points.
2. Train Weak Learner: Train a weak learner (e.g., decision stump) on the weighted data.
3. Calculate Error: Calculate the weighted error of the learner.
4. Calculate Learner Weight: Assign a weight to the learner based on its error.
5. Update Data Weights: Increase the weights of misclassified points and decrease the weights
of correctly classified points.
6. Repeat: Repeat steps 2-5 for a specified number of iterations or until a stopping criterion is
met.
7. Combine Learners: Combine the learners using their weights to make the final prediction.
 Example:
o Consider a binary classification problem with data points and their labels.

o Initially, all data points have equal weights.


o The first weak learner is trained, and its error is calculated.

o The learner's weight is calculated based on its error.

o The weights of misclassified points are increased, and the weights of correctly classified
points are decreased.
o The next weak learner is trained on the updated weights, and the process is repeated.

o The final prediction is a weighted combination of all the weak learners.
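A minimal sketch of the AdaBoost loop with decision stumps as weak learners; the one-dimensional dataset and the number of rounds are assumptions for illustration:

import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1])              # labels in {-1, +1}

def best_stump(X, y, w):
    """Weak learner: the threshold/sign with the smallest weighted error."""
    best = None
    for thr in X:
        for sign in (1, -1):
            pred = np.where(X <= thr, sign, -sign)
            err = np.sum(w[pred != y])
            if best is None or err < best[0]:
                best = (err, thr, sign, pred)
    return best

n = len(X)
w = np.full(n, 1 / n)                                   # 1. equal initial weights
learners = []
for _ in range(5):                                      # 6. repeat for T rounds
    err, thr, sign, pred = best_stump(X, y, w)          # 2-3. train learner, weighted error
    err = max(err, 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)               # 4. learner weight
    w = w * np.exp(-alpha * y * pred)                   # 5. raise weights of misclassified points
    w = w / w.sum()
    learners.append((alpha, thr, sign))

# 7. final prediction: sign of the weighted vote of the weak learners
def predict(x):
    score = sum(alpha * (sign if x <= thr else -sign) for alpha, thr, sign in learners)
    return 1 if score >= 0 else -1

print([predict(x) for x in X])   # ensemble predictions on the training points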

14. (b) Elaborate on the steps in the expectation-maximization algorithm.


 Expectation-Maximization (EM) Algorithm:
1. Initialization: Initialize the parameters of the model (e.g., means and covariances
for Gaussian mixtures).
2. Expectation (E-step): Calculate the expected likelihood of the data given the current
parameters.
3. Maximization (M-step): Update the parameters to maximize the expected likelihood.
4. Repeat: Repeat steps 2-3 until convergence.
 Detailed Explanation:
o Initialization: The EM algorithm starts by initializing the parameters of the model. The
initial values can be chosen randomly or based on prior knowledge.
o E-step (Expectation Step): In the E-step, the algorithm calculates the expected likelihood of
the data given the current parameters. This involves estimating the probability of each data
point belonging to each component of the model.
o M-step (Maximization Step): In the M-step, the algorithm updates the parameters of the
model to maximize the expected likelihood calculated in the E-step. This involves finding the
parameter values that best fit the data.
o Iteration: The E-step and M-step are repeated iteratively until the parameters converge to a
stable solution. Convergence is typically determined by monitoring the change in the
likelihood or the parameters.
o Applications: The EM algorithm is widely used in various applications, including clustering,
parameter estimation in probabilistic models, and handling missing data.
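A minimal sketch of EM for a two-component, one-dimensional Gaussian mixture, following the E-step/M-step loop above; the synthetic data and initial values are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])

# Initialization: rough guesses for mixing weights, means, and variances.
pi = np.array([0.5, 0.5])
mu = np.array([data.min(), data.max()])
var = np.array([1.0, 1.0])

def gaussian(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

for _ in range(100):
    # E-step: responsibility of each component for each data point.
    resp = np.vstack([p * gaussian(data, m, v) for p, m, v in zip(pi, mu, var)])
    resp = resp / resp.sum(axis=0)
    # M-step: re-estimate the parameters from the responsibilities.
    nk = resp.sum(axis=1)
    pi = nk / len(data)
    mu = (resp @ data) / nk
    var = np.array([(r * (data - m) ** 2).sum() / n for r, m, n in zip(resp, mu, nk)])

print("means =", mu, " variances =", var, " weights =", pi)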

15. (a) Explain the steps in the backpropagation learning algorithm. What is the importance of it in
designing neural networks?
 Backpropagation Algorithm:
1. Forward Pass:
 Input data is fed into the network.
 The input is propagated through the network layer by layer.
 Each neuron applies its activation function to its weighted sum of inputs.
 The output of the network is calculated.
2. Calculate Error:
 The error between the predicted output and the actual output is computed using a loss
function (e.g., mean squared error).
3. Backward Pass (Error Propagation):
 The error is propagated backward through the network, layer by layer.
 The error is used to calculate the gradient of the loss function with respect to the
weights and biases of each neuron.
 The chain rule of calculus is used to calculate the gradients for each layer.
4. Update Weights and Biases:
 The weights and biases of each neuron are updated using an optimization algorithm
(e.g., gradient descent).
 The goal is to minimize the error by adjusting the weights and biases in the direction
opposite to the gradient.
5. Repeat:
 Steps 1-4 are repeated for a specified number of iterations or until the error converges
to a minimum.
 Importance:
o Training Neural Networks: Backpropagation is the core algorithm for training most
artificial neural networks.
o Learning Complex Patterns: It allows networks to learn complex, non-linear patterns from
data.
o Efficient Gradient Calculation: It efficiently calculates the gradients of the loss function,
making it feasible to train large networks.
o Enabling Deep Learning: It is essential for training deep neural networks with multiple
layers.
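A minimal sketch of the forward pass, backward pass, and weight update for a single-hidden-layer network trained on XOR; the architecture, mean-squared-error loss, and learning rate are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # XOR inputs
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 0.5

for _ in range(10000):
    # 1. forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # 2. error gradient at the output (MSE loss, sigmoid derivative)
    d_out = (out - y) * out * (1 - out)
    # 3. backward pass: propagate the error through the hidden layer (chain rule)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # 4. update weights and biases opposite to the gradient
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2).ravel())   # should approach [0, 1, 1, 0] (may vary with initialization)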
15. (b) Explain a deep feedforward network with a neat sketch.
 Deep Feedforward Network (DFN):
o Explanation:

 A deep feedforward network is a type of artificial neural network where the


connections between neurons do not form a cycle (feedforward).
 It consists of multiple layers of neurons, including an input layer, one or more hidden
layers, and an output layer.
 Each neuron in a layer is connected to all neurons in the next layer (fully connected).
 Deep networks have multiple hidden layers, allowing them to learn hierarchical
representations of data.
o Sketch:

o Input Layer Hidden Layer 1 Hidden Layer 2 ... Output Layer

o [x1, x2, ..., xn] -> [h1, h2, ..., hm] -> [h'1, h'2, ..., h'k] -> ... -> [y1, y2, ..., yp]

 Each arrow represents a set of weighted connections.


 Each node represents a neuron.
 The hidden layers allow the network to learn complex features and representations.
 Key Features:
o Multiple Hidden Layers: Enables the learning of complex, hierarchical features.

o Activation Functions: Non-linear functions applied to the weighted sum of inputs in each
neuron (e.g., ReLU, sigmoid).
o Feedforward Structure: Information flows in one direction, from input to output.

o Training with Backpropagation: Weights and biases are learned using backpropagation.
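A minimal sketch of the forward pass through a deep feedforward network with two hidden ReLU layers, matching the sketch above; the layer sizes are assumed for illustration:

import numpy as np

rng = np.random.default_rng(1)
layer_sizes = [4, 8, 8, 3]                 # input, hidden 1, hidden 2, output
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

relu = lambda z: np.maximum(0, z)

def forward(x):
    """Propagate x through every fully connected layer; information flows one way only."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)                # hidden layers apply a non-linear activation
    return a @ weights[-1] + biases[-1]    # linear output layer

print(forward(np.array([1.0, 0.5, -0.3, 2.0])))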

PART C
16. (a) The values of x and their corresponding values of y are shown in the table below.
x: 1 2 3 4 5 6 7
y: 2 4 6 5 6 8 10
(i) Find the least square regression line y = ax + b.
(ii) Estimate the value of y when x = 10.
(i) Least Square Regression Line:
1. Calculate Sums:
o Σx = 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28

o Σy = 2 + 4 + 6 + 5 + 6 + 8 + 10 = 41

o Σx² = 1² + 2² + 3² + 4² + 5² + 6² + 7² = 140

o Σxy = (12) + (24) + (36) + (45) + (56) + (68) + (7*10) = 197


o n = 7 (number of data points)

2. Calculate a and b:
o a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)

o a = (7*196 - 28*41) / (7*140 - 28²)

o a = (1372 - 1148) / (980 - 784)

o a = 224 / 196 ≈ 1.143

o b = (Σy - aΣx) / n

o b = (41 - 1.143 * 28) / 7

o b = (41 - 32) / 7

o b = 9 / 7 ≈ 1.286

3. Regression Line:
o y = 1.143x + 1.286

(ii) Estimate y when x = 10:


 y = 1.143 * 10 + 1.286
 y = 11.43 + 1.29
 y ≈ 12.71
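The result can be cross-checked with numpy's built-in least-squares polynomial fit (illustration only):

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([2, 4, 6, 5, 6, 8, 10], dtype=float)

slope, intercept = np.polyfit(x, y, 1)     # degree-1 (linear) least-squares fit
print(slope, intercept)                    # ≈ 1.143 (a) and 1.286 (b)
print(slope * 10 + intercept)              # ≈ 12.71, the estimate at x = 10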

16. (b) Consider five points (x1, y1), ..., (x5, y5) with the following coordinates as a two-dimensional
sample for clustering:

(0.5, 1.75), (1, 2), (1.75, 0.25), (4, 1), (6, 3)


Illustrate the k-means algorithm on the above data set. The required number of clusters is two, and
initially, clusters are formed from random distribution of samples: C1 = (x1, y1) and C2 = (x5, y5).

 K-Means Algorithm:
1. Initialization:
o C1 = (0.5, 1.75)

o C2 = (6, 3)

2. Assignment Step:
o Calculate the Euclidean distance of each point to C1 and C2.

o Assign each point to the cluster with the nearest centroid.

3. Update Step:
o Recalculate the centroids of the clusters by taking the mean of the points assigned to each
cluster.
4. Iteration:
o Repeat steps 2 and 3 until the centroids no longer change significantly or a maximum number
of iterations is reached.

 Detailed Steps:
1. Initial Centroids:
o C1 = (0.5, 1.75)

o C2 = (6, 3)

2. Assignment (Iteration 1):
o Euclidean distances to C1 = (0.5, 1.75) and C2 = (6, 3):
 (0.5, 1.75): d(C1) = 0, d(C2) ≈ 5.64 -> assigned to C1
 (1, 2): d(C1) ≈ 0.56, d(C2) ≈ 5.10 -> assigned to C1
 (1.75, 0.25): d(C1) ≈ 1.95, d(C2) ≈ 5.06 -> assigned to C1
 (4, 1): d(C1) ≈ 3.58, d(C2) ≈ 2.83 -> assigned to C2
 (6, 3): d(C1) ≈ 5.64, d(C2) = 0 -> assigned to C2
3. Update Centroids (Iteration 1):
o C1 = mean of {(0.5, 1.75), (1, 2), (1.75, 0.25)} ≈ (1.08, 1.33)
o C2 = mean of {(4, 1), (6, 3)} = (5, 2)
4. Repeat (Iteration 2):
o With the updated centroids, every point keeps its cluster (for example, (4, 1) is ≈ 2.94 from C1 and ≈ 1.41 from C2), so the assignments do not change and the algorithm has converged.
o Final clusters: {(0.5, 1.75), (1, 2), (1.75, 0.25)} and {(4, 1), (6, 3)}.