April May 2023
3 Define uncertainty.
Uncertainty refers to the lack of complete certainty, or the existence of multiple possible outcomes,
making it difficult to predict a specific outcome.
6 What is a heuristic?
A heuristic is a technique designed for solving a problem more quickly when classic methods are too
slow, or for finding an approximate solution when classic methods fail to find any exact solution.
Breadth-First Search (BFS):
o Outline:
BFS systematically explores a graph or tree by visiting all the neighbor nodes at the current
depth level before moving to the nodes at the next depth level.
It uses a queue data structure to maintain the order of visited nodes.
It starts at the root node and expands all the successor nodes at the current level before
moving to the next level.
BFS is complete (if a solution exists, it will find it) and optimal (it finds the shortest path) in
unweighted graphs.
o Example:
Finding the shortest path between two cities on a map where all roads have equal length.
Web crawlers can use BFS to visit pages level by level, following every link on a page before going deeper.
In a social network, finding all friends of a person within a certain degree of separation.
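The equal-length-roads example can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the road map `roads` and the city names are hypothetical, and the graph is assumed to be an adjacency list with unweighted edges.

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    queue = deque([start])          # FIFO queue gives level-by-level expansion
    parent = {start: None}          # also serves as the visited set
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical road map (every road counts as one step).
roads = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
print(bfs_shortest_path(roads, "A", "E"))
```

Because nodes are dequeued in order of distance from the start, the first time the goal is dequeued its path has the minimum number of edges.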
Depth-First Search (DFS):
o Outline:
Solving mazes, where you follow one path until you hit a dead end and then backtrack.
Finding a path in a decision tree.
Detecting cycles in a graph.
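As one illustration of the cycle-detection use case, the sketch below runs DFS on a directed graph and reports a cycle when it reaches a node that is still on the current path. The three-color marking is a common way to do this and is an assumption here, not something stated in the answer above; the example graphs are hypothetical.

```python
def has_cycle(graph):
    """Detect a directed cycle with depth-first search."""
    WHITE, GRAY, BLACK = 0, 1, 2            # unvisited, on current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for neighbor in graph.get(node, []):
            state = color.get(neighbor, WHITE)
            if state == GRAY:               # back edge to the current path
                return True
            if state == WHITE and visit(neighbor):
                return True
        color[node] = BLACK                 # backtrack: node fully explored
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)

print(has_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))
print(has_cycle({"A": ["B", "C"], "B": ["C"], "C": []}))
```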
11. (b) State the constraint satisfaction problem. Outline local search for constraint satisfaction
problem with an example.
Constraint Satisfaction Problem (CSP):
o State:
A CSP is a problem that involves finding a set of values for a set of variables that
satisfy a set of constraints.
It consists of:
Variables: A set of variables {X1, X2, ..., Xn}.
Domains: A set of possible values for each variable {D1, D2, ..., Dn}.
Constraints: A set of rules that specify which combinations of values are
allowed.
Local Search for CSP:
o Outline:
Local search starts with an initial assignment of values to variables and iteratively
improves the assignment by changing the value of one variable at a time.
The goal is to minimize the number of violated constraints.
Algorithms like min-conflicts are used: a conflicted variable is chosen at random, and
its value is changed to the one that minimizes the number of conflicts.
Local search is not guaranteed to find a solution (it can get stuck in a local minimum),
but it often finds one in a reasonable amount of time.
o Example:
The N-Queens problem: Placing N queens on an NxN chessboard such that no two
queens threaten each other.
Sudoku puzzles: Filling in a 9x9 grid with digits such that each row, column, and 3x3
subgrid contains the digits 1 to 9 without repetition.
Scheduling problems: Assigning tasks to resources while satisfying constraints on
resource availability and task dependencies.
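A minimal sketch of min-conflicts on the N-Queens example, assuming one queen per row so that the variables are rows and the values are column indices. The function names, the step limit, and the random tie-breaking are illustrative choices.

```python
import random

def conflicts(cols, row, col):
    """Count queens in other rows that attack square (row, col)."""
    return sum(
        1
        for r, c in enumerate(cols)
        if r != row and (c == col or abs(c - col) == abs(r - row))
    )

def min_conflicts(n, max_steps=10000, seed=None):
    """Return a list of column indices (one per row), or None if no solution was found."""
    rng = random.Random(seed)
    cols = [rng.randrange(n) for _ in range(n)]     # random initial assignment
    for _ in range(max_steps):
        conflicted = [r for r in range(n) if conflicts(cols, r, cols[r]) > 0]
        if not conflicted:
            return cols                              # all constraints satisfied
        row = rng.choice(conflicted)                 # pick a conflicted variable at random
        counts = [conflicts(cols, row, c) for c in range(n)]
        best = min(counts)
        # move the queen to a column with the fewest conflicts (ties broken randomly)
        cols[row] = rng.choice([c for c in range(n) if counts[c] == best])
    return None

solution = min_conflicts(8, seed=0)
print(solution)
```

The function returns `None` when the step limit is reached, which reflects the fact that local search may fail to find a solution.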
12. (a) (i) Elaborate on unconditional probability and conditional probability with an example.
Unconditional Probability (Marginal Probability):
o Elaboration:
The probability of drawing a king from a standard deck of 52 cards is 4/52 (or 1/13).
The probability of flipping a fair coin and getting heads is 1/2.
Conditional Probability:
o Elaboration:
The probability of an event occurring given that another event has already occurred.
It's denoted as P(A|B), the probability of event A given event B.
It represents the likelihood of A occurring within the subset of outcomes where B has
occurred.
It can be calculated using the formula: P(A|B) = P(A and B) / P(B).
o Example:
The probability of drawing a king from a deck of cards (event A) given that the card is
a face card (event B) is 4/12 (or 1/3).
The probability that it will rain (event A) given that it is cloudy (event B).
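The card examples can be checked by enumerating a standard 52-card deck. This sketch assumes that "face card" means jack, queen, or king; the helper name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))           # 52 equally likely cards

def prob(event, given=lambda card: True):
    """P(event | given), computed by counting cards in the restricted sample space."""
    sample = [card for card in deck if given(card)]
    return Fraction(sum(1 for card in sample if event(card)), len(sample))

def is_king(card):
    return card[0] == "K"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

print(prob(is_king))             # unconditional: 1/13
print(prob(is_king, is_face))    # conditional on a face card: 1/3
```

Counting within the cards that satisfy B is equivalent to the formula P(A|B) = P(A and B) / P(B).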
12. (a) (ii) What is a Bayesian network? Explain the steps followed to construct a Bayesian network
with an example.
Bayesian Network:
o Definition:
A probabilistic graphical model that represents a set of random variables and their
conditional dependencies via a directed acyclic graph (DAG).
Nodes represent random variables, and edges represent conditional dependencies.
Each node has a conditional probability table (CPT) that specifies the probability
distribution of the variable given its parents.
Bayesian networks are used for reasoning under uncertainty and for modeling
complex systems.
Steps to Construct:
1. Identify Variables:
Determine the relevant variables for the problem.
Define the domain of each variable (the set of possible values).
2. Determine Dependencies:
Identify the direct dependencies between variables based on domain knowledge or
data.
Use a causal perspective to determine the direction of dependencies.
3. Construct DAG:
Draw a DAG where nodes represent variables and edges represent dependencies.
Ensure that the graph is acyclic (no directed cycles).
4. Specify Conditional Probability Tables (CPTs):
For each node, define the conditional probability of that node given its parents.
The CPT specifies the probability of each value of the node for each combination of
values of its parents.
Example:
o A simple Bayesian network for "Student Grades":
Variables: Difficulty of Course (D), Student Intelligence (I), Grade (G), SAT Score
(S), Letter of Recommendation (L).
Dependencies: D -> G, I -> G, I -> S, G -> L.
DAG: Draw a directed graph with nodes D, I, G, S, L and edges D->G, I->G, I->S,
G->L.
CPTs: Define the conditional probabilities P(D), P(I), P(G|D,I), P(S|I), P(L|G).
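Given these CPTs, the joint distribution of the example network factorizes as the product of each node's probability given its parents. Written out for the edges listed above:

```latex
P(D, I, G, S, L) = P(D)\, P(I)\, P(G \mid D, I)\, P(S \mid I)\, P(L \mid G)
```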
12. (b) What do you mean by inference in Bayesian networks? Outline inference by enumeration with
an example.
Inference in Bayesian Networks:
o In Bayesian networks, inference refers to the process of calculating the probability distribution
of one or more variables (query variables) given evidence about other variables (evidence
variables). Essentially, it's about using the network to answer probabilistic queries.
o The goal is to determine the posterior probability distribution of the query variables, which
represents our updated belief about those variables after considering the evidence.
o Inference allows us to reason under uncertainty by leveraging the probabilistic relationships
encoded in the network.
Inference by Enumeration:
o Inference by enumeration is a straightforward, though computationally expensive, method for
calculating posterior probabilities.
o It involves summing over all possible combinations of values for the hidden variables
(variables that are neither query nor evidence variables).
o The process:
1. Joint Probability: Express the query in terms of the joint probability distribution of
all variables in the network.
2. Summation: Sum (or marginalize) out the hidden variables from the joint probability
distribution.
3. Normalization: Normalize the resulting distribution to obtain the posterior
probability of the query variables given the evidence.
o Example:
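The three steps can be sketched on the student network (D, I, G, S, L with edges D->G, I->G, I->S, G->L). All variables are treated as binary and every CPT number below is hypothetical, chosen only so the code runs; the function names are also illustrative.

```python
from itertools import product

# Hypothetical CPTs (all variables take values 0 or 1).
P_D = {0: 0.6, 1: 0.4}                       # P(D)
P_I = {0: 0.7, 1: 0.3}                       # P(I)
P_G1 = {(0, 0): 0.3, (0, 1): 0.9,            # P(G=1 | D, I)
        (1, 0): 0.05, (1, 1): 0.5}
P_S1 = {0: 0.05, 1: 0.8}                     # P(S=1 | I)
P_L1 = {0: 0.1, 1: 0.9}                      # P(L=1 | G)

VARS = ("D", "I", "G", "S", "L")

def bern(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def joint(d, i, g, s, l):
    """Step 1: joint probability as the product of the CPT entries."""
    return (P_D[d] * P_I[i] * bern(P_G1[(d, i)], g)
            * bern(P_S1[i], s) * bern(P_L1[g], l))

def enumerate_query(query, evidence):
    """Return P(query | evidence) as a dict {0: p0, 1: p1}."""
    hidden = [v for v in VARS if v != query and v not in evidence]
    unnormalized = {}
    for q in (0, 1):
        total = 0.0
        # Step 2: sum the joint over every assignment of the hidden variables.
        for values in product((0, 1), repeat=len(hidden)):
            assignment = dict(evidence)
            assignment[query] = q
            assignment.update(zip(hidden, values))
            total += joint(*(assignment[v] for v in VARS))
        unnormalized[q] = total
    # Step 3: normalize so the posterior sums to 1.
    z = sum(unnormalized.values())
    return {q: p / z for q, p in unnormalized.items()}

print(enumerate_query("I", {"S": 1}))
```

The inner loop visits 2^k assignments for k hidden variables, which is why the method becomes expensive on larger networks.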
13. (a) Elaborate on logistic regression with an example. Explain the process of computing coefficients.
Elaboration on Logistic Regression:
o Logistic regression is a statistical model used for binary classification. Unlike linear
regression, which predicts continuous values, logistic regression predicts the probability of a
binary outcome (e.g., yes/no, true/false).
o It uses the sigmoid function (also known as the logistic function) to transform the linear
combination of input features into a probability value between 0 and 1.
o The output of the logistic regression model represents the probability that a given input
belongs to a particular class.
o It is widely used in various applications, including medical diagnosis, spam detection, and
credit risk assessment.
Example:
o Predicting whether a student will pass or fail an exam based on the number of hours they
studied.
o Input feature: Hours studied.
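A minimal sketch of the pass/fail example, assuming the coefficients are fitted by gradient descent on the log loss (one common approach; maximum-likelihood solvers such as Newton's method are also used). The data set, learning rate, and epoch count are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=20000):
    """Fit p(pass) = sigmoid(w * hours + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y    # derivative of log loss w.r.t. the linear score
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical data: hours studied and whether the student passed (1) or failed (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print("P(pass | 2 hours) =", sigmoid(w * 2 + b))
print("P(pass | 7 hours) =", sigmoid(w * 7 + b))
```

A predicted probability of 0.5 or more would typically be classified as "pass".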
Measures the reduction in entropy (impurity) of the data after splitting on an attribute.
Selects the attribute with the highest information gain as the splitting attribute.
o Gini Index (CART):
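Both splitting criteria can be computed directly from class counts. The sketch below uses the standard definitions (entropy as the negative sum of p log2 p, Gini index as 1 minus the sum of p squared); the labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder

parent = ["yes", "yes", "no", "no"]
split = [["yes", "yes"], ["no", "no"]]      # a split that separates the classes perfectly
print(entropy(parent), gini(parent), information_gain(parent, split))
```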
o Example: Random Forest, where multiple decision trees are trained on bootstrapped
samples.
Boosting:
o Boosting is an ensemble method that trains models sequentially, where each model focuses
on correcting the errors of the previous ones.
o It assigns weights to data points, and misclassified points get higher weights in subsequent
models.
o Boosting reduces bias and improves the accuracy of the model.
14. (a) (ii) Outline the steps in the AdaBoost algorithm with an example.
AdaBoost Algorithm:
1. Initialize Weights: Assign equal weights to all data points.
2. Train Weak Learner: Train a weak learner (e.g., decision stump) on the weighted data.
3. Calculate Error: Calculate the weighted error of the learner.
4. Calculate Learner Weight: Assign a weight α to the learner based on its weighted error ε, for example α = ½ ln((1 − ε) / ε), so that more accurate learners get larger weights.
5. Update Data Weights: Increase the weights of misclassified points and decrease the weights
of correctly classified points.
6. Repeat: Repeat steps 2-5 for a specified number of iterations or until a stopping criterion is
met.
7. Combine Learners: Combine the learners using their weights to make the final prediction.
Example:
o Consider a binary classification problem with data points and their labels.
o The weights of misclassified points are increased, and the weights of correctly classified
points are decreased.
o The next weak learner is trained on the updated weights, and the process is repeated.
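The steps above can be sketched with decision stumps on one-dimensional data. The data set is a hypothetical toy example, labels are assumed to be +1 or -1, and the learner weight uses the common choice α = ½ ln((1 − ε) / ε).

```python
import math

def stump_predict(x, threshold, polarity):
    """Decision stump: predict `polarity` below the threshold, the opposite above it."""
    return polarity if x < threshold else -polarity

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n                                   # step 1: equal weights
    ordered = sorted(xs)
    thresholds = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    model = []
    for _ in range(rounds):
        best = None
        for thr in thresholds:                                # step 2: best stump on weighted data
            for pol in (1, -1):
                err = sum(w for w, x, y in zip(weights, xs, ys)
                          if stump_predict(x, thr, pol) != y)  # step 3: weighted error
                if best is None or err < best[0]:
                    best = (err, thr, pol)
        err, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)                 # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)               # step 4: learner weight
        model.append((alpha, thr, pol))
        # step 5: raise weights of misclassified points, lower the rest, renormalize
        weights = [w * math.exp(-alpha * y * stump_predict(x, thr, pol))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    """Step 7: sign of the weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, thr, pol) for alpha, thr, pol in model)
    return 1 if score >= 0 else -1

# Hypothetical toy data that no single stump can classify correctly.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

On this data each stump alone misclassifies at least three points, while the weighted vote of three stumps classifies all ten correctly.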
15. (a) Explain the steps in the backpropagation learning algorithm. What is the importance of it in
designing neural networks?
Backpropagation Algorithm:
1. Forward Pass:
Input data is fed into the network.
The input is propagated through the network layer by layer.
Each neuron applies its activation function to its weighted sum of inputs.
The output of the network is calculated.
2. Calculate Error:
The error between the predicted output and the actual output is computed using a loss
function (e.g., mean squared error).
3. Backward Pass (Error Propagation):
The error is propagated backward through the network, layer by layer.
The error is used to calculate the gradient of the loss function with respect to the
weights and biases of each neuron.
The chain rule of calculus is used to calculate the gradients for each layer.
4. Update Weights and Biases:
The weights and biases of each neuron are updated using an optimization algorithm
(e.g., gradient descent).
The goal is to minimize the error by adjusting the weights and biases in the direction
opposite to the gradient.
5. Repeat:
Steps 1-4 are repeated for a specified number of iterations or until the error converges
to a minimum.
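Steps 1 to 5 can be sketched for a tiny network with one hidden layer, sigmoid activations, and a squared-error loss on a single training example. The layer sizes, initial weights, learning rate, and function names are all hypothetical choices for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Step 1: propagate the input through the hidden layer to one output neuron."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def loss(y, t):
    """Step 2: squared error between prediction y and target t."""
    return 0.5 * (y - t) ** 2

def backward(x, t, W1, b1, W2, b2):
    """Step 3: apply the chain rule layer by layer to get all gradients."""
    h, y = forward(x, W1, b1, W2, b2)
    delta_out = (y - t) * y * (1.0 - y)                 # dLoss / d(output pre-activation)
    grad_W2 = [delta_out * hi for hi in h]
    grad_b2 = delta_out
    delta_h = [delta_out * w * hi * (1.0 - hi) for w, hi in zip(W2, h)]
    grad_W1 = [[d * xi for xi in x] for d in delta_h]
    grad_b1 = delta_h
    return grad_W1, grad_b1, grad_W2, grad_b2

def train(x, t, W1, b1, W2, b2, lr=0.5, steps=200):
    """Steps 4 and 5: repeated gradient-descent updates."""
    for _ in range(steps):
        gW1, gb1, gW2, gb2 = backward(x, t, W1, b1, W2, b2)
        W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)]
        b1 = [b - lr * g for b, g in zip(b1, gb1)]
        W2 = [w - lr * g for w, g in zip(W2, gW2)]
        b2 = b2 - lr * gb2
    return W1, b1, W2, b2

# Hypothetical example: 2 inputs, 2 hidden neurons, 1 output.
x, t = [1.0, 0.5], 1.0
W1, b1 = [[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0]
W2, b2 = [0.2, -0.1], 0.0
before = loss(forward(x, W1, b1, W2, b2)[1], t)
trained = train(x, t, W1, b1, W2, b2)
after = loss(forward(x, *trained)[1], t)
print(before, after)
```

The backward pass reuses `delta_out` when computing the hidden-layer gradients, which is the source of backpropagation's efficiency compared with differentiating each weight separately.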
Importance:
o Training Neural Networks: Backpropagation is the core algorithm for training most
artificial neural networks.
o Learning Complex Patterns: It allows networks to learn complex, non-linear patterns from
data.
o Efficient Gradient Calculation: It efficiently calculates the gradients of the loss function,
making it feasible to train large networks.
o Enabling Deep Learning: It is essential for training deep neural networks with multiple
layers.
15. (b) Explain a deep feedforward network with a neat sketch.
Deep Feedforward Network (DFN):
o Explanation:
o [x1, x2, ..., xn] -> [h1, h2, ..., hm] -> [h'1, h'2, ..., h'k] -> ... -> [y1, y2, ..., yp]
o Activation Functions: Non-linear functions applied to the weighted sum of inputs in each
neuron (e.g., ReLU, sigmoid).
o Feedforward Structure: Information flows in one direction, from input to output.
o Training with Backpropagation: Weights and biases are learned using backpropagation.
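The layered structure in the sketch can be illustrated with a forward pass over an arbitrary number of layers. This is a minimal sketch assuming ReLU in the hidden layers and a linear output layer; the weights and layer sizes are hypothetical.

```python
def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weighted sum of inputs + bias) per neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def feedforward(x, layers):
    """Pass x through each (weights, biases) pair in order, input to output."""
    activations = x
    for index, (weights, biases) in enumerate(layers):
        is_output = index == len(layers) - 1
        # hidden layers use ReLU; the output layer is left linear in this sketch
        activation = (lambda z: z) if is_output else relu
        activations = dense(activations, weights, biases, activation)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[2.0, 1.0]], [0.5]),
]
print(feedforward([1.0, 2.0], layers))   # [2.0]
```

Adding more `(weights, biases)` pairs to `layers` makes the network deeper without changing the code, since information only ever flows forward.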
PART C
16. (a) The values of x and their corresponding values of y are shown in the table below.
x: 1 2 3 4 5 6 7
y: 2 4 6 5 6 8 10
(i) Find the least square regression line y = ax + b.
(ii) Estimate the value of y when x = 10.
(i) Least Square Regression Line:
1. Calculate Sums:
o Σx = 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28
o Σy = 2 + 4 + 6 + 5 + 6 + 8 + 10 = 41
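o Σx² = 1² + 2² + 3² + 4² + 5² + 6² + 7² = 140
o Σxy = (1)(2) + (2)(4) + (3)(6) + (4)(5) + (5)(6) + (6)(8) + (7)(10) = 196
2. Calculate a and b (n = 7):
o a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)
o a = (7 × 196 - 28 × 41) / (7 × 140 - 28²) = 224 / 196 = 8/7 ≈ 1.14
o b = (Σy - aΣx) / n
o b = (41 - 32) / 7 = 9/7 ≈ 1.29
3. Regression Line:
o y = 1.14x + 1.29
(ii) Estimate of y when x = 10:
o y = (8/7)(10) + 9/7 = 89/7 ≈ 12.71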
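The arithmetic for the table in 16 (a) can be checked with a short script. It follows the form y = ax + b from the question, with a as the slope and b as the intercept; the function name is illustrative.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)   # slope
    b = (sum_y - a * sum_x) / n                                    # intercept
    return a, b

xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 4, 6, 5, 6, 8, 10]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f}x + {b:.2f}")
print(f"estimate at x = 10: {a * 10 + b:.2f}")
```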
16. (b) Consider five points (x1, y1), ..., (x5, y5) with the following coordinates as a two-dimensional
sample for clustering:
K-Means Algorithm:
1. Initialization:
o C1 = (0.5, 1.75)
o C2 = (6, 3)
2. Assignment Step:
o Calculate the Euclidean distance of each point to C1 and C2.
3. Update Step:
o Recalculate the centroids of the clusters by taking the mean of the points assigned to each
cluster.
4. Iteration:
o Repeat steps 2 and 3 until the centroids no longer change significantly or a maximum number
of iterations is reached.
Detailed Steps:
1. Initial Centroids:
o C1 = (0.5, 1.75)
o C2 = (6, 3)
4. Repeat:
o Continue the assignment and update steps until convergence.
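The assignment and update loop can be sketched as follows. The five sample coordinates are not reproduced above, so `hypothetical_points` is invented purely to make the code runnable; only the initial centroids C1 = (0.5, 1.75) and C2 = (6, 3) come from the text. `math.dist` requires Python 3.8 or later.

```python
import math

def kmeans(points, centroids, max_iter=100, tol=1e-9):
    """Return (centroids, clusters) after iterating assignment and update steps."""
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        # Stop when no centroid moves by more than tol.
        if all(math.dist(a, b) <= tol for a, b in zip(new_centroids, centroids)):
            return new_centroids, clusters
        centroids = new_centroids
    return centroids, clusters

# Hypothetical sample points (NOT the coordinates from the exam question).
hypothetical_points = [(0, 1), (1, 2.5), (5, 3), (7, 3), (6, 2)]
initial = [(0.5, 1.75), (6, 3)]
centroids, clusters = kmeans(hypothetical_points, initial)
print(centroids)
print(clusters)
```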