BML Answer Key
Q.No 7 (Explanation: 2 Marks)
An ANN is composed of a large number of processing elements and their connections, and it has three distinctive layers, namely the input, hidden, and output layers. These layers form the basic architecture, and the processing elements within them are known as nodes/neurons.
Q.No 8: Explain the role of the activation function in a neural network. (Definition: 2 Marks)
The activation function introduces non-linearity into the network, enabling it to learn complex patterns. Common activation functions include ReLU, Sigmoid, and Tanh, which help in decision-making by transforming weighted inputs.
Q.No 9: What is a Multi-Layer Perceptron (MLP) network, and how is it different from a single-layer perceptron? (Explanation: 2 Marks)
A Multi-Layer Perceptron (MLP) is a type of artificial neural network that consists of an input layer, one or more hidden layers, and an output layer. It can learn complex patterns and solve non-linear problems.
An MLP differs from a Single-Layer Perceptron (SLP): an SLP has only an input and an output layer and can solve only linearly separable problems, while an MLP, with its hidden layers, can handle non-linearly separable data.
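To make the difference concrete, here is a minimal sketch, assuming scikit-learn is available (the data and parameters are illustrative): a single-layer perceptron cannot fit the non-linearly separable XOR problem, while a small MLP can.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

# XOR: the classic non-linearly separable problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Single-layer perceptron: linear decision boundary only
slp = Perceptron(max_iter=1000).fit(X, y)
print("SLP accuracy on XOR:", slp.score(X, y))  # below 1.0: XOR is not linearly separable

# MLP: one hidden layer introduces non-linearity
mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", max_iter=1000, random_state=0).fit(X, y)
print("MLP accuracy on XOR:", mlp.score(X, y))  # usually 1.0
```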
Q.No 10: What is backpropagation? (Definition: 2 Marks)
Backpropagation is a training algorithm for neural networks that adjusts weights by propagating errors backward from the output layer to the input layer, minimizing the loss using techniques like gradient descent.
PART B (5 X 5 = 25 Marks)
Q.No. 11: Explain the basic decision tree algorithm. How does it work, and what is the role of entropy in building a decision tree? (Explanation: 2 Marks)
A Decision Tree Algorithm is a supervised learning method used for classification and regression tasks. It models decisions and their possible consequences in a tree-like structure, where each internal node represents a decision based on a feature, branches denote the outcomes of these decisions, and leaf nodes indicate the final prediction or class label.
How it works:
1. Feature Selection:
o The algorithm evaluates the candidate features and selects the one that best splits the data (e.g., the feature with the highest Information Gain).
2. Splitting:
o The dataset is divided into subsets based on the selected feature's possible values. Each subset becomes a child node of the current node.
3. Recursive Partitioning:
o The splitting process is recursively applied to each child node, considering only the
data within that node, until a stopping condition is met (e.g., all instances in a node
belong to the same class, or a maximum tree depth is reached).
o Once the tree is fully grown, each leaf node is assigned a class label, which is used for making predictions on new data.
In decision trees, entropy is used to calculate Information Gain, which helps in selecting the feature that best splits the data. For a dataset S, entropy is:
H(S) = − Σ_i p_i · log2(p_i)
where p_i is the proportion of instances belonging to class i. Information Gain (IG) is defined as the reduction in entropy after the dataset is split on a particular feature A:
IG(S, A) = H(S) − Σ_v (|S_v| / |S|) · H(S_v)
Where:
S is the dataset, A is the feature being considered, and S_v is the subset of S for which A takes value v.
By calculating the Information Gain for each feature, the algorithm selects the feature that results in the greatest reduction in entropy, leading to more homogeneous and informative splits. This process continues recursively, building a tree that effectively partitions the data to make accurate predictions.
https://www.analyticsvidhya.com/blog/2021/08/decision-tree-algorithm/
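These calculations can be sketched in plain Python (function names here are illustrative, not from the source):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feature_index):
    """IG(S, A) = H(S) - sum(|S_v|/|S| * H(S_v)) over values v of feature A."""
    total, n = entropy(labels), len(labels)
    # Partition the labels by the value of the chosen feature
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature_index], []).append(label)
    remainder = sum((len(p) / n) * entropy(p) for p in partitions.values())
    return total - remainder

# Tiny illustrative dataset (feature 0 = Outlook)
rows = [["Sunny"], ["Sunny"], ["Overcast"], ["Rain"], ["Rain"]]
labels = ["No", "No", "Yes", "Yes", "No"]
print(round(info_gain(rows, labels, 0), 3))  # prints 0.571
```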
Q.No. 12: Discuss the concept of association rule mining and explain the Apriori algorithm with an example of how it generates frequent itemsets. (Explanation: 3 Marks; Example: 2 Marks; Total: 5 Marks)
Association Rule Mining
Association Rule Mining is a technique used in data mining to discover relationships between items in large datasets. It is commonly used in market basket analysis to identify patterns in customer purchases.
Key Terms in Association Rule Mining:
1. Support – Measures how frequently an itemset appears in the dataset: Support(A) = (transactions containing A) / (total transactions).
2. Confidence – Measures how often an item B appears in transactions that contain item A: Confidence(A → B) = Support(A ∪ B) / Support(A).
3. Lift – Measures how much more likely item B is purchased when item A is purchased, compared to if they were independent: Lift(A → B) = Confidence(A → B) / Support(B).
Apriori Algorithm
The Apriori algorithm is a popular method for finding frequent itemsets and generating association rules. It uses a bottom-up approach where it iteratively finds itemsets with high support values.
1. Set a minimum support threshold.
2. Find frequent itemsets:
o Count the occurrences of each 1-itemset and discard those below the minimum support.
o Form 2-itemsets from the remaining items and count their occurrences.
o Repeat the process for higher-order itemsets until no more frequent itemsets can be formed.
3. Generate strong association rules from the frequent itemsets using confidence and lift.
Example of Apriori Algorithm
Dataset (Transactions):
Transaction ID | Items Purchased
1 | Bread, Milk, Egg
2 | Bread, Diaper, Beer
3 | Milk, Diaper, Beer
4 | Bread, Milk, Diaper, Beer
5 | Bread, Milk, Diaper
Step 1: Finding Frequent Itemsets (Support Calculation), assuming a minimum support threshold of 40%:
1-itemsets:
Bread → 4/5 = 80% ✅
Milk → 4/5 = 80% ✅
Diaper → 4/5 = 80% ✅
Beer → 3/5 = 60% ✅
Egg → 1/5 = 20% ❌ (pruned)
2-itemsets (formed from the frequent items):
(Bread, Milk) → 3/5 = 60% ✅
(Bread, Diaper) → 3/5 = 60% ✅
(Bread, Beer) → 2/5 = 40% ✅
(Milk, Diaper) → 3/5 = 60% ✅
(Milk, Beer) → 2/5 = 40% ✅
(Diaper, Beer) → 3/5 = 60% ✅
3-itemsets:
(Bread, Milk, Diaper) → 2/5 = 40% ✅
(Bread, Diaper, Beer) → 2/5 = 40% ✅
(Milk, Diaper, Beer) → 2/5 = 40% ✅
(Bread, Milk, Beer) → 1/5 = 20% ❌ (pruned)
Step 2: Strong association rules (e.g., Diaper → Beer with confidence 3/4 = 75%) are then generated from the frequent itemsets.
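The support-counting loop can be sketched in plain Python; this is a simplified candidate generation for illustration, not the full Apriori join/prune step:

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk", "Egg"},
    {"Bread", "Diaper", "Beer"},
    {"Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper"},
]
min_support, n = 0.4, len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

# Level 1: frequent single items
frequent = [{i} for i in set.union(*transactions) if support({i}) >= min_support]
all_frequent = list(frequent)

# Level k: combine items from frequent (k-1)-itemsets, keep those meeting min_support
while frequent:
    items = sorted(set.union(*frequent))
    k = len(frequent[0]) + 1
    frequent = [set(c) for c in combinations(items, k)
                if support(set(c)) >= min_support]
    all_frequent.extend(frequent)

for s in all_frequent:
    print(sorted(s), round(support(s), 2))
```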
Q.No. 13: Describe the concept of boosting in ensemble learning. Explain the working of the AdaBoost algorithm and how it improves the performance of weak classifiers. (Formulas: 3 Marks; Correct Answer: 2 Marks; Total: 5 Marks)
Boosting is an ensemble learning technique that aims to improve the accuracy of weak classifiers by combining them into a strong classifier. In boosting, multiple weak classifiers (models that perform slightly better than random guessing) are trained sequentially, with each new classifier focusing on the mistakes made by the previous ones.
1. Sequential Training: Boosting trains a series of weak classifiers, each built on the errors made by the previous classifier.
2. Weight Adjustment: The algorithm assigns higher weights to misclassified data points, making them more important for the next classifier.
3. Final Model: The predictions of the weak classifiers are combined (usually by weighted
voting) to form a final strong classifier. The final model’s performance is typically much
better than any single weak model.
AdaBoost Algorithm:
AdaBoost (Adaptive Boosting) is one of the most well-known boosting algorithms. It works by
combining multiple weak classifiers (typically decision trees) to create a stronger classifier.
1. Initialize Weights:
Initially, all data points are given equal weights. For a dataset with N examples, each example has a weight w_i = 1/N.
2. Train a Weak Classifier:
A weak classifier h_t (e.g., a decision stump or small decision tree) is trained on the dataset using these weights.
3. Calculate the Error Rate:
The error rate ε_t of the classifier is calculated as the weighted sum of misclassified data points.
4. Calculate Alpha (Weight of Classifier):
AdaBoost calculates a weight α_t for the classifier based on its error rate:
α_t = (1/2) · ln((1 − ε_t) / ε_t)
A higher weight is assigned to classifiers with lower error rates, meaning they will have more influence in the final model.
5. Update Weights:
The weights of the misclassified samples are increased so that the next classifier will focus more on them. Correctly classified samples have their weights reduced. The new weights are updated as follows:
w_i ← w_i · exp(−α_t · y_i · h_t(x_i)), then normalized so that all weights sum to 1.
6. Repeat:
Steps 2 to 5 are repeated for a predefined number of iterations or until no further improvement is
made.
7. Final Model:
The final model is a weighted combination of all the weak classifiers. The prediction is made by taking the weighted vote from all the classifiers:
H(x) = sign(Σ_t α_t · h_t(x))
Handling Overfitting:
AdaBoost is relatively resistant to overfitting, especially when the base classifiers are simple (e.g., decision stumps). By focusing on difficult examples, it often avoids the tendency to overfit the easy ones, although it can still overfit noisy data if too many weak learners are added.
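The weight-update steps above can be sketched from scratch as follows; the dataset, number of rounds, and use of scikit-learn stumps are illustrative choices:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, random_state=0)
y = np.where(y == 0, -1, 1)          # AdaBoost uses labels in {-1, +1}
N, T = len(y), 20
w = np.full(N, 1.0 / N)              # Step 1: equal initial weights
stumps, alphas = [], []

for t in range(T):
    stump = DecisionTreeClassifier(max_depth=1)      # weak learner (decision stump)
    stump.fit(X, y, sample_weight=w)                 # Step 2: train on weighted data
    pred = stump.predict(X)
    eps = w[pred != y].sum()                         # Step 3: weighted error rate
    alpha = 0.5 * np.log((1 - eps) / (eps + 1e-10))  # Step 4: classifier weight
    w = w * np.exp(-alpha * y * pred)                # Step 5: re-weight samples
    w /= w.sum()                                     # normalize to sum to 1
    stumps.append(stump)
    alphas.append(alpha)

# Step 7: final strong classifier = sign of the weighted vote
H = np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))
print("Training accuracy:", (H == y).mean())
```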
Q.No. 14: Explain the structure of an ANN. Discuss the different types of activation functions used in neural networks and their significance. (Explanation: 3 Marks; Example: 2 Marks; Total: 5 Marks)
1. Structure of an Artificial Neural Network (ANN)
An Artificial Neural Network (ANN) is a computational model inspired by the human brain, consisting of interconnected nodes (neurons) arranged in layers.
Components of ANN:
1. Input Layer
o Receives raw data (features) and passes it to the next layer.
2. Hidden Layers
o Perform computations and extract patterns.
o The depth of the network depends on the number of hidden layers.
3. Output Layer
o Produces final predictions or classifications.
4. Weights and Biases
o Weights determine the importance of inputs.
o Bias helps shift the activation to improve learning.
5. Activation Function
o Introduces non-linearity, allowing the network to learn complex patterns.
2. Activation Functions in Neural Networks (Based on DataCamp's Guide)
Activation functions play a crucial role in determining how the output of a neuron is computed and whether it should be activated or not.
Types of Activation Functions:
1. Sigmoid – Maps inputs to (0, 1); useful for binary outputs but suffers from vanishing gradients.
2. Tanh – Maps inputs to (−1, 1); zero-centered, but also saturates for large inputs.
3. ReLU – Outputs max(0, x); fast and widely used, though neurons can "die" for negative inputs.
4. Softmax – Converts a vector of scores into a probability distribution; used in output layers for multi-class classification.
3. Importance of Activation Functions in ANN:
1. Introduces Non-Linearity: Allows networks to model complex patterns.
2. Enables Gradient-Based Learning: Helps with weight updates using backpropagation.
3. Prevents Vanishing Gradient: Functions like ReLU avoid slow learning in deep networks.
4. Determines Output Interpretation: Softmax ensures probability-based outputs in classification
tasks.
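A small NumPy sketch of these common activation functions, for illustration:

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs to (0, 1); gradients vanish for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered squashing, range (-1, 1)
    return np.tanh(x)

def relu(x):
    # max(0, x): cheap to compute, no vanishing gradient for x > 0
    return np.maximum(0.0, x)

def softmax(x):
    # Turns raw scores into probabilities that sum to 1
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z), softmax(z), sep="\n")
```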
Q.No. 15: Explain the process of training a neural network using backpropagation and the challenges associated with it. (Explanation with diagram)
Backpropagation is an algorithm used to train artificial neural networks by adjusting weights and biases to minimize error. It works by propagating the error backward from the output layer to the input layer using the chain rule of calculus. Common challenges include vanishing or exploding gradients in deep networks, sensitivity to the learning rate, getting stuck in poor local minima, and slow convergence on large datasets.
https://www.datacamp.com/tutorial/mastering-backpropagation
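A minimal sketch of the forward and backward pass for a tiny one-hidden-layer network (the XOR data and network sizes are illustrative, not the cited tutorial's code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])  # XOR targets

# Tiny network: 2 inputs -> 3 hidden (sigmoid) -> 1 output (sigmoid)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 0.5

for step in range(2000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    loss = np.mean((out - y) ** 2)

    # Backward pass: apply the chain rule, output layer first
    d_out = 2 * (out - y) / len(y) * out * (1 - out)
    d_W2 = h.T @ d_out
    d_h = d_out @ W2.T * h * (1 - h)   # error propagated to hidden layer
    d_W1 = X.T @ d_h

    # Gradient descent update
    W2 -= lr * d_W2; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * d_W1; b1 -= lr * d_h.sum(axis=0)

print("final loss:", round(loss, 4))  # should decrease toward 0
```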
PART C (3 X 10 = 30 Marks)
Q.No. 16: Compare ID3, C4.5, and CART algorithms in terms of their strengths and weaknesses. Also discuss the role of entropy and information gain, and how they are used to build a decision tree. (Workflow Explanation: 8 Marks; Diagram: 2 Marks; Total: 10 Marks)
1. Comparison of ID3, C4.5, and CART
Algorithm | Split Criterion | Strengths | Weaknesses
ID3 | Information Gain (entropy) | Simple and fast on categorical data | Cannot handle continuous features directly; no pruning; biased toward attributes with many values
C4.5 | Gain Ratio | Handles continuous and missing values; prunes the tree | Slower; can still produce complex trees
CART | Gini Index (classification) / MSE (regression) | Binary splits; supports both classification and regression | Greedy splitting; can overfit without pruning
2. Entropy and Information Gain
Entropy measures the impurity of a node. If a node contains instances of only one class, entropy is 0 (pure); if a node has an equal mix of classes, entropy is high (impure). Mathematically, entropy for a dataset with n classes is given by:
H(S) = − Σ_i p_i · log2(p_i)
where p_i is the proportion of instances belonging to class i. Information Gain for an attribute A is the reduction in entropy after splitting on A:
IG(S, A) = H(S) − Σ_v (|S_v| / |S|) · H(S_v)
where:
A is the attribute being considered.
S is the dataset.
S_v are the subsets created after splitting.
3. Building a Decision Tree Using Entropy & Information Gain
1. Compute the entropy of the entire dataset.
2. Calculate Information Gain for each attribute.
3. Choose the attribute with the highest Information Gain to split the node.
4. Repeat recursively for each subset until all nodes are pure or a stopping condition is met.
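Worked example (with illustrative numbers): suppose S has 10 examples (6 positive, 4 negative), so H(S) = −(0.6 · log2(0.6) + 0.4 · log2(0.4)) ≈ 0.971. If an attribute splits S into S1 with 4 positives and 0 negatives (H(S1) = 0) and S2 with 2 positives and 4 negatives (H(S2) ≈ 0.918), then IG = 0.971 − (4/10) · 0 − (6/10) · 0.918 ≈ 0.420, so this attribute gives a strong, informative split.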
Q.No. 17: Explain rule induction in machine learning. Discuss the process of rule learning and its applications. (Explanation: 6 Marks)
Rule induction is a technique in machine learning where patterns in data are represented as IF-THEN rules. These rules help in making predictions and understanding relationships in datasets.
Example of a Rule:
IF blood sugar level > 140 AND BP > 130/90, THEN risk of diabetes = High.
Rule induction is commonly used in decision-making systems, medical diagnosis, and expert systems.
Common algorithms: sequential covering methods such as CN2, RIPPER, and OneR; rules can also be extracted from trained decision trees.
Steps:
1. Learn one rule that covers as many examples of a single class as possible.
2. Remove the examples covered by that rule.
3. Repeat on the remaining examples until all are covered or no good rule can be found.
4. Prune the rules to improve generalization.
Example:
Dataset:
45 Normal No Safe
Rule Induced:
IF Age > 45 AND BP = High, THEN Outcome = Risky.
Example:
A decision tree for loan approval can be transformed into:
IF Income > 50K AND Credit Score > 700, THEN Loan Approved.
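This transformation can be sketched with scikit-learn's export_text, where each root-to-leaf path reads as one IF-THEN rule (the toy loan data below is made up for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy loan data: [income_k, credit_score] -> approved (1) / rejected (0)
X = [[30, 600], [60, 650], [80, 720], [55, 710], [90, 750], [40, 690]]
y = [0, 0, 1, 1, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each root-to-leaf path in the printout is one IF-THEN rule
print(export_text(tree, feature_names=["Income", "CreditScore"]))
```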
Q.No. 18: Compare Random Forest and AdaBoost in terms of their advantages, disadvantages, and use cases. (10 Marks)
3. Advantages
Advantages of Random Forest
Reduces overfitting by averaging multiple decision trees.
Handles high-dimensional data efficiently.
Works well with both numerical and categorical data.
Can handle missing values and noisy data.
Less sensitive to outliers than individual decision trees.
Advantages of AdaBoost
Improves weak learners (e.g., decision stumps) to create a strong model.
Focuses more on misclassified samples, improving accuracy.
Less prone to overfitting compared to a single decision tree.
Works well for imbalanced datasets by assigning higher importance to hard-to-classify samples.
Can be used with different base models (not just decision trees).
4. Disadvantages
Disadvantages of Random Forest
Slower training time compared to a single decision tree.
Less interpretable due to multiple decision trees.
Can be computationally expensive for very large datasets.
Disadvantages of AdaBoost
Sensitive to noisy data and outliers, as it assigns higher weights to misclassified instances.
Can overfit if the number of weak learners is too high.
Requires careful tuning of parameters (e.g., learning rate).
5. Use Cases
Use Cases of Random Forest
Medical Diagnosis: Predicting diseases based on patient records.
Fraud Detection: Identifying fraudulent transactions in banking.
Image Classification: Used in object detection and face recognition.
Feature Selection: Helps in ranking important features in datasets.
Use Cases of AdaBoost
Face Detection: Used in computer vision applications.
Spam Detection: Identifying spam emails in filtering systems.
Customer Churn Prediction: Understanding customer retention in marketing.
Credit Scoring: Assessing loan eligibility based on past records.
6. Key Differences
Feature | Random Forest (Bagging) | AdaBoost (Boosting)
Base Learner | Multiple decision trees | Decision stumps or weak classifiers
Training Approach | Parallel (trees trained independently) | Sequential (each model corrects previous errors)
Overfitting Risk | Low (due to averaging multiple models) | Higher if too many weak learners are added
Performance on Noisy Data | Robust to noise | Sensitive to noisy data and outliers
Handling of Data Imbalance | May not perform well without modifications | Assigns higher weight to misclassified samples
Computational Cost | Higher due to multiple trees | Lower for weak classifiers but increases with more iterations
Final Decision | Majority voting (classification) or averaging (regression) | Weighted sum of weak classifiers
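A quick scikit-learn sketch contrasting the two ensembles on the same synthetic data (the dataset and parameter choices are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Bagging: independent deep trees, majority vote
rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Boosting: sequential shallow stumps, weighted vote
ada = AdaBoostClassifier(n_estimators=100, random_state=42)

for name, model in [("Random Forest", rf), ("AdaBoost", ada)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```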
Q.No. 19: Discuss the types of activation functions commonly used in neural networks, including their advantages and disadvantages. (Problem Solving: 4 Steps, 3 Iterations; 10 Marks)
https://www.v7labs.com/blog/neural-networks-activation-functions
See the types of activation functions and their significance discussed under Q.No. 14.
Q.No. 20: Discuss the issues related to generalization and overfitting in neural networks, and how they can be addressed. (Explanation: 7 Marks; Example: 3 Marks; Total: 10 Marks)
Introduction
Generalization: The ability of a neural network to perform well on new, unseen data.
Overfitting: When a model learns the training data too well, including noise, and performs poorly on new data.
Signs and Causes of Overfitting:
1. Memorization of Training Data – The model remembers training examples instead of learning general patterns.
2. Poor Test Performance – High accuracy on training data but low accuracy on test data.
3. Complex Model – Too many layers and parameters lead to learning unnecessary details.
4. Lack of Enough Data – Small datasets increase the chance of overfitting.
Ways to Address Overfitting:
1. Regularization Techniques
2. Data Augmentation
3. Early Stopping
4. More Training Data
More data helps the model learn general trends instead of overfitting to noise. If real data is limited, data augmentation can be used.
5. Batch Normalization
6. Reducing Model Complexity
Using fewer layers and neurons can prevent the model from memorizing unnecessary details. Convolutional Neural Networks (CNNs) are more efficient for image tasks.
7. Cross-Validation
Splitting data into training, validation, and test sets ensures better model evaluation. K-fold cross-validation helps use data more effectively.
8. Transfer Learning
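As an example of one remedy, early stopping can be sketched with scikit-learn's MLPClassifier (the dataset and parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# early_stopping=True holds out part of the training data as a validation set
# and stops training when the validation score stops improving
mlp = MLPClassifier(hidden_layer_sizes=(64, 64),
                    early_stopping=True,       # monitor validation score
                    validation_fraction=0.2,   # 20% of training data held out
                    n_iter_no_change=10,       # patience before stopping
                    alpha=1e-3,                # L2 regularization strength
                    max_iter=500, random_state=0)
mlp.fit(X_train, y_train)
print("Train acc:", mlp.score(X_train, y_train))
print("Test acc: ", mlp.score(X_test, y_test))
```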