
191IT7310 - Machine Learning

QUESTION BANK
UNIT- 1 INTRODUCTION
Machine Learning – Types of Machine Learning – Supervised Learning – Unsupervised Learning –
Basic Concepts in Machine Learning – Machine Learning Process – Weight Space – Testing Machine
Learning Algorithms – A Brief Review of Probability Theory – Turning Data into Probabilities – The
Bias-Variance Trade-off.
S.NO Q&A CO K-LEVEL
PART- A
Which subfield of artificial intelligence focuses on algorithms and
models that allow computers to learn and make predictions without
explicit programming?
1 a) Machine Learning CL1
CO1.1
b) Natural Language Processing
c) Robotics
d) Computer Vision
Supervised learning is a type of machine learning where the algorithm
learns from labeled data to make predictions or decisions.
2 CO1.2 CL2
a) True
b) False
Which type of learning algorithm learns from rewards or punishments
based on its actions?
a) Supervised Learning
3 CO1.2 CL2
b) Unsupervised Learning
c) Reinforcement Learning
d) Deep Learning
Which type of learning deals with partially labeled data and combines
supervised and unsupervised learning?
a) Reinforcement Learning
4 b) Semi-Supervised Learning CO1.2 CL2
c) Transfer Learning
d) Active Learning
In supervised learning, what are the known outputs or target values
associated with the input data called?
a) Features
5 CO1.2 CL1
b) Labels
c) Training Data
d) Test Data
In which type of learning does the algorithm learn from labeled
examples?
a) Supervised Learning
6 CO1.2 CL1
b) Unsupervised Learning
c) Reinforcement Learning
d) Semi-Supervised Learning
Which type of machine learning algorithm is used for continuous
target variables?
a) Classification
7 b) Regression CO1.2 CL1
c) Clustering
d) Dimensionality Reduction
Unsupervised learning is a type of machine learning where the
algorithm learns from unlabeled data to discover patterns or structures
8 in the data. CO1.3 CL2
a) True
b) False
Which technique can be used to group similar data points together
based on their characteristics?
a) Regression
9 b) Clustering CO1.3 CL2
c) Classification
d) Dimensionality Reduction
What is the process of transforming raw data into a suitable format for
machine learning algorithms called?
a) Data Collection
10 CO1.4 CL2
b) Data Preprocessing
c) Feature Engineering
d) Model Selection
Which technique is used to handle missing values in a dataset?
a) Data Imputation
11 b) Data Normalization CO1.4 CL2
c) Feature Scaling
d) Principal Component Analysis
Which step in the machine learning process involves selecting or
creating the most relevant features for the algorithm?
a) Data Collection
12 CO1.4 CL2
b) Data Preprocessing
c) Feature Selection/Engineering
d) Model Selection
What is the goal of feature engineering in machine learning?
a) Selecting the best machine learning algorithm
13 b) Transforming data into a suitable format for analysis CO1.4 CL1
c) Discovering patterns or structures in unlabeled data
d) Creating or selecting relevant features for the model
Which step of the machine learning process involves adjusting the
model's internal parameters or weights using training data?
a) Data Preprocessing
14 CO1.4 CL2
b) Feature Engineering
c) Model Selection
d) Training
Which step of the machine learning process involves applying the
trained model to new, unseen data?
a) Data Preprocessing
15 b) Model Selection CO1.4 CL2
c) Evaluation
d) Testing and Deployment
Which step in the machine learning process involves assessing the
performance of the trained model on unseen data?
a) Data Collection
16 CO1.4 CL2
b) Data Preprocessing
c) Evaluation
d) Testing and Deployment
Which term refers to the space of all possible combinations of
weights in a machine learning model?
a) Feature Space
17 CO1.5 CL1
b) Target Space
c) Weight Space
d) Prediction Space
Which type of machine learning algorithm aims to minimize the error
or loss function during training?
a) Unsupervised Learning
18 b) Reinforcement Learning CO1.6 CL2
c) Supervised Learning
d) Deep Learning
Which metric is commonly used to evaluate classification algorithms?
a) Mean Squared Error
19 b) R-squared CO1.6 CL2
c) Accuracy
d) Root Mean Squared Error
What is the main purpose of cross-validation in machine learning?
a) Selecting the best features for the model
20 b) Assessing the model's performance on unseen data CO1.6 CL1
c) Tuning the model's hyperparameters
d) Preventing overfitting of the model
Which type of machine learning algorithm aims to find the optimal
decision boundary between classes in the data?
a) Linear Regression
21 CO1.6 CL2
b) K-nearest Neighbors
c) Support Vector Machines
d) Naive Bayes
Which technique can be used to prevent overfitting in machine
learning models?
a) Regularization
22 CO1.6 CL2
b) Feature Extraction
c) Cross-validation
d) Gradient Descent
Which technique is used to reduce the dimensionality of the data
while preserving most of its important features?
a) Clustering
23 b) Principal Component Analysis CO1.6 CL1
c) Random Forests
d) Gradient Boosting

Which mathematical framework is used to model and analyze
uncertainty in machine learning?
a) Probability Theory
24 CO1.7 CL1
b) Linear Algebra
c) Calculus
d) Statistics
Which mathematical framework provides a foundation for statistical
inference in machine learning?
a) Linear Algebra
25 CO1.7 CL2
b) Calculus
c) Probability Theory
d) Logic
Which technique can be used to map input data to a probability
distribution over the possible outcomes or classes?
a) Logistic Regression
26 CO1.7 CL1
b) K-means Clustering
c) Decision Trees
d) Support Vector Machines
Which mathematical technique is used to adjust the model's
parameters during the training process?
a) Gradient Descent
27 b) Singular Value Decomposition CO1.7 CL2
c) Markov Chain Monte Carlo
d) Principal Component Analysis
What does the bias-variance tradeoff deal with in machine learning?
a) The relationship between accuracy and precision
28 b) The relationship between bias and variance CO1.8 CL2
c) The tradeoff between training time and prediction time
d) The tradeoff between simplicity and complexity
What does the bias in a machine learning model represent?
a) The error introduced by approximating a real-world problem
29 b) The amount by which the model's predictions vary CO1.8 CL1
c) The difference between predicted and actual values
d) The ability of the model to generalize to new data
Match the Following: Match the machine learning concept with its
description.

A. Bias-Variance Trade-off
B. Machine Learning Process
C. Probability Theory
D. Weight Space

1. The set of all possible parameter values that a machine learning
model can take, affecting the model's behavior and generalization.
2. The mathematical study of uncertainty and random events, utilized
in turning data into probabilities for machine learning applications.
30 3. A step-by-step approach involving data collection, data CO1.8 CL2
preprocessing, feature selection, model training, and evaluation,
used in developing machine learning systems.
4. This concept deals with the trade-off between the error introduced
by bias (simplifying assumptions) and variance (sensitivity to small
fluctuations) in a model.
a) A-3 B-2 C-1 D-4 b) A-1 B-2 C-3 D-4 c) A-1 B-2 C-4 D-3 d) A-4 B-3 C-2 D-1

PART –B
In the realm of modern technological advancements, how does a
particular form of learning contribute to machines becoming
1 CO1.1 CL2
increasingly intelligent and autonomous?

In the context of Machine Learning, demonstrate how pre-trained
2 models can be fine-tuned for specific tasks and domains, and provide CO1.1 CL2
examples of practical applications.
In the pursuit of optimizing predictive performance, what critical
3 aspect influences the fine-tuning of a supervised model and its ability CO1.2 CL2
to learn from data?
Within the domain of autonomous learning, what primary objective
4 guides algorithms to identify underlying patterns without explicit CO1.2 CL2
examples to follow?
Examine the importance of feature engineering in Machine Learning.
5 Could you provide common techniques used to preprocess and CO1.2 CL1
extract relevant features from raw data?
How do advanced techniques for model evaluation unravel the
6 CO1.3 CL2
underlying strengths and weaknesses of machine learning algorithms?
Within the context of linear models, how does the notion of weight
7 space contribute to understanding the learning process and model CO1.4 CL2
behaviour?
When searching for an optimal solution, what influence does the
8 careful selection of an optimization algorithm wield on the CO1.5 CL2
trajectories followed in weight space?
While evaluating the efficacy of a model, what inadvertent challenges
9 emerge when testing its capabilities on the same data it was trained CO1.6 CL2
on?
How do the concepts of conditional probability and Bayes' theorem
10 intertwine with machine learning algorithms to illuminate hidden CO1.7 CL2
insights in data?
When employing probabilistic models, how does the transformation of
11 raw data into probability distributions unlock a deeper understanding of CO1.7 CL2
underlying patterns?
How does the intricacy of a model impact the equilibrium between
12 avoiding bias and mitigating variance, thus influencing its overall CO1.8 CL2
predictive capabilities?
PART –C
How does Machine Learning leverage human cognition and
1 data-driven algorithms to transcend traditional programming CO1.1 CL2
boundaries?
In the realm of machine learning, imagine you're navigating a maze
of data, seeking to uncover hidden patterns and make informed
predictions. How would you strategically employ the interplay
2 between supervised, unsupervised, and other learning approaches as CO1.2 CL2
your guiding tools? Consider the distinct strengths and limitations of
each, and share an example where the synergy of these methods led to
a valuable discovery or outcome.
How does Supervised Learning empower machines to learn patterns
3 and generalize knowledge through dynamic interactions between CO1.3 CL2
input data and output labels?
Imagine you have a treasure chest filled with unlabelled data from a
diverse array of sources. How can you unlock the secrets within using
the intricate art of unsupervised learning? Explore how clustering
4 algorithms transform this treasure trove into a map of hidden insights, CO1.4 CL2
and share a scenario where the power of clustering illuminated a
previously unseen pattern or relationship in a dataset, leading to a
eureka moment.
Given a dataset of 1000 images, let's say 750 images are of "cats" and
5 250 images are of "dogs." Compute the probabilities of selecting a CO1.5 CL2
"cat" image and a "dog" image from this dataset.
Picture yourself as an explorer in the realm of machine learning,
navigating through the dense forest of uncertainty and
decision-making. How does a solid understanding of the foundational
concepts of probability theory act as your compass, guiding you
6 through the thicket of uncertainty, and empowering you to make CO1.6 CL2
informed decisions? Share an instance where a grasp of probability
theory transformed a seemingly unpredictable situation into a
manageable one, highlighting its pivotal role in shaping the course of
a machine learning endeavor.
In the context of probabilistic models, how does the transformation of
raw data into probability distributions enable a deeper understanding
7 CO1.7 CL2
of underlying patterns and uncertainties, thereby enhancing predictive
accuracy and reliability?
How does the delicate balance between bias and variance impact the
performance and generalization capabilities of Machine Learning
8 CO1.8 CL2
models, and what strategies can be employed to strike an optimal
trade-off between the two?

UNIT- 2 SUPERVISED LEARNING


Linear Models for Regression – Linear Basis Function Models – The Bias-Variance Decomposition –
Bayesian Linear Regression – Common Regression Algorithms – Simple Linear Regression – Multiple
Linear Regression – Linear Models for Classification – Discriminant Functions – Probabilistic Generative
Models – Probabilistic Discriminative Models – Laplace Approximation – Bayesian Logistic Regression
– Common Classification Algorithms – k-Nearest Neighbors – Decision Trees – Random Forest model –
Support Vector Machines
S.NO Q&A CO K-LEVEL
PART- A
Linear regression is a technique used for:
a) Classification
1 b) Clustering CO2.1 CL1
c) Regression
d) Dimensionality reduction
Match the Following:

A. Bayesian Linear Regression
B. k-Nearest Neighbors (k-NN) CO2.1 CL2
C. Support Vector Machines (SVM)
D. Probabilistic Generative Models

1. This algorithm belongs to the family of non-parametric and
instance-based learning methods. It classifies an input by finding the
majority class among its k-nearest neighbors in the feature space.
2. It is a linear regression model that incorporates prior knowledge
about the data through Bayesian inference, providing a probabilistic
2 approach to regression tasks.
3. This algorithm is used for both classification and regression tasks.
It works by creating a set of decision boundaries to separate different
classes or predict continuous values.
4. These models are used for classification and assume specific
probability distributions for each class, allowing the computation of
posterior probabilities.

Answer: A-2 B-1 C-3 D-4

The bias-variance tradeoff deals with the relationship between:
a) Model complexity and computational efficiency
3 b) Underfitting and overfitting CO2.2 CL2
c) Precision and recall
d) Supervised and unsupervised learning
Simple linear regression involves:
a) One independent variable and one dependent variable
4 b) Multiple independent variables and one dependent variable CO2.2 CL1
c) One independent variable and multiple dependent variables
d) Multiple independent variables and multiple dependent variables
Which algorithm is used for both regression and classification tasks?
a) Linear regression CO2.2 CL1
5 b) k-Nearest Neighbors
c) Decision Trees
d) Support Vector Machines
The Laplace approximation is a method used for:
a) Feature selection
6 b) Model regularization CO2.2 CL2
c) Approximating posterior distributions
d) Solving non-linear regression problems
Bayesian linear regression differs from traditional linear regression
in that it:
a) Incorporates prior knowledge about the data CO2.3 CL1
7
b) Eliminates the need for feature engineering
c) Requires a larger training dataset
d) Focuses on non-linear relationships
Discriminant functions are used in linear models for:
a) Clustering
8 b) Feature selection CO2.3 CL2
c) Classification
d) Regression
The bias-variance tradeoff deals with finding a balance between:
a) Model accuracy and model complexity CO2.3 CL1
9 b) Training time and testing time
c) Precision and recall
d) Feature selection and feature extraction
Linear models for classification can handle datasets that are:
a) Linearly separable
10 b) Non-linearly separable CO2.3 CL2
c) Categorical
d) Imbalanced
True or False: Random Forest is an ensemble learning method that
combines multiple decision trees to create a powerful classifier for
11 both regression and classification tasks.
a) True CO2.3 CL2
b) False
The Laplace approximation method is used to approximate:
a) Posterior distributions
12 b) Prior distributions CO2.3 CL1
c) Likelihood functions
d) Loss functions
In decision trees, the splitting of nodes is based on:
a) The mean of the target variable
13 b) The median of the target variable
c) The standard deviation of the target variable
d) The features' values of the data CO2.3 CL1
Which of the following is a key characteristic of Support Vector
Machines (SVMs)?
a) They are non-parametric models
14 b) They can handle missing data effectively CO2.3 CL1
c) They maximize the margin between classes
d) They are particularly suited for high-dimensional data
Linear regression is a supervised learning algorithm used for
classification tasks, where it finds the best linear decision boundary
15 to separate different classes. CO2.3 CL1
a) True
b) False
Decision trees are commonly used for:
a) Regression
16 b) Dimensionality reduction
c) Classification CO2.3 CL2
d) Unsupervised learning
Which technique is used in linear models for regression to estimate
model parameters?
a) Gradient descent CO2.3 CL1
17 b) Expectation-Maximization
c) Genetic algorithms
d) Principal Component Analysis
Which of the following is an example of a regression algorithm?
a) k-Nearest Neighbors
18 b) Decision Trees
c) Random Forest CO2.4 CL1
d) Support Vector Machines
The Random Forest model is an ensemble of:
a) Linear models
19 b) Decision trees CO2.4 CL2
c) Support Vector Machines
d) Neural networks
Support Vector Machines (SVMs) are commonly used for:
a) Regression
20 b) Clustering CO2.4 CL2
c) Classification
d) Dimensionality reduction
The Random Forest model combines multiple decision trees to:
a) Reduce overfitting
21 b) Improve computational efficiency CO2.4 CL2
c) Handle imbalanced datasets
d) Optimize the feature selection process
Which regression algorithm is based on the use of decision trees?
a) Linear regression CO2.5 CL2
22 b) k-Nearest Neighbors
c) Random Forest
d) Support Vector Machines
Which of the following techniques is commonly used in k-Nearest
Neighbors (k-NN)?
a) Decision trees
23
b) Clustering
c) Feature selection CO2.5 CL2
d) Distance metrics
In probabilistic generative models, class-conditional probabilities are
estimated:
a) Separately for each class CO2.6 CL2
24 b) Using discriminant functions
c) Based on prior distributions
d) As a joint distribution over all classes
Which classification algorithm assigns a new data point to the class
with the majority vote from its k nearest neighbors?
a) Linear regression
25 b) k-Nearest Neighbors (k-NN) CO2.6 CL2
c) Decision Trees
d) Random Forest
Which of the following is true about linear regression?
a) It is a classification algorithm
26 b) It handles non-linear relationships between features CO2.6 CL2
c) It is an unsupervised learning technique
d) It estimates a linear relationship between variables
Which model combines multiple decision trees to improve predictive
accuracy and handle complex relationships between features?
a) Linear regression CO2.7 CL2
27 b) k-Nearest Neighbors (k-NN)
c) Decision Trees
d) Random Forest
Which technique is used in Bayesian logistic regression to estimate
posterior distributions?
a) Expectation-Maximization CO2.8 CL2
28 b) Laplace approximation
c) Random sampling
d) Principal Component Analysis
Support Vector Machines (SVMs) are particularly well-suited for:
a) Linearly separable datasets
29 b) High-dimensional data CO2.8 CL2
c) Categorical variables
d) Imbalanced datasets
Which technique is used in linear models for regression to estimate
model parameters by minimizing the sum of squared errors?
a) Gradient descent CO2.8 CL2
30 b) Expectation-Maximization
c) Genetic algorithms
d) Principal Component Analysis

PART –B

1 Can you explain how the violation of essential assumptions in linear CO2.1 CL1
regression can impact the model's performance?
How does the use of basis function in linear regression models
2 CO2.1 CL2
enhance the model's flexibility and adaptability?
Could you provide an analysis of how the bias-variance trade-off
influences the generalization performance of machine learning
3 CO2.2 CL2
models, emphasizing the intricate equilibrium between underfitting
and overfitting?
Could you elaborate on the underlying principles of Bayesian Linear
4 Regression and shed light on how it distinguishes itself from CO2.3 CL2
classical linear regression methods, according to your understanding?
Suppose you have the following dataset representing the relationship CL3
between the number of study hours (x) and the exam scores (y):
Study Hours (x) Exam Scores (y)
3 55
5 70
7 85
9 95
5 CO2.3
Find the equation of the simple linear regression line (y = mx + b)
that best fits the data, where 'm' is the slope and 'b' is the y-intercept.
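Hint (an illustrative sketch, not a prescribed solution): the least-squares fit follows from m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b = ȳ - m·x̄; the short Python check below uses our own variable names.

# Closed-form least squares for the study-hours table.
xs = [3, 5, 7, 9]
ys = [55, 70, 85, 95]
x_bar = sum(xs) / len(xs)                 # mean of x
y_bar = sum(ys) / len(ys)                 # mean of y
m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
    sum((x - x_bar) ** 2 for x in xs)     # slope
b = y_bar - m * x_bar                     # intercept
print(f"y = {m}x + {b}")                  # y = 6.75x + 35.75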
Dataset:
x1 x2 Class Label
2 3 0
3 4 0
5 6 1
7 8 1
6 CO2.4 CL3
9 10 1
4 5 ?
Question:
Given the dataset above, suppose we use the linear regression model.
Can you calculate the predicted class label (0 or 1) for the new data
point? What is the predicted class label for this data point based on
the given threshold 0.5?
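Hint (a sketch under the assumption that "linear regression" here means ordinary least squares on the 0/1 labels followed by thresholding at 0.5; numpy only):

import numpy as np
# Design matrix [bias, x1, x2] for the five labelled points.
X = np.array([[1, 2, 3], [1, 3, 4], [1, 5, 6], [1, 7, 8], [1, 9, 10]], float)
y = np.array([0, 0, 1, 1, 1], float)
w, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares weights
score = np.array([1, 4, 5]) @ w             # raw output for the new point (4, 5)
print(score, "->", int(score >= 0.5))       # class label after thresholding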
Solve the following problem using the Naïve Bayes classifier:
if the weather is sunny, should the player play or not?

7 CO2.9 CL3

Emphasize the difference between probabilistic generative models
8 and discriminative models in classification tasks, and state their CO2.6 CL2
underlying principles.
Can you reveal your understanding of how the utilization of L1 and
L2 regularization techniques influences the complexity of
9 CO2.6 CL2
probabilistic discriminative models, feature selection, and
generalization performance?
Analyse the dataset to construct the decision tree with the CL2
given table:
Age Competition Type Profit
Old Yes Software Down
Old No Software Down
Old No Hardware Down
10 CO2.8
Mid Yes Software Down
Mid Yes Hardware Down
Mid No Hardware Up
Mid No Software Up
New Yes Software Up
New No Hardware Up
New No Software Up
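Hint (an ID3-style sketch with our own helper names): the root attribute is the one with the highest information gain, which this check computes directly.

from collections import Counter
from math import log2
rows = [  # (Age, Competition, Type, Profit)
    ("Old", "Yes", "Software", "Down"), ("Old", "No", "Software", "Down"),
    ("Old", "No", "Hardware", "Down"), ("Mid", "Yes", "Software", "Down"),
    ("Mid", "Yes", "Hardware", "Down"), ("Mid", "No", "Hardware", "Up"),
    ("Mid", "No", "Software", "Up"), ("New", "Yes", "Software", "Up"),
    ("New", "No", "Hardware", "Up"), ("New", "No", "Software", "Up"),
]
def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())
def info_gain(col):
    gain = entropy([r[-1] for r in rows])
    for v in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain
for i, name in enumerate(["Age", "Competition", "Type"]):
    print(name, round(info_gain(i), 3))  # Age scores highest (0.6) -> root split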
State key characteristics and benefits of decision trees and random
11 forests in machine learning, and how do they differ in their CO2.9 CL1
approaches to building predictive models?
CL3
Suppose we have a dataset of the heights and weights of 10
individuals:
Height (cm) Weight (kg)
165 55
170 63
175 68
12 160 50 CO2.9
180 75
155 48
185 80
150 45
190 90
140 38
We want to predict the weight of a new individual with a height of
170 cm using k-Nearest Neighbours algorithm with k = 3.
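Hint (a minimal sketch of the k = 3 computation; in one dimension the Euclidean distance reduces to the absolute height difference):

heights = [165, 170, 175, 160, 180, 155, 185, 150, 190, 140]
weights = [55, 63, 68, 50, 75, 48, 80, 45, 90, 38]
query, k = 170, 3
# Pick the k individuals closest in height and average their weights.
nearest = sorted(zip(heights, weights), key=lambda hw: abs(hw[0] - query))[:k]
print(nearest, "->", sum(w for _, w in nearest) / k)   # ~62 kg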
PART –C
Imagine linear regression as a sturdy bridge built on certain
underlying assumptions. What are these foundational assumptions,
and how do they form the bedrock for accurate predictions? Now,
venture into the realm of model vulnerability - how does the
violation of these assumptions act as the undercurrent that might
1 threaten the stability of our bridge, leading to misguiding CO2.1 CL4
predictions? Can you provide an example where such a violation
significantly impacted the performance of a linear regression model,
revealing the critical role of these assumptions in safeguarding the
integrity of the prediction process

Imagine you're a data scientist tasked with improving the
personalized recommendation system for a popular streaming
platform. The current model sometimes suggests popular shows that
almost everyone watches, but lacks the finesse to cater to individual
preferences. On the other hand, the model occasionally goes
2 overboard, suggesting obscure shows that seem irrelevant to users. CO2.2 CL2
How would you address this delicate balance between serving
popular content and catering to niche tastes? Share your approach,
considering the principles of the bias-variance trade-off, and how it
might lead to a recommendation system that genuinely delights users
across diverse viewing preferences.
Apply SVM algorithm for the data-points and find dimension of
hyper plane to classify the data-points for the figure. (assume bias
=1)

3 CO2.3 CL2

Consider the following dataset representing the relationship between
the number of study hours (x1), the number of exercise hours (x2),
and exam scores (y):
Study Hours (x1) Exercise Hours (x2) Exam Scores (y)
3 2 75
5 3 85
4 7 4 95 CO2.3 CL3
9 3 90
Find the multiple linear regression equation (y = β0 + β1x1 + β2x2)
that best fits the data, where β0 is the y-intercept, β1 is the
coefficient for x1, and β2 is the coefficient for x2.
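Hint (a numpy sketch of the normal-equations solution beta = (X^T X)^(-1) X^T y, with our own variable names):

import numpy as np
X = np.array([[1, 3, 2], [1, 5, 3], [1, 7, 4], [1, 9, 3]], float)  # [1, x1, x2]
y = np.array([75, 85, 95, 90], float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # [b0, b1, b2]
b0, b1, b2 = beta
print(f"y = {b0:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")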
In the context of machine learning models, what sets probabilistic
discriminative models apart from other approaches, and how do they
5 effectively model the posterior probability of class labels given input CO2.6 CL2
features? Provide examples of real-world applications where these
models have shown superiority.
Analyze the inner workings of Bayesian logistic regression and how
it integrates Bayesian techniques into the logistic regression
6 framework. Elaborate on the role of prior beliefs about model CO2.6 CL2
parameters and how these beliefs impact the model's estimation and
classification.
Imagine you are working on a data science project to predict
whether a bank's loan applicants will default on their loans. You
7 decide to use a Random Forest model for this task. CO2.9 CL2
Why might using a Random Forest be advantageous in this loan
default prediction scenario compared to using a single decision tree?
Give decision trees to represent the following boolean functions:
(a) A ∧ ¬B
(b) A ∨ [B ∧ C]
8 (c) A XOR B CO2.9 CL3
(d) [A∧ B] ∨ [C ∧ D]
(e) [A ∨ B]∧[¬ C]
(f) [A XOR B]∧ C
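Hint (for part (c), as an illustration only): a decision tree is just a nest of single-attribute tests, so A XOR B can be written as a two-level tree that splits on A at the root and on B at each branch.

def xor_tree(A: bool, B: bool) -> bool:
    if A:
        return not B   # A = True branch: positive only when B is False
    return B           # A = False branch: positive only when B is True
# Truth-table check: F, T, T, F over the four input combinations.
assert [xor_tree(a, b) for a in (False, True) for b in (False, True)] == [False, True, True, False]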

UNIT- 3 UNSUPERVISED LEARNING


Mixture Models and EM – K-Means Clustering – Dirichlet Process Mixture Models – Spectral Clustering
– Hierarchical Clustering – The Curse of Dimensionality – Dimensionality Reduction – Principal
Component Analysis – Latent Variable Models (LVM) – Latent Dirichlet Allocation (LDA).
S.NO Q&A CO K-LEVEL
PART- A
Mixture models are used for:
a) Classification
1 b) Clustering CO3.1 CL2
c) Dimensionality reduction
d) Regression
Linear regression is a supervised learning algorithm used for
classification tasks, where it finds the best linear decision boundary to
2 separate different classes.
a) True
b) False CO3.1 CL1
Mixture models are commonly used in:
a) Deep learning architectures
3 b) Reinforcement learning algorithms CO3.1 CL1
c) Natural language processing tasks
d) Social network analysis
The EM algorithm is an iterative process that:
a) Maximizes the log-likelihood of the data
4 b) Minimizes the sum of squared errors CO3.1 CL1
c) Computes the principal components of the data
d) Updates the cluster assignments in k-means clustering
The EM algorithm is composed of two main steps:
a) Expectation and maximization
5 b) Exploration and manipulation CO3.1 CL1
c) Estimation and modeling
d) Encoding and decoding
Dirichlet process mixture models employ a non-parametric prior
based on the
a) Dirichlet distribution
6 b) Gaussian distribution
c) Exponential distribution CO3.1 CL1
d) Beta distribution
K-means clustering assigns each data point to:
a) The nearest cluster center
7 b) The cluster with the highest probability CO3.2 CL2
c) The most representative data point in the cluster
d) The cluster with the maximum entropy
K-means clustering aims to minimize:
a) The within-cluster variance CO3.2 CL2
8 b) The between-cluster variance
c) The total sum of squared errors
d) The number of clusters
In which type of clustering algorithm does each data point explicitly
belong to one and only one cluster?
a) K-means clustering CO3.2 CL1
9
b) Dirichlet process mixture models
c) Spectral clustering
d) Hierarchical clustering
Match the Unsupervised Learning method with its corresponding
description:
A. K-Means Clustering CO3.3 CL2
B. Principal Component Analysis (PCA)
C. Latent Dirichlet Allocation (LDA)
D. Hierarchical Clustering

1. Mixture Models and EM
10 2. Dimensionality Reduction
3. Topic Modeling
4. Agglomerative Clustering
Answer: A-1 B-2 C-3 D-4
K-means clustering is a:
a) Hierarchical clustering algorithm
11 b) Probabilistic clustering algorithm CO3.3 CL2
c) Density-based clustering algorithm
d) Partition-based clustering algorithm
Dirichlet process mixture models are used to address:
a) The curse of dimensionality
12 b) Overfitting in mixture models CO3.4 CL2
c) Class imbalance in classification tasks
d) Model selection in clustering algorithms
Spectral clustering is a technique used for:
a) Non-linear dimensionality reduction CO3.4 CL2
13 b) Text classification
c) Community detection in networks
d) Density estimation
Hierarchical clustering builds a hierarchy of clusters by:
a) Partitioning the data into a fixed number of clusters CO3.4
14 b) Agglomerating individual data points into clusters CL1
c) Assigning data points to pre-defined clusters
d) Balancing the within-cluster variance
Spectral clustering utilizes the eigenvalues and eigenvectors of a:
a) Covariance matrix
15 b) Decision matrix CO3.4 CL1
c) Similarity matrix
d) Kernel matrix
Hierarchical clustering can be performed using:
a) The k-means algorithm
16 b) The EM algorithm
c) The DBSCAN algorithm
d) Agglomerative or divisive approaches CO3.4 CL2
Spectral clustering is particularly useful for clustering data that
exhibits:
a) Linear separability CO3.4 CL2
17 b) High-dimensional features
c) Non-linear relationships
d) Low-dimensional substructures
Hierarchical clustering can be visualized using:
a) Bar plots
18 b) Heatmaps
c) Dendrograms CO3.4 CL1
d) Scatter plots
The curse of dimensionality refers to the challenge of:
a) Interpreting high-dimensional data
19 b) Computing high-dimensional distances CO3.5 CL2
c) Visualizing high-dimensional data
d) Sampling high-dimensional data
Which algorithm is commonly used to estimate the parameters of
Gaussian mixture models?
a) K-means clustering CO3.6 CL1
20
b) Hierarchical clustering
c) Expectation-Maximization (EM) algorithm
d) Principal Component Analysis (PCA)
Random Forest is an ensemble learning method that combines
multiple decision trees to create a powerful classifier for both
21 regression and classification tasks. CO3.6 CL2
a) True
b) False
Latent Variable Models (LVM) are used to:
a) Capture hidden patterns in data
22 b) Reduce the number of dimensions in the data CO3.7 CL1
c) Optimize the model parameters in clustering algorithms
d) Assign probabilities to data points in mixture models
Latent Dirichlet Allocation (LDA) is a generative model commonly
used for:
a) Image classification CO3.7 CL1
23
b) Sentiment analysis
c) Topic modeling
d) Anomaly detection
Principal Component Analysis (PCA) finds orthogonal vectors that:
a) Maximize the variance of the data
24 b) Minimize the within-cluster variance
c) Maximize the between-cluster variance
d) Minimize the number of dimensions CO3.7 CL1
Principal Component Analysis (PCA) is a technique used for:
a) Feature selection
25 b) Model regularization
c) Dimensionality reduction CO3.7 CL1
d) Non-linear mapping of data
Principal Component Analysis (PCA) is used for:
a) Feature selection
26 b) Dimensionality reduction CO3.7 CL2
c) Clustering data points
d) Fitting regression models
Principal Component Analysis (PCA) can be used for:
a) Feature engineering
27 b) Model selection CO3.7 CL2
c) Data imputation
d) Visualization of high-dimensional data
Latent Variable Models (LVM) aim to:
a) Discover hidden factors in the data
28 b) Find the optimal number of clusters
c) Fit the data to a linear regression model CO3.8 CL1
d) Optimize the decision boundaries in classification
Latent Dirichlet Allocation (LDA) models represent documents as:
a) Clusters of words CO3.9 CL1
29 b) Topic distributions
c) Document embeddings
d) Bag-of-words representations
Latent Dirichlet Allocation (LDA) is commonly used in:
a) Regression analysis
30 b) Image segmentation CO3.9 CL2
c) Topic modeling of text data
d) Reinforcement learning
PART –B
Depict a real-world scenario where the EM algorithm might get stuck
1 in a local optimum when estimating the parameters of a mixture CO3.1 CL3
model.
Consider the following data points in a 2-dimensional space:

Data points: (2, 3), (3, 5), (5, 4), (7, 3), (9, 6), (10, 8)
2 CO3.2 CL3
Apply K-Means clustering algorithm to partition the data into two
clusters.
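Hint (a compact Lloyd's-algorithm sketch; the initial centroids are an assumption, here the first and last points, not part of the question):

points = [(2, 3), (3, 5), (5, 4), (7, 3), (9, 6), (10, 8)]
centroids = [points[0], points[-1]]            # assumed initialisation
for _ in range(10):
    clusters = [[], []]
    for p in points:                           # assignment: nearest centroid
        d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
        clusters[d.index(min(d))].append(p)
    centroids = [(sum(x for x, _ in cl) / len(cl),
                  sum(y for _, y in cl) / len(cl)) for cl in clusters]  # update: means
print(clusters, centroids)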
In what ways can a Dirichlet Process Mixture Model address the
3 problem of automatically determining the number of clusters in a CO3.2 CL3
dataset?
Give the limitations of spectral clustering when dealing with large
4 CO3.3 CL2
datasets and propose a possible improvement.
Provide an example where hierarchical clustering fails to produce
5 CO3.3 CL3
meaningful clusters due to its sensitivity to outliers.
Suggest strategies to mitigate the curse of dimensionality when
6 CO3.5 CL2
dealing with high-dimensional data in clustering tasks.
Dimensionality reduction can be a double-edged sword, both aiding
7 and hindering unsupervised learning tasks. Give your point of view CO3.6 CL2
on this.
Give a situation where PCA is inappropriate due to the underlying
8 CO3.7 CL3
structure of the data and suggest an alternative technique.
Could you provide the primary steps involved in performing Principal
9 CO3.7 CL1
Component Analysis?
Contrast the assumptions made by latent variable models with those
10 CO3.8 CL2
made by explicit clustering algorithms.
Elucidate the fundamental idea behind Latent Dirichlet Allocation and its
11 CO3.9 CL2
application in topic modelling.
Examine the challenges of LDA and propose methods to enhance the
12 CO3.9 CL2
interpretability of the results.
PART –C
In which scenarios would the EM algorithm converge slowly or fail to
1 converge for fitting a mixture model, and how can these issues be CO3.1 CL2
addressed?

i) Describe the challenges that might arise when using the K-Means
algorithm to cluster data with unevenly distributed or non-spherical
clusters, and propose potential solutions to address these challenges.

ii) Consider the following dataset with 10 data points in a
2-dimensional space:
2 CO3.2 CL3
Data points:
[(2, 3), (3, 5), (5, 4), (7, 3), (9, 6), (10, 8), (11, 5), (12, 7), (13, 6), (15, 7)]
Apply K-Means clustering algorithm to cluster the data into three
clusters.
During a research work, you found seven observations as
described with the data points below. You want to create three
clusters from these observations using K-means algorithm.
After first iteration, the clusters C1, C2, C3 has following
observations:
3 C1: {(2,2), (4,4), (6,6)} CO3.2 CL3
C2: {(0,4), (4,0)}
C3: {(5,5), (9,9)}
If you want to run a second iteration, then what will be the cluster
centroids? (SAIKAT DUTT, PEARSON BOOK MODEL QUESTION
PAPER – 1)
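Worked check: the K-means update replaces each centroid with the coordinate-wise mean of its members, so the second-iteration centroids are
C1 = ((2 + 4 + 6)/3, (2 + 4 + 6)/3) = (4, 4),
C2 = ((0 + 4)/2, (4 + 0)/2) = (2, 2),
C3 = ((5 + 9)/2, (5 + 9)/2) = (7, 7).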
Describe Dirichlet Process Mixture Models in customer segmentation
4 CO3.2 CL3
for e-commerce Application and the benefits of using that
Given a dataset with n data points, the affinity matrix W is
constructed using the Gaussian kernel similarity measure. The
normalized Laplacian matrix L_norm is computed from W and its
5 eigenvectors are computed and stored in the matrix V. If you want to CO3.4 CL2
perform spectral clustering to partition the data into k clusters, explain
how you would proceed using the eigenvectors and provide the
mathematical steps to achieve this clustering.
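Hint (a numpy sketch of the standard recipe; sigma, k and the random seed are free parameters, and the final k-means loop is a deliberately tiny stand-in):

import numpy as np
def spectral_clustering(X, k, sigma=1.0, iters=20):
    # Gaussian-kernel affinity: W_ij = exp(-||xi - xj||^2 / (2 sigma^2)).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0)
    # Normalized Laplacian: L_norm = I - D^(-1/2) W D^(-1/2).
    dis = 1 / np.sqrt(W.sum(1))
    L = np.eye(len(X)) - dis[:, None] * W * dis[None, :]
    # Rows of the k smallest eigenvectors become the embedded features.
    _, vecs = np.linalg.eigh(L)
    V = vecs[:, :k]
    V = V / np.linalg.norm(V, axis=1, keepdims=True)   # row-normalise
    # Cluster the embedded rows with a small k-means loop.
    rng = np.random.default_rng(0)
    cent = V[rng.choice(len(V), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((V[:, None] - cent[None]) ** 2).sum(-1), axis=1)
        cent = np.array([V[labels == j].mean(0) if (labels == j).any() else cent[j]
                         for j in range(k)])
    return labels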
Determine the Principal Components for the given 2-Dimensional
6 CO3.7 CL3
dataset. (1, 2), (2, 4), (3, 6). AICTE model question
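Hint (a numpy cross-check; since the three points satisfy y = 2x exactly, all variance lies on the first component):

import numpy as np
X = np.array([[1, 2], [2, 4], [3, 6]], float)
Xc = X - X.mean(axis=0)                 # centre the data
vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(vals)[::-1]          # sort eigenpairs by decreasing variance
print(vals[order])                      # [5. 0.]
print(vecs[:, order][:, 0])             # PC1 is proportional to (1, 2)/sqrt(5)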
Consider the task of clustering customer purchasing behavior for an
online retail store.
Compare and contrast the use of a Latent Variable Model (such as a
7 Gaussian Mixture Model) with an explicit clustering algorithm (such CO3.9 CL4
as k-means) for clustering customer purchasing behavior. Highlight
the benefits and drawbacks of each approach in the context of this
problem.
How Latent Dirichlet Allocation (LDA) can be extended to
8 incorporate temporal information for modelling evolving topics in a CO3.9 CL3
document collection.

UNIT- 4 GRAPHICAL MODELS


Bayesian Networks – Conditional Independence – Markov Random Fields – Learning – Naive Bayes
Classifiers – Markov Model – Hidden Markov Model.
S.NO Q&A CO K-LEVEL
PART- A
Bayesian networks are graphical models that represent:
a) Hierarchical clustering
1 b) Dimensionality reduction CO4.1 CL2
c) Conditional dependencies among variables
d) Feature selection
Conditional independence in Bayesian networks refers to the property
that:
a) All variables are independent of each other CO4.1 CL2
2
b) Variables are dependent on each other
c) Variables are independent given their parents in the network
d) Variables are independent given their children in the network
What do Bayesian networks represent in the context of probabilistic
graphical models?
a) Feature extraction
3 b) Temporal dependencies CO4.1 CL2
c) Hierarchical clustering
d) Conditional dependencies among variables
Bayesian Networks are graphical models that represent the joint
probability distribution of a set of random variables using a directed
4 acyclic graph (DAG).
a) True CO4.2 CL2
b) False
In a Bayesian network, the conditional probability tables (CPTs) CO4.2 CL2
represent:
a) Priors of the network
5
b) Marginal probabilities of the network
c) Conditional probabilities of the network
d) Joint probabilities of the network
In Bayesian networks, the directed edges represent:
a) Conditional independence
6 b) Markovian dependencies
c) Hierarchical relationships CO4.2 CL2
d) Correlations between variables
Markov random fields (MRFs) are used to model:
a) Hierarchical clustering
7 b) Dimensionality reduction CO4.3 CL1
c) Conditional dependencies among variables
d) Spatial dependencies among variables
Markov models are used to represent:
a) Hierarchical clustering
8 b) Dimensionality reduction
c) Temporal dependencies among variables CO4.3 CL2
d) Categorical dependencies among variables
Markov random fields (MRFs) are commonly used in:
a) Time series analysis
9 b) Image segmentation CO4.3 CL2
c) Natural language processing
d) Regression analysis
Hidden Markov models (HMMs) are characterized by:
a) The number of hidden states and observable states
10 b) The number of parents of each node CO4.3 CL2
c) The number of parameters in the model
d) The number of conditional probability tables
Hidden Markov models (HMMs) are commonly used for:
a) Dimensionality reduction
11 b) Feature selection CO4.4 CL2
c) Time series analysis
d) Text classification
In a Bayesian network, the nodes represent:
a) Observations
12 b) Hidden variables CO4.4 CL1
c) Parameters
d) Priors
Naive Bayes classifiers are efficient because they:
a) Do not require any training data
b) Assume strong dependencies between features CO4.4 CL2
13 c) Assume conditional independence between features given the class
label
d) Are based on linear regression models
In a Markov random field, the energy function is used to:
a) Model the dependencies between variables
14 b) Estimate the parameters of the model CO4.5 CL1
c) Compute the joint probability distribution
d) Calculate the conditional probabilities
Naive Bayes classifiers are based on the assumption of: CO4.5 CL1
a) Conditional independence among variables
15 b) Markovian dependencies among variables
c) Linear dependencies among variables
d) Hierarchical dependencies among variables
In learning Bayesian networks, the maximum likelihood estimation
(MLE) is used to:
a) Estimate the structure of the network CO4.5 CL1
16
b) Estimate the parameters of the network
c) Estimate the priors of the network
d) Estimate the conditional probabilities of the network
In Bayesian networks, the belief propagation algorithm is used for:
a) Structure learning
17 b) Parameter estimation CO4.5 CL2
c) Inference
d) Model selection
In a Markov model, the Markov property refers to:
a) The assumption of conditional independence
18 b) The assumption of linear regression CO4.5 CL2
c) The assumption of feature selection
d) The assumption of hierarchical dependencies
In Bayesian networks, the structure of the network can be learned
using:
a) Expectation-Maximization (EM) algorithm CO4.5 CL2
19 b) Gradient descent
c) Conditional independence tests
d) Principal Component Analysis (PCA)
Hidden Markov models (HMMs) can be trained using:
a) Expectation-Maximization (EM) algorithm
20 b) Gradient descent CO4.5 CL2
c) Principal Component Analysis (PCA)
d) k-means clustering
The Naive Bayes classifier is based on which probability theorem?
a) Bayes' theorem
21 b) Markov's inequality CO4.5 CL2
c) Chebyshev's inequality
d) Central limit theorem
Naive Bayes classifiers assume that the features are:
a) Linearly dependent on each other
22 b) Conditionally independent of each other given the class label CO4.6 CL1
c) Hierarchically dependent on each other
d) Independent of each other
Markov models are commonly used in:
a) Time series analysis
23 b) Clustering analysis CO4.6 CL2
c) Dimensionality reduction
d) Image classification
Markov models are based on the concept of:
a) Conditional independence
24 b) Linear regression
c) Feature selection CO4.6 CL2
d) Hierarchical clustering
Markov random fields (MRFs) are used to model dependencies
between variables in:
a) Time series data CL1
25 b) Spatial data CO4.6
c) Textual data
d) Categorical data
Hidden Markov Models (HMMs) are a type of graphical model CO4.7 CL1
commonly used for supervised learning tasks, such as classification
26 and regression.
a) True
b) False
Hidden Markov models (HMMs) are particularly useful for modeling:
a) Categorical data
27 b) Continuous data
c) Spatial data CL2
d) Textual data CO4.7
Markov random fields (MRFs) are also known as: CO4.8 CL1
a) Bayesian networks
28 b) Hidden Markov models
c) Undirected graphical models
d) Naive Bayes classifiers
Hidden Markov models (HMMs) consist of which types of states?
a) Observable states only
29 b) Hidden states only CO4.7 CL2
c) Observable and hidden states
d) Transition states only
Learning the structure of a Bayesian network involves:
a) Maximizing the log-likelihood of the data
30 b) Minimizing the squared error between predicted and actual values CO4.8 CL1
c) Computing the joint probability distribution
d) Calculating the conditional probabilities

PART –B
Consider the following Bayesian network:

1 CO4.1 CL3

a) Are D and E necessarily independent given evidence about both A
and B?
b) Are A and C necessarily independent given evidence about D?
c) Are A and H necessarily independent given evidence about C?
Sketch how Bayesian Networks skilfully paint conditional probability
2 distributions in their graphical framework. Provide a captivating CO4.2 CL3
example to illustrate your point.
Compare and contrast the local and global Markov properties within
3 Markov Random Fields (MRFs), and explicate their respective CO4.3 CL2
implications on graphical modeling and inference tasks.
Consider the principal motivations driving the utilization of Markov
4 Random Fields as a strong framework for modeling spatial data. Infer CO4.3 CL2
its advantages over alternative modeling techniques.
Extract the concept of energy functions within Markov Random
Fields and establish their intricate relationship with the distributional
5 CO4.3 CL2
characteristics of the model. Highlight the role of energy functions in
capturing spatial dependencies.
Outline the process of parameter learning in Bayesian Networks,
emphasizing the underlying algorithms and mathematical
6 CO4.4 CL2
formulations involved in the estimation of model parameters from
data.
Paraphrase Naive Bayes classifier, outlining its foundational
7 assumptions and delineating its potential applications across various CO4.5 CL2
domains.
Outline the Naive Bayes classifier adeptly handles both continuous
8 and categorical features during the classification process. Compare its CO4.5 CL2
feature handling approaches to other classification methods.
Extend the concept of state transition probabilities in Markov Models;
9 highlight their pivotal role in modelling dynamic processes and CO4.6 CL2
time-series data.
Consider the Markov chain with three states, S= {1,2,3}, that has the
following transition matrix:

10 CO4.6 CL2

a) Draw the state transition diagram for this chain.
b) If we know P(X1=1) = P(X1=2) = 1/4, find P(X1=3, X2=2, X3=1).
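Worked outline (the transition-matrix figure is not reproduced here, so only the structure of the answer is shown): by the Markov property,
P(X1=3, X2=2, X3=1) = P(X1=3) · P(X2=2 | X1=3) · P(X3=1 | X2=2) = (1/2) · p32 · p21,
where P(X1=3) = 1 - 1/4 - 1/4 = 1/2 and p32, p21 are read off the given transition matrix.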
Compare the observable and hidden states within Hidden Markov
11 Models (HMMs), and delve into the significance of these hidden CO4.7 CL2
states in capturing latent variables and modeling complex systems.
Expound on the practical applications of the forward-backward and
Viterbi algorithms in Hidden Markov Models, specifically focusing
12 CO4.7 CL2
on their role in efficiently decoding and inferring the hidden states
from observed data.
PART –C
a) Consider the following Bayesian network. A, B, C, and D are
Boolean random variables. If we know that A is true, what is the

1 CO4.1 CL1
probability of D being true?
b) For the following Bayesian network
we know that X and Z are not guaranteed to be independent if the
value of Y is unknown. This means that, depending on the
probabilities, X and Z can be independent or dependent if the value of
Y is unknown. Construct probabilities where X and Z are independent
if the value of Y is unknown, and show that they are indeed
independent.
Investigate the concept of d-separation within Bayesian Networks and
its practical utility in detecting conditional independence
2 CO4.2 CL2
relationships. How does this mechanism enhance our ability to infer
probabilistic dependencies efficiently?
Provide a comprehensive illustration of the application of Markov
Random Fields in image segmentation. Detail the model's
3 CO4.3 CL3
components and step-by-step inference process to highlight its
effectiveness in capturing spatial dependencies in image data.
Delve into the challenges involved in learning the structure of
Bayesian Networks from data. Explore cutting-edge approaches and
4 CO4.4 CL2
methodologies aimed at overcoming these challenges and improving
the accuracy of model structure learning.
Elucidate the process of maximum likelihood estimation and
Bayesian parameter estimation within graphical models. Compare and
5 CO4.4 CL1
contrast their respective strengths and weaknesses in the context of
model parameter estimation.
Assess the impact of feature independence assumptions on the
performance of Naive Bayes classifiers. Propose strategies and
6 CO4.5 CL2
techniques to mitigate potential issues related to these assumptions
and enhance classifier accuracy.
Characterize the concept of Markov Models and their practical
7 relevance in effectively modeling sequential data and time-series CO4.6 CL1
phenomena across diverse domains.
Engage in a detailed discussion on the key disparities between
observable and hidden states within Hidden Markov Models
8 (HMMs). Expound on the pivotal roles these states play in effectively CO4.7 CL2
modeling latent variables and observed data with real-world
applications in mind.

UNIT- 5 ADVANCED LEARNING


Reinforcement Learning – Representation Learning – Neural Networks – Active Learning – Ensemble
Learning – Bootstrap Aggregation – Boosting – Gradient Boosting Machines – Deep Learning.
S.NO Q&A CO K-LEVEL
PART- A
Which type of learning involves an agent interacting with an
environment to learn optimal actions?
a) Supervised Learning
1
b) Unsupervised Learning CO5.1 CL2
c) Reinforcement Learning
d) Active Learning
Representation learning focuses on:
a) Feature extraction
2 b) Model optimization
c) Label prediction CO5.1 CL2
d) Data visualization
Which technique is used in representation learning to transform
high-dimensional data into a lower-dimensional representation while
preserving its structure? Co5.1 CL2
3 a) Principal Component Analysis (PCA)
b) t-SNE (t-Distributed Stochastic Neighbor Embedding)
c) Autoencoders
d) Gaussian Mixture Models (GMMs)
Active learning involves:
a) Using labeled data for model training CO5.2 CL1
4 b) Interacting with users to obtain additional information
c) Reinforcing positive actions in the learning agent
d) Combining multiple models to improve performance
Neural networks are composed of:
a) Nodes and edges
5 b) Weights and biases
c) Layers of interconnected neurons CO5.2 CL2
d) Feature vectors and class labels
Ensemble learning is a technique that:
a) Combines multiple models to improve performance
6 b) Selects the best model from a pool of options
c) Trains models on random subsets of the data CO5.3 CL2
d) Focuses on reducing model complexity
Bootstrap aggregation (bagging) is a technique used in ensemble
learning that involves:
a) Training models on random subsets of the data
7
b) Combining the outputs of multiple models using majority voting CO5.3 CL2
c) Boosting the performance of weak models
d) Applying gradient boosting to improve model accuracy
Boosting is a technique in ensemble learning that:
a) Combines the outputs of multiple models using majority voting
b) Trains models sequentially, giving more weight to misclassified
8
instances
c) Uses gradient boosting to optimize model parameters CO5.4 CL2
d) Focuses on reducing the variance of the models
Gradient Boosting Machines (GBMs) are:
a) A type of neural network architecture
9 b) A reinforcement learning algorithm
c) Ensemble models based on gradient boosting CO5.4 CL2
d) Deep learning models with multiple layers
Match the Learning Method with its corresponding description:

A. Active Learning CO5.4 CL2
B. Deep Learning
C. Gradient Boosting Machines
D. Representation Learning
1. Feature Extraction
10 2. Multiple Hidden Layers
3. Sequential Ensemble Learning
4. Informative Sample Selection
Answer: A-4 B-2 C-3 D-1
Which of the following is not a common activation function used in
neural networks?
a) Sigmoid
11
b) ReLU
c) Tanh CO5.4 CL2
d) Logistic
Which technique is used in reinforcement learning to estimate the
value of state-action pairs based on the observed rewards and the
estimated values of future states? CO5.5 CL2
12 a) Q-learning
b) Deep Q-networks (DQN)
c) Monte Carlo methods
d) Policy gradients
Which technique is used in representation learning to learn a
compressed representation of the data by encoding and decoding it
through a neural network? CO5.5 CL2
13 a) Principal Component Analysis (PCA)
b) t-SNE (t-Distributed Stochastic Neighbor Embedding)
c) Autoencoders
d) Gaussian Mixture Models (GMMs)
Which ensemble learning technique combines the predictions of
multiple models by weighting each model's prediction based on its
performance? CO5.5 CL2
14 a) Bagging
b) Boosting
c) Stacking
d) Random forest
Which technique is used in active learning to select the most
informative instances by measuring the uncertainty of the model's
predictions? CO5.5 CL2
15 a) K-means clustering
b) Decision trees
c) Support vector machines
d) Uncertainty sampling
Which technique is used in ensemble learning to train models on
different subsets of the data by resampling the data with replacement?
a) Bagging CO5.5 CL2
16
b) Boosting
c) Stacking
d) Random forest
Which ensemble learning technique is based on creating a diverse set
of models and combining their predictions using weighted averaging?
a) Bagging CO5.6 CL2
17
b) Boosting
c) Stacking
d) Random forest
Which technique is used to approximate a target function by fitting a
series of weak models one after another, each correcting the mistakes
of the previous model?
18 a) Gradient boosting
b) Deep learning CO5.6 CL2
c) Reinforcement learning
d) Support vector machines
Which technique is used in active learning to select the most
informative instances for labeling?
a) K-means clustering
19
b) Decision trees CO5.6 CL2
c) Support vector machines
d) Uncertainty sampling
Which ensemble learning technique aims to combine multiple weak
classifiers to create a strong classifier?
a) Bagging CO5.6 CL2
20
b) Boosting
c) Stacking
d) Random forest
Which technique is used to improve the generalization ability of
neural networks by artificially increasing the size of the training
dataset through creating modified copies of existing data points?
21 a) Data augmentation CO5.6 CL2
b) Dropout regularization
c) Weight decay
d) Batch normalization
Which technique is used in reinforcement learning to determine the CO5.6 CL2
value of state-action pairs based on the expected future rewards?
a) Q-learning
22
b) Deep Q-networks (DQN)
c) Monte Carlo methods
d) Policy gradients
Which technique is used in representation learning to automatically
learn feature representations from the raw input data?
a) Principal Component Analysis (PCA)
23
b) t-SNE (t-Distributed Stochastic Neighbor Embedding)
c) Autoencoders
d) Gaussian Mixture Models (GMMs) CO5.6 CL2
Which ensemble learning technique combines the predictions of
multiple models by averaging or voting to make the final prediction?
a) Bagging
24
b) Boosting CO5.6 CL2
c) Stacking
d) Random forest
Which technique is used in active learning to iteratively query the
labels for instances that are the most uncertain or difficult to classify?
a) K-means clustering
25
b) Decision trees
c) Support vector machines
d) Uncertainty sampling CO5.6 CL2
Which technique is used in ensemble learning to reduce overfitting by
training models on different subsets of the data and averaging their
predictions?
26 a) Bagging CO5.6 CL2
b) Boosting
c) Stacking
d) Random forest
Which ensemble learning technique creates a meta-model that
combines the predictions of multiple base models?
a) Bagging
27
b) Boosting
c) Stacking
d) Random forest CO5.6 CL2
Which technique is used in boosting to assign higher weights to
misclassified instances during model training?
a) Bagging
28 b) AdaBoost
c) Gradient boosting
d) Random forest CO5.6 CL2
Which technique is used in representation learning to automatically CL2
learn feature representations from the raw input data?
a) Principal Component Analysis (PCA) CO5.6
29
b) t-SNE (t-Distributed Stochastic Neighbor Embedding)
c) Autoencoders
d) Gaussian Mixture Models (GMMs)
Which ensemble learning technique combines the predictions of
multiple models by averaging or voting to make the final prediction?
a) Bagging
30 b) Boosting CO5.6 CL2
c) Stacking
d) Random forest
PART –B
Delineate reinforcement learning and outline its core components that
1 CO5.1 CL2
enable agents to learn from interactions with an environment.
Could you provide an instance of a real-world application where
2 reinforcement learning is effectively utilized, showcasing its CO5.1 CL3
practicality and impact in diverse domains?
Expound on the concept of feature learning within representation
learning, elucidating how algorithms autonomously discover
3 CO5.2 CL2
meaningful features from raw data without explicit human
intervention.
Provide a succinct overview of a basic neural network's structure,
4 including its layers and their respective functions in information CO5.3 CL1
processing.
Explore the concept of active learning and highlight its advantages
5 over passive learning approaches, emphasizing the proactive role of CO5.4 CL2
the learner in selecting informative data samples.
Can you present a compelling scenario where active learning plays a
6 crucial role in significantly reducing labeling costs while achieving CO5.4 CL3
high model accuracy?
Put in plain words the essence of ensemble learning and how it
7 harnesses the collective wisdom of multiple models to enhance CO5.5 CL1
overall predictive performance.
Compare and contrast bagging and boosting techniques in ensemble
8 learning, showcasing their distinctive mechanisms for aggregating CO5.5 CL2
model predictions.
Could you provide an illustrative example of an algorithm that
9 employs bagging to improve its performance, demonstrating the CO5.6 CL3
practical utility of this ensemble method?
Delve into the concept of boosting in ensemble learning, illustrating
10 its unique characteristics and how it differentiates from bagging in CO5.7 CL2
terms of model aggregation.
Discuss a compelling scenario where Gradient Boosting Machines
11 (GBMs) might be preferred over deep learning models, highlighting CO5.8 CL1
the strengths of GBMs in handling specific challenges.
Enumerate the key building blocks that constitute a deep neural
12 network, providing a concise overview of the crucial components CO5.9 CL1
responsible for its powerful learning capabilities.
PART –C
Imagine a self-driving car navigating through a complex urban
environment. The car's objective is to reach its destination while
avoiding obstacles, obeying traffic rules, and optimizing its
1 CO5.1 CL3
travel time. How reinforcement learning algorithms can
effectively balance exploration and exploitation to achieve
optimal performance in this real-world scenario.
Elaborate on strategies that can be employed to prevent or mitigate
2 CO5.2 CL2
over fitting during the training of representation learning models?
Suppose that we want to build a neural network that classifies two
dimensional data (i.e., X = [x1, x2]) into two classes: diamonds and
crosses. We have a set of training data that is plotted as follows:

3 CO5.3 CL3

Draw a network that can solve this classification problem. Justify
your choice of the number of nodes and the architecture. Draw the
decision boundary that your network can find on the diagram.
AICTE model question
Consider the following Neural Network with alpha = 0.5, eta=0.24,
desired output = 1 and sigmoid activation function.
i. Perform one forward pass and calculate the error.
ii. Calculate the updated weights for w5 and w6 using backpropagation.

4 CO5.4 CL3

AICTE model question
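Hint (since the network figure is not reproduced here, this sketch uses an assumed 2-2-1 sigmoid network with placeholder inputs and weights; none of these values come from the question. It only demonstrates the mechanics of one forward pass and the gradient-descent update for the hidden-to-output weights, the role played by w5 and w6; the momentum term alpha is omitted for brevity):

import math
sig = lambda z: 1 / (1 + math.exp(-z))
x1, x2 = 0.05, 0.10                        # placeholder inputs
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30    # placeholder input->hidden weights
w5, w6 = 0.40, 0.45                        # placeholder hidden->output weights
target, eta = 1.0, 0.24                    # desired output and learning rate
# Forward pass.
h1 = sig(w1 * x1 + w2 * x2)
h2 = sig(w3 * x1 + w4 * x2)
out = sig(w5 * h1 + w6 * h2)
error = 0.5 * (target - out) ** 2
# Backpropagation for the output layer: dE/dw = -(target - out) * out * (1 - out) * h.
delta = -(target - out) * out * (1 - out)
w5 -= eta * delta * h1
w6 -= eta * delta * h2
print(round(error, 4), round(w5, 4), round(w6, 4))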


Could you provide an analysis of popular activation functions like
5 ReLU, Sigmoid, and Tanh, highlighting their effects on model CO5.5 CL2
training and their suitability for different scenarios?
How does the concept of active learning come into play when dealing
with extensive datasets in real-world contexts? Could you provide an
6 analysis of the advantages and hurdles associated with implementing CO5.6 CL2
active learning in various applications, particularly in terms of
optimizing model performance and efficiently allocating resources
Extend techniques to counteract overfitting when employing boosting
7 CO5.7 CL2
algorithms to ensure model generalization.
Choose real-world instances where deep learning has exhibited
8 superior performance compared to traditional machine learning CO5.9 CL2
approaches.
