Sample Q - A For Module 3 - 4

Aiml

Uploaded by

hui88791222

SAMPLE Q&A

Presenter: Dr. Amit Kumar Das

Professor,
Dept. of Computer Science and Engg.,
Institute of Engineering & Management.
SIMPLE QUESTIONS
Question: What is Machine Learning?
Answer: Machine Learning (ML) is a subset of artificial intelligence that
enables systems to learn and improve from experience without being
explicitly programmed. It involves algorithms that can identify patterns and
make decisions based on data.

Question: What is a dataset in Machine Learning?

Answer: A dataset is a collection of data that is used for training, validating,
and testing a machine learning model. It typically consists of multiple
instances, each with a set of features and a target variable.

Question: How can you handle missing data in a dataset?

Answer: Missing data can be handled by various methods such as removing
rows with missing values, imputing missing values using mean, median,
mode, or more sophisticated techniques like k-nearest neighbors or
regression imputation.
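
As a minimal sketch of the simplest option above (mean or median imputation), assuming missing entries are represented as None; the helper name impute and the sample column are hypothetical:

```python
from statistics import mean, median

def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries.
ages = [25, None, 35, 40, None]
ages_mean = impute(ages, "mean")
ages_median = impute(ages, "median")
```

The median is usually the safer choice when the column contains outliers, since a single extreme value can pull the mean far from typical values.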
SIMPLE QUESTIONS (CONTD.)
Question: What is a feature set in Machine Learning?
Answer: A feature set is a collection of variables or attributes used to
describe the instances in a dataset. These features are used as input to the
machine learning model.

Question: Why is feature selection important?

Answer: Feature selection is important because it helps improve model
performance by removing irrelevant or redundant features, reduces
overfitting, and decreases training time.

Question: What is a holdout set? What is the purpose of dividing a
dataset into training, validation, and test sets?
Answer: A holdout set is a subset of the dataset that is not used during
training and is reserved for final evaluation of the model to assess its
generalization performance.
Dividing a dataset into training, validation, and test sets keeps the three
stages of model development on separate data: the training set is used to fit
the model, the validation set to tune it and compare alternatives, and the
test set to estimate how well the final model generalizes to unseen data.
SIMPLE QUESTIONS (CONTD.)
Question: What is k-fold cross-validation?
Answer: K-fold cross-validation is a technique where the dataset is divided
into k subsets, and the model is trained and validated k times, each time
using a different subset as the validation set and the remaining k-1 subsets as
the training set. This helps in providing a more robust evaluation of the
model.
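
The splitting scheme described above can be sketched as an index generator. This is an illustrative version without shuffling; the function name k_fold_indices is an assumption, not part of the original material:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

folds = list(k_fold_indices(10, 3))
```

Calling the same function with k equal to n_samples gives one validation instance per fold, which is the leave-one-out case.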

Question: What is LOOCV (Leave-One-Out Cross-Validation)?

Answer: LOOCV is a special case of k-fold cross-validation where k is
equal to the number of instances in the dataset. Each instance is used once as
the validation set, and the model is trained on the remaining instances. This
is computationally expensive but provides an almost unbiased estimate of
model performance.

Question: What is bootstrap sampling?

Answer: Bootstrap sampling is a technique where multiple samples are
drawn from the dataset with replacement, and models are trained on these
samples. This helps in estimating the distribution of a statistic and assessing
model stability.
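
A small sketch of the idea, assuming the statistic of interest is the sample mean; the data values and the function name bootstrap_means are made up for illustration:

```python
import random
from statistics import mean

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Draw resamples with replacement and return the mean of each one."""
    rng = random.Random(seed)
    n = len(data)
    return [mean(rng.choices(data, k=n)) for _ in range(n_resamples)]

# Hypothetical measurements.
data = [2.1, 2.5, 3.0, 3.2, 4.8, 5.1]
means = sorted(bootstrap_means(data))

# Rough 95% percentile interval for the mean.
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means))]
```

The spread of the resampled means gives an estimate of how much the statistic would vary across datasets of the same size.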
SIMPLE QUESTIONS (CONTD.)

Question: What does fitting a model mean in Machine Learning?

Answer: Fitting a model means training the model on a dataset by adjusting
its parameters to minimize the error or maximize the accuracy on the
training data. It involves finding the best parameters that allow the model to
make accurate predictions.

Question: What is a confusion matrix?

Answer: A confusion matrix is a table used to evaluate the performance of a
classification model. It shows the counts of true positives, true negatives,
false positives, and false negatives, providing insights into the model's
accuracy, precision, recall, and other metrics.

Question: What is precision?

Answer: Precision is the ratio of true positive predictions to the total
predicted positives. It measures the accuracy of the positive predictions
made by the model.
SIMPLE QUESTIONS (CONTD.)
Question: What is recall?
Answer: Recall, also known as sensitivity or true positive rate, is the ratio of
true positive predictions to the total actual positives. It measures the model's
ability to identify all relevant instances in the dataset.

Question: What is the F-score?

Answer: The F-score is the harmonic mean of precision and recall. It
provides a single metric that balances both precision and recall, especially
useful in situations where there is an imbalance between positive and
negative classes.
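
To make the confusion matrix, precision, recall, and F-score definitions concrete, here is a minimal sketch for the binary case. The label vectors are invented, and the F-score shown is the F1 variant (equal weight on precision and recall):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, returning 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```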

Question: What is the ROC curve?

Answer: The ROC (Receiver Operating Characteristic) curve is a graphical
representation of a classifier's performance. It plots the true positive rate
against the false positive rate at various threshold settings. The area under
the ROC curve (AUC) is used to evaluate the overall performance of the
model.
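
As a sketch, one point on the ROC curve can be computed for a given threshold, and the AUC can be obtained from its rank interpretation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The function names and scores below are illustrative assumptions:

```python
def roc_point(y_true, scores, threshold):
    """Return (false_positive_rate, true_positive_rate) at one threshold."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    return fp / neg, tp / pos

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and classifier scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
area = auc(y_true, scores)
fpr, tpr = roc_point(y_true, scores, threshold=0.5)
```

An AUC of 0.5 corresponds to random ranking, and 1.0 to a classifier that ranks every positive above every negative.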
SIMPLE QUESTIONS (CONTD.)
Question: What is cross-entropy loss?
Answer: Cross-entropy loss, also known as log loss, is a measure of the
difference between the true distribution and the predicted distribution of a
classification model. It is commonly used as a loss function for training
classification models, with lower values indicating better model
performance.
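
For the binary case, the loss can be sketched as follows. Clipping the probabilities with eps is an implementation assumption added here to avoid taking log(0):

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of the true labels (0 or 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident, correct predictions give a lower loss than hesitant ones.
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
hesitant = binary_cross_entropy([1, 0], [0.6, 0.4])
```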

Question: What is MSE (Mean Squared Error)?

Answer: MSE is the average of the squared differences between the
observed and predicted values. It is a commonly used metric to evaluate the
accuracy of regression models.

Question: What is R2 or R-squared (R²)?

Answer: R-squared is a statistical measure that represents the proportion of
the variance in the dependent variable that is explained by the independent
variable or variables in a regression model. It typically ranges from 0 to 1,
with higher values indicating a better fit; it can be negative when the model
fits worse than simply predicting the mean.
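
Both regression metrics can be sketched in a few lines. The formula used for R-squared is 1 - SS_res / SS_tot, and the sample values are invented:

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observed and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
error = mse(y_true, y_pred)
fit = r_squared(y_true, y_pred)
```

With this formula, a model that predicts worse than the mean of the observed values produces a value below 0, for example when the predictions above are reversed.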
SIMPLE QUESTIONS (CONTD.)
Question: You are given a dataset for predicting customer churn. What
steps would you take to develop a supervised learning model?
Answer: First, I would perform exploratory data analysis (EDA) to
understand the dataset and identify any missing values or anomalies. Then, I
would preprocess the data by handling missing values, encoding categorical
variables, and scaling numerical features. Next, I would split the data into
training and test sets. I would select a suitable supervised learning algorithm
(e.g., logistic regression, decision trees, random forests), train the model,
and fine-tune hyperparameters using cross-validation. Finally, I would
evaluate the model using appropriate metrics like accuracy, precision, recall,
and F1-score, and refine the model as needed.

Question: What are some common challenges faced when working with
supervised learning models?
Answer: Common challenges include dealing with imbalanced datasets,
selecting relevant features, avoiding overfitting or underfitting, handling
noisy data, and ensuring the model generalizes well to unseen data.
Additionally, the need for a large amount of labeled data can be a significant
challenge.
SIMPLE QUESTIONS (CONTD.)
Question: How would you choose between using a supervised learning
model and an unsupervised learning model for a given problem?
Answer: The choice depends on the problem and the availability of labeled
data. If labeled data is available and the goal is to predict specific outcomes, a
supervised learning model is appropriate. If the goal is to find hidden patterns
or structure in the data without labeled outcomes, an unsupervised learning
model is suitable. Additionally, the nature of the task (e.g., classification,
regression, clustering, anomaly detection) and specific requirements (e.g.,
interpretability, scalability) will influence the choice of model.

Question: Describe a scenario where semi-supervised learning would be
beneficial.
Answer: Semi-supervised learning is beneficial in scenarios where labeled data
is scarce but unlabeled data is abundant. For example, in medical imaging,
obtaining labeled data requires expert annotation, which is expensive and time-
consuming. Semi-supervised learning can leverage a small amount of labeled
images and a large amount of unlabeled images to train a model that performs
better than using labeled data alone.
SITUATION-BASED QUESTIONS
Question: Imagine you're working on a project to predict housing prices.
How would you define the problem in terms of Machine Learning?
Answer: The problem can be defined as a regression task where the goal is
to predict the continuous variable, which is the housing price, based on
features like location, size, number of bedrooms, age of the house, etc. The
dataset would include historical data on houses and their prices.

Question: In a scenario where you have time-series data, why might
traditional k-fold cross-validation not be appropriate, and what alternative
method could you use?
Answer: Traditional k-fold cross-validation might not be appropriate for
time-series data because it doesn't account for the temporal ordering of data.
An alternative method could be time-series split (also known as rolling
window cross-validation) where the data is split in a way that respects the
temporal order, ensuring that the training set always precedes the validation
set.
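
One way to respect temporal order is an expanding-window split, sketched below: each validation block is preceded by all earlier observations. The function name and parameters are hypothetical, and a fixed-size rolling window would be an equally valid variant:

```python
def expanding_window_splits(n_samples, n_splits, val_size):
    """Yield (train_indices, val_indices); training always precedes validation."""
    first_val_start = n_samples - n_splits * val_size
    if first_val_start < 1:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        val_start = first_val_start + i * val_size
        train = list(range(val_start))
        val = list(range(val_start, val_start + val_size))
        yield train, val

# Ten time-ordered observations, three validation blocks of two points each.
splits = list(expanding_window_splits(10, 3, 2))
```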
SITUATION-BASED (CONTD.)
Question: You're training a model and notice it performs exceptionally
well on the training data but poorly on the validation data. What steps
might you take to address this issue?
Answer: This issue indicates overfitting. To address it, I might consider
simplifying the model by reducing its complexity, gathering more training
data, applying data augmentation, or using ensemble methods.

Question: In a credit card fraud detection system, why might you prioritize
precision over recall or vice versa?
Answer: In a credit card fraud detection system, prioritizing precision means
focusing on minimizing false positives (non-fraudulent transactions flagged
as fraudulent), which is important to avoid inconveniencing customers.
Prioritizing recall means focusing on minimizing false negatives (fraudulent
transactions not detected), which is crucial to prevent fraud. The choice
depends on the business impact; if false positives are more costly (e.g.,
causing customer dissatisfaction), precision might be prioritized. If false
negatives are more costly (e.g., financial losses), recall might be prioritized.
SITUATION-BASED (CONTD.)
Question: You’re tasked with building a predictive maintenance system for
industrial machines. How would you approach the problem of imbalanced
datasets where failures are rare?
Answer: For imbalanced datasets, I would consider techniques such as
resampling (over-sampling the minority class or under-sampling the
majority class), and generating synthetic samples using SMOTE (Synthetic
Minority Over-sampling Technique).
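
The simplest of these techniques, random over-sampling of the minority class, can be sketched as follows. Note that this duplicates existing rows and is not SMOTE, which interpolates new synthetic points; the function name and data are illustrative assumptions:

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical sensor readings: label 1 marks a (rare) machine failure.
rows = [(0.2,), (0.3,), (0.1,), (0.4,), (0.9,)]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Resampling is normally applied to the training split only, so that the test set keeps the real class distribution.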

Question: In the context of a medical diagnosis model, explain the
trade-off between sensitivity (recall) and specificity. Why might you
prioritize one over the other?
Answer: Sensitivity (recall) measures the proportion of actual positives
correctly identified, while specificity measures the proportion of actual
negatives correctly identified. In medical diagnosis, prioritizing sensitivity is
crucial when the cost of missing a positive case (e.g., a disease) is high, as it
ensures more true positives are caught. Prioritizing specificity is important
when the cost of false positives (e.g., unnecessary treatment) is high. The
trade-off depends on the medical condition and associated risks.
SITUATION-BASED (CONTD.)
Question: How would you explain the concept of overfitting to a non-
technical stakeholder?
Answer: Overfitting occurs when a model learns the details and noise in the
training data too well, leading to poor performance on new, unseen data. It's
like a student who memorizes the answers to specific questions rather than
understanding the underlying concepts, and thus performs poorly on
different questions in an exam.

Question: If your model for loan approval shows bias against a certain
demographic, how would you address this issue?
Answer: To address bias, I would start by analyzing the dataset for any
biases in the input features. Techniques like re-sampling the data to ensure
balanced representation, and adjusting the model to remove bias would be
considered. I would also implement fairness metrics to regularly monitor the
model’s performance across different demographics and ensure transparency
by documenting the steps taken to mitigate bias.
SIMPLE QUESTIONS
Question: What is the difference between binary, multi-class, and multi-
label classification?
Answer:
Binary classification involves two classes (e.g., spam vs. not spam)
Multi-class classification involves more than two classes, where each
instance is assigned to only one class (e.g., classifying animals into cats,
dogs, or birds)
Multi-label classification allows each instance to be assigned to multiple
classes simultaneously (e.g., a movie can be both action and comedy).

Question: What are the key challenges in dealing with imbalanced
datasets, and how can you address them?
Answer: Challenges include biased predictions towards the majority class
and low recall for the minority class. Solutions include resampling
techniques (e.g., oversampling the minority class, undersampling the
majority class) and generating synthetic minority samples (e.g., with
SMOTE).
SIMPLE QUESTIONS (CONTD.)
Question: What is clustering, and how does it differ from classification?
Answer: Clustering is an unsupervised learning technique that groups data
points into clusters based on similarity, without using labeled data.
Classification, on the other hand, is a supervised learning technique that
assigns labels to data points based on a trained model.

Question: What is the difference between hierarchical and partitioned
clustering?
Answer: Hierarchical clustering builds a tree of clusters either by a bottom-
up approach (agglomerative) or a top-down approach (divisive). Partitioned
clustering, like K-means, assigns data points to a predefined number of
clusters by optimizing an objective function (e.g., minimizing within-cluster
variance).

Question: What is the role of distance metrics in clustering algorithms?

Answer: Distance metrics (e.g., Euclidean, Manhattan, etc.) determine the
similarity between data points. They play a crucial role in clustering
algorithms by influencing the formation of clusters based on how "close" the
points are to each other.
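
The two metrics named above can be written directly from their definitions; the points used are arbitrary:

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (0, 0), (3, 4)
d_euclidean = euclidean(p, q)
d_manhattan = manhattan(p, q)
```

Because both metrics depend on raw coordinate differences, features on very different scales are usually standardized before clustering.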
SIMPLE QUESTIONS (CONTD.)
Question: Explain the concept of feature importance in a Decision Tree
and how it is calculated.
Answer: Feature importance in a Decision Tree is a measure of how much a
particular feature contributes to the model's predictive power. It is typically
calculated based on the decrease in impurity (e.g., Gini index, entropy) at
each split involving that feature.
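
As a sketch of the quantity involved, the impurity decrease of a single split under the Gini index can be computed as below. A feature's importance is then typically the sum of such decreases over all splits that use it, weighted by the share of samples reaching each split; the labels here are invented:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

# Hypothetical node with 4 "yes" and 4 "no" labels, split into two children.
parent = ["yes"] * 4 + ["no"] * 4
left = ["yes"] * 3 + ["no"]
right = ["yes"] + ["no"] * 3
decrease = impurity_decrease(parent, left, right)
```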

Question: What is density-based clustering? Name an algorithm that uses
this approach.
Answer: Density-based clustering identifies clusters based on areas of high
data point density. Points in high-density regions are grouped together, while
points in low-density regions are considered noise or outliers. An example of
a density-based clustering algorithm is DBSCAN (Density-Based Spatial
Clustering of Applications with Noise). DBSCAN is particularly effective at
finding clusters of arbitrary shapes and handling outliers.
SIMPLE QUESTIONS (CONTD.)
Question: What is the “elbow point”?
Answer: The elbow point is a heuristic used in K-means clustering to
determine the optimal number of clusters. It is found by plotting the within-
cluster sum of squares (WCSS) against the number of clusters and looking
for a point where the rate of decrease sharply slows down, resembling an
elbow. The elbow point indicates the optimal number of clusters, balancing
between minimizing WCSS and avoiding too many clusters.
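
To illustrate the heuristic, the sketch below runs a very small one-dimensional K-means (Lloyd's algorithm) and records WCSS for several values of k. The deterministic initialization from evenly spaced sorted points and the toy data are assumptions made to keep the example reproducible:

```python
def kmeans_1d_wcss(points, k, max_iter=100):
    """Run Lloyd's algorithm on 1-D points and return the final WCSS."""
    data = sorted(points)
    n = len(data)
    # Assumed initialization: evenly spaced order statistics of the data.
    centroids = [data[int((i + 0.5) * n / k)] for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda j: abs(x - centroids[j])) for x in data
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for j in range(k):
            members = [x for x, a in zip(data, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = sum(members) / len(members)
    return sum((x - centroids[a]) ** 2 for x, a in zip(data, assignment))

# Toy data with three visible groups around 1, 5, and 9.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
wcss = {k: kmeans_1d_wcss(points, k) for k in range(1, 5)}
```

For this data WCSS drops steeply up to k = 3 and only marginally afterwards, which is the bend the heuristic looks for.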

Question: What is multi-collinearity in a regression equation?

Answer: Multicollinearity occurs when two or more independent variables
in a regression model are highly correlated, meaning they contain similar
information about the variance in the dependent variable. This can make it
difficult to isolate the individual effect of each predictor on the dependent
variable, leading to unreliable estimates of the regression coefficients and
inflated standard errors.
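
A quick check for the two-predictor case, sketched under the assumption that the predictors are house size and number of rooms (invented values): the Pearson correlation r between them gives a variance inflation factor of 1 / (1 - r^2).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictors that carry nearly the same information.
size_sqft = [800, 1000, 1200, 1500, 1800]
rooms = [2, 3, 3, 4, 5]
r = pearson_r(size_sqft, rooms)
vif = vif_two_predictors(size_sqft, rooms)
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity; with more than two predictors, each VIF comes from regressing one predictor on all the others.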
THANK YOU &
QUESTIONS PLEASE!
