Sample Q - A For Module 3 - 4
Question: What are some common challenges faced when working with
supervised learning models?
Answer: Common challenges include dealing with imbalanced datasets,
selecting relevant features, avoiding overfitting or underfitting, handling
noisy data, and ensuring the model generalizes well to unseen data.
Additionally, the need for a large amount of labeled data can be a significant
challenge.
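One of these challenges, overfitting, can be made concrete with a small sketch. The example below (entirely illustrative; the data and model degrees are made up) fits a simple and a very flexible polynomial to the same noisy linear data: the flexible model achieves a lower training error by fitting the noise, which is the telltale sign of overfitting when it fails to generalize.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple linear relationship y = 2x.
x = np.linspace(0.0, 1.0, 15)
y = 2.0 * x + rng.normal(0.0, 0.2, size=x.size)

# Held-out points from the true (noise-free) function for evaluation.
x_test = np.linspace(0.02, 0.98, 50)
y_test = 2.0 * x_test

def fit_mse(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

train_lo, test_lo = fit_mse(1)  # simple model, matches the true relationship
train_hi, test_hi = fit_mse(9)  # flexible model, can memorize the noise

# The flexible model always fits the training data at least as well,
# but its test error is typically worse: that gap is overfitting.
print(train_lo, test_lo)
print(train_hi, test_hi)
```

Comparing training error against held-out error like this is the standard way to detect overfitting in practice.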
SIMPLE QUESTIONS (CONTD.)
Question: How would you choose between using a supervised learning
model and an unsupervised learning model for a given problem?
Answer: The choice depends on the problem and the availability of labeled
data. If labeled data is available and the goal is to predict specific outcomes, a
supervised learning model is appropriate. If the goal is to find hidden patterns
or structure in the data without labeled outcomes, an unsupervised learning
model is suitable. Additionally, the nature of the task (e.g., classification,
regression, clustering, anomaly detection) and specific requirements (e.g.,
interpretability, scalability) will influence the choice of model.
Question: In a credit card fraud detection system, why might you prioritize
precision over recall or vice versa?
Answer: In a credit card fraud detection system, prioritizing precision means
focusing on minimizing false positives (non-fraudulent transactions flagged
as fraudulent), which is important to avoid inconveniencing customers.
Prioritizing recall means focusing on minimizing false negatives (fraudulent
transactions not detected), which is crucial to prevent fraud. The choice
depends on the business impact; if false positives are more costly (e.g.,
causing customer dissatisfaction), precision might be prioritized. If false
negatives are more costly (e.g., financial losses), recall might be prioritized.
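The trade-off follows directly from the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). A minimal sketch with made-up confusion-matrix counts for a hypothetical fraud model:

```python
# Illustrative counts from a hypothetical fraud model's predictions.
tp = 80  # fraudulent transactions correctly flagged
fp = 20  # legitimate transactions wrongly flagged (annoyed customers)
fn = 40  # fraudulent transactions that slipped through (financial loss)

# Of everything the model flagged, what fraction was really fraud?
precision = tp / (tp + fp)

# Of all the real fraud, what fraction did the model catch?
recall = tp / (tp + fn)

print(precision)  # 0.8
print(recall)     # 0.666...
```

Raising the model's decision threshold generally trades recall for precision and vice versa, which is why the business cost of each error type drives the choice.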
SITUATION-BASED (CONTD.)
Question: You’re tasked with building a predictive maintenance system for
industrial machines. How would you approach the problem of imbalanced
datasets where failures are rare?
Answer: For imbalanced datasets, I would consider techniques such as
resampling (over-sampling the minority class or under-sampling the
majority class) and generating synthetic minority samples with SMOTE
(Synthetic Minority Over-sampling Technique). I would also evaluate the
model with metrics suited to imbalance, such as precision and recall,
rather than accuracy alone.
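The simplest of these techniques, random over-sampling with replacement, can be sketched in a few lines. The failure/normal counts below are invented for illustration; in practice, libraries such as imbalanced-learn provide this and SMOTE ready-made.

```python
import random

random.seed(0)

# Hypothetical sensor readings: 95 normal machines (label 0), 5 failures (label 1).
data = [([float(i)], 0) for i in range(95)] + [([float(i)], 1) for i in range(5)]

minority = [d for d in data if d[1] == 1]
majority = [d for d in data if d[1] == 0]

# Over-sample the minority class with replacement until the classes are balanced.
extra = [random.choice(minority) for _ in range(len(majority) - len(minority))]
balanced = majority + minority + extra

counts = {0: 0, 1: 0}
for _, label in balanced:
    counts[label] += 1
print(counts)  # {0: 95, 1: 95}
```

Note that resampling is applied only to the training split; the test set keeps its natural imbalance so evaluation stays honest.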
Question: If your model for loan approval shows bias against a certain
demographic, how would you address this issue?
Answer: To address bias, I would start by analyzing the dataset for any
biases in the input features. Techniques like re-sampling the data to ensure
balanced representation and applying bias-mitigation methods (e.g.,
reweighting training examples or adding fairness constraints) would be
considered. I would also implement fairness metrics to regularly monitor the
model’s performance across different demographics and ensure transparency
by documenting the steps taken to mitigate bias.
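One common fairness metric, the demographic parity gap, is simply the difference in approval rates between groups. A minimal sketch with made-up approval decisions for two hypothetical demographic groups:

```python
# Hypothetical loan decisions per group (1 = approved, 0 = denied).
decisions = {
    "group_a": [1, 1, 0, 1, 1, 0, 1, 1],  # 6 of 8 approved
    "group_b": [1, 0, 0, 1, 0, 0, 1, 0],  # 3 of 8 approved
}

# Approval rate per group.
rates = {group: sum(d) / len(d) for group, d in decisions.items()}

# Demographic parity gap: difference in approval rates between groups.
parity_gap = abs(rates["group_a"] - rates["group_b"])

print(rates)       # {'group_a': 0.75, 'group_b': 0.375}
print(parity_gap)  # 0.375
```

Tracking this gap over time (alongside per-group precision and recall) is one way to operationalize the monitoring described above; a gap near zero indicates similar treatment across groups.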
SIMPLE QUESTIONS
Question: What is the difference between binary, multi-class, and multi-
label classification?
Answer:
Binary classification involves two classes (e.g., spam vs. not spam).
Multi-class classification involves more than two classes, where each
instance is assigned to only one class (e.g., classifying animals into cats,
dogs, or birds).
Multi-label classification allows each instance to be assigned to multiple
classes simultaneously (e.g., a movie can be both action and comedy).
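The three settings differ mainly in how the targets are represented. A small sketch (the example labels are invented): binary and multi-class targets are a single value per instance, while multi-label targets are a binary indicator per class.

```python
# Binary: one target with exactly two possible values per instance.
spam_labels = [1, 0, 1]  # 1 = spam, 0 = not spam

# Multi-class: one target with more than two values, one class per instance.
animal_labels = ["cat", "dog", "bird"]

# Multi-label: a binary indicator for each class, per instance.
genres = ["action", "comedy", "drama"]
movie_labels = [
    [1, 1, 0],  # this movie is both action and comedy
    [0, 0, 1],  # this movie is drama only
]

def decode(indicator):
    """Turn a multi-label indicator vector back into class names."""
    return [g for g, flag in zip(genres, indicator) if flag]

print(decode(movie_labels[0]))  # ['action', 'comedy']
print(decode(movie_labels[1]))  # ['drama']
```

The indicator-vector form is why multi-label problems are often solved as one binary classifier per class.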