Machine Learning Question Bank Answers
1. Key Differences Between Classification and Regression Tasks
- Question: What are the key differences between classification
and regression tasks in supervised learning? Provide examples
of each.
- Answer:
- Classification: Predicts discrete class labels (e.g., spam or
not spam). Examples include email spam detection and image
classification.
- Regression: Predicts continuous values (e.g., house prices).
Examples include predicting temperature and stock prices.
- Key Difference: Classification outputs categories, while
regression outputs numerical values.
- Diagram: A comparison chart showing classification vs.
regression.
- Example:
- Classification: A model predicts whether an email is spam
(binary classification).
- Regression: A model predicts the price of a house based
on features like size and location.
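To make the contrast concrete, here is a minimal sketch using scikit-learn; the tiny inline datasets are invented purely for illustration:

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: discrete labels (1 = spam, 0 = not spam)
X_cls = [[0.1], [0.9], [0.8], [0.2]]   # e.g., fraction of "suspicious" words
y_cls = [0, 1, 1, 0]
clf = LogisticRegression().fit(X_cls, y_cls)
print(clf.predict([[0.7]]))            # outputs a category, e.g., [1]

# Regression: continuous target (house price in $1000s vs. size in sq ft)
X_reg = [[1000], [1500], [2000]]
y_reg = [200.0, 290.0, 410.0]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[1800]]))           # outputs a numerical value
```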
2. Typical Workflow of a Machine Learning Project
- Question: Outline the typical workflow of a machine learning
project, from data collection to model deployment.
- Answer:
1. Data Collection: Gather raw data from various sources
(e.g., databases, APIs).
2. Data Preprocessing: Clean, normalize, and transform data.
3. Feature Engineering: Select and create relevant features.
4. Model Selection: Choose an appropriate algorithm (e.g.,
linear regression, decision trees).
5. Training: Train the model on the training dataset.
6. Evaluation: Evaluate the model using metrics like accuracy,
precision, or mean squared error.
7. Hyperparameter Tuning: Optimize model parameters.
8. Deployment: Deploy the model in a production
environment.
- Example: A retail company uses this workflow to predict
customer churn.
- Diagram: A flowchart showing the machine learning
workflow.
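A compressed sketch of steps 2-7 on a built-in scikit-learn dataset (the bundled dataset stands in for step 1, data collection):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)           # 1. data collection (stand-in)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),         # 2. preprocessing
                 ("model", LogisticRegression(max_iter=1000))])  # 4. model selection
search = GridSearchCV(pipe, {"model__C": [0.1, 1.0, 10.0]}, cv=5)  # 7. tuning
search.fit(X_tr, y_tr)                                # 5. training
print(accuracy_score(y_te, search.predict(X_te)))     # 6. evaluation
```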
3. Linear Regression Model
- Question: Describe the linear regression model. How is it
used to predict continuous outcomes, and what are its key
assumptions?
- Answer:
- Definition: Linear regression models the relationship
between a dependent variable (target) and one or more
independent variables (features) using a linear equation.
- Equation: \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon \),
where \( y \) is the target, \( \beta_0 \) is the intercept,
\( \beta_1, \beta_2, \dots, \beta_n \) are coefficients, and \( \epsilon \) is the error term.
- Key Assumptions:
1. Linearity: The relationship between features and target is
linear.
2. Independence: Errors are independent of each other.
3. Homoscedasticity: Errors have constant variance.
4. Normality: Errors are normally distributed.
- Example: Predicting house prices based on features like
size, location, and number of bedrooms.
- Graph: A scatter plot with a best-fit line.
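A minimal sketch fitting the equation above with scikit-learn; the house data is made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical features: [size in sq ft, number of bedrooms]
X = np.array([[1400, 3], [1600, 3], [1700, 4], [1875, 4], [2350, 5]])
y = np.array([245000, 312000, 279000, 308000, 405000])  # made-up prices

model = LinearRegression().fit(X, y)
print("intercept (beta_0):", model.intercept_)
print("coefficients (beta_1, beta_2):", model.coef_)
print("prediction for 2000 sq ft, 4 bed:", model.predict([[2000, 4]]))
```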
4. Differences Between Binary, Multi-Class, and Multi-Label Classification
- Question: Discuss the differences between binary
classification, multi-class classification, and multi-label
classification, providing industry examples for each.
- Answer:
- Binary Classification: Predicts one of two classes (e.g., spam
or not spam).
- Example: Email spam detection.
- Multi-Class Classification: Predicts one of more than two
classes (e.g., cat, dog, bird).
- Example: Classifying images of animals.
- Multi-Label Classification: Predicts multiple labels for a single
instance (e.g., a document can be about both sports and
politics).
- Example: Tagging articles with multiple topics.
- Diagram: A comparison chart showing binary, multi-class,
and multi-label classification.
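The distinction is easiest to see in how the labels are represented; a short illustrative sketch (labels invented):

```python
import numpy as np

# Binary: one label per instance, two possible classes
y_binary = np.array([0, 1, 1, 0])              # 1 = spam, 0 = not spam

# Multi-class: one label per instance, more than two classes
y_multiclass = np.array([0, 2, 1, 2])          # 0 = cat, 1 = dog, 2 = bird

# Multi-label: a binary indicator per label; several can be 1 at once
y_multilabel = np.array([[1, 0, 1],            # article tagged sports + politics
                         [0, 1, 0],
                         [1, 1, 0]])           # columns: sports, tech, politics
```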
5. K-Nearest Neighbor (KNN) Algorithm
- Question: Describe the K-Nearest Neighbor (KNN) algorithm
and its working principle.
- Answer:
- Definition: KNN is a non-parametric algorithm used for
classification and regression. It predicts a new point's label from
the majority class (classification) or average value (regression) of
its k nearest neighbors.
- Working Principle:
1. Choose the number of neighbors (k).
2. Calculate the distance (e.g., Euclidean distance) between
the new data point and all training data points.
3. Select the k-nearest neighbors.
4. Assign the class based on the majority vote (for
classification) or average (for regression).
- Example: Classifying a new flower as "setosa," "versicolor,"
or "virginica" based on its petal and sepal measurements.
- Diagram: A scatter plot showing k-nearest neighbors.
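A from-scratch sketch of the four steps above using NumPy, on made-up 2-D points:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x_new, axis=1)    # step 2: Euclidean distance
    nearest = np.argsort(distances)[:k]                    # step 3: k nearest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]  # step 4: majority vote

# Invented 2-D points with two classes
X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])
print(knn_predict(X_train, y_train, np.array([2, 2])))  # -> "A"
```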
6. Parametric vs. Non-Parametric Algorithms
- Question: Give the difference between Parametric and Non-
Parametric algorithms.
- Answer:
- Parametric Algorithms: Assume a fixed functional form with a
fixed number of parameters, regardless of how much training data
is available (e.g., linear regression).
- Example: Predicting house prices using a linear regression
model.
- Non-Parametric Algorithms: Make no fixed-form assumption;
their effective complexity can grow with the training data (e.g.,
KNN, decision trees).
- Example: Classifying data using a decision tree.
- Diagram: A comparison chart showing parametric vs. non-
parametric algorithms.
7. Unsupervised Learning and Its Applications
- Question: Define unsupervised learning and discuss its
primary applications.
- Answer:
- Definition: Unsupervised learning involves training models on
unlabeled data to find hidden patterns or structures.
- Applications:
- Clustering: Grouping similar data points (e.g., customer
segmentation).
- Dimensionality Reduction: Reducing the number of features
(e.g., PCA).
- Anomaly Detection: Identifying unusual patterns (e.g., fraud
detection).
- Example: A retail company uses clustering to segment
customers based on purchasing behavior.
- Diagram: A flowchart showing unsupervised learning
applications.
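A minimal clustering sketch with scikit-learn's KMeans; the customer features are invented:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [annual spend in $, visits per month]
X = np.array([[200, 1], [250, 2], [220, 1],
              [2200, 9], [2500, 10], [2100, 8]])

# No labels are provided; the algorithm discovers the two segments itself
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment per customer
print(kmeans.cluster_centers_)  # centroid of each segment
```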
8. ID3 Algorithm and Decision Trees
- Question: Describe the ID3 algorithm and its role in building
decision trees.
- Answer:
- Definition: ID3 (Iterative Dichotomiser 3) is a decision tree
algorithm that uses information gain to select features for
splitting.
- Role: It builds decision trees by recursively splitting the
dataset based on the feature that provides the highest
information gain.
- Example: Building a decision tree to classify whether a
person will play tennis based on weather conditions.
- Diagram: A decision tree showing splits based on
information gain.
9. Overfitting vs. Underfitting
- Question: Differentiate between overfitting and underfitting in
machine learning models.
- Answer:
- Overfitting: The model learns the training data too well,
including noise, and performs poorly on unseen data.
- Example: A decision tree with too many branches.
- Underfitting: The model is too simple and fails to capture the
underlying patterns in the data.
- Example: A linear regression model applied to a non-linear
dataset.
- Diagram: A graph showing overfitting, underfitting, and a
well-fitted model.
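A small NumPy sketch: fitting polynomials of increasing degree to noisy non-linear data shows the training error shrinking even as the model starts memorizing noise (the degrees and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)  # noisy non-linear data

# Degree 1 underfits (too simple), 3 fits well, 9 starts chasing the noise
for degree in (1, 3, 9):
    coefs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coefs, x) - y) ** 2)
    print(degree, round(train_mse, 4))  # training error keeps falling with degree,
                                        # but held-out error would not
```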
10. Support Vector Machines (SVM)
- Question: Explain the working principle of Support Vector
Machines (SVM) and discuss the role of kernel functions in
SVM.
- Answer:
- Working Principle: SVM finds the optimal hyperplane that
maximizes the margin between classes.
- Kernel Functions: Implicitly map the data into a higher-dimensional
space (the kernel trick) so that it becomes linearly separable.
- Linear Kernel: For linearly separable data.
- RBF Kernel: For non-linear data.
- Example:
Classifying images of cats and dogs using SVM with an RBF
kernel.
- Diagram: A graph showing the optimal hyperplane and
support vectors.
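A minimal sketch comparing the two kernels on scikit-learn's make_moons data, which is not linearly separable:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Two interleaving half-circles: a linear kernel struggles,
# while the RBF kernel separates them
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    print(kernel, clf.score(X_te, y_te))
```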
11. Applications of Machine Learning
- Question: Describe the applications of machine learning in
various domains, such as healthcare, finance, and e-
commerce.
- Answer:
- Healthcare: Disease prediction, drug discovery.
- Example: Predicting diabetes based on patient data.
- Finance: Fraud detection, stock price prediction.
- Example: Detecting fraudulent transactions in real-time.
- E-commerce: Recommendation systems, customer
segmentation.
- Example: Recommending products to users based on
browsing history.
- Diagram: A flowchart showing machine learning applications
in different domains.
12. Logistic Regression
- Question: Explain logistic regression and its application in
binary classification problems. How does it differ from linear
regression?
- Answer:
- Definition: Logistic regression is used for binary classification
by predicting probabilities using a sigmoid function.
- Equation: \( P(y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n)}} \)
- Difference from Linear Regression: Logistic regression passes
the linear combination of features through a sigmoid to output a
probability in (0, 1), while linear regression outputs that
combination directly as an unbounded continuous value.
- Example: Predicting whether a student will pass or fail an
exam based on study hours and attendance.
- Graph: A sigmoid curve showing the probability output.
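A minimal sketch of the sigmoid and a fitted classifier; the student data is invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # maps any real z into (0, 1)

# Hypothetical features: [study hours, attendance %]; 1 = pass, 0 = fail
X = np.array([[2, 60], [4, 70], [6, 80], [8, 90], [1, 50], [9, 95]])
y = np.array([0, 0, 1, 1, 0, 1])

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[5, 75]]))            # [P(fail), P(pass)]
z = model.decision_function([[5, 75]])           # the linear combination
print(sigmoid(z))                                # same as P(pass) above
```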
13. Supervised, Unsupervised, and Reinforcement Learning
- Question: What are the key differences between Supervised,
Unsupervised, and Reinforcement Learning? Provide
examples.
- Answer:
- Supervised Learning: Uses labeled data to train models
(e.g., predicting house prices).
- Unsupervised Learning: Uses unlabeled data to find patterns
(e.g., clustering customers).
- Reinforcement Learning: Learns by interacting with an
environment and receiving rewards (e.g., training a robot to
navigate a maze).
- Diagram: A comparison chart showing supervised,
unsupervised, and reinforcement learning.
14. Bias-Variance Tradeoff
- Question: Explain the bias-variance tradeoff in machine
learning. How can it affect model performance?
- Answer:
- Bias: Error due to overly simplistic assumptions (e.g.,
underfitting).
- Variance: Error due to sensitivity to small fluctuations in the
training set (e.g., overfitting).
- Tradeoff: Balancing model complexity to avoid underfitting
and overfitting.
- Example: A linear model may have high bias, while a
complex decision tree may have high variance.
- Graph: A graph showing the bias-variance tradeoff.
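For squared-error loss, the tradeoff can be written out explicitly as the standard decomposition of the expected test error at a point \( x \):
\[
\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible error}}
\]
Increasing model complexity typically shrinks the bias term while inflating the variance term, which is why total error is minimized at an intermediate complexity.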
15. Naïve Bayes Classifier
- Question: What is the Naïve Bayes classifier, and how does it
work? Discuss its assumptions and use cases.
- Answer:
- Definition: A probabilistic classifier based on Bayes’ theorem
with the assumption that features are independent given the
class.
- Assumptions: Features are conditionally independent given the
class (the "naïve" assumption).
- Use Cases: Spam detection, text classification.
- Example: Classifying emails as spam or not spam based on
the presence of certain keywords.
- Diagram: A flowchart showing the Naïve Bayes classification
process.
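A minimal spam-filter sketch with scikit-learn's MultinomialNB; the four training texts are invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus; 1 = spam, 0 = not spam
texts = ["win free money now", "meeting at noon tomorrow",
         "free prize claim now", "project review meeting"]
labels = [1, 0, 1, 0]

# Word counts feed Bayes' theorem under the independence assumption
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["claim your free prize"]))  # -> [1]
```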
16. Entropy and Information Gain
- Question: Explain the concept of entropy and information gain
in decision tree algorithms.
- Answer:
- Entropy: Measures the impurity or uncertainty in a dataset:
\( H(S) = -\sum_i p_i \log_2 p_i \), where \( p_i \) is the proportion of
class \( i \) in \( S \).
- Information Gain: Measures the reduction in entropy after
splitting the dataset on a feature \( A \):
\( IG(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|} H(S_v) \).
- Example: Splitting a dataset based on "color" to classify
fruits.
- Diagram: A decision tree showing entropy and information
gain.
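A worked sketch of both quantities in NumPy, using the fruit/color example above:

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i) over the class proportions in labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values):
    """Reduction in entropy from splitting labels by feature_values."""
    total = len(labels)
    remainder = sum(
        (np.sum(feature_values == v) / total) * entropy(labels[feature_values == v])
        for v in np.unique(feature_values))
    return entropy(labels) - remainder

# Toy fruit data: does splitting on color reduce uncertainty?
color = np.array(["red", "red", "green", "green"])
fruit = np.array(["apple", "apple", "lime", "apple"])
print(information_gain(fruit, color))  # > 0: the split is informative
```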
17. Random Forest Algorithm
- Question: Describe the Random Forest algorithm and how it
improves classification performance.
- Answer:
- Definition: An ensemble method that builds multiple decision
trees and combines their outputs.
- Improves Performance: Reduces overfitting by averaging the
predictions of multiple trees.
- Example: Predicting whether a customer will churn based on
their usage patterns.
- Diagram: A flowchart showing the Random Forest algorithm.
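A minimal sketch with scikit-learn; synthetic data stands in for real churn features:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 trees, each trained on a bootstrap sample with random feature subsets;
# averaging their votes damps the overfitting of any single deep tree
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(forest.score(X_te, y_te))
```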
18. Gradient Descent
- Question: What is gradient descent, and how is it used to
optimize machine learning models?
- Answer:
- Definition: An optimization algorithm that minimizes the loss
function by iteratively adjusting model parameters in the direction
of the negative gradient.
- Use: Optimizes models like linear regression and neural
networks.
- Example: Optimizing the weights in a linear regression
model to minimize the mean squared error.
- Graph: A graph showing the gradient descent process.
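A from-scratch sketch: gradient descent fitting \( y = wx + b \) by minimizing mean squared error on synthetic data (the learning rate and iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3.0 * x + 5.0 + rng.normal(0, 1, 100)   # synthetic data, true w=3, b=5

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    y_hat = w * x + b
    grad_w = 2 * np.mean((y_hat - y) * x)   # d(MSE)/dw
    grad_b = 2 * np.mean(y_hat - y)         # d(MSE)/db
    w -= lr * grad_w                        # step against the gradient
    b -= lr * grad_b
print(w, b)  # converges near the true values
```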
19. Batch, Stochastic, and Mini-Batch Gradient Descent
- Question: Explain the difference between batch gradient
descent, stochastic gradient descent (SGD), and mini-batch
gradient descent.
- Answer:
- Batch Gradient Descent: Uses the entire dataset for each
update.
- Example: Updating the weights of a linear regression model
after processing all training data.
- Stochastic Gradient Descent (SGD): Uses one data point per
update.
- Example: Updating the weights after each training example.
- Mini-Batch Gradient Descent: Uses a small batch of data per
update.
- Example: Updating the weights after processing 32 training
examples.
- Diagram: A comparison chart showing batch, stochastic, and
mini-batch gradient descent.
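A mini-batch sketch in NumPy; setting batch_size to the dataset size recovers batch gradient descent, and batch_size = 1 recovers SGD (the data and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (1000, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1, 1000)   # synthetic data

w, b, lr, batch_size = 0.0, 0.0, 0.01, 32
for epoch in range(50):
    idx = rng.permutation(len(y))                  # shuffle each epoch
    for start in range(0, len(y), batch_size):
        batch = idx[start:start + batch_size]      # 32 examples per update
        xb, yb = X[batch, 0], y[batch]
        err = w * xb + b - yb
        w -= lr * 2 * np.mean(err * xb)
        b -= lr * 2 * np.mean(err)
print(w, b)
```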
20. Principal Component Analysis (PCA)
- Question: What is Principal Component Analysis (PCA), and
how is it used for dimensionality reduction?
- Answer:
- Definition: A dimensionality reduction technique that projects
data onto a lower-dimensional space while preserving as much
variance as possible.
- Use: Reduces the number of features in a dataset while
retaining important information.
- Example: Reducing the number of features in an image
dataset from 1000 to 50.
- Diagram: A graph showing the original data and the reduced-
dimensional data after PCA.
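A minimal sketch with scikit-learn's PCA on synthetic data that includes a deliberately redundant column:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                   # stand-in high-dimensional data
X[:, 1] = X[:, 0] * 2 + rng.normal(0, 0.1, 200)  # a correlated (redundant) column

pca = PCA(n_components=3)               # keep the 3 directions of highest variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                  # (200, 3)
print(pca.explained_variance_ratio_)    # fraction of variance kept per component
```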
21. Feature Engineering
- Question: Explain feature engineering and its significance in
machine learning models.
- Answer:
- Definition: The process of creating new features or
transforming existing ones to improve model performance.
- Significance: Enhances the predictive power of machine
learning models.
- Example: Creating a "day of the week" feature from a
timestamp to predict customer behavior.
- Diagram: A flowchart showing the feature engineering
process.
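A small pandas sketch of the timestamp example above; the order times are invented:

```python
import pandas as pd

# Raw timestamps are hard for a model to use directly
df = pd.DataFrame({"order_time": pd.to_datetime(
    ["2024-01-05 09:15", "2024-01-06 17:40", "2024-01-08 11:05"])})

# Derived features often carry more signal than the raw column
df["day_of_week"] = df["order_time"].dt.dayofweek            # 0 = Monday
df["is_weekend"] = df["day_of_week"].isin([5, 6]).astype(int)
df["hour"] = df["order_time"].dt.hour
print(df)
```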
22. Hyperparameters in Machine Learning Models
- Question: Discuss the role of hyperparameters in machine
learning models and methods to optimize them.
- Answer:
- Role: Settings chosen before training (not learned from the
data) that control the learning process (e.g., learning rate,
number of trees).
- Optimization Methods: Grid search, random search,
Bayesian optimization.
- Example: Tuning the learning rate of a neural network to
improve accuracy.
- Diagram: A flowchart showing hyperparameter optimization.
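A grid-search sketch with scikit-learn; the model, grid, and synthetic data are illustrative:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, random_state=0)

# Grid search tries every combination of hyperparameter values
# with cross-validation and keeps the best-scoring one
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```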
23. Cross-Validation Techniques
- Question: Explain cross-validation techniques such as k-fold
cross-validation and leave-one-out cross-validation.
- Answer:
- k-Fold Cross-Validation: Splits data into k subsets and trains
the model k times, each time using a different subset as the
validation set.
- Example: Using 5-fold cross-validation to evaluate a logistic
regression model.
- Leave-One-Out Cross-Validation: Uses a single data point as
the validation set and the rest as training data, repeated once for
every point in the dataset.
- Example: Evaluating a model using leave-one-out cross-validation
on a small dataset.
- Diagram: A flowchart showing k-fold cross-validation.
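A minimal sketch of both techniques with scikit-learn on the built-in iris dataset:

```python
from sklearn.model_selection import cross_val_score, KFold, LeaveOneOut
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: 5 train/validate rounds, each fold used once for validation
print(cross_val_score(model, X, y,
                      cv=KFold(n_splits=5, shuffle=True, random_state=0)).mean())

# Leave-one-out: n rounds, each holding out one sample (costly on large data)
print(cross_val_score(model, X, y, cv=LeaveOneOut()).mean())
```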
24. Ensemble Learning Techniques
- Question: What are ensemble learning techniques? Explain
Bagging and Boosting with examples.
- Answer:
- Ensemble Learning: Combines multiple models to improve
performance.
- Bagging: Trains models in parallel on bootstrap samples of the
data and averages their predictions (e.g., Random Forest).
- Example: Predicting customer churn using a Random
Forest model.
- Boosting: Sequentially trains models to correct errors of
previous models (e.g., AdaBoost, Gradient Boosting).
- Example: Improving the accuracy of a decision tree model
using AdaBoost.
- Diagram: A flowchart showing bagging and boosting.
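A minimal sketch comparing a single tree with bagged and boosted ensembles on synthetic data:

```python
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
base = DecisionTreeClassifier(max_depth=3, random_state=0)

# Bagging: independent trees on bootstrap samples, predictions averaged
bagging = BaggingClassifier(base, n_estimators=50, random_state=0)
# Boosting: trees trained in sequence, each focusing on previous errors
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("tree", base), ("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```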
25. Decision Boundaries
- Question: Describe decision boundaries and how they impact
classification performance in machine learning.
- Answer:
- Definition: The surface that separates different classes in a
classification problem.
- Impact: Determines the model’s ability to classify data
correctly.
- Example: A decision boundary separating "spam" and "not
spam" emails.
- Diagram: A graph showing decision boundaries in a
classification problem.
26. Activation Functions in Neural Networks
- Question: What are activation functions in neural networks?
Explain ReLU, Sigmoid, and Softmax.
- Answer:
- ReLU: \( f(x) = \max(0, x) \). Most commonly used in hidden
layers.
- Sigmoid: \( f(x) = \frac{1}{1 + e^{-x}} \). Used for binary
classification.
- Softmax: \( f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} \). Used in the
output layer for multi-class classification to produce probabilities
that sum to 1.
- Example: Using ReLU in a neural network for image
classification.
- Diagram: Graphs showing ReLU, Sigmoid, and Softmax
functions.
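A from-scratch NumPy sketch of all three functions:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)            # passes positives through, zeroes negatives

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # squashes to (0, 1)

def softmax(x):
    e = np.exp(x - np.max(x))          # subtract max for numerical stability
    return e / e.sum()                 # outputs sum to 1 -> class probabilities

z = np.array([2.0, -1.0, 0.5])
print(relu(z))      # [2.  0.  0.5]
print(sigmoid(z))
print(softmax(z))   # probabilities over 3 classes
```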
27. Data Preprocessing Techniques
- Question: Discuss the importance of data preprocessing
techniques like normalization and standardization.
- Answer:
- Normalization: Scales data to a range (e.g., 0 to 1).
- Example: Scaling pixel values in an image dataset.
- Standardization: Scales data to have a mean of 0 and a
standard deviation of 1.
- Example: Standardizing features in a dataset for linear
regression.
- Importance: Ensures that data is on a similar scale,
improving model performance.
- Diagram: A comparison chart showing normalization and
standardization.
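A minimal sketch of both scalers with scikit-learn; the two-column array is invented to have very different scales:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

print(MinMaxScaler().fit_transform(X))    # normalization: each column to [0, 1]
print(StandardScaler().fit_transform(X))  # standardization: mean 0, std 1 per column
```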
28. TF-IDF (Term Frequency-Inverse Document Frequency)
- Question: What is TF-IDF (Term Frequency-Inverse
Document Frequency), and how is it used in text classification?
- Answer:
- Definition: A statistical measure used to evaluate the
importance of a word in a document relative to a collection of
documents.
- Use: Converts text data into numerical features for machine
learning models.
- Example: Using TF-IDF to classify emails as spam or not
spam.
- Diagram: A flowchart showing the TF-IDF process.
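A minimal sketch with scikit-learn's TfidfVectorizer on an invented four-document corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["free money offer", "quarterly meeting agenda",
        "free offer inside", "meeting notes attached"]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)            # sparse matrix: documents x vocabulary
print(vec.get_feature_names_out())     # the learned vocabulary
print(X.toarray().round(2))            # words common across docs get lower weight
```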
29. Structured vs. Unstructured Data
- Question: Explain the difference between structured and
unstructured data in machine learning.
- Answer:
- Structured Data: Organized in a tabular format (e.g., Excel
sheets, SQL databases).
- Example: Customer records in a database.
- Unstructured Data: No predefined structure (e.g., text,
images, videos).
- Example: Social media posts or customer reviews.
- Diagram: A comparison chart showing structured vs.
unstructured data.
30. Challenges in Deploying Machine Learning Models
- Question: What are the key challenges in deploying machine
learning models in real-world applications?
- Answer:
- Data Quality: Poor-quality data can lead to inaccurate
predictions.
- Scalability: Models must handle large volumes of data in
real-time.
- Interpretability: Complex models like neural networks are
hard to interpret.
- Example: Deploying a fraud detection model that must
process millions of transactions per second.
- Diagram: A flowchart showing the challenges in deploying
machine learning models.