Machine Learning Basics
1. What is machine learning and how does it differ from traditional programming?
Answer: Machine learning (ML) is a subset of artificial intelligence (AI) that
enables systems to learn from data and improve performance without being
explicitly programmed. Traditional programming involves creating specific
instructions for a computer to follow, while machine learning involves creating
algorithms that learn patterns from data and make predictions or decisions based
on that learning.
Explanation: In traditional programming, a programmer writes explicit instructions
for the computer. In contrast, in machine learning, algorithms are trained on data
to discover patterns and make decisions autonomously. For example, a
traditional program for classifying emails might involve manually specifying rules
for what constitutes spam. In machine learning, you would train a model on a
dataset of labeled emails to learn which features are indicative of spam.
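To make the spam example concrete, here is a minimal sketch contrasting the two approaches; the hand-written rule, the tiny dataset, and the choice of MultinomialNB are illustrative assumptions, not from the original text:
# Traditional programming: the rule is written by hand
def is_spam_rule_based(text):
    return 'free money' in text.lower()

# Machine learning: the rule is learned from labeled examples
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ['free money now', 'meeting at noon', 'win free money', 'lunch tomorrow']
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
model = MultinomialNB().fit(X, labels)
print(model.predict(vectorizer.transform(['claim your free money'])))  # predicts spam (1)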
2. Explain the difference between supervised and unsupervised learning.
Answer: Supervised learning involves training a model on labeled data, where
the outcome or target is known. The model learns to predict the target based on
input features. Examples include classification (e.g., spam detection) and
regression (e.g., predicting house prices). Unsupervised learning involves
training a model on unlabeled data, where the goal is to find hidden patterns or
structures. Examples include clustering (e.g., customer segmentation) and
dimensionality reduction (e.g., Principal Component Analysis).
Explanation: In supervised learning, the model is guided by known outcomes,
which helps it learn to predict future outcomes based on past data. In
unsupervised learning, the model seeks to discover patterns or groupings in data
without predefined labels.
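A minimal sketch contrasting the two settings with scikit-learn; the toy data and the choice of LogisticRegression and KMeans are illustrative:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0], [2.0], [8.0], [9.0]])

# Supervised: known labels guide the model
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.5]]))  # predicts class 0

# Unsupervised: no labels; the model finds groupings itself
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)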
3. What are some common types of machine learning algorithms?
Answer: Common types of machine learning algorithms include:
● Linear Regression: For predicting continuous values.
● Logistic Regression: For binary classification problems.
● Decision Trees: For both classification and regression tasks.
● Support Vector Machines (SVM): For classification and regression.
● k-Nearest Neighbors (k-NN): For classification and regression.
● Random Forests: An ensemble method for classification and regression.
● Neural Networks: For complex tasks such as image and speech
recognition.
● Clustering Algorithms (e.g., k-Means): For grouping similar data points.
Explanation: Each algorithm has its strengths and is suited for different types of
problems. For instance, decision trees are easy to interpret, while neural
networks can handle complex patterns but require more data and computational
resources.
4. How does a decision tree work?
Answer: A decision tree is a flowchart-like structure used for both classification
and regression. It splits the data into subsets based on feature values, with each
node representing a decision based on a feature. The tree is constructed by
recursively splitting the data at each node based on the feature that provides the
best separation of the target variable.
Explanation: At each node in the tree, a decision is made to split the data based
on a feature that maximizes information gain or minimizes impurity. The process
continues until a stopping criterion is met, such as a maximum tree depth or
minimum number of samples per leaf.
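A short sketch of this process with scikit-learn's DecisionTreeClassifier; the Iris dataset and the max_depth value are illustrative:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
# max_depth is a stopping criterion that limits the recursive splitting
tree = DecisionTreeClassifier(max_depth=2, criterion='gini').fit(X, y)
print(export_text(tree))  # shows the feature thresholds chosen at each split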
5. What is overfitting, and how can you prevent it?
Answer: Overfitting occurs when a model learns not only the underlying patterns
in the training data but also the noise, leading to poor generalization to new data.
It results in high accuracy on training data but poor performance on test data.
Explanation: Overfitting can be prevented by the following techniques (a short sketch follows the list):
● Using Cross-Validation: To ensure that the model performs well on unseen
data.
● Pruning: For decision trees, to remove branches that provide little
predictive power.
● Regularization: Techniques like L1 and L2 regularization add penalties to
the model for having large coefficients.
● Early Stopping: For iterative algorithms, stop training when performance
on a validation set starts to degrade.
● Collecting More Data: More data can help the model learn more
generalized patterns.
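A sketch of one of these techniques, pre-pruning a decision tree by limiting its depth; the synthetic dataset and the depth value are illustrative assumptions:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# The unconstrained tree typically scores near 1.0 on the training data
# but noticeably worse on the test data; the pruned tree generalizes better.
print(deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print(pruned.score(X_tr, y_tr), pruned.score(X_te, y_te))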
6. Explain the bias-variance trade-off.
Answer: The bias-variance trade-off is the balance between two sources of error
that affect the performance of a model:
● Bias: Error due to overly simplistic assumptions in the learning algorithm,
which can lead to underfitting.
● Variance: Error due to the model's sensitivity to fluctuations in the training
data, which can lead to overfitting.
Explanation: A model with high bias is too simple and may not capture the
underlying patterns (underfitting), while a model with high variance is too
complex and may capture noise as if it were a pattern (overfitting). The goal is to
find a balance where the model generalizes well to new data.
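A brief sketch of the trade-off, using polynomial degree as the complexity knob; the data, degrees, and noise level are illustrative:
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=30)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    # degree 1 underfits (high bias); degree 15 chases the noise (high variance)
    print(degree, model.score(X, y))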
7. What is cross-validation, and why is it used?
Answer: Cross-validation is a technique for assessing how a machine learning
model performs on unseen data. It involves splitting the dataset into multiple
folds and training the model on some folds while testing it on the remaining fold.
This process is repeated multiple times with different folds.
Explanation: Cross-validation helps estimate the model’s performance more
reliably by using different subsets of the data for training and testing. It reduces
the risk of overfitting and provides a better estimate of the model's ability to
generalize to new data.
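A minimal sketch of 5-fold cross-validation with scikit-learn's cross_val_score; the dataset and model are illustrative:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# Each of the 5 folds serves once as the held-out test set
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())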
8. How do you handle missing data in a dataset?
Answer: Missing data can be handled using several techniques:
● Imputation: Filling in missing values with the mean, median, or mode of
the column.
● Prediction: Using a model to predict the missing values based on other
features.
● Deletion: Removing rows or columns with missing values.
● Flagging: Adding a binary feature indicating whether a value was missing.
Explanation: The choice of method depends on the amount of missing data and
the importance of the feature. Simple imputation is most defensible when values
are missing completely at random, while deletion might be used if the proportion
of missing values is very small.
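A short sketch of the imputation and flagging options, assuming pandas and scikit-learn; the toy column is illustrative:
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({'age': [25, np.nan, 40, 31]})

# Flagging: record where values were missing before filling them
df['age_missing'] = df['age'].isna().astype(int)

# Imputation: fill missing values with the column median
imputer = SimpleImputer(strategy='median')
df['age'] = imputer.fit_transform(df[['age']])
print(df)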
9. What is feature scaling, and why is it important?
Answer: Feature scaling is the process of normalizing or standardizing the range
of feature values in a dataset. Common methods include min-max scaling
(rescaling features to a [0, 1] range) and standardization (scaling features to
have zero mean and unit variance).
Explanation: Scaling is important because many machine learning algorithms,
such as gradient descent-based methods, are sensitive to the scale of input
features. Features with larger ranges can disproportionately affect the model,
leading to biased results.
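Both methods are available in scikit-learn; a minimal sketch with illustrative data:
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Min-max scaling: rescale each feature to the [0, 1] range
print(MinMaxScaler().fit_transform(X))

# Standardization: zero mean and unit variance per feature
print(StandardScaler().fit_transform(X))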
10. What are the differences between classification and regression problems?
Answer: Classification problems involve predicting categorical outcomes (e.g.,
spam vs. not spam), while regression problems involve predicting continuous
outcomes (e.g., house prices).
Explanation: Classification models output discrete labels or probabilities, while
regression models predict a continuous value. For example, a logistic regression
model predicts probabilities for class membership, while linear regression
predicts a numeric value.
Error Metrics
21. What is Mean Squared Error (MSE), and how is it calculated?
Answer: Mean Squared Error (MSE) measures the average squared difference
between predicted and actual values. It is calculated as:
[ \text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 ]
where ( y_i ) is the actual value, ( \hat{y}_i ) is the predicted value, and ( n ) is the
number of observations.
Explanation: MSE penalizes larger errors more severely due to the squaring of
differences. It is useful for assessing the overall fit of the model but can be
sensitive to outliers.
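A quick worked example, computing MSE by hand with NumPy and checking it against scikit-learn; the values are made up:
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

# Mean of squared differences: (0.25 + 0 + 1) / 3
mse = np.mean((y_true - y_pred) ** 2)
print(mse, mean_squared_error(y_true, y_pred))  # both ≈ 0.4167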
22. What are the advantages and disadvantages of using MSE as an error metric?
Answer:
● Advantages:
○ Easy to compute and understand.
○ Penalizes larger errors more heavily, which can be useful in certain contexts.
● Disadvantages:
○ Sensitive to outliers due to squaring of errors.
○ Does not provide an interpretable scale, as it is in squared units of the target variable.
Explanation: MSE's sensitivity to outliers can lead to misleading evaluations if
the dataset contains significant noise or extreme values.
23. How does Mean Absolute Error (MAE) differ from MSE?
Answer: Mean Absolute Error (MAE) measures the average absolute difference
between predicted and actual values. It is calculated as:
[ \text{MAE} = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i| ]
Explanation: Unlike MSE, MAE does not square the errors, making it less
sensitive to outliers. It provides a more interpretable measure of average
prediction error.
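With the same made-up values, MAE comes out smaller and weighs the large error less than MSE does:
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

# Mean of absolute differences: (0.5 + 0 + 1) / 3
print(np.mean(np.abs(y_true - y_pred)), mean_absolute_error(y_true, y_pred))  # both = 0.5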
24. What is R² (Coefficient of Determination), and what does it measure?
Answer: R² measures the proportion of the variance in the dependent variable
that is predictable from the independent variables. It is calculated as:
[ R^2 = 1 - \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n (y_i - \bar{y})^2} ]
where ( \bar{y} ) is the mean of the actual values.
Explanation: R² typically ranges from 0 to 1, with higher values indicating a better
fit (it can be negative when the model fits worse than simply predicting the mean). It
shows how well the model explains the variance in the target variable.
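A minimal sketch computing R² from the definition above and checking it against scikit-learn's r2_score; the values are made up:
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
print(1 - ss_res / ss_tot, r2_score(y_true, y_pred))  # both ≈ 0.844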
25. How do you interpret the R² value of a regression model?
Answer: The R² value represents the proportion of variance in the target variable
that is explained by the model. An R² value close to 1 indicates that the model
explains most of the variance, while a value close to 0 indicates that the model
does not explain much of the variance.
Explanation: R² provides an indication of the goodness-of-fit of the model.
However, it is important to consider other metrics and the context of the problem
when evaluating model performance.
26. What are the limitations of using R² as a performance metric?
Answer:
● R² can be misleading: A high R² value does not necessarily mean a good
model if it overfits the training data.
● Not suitable for non-linear models: R² may not accurately reflect the
performance of non-linear models.
● Ignores the complexity of the model: High R² values can be achieved with
complex models that may not generalize well.
Explanation: R² should be used in conjunction with other metrics and
considerations to evaluate the model's performance comprehensively.
27. When should you use MAE instead of MSE?
Answer: MAE is preferable when you want to avoid giving more weight to larger
errors, which MSE does by squaring them. MAE provides a more robust measure
of average error when the dataset contains outliers.
Explanation: MAE is less sensitive to outliers compared to MSE, making it a
better choice for data with extreme values or noise.
28. How do you calculate Root Mean Squared Error (RMSE), and how is it different
from MSE?
Answer: Root Mean Squared Error (RMSE) is the square root of MSE and
provides the error in the same units as the target variable. It is calculated as:
[ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2} ]
Explanation: RMSE provides an interpretable measure of average prediction
error in the same units as the target variable, whereas MSE is in squared units.
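A one-line check that RMSE is the square root of MSE, using the same made-up values:
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse)  # ≈ 0.645, in the same units as y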
29. What is the relationship between MAE and RMSE?
Answer: MAE and RMSE are both measures of prediction error but differ in how
they penalize errors. RMSE gives more weight to larger errors due to squaring,
while MAE treats all errors equally.
Explanation: RMSE is more sensitive to outliers compared to MAE. MAE
provides a more straightforward average error measure, while RMSE
emphasizes the impact of larger errors.
30. How can you handle outliers when calculating MSE or MAE?
Answer: To handle outliers:
● Use robust metrics: MAE is less sensitive to outliers compared to MSE.
● Apply transformations: Use log or other transformations to reduce the
impact of extreme values.
● Outlier detection and removal: Identify and remove outliers before
calculating metrics.
Explanation: Handling outliers involves choosing appropriate error metrics and
applying techniques to reduce their impact on model evaluation.
31. What is the difference between MSE and Mean Absolute Percentage Error
(MAPE)?
Answer: Mean Absolute Percentage Error (MAPE) measures the average
absolute percentage difference between predicted and actual values:
[ \text{MAPE} = \frac{1}{n} \sum_{i=1}^n \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100 ]
Explanation: MAPE expresses prediction error as a percentage, making it
scale-independent, although it is undefined when any actual value ( y_i ) is zero.
MSE is in squared units of the target, which can be less interpretable in some
contexts.
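A short sketch computing MAPE directly with NumPy; the values are made up and deliberately nonzero:
import numpy as np

y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 190.0, 420.0])

mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
print(mape)  # (10% + 5% + 5%) / 3 ≈ 6.67%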
32. How do you compute adjusted R², and why is it used?
Answer: Adjusted R² adjusts the R² value for the number of predictors in the
model. It is calculated as:
[ \text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1} ]
where ( n ) is the number of observations and ( p ) is the number of predictors.
Explanation: Adjusted R² accounts for the number of predictors in the model,
providing a more accurate measure of goodness-of-fit when comparing models with
different numbers of predictors.
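A minimal sketch computing adjusted R² from r2_score; the data and the number of predictors p are illustrative:
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.5, 5.0, 8.0, 8.5, 11.5])

n, p = len(y_true), 2  # 5 observations, assuming 2 predictors were used
r2 = r2_score(y_true, y_pred)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(r2, adj_r2)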
What is a Generative Adversarial Network (GAN)?
Answer: A GAN consists of two neural networks trained against each other: a
generator creates fake data samples, while a discriminator evaluates their
authenticity.
Explanation: GANs are used to generate realistic data samples and are
widely applied in image generation and synthesis.
Matplotlib Questions
1. How do you create a basic line plot using Matplotlib?
Answer: Use the plot() function from Matplotlib's pyplot module to create a
basic line plot.
Explanation: plot() can visualize data trends by connecting data points with
lines.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]  # sample y-values (added so the example runs)
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
2. How do you create a scatter plot using Matplotlib?
Answer: Use the scatter() function to draw individual data points.
Explanation: scatter() shows the relationship between two variables without
connecting the points.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]  # sample y-values (added so the example runs)
plt.scatter(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot')
plt.show()
3. How can you customize the appearance of plots in Matplotlib?
Answer: Customize plots using parameters such as:
● Color: Specify colors using names or hex codes.
● Line Style: Use styles like dashed or dotted lines.
● Markers: Change marker shapes and sizes.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.plot(x, y, color='green', linestyle='--', marker='o')  # color, line style, marker
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Customized Plot')
plt.show()
4. What is the difference between subplot() and subplots() in Matplotlib?
Answer:
● subplot(): Creates a single subplot within a grid.
● subplots(): Creates a grid of subplots in a single figure.
Explanation: Use subplots() for creating multiple subplots at once and subplot()
for adding individual subplots.
import matplotlib.pyplot as plt
# Using subplots(): create a whole grid of axes at once
fig, axes = plt.subplots(1, 2)
axes[0].plot([1, 2, 3, 4], [1, 4, 9, 16])
axes[1].scatter([1, 2, 3, 4], [1, 4, 9, 16])
plt.show()
Scikit-learn Questions
1. How do you tune a model's hyperparameters using scikit-learn?
Answer: Use GridSearchCV to search over a grid of hyperparameter values with
cross-validation.
Explanation: GridSearchCV fits the model for every combination in the parameter
grid and keeps the best-scoring one; for example:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
model = RandomForestClassifier()
param_grid = {'n_estimators': [50, 100]}  # example grid
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X, y)
2. How do you evaluate a regression model's performance using scikit-learn?
Answer: Use metrics like mean_squared_error (MSE), mean_absolute_error (MAE),
and r2_score.
Explanation: These metrics help quantify how well a regression model predicts
continuous values.
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
3. How do you handle imbalanced datasets in scikit-learn?
Answer: Use class weighting (e.g., class_weight='balanced' or compute_class_weight)
or resample the data by oversampling the minority class or undersampling the
majority class.
Explanation: Handling imbalanced datasets ensures that the model performs well
across all classes.
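A minimal sketch of the class-weighting approach using scikit-learn's compute_class_weight; the labels are illustrative:
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])  # 80/20 class imbalance
weights = compute_class_weight(class_weight='balanced', classes=np.unique(y), y=y)
print(dict(zip(np.unique(y), weights)))  # the minority class receives the larger weight
Many estimators also accept class weighting directly, e.g. LogisticRegression(class_weight='balanced').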
Deep Learning Questions
1. How do you build and compile a simple neural network using Keras?
Answer: Stack layers in a Sequential model, then call compile() with an optimizer,
a loss function, and metrics.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential([
    Dense(32, activation='relu', input_shape=(20,)),  # hidden layer; input_shape is illustrative
    Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
2. How do you use a custom loss function in Keras?
Answer: Define a function that takes y_true and y_pred and returns the loss value,
then pass it to compile():
model.compile(optimizer='adam', loss=custom_loss)
Explanation: Custom loss functions let you optimize objectives that the built-in
losses do not cover.
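A minimal sketch of such a function, assuming TensorFlow's Keras API; this custom_loss is a hypothetical hand-written mean squared error, and model is the Sequential model from the previous item:
import tensorflow as tf

def custom_loss(y_true, y_pred):
    # Hypothetical example: mean squared error written by hand
    return tf.reduce_mean(tf.square(y_true - y_pred))

model.compile(optimizer='adam', loss=custom_loss)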
3. What is the difference between TensorFlow and PyTorch?
Answer:
● TensorFlow: Developed by Google, it is known for its deployment
capabilities and support for distributed computing.
● PyTorch: Developed by Facebook, it is known for its dynamic computation
graph and ease of use in research.
Explanation: Both frameworks are popular for deep learning, with TensorFlow
often used in production and PyTorch favored in research.
# TensorFlow example
import tensorflow as tf
model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation='relu')])
# PyTorch example
import torch
import torch.nn as nn
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU())  # minimal counterpart; layer sizes are illustrative