ML Question Bank Solution
Bias :
Bias is the error introduced by a model's overly simple assumptions; because of it, there is a systematic difference between the model's predicted values and the actual values.
Variance:
Variance is the amount by which the performance of a predictive
model changes when it is trained on different subsets of the training
data.
Therefore, the U-shaped curve in test error arises due to the balance
between underfitting and overfitting. It illustrates how the test error
initially decreases with increasing model complexity until an optimal
point is reached, beyond which further increasing complexity leads
to overfitting and a subsequent increase in test error. This curve
helps in understanding and determining the appropriate level of
model complexity to achieve the best generalization performance.
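A rough sketch of this U-shape, assuming a made-up noisy dataset, with polynomial degree standing in for model complexity:

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 100)
y = np.sin(x) + rng.normal(0, 0.3, 100)   # noisy nonlinear target
x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)

for degree in [1, 3, 9, 15]:              # increasing model complexity
    coeffs = np.polyfit(x_tr, y_tr, degree)              # fit polynomial
    test_mse = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(degree, round(test_mse, 3))     # error first falls, then rises again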
4. Differentiate between Supervised and Unsupervised Learning?
Test of model: In supervised learning we can test our model against the known labels; in unsupervised learning we can not test our model this way, because there are no ground-truth labels.
Function Approximation :
Why Estimate f :
Prediction:
● Function approximation allows us to predict the output or
dependent variable y for new input values or observations x.
● By estimating an unknown function f, we can make predictions
about the behavior or outcome of a system based on the
observed relationships between input and output variables.
● Prediction is especially useful in applications such as
forecasting, decision-making, and modeling dynamic systems.
Inference:
● Inference involves drawing conclusions or making inferences
about the underlying structure or behavior of a system based
on observed data.
● By approximating f, we gain insights into the relationship
between input and output variables, which can help us
understand the underlying mechanisms driving the system.
● Inference is valuable for identifying patterns, relationships, and
trends in the data, leading to improved understanding and
decision-making.
How to Estimate f :
Parametric Approach:
● In a parametric approach, we assume a specific functional
form or model for f based on prior knowledge or assumptions
about the underlying relationship between input and output
variables.
● The model typically has a fixed number of parameters that
need to be estimated from the data.
● Once the model parameters are estimated using the data, the
function f is completely determined by those parameters.
● Examples of parametric models include linear regression
(assuming a linear relationship between variables), logistic
regression (for binary classification), and polynomial
regression (for capturing non-linear relationships).
Non-parametric Approach:
● In a non-parametric approach, we do not make any
assumptions about the functional form or structure of f.
Instead, we directly estimate f from the data.
● Non-parametric methods are more flexible and can capture
complex relationships without imposing specific constraints on
the form of f.
● These methods typically require more data and can be
computationally intensive, as they do not rely on predefined
models with fixed parameters.
● Examples of non-parametric methods include k-nearest
neighbors (KNN), kernel density estimation, and decision trees.
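A minimal sketch contrasting the two approaches on the same made-up data: linear regression learns a fixed set of parameters, while k-nearest neighbors keeps the data itself and answers queries from it:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[1], [2], [3], [4], [5]])          # made-up inputs
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])          # made-up outputs

linear = LinearRegression().fit(X, y)            # parametric: learns slope and intercept
knn = KNeighborsRegressor(n_neighbors=2).fit(X, y)  # non-parametric: stores the data

print(linear.predict([[2.5]]), knn.predict([[2.5]]))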
6. Define Machine Learning and explain its types?
Machine Learning :
Machine learning is the branch of Artificial Intelligence that
focuses on developing models and algorithms that let
computers learn from data and improve from previous
experience without being explicitly programmed for every task.
The main types of machine learning are:
● Supervised Learning
● Unsupervised Learning
● Semi-Supervised Learning
● Reinforcement Learning
Semi-Supervised Learning :
Semi-supervised learning trains a model on a small amount of labeled data combined with a large amount of unlabeled data, using the unlabeled data to improve what can be learned from the labels alone.
Reinforcement Learning :
Reinforcement learning trains an agent to take actions in an environment so as to maximize a cumulative reward, learning from trial and error rather than from labeled examples.
Parameters :
The input data that needs to be split into training and testing sets. This could be feature vectors (X) and corresponding target variables (y) if you're dealing with supervised learning, or just the input data (X) if you're performing unsupervised learning.
Output:
X_train: the subset of the input data used to train the model.
X_test: the subset of the input data held out to evaluate the model.
y_train (optional): the target values corresponding to X_train.
y_test (optional): the target values corresponding to X_test.
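A minimal sketch of such a split using scikit-learn's train_test_split, with made-up arrays:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # feature matrix
y = np.arange(10)                  # target vector

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)   # 80/20 split, reproducible
print(X_train.shape, X_test.shape)           # (8, 2) (2, 2)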
One popular Python library for machine learning is scikit-learn, and a popular one for data visualization is matplotlib.
Code :
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

X, y = load_iris(return_X_y=True)
plt.scatter(X[:, 0], X[:, 1], c=y)  # plot the first two features, colored by class
plt.show()
Overfitting penalty:
R-squared measures how well the predictors explain the variance in the response. The problem is, it will always increase or stay the same as more predictors are added to the model, even useless ones. Adjusted R-squared corrects this by penalizing model size: Adjusted R-squared = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors; note the denominator n − p − 1 rather than n minus 1.
Eigenvectors and Eigenvalues: Once the covariance matrix is computed, the eigenvectors and eigenvalues are calculated from that matrix. Eigenvectors are the directions of the new feature space, and eigenvalues measure how much of the data's variance lies along those directions.
Selecting Components: The eigenvectors are ranked by their eigenvalues, so the first components explain the most variance in the data. Typically, you choose the top k eigenvectors as the principal components and project the data onto them.
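A minimal sketch of these steps using scikit-learn's PCA (which computes the covariance/eigen decomposition internally), here on the Iris data:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)               # keep the top k = 2 components
X_reduced = pca.fit_transform(X)        # project the data onto them
print(pca.explained_variance_ratio_)    # variance explained by each component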
Dimensionality Reduction: PCA is widely used to reduce the number of features while retaining most of the variance in the data.
Feature Engineering: The principal components can also serve as new, uncorrelated features for downstream models.
Null Hypothesis (H0): The null hypothesis for each coefficient is that
there is no relationship between the predictor variable and the
response variable. In other words, the coefficient is equal to zero,
implying that the predictor has no effect on the dependent variable.
Alternative Hypothesis (H1): The alternative hypothesis is that there
is a relationship between the predictor variable and the response
variable. A non-zero coefficient suggests that the predictor variable
has a significant impact on the dependent variable.
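A minimal sketch of inspecting these tests with statsmodels, using made-up data; the reported p-value for each coefficient tests H0: coefficient = 0:

import numpy as np
import statsmodels.api as sm

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = sm.add_constant(x)          # adds the intercept column
model = sm.OLS(y, X).fit()      # ordinary least squares fit
print(model.summary())          # coefficients with t-statistics and p-values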
Best fit for the data: The linear regression line is the line that minimizes the sum of squared differences between the predicted values and the actual values of the dependent variable. In simpler terms, it's the straight line that comes closest to most of the data points.
Passes through the means: The line always passes through the point where the average value of the independent variable (X) meets the average value of the dependent variable (Y). This ensures the line captures the central tendency of the data.
What is the significance of the slope coefficient in Linear Regression?
● A positive slope means the dependent variable increases as the predictor increases.
● A negative slope means the dependent variable moves in the opposite direction (as X increases, Y decreases).
● A slope of zero means the predictor has no effect on the dependent variable (the coefficient is zero).
21. Explain Linear Regression and write a python code to implement it?
Key Concepts:
● Dependent and independent variables: the independent variable (X) is used to predict the dependent variable.
● Intercept: the predicted value of Y when all predictors are 0.
● Slope: the change in Y for a one-unit change in X.
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data (made-up values for illustration)
x = np.array([[1], [2], [3], [4]])  # independent variable
y = np.array([2, 4, 6, 8])          # dependent variable

model = LinearRegression()
model.fit(x, y)

new_x = np.array([[5]])
y_pred = model.predict(new_x)
print(y_pred)  # predicted value for x = 5
Explain Logistic Regression and write a python code to implement it?
Logistic Regression :
Logistic regression is a classification algorithm that models the probability of an observation belonging to a specific class. It's a powerful tool for tasks like spam detection and medical diagnosis.
Key Concepts:
● Sigmoid function: squashes the model's output into a probability between 0 and 1.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1, 2], [3, 4], [5, 1], [0, 0]]) # Independent variables
y = np.array([0, 1, 1, 0])                     # Class labels (sample values)
model = LogisticRegression()
model.fit(X, y)

new_X = np.array([[2, 3]])  # new observation (sample value)
y_pred = model.predict(new_X)
print(y_pred)
They make decisions based on the values of input features and are well suited to both classification and regression. Here is how they work:
Splitting: Starting from the root, at each step, the algorithm selects the feature (and threshold) that best separates the data according to a criterion (e.g., Gini impurity or entropy for classification, variance reduction for regression).
Decision-making: As the tree grows, each internal node represents a condition on a feature; the left branch handles the cases where the condition is true, and the right branch the cases where it is false. When splitting stops, the remaining nodes are designated as leaf nodes, and each leaf node stores a class label (or a value, for regression).
Prediction: To classify a new instance, traverse the decision tree from the root node to a leaf node, following the decisions based on the feature values of the instance. Once you reach a leaf node, the predicted class label or value associated with that leaf is the model's output for the data.
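A minimal sketch of growing and querying such a tree with scikit-learn on the Iris data:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)  # grow a small tree
print(tree.predict(X[:1]))   # traverses root-to-leaf for one instance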
25. Why encoding of categorical variables required in classification problems?
Most classification algorithms operate on numbers, so categorical variables must be converted into a numeric form before training.
One-Hot Encoding: creates a binary column for each category. For example, for a color feature we represent "red" as [1, 0, 0], "blue" as [0, 1, 0], and "green" as [0, 0, 1].
Dummy Encoding: also creates binary variables for each category, but it uses one less column by dropping one category as the baseline: "red" as [1, 0], "blue" as [0, 1], and "green" as [0, 0].
Frequency Encoding: replaces each category with the frequency of its occurrence in the dataset. This method can be useful when the frequency of a category carries information about the target variable.
Proper encoding ensures the categorical information is in a form usable by a classification model.
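A minimal sketch of one-hot and dummy encoding with pandas, on a made-up color column:

import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'red']})
one_hot = pd.get_dummies(df['color'])                 # one binary column per category
dummy = pd.get_dummies(df['color'], drop_first=True)  # one less column (baseline dropped)
print(one_hot)
print(dummy)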
26. Explain the concept of LDA and where is QDA required?
Concept of LDA:
Linear Discriminant Analysis (LDA) finds linear combinations of the features that best separate the classes. It assumes each class is Gaussian and that all classes share the same covariance matrix, which yields a linear decision boundary.
QDA is required when the true boundary between classes is nonlinear. Unlike LDA, which assumes that all classes share the same covariance matrix, QDA allows each class to have its own covariance matrix, producing quadratic decision boundaries.
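A minimal sketch of both models in scikit-learn, here fit to the Iris data:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis().fit(X, y)      # shared covariance, linear boundary
qda = QuadraticDiscriminantAnalysis().fit(X, y)   # per-class covariance, quadratic boundary
print(lda.score(X, y), qda.score(X, y))           # training accuracy of each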
Explain the KNN algorithm with its advantages and disadvantages?
KNN (k-nearest neighbors) classifies a new data point by calculating the distance between the new point and each existing point in the training set, usually the Euclidean distance. It then selects the k closest points and assigns the class that is most common among those neighbors.
Advantages: simple to implement, requires no training phase, and makes no assumptions about the data distribution.
Disadvantages: prediction is slow on large datasets (every distance must be computed), and the result is sensitive to the choice of k and to feature scaling.
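A minimal sketch of KNN classification with scikit-learn, using made-up points:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 1], [2, 2], [8, 8], [9, 9]])   # made-up training points
y = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[3, 3]]))   # majority vote among the 3 nearest points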
Linear regression is not suitable for modeling probabilities. Comment.
• A probability must lie between 0 and 1 for every observation in the data.
• If we model it linearly as p(X) = β0 + β1X, some predicted values might be negative or greater than 1.
• So, we should model p(X) using a function that gives output between 0 and 1, such as the logistic (sigmoid) function.
29. Draw and explain the four cases of AUC and ROC graphs?
The ROC curve plots the true positive rate against the false positive rate at every classification threshold, and the AUC is the area under that curve. A model with AUC close to 1 predicts 0 classes as 0 and 1 classes as 1; the higher the AUC, the better the model is at distinguishing the two classes. The four typical cases are: AUC = 1 (perfect separation), AUC ≈ 0.7 (good separation with some overlap), AUC = 0.5 (no discrimination, equivalent to random guessing), and AUC = 0 (the model's predictions are completely inverted).
Differentiate between LDA and PCA?
LDA is supervised: it uses class labels to find directions that maximize class separability. PCA is unsupervised: it finds directions that maximize the overall variance of the data, ignoring labels.
The elements of a confusion matrix are as follows:
True Negative (TN): True negative represents the cases where the model correctly predicts the negative class (or the absence of the event of interest) and the actual label is also negative.
False Negative (FN): False negative represents the cases where the model incorrectly predicts the negative class when the actual label is positive, i.e., it misses a positive outcome.
True Positive (TP): True positive represents the cases where the model correctly predicts the positive class (or the event of interest) and the actual label is also positive.
False Positive (FP): False positive represents the cases where the model incorrectly predicts the positive class when the actual label is negative, i.e., it raises a false alarm for the outcome.
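A minimal sketch of extracting these four counts with scikit-learn, using made-up labels:

from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)   # 2 1 1 2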
Basic Idea: The fundamental idea behind SVM is to find the optimal hyperplane that separates the classes with the maximum margin, where the margin is the distance between the hyperplane and the nearest data points from each class. These nearest data points are called support vectors.
Linear Separability: In its simplest form, SVM assumes that the data can be separated by a linear hyperplane in the input space.
Kernel Trick: The kernel trick allows SVM to implicitly map the input features into a higher-dimensional space where a linear separator exists, without ever computing that mapping explicitly.
Margin Maximization: Maximizing the distance between the hyperplane and the closest data points improves the generalization ability of the model and reduces the risk of overfitting.
Loss: Training seeks to minimize this loss function (the hinge loss) while maximizing the margin, thus finding a balance between a wide margin and few classification errors.
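A minimal sketch of an SVM with an RBF kernel in scikit-learn; the data is the Iris set and the C value is illustrative:

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
svm = SVC(kernel='rbf', C=1.0).fit(X, y)   # kernel trick; C trades margin width vs errors
print(len(svm.support_vectors_))           # the points that define the boundary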
Explain collaborative filtering and its applications?
Collaborative filtering makes recommendations by learning from the preferences of many users and generalizing them to new data. For example, it can find users with similar tastes in movies, and then recommend movies that those users have enjoyed.
35. Explain the concept of Dendrogram in Hierarchical Clustering?
A dendrogram is a tree diagram that records the sequence in which clusters are merged (or split) in hierarchical clustering; cutting the tree at a chosen height yields a flat set of clusters.
Rand Index
The Rand index measures the agreement between two clusterings over all pairs of points. The index has a value between 0 and 1, with 0 indicating that the two clusterings do not agree on any pair of points and 1 indicating perfect agreement.
Silhouette Score
The silhouette score ranges from -1 to 1. A value near 1 means an object is well matched to its own cluster and far from neighboring clusters; a value of about -1, on the other hand, suggests that the object might be in the incorrect cluster.
Adjusted Rand Index (ARI)
The ARI corrects the Rand index for chance agreement, so random labelings score close to 0 while identical clusterings score 1.
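A minimal sketch of computing these scores with scikit-learn, assuming made-up points and labels:

import numpy as np
from sklearn.metrics import adjusted_rand_score, silhouette_score

X = np.array([[1, 1], [1, 2], [8, 8], [8, 9]])   # made-up points
labels_true = [0, 0, 1, 1]
labels_pred = [0, 0, 1, 1]

print(adjusted_rand_score(labels_true, labels_pred))  # 1.0 for perfect agreement
print(silhouette_score(X, labels_pred))               # near 1 for well-separated clusters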
Unsupervised learning discovers patterns and structures in the data without being explicitly told what to look for. There are two main types of unsupervised learning:
Clustering:
Clustering aims to partition the data into groups such that data points within the same group are more similar to each other than to those in other groups.
Association rule:
Association rule learning finds interesting relationships between variables in a large data set. This technique is basically used for market basket analysis, i.e., discovering which products are frequently bought together.
38. Explain K-Means clustering with its advantages and disadvantages?
K-Means Clustering:
K-Means partitions the data into k clusters, each represented by its centroid.
Algorithm:
1. Choose k and initialize k centroids.
2. Assign each data point to its nearest centroid.
3. Recompute each centroid as the mean of the points assigned to it.
4. Repeat steps 2-3 until convergence (i.e., the centroids do not change significantly).
Advantages of K-Means:
Simple to implement, fast, and scales well to large datasets.
Disadvantages of K-Means:
The number of clusters k must be specified in advance; the result is sensitive to the initial centroids and to outliers, and the algorithm assumes roughly spherical, similarly sized clusters.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate sample data
X, _ = make_blobs(n_samples=300, centers=4,
                  cluster_std=0.60, random_state=0)

kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)

# Plot the clustered points and the cluster centers
centers = kmeans.cluster_centers_
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50)
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200,
            alpha=0.75)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('K-Means Clustering')
plt.show()
39. Explain classifiers and their different types?
Classifiers are algorithms that assign class labels to data points; they are essential tools for tasks like classification, where the goal is to predict which category an input belongs to. Common types include:
Decision Trees: Split the data by asking questions about the value of different features, and each leaf node represents a class label.
Support Vector Machines: Find a maximum-margin separating hyperplane; they work well in high-dimensional spaces and are versatile due to their ability to use different kernel functions.
Logistic Regression: Despite its name, logistic regression is a linear classifier that converts a linear combination of the features into a probability via the sigmoid function.
Ensemble Methods (e.g., Random Forest): Combine the predictions of many simpler classifiers, which reduces variance and improves accuracy.
40. How does Random Forest work?
Random Forest builds many decision trees, each trained on a random sample of the data drawn with replacement (bootstrapping). Each tree also considers only a random subset of the features at each split, which decorrelates the trees, and the forest then aggregates them. For classification tasks, the class with the most votes across all trees is assigned to the instance; for regression, the tree outputs are averaged. Key hyperparameters include the number of trees in the forest, the maximum depth of each tree, and the number of features considered at each split.
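A minimal sketch of a random forest with the hyperparameters mentioned above, on the Iris data; the specific values are illustrative:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=100,    # number of trees in the forest
    max_depth=5,         # maximum depth of each tree
    max_features='sqrt'  # features considered at each split
).fit(X, y)
print(forest.predict(X[:1]))   # majority vote across the trees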
Support vectors are data points from the training dataset that lie closest to the separating boundary (hyperplane).
Support vectors play a significant role in SVM for several reasons:
They alone determine the position and orientation of the hyperplane, and they define the margin, i.e., the distance between the hyperplane and the nearest data points from both classes. Support vectors are therefore the only training points that affect the prediction; removing any other point leaves the model unchanged.
42. What are outliers? Explain how the DBSCAN algorithm is used for outlier Detection?
Outliers are data points that differ significantly from the rest of the data; they can arise from measurement errors, data entry mistakes, or genuinely rare events.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups points that lie in dense regions. It relies on two parameters: eps (the neighborhood radius) and min_samples (the minimum number of points needed to form a dense region). Points are classified as:
● Core Points: Points with at least min_samples neighbors within eps.
● Border Points: Points within eps of a core point but with fewer than min_samples neighbors of their own.
● Noise Points: Points that are neither core nor border points.
Outlier Detection: Outliers in DBSCAN are typically identified as noise points. These are data points that do not belong to any cluster.
Algorithm Process: DBSCAN visits each unvisited point, finds its eps-neighborhood, starts and expands a cluster if the point is a core point, and labels as noise the points that are not reachable from any core points.
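A minimal sketch of DBSCAN-based outlier detection with scikit-learn, on made-up points with one obvious outlier:

import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 1], [1.1, 1], [0.9, 1.1], [8, 8]])  # last point is isolated
db = DBSCAN(eps=0.5, min_samples=2).fit(X)
print(db.labels_)                 # noise points are labeled -1
outliers = X[db.labels_ == -1]
print(outliers)                   # [[8. 8.]]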
In Ward's linkage, at each step the two clusters whose merger produces the smallest increase in total within-cluster variance are merged. It aims to minimize the variance within each cluster and tends to produce compact, similarly sized clusters.
Clustering?
45. Explain Local Minima, Local Maxima, Global Minima and Global Maxima?
Global Maximum:
The point at which the function attains its highest value over its entire domain. It is the single highest point on the entire curve.
Global Minimum:
The point at which the function attains its lowest value over its entire domain. It is the single lowest point on the entire curve.
Local Maximum:
A point that is higher than all points in its immediate vicinity. Local maxima are not necessarily the absolute highest points in the entire function; they are just the highest points in their neighborhood.
Local Minimum:
A point that is lower than all points in its immediate vicinity; each valley of the curve is a local minimum.
Local minima are not necessarily the absolute lowest points in the entire function; they are just the lowest points in their immediate vicinity.
Neuron independence in the same layer: Yes / No.
Input Layer:
- The input layer is the first layer of the neural network, where the raw features of the data enter the network; it performs no computation of its own.
Hidden Layer :
- Hidden layers sit between the input and output layers.
- They are called "hidden" because their activations are not directly observed. Each hidden neuron computes a weighted sum of its inputs, applies an activation function, and passes the result to neurons in the next layer.
Output Layer:
- The output layer is the final layer of the neural network, where the model combines the representations learned in the hidden layers and generates the final output of the network.
In deep networks, the gradients that help adjust the network's weights during training can either become extremely small (vanishing) or extremely large (exploding).
When they vanish, it's like the early layers of the network can't really learn much from the data, especially if you're using saturating functions such as sigmoid or tanh that squash their inputs; training becomes very slow and sometimes impossible. When they explode, the updates become so large that training diverges.
Overfitting is like memorizing the training data too well. It's when the model learns the quirks and noise in the training data instead of the general patterns, so it performs poorly on new data.
Underfitting, on the other hand, is like not learning enough from the data: the model is too simple to capture the underlying patterns.
Finding the right balance between these is really important for good generalization. We use tricks like regularization, dropout, early stopping, and adding more training data to help with this. These tricks help prevent the model from getting too complex or too simple, and make sure it learns the right patterns.
Activation functions introduce non-linearity, which allows the network to learn nonlinear patterns.
Without them, a stack of layers would collapse into a single linear mapping, unable to capture complex patterns and relationships within the data. This is crucial for real-world tasks. Some activation functions also ensure that the output falls within a specific range. For example, the sigmoid activation function squashes the output to the range [0, 1] and tanh to the range [-1, 1]; however, such saturating functions can cause vanishing gradients.
Common activation functions:
Sigmoid: Maps any input to (0, 1); smooth and differentiable, which makes it compatible with backpropagation.
Tanh: Maps any input to (-1, 1); a zero-centered variant of the sigmoid function.
ReLU (Rectified Linear Unit): Linear for positive values and zero for negative values; cheap to compute and typically gives faster convergence.
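A minimal sketch of these three functions in NumPy:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))   # output in (0, 1)

def tanh(x):
    return np.tanh(x)             # output in (-1, 1)

def relu(x):
    return np.maximum(0, x)       # linear for positive values, zero otherwise

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))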
Initialization:
The first step is to initialize the weights and biases of the neural network, typically with small random values for the weights and zeros for the biases.
Forward Propagation:
With the weights and biases initialized, the training process begins by feeding the input data through the network layer by layer, from the input layer to the output layer.
At each layer, the input is transformed using the layer's weights and biases, followed by an activation function that introduces non-linearity.
The output of each layer becomes the input for the next layer, and the final layer produces the network's predicted outputs.
Loss Computation:
Once the predictions are obtained, the next step is to compute the loss or error between these predictions and the actual target values, using, for example, mean squared error for regression or cross-entropy for classification tasks.
Backpropagation:
Backpropagation computes the gradient of the loss function with respect to the weights and biases of the network. The error is propagated backward through the network, layer by layer, using the chain rule to compute the gradients. Finally, the gradients are used to update the weights and biases of the network via an optimizer (e.g., SGD, Adam, or RMSprop).
Parameter Update:
Once the gradients are computed, the weights and biases of the network are updated in the direction that decreases the loss, and the cycle of forward pass, loss computation, backpropagation, and update repeats over the training data.
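A minimal NumPy sketch of one full training step (forward pass, loss, backpropagation, parameter update) for a tiny one-layer network; the data and learning rate are made-up:

import numpy as np

X = np.array([[0.5, 1.0], [1.5, 2.0]])   # two samples, two features
y = np.array([[1.0], [2.0]])             # targets
W = np.random.randn(2, 1) * 0.01         # small random weights
b = np.zeros((1,))                       # zero biases
lr = 0.1                                 # learning rate

# Forward propagation (a single linear layer; an activation could follow)
y_hat = X @ W + b

# Loss computation (mean squared error)
loss = np.mean((y_hat - y) ** 2)

# Backpropagation: gradients of the loss w.r.t. W and b via the chain rule
grad_out = 2 * (y_hat - y) / len(X)
grad_W = X.T @ grad_out
grad_b = grad_out.sum(axis=0)

# Parameter update (one gradient descent step)
W -= lr * grad_W
b -= lr * grad_b
print(loss)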
What are Learning Rate and Gradient Descent with respect to training an Artificial Neuron?
Learning Rate:
The learning rate is a hyperparameter that controls the size of the step taken when updating the weights of a neural network; too large a value can overshoot the minimum, while too small a value makes training slow.
Gradient Descent:
Gradient descent is an optimization algorithm that iteratively moves the parameters in the direction of the negative gradient of the loss function in order to minimize it.
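A minimal sketch of gradient descent minimizing the one-dimensional function f(w) = (w − 3)², with a made-up learning rate:

w = 0.0
lr = 0.1   # learning rate controls the step size

for step in range(50):
    grad = 2 * (w - 3)   # derivative of (w - 3)^2
    w -= lr * grad       # move opposite the gradient
print(w)   # converges toward the minimum at w = 3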