ML & DL Notes
What is machine learning?
• ML can find patterns and insights in large datasets that might be difficult for
humans to discover.
Types of Machine Learning
Regression and Classification
Regression
• Regression algorithms predict continuous values based on input features. The
output labels in regression are continuous values, such as stock prices and
housing prices.
• Clustering algorithms group similar data points together based on their characteristics.
The goal is to identify groups, or clusters, of data points that are similar to each other,
while being distinct from other groups. Some popular clustering algorithms include K-
means, Hierarchical clustering, and DBSCAN.
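The clustering idea above can be sketched with a minimal 1-D K-means loop (pure Python; a real project would use a library implementation such as scikit-learn's KMeans). The data points and starting centroids are made-up illustrations:

```python
# Minimal 1-D K-means sketch: alternate between assigning points to the
# nearest centroid and moving each centroid to the mean of its points.
def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        clusters = {c: [] for c in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        centroids = [sum(pts) / len(pts) if pts else centroids[c]
                     for c, pts in clusters.items()]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
```

The two centroids settle near the two obvious groups (around 1 and around 9.5), each cluster similar internally and distinct from the other.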
• In model-free reinforcement learning, the agent learns a policy directly from experience
without explicitly building a model of the environment. The agent interacts with the
environment and updates its policy based on the rewards it receives. Some popular
model-free reinforcement learning algorithms include Q-Learning, SARSA, and Deep
Reinforcement Learning.
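The model-free idea can be sketched with tabular Q-learning on a toy 1-D corridor (states 0 to 3, reward 1 only on reaching state 3). The environment and the hyperparameters are illustrative assumptions, not from the notes:

```python
import random

# Toy Q-learning sketch: the agent never builds a model of the corridor;
# it only updates Q(s, a) from the rewards it experiences.
N_STATES, ACTIONS = 4, [-1, +1]          # move left / move right
alpha, gamma, epsilon = 0.5, 0.9, 0.1    # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for _ in range(500):                     # episodes
    s = 0
    while s != 3:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == 3 else 0.0
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The greedy policy learned from experience prefers moving right everywhere.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(3)}
```

The update rule in the loop is the core of Q-learning; SARSA would use the action actually taken in `s2` instead of the max.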
Regression Algorithms
Linear regression is one of the simplest and most widely used statistical
models. It assumes that there is a linear relationship between the
independent and dependent variables, meaning that the change in the
dependent variable is proportional to the change in the independent variables.
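For a single feature, that linear relationship can be fitted with the ordinary least-squares closed form. The data points below are made up so that the fit is exact:

```python
# Simple (one-feature) linear regression via the least-squares closed form.
def fit_line(xs, ys):
    """Return slope and intercept minimizing squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Data generated from y = 2x + 1, so the fit recovers slope 2, intercept 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```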
• Logistic regression maps an input to a value between 0 and 1 using the logistic
(sigmoid) function. For example, with two classes, Class 0 and Class 1: if the
value of the logistic function for an input is greater than 0.5 (the threshold
value), the input belongs to Class 1; otherwise it belongs to Class 0. It is
referred to as regression because it is an extension of linear regression, but it
is mainly used for classification problems.
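The threshold rule above can be sketched directly. The weight and bias here are illustrative fixed values, not learned parameters:

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(x, w=2.0, b=-1.0, threshold=0.5):
    # z = w*x + b is the linear-regression part; sigmoid turns it into
    # a probability-like score, and the threshold picks the class.
    return 1 if sigmoid(w * x + b) > threshold else 0
```

With these made-up parameters, `predict_class(1.0)` lands above the 0.5 threshold (Class 1) and `predict_class(0.0)` below it (Class 0).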
Regression Metrics
• Mean Absolute Error (MAE), for example, measures the average absolute difference
between the predicted and the actual target values within a dataset. It gives a
sense of how far off the predictions are from the actual values, with a larger
value indicating a worse fit.
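A minimal sketch of Mean Absolute Error, a common regression metric; the predictions and targets below are made-up numbers:

```python
# MAE: average of the absolute differences between targets and predictions.
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

mae = mean_absolute_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])  # (0.5 + 0 + 2) / 3
```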
• Decision trees can be used for both classification and regression problems.
Decision Tree Terminologies
Root Node: The top node of the tree, representing the initial decision or feature
from which the tree starts branching out.
Internal Nodes (Decision Nodes): Nodes that make decisions based on the
values of specific attributes. These nodes have branches that lead to other nodes.
Leaf Nodes (Terminal Nodes): The end points of the branches, where final
decisions or predictions are made. Leaf nodes do not have further branches.
Branches (Edges): The connections between nodes that represent the decision
path taken based on certain conditions.
Splitting: The process of dividing a node into two or more sub-nodes based on a
decision rule, like selecting a feature and a threshold to create subsets of data.
Parent Node: A node that splits into child nodes. It is the original node from
which a split starts.
Decision Criterion: The rule or condition used to split data at a decision node.
This involves comparing feature values against a threshold.
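A single decision node's criterion can be sketched as a feature-vs-threshold comparison, scored with Gini impurity (one common splitting criterion; the data and threshold are illustrative):

```python
# One decision-node split: rows go left or right of a threshold, and
# Gini impurity measures how pure each side's labels are.
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 = pure)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split(rows, feature, threshold):
    left = [r for r in rows if r[feature] <= threshold]
    right = [r for r in rows if r[feature] > threshold]
    return left, right

rows = [{"age": 25, "label": 0}, {"age": 30, "label": 0},
        {"age": 45, "label": 1}, {"age": 50, "label": 1}]
left, right = split(rows, "age", threshold=35)
purity = (gini([r["label"] for r in left]),
          gini([r["label"] for r in right]))
```

Here `age <= 35` separates the classes perfectly, so both child nodes have Gini impurity 0; tree training searches features and thresholds for exactly such splits.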
Random Forest
Creating Multiple Trees: During the training phase, the algorithm creates many
decision trees.
Making Predictions:
For Classification: The algorithm takes a vote from all the trees. The class that
gets the most votes is the final prediction.
For Regression: The algorithm averages the predictions from all the trees to get
the final result.
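The two combination rules above can be sketched with mocked per-tree predictions (a real forest would come from a library such as scikit-learn):

```python
from collections import Counter

def forest_classify(tree_votes):
    """Classification: majority vote across all trees."""
    return Counter(tree_votes).most_common(1)[0][0]

def forest_regress(tree_preds):
    """Regression: average the trees' predictions."""
    return sum(tree_preds) / len(tree_preds)

label = forest_classify(["cat", "dog", "cat", "cat", "dog"])
value = forest_regress([2.0, 2.4, 1.6])
```

Three of the five mocked trees vote "cat", so the vote picks "cat"; the regression case simply averages to 2.0.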
Random Forest
• Handles Complex Data: Works well with large datasets and many features.
• Versatile: Performs well across different environments and problem types.
Advantages of Random forest
Robust to Noise: Random Forests handle noisy data better by averaging results from
various trees.
Handles High Dimensions: Works well with many features by using random subsets for
each tree.
Stability: More stable, with consistent predictions despite slight data changes.
Classification Accuracy
• It works great if there are an equal number of samples for each class. For
example, we have a 90% sample of class A and a 10% sample of class B in our
training set.
• Then a model that predicts class A for every sample will achieve 90% accuracy
on the training set. If we test the same model on a test set with 60% of samples
from class A and 40% from class B, the accuracy falls to 60%.
• Classification accuracy is useful, but it can give a false sense of high
performance: the probability of misclassifying minority-class samples is very
high.
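The imbalance example above can be reproduced directly: a model that always predicts class "A" scores well on the skewed training set but drops on the more balanced test set:

```python
# Accuracy = fraction of predictions matching the true labels.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

train = ["A"] * 90 + ["B"] * 10   # 90% class A, 10% class B
test = ["A"] * 60 + ["B"] * 40    # 60% class A, 40% class B

always_a = lambda ys: ["A"] * len(ys)   # degenerate "majority" model
train_acc = accuracy(train, always_a(train))   # 0.9
test_acc = accuracy(test, always_a(test))      # 0.6
```

Every class-B sample is misclassified in both sets, which is exactly the failure that accuracy alone hides.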
Confusion Matrix
• The matrix is a table with four different combinations of predicted and actual
values.
The matrix displays the number of instances produced by the model on the test data.
• True positives (TP): occur when the model accurately predicts a positive data point.
• True negatives (TN): occur when the model accurately predicts a negative data point.
• False positives (FP): occur when the model predicts a positive data point incorrectly.
• False negatives (FN): occur when the model predicts a negative data point incorrectly.
• Accuracy is calculated from the values on the main diagonal of the matrix, i.e.
Accuracy = (TP + TN) / (TP + TN + FP + FN).
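The four cells and the diagonal-based accuracy can be computed directly; the labels below are made up (1 = positive, 0 = negative):

```python
# Count the four confusion-matrix cells from paired labels/predictions.
def confusion_counts(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = (tp + tn) / (tp + tn + fp + fn)   # main diagonal over total
```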
• Precision is a measure of a model's performance that tells you how many of the
positive predictions made by the model are actually correct. It is calculated as
the number of true positive predictions divided by the total number of true
positive and false positive predictions:
Precision = TP / (TP + FP)
Recall
• Recall represents how well a model can identify actual positive cases. It measures
the ability of the model to find all the positive instances. It answers the question:
"Of all the actual positive instances, how many did the model correctly identify?"
Formula: Recall = TP / (TP + FN)
• The harmonic mean is the type of mean used when we have to find an average
rate of change. It is calculated by taking the reciprocals of the given values
and dividing the number of terms by the sum of those reciprocals.
• While the regular (arithmetic) mean treats all values equally, the harmonic
mean gives much more weight to low values. This is why the F1 score, the
harmonic mean of precision and recall, is high only when both are high.
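The relationship between precision, recall, and the harmonic mean can be sketched with illustrative counts:

```python
# F1 = harmonic mean of precision and recall: 2 / (1/p + 1/r) = 2pr / (p + r).
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Perfect precision but poor recall still drags F1 down:
f1 = f1_score(tp=10, fp=0, fn=30)   # precision 1.0, recall 0.25, F1 0.4
```

An arithmetic mean of 1.0 and 0.25 would be 0.625; the harmonic mean's 0.4 shows how it penalizes the low value.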
ROC & AUC Curve
• ROC stands for Receiver Operating Characteristic, and the ROC curve is the
graphical representation of the effectiveness of a binary classification model.
It plots the true positive rate (TPR) against the false positive rate (FPR) at
different classification thresholds.
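Computing one (FPR, TPR) point per threshold is all a ROC curve needs; the scores and labels below are made up for illustration:

```python
# One ROC point: threshold the scores, then compute TPR and FPR.
def roc_point(y_true, scores, threshold):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, preds))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, preds))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, preds))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, preds))
    tpr = tp / (tp + fn)   # true positive rate (recall)
    fpr = fp / (fp + tn)   # false positive rate
    return tpr, fpr

y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
points = [roc_point(y_true, scores, t) for t in (0.2, 0.5, 0.7)]
```

Sweeping the threshold from high to low traces the curve from (0, 0) toward (1, 1); AUC is then the area under those points.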