ML Fundamentals by Bitspace
Summary
Artificial Intelligence
Intelligence of machines/algorithms in predictive and prescriptive analysis.

Machine Learning
The ability of machine algorithms to learn from data without any explicitly predefined code.

Deep Learning
A type of machine learning where the concept of Artificial Neural Networks is implemented.

Data Science
The science of getting insights from a dataset (a group of data).

NLP
Algorithms to analyze and process human language, which consists of grammatical and literary ambiguities.

Cognitive Computing
Simulates the human brain to assist in solving complex problems.

Unsupervised Learning
The dataset won't have any outcomes/labels; the algorithm must find structure on its own.

Reinforcement Learning
Training an algorithm by rewarding it for every correct outcome and penalizing it, through negative points, for every incorrect outcome.
Mathematical Background
Fundamentals
Regression
Regression is the process of prediction in which the model is fed values to produce an output that is continuous within a range, rather than discrete (a finite number of classes) as in classification.
Linear Regression
Equation of LR:

\bar{y} = \beta_0 + \beta_1 \bar{x} + e

where y is the prediction, x is the feature vector, \beta_0 and \beta_1 are the intercept and slope coefficients, and e is the error term.
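As a minimal sketch of how \beta_0 and \beta_1 can be estimated, here is the ordinary least squares closed form in plain Python; the data values and the function name `fit_linear` are made up for illustration:

```python
# Sketch: fitting y = b0 + b1*x by ordinary least squares (closed form).
def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # b1 = covariance(x, y) / variance(x); b0 places the line through the means
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
         sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    return b0, b1

b0, b1 = fit_linear([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(b0, b1)  # roughly b0 ≈ 0.15, b1 ≈ 1.94
```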
Logistic Regression
A linear classification algorithm: the linear combination of the features

W = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n

is passed through the sigmoid function to produce a class probability between 0 and 1.
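A minimal sketch of the prediction step, assuming hypothetical, already-fitted coefficients (`betas`, `b0`, and the input values below are made up for illustration):

```python
# Sketch: logistic regression prediction with hypothetical pre-fitted weights.
import math

def predict_proba(x, betas, b0):
    z = b0 + sum(b * xi for b, xi in zip(betas, x))  # z = b0 + b1*x1 + ... + bn*xn
    return 1 / (1 + math.exp(-z))                    # sigmoid squashes z into (0, 1)

p = predict_proba([2.0, 0.5], betas=[1.2, -0.7], b0=-1.0)
print(p, "positive" if p >= 0.5 else "negative")     # p ≈ 0.74 -> "positive"
```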
Classification Algorithms
Naive Bayes
This algorithm is just the basic application of Bayes' theorem and conditional probability:

P(E_i \mid A) = \frac{P(E_i) \cdot P(A \mid E_i)}{P(A)}

posterior = (prior \cdot likelihood) \cdot \alpha

where E_i is the event for which we find the probability and A is the feature vector. Since P(A) does not depend on the event E_i, it is absorbed into the normalizing constant \alpha.
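A minimal sketch of Naive Bayes over categorical features, assuming a hypothetical toy weather dataset; `train_nb`, `predict_nb`, and the Laplace smoothing used are illustrative choices, and the \alpha normalization is skipped since it does not change the argmax:

```python
# Sketch: Naive Bayes from raw counts (hypothetical categorical toy data).
from collections import Counter, defaultdict

def train_nb(X, y):
    """Estimate P(class) and P(feature value | class) from training rows."""
    priors = Counter(y)
    likelihoods = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            likelihoods[(j, label)][value] += 1
    return priors, likelihoods

def predict_nb(priors, likelihoods, row):
    """Pick the class maximizing P(Ei) * prod_j P(x_j | Ei); P(A) is ignored."""
    n = sum(priors.values())
    best, best_score = None, -1.0
    for label, count in priors.items():
        score = count / n
        for j, value in enumerate(row):
            counts = likelihoods[(j, label)]
            # Laplace smoothing so unseen values do not zero out the product
            score *= (counts[value] + 1) / (sum(counts.values()) + len(counts) + 1)
        if score > best_score:
            best, best_score = label, score
    return best

X = [["sunny", "hot"], ["rainy", "cool"], ["sunny", "cool"], ["rainy", "hot"]]
y = ["play", "stay", "play", "stay"]
priors, likelihoods = train_nb(X, y)
print(predict_nb(priors, likelihoods, ["sunny", "cool"]))  # -> "play"
```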
Decision Trees
Decision trees are a type of ML algorithm where predictions are made based on a tree of conditional statements, giving the model a flowchart-like structure.
Courtesy : datacamp.com
Gini Impurity Index

gini(D_i) = 1 - \sum_{i=1}^{k} p_i^2

Entropy

E = - \sum_{i=1}^{n} p_i \log p_i
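Both measures are straightforward to compute from the class proportions; a small sketch (using log base 2 for entropy, which is one common convention):

```python
# Sketch: Gini impurity and entropy for a list of class labels at a node.
import math
from collections import Counter

def gini(labels):
    """gini(D) = 1 - sum_i p_i^2 over the class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """E = -sum_i p_i * log2(p_i); 0 for a pure node, maximal for a uniform one."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

labels = ["yes", "yes", "no", "yes"]
print(gini(labels))     # 0.375
print(entropy(labels))  # ≈ 0.811
```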
Ensemble Learning
Application of more than one model to improve predictive accuracy.
When more than a single decision tree is used for classification, the results from the individual trees are aggregated for each input, and the majority vote (the average, in the case of regression) is presented as the output. This is called a Random Forest.
Source : IBM
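The aggregation step itself is simple; a sketch of the majority vote over hypothetical tree outputs for one input:

```python
# Sketch: random-forest aggregation -- majority vote across trees' predictions.
from collections import Counter

tree_predictions = ["cat", "dog", "cat"]  # outputs of three hypothetical trees
forest_output = Counter(tree_predictions).most_common(1)[0][0]
print(forest_output)  # "cat"
```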
K Nearest Neighbors
Elements are classified based on the classes of the training points nearest to them: the majority class among the k nearest neighbors is assigned.
Note : While using KNN models on a dataset, it is necessary to scale the features, since the algorithm decides by distance and unscaled features would dominate it.
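A minimal sketch of KNN with standardization, on made-up height/weight rows; `standardize`, `knn_predict`, the value of k, and the data are all illustrative assumptions:

```python
# Sketch: k-nearest neighbors with feature scaling, in pure Python.
import math
from collections import Counter

def standardize(rows):
    """Scale each feature to zero mean, unit variance (KNN is distance-based)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in rows], means, stds

def knn_predict(X, y, query, k=3):
    """Majority class among the k training points closest to the query."""
    dists = sorted((math.dist(row, query), label) for row, label in zip(X, y))
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

X = [[150, 50], [160, 55], [180, 80], [175, 77]]  # height (cm), weight (kg)
y = ["A", "A", "B", "B"]
Xs, means, stds = standardize(X)
query = [(v - m) / s for v, m, s in zip([170, 70], means, stds)]
print(knn_predict(Xs, y, query, k=3))  # -> "B"
```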
Neural Networks
Mathematical Background
Probability and Statistics
    Random variable
    Normalization
    Sampling
    Hypotheses
Linear Algebra
    Vector and Tensor Algebra
Algorithm Analysis
    Time and Space Complexity
Fundamentals
Features and Labels
The input parameters from which the output is to be predicted are called features, while the output(s) to be predicted are called labels.
Each feature vector (row) consists of features such as BP, insulin amount, BMI, etc., and the Outcome (whether the person has diabetes or not) is the only label.
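For instance, a toy diabetes-style table could be split into features X and labels y like this (all values are hypothetical):

```python
# Sketch: splitting a toy diabetes-style table into features X and label y.
rows = [
    # BP, insulin, BMI, outcome (1 = diabetic) -- hypothetical values
    [72, 94, 28.1, 1],
    [66,  0, 26.6, 0],
    [64, 88, 23.3, 0],
]
X = [row[:-1] for row in rows]  # feature vectors: BP, insulin, BMI
y = [row[-1] for row in rows]   # labels: the Outcome column
```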
Bias-Variance Tradeoff
When a model is overfit (training accuracy tending to 100%), the training error will tend to zero, but the model will fail to predict any other data accurately, such as the test set. To avoid overfitting due to high variance, the bias may be allowed to increase.
Source : Javatpoint
Confusion Matrix
Confusion Matrix is a matrix of the different possible outcomes when evaluating a model's predictions. Example: for a binary classification, as shown in the figure:

TP : True Positive (the output is predicted true, and it is actually true)
TN : True Negative (the output is predicted false, and it is actually false)
FP : False Positive (the output is predicted true, but it is actually false) (Type 1 Error)
FN : False Negative (the output is predicted false, but it is actually true) (Type 2 Error)
Precision

Precision = \frac{TP}{TP + FP}

Recall

Recall = \frac{TP}{TP + FN}
Evaluation Metrics
F1 Score : the harmonic mean of precision and recall.

F_1 = \frac{2}{\frac{1}{precision} + \frac{1}{recall}}
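A small sketch computing all three metrics from toy predicted/actual labels (`confusion_counts` and the label lists are made up for illustration):

```python
# Sketch: precision, recall, and F1 from binary predicted vs. actual labels.
def confusion_counts(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]
tp, fp, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)             # TP / (TP + FP)
recall = tp / (tp + fn)                # TP / (TP + FN)
f1 = 2 / (1 / precision + 1 / recall)  # harmonic mean of the two
print(precision, recall, f1)           # 0.75 0.75 0.75
```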
Entropy
Entropy is a measure of randomness/impurity in a sample. A model with minimum entropy in its classification will have higher accuracy.
Neural Networks
The Perceptron Model
Similar to the biological model of a neuron, a perceptron accepts inputs from the features, with weights w_i as the coefficients of each input mapping plus a bias; the weighted sum is then passed through an activation function to produce the output.
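A minimal sketch, assuming a step activation and hand-picked weights (here chosen to realize logical AND, purely for illustration, not a trained model):

```python
# Sketch of a perceptron: weighted sum of inputs plus bias, then a step activation.
def perceptron(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # z = w . x + b
    return 1 if z > 0 else 0                      # step activation

# Hypothetical weights realizing logical AND on binary inputs
w, b = [1.0, 1.0], -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(x, w, b))  # only [1, 1] fires
```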
Cost Function (loss function)
It is the average, over all the input vectors, of the error between the actual value and the predicted value. The most common loss functions are the quadratic cost

C = \frac{1}{2n} \sum_x ||y(x) - a^L(x)||^2

and the log loss

logloss = -\frac{1}{N} \sum_{i=1}^{N} y_i \log(p_i)

The cost function depends on the weights W and biases B of the neural network, as well as on the input X and the output vectors.
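Both losses are easy to sketch directly from the formulas above (toy values; the log-loss helper follows the document's form, where p_i is the probability the model assigned to the true class):

```python
# Sketch: quadratic cost and log loss on toy vectors.
import math

def quadratic_cost(y_true, y_pred):
    """C = 1/(2n) * sum_x ||y(x) - a(x)||^2, for scalar outputs here."""
    n = len(y_true)
    return sum((y - a) ** 2 for y, a in zip(y_true, y_pred)) / (2 * n)

def log_loss(p_true_class):
    """logloss = -(1/N) * sum_i log(p_i), with p_i the probability of
    the true class of sample i (the y_i * log(p_i) form, y one-hot)."""
    N = len(p_true_class)
    return -sum(math.log(p) for p in p_true_class) / N

print(quadratic_cost([1.0, 0.0, 1.0], [0.9, 0.2, 0.7]))  # 0.14 / 6 ≈ 0.023
print(log_loss([0.9, 0.8, 0.7]))                         # ≈ 0.228
```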
Gradient Descent
It is a first-order iterative algorithm for finding the minimum of a function. Here it is used to find the minimum of the cost function by going down the direction of the slope on each iteration: in each iteration, the point is moved to another point along the slope (here, the negative gradient).
Learning rate (step size) : the amount which controls how much to leap from a point on each iteration. Too high a learning rate may overshoot the minimum and fail to converge, while too low a learning rate will take more time.
In most machine learning models, the cost function is a complex function whose minimum may be difficult or impossible to find analytically. In that case gradient descent helps, by approaching the minimum numerically, step by step.
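A minimal sketch on a one-dimensional function whose gradient is known in closed form (the function, learning rate, and step count below are illustrative choices):

```python
# Sketch: gradient descent on f(x) = (x - 3)^2, whose gradient is 2*(x - 3);
# the minimum is at x = 3.
def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # move against the slope: x <- x - lr * f'(x)
    return x

print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))  # ≈ 3.0
```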
Backpropagation
The process of calculating the gradient of the cost function backwards (i.e. from a^L to a^1) through a series of gradients, according to the Leibniz chain rule.
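A minimal sketch of the chain rule walked backwards through a two-layer chain with one neuron per layer; the weights, input, and target are toy values, and the sigmoid/quadratic-cost choices match the formulas above:

```python
# Sketch: backpropagation through a one-neuron-per-layer, two-layer chain,
# with quadratic cost C = 0.5 * (a2 - y)^2 and sigmoid activations.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

x, y = 1.0, 0.0      # single input and target (toy values)
w1, b1 = 0.5, 0.0    # layer 1 parameters
w2, b2 = -0.3, 0.1   # layer 2 parameters

# Forward pass
z1 = w1 * x + b1
a1 = sigmoid(z1)
z2 = w2 * a1 + b2
a2 = sigmoid(z2)

# Backward pass: chain rule from a2 (the output, a^L) back to w1
dC_da2 = a2 - y             # dC/da2 for the quadratic cost
dC_dz2 = dC_da2 * a2 * (1 - a2)  # sigmoid derivative is a*(1-a)
dC_dw2 = dC_dz2 * a1        # gradient for w2
dC_da1 = dC_dz2 * w2        # propagate to the previous layer
dC_dz1 = dC_da1 * a1 * (1 - a1)
dC_dw1 = dC_dz1 * x         # gradient for w1
print(dC_dw1, dC_dw2)
```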
Summary
Mathematical Background
    Algorithm Analysis and Asymptotic Notations
Theory
    Supervised Learning
        Regression
            Linear
            Logistic
        Classification
    Unsupervised Learning
    Reinforcement Learning
Algorithms
    Classification
        Linear
            Logistic Regression
        Non Linear
            Naive Bayes
            K Nearest Neighbors
            Decision Trees
            Random Forest
            K Means Clustering
            Neural Networks
    Linear Regression
        OLS
        SGD
    Backpropagation
        Gradient Descent
Concepts
    Probabilistic Model
    Cross Validation
        Train-Test split
    Evaluation Metrics
        Accuracy
        Log Loss
        Gini Coefficient
        MSE, RMSE
    Bias-Variance Tradeoff
    Entropy
    Backpropagation - Gradient Descent
    Activation Function
    Perceptron Model