
ML

Summary

Artificial Intelligence
The intelligence of machines/algorithms in predictive and prescriptive analysis.

Machine Learning
The ability of machine algorithms to learn without any predefined code.

Deep Learning
A type of machine learning in which Artificial Neural Networks are implemented.

Data Science
The science of deriving insights from a dataset (a group of data).

NLP
Algorithms to analyze and process human language, with its grammatical and literary ambiguities.

Cognitive Computing
Simulates the human brain to assist in solving complex problems.

“A neural network trained to recognize cancer on an MRI scan may achieve a higher success rate than a human doctor. This system is certainly a cognitive system but is not artificially intelligent.”
- Wikipedia

Types of Machine Learning


Supervised Learning
The provided dataset consists of input features and the corresponding outcome(s), called labels.

Unsupervised Learning
The dataset won't have any outcomes/labels.

Reinforcement Learning
Training an algorithm by rewarding every correct outcome and penalizing every incorrect outcome with negative points.

Mathematical Background

Fundamentals

Machine Learning Algorithms

Regression
Regression is the process of prediction in which the model is fed values to find an output that is not discrete (a finite number of classes, as in classification) but continuous within a range.

Linear Regression

Equation of LR:

$$\hat{y} = \beta_0 + \beta_1 x + e$$

where $\hat{y}$ is the prediction, $x$ is the feature vector, and $\beta_0$ and $\beta_1$ are estimators which get adjusted according to the data.

OLS (Ordinary Least Squares)

The error function is the sum of squared vertical distances between the line's prediction and the actual points.
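A minimal numpy sketch of the OLS closed form for a single feature (the data here is made up for illustration):

```python
import numpy as np

# Illustrative data: x is the feature, y the target
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# OLS closed form for simple linear regression:
# beta1 = cov(x, y) / var(x), beta0 = mean(y) - beta1 * mean(x)
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

y_hat = beta0 + beta1 * x           # predictions on the fitted line
sse = np.sum((y - y_hat) ** 2)      # the squared vertical error OLS minimizes
print(beta0, beta1, sse)
```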

SGD (Stochastic Gradient Descent)

Uses the gradient descent approximation algorithm, but instead of passing the entire dataset each iteration, a random sample from the set is passed.
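A rough sketch of SGD for the same linear model, updating on one random sample at a time (the synthetic data and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 3.0 * x + 1.0 + rng.normal(0, 1, 200)   # synthetic data: true beta1=3, beta0=1

beta0, beta1, lr = 0.0, 0.0, 0.01
for epoch in range(50):
    for i in rng.permutation(len(x)):        # one random sample per update
        err = (beta0 + beta1 * x[i]) - y[i]
        # gradients of the squared error for this single sample
        beta0 -= lr * err
        beta1 -= lr * err * x[i]
print(beta0, beta1)  # should land near (1, 3)
```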

Logistic Regression

Classification Algorithms
Naive Bayes
This algorithm is a direct application of Bayes' theorem and conditional probability.

For a sample space of $n$ mutually exclusive and exhaustive events $E = \{E_1, E_2, ..., E_n\}$:

$$P(E_i \mid A) = \frac{P(E_i) \cdot P(A \mid E_i)}{P(A)} \qquad \text{posterior} = (\text{prior} \cdot \text{likelihood}) \cdot \alpha$$

where $E_i$ is the event for which we find the probability and $A$ is the feature vector. Since $P(A)$ does not depend on the event $E_i$, it can be folded into a normalizing constant $\alpha$.
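A short sketch using scikit-learn's GaussianNB (one common Naive Bayes variant; the toy data is ours):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])  # feature vectors A
y = np.array([0, 0, 1, 1])                                       # classes / events E_i

clf = GaussianNB()
clf.fit(X, y)
print(clf.predict([[1.2, 1.9]]))        # class with the highest posterior
print(clf.predict_proba([[1.2, 1.9]]))  # normalized posteriors P(E_i | A)
```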

Support Vector Machines

It is a type of classification algorithm in which the sample space is divided into classes by hyperplanes.
Hyperplane: a space one dimension lower than the domain, e.g. for a 2D plane the hyperplane is a line, and for a 3D space the hyperplane is a 2D plane.

$$W = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

But for a classification there exist infinitely many possible hyperplanes. Thus we select the most optimal one, called the Maximal Margin Hyperplane (Optimal Hyperplane): the hyperplane with the maximum possible distance between the classes.
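A minimal sketch of a linear SVM via scikit-learn's SVC (the points are made up):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")           # fits the maximal margin hyperplane
clf.fit(X, y)
print(clf.coef_, clf.intercept_)     # the beta coefficients of the hyperplane
print(clf.predict([[4, 4]]))
```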

Decision Trees
Decision trees are a type of ML algorithm in which predictions are made by a tree of conditional statements with a flowchart-like structure.

(Figure: an example decision tree. Courtesy: datacamp.com)

Gini Impurity Index

$$gini(D) = 1 - \sum_{i=1}^{k} p_i^2$$

The Gini impurity index is a metric used in building decision trees for selecting the best feature on which the model splits.

Gini index for the decision tree, weighted across the branches of a split on attribute A:

$$gini_A(D) = \sum_{i=1}^{n} \frac{n_i}{n} \, gini(D_i)$$

(Source: learndatasci.com)

Entropy

$$E = -\sum_{i=1}^{n} p_i \log p_i$$
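Both impurity measures in a few lines of numpy (the function names and sample node are ours):

```python
import numpy as np

def gini(labels):
    """Gini impurity of one node: 1 - sum(p_i^2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy of one node: -sum(p_i * log2(p_i))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

node = np.array(["yes", "yes", "no", "no", "no"])
print(gini(node))     # 1 - (0.4^2 + 0.6^2) = 0.48
print(entropy(node))  # about 0.971 bits
```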

Ensemble Learning
The application of more than one model to improve predictive accuracy.

Random Forest Algorithm

When more than a single decision tree is used, the results from the individual trees are aggregated (a majority vote for classification, an average for regression) and presented as the output for each input. This is called a Random Forest.

(Figure: a random forest. Source: IBM)
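A short sketch with scikit-learn's RandomForestClassifier on a stand-in dataset (iris, not from the notes):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)         # each tree trains on a bootstrap sample
print(forest.score(X_test, y_test))  # accuracy of the aggregated vote
```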

K Nearest Neighbors
The elements are classified based on the classes of the K neighbors nearest to them.

This example finds the K nearest neighbors of the point at (4,4), for K=9 (gray circle) and K=13 (blue circle).
For K=9, the black point is surrounded by 4 blues but 5 reds, thus it is classified red.
For K=13, the black point is surrounded by 7 blues but 6 reds, thus it is classified blue.

Note: while using KNN models, it is necessary to scale the features, since KNN is distance-based and features on larger scales would otherwise dominate the metric.
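A sketch of that scaling advice with a scikit-learn pipeline (the dataset is a stand-in):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling first keeps large-valued features from dominating the distance metric.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=9))
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```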

Unsupervised Learning Algorithms

K-means Clustering
It is a type of algorithm in which the sample space is clustered into k groups having k centroids (means).

At first, k random points are taken as the means; after that, each iteration performs:

Assignment: each observation is assigned to the cluster with the nearest centroid (mean).

Update: the centroids (means) are re-calculated.

This is re-iterated, and when the assignments no longer change, the algorithm is said to have converged.

(Figure: convergence of k-means. Source: Wikipedia)
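A bare-bones k-means sketch in numpy following the assignment/update loop above (a sketch only; it does not handle clusters that become empty):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # k random points as initial means
    for _ in range(iters):
        # Assignment: each observation goes to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update: recompute each mean; converged when nothing moves
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centroids, labels = kmeans(X, k=2)
print(centroids)
```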

Neural Networks
Mathematical Background
Probability and Statistics
Random variable

Normalization

Central Limit Theorem

Sampling

Hypotheses

Linear Algebra
Vector and Tensor Algebra

Transformation and Reduction of Matrices (a matrix: a group of vectors)

Eigenvectors and Principal Component Analysis

Algorithm Analysis
Time and Space Complexity

Fundamentals
Features and Labels
The input parameters from which the output is to be predicted are called features, while the output(s) to be predicted are called labels.
In the pictured example, each feature vector (row) consists of features such as BP, Insulin Amount, BMI, etc., and the Outcome (whether the person has diabetes or not) is the only label.

Bias-Variance Tradeoff

When a model is overfit (accuracy tending to 100%), the training error will tend to zero, but the model will fail to predict any other data accurately, such as the test set. To avoid overfitting due to high variance, the bias may be allowed to increase.

(Figure: the bias-variance tradeoff. Source: Javatpoint)

Just as entropy denotes the “degree of randomness” in chemistry, in machine learning entropy denotes the probability of detecting an output other than the expected output (impurity).

Confusion Matrix

The confusion matrix is a matrix of the different possible situations when evaluating a trained model.
Example: for a binary classification as shown in the figure:

TP : True Positive (the output is predicted true, and it is actually true)

TN : True Negative (the output is predicted false, and it is actually false)

FP : False Positive (the output is predicted true, but it is actually false) (Type 1 Error)

FN : False Negative (the output is predicted false, but it is actually true) (Type 2 Error)

(Figure: a binary confusion matrix of a single boolean outcome.)
Precision

$$Precision = \frac{TP}{TP + FP}$$

Recall

$$Recall = \frac{TP}{TP + FN}$$

Evaluation Metrics
Precision, Recall, Accuracy, F1 Score.

F1 Score:
It is the harmonic mean of precision and recall:

$$F_1 = \frac{2}{\frac{1}{precision} + \frac{1}{recall}}$$

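The metrics above, computed from the TP/TN/FP/FN counts in plain numpy (the toy predictions are ours):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))
fp = np.sum((y_pred == 1) & (y_true == 0))   # Type 1 error
fn = np.sum((y_pred == 0) & (y_true == 1))   # Type 2 error

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 / (1 / precision + 1 / recall)        # harmonic mean of the two
accuracy = (tp + tn) / len(y_true)
print(precision, recall, f1, accuracy)
```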
Entropy
Entropy is a measure of randomness/impurity in a sample. A model with minimum entropy in classification will have higher accuracy.
Neural Networks
The Perceptron Model
Similar to the biological model of a neuron, a perceptron accepts inputs from the features with weights $w_i$ and maps them to the output through an input function $Z$ ($Z = wx + b$) and an activation function $f$ that limits their values.
The values of $w$ and $b$ in the linear function are adjusted as each data point from the dataset is fed in.
Bias (b): sometimes the weighted inputs $wx$ can be zero; in that case a constant offset bias $b$ is added to the linear function to balance the overall function. $b$ maintains a threshold such that only when $wx$ crosses the threshold does it affect the equation.
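A minimal perceptron sketch with a step activation, trained on the AND function (the learning rate and data are illustrative):

```python
import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=20):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = np.dot(w, xi) + b            # input function Z = wx + b
            pred = 1 if z > 0 else 0         # step activation f
            # adjust w and b whenever the prediction is wrong
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return w, b

# Toy data: learn the logical AND function (linearly separable)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
print(w, b)  # weights and bias after training
```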

Multi Layer Perceptron

Hidden Layers:
There may be more than one hidden abstract layer between the input and output layers in complex machine learning models. On feeding each data point, the mappings get adjusted, altering the coefficients of each mapping and the bias.

Cost Function (loss function)
It is the average, over all the input vectors, of the error between the actual value and the predicted value. The most common loss functions are:

MSE (Mean Squared Error)

$$C = \frac{1}{2n} \sum_x \lVert y(x) - a^L(x) \rVert^2$$

Cross Entropy Loss (Log Loss)

$$logloss = -\frac{1}{N} \sum_{i}^{N} y_i \log(p_i)$$

The cost function depends on the weights W and biases B of the neural network, as well as the input X and output vectors.
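Both losses as small numpy functions (the log loss here is the binary form, which adds a (1 - y) term to the multiclass expression above):

```python
import numpy as np

def mse_cost(y_true, y_pred):
    """C = 1/(2n) * sum ||y - a||^2, matching the form above."""
    return np.sum((y_true - y_pred) ** 2) / (2 * len(y_true))

def log_loss(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy; p is clipped so log(0) never occurs."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1.0, 0.0, 1.0, 1.0])
p = np.array([0.9, 0.2, 0.8, 0.6])
print(mse_cost(y, p), log_loss(y, p))
```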

Gradient Descent
It is a first-order iterative algorithm for finding the minimum of a function. Here it is used to find the minimum of the cost function by going down the direction of the slope on each iteration: the current point is moved to another point along the slope (here, the negative gradient).

Learning rate (Step Size): the amount which controls how far to leap from a point. Too high a learning rate can overshoot the minimum, while too low a learning rate will take more time to converge.

In most machine learning models, the algorithms involve complex functions whose minima are difficult to find analytically. In such cases GD helps.
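A tiny gradient descent sketch on a function whose gradient we know in closed form:

```python
def gradient_descent(grad, x0, lr=0.1, iters=100):
    """First-order iteration: step against the gradient each time."""
    x = x0
    for _ in range(iters):
        x = x - lr * grad(x)   # move along the negative gradient
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3)
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)  # converges near x = 3
```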

Backpropagation
The process of calculating the cost function's gradient backwards (i.e. from $a^L$ to $a^1$) through a series of gradients, according to the Leibniz chain rule.
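A compact sketch of backpropagation for a 2-layer sigmoid network on XOR. The layer sizes, seed, and learning rate are illustrative; the squared-error gradients follow the chain rule from the output layer back toward the input:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# XOR data: not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer (a^1), 4 units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer (a^L)

lr = 1.0
for _ in range(10000):
    # forward pass
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    # backward pass: chain rule from a^L back toward a^1
    d2 = (a2 - y) * a2 * (1 - a2)          # dC/dz2 for the squared error
    d1 = (d2 @ W2.T) * a1 * (1 - a1)       # propagate the gradient through layer 2
    W2 -= lr * a1.T @ d2
    b2 -= lr * d2.sum(axis=0)
    W1 -= lr * X.T @ d1
    b1 -= lr * d1.sum(axis=0)

a2 = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(a2.round(2).ravel())  # should approach [0, 1, 1, 0] (depends on the random init)
```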

Summary

Mathematical Background
Algorithm Analysis and Asymptotic Notations

Complexities of machine learning algorithms, analysis and benchmarking

Probability Distribution and Statistics

Probabilistic Models, Distributions (mainly Normal), Central limit theorem and sampling, Confusion Matrix and Hypotheses.

Vectors, Matrices and Linear Algebra

Vector Algebra and products, Matrix normalization and reduction, Vector Spaces.

Theory
Supervised Learning

Regression

Linear

Logistic

Classification

Unsupervised Learning

Reinforcement Learning

Algorithms
Classification

Linear

Support Vector Machines

Logistic Regression

Non Linear

Naive Bayes

K Nearest Neighbors

Decision Trees

Random Forest

K Means Clustering

Neural Networks

Linear Regression

OLS

SGD

Backpropagation

Gradient Descent

Concepts
Probabilistic Model

Loss Function (Bias)

Cross Validation

Train-Test split

Evaluation Metrics

Accuracy

Log Loss

Gini Coefficient

MSE, RMSE

F1 score - Precision and Recall

Bias-Variance Tradeoff

Entropy

Backpropagation - Gradient Descent

Overfit and Underfit

Why not 100% accuracy

Activation Function

Perceptron Model
