Machine Learning Basics

The document describes a 3-day workshop on machine learning concepts and applications held from September 16-18, 2021 at P.S.R. Engineering College in Sivakasi, India. The workshop was led by Dr. R. Meena Prakash and aimed to help participants understand fundamental machine learning terms, algorithms, and apply these techniques to engineering applications using Python. Over the course of multiple sessions, the workshop covered topics such as the history and applications of machine learning, probability theory, clustering, classification, regression algorithms, and dimensionality reduction techniques.


Workshop on “Machine Learning Concepts and Applications”

P.S.R. Engineering College
Sevalpatti, Sivakasi
16.9.21 – 18.9.21
Course Objectives
• To understand fundamental machine learning terms and algorithms
• To apply these techniques to develop solutions for problems related to engineering applications using Python

Session 1: Machine Learning Basics

Dr.R.Meena Prakash
Associate Professor/ECE
P.S.R.Engineering College
Sevalpatti, Sivakasi

Contents

• History of Machine Learning and Applications
• Review of Probability Theory
• Norms, Distances, Matrix operations, Eigen
Vectors, Principal Component Analysis
• Categories of Machine Learning
• Clustering Algorithms
• Classification Algorithms
• Regression Algorithms
• Dimensionality Reduction Techniques

History
• 1914 – First Chess Playing Machine
• 1925 – Radio controlled driverless car
• 1940s – Pitts and McCulloch – First Artificial
Neurons
• 1950 – Wiener – Cybernetics – Scientific
study of how humans, animals and
machines control and communicate with
each other
(Slide image: “Writer”, a little robot designed in Switzerland in 1770)
• 1951 – Minsky – First Neural Net Machine
• 1956 – Dartmouth Conference – The term
AI was coined
• 1960-1974 – Appearance of Expert systems –
Explicit rule based programs (Playing Chess,
Understanding natural language, etc.,)
• 1969 – Back Propagation – Bryson and Ho
• 1997 – IBM super computer Deep Blue beats
Kasparov in Chess
• Moore’s law – Speed and capability of computers
increase every couple of years
• 2000-2012 – Google, Internet, Big Data, GPUs
• 2012- Convolutional Neural Networks perform
well on ImageNet
Upcoming Applications of ML
• AI Chips – Playing high end games
• IoT and AI – Capturing data from cars using sensors and using the
collected data to decide on an insurance amount
• Automated ML – Applying ML to ML – Solving repeated tasks
through automation
• Personalized Medicines
• ML based assistants like Alexa
• ML based industrial equipment and machines – sensors are used with
these machines and the collected data are fed to ML models, giving
better performance and more efficient servicing schedules
• Surveillance
• Social credit systems
Artificial Intelligence, Machine Learning and Deep Learning

• Artificial intelligence is an effort to mimic human behavior or automate any task
• Machine learning involves building mathematical models to understand data.
ML systems are trained on the data and, once these models fit the data,
they can be used to predict unknown data.
• Deep learning performs this by a deep sequence of data transformations from
lower levels of abstraction to higher levels
Definition
• Machine learning involves mathematical models to understand data
• It involves learning from
data
• Once these models fit to
previously seen data, they
can be used to predict and
understand aspects of
newly observed data

Fundamentals of Machine Learning – Probability Theory

• Probability – Mathematical framework for representing uncertainty –
inherent randomness in the system, incomplete data, incomplete modeling
• Random experiment – Experiment that results in different outcomes
under similar conditions
Ex. throwing a die
• Sample Space – Set of all possible outcomes of a random experiment
Ex. {Satisfactory, Unsatisfactory}

• Probability Distribution – It tells us how likely
a random variable is to take each of its
possible states
• Discrete random variable – Has a finite range –
Ex. number of errors
• Continuous random variable – Has a real-number
interval for its range – Ex. pressure, humidity, voltage, etc.
• Mutually exclusive events are events that cannot
happen together.

Probability Mass Function and Probability Density Function

• A PMF describes the probability that a certain value will be generated
• The sum of all PMF values is equal to one

Ex. a value observed 2000 times out of 20000 samples has probability 2000/20000 = 0.1
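In symbols, the standard normalization conditions the slide refers to (written out here since the slide's figures did not survive):

$$\sum_{x} p_X(x) = 1 \quad \text{(PMF, discrete } X\text{)}, \qquad \int_{-\infty}^{\infty} f_X(x)\,dx = 1 \quad \text{(PDF, continuous } X\text{)}$$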

Mean, Variance, Covariance, Correlation
• Expectation gives mean value of the random variable given the
distribution
• Variance gives the variation from the expected value
• Covariance refers to the measure of how two random variables in a
data set will change together. 
• Correlation also informs about the degree to which the variables tend
to move together.

Ex. X = [10, 20, 30]
Mean = (10+20+30)/3 = 20
Variance = (100+0+100)/3 ≈ 66.7
SD = sqrt(66.7) ≈ 8.16
Covariance/correlation example: X = temperature, Y = humidity
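A minimal NumPy sketch of these quantities (the humidity values are made up for illustration):

import numpy as np

x = np.array([10.0, 20.0, 30.0])   # e.g. temperature readings
y = np.array([40.0, 55.0, 65.0])   # e.g. humidity readings (illustrative values)

mean = x.mean()                    # (10+20+30)/3 = 20
var = x.var()                      # population variance: (100+0+100)/3 ~= 66.7
std = x.std()                      # sqrt(66.7) ~= 8.16

cov = np.cov(x, y, bias=True)[0, 1]    # how x and y change together
corr = np.corrcoef(x, y)[0, 1]         # degree of co-movement, in [-1, 1]

print(mean, var, std, cov, corr)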
Conditional Probability, Bayes' Theorem

• Bayes' Theorem
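The slide's formulas were images; the standard statements they refer to are:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$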

• Two random variables X and Y are said to be
statistically independent if and only if
p(x, y) = p(x) p(y)

Tensor
• In machine learning algorithms, the input and
output are both represented as vectors
• Vector is a collection of numbers
A=[1 2 3 4]
• Scalar – Single number
• Vector – Array of numbers
• Matrix – 2D array of numbers
• Tensors – array of numbers with dimension
greater than 2
Vector Norms

Distances

Ex. X = [2 5 -3]; Y = [1 1 1]

L2 distance = sqrt((2-1)² + (5-1)² + (-3-1)²) = sqrt(1+16+16) = sqrt(33) ≈ 5.74

L1 distance = |2-1| + |5-1| + |-3-1| = 1 + 4 + 4 = 9
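A small NumPy sketch of the common vector norms and the distance example above:

import numpy as np

x = np.array([2.0, 5.0, -3.0])
y = np.array([1.0, 1.0, 1.0])

# Norms of x itself
l1_norm = np.linalg.norm(x, ord=1)         # |2| + |5| + |-3| = 10
l2_norm = np.linalg.norm(x)                # sqrt(4 + 25 + 9) = sqrt(38)
linf_norm = np.linalg.norm(x, ord=np.inf)  # max(|2|, |5|, |-3|) = 5

# Distances between x and y are norms of the difference
l1_dist = np.linalg.norm(x - y, ord=1)     # 1 + 4 + 4 = 9
l2_dist = np.linalg.norm(x - y)            # sqrt(33) ~= 5.74

print(l1_norm, l2_norm, linf_norm, l1_dist, l2_dist)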

Matrix operations
• Orthogonal matrix – Transpose is equal to inverse
• Diagonal matrix – Only diagonal entries are non-zero
• Symmetric matrix – Matrix is equal to its transpose
• Matrix determinant and invertibility

Matrix operations - Eigen Vector

Eigen decomposition
• All matrices can be thought of as a combination
of rotating and stretching vectors
• Eigen vectors for a matrix are special vectors that
only stretch under the action of the matrix
• Eigen values are the values by which eigen
vectors stretch
• Span of set of vectors
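A brief NumPy sketch of the eigendecomposition A v = λ v (the matrix here is an illustrative choice):

import numpy as np

# A symmetric 2x2 matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)  # columns of eigvecs are the eigenvectors

# Verify A v = lambda v for the first eigenpair
v, lam = eigvecs[:, 0], eigvals[0]
print(np.allclose(A @ v, lam * v))   # True: v only stretches under A
print(eigvals)                       # eigenvalues 3 and 1 (order not guaranteed)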

Singular Value Decomposition
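The slide's formula was an image; as a sketch, the SVD factors any matrix as A = U Σ Vᵀ, which NumPy computes directly:

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruct A from the factors: A = U @ diag(S) @ Vt
print(np.allclose(A, U @ np.diag(S) @ Vt))  # True
print(S)  # singular values, sorted in decreasing order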

Seven Steps in Machine Learning
• Gathering data
• Preparing data
• Choosing a model or algorithm
• Training
• Evaluation
• Hyperparameter tuning – number of layers,
maximum depth allowed for a decision tree,
learning rate, etc.
• Prediction
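A compact scikit-learn sketch of these steps on a built-in dataset (assuming scikit-learn is installed; the model choice is arbitrary):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# 1-2. Gather and prepare data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 3-4. Choose a model and train it
model = KNeighborsClassifier(n_neighbors=5)  # n_neighbors is a hyperparameter
model.fit(X_train, y_train)

# 5-6. Evaluate; tuning would repeat this loop over hyperparameter values
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 7. Predict on new, unseen data
print("prediction:", model.predict(X_test[:1]))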
Categories of Machine Learning

• Unsupervised learning is a type of self-
organized learning that helps find previously
unknown patterns in data set without pre-
existing labels.
• In Supervised learning, a model can learn from
the input provided as a labeled dataset. The
trained model is then used for prediction.

• Reinforcement Learning (RL) enables an agent to
learn in an interactive environment by trial and error,
using feedback from its own actions and experiences.
• In robotics and industrial automation, RL is used to
enable the robot to create an efficient adaptive
control system for itself, which learns from its own
experience and behavior
Supervised Learning
• Supervised Learning
models the relationship
between measured
features of some data
and some label
associated with data.
• Once this model is
determined, it can be
used to apply labels to
new, unknown data.
Classification
• Process of finding a
function which helps in
dividing the dataset into
classes based on different
parameters.
• In Classification, the labels
are discrete quantities
• Example – Email Spam
Detection

• It is 2-dimensional data with 2
features for each point,
represented by (x, y).
• Two class labels are blue and
red
• From these features and
labels, a model will be
created that decides a new
point should be labeled blue
or red.
• The best model for this
problem will be a straight line
separating the classes
• The model parameters are
the particular numbers
describing the location and
orientation of that line for
the data
• The optimal values for these
parameters are learned from
the data which is called
training the model
• The machine learning
approach can generalize to
much larger data sets in
many more dimensions
Regression
(Slide image: prediction of the daily closing price of the stock market using data from previous days)
• Regression is a process of finding the correlations between dependent and independent variables.
• It helps in predicting continuous variables such as market trends, weather forecasts, etc.
• Here, the labels are continuous quantities.
Classification and Regression

Clustering
Clustering is the process
of grouping similar
entities together. The
goal of this unsupervised
machine learning
technique is to find
similarities in the data
and group similar
data points together.

K-Means Clustering
Ex. K = 2, X = [20, 40, 32, 96]
Initial centres: c1 = 20, c2 = 40

1st iteration – assign each point to its nearest centre:
Class 1: 20
Class 2: 40, 32, 96 (32 is nearer to 40 than to 20)

New cluster centres: c1 = 20, c2 = (40+32+96)/3 = 56

2nd iteration – reassign with the new centres:
Class 1: 20, 32 (32 is now nearer to 20 than to 56)
Class 2: 40, 96

New centres: c1 = 26, c2 = 68; repeat until the assignments stop changing.
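A minimal sketch of this example with scikit-learn, starting from the slide's initial centres:

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[20.0], [40.0], [32.0], [96.0]])  # 1-D points as a column vector

# Start from the slide's initial centres c1=20, c2=40 and iterate to convergence
init = np.array([[20.0], [40.0]])
km = KMeans(n_clusters=2, init=init, n_init=1).fit(X)

print(km.labels_)           # cluster index of each point
print(km.cluster_centers_)  # final centres after convergence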

Gaussian Mixture Models

EM Algorithm for GMM

Repeat until convergence: E-step – compute each point's responsibilities under the current Gaussian components; M-step – update the component means, covariances, and mixing weights.
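A brief sketch using scikit-learn's GaussianMixture, which runs this EM loop internally (the data here is synthetic, for illustration):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from two Gaussians
X = np.concatenate([rng.normal(0, 1, 200), rng.normal(6, 1.5, 300)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # EM until convergence

print(gmm.means_.ravel())  # estimated component means (~0 and ~6)
print(gmm.weights_)        # mixing weights (~0.4 and ~0.6)
print(gmm.predict(X[:5]))  # hard cluster assignments for the first points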


Dimensionality Reduction
• The higher the number of features, the harder it is
to work with the data
• Many of these features are correlated, and hence
redundant
• The two components of dimensionality reduction
are feature selection and feature extraction
• Feature selection refers to choosing a subset of the features
• Feature extraction refers to reducing data in a
high-dimensional space to a lower-dimensional
space

• Examples of high dimensional datasets –
videos, emails, satellite observations etc.,
• The unnecessary and noisy dimensions should
be removed, keeping the informative ones.
• The goal of PCA is to project input data onto a
lower dimensional subspace, preserving as
much variance within the data as possible.

Principal Component Analysis (PCA)
• Ex. classification of emails as spam or not
• Construct a mathematical representation of
each email as a bag-of-words vector
• Each entry in the bag-of-words vector is the number
of times the corresponding word appears in the
email (0 if it does not appear)
• Consider for an email the bag-of-words
vector (x1, x2, ..., xm)

• Not all dimensions (words) of the vectors are
informative for spam/not-spam classification.
• Better features – lottery, credit, pay – than dog,
cat, tree
• Hence, to reduce the dimension, PCA is used:
• Construct an m×m covariance matrix from the
sample (x1, x2, ..., xm)
• Compute its eigenvectors and eigenvalues
• Project the vectors onto the eigenvectors
corresponding to the top p eigenvalues
• Hence the dimension is reduced to p.
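A short sketch of these steps with scikit-learn (the bag-of-words matrix here is random, standing in for real email word counts):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 100 'emails' x 500 'words': random counts standing in for a bag-of-words matrix
X = rng.poisson(1.0, size=(100, 500)).astype(float)

# PCA internally centres the data, forms the covariance matrix,
# and projects onto the eigenvectors with the top p eigenvalues
p = 10
pca = PCA(n_components=p)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (100, 10): dimension reduced to p
print(pca.explained_variance_ratio_.sum())  # fraction of variance preserved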
• The eigenvectors have
the special property
that they point
towards the directions
of the most variance
within the data
• The first eigenvector points
towards the direction of
highest variance; the second
points towards the highest
variance in the subspace
orthogonal to the first

• Projecting onto the top eigenvectors
preserves maximum variance
• Capturing more variance
means capturing more
information to analyze
• Plot the eigenvalues and
find the point where the
eigenvalues start to decay
exponentially
• (Slide plots: the top 3 eigenvalues
should be selected for the first
example dataset and the top 7 for the second)
• After the low-dimensional PCA projection of the
bag-of-words vectors is computed, classification
algorithms may be used to classify emails as
spam/not spam

• Missing values ratio – Data columns with a
number of missing values greater than a given
threshold can be removed
• Low variance filter – All data columns with
variance lower than a given threshold are
removed. Normalization is required before
applying this technique.
• High correlation filter – Pairs of columns with a
correlation coefficient higher than a threshold
are reduced to only one.
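A rough pandas/NumPy sketch of these three filters; the thresholds are illustrative and the function assumes numeric columns:

import numpy as np
import pandas as pd

def filter_columns(df, missing_thresh=0.4, var_thresh=0.01, corr_thresh=0.95):
    # Missing values ratio: drop columns with too many missing entries
    df = df.loc[:, df.isna().mean() <= missing_thresh]

    # Low variance filter: normalize to [0, 1] first, then drop near-constant columns
    norm = (df - df.min()) / (df.max() - df.min())
    df = df.loc[:, norm.var() > var_thresh]

    # High correlation filter: keep only one column of each highly correlated pair
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    drop = [c for c in df.columns if (upper[c] > corr_thresh).any()]
    return df.drop(columns=drop)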

• Backward Feature Elimination –
At a given iteration, the
selected classification algorithm
is trained on n input features.
Then we remove one input
feature at a time and train the
same model on n-1 input
features, n times. The process
is then repeated using n-2
features, and so on. The smallest
number of features is selected
based on the maximum
tolerable error rate.
• Forward Feature Selection – This is the inverse
of Backward Feature Elimination.
We start with 1 feature only, progressively
adding 1 feature at a time, i.e., the feature that
produces the highest increase in performance.
• Both methods are computationally
expensive. They are applicable to data with a low
number of input columns.
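A hedged sketch with scikit-learn's SequentialFeatureSelector, which implements both directions (greedy, hence computationally expensive, as noted above; dataset and model are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# direction='forward' adds one feature at a time;
# direction='backward' starts from all features and removes one at a time
selector = SequentialFeatureSelector(model, n_features_to_select=5, direction="forward")
selector.fit(X, y)

print(selector.get_support())  # boolean mask of the selected features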

Factor Analysis
• It helps in data
interpretation by
reducing the
number of variables.
It extracts the maximum
common variance
from all variables
and puts them into a
common score.
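A brief sketch with scikit-learn's FactorAnalysis (random data as a stand-in for real observations):

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))       # 200 samples, 10 observed variables

fa = FactorAnalysis(n_components=3)  # explain the data with 3 latent factors
scores = fa.fit_transform(X)         # common factor scores per sample

print(scores.shape)          # (200, 3)
print(fa.components_.shape)  # (3, 10): loadings of each variable on each factor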

Random Forest Algorithm
• Used for dimensionality reduction and also for
classification and regression

• For the decision tree to make a prediction for
tomorrow, we must give it the same data it
used during training (the features) and it gives
us an estimate based on the structure it has
learned
• Before training, we are much ‘smarter’ than the tree
• After enough training with quality data, the
decision tree will be able to predict better than
we can
• Inputs and outputs are only numbers
• A flow chart of questions leading to a prediction
• Different persons will answer differently to a
question
• The principle behind random forest is to
combine many decision trees into a single
model.
• Each decision tree in the random forest
considers a random subset of features and has
access to a random set of the training data points.
• This increases diversity in the forest leading to
more robust predictions and hence the name
random forest

• The idea of a decision tree is to divide the data
set into smaller data sets based on the
descriptive features until a small enough set is
reached that contains data points that fall
under one label.
• Each feature of the data set becomes a
root/parent node, and the leaf/child nodes
represent the outcomes.
• Pruning is a method of limiting tree depth to
reduce overfitting in decision trees – set a
maximum tree depth, set a maximum number of features.
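A minimal scikit-learn sketch tying these ideas together: a depth-limited random forest whose feature importances can also be used for dimensionality reduction (the dataset choice is illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree sees a bootstrap sample of rows and a random subset of features;
# max_depth acts as pruning to reduce overfitting
forest = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0)
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))
# Feature importances can rank features for dimensionality reduction
print("top feature index:", forest.feature_importances_.argmax())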
Support Vector Machine Classifier
• SVM is a kernel-based
supervised learning
algorithm used as a
classification tool.
• The training algorithm of
SVM maximizes the margin
between the training data
and the class boundary.
• The resulting decision
function depends only on
the training data called
support vectors, which
are closest to the decision
boundary.
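A short sketch with scikit-learn's SVC (the kernel and data are illustrative choices):

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated classes in 2-D
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0)  # maximize the margin with a linear kernel
clf.fit(X, y)

# The decision function depends only on the support vectors
print("number of support vectors:", clf.support_vectors_.shape[0])
print("prediction for a new point:", clf.predict([[0.0, 2.0]]))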
Performance Metrics - Classifier

• True Positive – Sick people correctly identified as sick
• False Positive – Healthy people incorrectly identified as sick
• True Negative – Healthy people correctly identified as healthy
• False Negative – Sick people incorrectly identified as healthy

A confusion matrix or an error matrix is a table which is
used for summarizing the performance of a
classification algorithm.
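A compact sketch computing the confusion matrix and derived metrics (labels here: 1 = sick, 0 = healthy, made up for illustration):

from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # actual condition
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]  # classifier output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")

print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall:", recall_score(y_true, y_pred))        # TP / (TP + FN)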

References
• Python Data Science Handbook – Jake VanderPlas
• Building Machine Learning Systems with Python – Luis Pedro
Coelho, Willi Richert
• Deep Learning – Ian Goodfellow, Yoshua Bengio, Aaron
Courville
• Pattern Recognition and Machine Learning – Christopher M.
Bishop – Springer, 2006
• Statistics and Machine Learning in Python – Edouard
Duchesnay, Tommy Lofstedt
• NPTEL course on “Machine Learning for Engineering and
Science Applications”
• Other Web Resources
