We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 40
.. h_=-
Today’s Agenda
MACHINE LEARNING WITH PYTHON
-PROCESS
-TYPES OF ML
-TYPES OF PROBLEMS
-APPLICATIONS
8/16/2020: Machine Learning
i. / ETP Framework
“A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E.”
Main components of any learning algorithm:
Task(T), Performance(P) and experience (E)
* MLisa field of Al consisting of learning algorithms that:
— Improve their performance (P)
— At executing some task (T)
— Over time with experience (E)
8/16/2020Input Data:
+ Housing prices
+ Customer
transactions
*Clickstream data
Baie ler)
8/16/2020
+ Predict prices
«Segment customers
+ Optimize user flows
+ Categorize images
ae ue Leo
- Accurate prices
* Coherent groupings
aCe mL}
Oro) aie aro) (ce)
Tlutelelor): | Machine Learnin
» Wis. / P
* Machine Learning allows the systems to make decisions
autonomously without any external support
* These decisions are made when the machine is able to
learn from the data and understand the underlying
patterns that are contained within it
* Then, through pattern matching and further analysis, they
return the outcome which can be a classification or a
prediction
8/16/2020 5y / ML Process
tht iP /
One Approach:
ML algorithm is trained using a labelled or unlabeled training data set
to produce a model
New input data is introduced to the ML algorithm
— Make a prediction based on the model
The prediction is then evaluated for accuracy
— If the accuracy is acceptable, the machine learning algorithm is
deployed
— If the accuracy is not acceptable, the ML algorithm is trained again
and again within an augmented training data set
8/16/2020 62 ; Types and
I, “ y/ Applications of ML
Types of
Machine Learning
fee atti] ett} Peer alin}
+ +
+
Fraud detection | Text Mining
Email spam Detection [+ Face Recognition
Diagnostics |- Big Data Visualization
Image Classification ‘— Image Recognition
Risk Assessment [> Biology
Score Prediction |— City Planning
L_ targetted Marketing
8/16/2020 a: Supervised
Uhl ssi. / Reevaatlays3
* Supervised Learning is the most popular paradigm for
performing machine learning operations
* It is widely used for data where there is a precise mapping
between input-output data
* The dataset, is labeled, the algorithm identifies the features
explicitly and carries out predictions or classification accordingly
* As the training period progresses, the algorithm is able to
identify the relationships between the two variables to predict a
new outcome
8/16/2020 3: j Supervised Learning
°c... Az
Supervised Learning
Prediction
Annotations
These are
grapes
8/16/2020 9G Supervised Learning
Ps ' / Use Case
* Facial Recognition is one of the most popular applications of
Supervised Learning and more specifically — Artificial Neural
Networks
* Convolutional Neural Networks (CNN) is a type of ANN used for
identifying the faces of people
* These models are able to draw features from the image through
various filters
* Finally, if there is a high similarity score between the input
image and the image in the database, a positive match is
provided
8/16/2020 soeu.
Baidu will provide the airports with the facial recognition technology that will
provide access to the ground crew and the staff. Therefore, the passengers
do not have to wait in long queues for flight check-in when they can simply
board their flight by scanning their faces.
BAIDU, CHINA’S PREMIER SEARCH
ENGINE COMPANY
8/16/2020 1: Supervised
Whi / Machine Learning
* Train model on a labelled dataset
— Both raw input data as well as its results
* Split data into a training dataset and test dataset
— Training dataset is used to train network
— Test dataset acts as new data for predicting results or to see
the accuracy of model
* Accuracy is what we achieve in supervised learning as model
perfection is usually high
8/16/2020 Ȣ Supervised
Way. if Machine Learning
Learning Phase
G fz -» & wip e
Training data Features vector Algor
Labeled
Data
7
8/16/2020 /: / Unsupervised Learning
» Md ns. /
* Data is not explicitly labeled into different classes
— There are no labels
* The model is able to learn from the data by finding implicit
patterns
* Unsupervised Learning algorithms identify the data based on
their densities, structures, similar segments, and other similar
features
* Cluster analysis is one of the most widely used techniques in
supervised learning
8/16/2020 14: j Unsupervised Learning
o... Azz
Unsupervised Learning
» 00
yy bbb
Model 64H
Input Output= Unsupervised Learning
Ps ' / Use Case
* Using clustering, businesses are able to capture potential
customer segments for selling their products
* Sales companies are able to identify customer segments
that are most likely to use their services
* Companies can evaluate the customer segments and then
decide to sell their product to maximize the profits
8/16/2020 16:.: hz
The goal of this company is to ingest and process the customer data in order
to make it accessible to the marketers. They take it one step further by
providing smart insights to the marketing team, allowing them to reap the
maximum profit out of their product marketing.
ISRAELI BASED STARTUP — OPTIMOVE
8/16/2020 v: Unsupervised Learning
» Md ns.
Unsupervised Learning in real life
bG-é aes 6 é
by mle leh
Identifying the potential Implementing Clustering ae product to
customer dase for Algorithms to group
selling the product the customer base ee group
8/16/2020 18y : f Reinforcement Learning
Ula -P /
* Reinforcement Learning covers more area of Artificial
Intelligence which allows machines to interact with their
dynamic environment in order to reach their goals
* With this, machines and software agents are able to evaluate
the ideal behavior in a specific context
* With the help of this reward feedback, agents are able to learn
the behavior and improve it in the longer run
* This simple feedback reward is known as a reinforcement signal
8/16/2020 19Reinforcement Learning
i.
action
A:
state
S:
reward
RX. Res
—.
Environment
s
* The agent in the environment is required to take actions that are based on the current state
* There is no answer key provided to the agent when they have to perform a particular task
* When there is no training dataset, it learns from its own experience
8/16/2020 20@,. /
Reinforcement Learning
Use Case
Google’s Active Query Answering (AQA) system makes use of
reinforcement learning
Google’s AQA system reformulates the questions asked by the user
For example, if you ask the AQA bot the question — “What is the birth
date of Nikola Tesla” then the bot would reformulate it into different
questions like “What is the birth year of Nikola Tesla”, “When was
Tesla born?” and “When is Tesla’s birthday”
This process of reformulation utilized the — traditional
sequence2sequence model, but Google has integrated reinforcement
Learning into its system to better interact with the query based
environment system
8/16/2020 21Reinforcement Learning
i.
* This is a deviation from the traditional Reinforcement Learning
seq2seq model as all the tasks are carried
out using reinforcement learning and policy e)
gradient methods r =
* For a given question q0, we want to obtain
the best possible answer a*
* The goal is to maximize the award
a* = argmaxa R(ajq0)
8/16/2020 2Types of Supervised Machine Learning
* Resulting Supervised learning algorithms are task-oriented
* As we provide it with more and more examples, it is able to learn
more properly so that it can undertake the task and yield us the
output more accurately
Types
* Regression
z — Output variable is a real value, such as “dollars” or “weight”
* Classification
— Predict output variable which falls just in particular categories,
// such as the “red” or “blue” or “disease” and “no disease”
Ce ea
8/16/2020 23: / Algorithms
B.: /
Won nw Sw Nee
Decision Trees
Naive Bayes Classification
Support Vector Machines for classification problems
Random Forest for classification and regression problems
Linear regression for regression problems
Logistic Regression
K-nearest Neighbors
Ensemble Methods (Gradient Boosting)
Artificial Neural Network
8/16/2020 24y f / VE as
Ml ns. old dalu
Logistic Regression Naive Bayes Decision Tree
ey
ATT roa aa ene) Random Forest _ K-Nearest Neighbours
8/16/2020 25i.
Help to make a decision about the data item
Decision Tree algorithms are used for both predictions as
well as classification in machine learning
Using the decision tree with a given set of inputs, one can
map the various outcomes that are a result of the
consequences or decisions
8/16/2020
Decision Trees
Is Sex Male? ca
Is age > 9.5?
Is SIBSP > 2.5?
26: | Naive Bayes
@,, /
A classification technique dependent on the Bayes’ Theorem
Naive Bayes Classifier assumes that a particular feature in a
class is not exactly, directly related to any other feature
Even if features are interdependent and each of the features
exist because of the other feature
All these properties got to contribute independently to the
probability of the outcome
Naive Bayes model isn’t difficult to build and is really useful for
very large datasets
8/16/2020 27: / Naive Bayes
~ Way. / ij
* If the categorical variable belongs to
a category that wasn’t followed up in Posterior a
the training set, then the model will ~~
give it a probability of O which will
inhibit it from making any prediction. Ba rece
* Naive Bayes assumes independence
between its features. In real life, it is
difficult to gather data that involves wtlin. i"
completely independent features
8/16/2020 28i.
Bayes’ Theorem
P(H | E) - Posteriori means deriving theory out of given evidence.
It denotes the conditional probability of H (hypothesis), given the
evidence E.
P(E | H) — Likelihood conditional probability of the occurrence of
the evidence, given the hypothesis. It calculates the probability of
the evidence, considering that the assumed hypothesis holds true.
P(H) — Prior probability denotes the original probability of the
hypothesis H being true before the implementation of Bayes’
Theorem. That is, this probability is without the involvement of the
data or the evidence.
P(E) - This is the probability of the occurrence of evidence
regardless of the hypothesis.
8/16/2020
pP(E|H) p(w)
pie)
P(H|E) =
Posterior probability
P(H | E) as the product
of the probability of
hypothesis P(E | H),
multiplied by the
probability of the
hypothesis P(H) and
divided by the
probability of the
evidence P(E)
29: Example
» Ms. 4
Suppose the weather of the day is cloudy
* Whether it would rain today, given the cloudiness of the day
* Calculate the probability of rainfall, given the evidence of cloudiness
* Posterior probability part of equation
— P(Rain | Clouds), where finding whether it would rain today is the
Hypothesis (H) and Cloudiness is the Evidence (E)
Suppose we know that 60% of the time, rainfall is caused by cloudy weather.
Probability of it being cloudy, given the rain
P(clouds | rain) = P(E | H)
This is the backward probability where, E is the evidence of observing clouds
given the probability of the rainfall, which is originally our hypothesis
8/16/2020 30@,. /
* Out of all the days, 75% of the days in a month are cloudy. This is the
probability of cloudiness or P(clouds)
* Since this is a rainy month of the year, it rains usually for 15 days out
of 30 days. That is, the probability of hypothesis of rainfall
or P(H) is P(Rain) = 15/30 = 0.5 or 50%
* Calculate the probability of it raining, given the cloudy weather
P(Rain | Cloud) = (P(Cloud | Rain) * P(Rain)) / (P(Cloud))
= (0.6 * 0.5) / (0.75)
=0.4
Hence, 40% chance of rainfall, given the cloudy weather
8/16/2020 31: Advantages
» Ms. ‘i
Naive Bayes is an easy and quick way to predict the class of the dataset.
Using this, one can perform a multi-class prediction.
When the assumption of independence is valid, Naive Bayes is much more
capable than the other algorithms like logistic regression. Furthermore, you
will require less training data
If the categorical variable belongs to a category that wasn’t followed up in
the training set, then the model will give it a probability of O which will
inhibit it from making any prediction.
Naive Bayes assumes independence between its features. In real life, it is
difficult to gather data that involves completely independent features
8/16/2020 32y Wy, f Logistic Regression
* Used for binary classification of data-points
* The name logistic regression came from a special function called
Logistic Function which plays a central role in this method
* A logistic regression model is termed as a probabilistic model
* It helps in finding the probability that a new instance belongs to
a certain class
* The output lies between 0 and 1
* Positive and negative class
8/16/2020 33: / Important Parts
» Wis. / 4
* Hypothesis and Sigmoid Curve
* With the help of this hypothesis, we can derive the likelihood of the
event
* The data generated from this hypothesis can fit into the log function
that creates an S-shaped curve known as “sigmoid”
* Using this log function, we can further predict the category of class
* Equation for logistic regression
— y=e(b0 + b1*x) / (1 + e4(b0 + b1*x))
— bO and b1 are the two coefficients of the input x. We estimate
these two coefficients using “maximum likelihood estimation”
8/16/2020 34=> Clay el (cy
i.
Let’s consider an example of classifying emails into the spam malignant and ham
(not spam)
Assume that the malignant spam would be falling in the positive class and benign
ham would be in the negative class
Take several labeled examples of emails and then use it to train the model
After training it, this can be used really well to predict the class of new email based
examples
Feed the examples to our model, it returns to us a value, say it is y such that Osys1
Suppose, the value we get is 0.8
From this value, we can say or predict that there is 80% probability that tested
examples are a kind of spam
Thus this can be classified it in the form of a spam mail
8/16/2020 35~ Has. f Linear regression
* Linear regression is well known to be an approach for modeling
the relationship that lies in between a dependent variable ‘y’
and another or more independent variables that are denoted as
‘x’ and expressed in a linear form
* The word Linear indicates that the dependent variable is directly
proportional to the independent variables
* It has to be constant as if x is increased/decreased then Y also
changes linearly
y=Ax+B
8/16/2020 36° Logistic and Linear
Whi if Nee gola MCT ec] ela)
Graph this logistic
function:
11 (1+e4-x)
The ‘e’ in the above equation
represents the S-shaped
curve that has values
between 0 and 1 —— Logistic Regression
—— Linear Regression
-+3-2-10123456
x
8/16/2020 37eu.
Python implementation of Simple Linear Regression
It is the most basic version of linear regression which predicts a response
using a single feature. The assumption in SLR is that the two variables are
linearly related
USING DIABETES DATASET FROM SCIKIT-
LEARN
8/16/2020 38PSY te metatectinnsict wat
‘oort nay a
=
1 cae rain = at a
1 ta jester on
sy cmae mathe area)
nimIt Uearegresson on. terre, ALamercenein, eens, eemalinales)
1 (2s red = eer aet)
1 (aye pum catectns 0, apne)
See weer a wee cepts so a
brio Saraae sae Raf F raacretssen Sores)
1 (a LE ee cars
teeta vate, shor natn =)
ac
8/16/2020 39eu.
Python implementation of Multiple Linear Regression
It is the extension of simple linear regression that predicts a response using two or more features
Multiple Linear Regression models always includes the errors in the data known as residual error which
changes the calculation
USING BOSTON HOUSING DATASET
FROM SCIKIT LEARN
8/16/2020 408/16/2020
oa
ot
31
pa
psy
Tipere atploeLibyptit ox ptt
import nangy a8 99
‘fron elem Inport Gatasets, Lineerqodel, metrics
estan > eran Aaa alee TERY = eal)
free gusrnamceseneton seperate see eit a ve
ea ong ant)
ep ree uersins 7 tram)
frdeec catficrentst t's cepscoet)
Weacte-vertence sesret 3" Seror ea.ocoreOfesk, WEGHO))
Fie style weet #iicteiyeiphe”)
Fitssexttcr(eeeeoreascuA tein, reesOredLceK train) —y-train, color» “efcen's s 20, 18bel ~ “Teaun ats")
it scatter (vepcpresiewitesey reg prestet(x est) > y tart colors "MMe", 2 2; bel = test sete).
fehheg = eae TE Tgessate
Hiectegeraloc = rer nigh
icetheestdet evar)
1
Residual errors
a