Learning Algorithm
Uploaded by palasek182

Advanced Learning Algorithm
Demand Prediction

To illustrate how neural networks work, let's start with an example from demand prediction, in which you look at a product and try to predict: will this product be a top seller or not? Let's take a look.


In this example, you're selling T-shirts, and you would like to know whether a particular T-shirt will be a top seller, yes or no. You have collected data on different T-shirts that were sold at different prices, as well as which ones became top sellers. This type of application is used by retailers today to plan better inventory levels as well as marketing campaigns: if you know what's likely to be a top seller, you would plan, for example, to purchase more of that stock in advance.

In this example, the input feature x is the price of the T-shirt, and that's the input to the learning algorithm. If you apply logistic regression to fit a sigmoid function to the data, the output of your prediction is 1 / (1 + e^(-(wx + b))). Previously, we had written this as f(x), the output of the learning algorithm.


Now we'll use the letter a to denote the output of this logistic regression algorithm. We can think of it as a very simplified model of a single neuron in the brain.
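A minimal NumPy sketch of this single "neuron": the sigmoid of wx + b. The values of w, b, and the price x below are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1 / (1 + np.exp(-z))

# A single "neuron": activation a = g(wx + b) for one input feature x (price)
w, b = -0.5, 2.0   # hypothetical parameters
x = 3.0            # e.g. a T-shirt price
a = sigmoid(w * x + b)
```

The output a is always between 0 and 1, which we read as the probability of being a top seller.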

Activation functions (linear, softmax, ...): we will learn more about them in the following lectures.

In this example, we previously had just one feature; now we're going to have 4 features. Affordability, awareness, and perceived quality are the activations.

Input layer: 4 nodes
Hidden layer: 3 nodes (affordability, awareness, perceived quality)
Output layer: 1 node


We will learn later in this course how to choose an appropriate architecture for a neural network. Choosing the right number of hidden layers and the number of hidden units per layer can have an impact on the performance of the learning algorithm as well.

We can also call this model a multilayer perceptron.


Example: Recognizing Images

Face recognition: input image → output: the identity of the person.

First, we flatten the image into an array (a vector), the input for the model.
But how do the hidden layers work? We can see that:

Going from a layer with fewer units to a layer with more units, the network separates the windows into smaller windows. On the contrary, going from a layer with more units to a layer with fewer units, it groups the previous windows to create new windows.

In the last hidden layer, we compare the object with the windows to get accuracy for the output.

With other data, it can operate in the same way.
Neural network layer

Layer 1:

In this we can see that each unit computes its activation from w·x, and:

The output of layer 1 is the input of layer 2. Then we have a[2], the output of the neural network. Now we can use a[2] to predict, with threshold 0.5.
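A NumPy sketch of this chaining, with hypothetical weights: layer 1's output a1 is fed into layer 2, and the final activation a2 is thresholded at 0.5.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dense(a_in, W, b):
    """One fully connected layer: each column of W holds one unit's weights."""
    return sigmoid(a_in @ W + b)

# Hypothetical parameters: 4 inputs -> 3 hidden units -> 1 output unit
W1 = np.ones((4, 3)) * 0.1
b1 = np.zeros(3)
W2 = np.ones((3, 1)) * 0.1
b2 = np.zeros(1)

x = np.array([200.0, 17.0, 180.0, 0.5])   # made-up feature vector
a1 = dense(x, W1, b1)              # output of layer 1 is the input of layer 2
a2 = dense(a1, W2, b2)             # a2 is the output of the network
yhat = 1 if a2[0] >= 0.5 else 0    # threshold at 0.5
```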
More complex neural network
Inference : Making predictions
(forward propagation)
Inference in code :

TensorFlow Implementation

Good-tasting coffee?

We can see that a not nicely roasted set of beans comes from: not long enough or too low a temperature; or too long or too high a temperature.

Simple neural network:

Now we implement it as a more complex neural network:


Data in TensorFlow

Warning! Note that we have 2 types of data:

In TensorFlow: a tensor (matrix).
In NumPy: an array.

But don't worry, we can use functions to convert between them.
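For example (assuming TensorFlow and NumPy are available), tf.constant converts a NumPy array into a tensor, and a tensor's .numpy() method converts back:

```python
import numpy as np
import tensorflow as tf

x_np = np.array([[200.0, 17.0]])   # NumPy 2-D array (1 row, 2 columns)
x_tf = tf.constant(x_np)           # NumPy array -> TensorFlow tensor
back = x_tf.numpy()                # tensor -> NumPy array
```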
Building a neural network

Layer 1, Layer 2

First, we need to determine the architecture. Then prepare the data x and y.

Let's go to an example about digit classification. We can convert a pandas DataFrame to a NumPy array with the function df.to_numpy().

Forward prop in a single layer

Each unit's activation is computed from the previous layer's activations: a_j = g(w_j · a_prev + b_j).
Artificial General Intelligence (AGI)

What is it?

Vectorization (optional)

How are neural networks implemented efficiently?
Matrix Multiplication

Then the result is:

The rule is that the number of columns in matrix A must equal the number of rows in matrix B.
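A small NumPy check of this rule: A has shape (2, 3) and B has shape (3, 2), so the product is defined (3 columns match 3 rows) and has shape (2, 2).

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])   # shape (2, 3)
B = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])       # shape (3, 2)

# Valid: columns of A (3) == rows of B (3); the result has shape (2, 2)
C = np.matmul(A, B)
```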
TensorFlow Implementation

We have 3 steps:

1. Specify the model.
2. Compile the model (using a specific loss function such as BinaryCrossentropy).
3. Train the model.
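The three steps can be sketched in Keras like this; the architecture and the toy data below are illustrative, not from the lecture:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy

# Step 1: specify the model (layer sizes are an example)
model = Sequential([
    Dense(3, activation='sigmoid', input_shape=(2,)),
    Dense(1, activation='sigmoid'),
])

# Step 2: compile with a loss such as binary cross-entropy
model.compile(loss=BinaryCrossentropy(), optimizer='adam')

# Step 3: train on (toy) data
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])
model.fit(X, y, epochs=5, verbose=0)
```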


Training Details

Overview:

Let's go into the detail of each step:

We can change the loss function at compile time; the optimizer then tunes the parameters to minimize it.

Back propagation is a gradient estimation method used to train neural networks.
Alternatives to the sigmoid activation

ReLU:

Common activations:

But how can we choose them?

Choosing an activation function:

When working on binary classification: sigmoid for the output layer. For regression problems: we can use a linear activation. For regression without negative values: ReLU.
With hidden layers, the reasons ReLU is more popular than sigmoid:

1. ReLU is a bit faster to compute, because it only requires computing max(0, z).
2. The ReLU function goes flat in only one part of its graph, but sigmoid goes flat in two.
Summary:

But why do we need activation functions? What would happen if we were to use a linear activation function for all nodes in the neural network? It would become no different than linear regression; the result is just like linear regression.
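We can verify this collapse numerically: two stacked linear layers with (hypothetical) random weights compute exactly the same function as a single linear layer whose parameters combine them.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)
x = rng.normal(size=3)

# Two stacked layers with LINEAR activation...
a1 = x @ W1 + b1
a2 = a1 @ W2 + b2

# ...equal one linear layer with W = W1 @ W2 and b = b1 @ W2 + b2
W, b = W1 @ W2, b1 @ W2 + b2
same = x @ W + b
```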
Multiclass

Softmax

Neural network with softmax output

Digit classification:

Implement softmax in TensorFlow:

Improved implementation of softmax in a neural network

Maybe we will have some numerical roundoff errors:

The result is that:

This improvement also works for logistic regression, but the numerical error matters more with softmax multiclass classification:

Softmax:

Logistic regression:
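A common way to get the numerically safer version in TensorFlow is to make the output layer linear (producing logits) and pass from_logits=True to the loss; the layer sizes below are just an example:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import SparseCategoricalCrossentropy

# Output LINEAR activations (logits) and let the loss apply softmax
# internally: from_logits=True is more numerically accurate
model = Sequential([
    Dense(25, activation='relu', input_shape=(400,)),
    Dense(15, activation='relu'),
    Dense(10, activation='linear'),   # logits, not probabilities
])
model.compile(loss=SparseCategoricalCrossentropy(from_logits=True),
              optimizer='adam')

# To read off probabilities, apply softmax to the logits afterwards
logits = model.predict(np.zeros((1, 400)), verbose=0)
probs = tf.nn.softmax(logits).numpy()
```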
Advanced Optimization

We learned that gradient descent is used to minimize the cost of the algorithm, but with huge data it's slow, and we have another algorithm that is faster than it: Adam.

This algorithm can increase or decrease the learning rate α automatically. With gradient descent, you have only a single α, but with Adam you have a separate α for each w and for b.
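In Keras you switch to Adam simply by passing it at compile time; 1e-3 below is just a common initial global learning rate (Adam then adapts per-parameter step sizes itself), and the one-layer model is a placeholder:

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([Dense(1, input_shape=(2,))])
# Adam maintains a separate adaptive step size for each parameter
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss='mse')
```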
Additional Layer Types
Convolutional layer :

What is a derivative? We can use the SymPy package to calculate derivatives.

Computation graph

The output of each calculation is written on the arrow.

Forward prop goes left to right. Back prop is the opposite: right to left.

Let's check:

So why do we use back prop to compute the derivatives?

For computing the cost function: use left-to-right (forward prop).
For computing all the derivatives: use right-to-left (back prop).
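For instance, SymPy can differentiate a small cost expression symbolically; the expression here is just an example:

```python
import sympy

# J(w) = (2 + 3w)^2, so dJ/dw = 2 * (2 + 3w) * 3 = 12 + 18w
w = sympy.Symbol('w')
J = (2 + 3 * w) ** 2
dJ_dw = sympy.diff(J, w)
```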


Larger neural network
Deciding what to try next
Evaluating the model
Logistic algorithm

Model selection and train / cross-validation / test sets

J_test for the fifth-order polynomial (w5, b5) turns out to be the lowest. But when you estimate how well this model performs, it turns out to be a slightly flawed procedure to report the test set error J_test(w5, b5), because the test set was already used to choose the model.
To modify the training and testing procedure: instead of selecting the model after splitting the dataset into train/test sets, we split the dataset into 3 sets (train / cross-validation / test). Instead of evaluating on the test set, we evaluate on the cross-validation set, then choose the model with the lowest CV error.

Finally, we use the test set to report the estimate of the generalization error, i.e. how well this model does on new data.
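One common way to get the three sets, sketched with scikit-learn on made-up data: split off 60% for training, then split the remaining 40% evenly into cross-validation and test.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1).astype(float)
y = 2 * X.ravel() + 1

# 60% train, then the remaining 40% split in half: 20% CV, 20% test
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=1)
X_cv, X_test, y_cv, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=1)
```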
Example with a neural network:

→ pick the model with the lowest CV error
Diagnosing bias and variance

Plotting the error against model complexity, the middle model will be the best.

But in some cases, it is possible to simultaneously have high bias and high variance. This turns out to happen in neural networks. To recognize this situation, you can see that J_train is very high. For part of the input, you have a very complicated function that overfits, so it overfits for that part of the input.

But then for some record , for other parts of the


doesn't fit the and
input ,
it
training well ,
so it underfits

for parts of the input .

=> If the algorithm does


poorly on the
training
ret and it even does much worke than the
, on
training
Ret .
Regularization and Bias/Variance

How regularization can impact the overall performance of the algorithm:

With large λ, the algorithm is highly motivated to keep the parameters w small, and so you end up with w ≈ 0. With the regularization parameter equal to zero, the algorithm overfits the training set.

We will choose an intermediate λ: use the CV set to evaluate each candidate, then pick the corresponding w, b; use the test set to report.

λ behaves opposite to the degree of the polynomial: when λ increases, J_train increases, but when the degree of the polynomial increases, J_train decreases.
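A sketch of picking λ by cross-validation, using scikit-learn's Ridge (whose alpha plays the role of λ) on synthetic data; the candidate list is arbitrary:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 5))
y_train = X_train @ np.arange(1., 6.) + rng.normal(size=40)
X_cv = rng.normal(size=(20, 5))
y_cv = X_cv @ np.arange(1., 6.) + rng.normal(size=20)

# Try several lambda values; keep the one with the lowest CV error
lambdas = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
cv_errors = []
for lam in lambdas:
    model = Ridge(alpha=lam).fit(X_train, y_train)
    cv_errors.append(mean_squared_error(y_cv, model.predict(X_cv)))
best_lambda = lambdas[int(np.argmin(cv_errors))]
```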


Establishing a baseline level of performance

Looking at the picture, we can see that J_train is high and J_cv is even higher. This leads us to conclude that it has high bias.

But wait: look at the human-level performance (human error). J_train is only a little higher than the human error. There are some reasons for this; one of them is the noisy sounds in the audio. So we should consider human-level performance.

What is a baseline level of performance? If the gap from the baseline to J_train is small but J_cv is much higher than J_train: high variance. If J_train is much higher than the baseline: high bias.
Learning curves

When you have a larger training set, it's harder for a quadratic function to fit all the examples perfectly. When you have more and more training data, the growth rate of J_train gradually decreases, and the descent speed of J_cv gradually decreases.

Let's go to high bias: you can see that when we increase the training set size, J_cv and J_train don't change much. They will both flatten out and probably just continue to be flat like that.

That gives this conclusion, maybe a little bit surprising: if a learning algorithm has high bias, getting more training data will not, by itself, help that much.
With high variance: (in contrast, getting more training data is likely to help)

Deciding what to try next, revisited

If your algorithm makes unacceptably large errors in its predictions, what do you try next? There are several ideas:

With high variance: get more training data, or simplify the model.

With high bias: make your model more powerful, giving it more flexibility to fit more complex or wiggly functions.
Bias, variance and neural networks

High J_train → use a bigger network. High J_cv → get more data.

With a high bias problem, we can use a larger network. But if the network is too big, it will cause a high variance problem. For this case, we can regularize the larger network.
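In Keras, regularizing a larger network can look like this; the layer sizes and λ = 0.01 are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2

# A larger network, with L2 (weight decay) regularization on each layer
model = Sequential([
    Dense(128, activation='relu', kernel_regularizer=l2(0.01),
          input_shape=(20,)),
    Dense(64, activation='relu', kernel_regularizer=l2(0.01)),
    Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam')
```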
Iterative loop of ML development

Choose the architecture, train the model, run diagnostics (bias, variance, error analysis), then adjust: a larger model, more regularization, or more data, depending on overfitting vs. underfitting.

Go to the spam classification example to look at how it works. The first way: ... Or a second way: use the quantity of each word in the document.
Error analysis

But when we have more data, how can we work with it?
Adding data

Instead of adding more data of everything under the sun, we just need to add data where error analysis has indicated it might help.

In this example, we can add more pharma-related spam data.

But that takes a lot of time and may be expensive. Then we can use data augmentation.

Example: train an OCR algorithm to read text from images, recognizing the digit in the center of a window.
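A minimal NumPy sketch of data augmentation for digit images: small random shifts plus noise turn one labeled example into several. The 20x20 random "digit" below is a stand-in for real data.

```python
import numpy as np

def augment(image, rng):
    """Create a distorted copy of a digit image: a small random shift
    plus noise (a shifted digit is still a valid digit)."""
    dy, dx = rng.integers(-2, 3, size=2)
    shifted = np.roll(np.roll(image, dy, axis=0), dx, axis=1)
    noisy = shifted + rng.normal(scale=0.05, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(0)
digit = rng.random((20, 20))    # stand-in for a 20x20 digit image
extra = [augment(digit, rng) for _ in range(5)]   # 5 new training examples
```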
We have 2 research directions: model-centric and data-centric. In the past, models and algorithms were prioritized for development, but now models and algorithms are already quite good. In the current era of data explosion, we can develop in a data-centric direction.
Transfer learning: using data from a different task

We have 2 ways:

Option 1: train only the output layer's parameters. It's suitable for a small training set.

Option 2: train all the parameters, if you have a larger training set.
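A Keras sketch of the two options (the "pretrained" network here is an untrained stand-in, just to show the mechanics): drop the old output layer, add a new one, and either freeze the reused layers (Option 1) or fine-tune everything (Option 2).

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Stand-in for a network pretrained on a large, different task
pretrained = Sequential([
    Dense(25, activation='relu', input_shape=(400,)),
    Dense(15, activation='relu'),
    Dense(1000, activation='linear'),   # original output layer
])

# Reuse every layer except the output, then add a new output layer
base = Sequential(pretrained.layers[:-1])
base.trainable = False   # Option 1: freeze, train only the new head
model = Sequential([base, Dense(10, activation='linear')])
# For Option 2 (larger training set): set base.trainable = True instead
model.compile(loss='mse', optimizer='adam')

out = model.predict(np.zeros((1, 400)), verbose=0)
```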
Full cycle of a machine learning project

(Deployment must scale to a large number of users, and not have too high a computational cost.)

It may also relate to user privacy: make sure that consent allows you to store this data.

A monitoring system allows us to figure out data shifting and how much less accurate the algorithm has become. Then we can retrain the model and carry out a model update to replace the old model.
MLOps: machine learning operations

This refers to how to build, deploy, and maintain machine learning systems.

Fairness, bias, and ethics
Error metrics for skewed datasets

We can use the confusion matrix to evaluate the models.

Trade-off between precision and recall

We choose the algorithm with the highest F1 score.
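From the confusion-matrix counts, precision, recall, and F1 can be computed directly; the labels below are made up:

```python
import numpy as np

y_true = np.array([1, 1, 0, 1, 0, 0, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 0, 0, 1])

tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives

precision = tp / (tp + fp)                    # of predicted 1s, how many real
recall = tp / (tp + fn)                       # of real 1s, how many found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```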

Decision tree model

The decision tree is a very powerful model. It is widely used in many applications.

Let's go to a binary example.

Measuring purity

Entropy:

Information gain = reduction in entropy
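A NumPy sketch of entropy and information gain for a binary split; the labels and feature values are made up:

```python
import numpy as np

def entropy(p):
    """Entropy of a binary label distribution with positive fraction p."""
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(y, mask):
    """Reduction in entropy from splitting labels y by a boolean mask."""
    left, right = y[mask], y[~mask]
    w_left = len(left) / len(y)
    w_right = 1 - w_left
    return entropy(y.mean()) - (w_left * entropy(left.mean())
                                + w_right * entropy(right.mean()))

y = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 0])                      # labels
feature = np.array([1, 0, 0, 1, 1, 1, 0, 1, 0, 0], dtype=bool)    # a split
gain = information_gain(y, feature)
```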
Decision tree learning, putting it together

* Split on the feature that gives the highest information gain.

One-hot encoding: convert a categorical feature into multiple binary classification features.


Continuous valued features

Split with thresholds, then calculate the information gain of each to find the highest.

But how can we choose the thresholds? We have a rule: choose candidate values around the midpoints between consecutive values in the sorted list.
Regression Tree

* In this problem, we choose the split with the highest information gain (reduction in variance).

Random forest: