Learning Algorithm
Demand Prediction
In this example, you're selling T-shirts and you would like to know whether
a particular T-shirt will be a top seller, yes or no. You have
collected data on different T-shirts that were sold at different prices,
as well as which ones became top sellers. This type of application
is used by retailers today to plan better inventory levels as
well as marketing campaigns. If you know which shirt is likely to be a top
seller, you would plan, for example, to purchase more of that
stock in advance.
In this example, the input feature x is the price of the T-shirt, and
that's the input to the learning algorithm. If you apply logistic
regression to fit a sigmoid function to the data, the output of your
prediction is f(x) = 1 / (1 + e^-(wx + b)). Previously, we had written
f(x) for the output of the learning algorithm.
Now we use the letter a to denote the output of this logistic
regression algorithm. We can think of it as a very simplified
model of a single neuron in the brain.
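As a quick sketch of that single-neuron view (the values of w and b below are made up for illustration; in practice they come from training):

```python
import numpy as np

def sigmoid(z):
    # logistic function: maps any real z into (0, 1)
    return 1 / (1 + np.exp(-z))

# hypothetical parameters for the T-shirt example
w, b = -0.8, 15.0
price = 20.0                   # input feature x: price of the T-shirt
a = sigmoid(w * price + b)     # a = 1 / (1 + e^-(wx + b))
print(a)                       # estimated probability of being a top seller
```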
Activations: Linear, ReLU, Softmax. We will learn more about them
in the following lectures.
In this example, we had just one feature; now we're
going to have 4 features.
Affordability, awareness, and perceived quality are activations.
Hidden layer:
Going from a layer with fewer units to a layer with more units expands
the previous activations into new ones. On the contrary, going from a
layer with more units to a layer with fewer units groups the previous
activations to create new ones (like an encoder).
[Diagram: neural network with layers and units]
Then we have the output a of the neural network.
Now we can use a to predict, with a threshold of 0.5.
More complex neural network
Inference: Making predictions (forward propagation)
Inference in code: TensorFlow implementation
Good-tasting coffee? Not with too low a temperature, and not roasted
too long or at too high a temperature.
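A minimal sketch of what the TensorFlow inference code can look like for this example; the layer sizes follow the lecture, but the weights here are untrained (random), whereas real predictions would use trained weights:

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(units=3, activation='sigmoid'),  # hidden layer -> a1
    Dense(units=1, activation='sigmoid'),  # output layer -> a2
])

x = np.array([[200.0, 17.0]])   # temperature (Celsius), duration (minutes)
a2 = model(x)                   # forward propagation
yhat = 1 if a2.numpy()[0, 0] >= 0.5 else 0   # threshold at 0.5
```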
Simple neural network:
In TensorFlow, data is a tensor (matrix); in NumPy, it's an array
(e.g. a linear 1-D array).
But don't worry, we can use functions to convert between them.
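For instance, a small sketch of converting in both directions:

```python
import numpy as np
import tensorflow as tf

arr = np.array([[1.0, 2.0], [3.0, 4.0]])  # NumPy array
t = tf.convert_to_tensor(arr)             # NumPy -> TensorFlow tensor
back = t.numpy()                          # tensor -> NumPy array
```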
Building a neural network
First, we need to determine the architecture (Layer 1, Layer 2, ...).
Then prepare the data x and y.
Let's go to an example about digit classification:
We can convert a pandas DataFrame to a NumPy array with the
function df.to_numpy().
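A tiny sketch of that conversion (the column names here are made up):

```python
import pandas as pd

df = pd.DataFrame({'pixel1': [0, 255], 'pixel2': [128, 64]})
X = df.to_numpy()   # pandas DataFrame -> NumPy array
```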
Artificial General Intelligence (AGI): what is it?
Vectorization (optional)
How are neural networks implemented efficiently?
Matrix multiplication: a whole layer's pre-activations can be computed
as one matrix product, Z = A_in W + B.
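A sketch of a vectorized dense layer in NumPy; the weight values below are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dense(A_in, W, B):
    # one matrix multiplication for the whole layer
    Z = np.matmul(A_in, W) + B   # (m, n_in) @ (n_in, n_out) + (1, n_out)
    return sigmoid(Z)

A_in = np.array([[200.0, 17.0]])      # 1 example, 2 features
W = np.array([[1.0, -3.0, 5.0],
              [-2.0, 4.0, -6.0]])     # illustrative weights
B = np.array([[-1.0, 1.0, 2.0]])
A_out = dense(A_in, W, B)             # shape (1, 3)
```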
We have 3 steps (sketched below):
1. Specify the model.
2. Compile the model (using a specific loss function such as
BinaryCrossentropy, with a sigmoid activation on the output layer).
3. Fit the model to the data.
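A hedged sketch of the 3 steps; the layer sizes and the tiny dataset are illustrative:

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy

# 1. Specify the model
model = Sequential([
    Dense(25, activation='sigmoid'),
    Dense(15, activation='sigmoid'),
    Dense(1, activation='sigmoid'),
])

# 2. Compile the model with a loss function
model.compile(loss=BinaryCrossentropy())

# 3. Fit the model to the data
X = np.array([[200.0, 17.0], [120.0, 5.0]])
y = np.array([1, 0])
model.fit(X, y, epochs=10)
```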
Common activations: Linear, Sigmoid, ReLU.
But how can we choose them?
Choosing an activation function for the output layer: sigmoid for
binary classification; linear activation for regression where y can
take negative values; ReLU for regression without negative values.
With hidden layers, ReLU is the usual choice:
1. ReLU is a bit faster to compute because it only requires
computing max(0, z).
2. The ReLU function goes flat in only one region (z < 0), while
sigmoid goes flat on both sides, so gradient descent is faster with ReLU.
Digit classification: maybe we will have some errors.
Multiclass classification:
Softmax: a generalization of logistic regression to more than two classes.
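A small NumPy sketch of softmax, a_j = e^(z_j) / sum_k e^(z_k):

```python
import numpy as np

def softmax(z):
    # subtracting max(z) avoids overflow without changing the result
    ez = np.exp(z - np.max(z))
    return ez / ez.sum()

z = np.array([2.0, 1.0, 0.1])  # example logits for 3 classes
a = softmax(z)                 # probabilities that sum to 1
```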
Advanced Optimization
We learned that Gradient Descent is used to minimize the cost of an
algorithm, but with huge data it's slow, and we have another optimizer,
Adam, which adjusts the learning rate automatically. With gradient
descent, you have only a single learning rate alpha; but with Adam,
you have a separate alpha for each w (and for b).
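In Keras this is just a different optimizer in the compile step; the model below and the 1e-3 initial learning rate are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([Dense(25, activation='relu'),
                    Dense(1, activation='sigmoid')])

# Adam keeps a separate, automatically adapted step size per parameter
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.BinaryCrossentropy(),
)
```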
Additional Layer Types
Convolutional layer: each unit looks at only a part of the previous
layer's outputs (faster to compute, and needs less training data).
What is a derivative?
The sympy package can calculate derivatives symbolically.
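For example (assuming the package meant in the note is sympy):

```python
import sympy

w = sympy.symbols('w')
J = w**2                   # cost as a symbolic expression
dJ_dw = sympy.diff(J, w)   # -> 2*w
print(dJ_dw.subs(w, 3))    # derivative evaluated at w = 3 -> 6
```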
Computation graph
Back prop goes in the opposite direction: right to left.
Let's check: so why do we use back prop to compute the derivatives?
* For computing the cost function: use left-to-right (forward prop).
* For computing all derivatives: use right-to-left (back prop), which
reuses each intermediate derivative, so all of them are found in
roughly N + P steps rather than N x P.
J_test for the fifth-order polynomial (w5, b5) turns out to be the
lowest. But when you estimate how well this model performs, this turns
out to be a slightly flawed procedure: reporting the test set error
J_test(w5, b5) is likely an optimistic estimate of the generalization
error, because the test set was used to select the model.
To modify the training and testing procedure: instead of selecting the
model and splitting the dataset into train/test sets, we split the
dataset into 3 sets (train, cross-validation, test). Instead of
evaluating on the test set, we do it on the CV set, then choose the
model with the lowest CV error. Finally, we use the test set to report
the estimate of the generalization error, i.e. how well this model does
on new data.
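One common way to build the 3 sets, sketched with scikit-learn's train_test_split; the 60/20/20 proportions are a typical choice, not a rule:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # toy features
y = np.arange(10)                  # toy targets

# 60% train, then split the remaining 40% evenly into CV and test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=1)
X_cv, X_test, y_cv, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=1)
```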
Example with neural networks: pick the model with the lowest CV error.
Diagnosing bias and variance
[Plot: J_train and J_cv vs. degree of polynomial]
The middle will be the best model. To recognize this situation in
neural networks, look at the errors: J_train very high means high bias;
for part of the input the network may have a very complicated function
that overfits, so it overfits for that part of the input (high variance).
With large lambda:
With large lambda, the algorithm is highly motivated to keep the
parameters w small, and you end up with w close to 0 (so f(x) is
roughly b). With the regularization parameter lambda equal to zero,
the algorithm overfits the training set. The lambda axis is the
opposite of the degree-of-polynomial axis: we can conclude that
high lambda gives high bias.
But wait, look at the human-level performance (human's error): J_train
is only a little higher than the human's error. There are some reasons
for this; one of them is noisy sound in the audio data.
small J_train, high J_cv -> high variance; high J_train -> high bias
Learning curves
When you have a larger training set, it's harder for a quadratic
function to fit all examples perfectly. As you get more and more
training data, J_train grows and J_cv falls, and both curves flatten
out, so if a learning algorithm has high bias, getting more training
data will not (by itself) help much. If it has high variance, getting
more training data (or simplifying the model) does help.
[Plot: high J_cv -> high variance; high J_train close to J_cv -> high bias]
With a high bias problem, we can use a larger neural network. But what
if the network is too big? It will cause a high variance problem. In
that case, we can regularize the larger network.
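A sketch of such a regularized larger network in Keras; lambda = 0.01 is only an illustrative value, to be tuned on the CV set:

```python
from tensorflow.keras import Sequential, regularizers
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(120, activation='relu', kernel_regularizer=regularizers.L2(0.01)),
    Dense(40,  activation='relu', kernel_regularizer=regularizers.L2(0.01)),
    Dense(1,   activation='sigmoid'),
])
```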
Iterative loop of ML development.
For underfitting, choose a larger model; for overfitting, regularize.
Go to the spam classification example to see how it works.
The first way: mark whether each word appears in the document (0 or 1).
Or a second way: count the quantity of each word in the document.
Error analysis
How does it work? Manually examine the misclassified examples and
group them by common traits.
Adding data
(e.g. recognize the digit in the center of a window)
We have 2 research directions: model-centric and data-centric. In this
era of data explosion, we can develop in a data-centric direction.
Transfer learning: using data from a different task.
We have 2 ways:
Option 1: only train the output layer's parameters, if you have a
small training set.
Option 2: train all parameters, if you have a larger training set.
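A sketch of both options in Keras; the base network, its layer sizes, and the input dimension of 400 are all made up for illustration:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# pretend `base` was already trained on a large, related task
base = Sequential([Dense(25, activation='relu'),
                   Dense(15, activation='relu')])
base.build(input_shape=(None, 400))

# Option 1 (small training set): freeze the pre-trained layers and
# train only a new output layer
base.trainable = False
model = Sequential([base, Dense(1, activation='sigmoid')])

# Option 2 (larger training set): fine-tune everything
# base.trainable = True
```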
Full cycle of a machine learning project
Deployment must scale to a large number of users, and not at too high
a computational cost.
↓
It may also raise privacy concerns: make sure that consent allows you
to store and use this data.
Entropy: H(p1) = -p1 log2(p1) - (1 - p1) log2(1 - p1)
Information gain = reduction in entropy
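A small Python sketch of both quantities; the toy split at the end is made up to mirror the style of the course examples:

```python
import numpy as np

def entropy(p1):
    # H(p1) = -p1*log2(p1) - (1-p1)*log2(1-p1), with H(0) = H(1) = 0
    if p1 in (0, 1):
        return 0.0
    return -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

def information_gain(p_root, p_left, p_right, w_left):
    # reduction in entropy; w_left = fraction of examples going left
    w_right = 1 - w_left
    return entropy(p_root) - (w_left * entropy(p_left) + w_right * entropy(p_right))

# root: 5/10 positive; left branch: 4/5; right branch: 1/5 -> gain ~ 0.28
print(information_gain(0.5, 0.8, 0.2, 0.5))
```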
Decision tree learning, putting it together.
* Split on the feature that gives the highest information gain.
One-hot encoding: if a categorical feature can take k values, replace
it with k binary (0/1) features.
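For example with pandas (the ear_shape feature follows the course's cat example):

```python
import pandas as pd

df = pd.DataFrame({'ear_shape': ['pointy', 'floppy', 'oval']})
one_hot = pd.get_dummies(df, columns=['ear_shape'])
# -> ear_shape_floppy, ear_shape_oval, ear_shape_pointy as 0/1 columns
```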
Continuous-valued features: try different thresholds and pick the split
with the highest information gain. For regression problems, we choose
the split with the highest reduction in variance.
Random forest: train many decision trees on bootstrap samples of the
data, randomizing the features considered at each split, and let the
trees vote.