DL Notes B Div

The document discusses the architecture and functioning of neural networks, including input layers, hidden layers, and output layers, along with the activation functions and loss functions used in training. It outlines the backpropagation algorithm for updating weights and minimizing loss, as well as various strategies for regularization to prevent overfitting. Additionally, it covers techniques for optimizing neural network training, such as gradient descent and the use of different activation functions to address issues like vanishing gradients.

Feedforward Neural Networks (Mitesh Khapra, NPTEL, Module 4)

- The input to the network is an n-dimensional vector.
- The network contains L-1 hidden layers having n neurons each.
- Finally there is one output layer containing k neurons.
- Each neuron in the hidden layers and the output layer can be split into two parts: pre-activation and activation.
- The input layer can be called the 0th layer and the output layer the Lth layer.

The pre-activation at layer i is:   a_i(x) = b_i + W_i h_{i-1}(x)
The activation at layer i is:       h_i(x) = g(a_i(x))
The activation at the output layer: f(x) = h_L(x) = O(a_L(x))

For example, with L = 3:  y_hat = f(x) = O(W_3 g(W_2 g(W_1 x + b_1) + b_2) + b_3)
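
A minimal NumPy sketch of these forward-pass equations (the layer sizes, the sigmoid g and the softmax output O are illustrative assumptions, not fixed by the notes):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def softmax(a):
        e = np.exp(a - a.max())
        return e / e.sum()

    def forward(x, weights, biases):
        # a_i(x) = b_i + W_i h_{i-1}(x);  h_i(x) = g(a_i(x))
        h = x
        for W, b in zip(weights[:-1], biases[:-1]):
            a = b + W @ h              # pre-activation
            h = sigmoid(a)             # activation g
        a_L = biases[-1] + weights[-1] @ h
        return softmax(a_L)            # output layer: f(x) = O(a_L(x))

    # toy network: 4-dimensional input, two hidden layers of 3 neurons, k = 2 outputs
    rng = np.random.default_rng(0)
    sizes = [4, 3, 3, 2]
    weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
    biases = [rng.normal(size=m) for m in sizes[1:]]
    print(forward(rng.normal(size=4), weights, biases))
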
Data: {(x_i, y_i)} for i = 1 ... N
Parameters: theta = W_1, ..., W_L, b_1, ..., b_L
Algorithm: gradient descent with backpropagation
Objective (loss/error function):
    min (1/N) * sum_{i=1}^{N} sum_j (y_hat_ij - y_ij)^2

The choice of loss function depends on the problem:

    Outputs             Real values       Probabilities
    Output activation   Linear            Softmax
    Loss function       Squared error     Cross entropy
[Diagram: the L-layer network with weights W_1 ... W_L, pre-activations a_i and activations h_i feeding the output f(x).]
Backpropagation: for a single training example, the backpropagation algorithm calculates the gradient of the error function. The error can be written as a function of the neural network's weights, and backpropagation algorithms are a set of methods used to compute these gradients efficiently.

[Diagram: a 2-2-2 network (input layer, hidden layer, output layer) labelled with the inputs, weights w1-w8, biases b1 and b2, and the two target values used in the worked example below.]

Training process of a neural network:
1. Feedforward of the input
2. Backpropagation of the error
3. Weight updation:  w_new = w_old - eta * (dError/dw)
Worked example - forward pass

Input values:    x1 = 0.05, x2 = 0.10
Initial weights: w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30,
                 w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55
Bias values:     b1 = 0.35, b2 = 0.60
Target values:   T1 = 0.01, T2 = 0.99

Calculate the values of H1 and H2 (hidden layer):
    H1 = x1*w1 + x2*w2 + b1 = 0.05*0.15 + 0.10*0.20 + 0.35 = 0.3775
    H1_final = sigmoid(H1) = 1 / (1 + e^-0.3775) = 0.59327

    H2 = x1*w3 + x2*w4 + b1 = 0.05*0.25 + 0.10*0.30 + 0.35 = 0.3925
    H2_final = sigmoid(H2) = 0.59688

Output layer:
    y1 = H1_final*w5 + H2_final*w6 + b2 = 0.59327*0.40 + 0.59688*0.45 + 0.60 = 1.10591
    y1_final = sigmoid(y1) = 0.75136   (does not match the target value T1 = 0.01)

    y2 = H1_final*w7 + H2_final*w8 + b2 = 0.59327*0.50 + 0.59688*0.55 + 0.60
       = 0.29664 + 0.32828 + 0.60 = 1.22492
    y2_final = sigmoid(y2) = 0.77292

Total error:
    E_total = sum of (1/2)*(target - output)^2
            = (1/2)*(T1 - y1_final)^2 + (1/2)*(T2 - y2_final)^2
            = (1/2)*(0.01 - 0.75136)^2 + (1/2)*(0.99 - 0.77292)^2
            = 0.27481 + 0.02356
            = 0.29837
Backward pass at the output layer

By the chain rule,
    dE_total/dw5 = (dE_total/dy1_final) * (dy1_final/dy1) * (dy1/dw5)

E_total = (1/2)*(T1 - y1_final)^2 + (1/2)*(T2 - y2_final)^2
    dE_total/dy1_final = 2*(1/2)*(T1 - y1_final)*(-1) + 0
                       = -(T1 - y1_final)
                       = -(0.01 - 0.75136) = 0.741365

y1_final = 1 / (1 + e^-y1)
    dy1_final/dy1 = y1_final * (1 - y1_final)
                  = 0.75136 * (1 - 0.75136) = 0.18681

y1 = H1_final*w5 + H2_final*w6 + b2
    dy1/dw5 = H1_final = 0.59327
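
The same numbers can be reproduced with a few lines of NumPy; this is only a sketch of the hand calculation above, where the final product dE/dw5 is simply the three chain-rule factors multiplied together:

    import numpy as np

    sig = lambda z: 1.0 / (1.0 + np.exp(-z))

    x1, x2 = 0.05, 0.10
    w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
    w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
    b1, b2 = 0.35, 0.60
    T1, T2 = 0.01, 0.99

    # forward pass
    H1f = sig(x1*w1 + x2*w2 + b1)            # ~0.59327
    H2f = sig(x1*w3 + x2*w4 + b1)            # ~0.59688
    y1f = sig(H1f*w5 + H2f*w6 + b2)          # ~0.75136
    y2f = sig(H1f*w7 + H2f*w8 + b2)          # ~0.77292
    E_total = 0.5*(T1 - y1f)**2 + 0.5*(T2 - y2f)**2   # ~0.29837

    # backward pass: chain rule for dE_total/dw5
    dE_dy1f  = -(T1 - y1f)                   # ~0.74137
    dy1f_dy1 = y1f * (1 - y1f)               # ~0.18681
    dy1_dw5  = H1f                           # ~0.59327
    dE_dw5   = dE_dy1f * dy1f_dy1 * dy1_dw5  # ~0.08217
    print(E_total, dE_dw5)
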
How a neural network works (single neuron):
    y = x1*w1 + x2*w2 + x3*w3 + b
    z = Act(y)        (output of the neuron)
Depending on the output of the activation function, the neuron either activates (fires) or does not.
Unit 2

Single neuron with 3 inputs: inputs -> weighted summation -> activation -> output.

FCNN with a hidden layer of 4 neurons and 1 output layer.

Architecture (for each layer l):
    z[l] = W[l] a[l-1] + b[l]
    a[l] = g(z[l])

Loss functions

    Regression:      MSE (mean squared error), MAE (mean absolute error)
    Classification:  Binary cross-entropy, Categorical cross-entropy

Loss function: an error function for a single training example/input.
Cost function: the average loss over the entire training dataset.

Regression:
- MSE: the difference between the actual value and the model prediction, squared and averaged across the whole dataset.
      MSE = (1/N) * sum_{i=1}^{N} (y_i - y_hat_i)^2
- MAE: the absolute difference between the actual value and the model prediction, averaged across the whole dataset.
      MAE = (1/N) * sum_{i=1}^{N} |y_i - y_hat_i|
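
A short NumPy sketch of these loss/cost definitions (the example arrays are made up purely for illustration):

    import numpy as np

    def mse(y, y_hat):
        # mean squared error over the whole dataset
        return np.mean((y - y_hat) ** 2)

    def mae(y, y_hat):
        # mean absolute error over the whole dataset
        return np.mean(np.abs(y - y_hat))

    def binary_cross_entropy(y, p):
        # y in {0, 1}, p = predicted probability of class 1
        eps = 1e-12
        p = np.clip(p, eps, 1 - eps)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    y_true = np.array([3.0, -0.5, 2.0])
    y_pred = np.array([2.5,  0.0, 2.0])
    print(mse(y_true, y_pred), mae(y_true, y_pred))
    print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))
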

Hyperparameter tuning

Hyperparameters are the variables which determine the network structure (e.g. the number of hidden units) and the variables which determine how the network is trained (e.g. the learning rate). Hyperparameters are set before training, i.e. before optimizing the weights and bias.
GTadient dejecent it tpes -

Snitialise wb
LDBadk qao
olren dien

Ttera ovev data


Stochastic
omu L[o,b)
Compule (,)
g yadien decen
A Mini -botc
bEH bt - n&bt hadient ducent
it stisies
Algorithm example: IRIS flower classification using backpropagation
1. Import the required libraries.
2. Load the IRIS flower dataset (.csv).
3. Initialize random weights and bias.
4. Perform forward propagation:  y_in = b + sum_i x_i*w_i;  for f(y_in) use the sigmoid function.
5. Calculate the loss function.
6. Perform backpropagation using gradient descent and update the weights.
7. Reiterate and calculate the error function.
A sketch of these steps is given below.
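
A hedged sketch of the steps above. It assumes scikit-learn's built-in load_iris instead of the .csv file, reduces the task to a binary label so a single sigmoid output suffices, and uses MSE as the loss; none of these specifics come from the notes:

    import numpy as np
    from sklearn.datasets import load_iris   # assumption: built-in dataset instead of the .csv

    # 1-2. load the IRIS data (binary subset: setosa vs. not-setosa)
    X, y = load_iris(return_X_y=True)
    y = (y == 0).astype(float)

    # 3. initialize random weights and bias
    rng = np.random.default_rng(1)
    w, b = rng.normal(size=X.shape[1]), 0.0

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    eta = 0.1
    for epoch in range(200):
        # 4. forward propagation: y_in = b + sum(x_i * w_i), f(y_in) = sigmoid
        y_hat = sigmoid(X @ w + b)
        # 5. loss (mean squared error here)
        loss = np.mean((y - y_hat) ** 2)
        # 6. backpropagation (gradient of MSE through the sigmoid) and weight update
        grad = (y_hat - y) * y_hat * (1 - y_hat)
        w -= eta * X.T @ grad / len(y)
        b -= eta * grad.mean()
    # 7. reiterate, then inspect the final error
    print(loss)
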
OBatch, adient makes the entire dakasek,caleuloe
the (ost 0 fum pdate pavameler
Stochastic Tadient
Update he pararneheYs alter
eveYY simgte observation every time he wuahis
updtcdy 1 Is Known
eupdted an tevation
i m i . batch gadient Make q Subset of data
Subse
update Ahe paameers based o n every
[Figure: cost vs. iterations for batch GD, SGD and mini-batch GD.]

Comparison:
    Batch GD:      smooth cost function; computational cost is very high since each update uses the entire data.
    SGD:           lots of variation in the cost function.
    Mini-batch GD: reduced variation, smoother cost function compared to SGD; computation time is lesser than SGD.

Epoch: one pass over the entire data.
Step: one update of the parameters.

Number of steps in 1 epoch:
    Batch GD:      1
    SGD:           N (number of data points)
    Mini-batch GD: N / mini-batch size
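
A sketch of a mini-batch gradient descent loop consistent with the epoch/step definitions above (the dataset, batch size, learning rate and the linear-regression loss are made-up assumptions):

    import numpy as np

    def minibatches(X, y, batch_size, rng):
        # shuffle once per epoch, then yield mini-batches
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            sel = idx[start:start + batch_size]
            yield X[sel], y[sel]

    N, batch_size = 1000, 32
    X, y = np.random.randn(N, 3), np.random.randn(N)
    w, eta = np.zeros(3), 0.01
    rng = np.random.default_rng(0)

    for epoch in range(5):                      # one epoch = one pass over the data
        for Xb, yb in minibatches(X, y, batch_size, rng):
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(yb)   # MSE gradient on the mini-batch
            w -= eta * grad                     # one step = one parameter update
    # steps per epoch: batch GD = 1, SGD = N, mini-batch GD = ceil(N / batch_size)
    print(int(np.ceil(N / batch_size)), "steps per epoch")
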
Gradient-descent-based optimization algorithms:
- SGD
- Momentum
- Adagrad (adaptive gradient algorithm)
- Adadelta
- RMSProp (root mean square propagation)
- Adam (adaptive moment estimation)

Adagrad: modifies the general learning rate at each step based on the accumulated past squared gradients.
Adadelta: restricts the window of accumulated past gradients to some fixed size w.
(The formulas for bias correction need not be memorized.)
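
A rough sketch of how these adaptive-learning-rate updates differ from plain SGD; the decay constants are typical defaults assumed here, not taken from the notes:

    import numpy as np

    def sgd_step(w, g, eta=0.01):
        return w - eta * g

    def adagrad_step(w, g, cache, eta=0.01, eps=1e-8):
        # Adagrad: accumulate all past squared gradients, shrink the step accordingly
        cache += g ** 2
        return w - eta * g / (np.sqrt(cache) + eps), cache

    def rmsprop_step(w, g, cache, eta=0.001, rho=0.9, eps=1e-8):
        # RMSProp: exponentially decaying (fixed-size "window") average of squared gradients
        cache = rho * cache + (1 - rho) * g ** 2
        return w - eta * g / (np.sqrt(cache) + eps), cache

    def adam_step(w, g, m, v, t, eta=0.001, b1=0.9, b2=0.999, eps=1e-8):
        # Adam: first and second moment estimates with bias correction
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
        return w - eta * m_hat / (np.sqrt(v_hat) + eps), m, v

    w, g = np.zeros(2), np.array([0.1, -0.2])
    print(sgd_step(w, g))
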

Hyperparameter tuning: hyperparameters are the variables which determine the network structure.
- Fewer layers -> underfitting.
- Many layers -> overfitting, and vanishing/exploding gradients.

Vanishing and exploding gradients

[Figure: gradients flowing back through layers L1, L2, L3 to the output.]

- Vanishing gradient: the derivative (slope) gets smaller and smaller as it is propagated back to earlier layers. It depends on both the weights and the activation function.
- Exploding gradient: the derivative (slope) gets larger and larger. It is due to the weights, not the activation function.

Both problems arise during backpropagation.
Remedies for the vanishing gradient problem

1) Choice of activation function
The vanishing gradient problem occurs with the sigmoid and tanh activation functions, as their derivatives lie between 0 and 0.25, and 0 and 1, respectively. To avoid this gradient problem the ReLU activation function is used, because its derivative is 1 for positive inputs and 0 for negative inputs.

    ReLU:                f(x) = x, when x >= 0;  0, when x < 0
    Derivative of ReLU:  f'(x) = 1, when x > 0;  0, when x < 0

2) Appropriate choice of weight initialization
    weights < 1 -> vanishing gradient
    weights > 1 -> exploding gradient
Choose the weights randomly; the Xavier initialization technique scales the random weight initialization according to the number of neurons feeding into and out of the layer.

3) Intelligently choosing the backpropagation learning rate.

A sketch of remedies (1) and (2) is given below.
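
A small sketch of ReLU, its derivative, and a Xavier-style scaled random initialization; the exact scaling convention (uniform Glorot limits) is an assumption, since several variants exist:

    import numpy as np

    def relu(x):
        return np.where(x > 0, x, 0.0)

    def relu_derivative(x):
        # 1 for positive inputs, 0 otherwise, so gradients do not shrink layer by layer
        return np.where(x > 0, 1.0, 0.0)

    def xavier_init(n_in, n_out, rng):
        # Xavier/Glorot: scale random weights by the layer fan-in and fan-out
        limit = np.sqrt(6.0 / (n_in + n_out))
        return rng.uniform(-limit, limit, size=(n_out, n_in))

    rng = np.random.default_rng(0)
    W1 = xavier_init(4, 8, rng)
    print(relu(np.array([-2.0, 0.5])), relu_derivative(np.array([-2.0, 0.5])))
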
Convolution on images: Convolutional Neural Networks (CNN), compared with RNNs and plain feedforward networks. The number of trainable parameters of a feedforward network can be inspected from the Keras model (model.add, model.compile, model.summary).
Ridge Regression (L2 Regularization)

Ridge regression = loss function + regularization term:
    RSS_ridge(w, b) = sum_i (y_i - (w . x_i + b))^2 + lambda * sum_j w_j^2

Regularization is a form of regression in which the learning algorithm is modified to reduce overfitting.

[Figure: fitted curve without regularization vs. with regularization.]
Overfitting strategies

1. Reduce the complexity of our model.
2. Augment the data (increase the number of samples in the dataset).
3. Apply regularization, that is, apply a penalty factor to the loss function (L1 or L2 regularization).
4. Dropout: nodes are dropped with probability p, i.e. a random subset of the neuron outputs in a layer is set to 0 during training (e.g. an activation vector has some of its entries zeroed out by dropout).
Early stopping is a regularization technique for deep neural networks that stops training when the parameter updates no longer yield an improvement on a validation set.

[Figure: training-set error and validation-set error plotted against the number of iterations; training stops where the validation error stops improving.]
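
In Keras this is typically done with the EarlyStopping callback; a hedged sketch, where the model, the training data and the patience value are placeholders rather than anything from the notes:

    from tensorflow.keras.callbacks import EarlyStopping

    # stop when the validation loss has not improved for `patience` epochs
    early_stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

    # assumes `model`, `x_train`, `y_train` already exist:
    # model.fit(x_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])
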
Splitting of the dataset (k-fold cross-validation)

Validating your approach using k-fold validation: the data is split into k partitions; in each fold one partition is held out for validation and the remaining partitions are used for training.

    Fold 1:  Validation | Training   | Training
    Fold 2:  Training   | Validation | Training
    Fold 3:  Training   | Training   | Validation
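
A short sketch of k-fold splitting with scikit-learn's KFold (k = 3 to match the diagram above; the toy arrays are made up):

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(12).reshape(6, 2)
    y = np.arange(6)

    kf = KFold(n_splits=3, shuffle=True, random_state=0)
    for fold, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
        # one partition is held out for validation, the rest used for training
        print(f"fold {fold}: train={train_idx}, validation={val_idx}")
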
Cats & Dogs dataset: image classification using CNN

From kaggle.com (dogs-vs-cats), download the data.
1. Read the data.
2. Split it into training and test sets.
3. model = Sequential()
4. model.add(Conv2D(filters, (3, 3), activation="relu"))
5. model.summary()  (shows how many parameters are used)
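
A sketch of a small Keras CNN following the steps above; the image size, filter counts and layer stack are assumptions for illustration, not taken from the notes:

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    model = Sequential([
        Conv2D(32, (3, 3), activation="relu", input_shape=(150, 150, 3)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation="relu"),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation="relu"),
        Dense(1, activation="sigmoid"),        # cat vs. dog = binary output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()                            # shows how many parameters are used
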

(Unit No. 5 in the textbook.)

A 3x3 kernel is typically followed by a ReLU activation.
- Kernel: the small matrix of weights that slides over the image.
- Stride: by how many pixels the kernel moves (the amount of movement of the kernel).
- Padding: adding zeros on the boundary of the image; padding modes are "valid" and "same".

    layer = Conv2D(filters=3, kernel_size=3, activation="relu")(inputs)
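
The spatial output size of a convolution for a given kernel size, stride and padding can be checked with the usual formula out = floor((n + 2p - k) / s) + 1; a tiny sketch (the numbers are illustrative):

    def conv_output_size(n, k, stride=1, padding=0):
        # n: input width/height, k: kernel size, padding: zeros added on each boundary
        return (n + 2 * padding - k) // stride + 1

    print(conv_output_size(28, 3, stride=1, padding=0))   # "valid" padding -> 26
    print(conv_output_size(28, 3, stride=1, padding=1))   # "same"-style padding -> 28
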
CNN Architecture

    Input -> Convolution + ReLU -> Pooling -> Convolution + ReLU -> Pooling -> Flatten -> Fully connected layer -> Output

Pooling layers:
- To reduce the dimensionality of the feature maps.
- To detect features such as edges, eyes, noses.
- Max pooling: takes the maximum of each window.
- Average pooling: takes the average of each window.
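
A small numeric example of 2x2 max pooling and average pooling with stride 2 (the input values are made up):

    import numpy as np

    x = np.array([[1, 3, 2, 0],
                  [4, 6, 1, 2],
                  [7, 2, 9, 3],
                  [1, 5, 4, 8]], dtype=float)

    # split the 4x4 feature map into non-overlapping 2x2 windows
    windows = x.reshape(2, 2, 2, 2).swapaxes(1, 2)      # shape (2, 2, 2, 2)
    max_pooled = windows.max(axis=(2, 3))               # max pooling
    avg_pooled = windows.mean(axis=(2, 3))              # average pooling
    print(max_pooled)
    print(avg_pooled)
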

Reference: François Chollet, Deep Learning with Python (GitHub notebooks), deep learning for computer vision.
