Neural Net HO

ok, still pretty basic, but finally added something on linear algebra, linear regression and kernels, and Gaussian process regression. hope it's somewhat coherent, or at least usable / correct, formula-wise. i would certainly appreciate being notified if any errors are discovered.


Neural Nets

\[ \sum_i f_i w_i > \text{threshold} \;\rightarrow\; \text{Output} = 1, \quad \text{else Output} = 0 \]

1 Simple Logical Connectives

threshold units (each input is in {1,0} and is multiplied by the weight after the *):

  and:  {1,0}*1, {1,0}*1            -> sum > 1.5 ?
  or:   {1,0}*1, {1,0}*1            -> sum > .5 ?
  not:  {1,0}*-1                    -> sum > -.5 ?

the same gates with an always-on bias input (1) and threshold 0:

  and:  {1,0}*1, {1,0}*1, 1*-1.5    -> sum > 0 ?
  or:   {1,0}*1, {1,0}*1, 1*-.5     -> sum > 0 ?
  not:  {1,0}*-1, 1*.5              -> sum > 0 ?
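a minimal sketch of such a threshold unit in Python (numpy assumed; the weights and thresholds are the first set above):

import numpy as np

def threshold_unit(f, w, threshold):
    # Output = 1 if the weighted sum of the inputs exceeds the threshold, else Output = 0
    return 1 if np.dot(f, w) > threshold else 0

# the gates above as (weights, threshold)
gates = {"and": ([1, 1], 1.5), "or": ([1, 1], 0.5), "not": ([-1], -0.5)}

for f1 in (0, 1):
    for f2 in (0, 1):
        print(f1, f2,
              "and:", threshold_unit([f1, f2], *gates["and"]),
              "or:", threshold_unit([f1, f2], *gates["or"]))
    print("not", f1, "->", threshold_unit([f1], *gates["not"]))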

2 Training

\[ \Delta w_i = C \cdot \mathrm{Error} \cdot f_i \]
\[ \mathrm{Error} = \mathrm{CorrectAnswer} - \mathrm{Output} \]

2.1 or, C=1, threshold=.5

ex.  f1  f2  CA  |  w1  w2  |  f1w1 + f2w2  |  Output  Error  |  ∆w1  ∆w2
 1    0   0   0  |   0   0  |       0       |    0       0    |   0    0
 2    1   0   1  |   0   0  |       0       |    0       1    |   1    0
 3    0   1   1  |   1   0  |       0       |    0       1    |   0    1
 4    1   1   1  |   1   1  |       2       |    1       0    |   0    0

2.2 or, C=.5, threshold=0
ex.  f1  f2  f3  CA  |  w1  w2  w3  |  ∑ fi wi  |  Output  Error  |  ∆w1  ∆w2  ∆w3
 1    0   0   1   0  |   0   0   0  |     0     |    0       0    |   0    0    0
 2    1   0   1   1  |   0   0   0  |     0     |    0       1    |  .5    0   .5
 3    0   1   1   1  |  .5   0  .5  |    .5     |    1       0    |   0    0    0
 4    1   1   1   1  |  .5   0  .5  |     1     |    1       0    |   0    0    0
 1    0   0   1   0  |  .5   0  .5  |    .5     |    1      -1    |   0    0  -.5
 2    1   0   1   1  |  .5   0   0  |    .5     |    1       0    |   0    0    0
 3    0   1   1   1  |  .5   0   0  |     0     |    0       1    |   0   .5   .5
 4    1   1   1   1  |  .5  .5  .5  |   1.5     |    1       0    |   0    0    0
 1    0   0   1   0  |  .5  .5  .5  |    .5     |    1      -1    |   0    0  -.5
 2    1   0   1   1  |  .5  .5   0  |    .5     |    1       0    |   0    0    0
 3    0   1   1   1  |  .5  .5   0  |    .5     |    1       0    |   0    0    0
 4    1   1   1   1  |  .5  .5   0  |     1     |    1       0    |   0    0    0
 1    0   0   1   0  |  .5  .5   0  |     0     |    0       0    |   0    0    0
 2    1   0   1   1  |  .5  .5   0  |    .5     |    1       0    |   0    0    0
 3    0   1   1   1  |  .5  .5   0  |    .5     |    1       0    |   0    0    0
 4    1   1   1   1  |  .5  .5   0  |     1     |    1       0    |   0    0    0
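a minimal sketch of this training loop in Python (numpy assumed); it runs the C=.5, threshold=0 example above, updating the weights after each example:

import numpy as np

C = 0.5                                  # learning rate
threshold = 0.0
examples = np.array([[0, 0, 1],          # f3 is the always-on bias input
                     [1, 0, 1],
                     [0, 1, 1],
                     [1, 1, 1]], dtype=float)
correct = np.array([0, 1, 1, 1])         # CA for 'or'

w = np.zeros(3)
for epoch in range(4):
    for f, ca in zip(examples, correct):
        output = 1 if np.dot(f, w) > threshold else 0
        error = ca - output              # Error = CorrectAnswer - Output
        w = w + C * error * f            # delta w = C * Error * f
        print(f, w, output, error)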

3 linear regression

In fact, neural nets are just basic linear algebra:


 
\[ \vec{x}^{\,T} \cdot \vec{w} \;=\; [\,x_a\ \ x_b\ \ x_c\ \ \ldots\,] \cdot \begin{bmatrix} w_a \\ w_b \\ w_c \\ \vdots \end{bmatrix} \;=\; x_a w_a + x_b w_b + x_c w_c + \ldots \]
and linear algebra makes linear regression incredibly easy!
assuming a least squares cost function,
with $t_n$ the true value, given $\vec{x}_n$,
and $\vec{x}_n^{\,T} \cdot \vec{w}$ our neural model's predicted value,

\[ \frac{1}{2}\sum_n \left(t_n - \vec{x}_n^{\,T} \cdot \vec{w}\right)^2 \tag{1} \]
the minimum (i.e. derivative = 0) is given where
\[ \vec{w} = (\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot \vec{t} \tag{2} \]

where $\mathbf{X}$ is the matrix whose rows are the input vectors $\vec{x}_n^{\,T}$, and $\vec{t}$ the vector of their true outcomes.
(gotta take that inverse so $\mathbf{X}^T \cdot \mathbf{X}$ needs to be nonsingular)
try it out in matlab (if wealthy or connected)
or Octave or R (for the rest of us, both quite remarkable tools for free)
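a minimal sketch of eq. 2 in Python/numpy (similar one-liners exist in Octave and R; the data here is made up):

import numpy as np

# toy data: the rows of X are the input vectors x_n, t their true outcomes
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
t = np.array([1.0, 2.0, 3.0, 4.0])

# w = (X^T X)^{-1} X^T t   (eq. 2); solve() does the same job as the explicit inverse
w = np.linalg.solve(X.T @ X, X.T @ t)

print(w)          # the fitted weights
print(X @ w)      # the model's predictions x_n^T w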

4 kernels (gpr, in particular)

even more magical mathematics (i.e. not gone into here) also derives the following elegant alternative method of prediction:

\[ f(\vec{x}) = \vec{w}^{\,T} \cdot \vec{k} \tag{3} \]
\[ \vec{w} = (K + \lambda I_n)^{-1}\, \vec{t} \tag{4} \]

$I_n$ is the identity matrix, and $\lambda$ is whatever fudge room is needed (it can help with the inversion);
$K$ is an $n \times n$ matrix where $K_{ij} = k(\vec{x}_i, \vec{x}_j)$;
$\vec{k}_i = k(\vec{x}, \vec{x}_i)$, for all ($n$) observed $\vec{x}_i$.

and what’s k(xi , xj )? whatever function seems most useful!

“one of the most common Gaussian Process Regression kernels”:

\[ k(\vec{x}_i, \vec{x}_j) = \theta_1 \exp\!\left(-\frac{\theta_2}{2}\,\lVert \vec{x}_i - \vec{x}_j \rVert^2\right) + \theta_3\,(\vec{x}_i^{\,T} \cdot \vec{x}_j) + \theta_4 \tag{5} \]

the 3 terms are:

• the Gaussian kernel (note $\theta_2$ is basically the inverse of $\sigma^2$)
• a linear term (moderated by $\theta_3$)
• a constant offset ($\theta_4$)

play with the $\theta$ weights or optimize them against an error function.
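a minimal sketch of eqs. 3–5 in Python/numpy (the θ values, λ, and the observations are made up, just to show the mechanics):

import numpy as np

def k(xi, xj, th1=1.0, th2=1.0, th3=0.1, th4=0.1):
    # eq. 5: Gaussian term + linear term + constant offset
    return (th1 * np.exp(-0.5 * th2 * np.sum((xi - xj) ** 2))
            + th3 * np.dot(xi, xj) + th4)

# toy observations
X = np.array([[0.0], [1.0], [2.0], [3.0]])
t = np.array([0.0, 0.8, 0.9, 0.1])
lam = 1e-3                                       # the lambda fudge room

n = len(X)
K = np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])
w = np.linalg.solve(K + lam * np.eye(n), t)      # eq. 4

def f(x):
    kv = np.array([k(x, X[i]) for i in range(n)])    # k_i = k(x, x_i)
    return np.dot(w, kv)                             # eq. 3: f(x) = w^T k

print(f(np.array([1.5])))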

5 classification

• For a binary classification, t is either 0 or 1.
• For multiple classifications, t becomes $\vec{t}$, and $\vec{t}$ becomes $\mathbf{T}$
(scalar becomes a vector, vector becomes a matrix): $\vec{t}_n = [0\ 0\ 0\ 1]$, or whatever.
• and that also means $\vec{w}$ becomes $\mathbf{W}$.
• ("logistic/sigmoidal") S-functions ($(1 + \exp(-x))^{-1}$) are good for classification because any
high enough input generates (nearly) a 1, and any low enough input generates (nearly) a 0.
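a minimal sketch of that squashing in Python (numpy assumed; the one-hot row of T is just an example):

import numpy as np

def sigmoid(x):
    # (1 + exp(-x))^{-1}: high enough input -> ~1, low enough input -> ~0
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(5.0), sigmoid(-5.0))   # roughly 0.993 and 0.007

t_n = np.array([0, 0, 0, 1])         # one row of T: "class 4 of 4", or whatever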

6 multilayers and back propagation?

• don’t do it unless you have to.


• multilayers are said to be necessary for functions like exclusive ‘or’.
• the activation function must be differentiable (e.g. sigmoidal).
• it’s basically just gradient descent.
• it can get stuck in local minima.
• start with random weights, adjust until error is tolerable.
(do multiple times to check for better minima)
(as presented by R. Rojas: Neural Networks, Springer-Verlag, Berlin, 1996,
http://page.mi.fu-berlin.de/rojas/neural/chapter/K7.pdf)
for sigmoidal activation:

\[ \Delta w_{ij} = -\gamma\, o_i\, d_j \tag{6} \]
\[ d_j = o_j (1 - o_j)(o_j - t_j) \tag{7} \]
\[ d_j = o_j (1 - o_j) \sum_q w_{jq}\, d_q \tag{8} \]

\[ \frac{\partial E}{\partial w_{ij}} = o_i\, d_j\,, \qquad \frac{\partial}{\partial o_j}\,\tfrac{1}{2}(o_j - t_j)^2 = o_j - t_j \tag{9} \]
\[ s(x) = \frac{1}{1 + \exp(-Cx)}\,, \qquad \frac{\partial s(x)}{\partial x} = C\, s(x)\,(1 - s(x))\,, \qquad s(x) = o_j \tag{10} \]

$d_j$ is "the backpropagated error"
(or perhaps rather the product of the derivatives along the way (basically the chain rule), as shown by eqs. 9–10);
$E$ is the error.
at the hidden layers, $d_j$ instead sums over the weights and "errors" coming from the observed side (eq. 8).

$\gamma$ is "a learning constant"; $o_i$ is the output feeding into $w_{ij}$; $t_j$ is the desired output; and $C$ is whatever constant as well (eqs. 7–8 take $C = 1$).
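a minimal sketch of eqs. 6–8 in Python/numpy for one hidden layer, learning exclusive 'or' (the network size, learning constant, and C=1 are my own choices here, not from the handout; rerun with a different seed if it lands in a local minimum):

import numpy as np

def s(x):
    # eq. 10 with C = 1
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
gamma = 0.5                                    # the learning constant

# exclusive 'or', with an always-on bias input appended
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])

W1 = rng.standard_normal((3, 2))               # inputs (+bias) -> 2 hidden units
W2 = rng.standard_normal(3)                    # hidden (+bias) -> 1 output

for epoch in range(10000):
    for x, t in zip(X, T):
        h = s(x @ W1)                          # hidden outputs
        hb = np.append(h, 1.0)                 # append the bias
        o = s(hb @ W2)                         # the output o_j

        d_out = o * (1 - o) * (o - t)          # eq. 7, at the output
        d_hid = h * (1 - h) * (W2[:2] * d_out) # eq. 8, at the hidden layer

        W2 -= gamma * hb * d_out               # eq. 6: delta w_ij = -gamma o_i d_j
        W1 -= gamma * np.outer(x, d_hid)

for x in X:
    print(x[:2], round(float(s(np.append(s(x @ W1), 1.0) @ W2)), 2))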
