
Estimating the Mean and Variance of the Target Probability Distribution

David A. Nix and Andreas S. Weigend
Department of Computer Science and Institute of Cognitive Science
University of Colorado at Boulder
Boulder, CO 80309-0430, USA
[email protected]
Abstract: We introduce a method that estimates the mean and the variance of the probability distribution of the target as a function of the input, given an assumed target error-distribution model. Through the activation of an auxiliary output unit, this method provides a measure of the uncertainty of the usual network output for each input pattern. We here derive the cost function and weight-update equations for the example of a Gaussian target error distribution, and we demonstrate the feasibility of the network on a synthetic problem where the true input-dependent noise level is known.

I. INTRODUCTION

Feed-forward artificial neural networks are widely used and well-suited for function-approximation (regression) tasks, particularly when there exists a sufficiently large data set from which to train. In almost any real-world problem, paired input-target data contains noise, one form of which is observational noise that corrupts the target values (e.g., [1]). Thus, when we attempt to approximate the function f(x), we assume our measured data d(x) can be modeled by

    d(x) = f(x) + n(x)        (1)

where the additive noise n(x) can be viewed as errors on the target values that move the targets away from their true values f(x) to their observed values d(x). In a function-approximation task, the output of a network given a particular input pattern, y(x), can be interpreted as an estimate μ̂(x) of the true mean μ(x) of this noisy target distribution around f(x), given an appropriate error model for n(x) [2] [3] [4].

While an estimate of the mean of the target distribution (the expectation value) for a given input pattern is indeed valuable information, we sometimes want to know more. In addition to the network predicting d(x) by estimating the mean of the target distribution (y(x) = μ̂(x)), we would also like the network to quantify the uncertainty of its prediction by simultaneously estimating the degree of noise about μ̂(x) based on the noise observed in the training data (see [2] [5]).

Just as the network output y varies with the input pattern x, the quantitative uncertainty due to n(x), defined as the true variance σ² of the target error distribution around f(x), is also some function of x. This function, σ²(x), may be constant (i.e., independent of the input) if the target noise level is uniform over the range of input values. Alternatively, in the case we want to consider here, the level of noise may vary systematically over the input space. In either case, not only do we want the network to learn an output function y(x) ≈ f(x) that estimates the true mean μ(x) of the corresponding target distribution, but we also want to simultaneously learn a function s²(x) that estimates the true variance σ²(x) of that distribution, given an appropriate assumption as to the distribution's form.

Based on a maximum-likelihood formulation of a feed-forward neural network for function approximation [2] [3] [4], we here introduce a network that calculates s²(x) ≈ σ²(x), the estimated variance of the target error distribution as a function of the input, in addition to the usual output y(x) = μ̂(x) ≈ μ(x) = f(x). We derive the method in full for the case where we assume the targets are normally distributed about f(x), and we apply this derivation to a synthetic example problem where σ²(x) is known.

II. THE METHOD

A. The Idea

How does the network estimate s²(x)? To the output unit y that computes μ̂(x_i), we add a complementary "s² unit" that computes s²(x_i), the estimate of σ²(x_i) given input pattern x_i.

Since σ²(x) can never be negative or zero, we choose an exponential activation function for s²(x) to naturally impose these bounds:

    s²(x_i) = exp( Σ_k w_{s²k} h_k^{s²}(x_i) + β )        (2)

where β is the bias for the s² unit and h_k^{s²}(x_i) is the activation of hidden unit k, for input x_i, in the hidden layer feeding directly into the s² unit.

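As a concrete illustration of Eq. (2), the following minimal NumPy sketch (ours, not the authors' code; parameter names and layer shapes are placeholders) computes both network outputs for a single input pattern, assuming the split-hidden-unit architecture described below in Section II.B: a linear y unit fed by one set of tanh hidden units, and an s² unit whose exponential activation guarantees a strictly positive variance estimate.

    import numpy as np

    def forward(x, params):
        """One forward pass of the split-hidden-unit network (cf. Eq. (2))."""
        x = np.atleast_1d(x)                       # allow scalar inputs
        # Hidden layer feeding the output unit y (tanh activations).
        h_y = np.tanh(params["W_jm"] @ x + params["b_j"])
        y = params["w_yj"] @ h_y + params["b_y"]   # linear output unit
        # Separate hidden layer feeding the s^2 unit (tanh activations).
        h_s = np.tanh(params["W_km"] @ x + params["b_k"])
        # Exponential activation keeps the variance estimate positive (Eq. (2)).
        s2 = np.exp(params["w_s2k"] @ h_s + params["beta"])
        return y, s2, h_y, h_s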
Having selected a particular network architecture (see Figure 1; inputs x are indexed by m, hidden units by j and k), we employ the same gradient-descent (backpropagation) learning as in the usual case to find w_jm and w_yj and, in addition, a set of weights w_km and w_{s²k} that calculate s²(x). Thus, after each pattern i is presented, all the weights in the network are adapted to minimize some cost function C according to

    Δw_yj = -η ∂C_i/∂w_yj        (3)
    Δw_jm = -η ∂C_i/∂w_jm        (4)
    Δw_{s²k} = -η ∂C_i/∂w_{s²k}        (5)
    Δw_km = -η ∂C_i/∂w_km        (6)

where η is the learning rate and C_i is the contribution of pattern i to the overall cost function C. (The biases of all units are treated as an additional weight connected to a unit clamped at 1 and are updated accordingly.)

[Figure 1: Architecture for a network with one output unit y and one s² unit. Both sets of hidden units are connected to the same input units, but no connections are shared (not all connections are shown).]

We obtain a form for C by expressing our goal as maximizing the log likelihood of the targets (having assumed our patterns are independently and identically distributed), given the input patterns and the network N (e.g., [2] [4] [5]). That is, we attempt to maximize

    Σ_i ln P(d(x_i) | x_i, N)        (7)

The exact form of C depends on the assumption we make as to the form of this target probability distribution such that y(x) = μ̂(x).

B. Details

B.1. Architecture

The s² unit is fully connected to its own set of hidden units, h^{s²} (indexed by k), just as the output unit y is connected to its hidden units h^y (indexed by j) (see Figure 1). Alternatively, we could connect both y and s² to a common large set of hidden units, but our experience has been that the method works better when we use a split-hidden-unit architecture. In addition, we could easily add a second hidden layer (split or shared) as required by the particular form of f(x) and/or σ²(x), but we will restrict our attention here to the case of a single hidden layer.

B.2. Learning Dynamics

All initial weights are drawn from a uniform random distribution on [-1, 1] and scaled by the reciprocal of the number of incoming connections. These initial weights produce a net input to the s² unit's exponential activation function (Eq. (2)) of approximately zero, corresponding to an initial estimate of approximately unit variance over the entire input space. Because it would be premature to make differing estimates of the noise level over the input space before f(x) is at least roughly approximated by y(x), we save computations by not updating the weights w_km and w_{s²k} until y(x) is somewhat close to f(x).

Additionally, if the variance is either much larger or much smaller than unity, the bias β in Eq. (2) will have to grow either very positive or very negative for s²(x) to approximate σ²(x) well. Since β has a natural interpretation as the natural log of the mean value of σ²(x), we can accelerate the learning of s²(x) by setting β equal to the natural log of the current mean variance over the training data at the end of each epoch (for early training epochs). The precise form of the variance depends on the assumed form of the target error distribution model (see the example below).

However, once training has proceeded to the point where the approximation of f(x) is improving only very slowly, we assume that y(x) is a rough approximation of f(x) that additional training will fine-tune. At this point, and for subsequent training, we update the weights that calculate s²(x) according to Eqs. (13) and (14). Furthermore, we no longer set the s² unit's bias β to the natural log of the current mean variance over the training data; instead we update β by gradient descent just like all the other weights and biases. Training is then continued until C does not decrease significantly further.

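A minimal sketch of the initialization rule just described (ours; it reuses the parameter names of the forward-pass sketch above and the layer sizes of the synthetic example in Section III, and the biases are simply zeroed here for brevity): weights are drawn uniformly from [-1, 1] and scaled by the reciprocal of the fan-in, which keeps the initial net input to the s² unit near zero and hence the initial variance estimate near one.

    import numpy as np

    def init_weights(n_out, n_in, rng):
        """Uniform in [-1, 1], scaled by 1/fan-in (Section II.B.2)."""
        return rng.uniform(-1.0, 1.0, size=(n_out, n_in)) / n_in

    rng = np.random.default_rng(0)
    params = {
        "W_jm":  init_weights(30, 1, rng),   # hidden layer for y (30 tanh units)
        "w_yj":  init_weights(1, 30, rng)[0],
        "W_km":  init_weights(10, 1, rng),   # hidden layer for s^2 (10 tanh units)
        "w_s2k": init_weights(1, 10, rng)[0],
        "b_j": np.zeros(30), "b_k": np.zeros(10),
        "b_y": 0.0, "beta": 0.0,
    }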

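The two-phase schedule of Section B.2 might be organized as follows. This is a sketch under our own assumptions: backprop_y_only and backprop_all are hypothetical helpers implementing Eqs. (11)-(12) and Eqs. (11)-(14) respectively, data is an iterable of (input, target) pairs, and the switch from phase one to phase two is simplified to a fixed threshold on the per-epoch drop in training NMSE.

    import numpy as np

    def train(data, params, eta=1e-4, switch_tol=1e-3, max_epochs=3000):
        """Phase 1: adapt only the y path; set beta from the residuals each epoch.
        Phase 2: adapt every weight and bias, including beta, by gradient descent."""
        phase_two = False
        prev_nmse = np.inf
        target_var = np.var([d for _, d in data])
        for epoch in range(max_epochs):
            for x, d in data:                           # pattern-by-pattern updates
                if phase_two:
                    backprop_all(x, d, params, eta)     # Eqs. (11)-(14), including beta
                else:
                    backprop_y_only(x, d, params, eta)  # Eqs. (11)-(12) only
            residuals = np.array([d - forward(x, params)[0] for x, d in data])
            nmse = np.mean(residuals ** 2) / target_var
            if not phase_two:
                # Bias trick: beta <- ln(mean squared residual) at the end of each epoch.
                params["beta"] = np.log(np.mean(residuals ** 2))
                if prev_nmse - nmse < switch_tol:       # NMSE no longer dropping sharply
                    phase_two = True
            prev_nmse = nmse
        return params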
III. A SPECIFIC EXAMPLE

A. Normally Distributed Errors

Least-squares regression techniques can be interpreted as maximum likelihood with an underlying Gaussian error model. In this simple case of assuming normally distributed errors around f(x), we have

    P(d_i | x_i, N) = (1 / sqrt(2π σ²(x_i))) exp( -[d_i - y(x_i)]² / (2σ²(x_i)) )        (8)

as the target probability distribution for input pattern x_i, where, as before, y(x_i) corresponds to the mean of this distribution and σ²(x_i) is the variance. If we take the natural log of both sides, we get

    ln P(d_i | x_i, N) = -(1/2) ln(2π) - [d_i - y(x_i)]² / (2σ²(x_i)) - (1/2) ln σ²(x_i)        (9)

as the log likelihood to be maximized. The first term on the right is a constant and can be ignored for maximization. Since maximizing a value is the same as minimizing the negative of that value, we write what remains of the right-hand side of Eq. (9) as a cost function C to be minimized over all patterns i:

    C = Σ_i C_i = Σ_i { [d_i - y(x_i)]² / (2σ²(x_i)) + (1/2) ln σ²(x_i) }        (10)

Using Eq. (10) for C, we obtain our weight-update equations by first specifying a linear activation function for y and tanh activation functions for the hidden units. (We could select any appropriate hidden-unit activation function for either set of hidden units; for simplicity we choose all tanh activation functions.) Then we approximate σ²(x_i) by s²(x_i), the activation of the s² unit, and take the derivatives in Eqs. (3)-(6) for pattern i:

    Δw_yj = η [d_i - y(x_i)] / s²(x_i) × h_j^y(x_i)        (11)
    Δw_jm = η [d_i - y(x_i)] / s²(x_i) × w_yj [1 - h_j^y(x_i)²] x_{m,i}        (12)
    Δw_{s²k} = η ( [d_i - y(x_i)]² - s²(x_i) ) / (2 s²(x_i)) × h_k^{s²}(x_i)        (13)
    Δw_km = η ( [d_i - y(x_i)]² - s²(x_i) ) / (2 s²(x_i)) × w_{s²k} [1 - h_k^{s²}(x_i)²] x_{m,i}        (14)

Despite one of the tasks of the Santa Fe Time Series Prediction and Analysis Competition having been to generate error bars in addition to the predicted values themselves, none of the entries contained principled uncertainty estimates [7]. Ignoring the possibility of a variable σ²(x) is equivalent to assuming σ²(x_i) to be a constant independent of x_i. With this assumption, the second term in Eq. (10) is a constant that can be ignored for minimization, and the 1/(2σ²(x_i)) factor in Eq. (10) is a constant that is incorporated into the learning rate in Eqs. (11) and (12). This assumption results in the standard equations for backpropagation using a sum-squared-error cost function. However, since we are specifically allowing for a variable σ²(x_i), we explicitly keep these terms in the cost function.

B. A Synthetic Example Problem

B.1. The Problem

To demonstrate the application of Eqs. (10)-(14), we construct a one-dimensional example problem where the true f(x) and σ²(x) are known. We consider an amplitude-modulation equation of the form

    f(x) = m(x) sin(w_c x)        (15)

where m(x) = sin(w_m x). For this simple example we choose w_c = 5 and w_m = 4 over the interval x ∈ [0, π/2].

We generate our target values according to Eq. (1), where n(x) is zero-mean Gaussian noise with variance σ²(x) that changes according to

    σ²(x) = 0.02 + 0.02 [1 - m(x)]².        (16)

We generate 5000 patterns and randomly assign approximately 25% of these to a cross-validation set to guard against overfitting [8]. We perform gradient-descent learning on the remainder of the patterns, updating the weights according to Eqs. (11)-(14) after the presentation of each pattern. We connect thirty tanh hidden units to y and ten tanh hidden units to s². To avoid artifacts, we use a conservative learning rate of η = 10⁻⁴ and do not use momentum.

As described above, initially only the weights and biases that calculate y(x) are updated after each pattern presentation, and β in Eq. (2) is set at the end of each epoch to the natural log of the mean-squared error over the entire training set. After the normalized mean-squared error stops decreasing sharply, all parameters in the network are adapted by Eqs. (11)-(14) after each pattern presentation, including β.

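To make the Gaussian case concrete, the following sketch (ours, not the authors' code) evaluates the per-pattern cost of Eq. (10) with σ²(x_i) replaced by s²(x_i), together with the gradients behind the output-side updates of Eqs. (11) and (13); the input-side updates of Eqs. (12) and (14) follow by the chain rule through the tanh hidden units. It reuses the hypothetical forward helper sketched in Section II.

    import numpy as np

    def cost_and_grads(x, d, params):
        """Per-pattern Gaussian negative log likelihood (Eq. (10)) and the
        gradients that give the output-side updates (11) and (13)."""
        y, s2, h_y, h_s = forward(x, params)
        resid = d - y
        cost = 0.5 * resid ** 2 / s2 + 0.5 * np.log(s2)       # C_i of Eq. (10)
        # dC_i/dy = -resid/s2, so Delta w_yj = eta * (resid/s2) * h_y    (Eq. (11))
        grad_w_yj = -(resid / s2) * h_y
        # dC_i/ds2 = (s2 - resid^2)/(2 s2^2) and ds2/dw_s2k = s2 * h_s, so
        # Delta w_s2k = eta * (resid^2 - s2)/(2 s2) * h_s                (Eq. (13))
        grad_w_s2k = ((s2 - resid ** 2) / (2.0 * s2)) * h_s
        return cost, grad_w_yj, grad_w_s2k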

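The synthetic problem of Eqs. (15)-(16) takes only a few lines to generate. This sketch is our own illustration; the paper does not say how the inputs were sampled, so we draw them uniformly on [0, π/2], and the roughly 25% cross-validation split is done by random assignment.

    import numpy as np

    def make_data(n=5000, w_c=5.0, w_m=4.0, seed=0):
        """Targets d(x) = f(x) + n(x) with input-dependent Gaussian noise."""
        rng = np.random.default_rng(seed)
        x = rng.uniform(0.0, np.pi / 2.0, size=n)
        m = np.sin(w_m * x)
        f = m * np.sin(w_c * x)                    # Eq. (15)
        var = 0.02 + 0.02 * (1.0 - m) ** 2         # Eq. (16)
        d = f + rng.normal(0.0, np.sqrt(var))      # Eq. (1)
        cv = rng.random(n) < 0.25                  # ~25% held out for cross-validation
        return (x[~cv], d[~cv]), (x[cv], d[cv])

    # Zip the returned arrays into (input, target) pairs for the training sketch above:
    train_pairs = list(zip(*make_data()[0]))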
B.2. Results

The learning curve for the normalized mean-squared error (NMSE) is plotted in Figure 2. The curve has two primary descent phases, corresponding to learning each of the two significant oscillations in f(x), x ∈ [0, π/2]. The curve starts to level out at about epoch 1500, at which point y(x) approximates the gross features of f(x). After epoch 1500, Eqs. (11)-(14) are used to update all parameters in the network.

[Figure 2: Learning curve for the normalized mean-squared error (NMSE_D = MSE_D / σ²_D, where D is either the training or the cross-validation data).]

While the NMSE continues to decrease slightly as the approximation y(x) ≈ f(x) is fine-tuned, we see in Figure 3 that the normalized cost (NC) continues to decrease steadily as s²(x) learns to approximate σ²(x). Training is continued until epoch 3000. Note that no overfitting with respect to either the NMSE or the NC is observed on the cross-validation set.

[Figure 3: Learning curve for the normalized cost (NC_D = C_D / σ²_D, where D is either the training or the cross-validation data).]

In Figure 4 we plot the training data, the true function f(x), and the output of the network, y(x), over the interval x ∈ [0, π/2]. We see that the network's approximation y(x) closely matches the shape of f(x). Note that the approximation is slightly better for smaller x than for larger x; this is due in part to the robust regression effect described below.

[Figure 4: Training data, true function f(x), and estimate y(x) (epoch 3000).]

The true variance σ²(x), given by Eq. (16), is plotted in Figure 5 along with the network's estimate s²(x) over the interval x ∈ [0, π/2]. We see that s²(x) closely follows the shape of σ²(x). The slight differences are due to a combination of two factors. First, the approximation of f(x) is not perfect, so some error is introduced into the approximation of σ²(x). Second, with a finite sample size the target noise will not have exactly the true σ²(x). These slight differences aside, however, for a given input x we have accurate estimates y and s² of both the mean and the variance of the target probability distribution.

[Figure 5: True variance σ²(x) and estimate s²(x) (epoch 3000).]


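The normalized measures plotted in Figures 2 and 3 are straightforward to compute. In this sketch (ours), we read NMSE_D and NC_D as the mean squared error and the mean per-pattern cost of Eq. (10) on data set D, each divided by the variance of the targets in D; averaging rather than summing the cost is our assumption. The hypothetical forward helper from Section II is reused.

    import numpy as np

    def nmse_and_nc(data, params):
        """NMSE_D = MSE_D / var(d) and NC_D = C_D / var(d) for a data set D."""
        preds = np.array([forward(x, params)[:2] for x, _ in data])  # columns: y, s^2
        y, s2 = preds[:, 0], preds[:, 1]
        d = np.array([t for _, t in data])
        mse = np.mean((d - y) ** 2)
        cost = np.mean(0.5 * (d - y) ** 2 / s2 + 0.5 * np.log(s2))   # mean C_i, Eq. (10)
        return mse / np.var(d), cost / np.var(d)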
IV. DISCUSSION

A. Robust Regression

Naively, one might expect that allowing for variations in σ²(x) (with the addition of the s² unit and the resulting modifications of the standard backpropagation weight-update equations) does not alter the way the network approximates f(x). However, this is not the case. According to Eqs. (11) and (12), as long as σ²(x) is constant over all x_i, the effective learning rate is constant over all patterns (η/σ²). However, for input patterns where σ²(x) is smaller than average, the learning rate η is effectively amplified compared to patterns for which σ²(x) is larger than average. Thus, this particular estimation of σ²(x) has the side-effect of biasing the network's allocation of its resources towards lower-noise regions, discounting regions of the input space where the network is producing larger-than-average errors. Through this side-effect, this procedure implements a form of robust regression, emphasizing low-noise regions of the input space in the allocation of the network's remaining resources.

B. Overfitting

In the example problem above, a sufficiently large data set was used such that overfitting of y(x) to the training data was not observed. However, in most applications data sets are more limited, and overfitting can present a serious problem. In our method, an accurate approximation of σ²(x) depends on the quality of y(x) as an approximation of f(x) without overfitting. To see why this is so, consider the extreme case where a network overfits the training data such that the error is zero on every training pattern. Then the estimated variance would be zero even though in reality there may be considerable target noise about the true f(x).

In addition, we must also be concerned with overfitting s²(x) to the training data. For example, take a situation in which the true variance is constant over the input space, yet in one small region we have four one-dimensional input patterns arranged such that the outer two patterns have small errors from f(x) and the inner two have large errors. We do not want s²(x) to estimate a sudden increase in the variance in the region of the inner two patterns. Therefore, in applying our technique to relatively short data sets, we must use the same anti-overfitting weaponry as is required when attempting function approximation with any sparse data set (e.g., adding complexity penalty terms to Eq. (10) [2] [6]).

V. CONCLUSIONS

We have introduced a method to estimate the uncertainty of the output of a network that tries to approximate a function. This is accomplished by learning a second function s²(x) that estimates σ²(x), the variance of the target probability distribution around f(x) as a function of the input x. This function provides a quantitative estimate of the target noise level depending on the location in input space and, therefore, provides a measure of the uncertainty of y(x).

We have derived the specific weight-update equations for the case of Gaussian noise on the outputs, i.e., we have shown how to estimate the second moment (variance) of the target distribution in addition to the usual estimation of the first moment (mean). The extension of this technique to other error models is straightforward (e.g., a Poisson model could be used when the errors are suspected to be Poisson distributed).

For very sparse data sets, we may only be able to reasonably estimate the first moment of the target distribution. Estimating both the first and second moments is a reasonable goal if we are dealing with a moderately sized data set. For extremely large data sets, however, one can be more ambitious and aim for estimating the entire probability density function using connectionist methods [9] or hidden Markov models with mixed states [10].

We will apply our method to the real-world Data Set A (from a laser) from the Santa Fe Time Series Analysis and Prediction Competition [5] [7].

ACKNOWLEDGMENTS

We would like to thank David Rumelhart and Barak Pearlmutter for discussing the problem and the approach. We would also like to thank Wray Buntine for emphasizing the potential problem of overfitting of the variance function. This work was supported by a Graduate Fellowship from the Office of Naval Research and by NSF grant RIA ECS-9309786.

REFERENCES

[1] M. Casdagli, S. Eubank, J.D. Farmer, and J. Gibson, "State Space Reconstruction in the Presence of Noise." Physica D, vol. 51, pp. 52-98, 1991.
[2] W.L. Buntine and A.S. Weigend, "Bayesian Backpropagation." Complex Systems, vol. 5, pp. 603-643, 1991.

[3] D. MacKay, "A Practical Bayesian Framework for Backpropagation Networks." Neural Computation, vol. 4, no. 3, pp. 448-472, 1992.
[4] D.E. Rumelhart, R. Durbin, R. Golden, and Y. Chauvin, "Backpropagation: The Basic Theory." In Backpropagation: Theory, Architectures and Applications, Y. Chauvin and D.E. Rumelhart, eds., Lawrence Erlbaum, 1994.
[5] N.A. Gershenfeld and A.S. Weigend, "The Future of Time Series." In Time Series Prediction: Forecasting the Future and Understanding the Past, A.S. Weigend and N.A. Gershenfeld, eds., Addison-Wesley, pp. 1-70, 1994.
[6] A.S. Weigend, B.A. Huberman, and D.E. Rumelhart, "Predicting Sunspots and Exchange Rates with Connectionist Networks." In Nonlinear Modeling and Forecasting, M. Casdagli and S. Eubank, eds., Addison-Wesley, pp. 395-432, 1992.
[7] A.S. Weigend and N.A. Gershenfeld, eds., Time Series Prediction: Forecasting the Future and Understanding the Past. Santa Fe Institute Studies in the Sciences of Complexity, Proc. Vol. XV, Addison-Wesley, 1994.
[8] A.S. Weigend, B.A. Huberman, and D.E. Rumelhart, "Predicting the Future: A Connectionist Approach." International Journal of Neural Systems, vol. 1, pp. 193-209, 1990.
[9] A.S. Weigend, "Predicting Predictability." Preprint, Department of Computer Science, University of Colorado at Boulder, in preparation, 1994.
[10] A.M. Fraser and A. Dimitriadis, "Forecasting Probability Densities Using Hidden Markov Models with Mixed States." In Time Series Prediction: Forecasting the Future and Understanding the Past, A.S. Weigend and N.A. Gershenfeld, eds., Addison-Wesley, pp. 265-282, 1994.

