
6. ESTIMATORS, BIAS AND VARIANCE

Foundational concepts such as parameter estimation, bias and variance are useful
to formally characterize notions of generalization, underfitting and overfitting.

Point Estimation

• Point estimation is the attempt to provide the single “best” prediction of some quantity of interest.

• In general, the quantity of interest can be a single parameter or a vector of parameters in some parametric model, such as the weights in our linear regression.

• In order to distinguish estimates of parameters from their true value, our convention will be to denote a point estimate of a parameter θ by θ̂.

• Let {x(1), . . . , x(m)} be a set of m independent and identically distributed data points. A point estimator or statistic is any function of the data, θ̂_m = g(x(1), . . . , x(m)).

• The definition does not require that g return a value that is close to the true θ, or even that the range of g is the same as the set of allowable values of θ (a small numerical sketch follows below).
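As a minimal illustrative sketch (not part of the original notes), the sample mean is one such function g: assuming the data are drawn from a Gaussian with an unknown true mean θ, the code below computes a point estimate θ̂ from m samples. The value theta_true = 2.0 is a made-up constant used only for the demonstration.

```python
import numpy as np

# Illustrative point estimator: the sample mean of m i.i.d. draws.
# Assumption: data come from a Gaussian N(theta_true, 1); theta_true is
# an arbitrary value chosen for this sketch.
rng = np.random.default_rng(0)
theta_true = 2.0
m = 100

x = rng.normal(loc=theta_true, scale=1.0, size=m)   # {x(1), ..., x(m)}
theta_hat = x.mean()                                 # g(x(1), ..., x(m))
print(theta_hat)   # close to, but not exactly, the true theta
```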

Function Estimation

• Here we are trying to predict a variable y given an input vector x.

• We assume that there is a function f(x) that describes the approximate relationship between y and x. For example, we may assume that y = f(x) + ε, where ε stands for the part of y that is not predictable from x.

• In function estimation we are interested in approximating f with a model or estimate f̂.

• Function estimation is really just the same as estimating a parameter θ; the function estimator f̂ is simply a point estimator in function space (a small fitting sketch follows below).
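As a rough sketch (the linear data-generating process below is an assumption made only for illustration), function estimation can be as simple as fitting a line f̂(x) = wx + b to noisy observations y = f(x) + ε by least squares:

```python
import numpy as np

# Assumed toy setup: y = f(x) + eps with f(x) = 3x + 1 and Gaussian noise eps.
rng = np.random.default_rng(0)
m = 200
x = rng.uniform(-1.0, 1.0, size=m)
eps = rng.normal(scale=0.1, size=m)       # the part of y not predictable from x
y = 3.0 * x + 1.0 + eps

# Least-squares fit: f_hat(x) = w*x + b is a point estimate in function space.
w, b = np.polyfit(x, y, deg=1)
print(w, b)                                # roughly 3.0 and 1.0
```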
Bias

• The bias of an estimator is defined as

bias(θ̂_m) = E(θ̂_m) − θ

• An estimator θ̂_m is said to be unbiased if bias(θ̂_m) = 0, which implies that E(θ̂_m) = θ.
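A small simulation sketch (an assumed example, not from the notes) can approximate this expectation by resampling many training sets; here it compares the divide-by-m and divide-by-(m−1) estimators of a Gaussian variance:

```python
import numpy as np

# Empirical bias estimate: bias(theta_hat) ~= average(theta_hat) - theta,
# averaged over many independently drawn training sets of size m.
rng = np.random.default_rng(0)
sigma2_true = 4.0            # the true parameter theta (a made-up value)
m, trials = 10, 100_000

biased, unbiased = [], []
for _ in range(trials):
    x = rng.normal(0.0, np.sqrt(sigma2_true), size=m)
    biased.append(np.var(x, ddof=0))     # divides by m   -> biased estimator
    unbiased.append(np.var(x, ddof=1))   # divides by m-1 -> unbiased estimator

print(np.mean(biased) - sigma2_true)     # roughly -sigma2_true/m = -0.4
print(np.mean(unbiased) - sigma2_true)   # roughly 0
```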


Variance and Standard Error

• The variance of an estimator is simply the variance Var(θ̂), where the random variable is the training set. Alternately, the square root of the variance is called the standard error, denoted SE(θ̂).

• The variance or the standard error of an estimator provides a measure of how we would expect the estimate we compute from data to vary as we independently resample the dataset from the underlying data-generating process.

• When we compute any statistic using a finite number of samples, our estimate of the true underlying parameter is uncertain, in the sense that we could have obtained other samples from the same distribution and their statistics would have been different.

• The expected degree of variation in any estimator is a source of error that we want to quantify (a small resampling sketch follows below).
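As a minimal sketch (assuming Gaussian data, chosen only for illustration), the standard error of the sample mean can be checked by independently resampling the dataset many times and comparing the spread of the estimates with the analytical value σ/√m:

```python
import numpy as np

# Resampling check of the standard error of the mean: SE ~= sigma / sqrt(m).
rng = np.random.default_rng(0)
sigma, m, trials = 1.0, 50, 20_000

# Recompute the estimator on many independent datasets and measure its spread.
means = np.array([rng.normal(0.0, sigma, size=m).mean() for _ in range(trials)])

print(means.std())              # empirical standard error of the estimator
print(sigma / np.sqrt(m))       # analytical standard error, about 0.141
```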
Trading off Bias and Variance to Minimize Mean Squared Error

• Bias and variance measure two different sources of error in an estimator.

• Bias measures the expected deviation from the true value of the function or parameter.

• Variance, on the other hand, provides a measure of the deviation from the expected estimator value that any particular sampling of the data is likely to cause.

• The most common way to negotiate this trade-off is to use cross-validation.

• Empirically, cross-validation is highly successful on many real-world tasks.

Alternatively, we can also compare the mean squared error (MSE) of the estimates:

MSE = E[(θ̂_m − θ)²] = Bias(θ̂_m)² + Var(θ̂_m)

An estimator with small MSE keeps both its bias and its variance in check.
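The decomposition can be verified numerically. In this assumed sketch, the divide-by-m variance estimator from the bias example above is resampled many times, and its MSE matches bias² + variance up to sampling noise:

```python
import numpy as np

# Check MSE = Bias^2 + Var for the divide-by-m variance estimator.
rng = np.random.default_rng(0)
sigma2_true, m, trials = 4.0, 10, 100_000

est = np.array([np.var(rng.normal(0.0, 2.0, size=m), ddof=0)
                for _ in range(trials)])

mse = np.mean((est - sigma2_true) ** 2)
bias = est.mean() - sigma2_true
var = est.var()
print(mse, bias ** 2 + var)     # the two quantities agree
```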

Consistency

• So far we have discussed the properties of various estimators for a training set of fixed size.

• We are also concerned with the behavior of an estimator as the amount of training data grows.

• As the number of data points m in our dataset increases, we would like our point estimates to converge to the true value of the corresponding parameters.

• Formally, we would like that plim_{m→∞} θ̂_m = θ; this condition is known as consistency (a small sketch follows below).
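A small sketch (with an assumed Gaussian data-generating process) illustrates consistency for the sample mean: its error shrinks as m grows.

```python
import numpy as np

# Consistency of the sample mean: the estimate approaches theta_true as m grows.
rng = np.random.default_rng(0)
theta_true = 2.0

for m in (10, 100, 1_000, 10_000, 100_000):
    theta_hat = rng.normal(theta_true, 1.0, size=m).mean()
    print(m, abs(theta_hat - theta_true))   # error tends to shrink with m
```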
