Stats, MLE, and Other Stuff
1 SE vs SD
For a collection of measurements of a single parameter, the SD tells us how
spread out our measurements are from the mean measurement value.
Notice that the SE of the mean will depend on how large our random samples
are. Bigger random samples mean that each sample mean will, on average, be
closer to the true population mean, simply because larger samples contain more
data. The exact effect is quantified by the so-called "central limit theorem."
Finally we should recall that the SE can be computed for things other than
the mean. Any statistic with a sampling distribution (so, probably every statis-
tic worth thinking about) has an SE. For instance, suppose we took some sample
data, did a line fit, and recorded the line slope. Repeat that process to get the
sampling distribution of slopes, whose SD is the SE of the slope.
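To make the idea concrete, here is a minimal simulation sketch (the population slope, intercept, noise level, and sample size are all made-up values for illustration): we repeatedly draw a sample, fit a line, and take the SD of the fitted slopes as the SE of the slope.

```python
import numpy as np

rng = np.random.default_rng(0)
true_slope, true_intercept, noise_sd = 2.0, 1.0, 0.5   # assumed "population" values
n_points, n_repeats = 20, 5000

slopes = []
for _ in range(n_repeats):
    x = rng.uniform(0, 10, n_points)                   # draw a fresh random sample
    y = true_slope * x + true_intercept + rng.normal(0, noise_sd, n_points)
    slope, intercept = np.polyfit(x, y, 1)             # straight-line fit
    slopes.append(slope)

print("SE of the slope (SD of the sampling distribution):", np.std(slopes))
```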
we don't know that either. Instead, what we can use is a technique known as
maximum likelihood estimation (MLE). In MLE, we assume some functional
form for our data, then find the values of our function's parameters (things
like slope or y-intercept) that make it most consistent with our data. In the
process, we can estimate the sampling distributions of our parameters.
3 machinery of MLE
MLE is used when you have some data and a model with adjustable parameters.
You want to adjust the parameters so that the model matches the data as well
as possible.
Now for some math. Suppose we run an experiment and get some data sets
[xi ] = [x1 , x2 , ..., xN ] and [yi ] = [y1 , y2 , ..., yN ]. We plot up our data and decide
we want to fit our data to some model function f, where

y = f(x, c1, c2, ...)    (1)

The ci's specify that our model function f can depend on an arbitrary number
of extra parameters. As examples, suppose our model function was the constant
function. Then our model function only needs one extra parameter, c1 :
f (x, c1 ) = c1 (2)
For a linear model,

f(x, c1, c2) = c1 x + c2    (3)

and for a quadratic model,

f(x, c1, c2, c3) = c1 x² + c2 x + c3    (4)
and so on. Models also don’t need to be polynomial.
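In code, such model functions are just ordinary functions of x plus some extra parameters; here is a minimal sketch (the function names are only illustrative):

```python
import numpy as np

def constant_model(x, c1):
    """f(x, c1) = c1, as in equation (2)."""
    return np.full_like(np.asarray(x, dtype=float), c1)

def linear_model(x, c1, c2):
    """f(x, c1, c2) = c1*x + c2, as in equation (3)."""
    return c1 * np.asarray(x) + c2

def quadratic_model(x, c1, c2, c3):
    """f(x, c1, c2, c3) = c1*x**2 + c2*x + c3, as in equation (4)."""
    x = np.asarray(x)
    return c1 * x**2 + c2 * x + c3

def exponential_model(x, c1, c2):
    """A non-polynomial example: f(x, c1, c2) = c1*exp(c2*x)."""
    return c1 * np.exp(c2 * np.asarray(x))
```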
With our model function in hand, we can calculate the probability of obtain-
ing our data assuming that our model is true. By this, we mean that we as-
sume our yi data would be perfectly explained by our model if there were no
noise/uncertainty in our experimental method. With noise added, the yi's are
displaced slightly from the values predicted by our model. See Fig. 1.
[Figure 1: y vs. x, without noise (left panel) and with noise (right panel).]
Mathematically, we are saying that if there is no noise then there exists some
c1 , c2 , ... such that the equality
yi = f (xi , c1 , c2 , c3 ) (5)
holds for all our data. In the presence of noise the above no longer holds, so
we add a fudge factor εi to make the equality work:

yi = f(xi, c1, c2, c3) + εi    (6)
Now we must assume a profile for the noise in our experiment. The most basic
noise profile is a Gaussian (this turns out to often be a good approximation even
when the noise isn't Gaussian) with some fixed variance σ². This means that
the probability of obtaining some noise value ε is

p(ε) = (1/√(2πσ²)) exp(−ε²/(2σ²))    (7)
With (6) and (7) we can define the probability of obtaining a pair of data points
(xi , yi ). Note that
εi = yi − f(xi, c1, c2, ...)    (8)

so

p(xi, yi) = p(εi) = (1/√(2πσ²)) exp(−(yi − f(xi, c1, c2, ...))²/(2σ²))    (9)
Equation (9) gives us the probability of obtaining only one data point pair. We
want the probability of obtaining our entire data set. Assuming all our data
point pairs are independent, the probability of obtaining all our pairs is the
product of all the individual probabilities for obtaining each pair. Thus
p([xi], [yi]) = ∏_{i=1}^N p(xi, yi)    (10)
The ∏ is like a sum, except we multiply all the elements in the sequence
instead of adding them. Equation (10) is the likelihood. Note that to actually
compute (10) we need to assume values of c1, c2, ... (since they appear in (9)).
Thus, it is customary to write the likelihood instead as

p([xi], [yi] | c1, c2, ...) = ∏_{i=1}^N p(xi, yi | c1, c2, ...)    (11)

Things on the right of the bar are assumed to be known before the probability
is calculated. Writing out the Gaussians in (9) and collecting terms, the
likelihood can also be written as

p([xi], [yi] | c1, c2, ...) = (2πσ²)^(−N/2) exp(−χ²/2)    (12)

where

χ² ≡ ∑_{i=1}^N (yi − f(xi, c1, c2, ...))²/σ²    (13)
Recall that the end goal is to find the model parameters c1, c2, ... that maximize
the likelihood (12). This, in general, is hard. Oftentimes it is easier to maximize
the log-likelihood L, where
L ≡ log p([xi], [yi] | c1, c2, ...) = −(N/2) log(2πσ²) − χ²/2    (14)
This is usually done since the likelihood is often an extremely small number,
which makes it hard for computers to work with. Additionally, the log-likelihood
is maximal where the likelihood is maximal, so the results are the same either
way. In this case, maximizing the likelihood reduces to minimizing χ², so this
method is identical to the ordinary least squares (OLS) method. (MLE is far
more powerful than OLS, however, in that it can handle non-Gaussian noise
profiles and more complicated non-linear models.)
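As a small code sketch of (14) (assuming a model(x, *params) callable like the ones sketched above and a known noise level sigma), note that the parameter values maximizing this function are exactly the ones minimizing χ²:

```python
import numpy as np

def log_likelihood(params, x, y, model, sigma):
    """Gaussian log-likelihood, eq (14): L = -N/2 log(2*pi*sigma^2) - chi^2/2."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    resid = y - model(x, *params)
    chi2 = np.sum(resid**2) / sigma**2
    return -0.5 * x.size * np.log(2 * np.pi * sigma**2) - 0.5 * chi2
```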
4 maximizing likelihood
Even when considering the log-likelihood, maximization is a hard problem. In
calculus, maximization is usually carried out by computing derivatives and find-
ing where they go to 0. In this case, to maximize L(c1 , c2 , ...), we would need
to compute the partial derivatives
∂L/∂c1, ∂L/∂c2, ∂L/∂c3, ...    (15)
and find the c1, c2, c3, ... that make all the partial derivatives go to 0. (A
partial derivative with respect to variable c1 is like a regular derivative except
that we assume c2, c3, ... to all be constant; similarly for c2, c3, ....) The
maximizing values of c1, c2, ... I denote as ĉ1, ĉ2, ...; these are also called the
"maximum likelihood estimators" of c1, c2, ...
The above process is sometimes impossible to do with pen & paper, especially
for very complex or high-dimensional likelihood functions. So, more often,
numerical maximization methods are used to approximate where the maxima
occur. Commonly used numerical methods include gradient-ascent methods and
simplex methods.
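As an illustration (not the only way to do it), here is a sketch that maximizes the log-likelihood numerically with SciPy's Nelder-Mead simplex routine by minimizing the negative log-likelihood; the data, model, noise level, and starting guess are all made up for the example:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
sigma = 0.5
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, sigma, x.size)   # synthetic straight-line data

def neg_log_likelihood(params):
    c1, c2 = params
    chi2 = np.sum((y - (c1 * x + c2))**2) / sigma**2
    return 0.5 * x.size * np.log(2 * np.pi * sigma**2) + 0.5 * chi2

result = minimize(neg_log_likelihood, x0=[0.0, 0.0], method="Nelder-Mead")
c1_hat, c2_hat = result.x                          # maximum likelihood estimators
print("ML estimators (slope, intercept):", c1_hat, c2_hat)
```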
However, for simple models, exact solutions are possible using the derivative
method. I show two such solutions in later sections.
Besides the maximizing values ĉ1, ĉ2, ..., we also want their uncertainties.
What we are after are the probability distributions of the parameters given our
data,

p(c1 | [xi], [yi]), p(c2 | [xi], [yi]), p(c3 | [xi], [yi]), ...    (16)
Note that our dataset appears to the right of the bar because the values of our
model parameters should be constrained to fit our observed data. The SD’s of
the above probability distributions estimate the SE’s of the model parameters,
which we will denote σc1 , σc2 , σc3 , ...
How do we get the above? Well, going back, the likelihood function
p([xi], [yi] | c1, c2, ...) gives us the probability of obtaining our data set given
some values for our model parameters. We can invert this probability using
Bayes' rule to obtain the posterior. In most cases,²

p(c1, c2, ... | [xi], [yi]) ∝ p([xi], [yi] | c1, c2, ...)    (17)
The posterior, on the left, is the probability that our model with parameters
c1, c2, ... is true, given our data. The ∝ symbol denotes proportionality, meaning
the left-hand side and right-hand side are equal up to some constant.
To get the distribution of a single parameter, say c1, we integrate the posterior
over all the other parameters:

p(c1 | [xi], [yi]) = ∫ p(c1, c2, c3, ... | [xi], [yi]) dc2 dc3 ...    (18)

This process is known as "marginalization." Oftentimes doing the above integral
is hard. Fortunately, there are other ways to get the posteriors. In particular,
we can approximate our likelihood function with another function that we know
how to marginalize, without even doing an integral.
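Before describing that approximation, note that for low-dimensional problems there is also a brute-force option: evaluate the posterior on a grid of parameter values and sum out the parameters you don't care about. Here is a minimal sketch of (18) for a two-parameter line fit (the data, grid ranges, and noise level are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 0.5
x = np.linspace(0, 10, 30)
y = 2.0 * x + 1.0 + rng.normal(0, sigma, x.size)      # synthetic straight-line data

c1_grid = np.linspace(1.5, 2.5, 201)                  # slope values to scan
c2_grid = np.linspace(0.0, 2.0, 201)                  # intercept values to scan
C1, C2 = np.meshgrid(c1_grid, c2_grid, indexing="ij")

# chi^2 on the grid, then the (unnormalized) posterior per eq (17)
chi2 = ((y - (C1[..., None] * x + C2[..., None]))**2 / sigma**2).sum(axis=-1)
posterior = np.exp(-0.5 * (chi2 - chi2.min()))

p_c1 = posterior.sum(axis=1)                          # sum over c2: marginalization, eq (18)
mean_c1 = np.sum(c1_grid * p_c1) / np.sum(p_c1)
sd_c1 = np.sqrt(np.sum((c1_grid - mean_c1)**2 * p_c1) / np.sum(p_c1))
print("posterior SD (SE estimate) of the slope:", sd_c1)
```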
The crux of the approximation just described involves Taylor expanding the
log-likelihood about the ML estimators ĉ1, ĉ2, ... to 2nd order.³ A multivariable
2nd-order Taylor expansion is rather messy, and involves something called the
"Hessian" matrix,
which I will not go into. However, only a subset of the terms in the expansion
(namely those involving the diagonal of the Hessian) matter for our purposes.
So I only write those out explicitly. The expansion of L is
L(c1, c2, ...) ≈ C0 + (1/2) ∑_i (∂²L/∂ci²)(ci − ĉi)² + ...    (19)

where C0 is a constant combining all the 0th-order terms, the sum runs over the
model parameters, and the ... collects all the cross-terms (terms with factors
like ∂²L/∂c1∂c2).
Dropping the cross-terms and exponentiating (19) gives a product of Gaussians,
one for each parameter. Reading off their widths, we can estimate the variances
(the squared SDs of the posterior distributions of our model parameters) as
σc1² = −(∂²L/∂c1²)⁻¹

σc2² = −(∂²L/∂c2²)⁻¹    (23)

...
Thus, we have arrived at how to estimate the SDs of the posterior distributions
(16). Note that (23) is evaluated at c1 = ĉ1, c2 = ĉ2, ... since our Taylor
expansion was centered on those values.
As a side note, a more general approach to (23) involves the "Fisher informa-
tion." Intuitively, we are estimating the "spread" of the probability distributions
for c1, c2, ... with their second derivative, which contains information on the
curvature of the distribution. The Fisher information builds on this concept.
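Here is a small sketch of (23) in code, using a central finite difference to estimate the second derivative of the log-likelihood (it assumes a log_like(params) callable returning the log-likelihood for a parameter vector; the step size h is an arbitrary choice):

```python
import numpy as np

def curvature_se(log_like, params_hat, index, h=1e-4):
    """SE of one parameter from the curvature of the log-likelihood,
    sigma_c^2 = -1 / (d^2 L / dc^2), via central differences (eq 23)."""
    p0 = np.asarray(params_hat, dtype=float)
    p_plus, p_minus = p0.copy(), p0.copy()
    p_plus[index] += h
    p_minus[index] -= h
    d2L = (log_like(p_plus) - 2.0 * log_like(p0) + log_like(p_minus)) / h**2
    return np.sqrt(-1.0 / d2L)
```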
5 SE of the mean

In this simple case the extra machinery of MLE is not strictly necessary. Let's
say we have a population of x values with some true mean µ and variance σ².
Now we sample our population to get N measured x values [x1, ..., xN]. The
mean of this sample, x̄, is
x̄ = (1/N) ∑_{i=1}^N xi    (24)
We ultimately want the SD of x̄; this is the SE of the mean. To do so we
calculate the variance of x̄, using the identity

Var(a X) = a² Var(X)    (25)

where a is a constant. Equation (25) just comes from the fact that computing a
variance involves squaring things. Anyways, this gives
Var(x̄) = Var((1/N) ∑_{i=1}^N xi)
       = (1/N²) Var(∑_{i=1}^N xi)
       = (1/N²) ∑_{i=1}^N Var(xi)
       = (1/N²) ∑_{i=1}^N σ²
       = σ²/N    (26)
So the SE is

SE = √(σ²/N) = σ/√N    (27)
Note that in practice we do not know the true population SD, σ, so the general
approach is to estimate it with the standard deviation of our sample.4
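As a quick numerical sanity check of (27) (the population parameters and sample size below are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 5.0, 2.0, 25                # made-up population mean, SD, and sample size

# build the sampling distribution of the mean by repeated sampling
sample_means = [rng.normal(mu, sigma, n).mean() for _ in range(10_000)]

print("SD of the sample means:", np.std(sample_means))   # empirical SE of the mean
print("sigma / sqrt(N):       ", sigma / np.sqrt(n))     # prediction from eq (27)
```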
6 MLE example: the mean

Now let's redo this with the MLE machinery. We model each measurement as a
constant plus noise, f(x, c1) = c1, so that xi = c1 + εi. The log-likelihood is then

L = −(N/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^N (xi − c1)²

and its first derivative is

∂L/∂c1 = (1/σ²) ∑_{i=1}^N (xi − c1)

Set the above to 0 and solve for c1 to find the maximizing value of c1:

ĉ1 = (1/N) ∑_{i=1}^N xi = x̄    (31)
4 If you are astute you’ll notice there’s some trickery which involves switching the order
of the variance calculation and the sum. This is only allowable if our measurements xi are
completely independent of one another.
The ML estimator is just the arithmetic mean! Differentiate L twice with respect
to c1 to find
∂²L/∂c1² = (1/σ²) ∑_{i=1}^N (−1) = −N/σ²    (32)
By the relations in (23), the SE for the parameter c1 is
σc1 = σ/√N    (33)
just as we derived in section 5.
7 MLE example: fitting a line

As a final example, suppose we want to fit our data to a linear model,

f(x, c1, c2) = c1 x + c2    (34)

where c1 is the slope and c2 is the y-intercept. Our χ² takes the form
χ²(c1, c2) ≡ ∑_{i=1}^N (yi − c1 xi − c2)²/σ²    (35)
Our log-likelihood L is
L = log p([xi], [yi] | c1, c2) = −(N/2) log(2πσ²) − χ²(c1, c2)/2    (36)
We need to take the first derivatives of L like before and find the values of
c1 , c2 that make both first derivatives go to 0. This involves solving a system of
equations. One of the equations, for example, is
0 = ∂L/∂c1 = −(1/2) ∂χ²/∂c1
  = −(1/(2σ²)) ∑_{i=1}^N ∂/∂c1 (yi − c1 xi − c2)²
  = −(1/(2σ²)) ∑_{i=1}^N 2(yi − c1 xi − c2)(−xi)

so

0 = ∑_{i=1}^N xi (yi − c1 xi − c2)    (37)
The two equations can be solved. If you were to carry out the maximization
you would find

ĉ1 = (N ∑ xi yi − ∑ xi ∑ yi) / (N ∑ xi² − (∑ xi)²)    (38)

ĉ2 = (∑ yi − ĉ1 ∑ xi) / N    (39)
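A quick way to sanity-check (38) and (39) is to compare them against a library fit on synthetic data; a minimal sketch (the data are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 30)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, x.size)      # synthetic straight-line data

N = x.size
c1_hat = (N * np.sum(x * y) - np.sum(x) * np.sum(y)) / (N * np.sum(x**2) - np.sum(x)**2)  # eq (38)
c2_hat = (np.sum(y) - c1_hat * np.sum(x)) / N                                             # eq (39)

print("closed form (slope, intercept):", c1_hat, c2_hat)
print("np.polyfit  (slope, intercept):", np.polyfit(x, y, 1))   # should agree
```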
I have verified that (38) and (39) follow from the maximization procedure
described above. Now we want to know the SE's. I will calculate just the SE for
the slope. Differentiate L twice with respect to c1 and find
∂²L/∂c1² = −(1/σ²) ∑_{i=1}^N xi²    (40)
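Plugging (40) into (23) gives an estimate of the slope SE; a short self-contained sketch (σ and the x grid reuse the made-up values from the previous snippet):

```python
import numpy as np

sigma = 0.5                               # assumed noise SD
x = np.linspace(0, 10, 30)                # made-up x values for the fit

d2L_dc1sq = -np.sum(x**2) / sigma**2      # second derivative of L, eq (40)
se_slope = np.sqrt(-1.0 / d2L_dc1sq)      # eq (23): sigma_c1 = sigma / sqrt(sum(x_i^2))
print("SE of the slope:", se_slope)
```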