Lecture Notes Week 2

2 Statistical models

Subjects                 Sections
Statistical model
Population               5.1.1
Histograms
QQ-plot
Location-scale family    3.5
Exponential family       3.4

The statistical sciences are concerned with answering questions or making decisions in
the face of uncertainty. Examples of such questions are
• What is the probability that a destructive tornado hits the US next year?

• Is a new medical procedure better than the older one?

• How sure are we about the predictions of a political election?


The statistics approach starts by collecting information or data (x1 , . . . , xn ) that can
help us answer the research question. These could for example be previous times between
tornadoes, the number of recovered patients in a test group and voting choices for a subgroup of
the population. The hope is that one can then use these data to understand the dynamics
of the underlying process. Unfortunately, one quickly realises that the underlying dynamics
producing the data are often too complex to fully describe. Tornado behaviour depends
on global weather conditions, medical effectiveness depends on lifestyle and election
choices depend on social structures in our society. Therefore, instead of trying to fully
understand the underlying process, we assume that the observed data is a realization of
a stochastic vector (X1 , . . . , Xn ) with unknown pdf f . The goal then becomes to try to
determine f , because if we have it we can use it to answer the research question. Again,
unfortunately f is unknown, but we can use our (limited) knowledge and the data to define
a set of possible pdf’s that f could belong to.
Definition 2.1. A statistical model for (X1 , . . . , Xn ) is a collection of probability distri-
bution functions M = {f (x | θ) | θ ∈ Θ}, where Θ is a set and θ is an indexing parameter.
A statistical model represents all probability distributions that we a priori deem pos-
sible for (X1 , . . . , Xn ). If we have very little information about the underlying process,
then the statistical model has to be larger. In fact, a model can be so large that Θ is
infinite-dimensional; for example, we could include all pdfs that are unimodal, i.e. contain
a single peak. If we have more information about the underlying process, then we can use
that to narrow down the statistical model.
Definition 2.2. A statistical model is called parametric if there exists a k ∈ N such that
Θ ⊆ Rk .
In this course we will focus on parametric models, as they can already accurately
describe a large portion of the world around us. Another thing we will typically assume
is that our data is independent and identically distributed (iid). This simplifies our situation
significantly: whenever we have more than one observation, i.e. n > 1, the pdfs f (x | θ) are
multivariate, but under the iid assumption they factor into univariate pieces.

Definition 2.3 (5.1.1). If X1 , . . . , Xn are iid with unknown pdf g, then we call X1 , . . . , Xn
a random sample from the population g.
If the underlying data generating process is iid, then the pdf splits:
    f(x_1, \dots, x_n) = \prod_{i=1}^{n} g(x_i).

To construct a statistical model it therefore suffices to specify a collection of univariate
distributions N = {g(x | θ) | θ ∈ Θ}, as it defines the model

    M = \Big\{ \prod_{i=1}^{n} g(x_i \mid \theta) \;\Big|\; \theta \in \Theta \Big\}.

From now on, we will always write f ’s to denote multivariate pdf’s and g’s to denote
univariate ones. Moreover, we will directly call N a model and abuse notation by writing
distribution names instead of pdf’s, e.g.

• {Bernoulli(p) | p ∈ [0, 1]} = {g(x | p) | p ∈ [0, 1]} = {p^x (1 − p)^{1−x} | p ∈ [0, 1]}.

• {Exponential(λ) | λ > 0} = {g(x | λ) | λ > 0} = {λe^{−λx} | λ > 0}.

2.1 Examples of statistical models


Let’s discuss some examples of simple research questions, the accompanying data and the
associated statistical model.
Example 2.4 (Coin wager). I have a coin and offer to bet you a thousand euros on
whether the next flip ends up heads. Research question: should you take the bet? To
decide whether you should accept, you might be interested in whether the coin is fair, more
specifically in how likely it is that heads turns up on the flip. To obtain an indication, I
allow you to throw the coin one hundred times to obtain data (x1 , . . . , xn ). Now, even
in such a simple setting, fully describing the underlying dynamics behind the coin flips is
impossible. The outcome of a coin toss depends on countless small factors like air pressure,
wind direction, the strength that I will use to flip the coin, the time that I decide to catch
the coin and more. Therefore we assume that the data is a realization of a stochastic
vector (X1 , . . . , Xn ) with unknown pdf f . To define a statistical model we assume that
the coin flips are independent, so that it suffices to find a univariate pdf g. A single coin
flip has only two possible outcomes, zero and one, but we have no information how likely
each outcome is, so for our model we take all possible distributions on these two points.
Verify that the model defined equals the set {Bernoulli(p) | p ∈ [0, 1]}.
Our research question now translates to whether p0 = P(X1 = 1) < 0.5. Suppose
we have observed 99 heads; then most likely p0 > 0.5, but we cannot be sure, since any
0 < p < 1 can produce the observed data. At what point should we be convinced that
p0 is indeed smaller than one half? At 49 observed heads? Maybe we want to be more
conservative: 40 heads? Formalising this procedure is called hypothesis testing, which is
one of the two main subjects that we will discuss in this course.
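
As a quick illustration of the kind of calculation involved, here is a minimal sketch in
Python (assuming scipy is available; the numbers are only illustrative): under a fair coin,
how surprising would 49 or 40 heads in one hundred flips be?

    # A minimal sketch: tail probabilities of the number of heads in 100
    # flips of a fair coin (p = 0.5), previewing hypothesis testing.
    from scipy.stats import binom

    n = 100
    for heads in (49, 40):
        tail = binom.cdf(heads, n, 0.5)  # P(number of heads <= heads)
        print(f"P(at most {heads} heads | p = 0.5) = {tail:.4f}")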
Example 2.5 (Milk sales). Suppose you own a store that sells milk. Every morning a
truck comes in bringing fresh dairy, which is put in the store freezer and sold during the
day to customers. Storing milk is expensive, because the freezer uses a lot of energy, so

you don’t want to order too much in the morning. On the other hand, you don’t want
to run out of milk too early in the day, because this upsets your customers. How much
milk should you buy every morning? There are many different possible ways to formulate
the research question. We don’t want to disappoint our customers, but we also don’t
want to have too much excess supply. One possible way to frame the research question
is to ask: “What is the minimal amount of milk I should buy such that, with 99%
certainty, no customer finds an empty store?” To answer this question we write down the
number of daily customers for three months to obtain data (x1 , . . . , xn ), which we assume
comes from a stochastic vector (X1 , . . . , Xn ) with unknown pdf f . In this example it is
far less reasonable to assume that the data generating process is iid. Surely people
buy more milk at the weekend than on a Monday; also, the amount of milk bought today
probably depends on the amount of milk bought yesterday. Nevertheless, we assume the
data has been adjusted for these effects and continue with our iid presumption. Now,
what could be a possible set of distributions for the number of sales on a single day?
To approximate customer arrival behaviour we assume that there is a large number of
potential customers who live in an area around the store, where each one has
an independent but equally small probability of entering the store on a given day. We don’t
know the number of potential customers, or how likely each of them is to come to the store,
therefore we include all possible resulting distributions. Verify that the model defined equals the
set {Binomial(k, p) | k ∈ N, p ∈ [0, 1]}.
Let m be the number of cartons of milk we buy in the morning. Then the research
question translates to determining the minimal m such that P (X1 > m) ≤ 0.01. We can
only calculate this probability if the true k0 and p0 are known. Estimating their values by
using the observed numbers of customers in the last n days is called parameter estimation,
which is the second main subject of this course.
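
To make this concrete, here is a minimal sketch in Python (assuming scipy; the values of
k0 and p0 are made up for illustration) that finds the minimal m once k0 and p0 are given:

    # A minimal sketch: the smallest m with P(X1 > m) <= 0.01 for assumed
    # true parameter values k0 and p0.
    from scipy.stats import binom

    k0, p0 = 500, 0.1                    # hypothetical true values
    m = 0
    while binom.sf(m, k0, p0) > 0.01:    # sf(m) = P(X1 > m)
        m += 1
    print(f"buy at least m = {m} cartons each morning")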
Example 2.6 (Celestial distance). Research question: A physicist wants to find the
distance µ0 between two celestial bodies. To this end, he measures this distance n times,
yielding varying results (x1 , . . . , xn ) due to equipment inaccuracy. If the measurements
are performed in a consistent manner, then it is reasonable to assume that the data is an
iid realisation of a random sample X = (X1 , . . . , Xn ) with population g. To define a
statistical model we examine the unobserved measurement errors ei = Xi − µ0 , which
are also random variables. An error can often be interpreted as the total sum of many
small independent errors. It follows by the central limit theorem that the errors are
then approximately normally distributed and thus the Xi are also approximately normally
distributed. An appropriate statistical model could therefore be {N(µ, σ²) | µ ≥ 0, σ² > 0}.
The mathematician and physicist Carl Friedrich Gauss discovered the normal distribution
exactly by trying to gain insights into this research question.
An intuitive way to estimate µ0 would be to take the average of the n measurements.
A common assumption is that errors have expectation zero, that is E(ei ) = 0. In that case
we obtain by the law of large numbers that
    \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1}{n} \sum_{i=1}^{n} (\mu_0 + e_i) = \mu_0 + \frac{1}{n} \sum_{i=1}^{n} e_i \approx \mu_0 + E(e_1) = \mu_0.
We will show later on in the course that averaging is the best way, according to some
criteria, to estimate µ0 if the Xi are truly normally distributed. However, suppose that
this is not the case and instead that the Xi are Cauchy distributed. Then their first
moment does not exist, thus the law of large numbers does not apply and hence
\frac{1}{n} \sum_{i=1}^{n} e_i does not converge to zero. The estimate in this case is
likely to be terrible.
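
The contrast is easy to see in simulation. The following minimal sketch (assuming numpy;
the true distance mu0 is made up) compares the sample mean under normal and Cauchy errors:

    # A minimal sketch: the sample mean stabilises around mu0 for normal
    # errors, but keeps fluctuating for Cauchy errors (no first moment).
    import numpy as np

    rng = np.random.default_rng(0)
    mu0 = 10.0                           # hypothetical true distance
    for n in (100, 10_000, 1_000_000):
        x_normal = mu0 + rng.normal(0.0, 1.0, n)
        x_cauchy = mu0 + rng.standard_cauchy(n)
        print(f"n={n:>9}: normal mean {x_normal.mean():.3f}, "
              f"cauchy mean {x_cauchy.mean():.3f}")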

2.2 Model validation
Throughout this course we will assume that our statistical models are correct, which means
that we assume that there is a unique (unknown) θ0 ∈ Θ such that X1 ∼ gθ0 . We have
seen in the previous example, however, that assuming a Gaussian model incorrectly can
lead to mistakes. Often we have multiple potential statistical models, none of which
is completely undisputed. In cases like these it is necessary to validate the chosen model.
This section discusses methods that give us insight into whether our chosen model is correct
or not. We assume that (x1 , . . . , xn ) is a realisation of a random vector (X1 , . . . , Xn ) of
iid random variables with pdf g and cdf G.

2.2.1 Histograms
A simple technique to get a first impression of the density g is to plot a histogram of
the data x. Let a0 < a1 < . . . < am be an even partition of the range of the xi , that is
aj − aj−1 = c is constant for 1 ≤ j ≤ m. For any y ∈ R, the histogram function hn is
defined as
    h_n(y) = \sum_{j=1}^{m} \sum_{i=1}^{n} 1_{\{a_{j-1} < y \le a_j\}}\, 1_{\{a_{j-1} < x_i \le a_j\}} = \sum_{j=1}^{m} 1_{\{a_{j-1} < y \le a_j\}} \Big( \sum_{i=1}^{n} 1_{\{a_{j-1} < x_i \le a_j\}} \Big).

That is, the histogram function counts the number of observations on each interval defined
by the partition. It can be very useful to plot both a histogram and a given density in one
figure to compare them against one another. In that case we have to rescale the histogram,
since a pdf integrates to one, while hn integrates to c × n. Therefore we define
    \tilde{h}_n(y) = \frac{1}{cn} \sum_{j=1}^{m} \sum_{i=1}^{n} 1_{\{a_{j-1} < y \le a_j\}}\, 1_{\{a_{j-1} < x_i \le a_j\}}.

If n and m are large, then the histogram can give a good approximation of the density g.
To motivate this, take a y ∈ (aj−1 , aj ]. Then, the histogram function is approximated by
    \tilde{h}_n(y) = \frac{1}{cn} \sum_{i=1}^{n} 1_{\{a_{j-1} < x_i \le a_j\}} \overset{(i)}{\approx} \frac{1}{c}\, P(a_{j-1} < X_1 \le a_j) = \frac{1}{c} \int_{a_{j-1}}^{a_j} g(x)\, dx \overset{(ii)}{\approx} g(y),

where the approximation in (i) follows from the law of large numbers, while approximation
(ii) holds true if g does not vary too much on (aj−1 , aj ]. Note that variability of g on a given
interval goes down as the width of the interval decreases, which happens as m increases.
Histograms can thus give an impression of g. Unfortunately, to make the impression
good, we need a lot of data and the right choice of c, the width of the intervals. Too many
intervals, and the histogram will contain too many peaks, which makes it hard to notice
characteristics of g. Too few intervals results in a total loss of detail, and therefore there
is little we can say about g. Hence, we usually cannot expect more from a histogram than
a first impression. Figure 1 and Figure 2 show two simulated histograms compared to
their shared true pdf, which is a Normal(185, 36), for one hundred observations. Notice
how deceptive the second histogram can be if the true density is unknown.
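
A rescaled histogram like the ones below can be produced along the following lines; this is
a minimal sketch (assuming numpy, scipy and matplotlib), not the exact code behind the figures:

    # A minimal sketch: rescaled histogram of 100 simulated observations
    # against the true Normal(185, 36) density (so sigma = 6).
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    x = rng.normal(185, 6, size=100)

    grid = np.linspace(160, 210, 400)
    plt.hist(x, bins=10, density=True)   # density=True rescales by 1/(cn)
    plt.plot(grid, norm.pdf(grid, 185, 6))
    plt.xlabel("Lengths")
    plt.show()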

[Figure 1: rescaled histogram of the simulated data with the true density; x-axis: Lengths (165–205).]

[Figure 2: rescaled histogram of the simulated data with the true density; x-axis: Lengths (165–200).]

2.2.2 QQ-plots
Suppose that you suspect that the random sample X1 , . . . , Xn has population pdf h and
cdf H. QQ-plots are a popular way to quickly check whether these suspicions might be
true, i.e. whether g = h and G = H. The idea is based on the quantiles associated with
a distribution. Let Y be a random variable with pdf g, independent of the Xi . Then by
symmetry we have that

    P(Y \le X_{(1)}) = P(X_{(1)} < Y \le X_{(2)}) = \cdots = P(X_{(n-1)} < Y \le X_{(n)}) = P(Y > X_{(n)}) = \frac{1}{n+1}.

5
It follows that the order statistics can be used as an approximation for the quantiles as
for each 1 ≤ k ≤ n we have
    P\big(Y \le X_{(k)}\big) = \frac{k}{n+1} \;\Rightarrow\; G(x_{(k)}) = P\big(Y \le x_{(k)}\big) \approx \frac{k}{n+1} \;\Rightarrow\; x_{(k)} \approx G^{-1}\!\Big(\frac{k}{n+1}\Big).

A QQ-plot, or quantile-quantile plot, is a scatter plot of the points \big( x_{(k)}, H^{-1}\big(\frac{k}{n+1}\big) \big).
If indeed G = H, then these points should all approximately lie on the y = x line of the
graph. If this is not the case, then we have an immediate visual aid that tells us that H
is not a good approximation.
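
Such a plot is straightforward to make by hand. A minimal sketch (assuming numpy, scipy
and matplotlib), checking a sample against the standard normal:

    # A minimal sketch: plot the order statistics x_(k) against the
    # standard normal quantiles H^{-1}(k / (n + 1)).
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    x = np.sort(rng.normal(0, 1, 200))   # order statistics x_(k)
    n = len(x)
    q = norm.ppf(np.arange(1, n + 1) / (n + 1))

    plt.scatter(x, q, s=8)
    plt.plot(q, q)                       # the y = x reference line
    plt.xlabel("Sample quantiles")
    plt.ylabel("Standard normal quantiles")
    plt.show()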

2.3 The location-scale family


With statistical models being defined as a set containing probability distributions, a lot of
research has been conducted on the properties of various special collections of distributions.
One intuitive, but very flexible, collection of distributions is the location-scale family.
Essentially, a location-scale family is created by taking any pdf and allowing for its graph
to shift along the x-axis, as well as contract or expand while retaining its basic shape (and
of course while still integrating to 1). A formal definition is given below.
Definition 2.7 (3.5.5). Let g(x) be any pdf. Then

(2.1)    \Big\{ g(x \mid \mu, \sigma) = \frac{1}{\sigma}\, g\!\Big(\frac{x - \mu}{\sigma}\Big) \;\Big|\; \mu \in \mathbb{R},\ \sigma > 0 \Big\}

is called the location-scale family of g.
Perhaps without realizing it, you have all been introduced already to at least one
location-scale family, namely the family of distributions given by {Normal(µ, σ²) | µ ∈ R, σ > 0}.
To convince yourself that this indeed forms a location-scale family, take the standard
normal with pdf given by

    g(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},

and apply the transformation from (2.1).
The location-scale family introduces a very simple connection between the cumulative
distribution functions as well.
Lemma 2.8. Let g(x|µ, σ) be a member of the location-scale family of g. Then the cdf of
g(x|µ, σ) satisfies

    G(x \mid \mu, \sigma) = G\!\Big(\frac{x - \mu}{\sigma}\Big),

where G is the cdf of g.
Proof. Tutorial exercise. 

Lemma 2.9. Let Y be a random variable with cdf H, let µ ∈ R and σ > 0 and define
Yµ,σ = µ + σY . Then Yµ,σ has cdf Hµ,σ(y) = H((y − µ)/σ).
Proof. This follows immediately from calculating

    P(Y_{\mu,\sigma} \le y) = P(\mu + \sigma Y \le y) = P\!\Big(Y \le \frac{y - \mu}{\sigma}\Big) = H\!\Big(\frac{y - \mu}{\sigma}\Big).


Example 2.10. Suppose that Y ∼ N(0, 1). Then we know that µ + σY ∼ N(µ, σ²) and
thus the location-scale family of N (0, 1) is the set of all normal distributions.

Importantly, QQ-plots can be used to check whether the data generating process is a
member of a certain location-scale family. Suppose that the data is a sample drawn from
some distribution g(x|µ, σ) that is a member of the location-scale family of h with cdf H.
Then it follows that

    \frac{k}{n+1} \approx G(x_{(k)} \mid \mu, \sigma) \overset{(*)}{=} H\!\Big(\frac{x_{(k)} - \mu}{\sigma}\Big) \;\Rightarrow\; H^{-1}\!\Big(\frac{k}{n+1}\Big) \approx -\frac{\mu}{\sigma} + \frac{1}{\sigma}\, x_{(k)},

where (∗) follows from Lemma 2.8. Hence, even though the data is a sample drawn from
g(x|µ, σ) and not h(x), when plotting the points \big( x_{(k)}, H^{-1}\big(\frac{k}{n+1}\big) \big),
they should roughly follow a straight line with intercept −µ/σ and slope 1/σ. In this case
we can conclude
that the location-scale family of h is a good statistical model.
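
This also suggests a crude way to read off µ and σ from the plot: fit a straight line and
invert its slope and intercept. A minimal sketch (assuming numpy and scipy; the simulated
data are illustrative):

    # A minimal sketch: recover mu and sigma from the QQ-plot line with
    # intercept -mu/sigma and slope 1/sigma.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    x = np.sort(rng.normal(185, 6, 100))  # data from Normal(185, 36)
    n = len(x)
    q = norm.ppf(np.arange(1, n + 1) / (n + 1))

    slope, intercept = np.polyfit(x, q, 1)  # q ~ -mu/sigma + x/sigma
    sigma_hat = 1 / slope
    mu_hat = -intercept * sigma_hat
    print(f"mu approx {mu_hat:.1f}, sigma approx {sigma_hat:.1f}")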
We now have a simple graphical aid to check if the set of normal distributions is a
good statistical model for our data. If the QQ-plot of our data with the standard normal
is approximately a straight line, then that is an indication that the model is correct. We
show the QQ-plot for simulated data from the Normal(185, 36) distribution compared
to the N(0, 1) distribution in Figure 3. In Figure 4 we compare simulated data from a
Student's t distribution with three degrees of freedom to the N(0, 1) distribution.

[Figure 3: QQ plot of sample data versus standard normal; x-axis: Standard Normal Quantiles, y-axis: Quantiles of Input Sample.]

[Figure 4: QQ plot of sample data versus standard normal; x-axis: Standard Normal Quantiles, y-axis: Quantiles of Input Sample.]

2.4 The exponential family


Another important family of distributions in statistics is the exponential family.

Definition 2.11 (3.4.1). A family of pdfs or pmfs is called an exponential family if it
can be expressed as

(2.2)    g(x \mid \theta) = h(x)\, c(\theta) \exp\!\Big( \sum_{i=1}^{k} w_i(\theta)\, t_i(x) \Big),

where h(x) ≥ 0 and c(θ) ≥ 0, t1 (x), . . . , tk (x) are real-valued functions of x that do not
depend on θ, and w1 (θ), . . . , wk (θ) are real-valued functions of the parameter(s) θ.

The exponential family contains many famous probability distributions, including most
of the distributions that you studied in probability theory.
Example 2.12 (3.4.1). Let X ∼ Binomial(n, p) with pmf given by

    g(x \mid n, p) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad 0 < p < 1.

Then g(x | n, p) is a member of the exponential family, which becomes clear upon rewriting

    g(x \mid n, p) = \binom{n}{x} p^x (1 - p)^{n - x} = \binom{n}{x} (1 - p)^n \Big( \frac{p}{1 - p} \Big)^{x} = \binom{n}{x} (1 - p)^n \exp\!\Big( \log\!\Big( \frac{p}{1 - p} \Big)\, x \Big),

such that h(x) = \binom{n}{x}, c(\theta) = (1 - p)^n, w_1(\theta) = \log\big(\frac{p}{1 - p}\big) and t_1(x) = x.
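
As a quick sanity check, the factorisation can be verified numerically; a minimal sketch in
Python (with illustrative values of n and p):

    # A minimal sketch: check the exponential-family factorisation of the
    # Binomial(n, p) pmf for all x in {0, ..., n}.
    from math import comb, exp, isclose, log

    n, p = 10, 0.3
    for x in range(n + 1):
        direct = comb(n, x) * p**x * (1 - p)**(n - x)
        factored = comb(n, x) * (1 - p)**n * exp(log(p / (1 - p)) * x)
        assert isclose(direct, factored)
    print("factorisation checked for x = 0, ..., n")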

Example 2.13 (3.4.4). Let X ∼ Normal(µ, σ²) with pdf given by

    g(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\!\Big( -\frac{(x - \mu)^2}{2\sigma^2} \Big), \quad \mu \in \mathbb{R},\ \sigma^2 \in \mathbb{R}^{+}.

Then g(x | µ, σ²) is a member of the exponential family (exercise).


One of the nice statistical properties of the exponential family is that there exist short-
cuts to derive the moments of its member distributions. However, the property that is
exploited most throughout this course is the fact that the h(x) component can be ignored
when estimating θ. Indeed, all the information relevant to the parameter that can be
extracted from the data turns out to be contained in the ti (x) functions. This often allows
for substantial data reduction without loss of information, which is the topic for next week.
When evaluating whether a specific distribution is a member of the exponential family,
it is good practice to include the support explicitly into the expression of the distribution.
For example, we know that the Exponential(λ) distribution has pdf

    g(x \mid \lambda) = \lambda e^{-\lambda x}, \quad 0 < x < \infty.

However, as (2.2) does not allow for separate inclusions of information related to the
support, i.e. the “0 < x < ∞” part, it is best to include this directly into the pdf with
the use of the indicator function:

    g(x \mid \lambda) = \lambda e^{-\lambda x}\, 1_{(0,\infty)}(x),

where

    1_A(x) = \begin{cases} 1 & x \in A, \\ 0 & x \notin A. \end{cases}
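
For instance, the Exponential(λ) pdf above fits the form (2.2) with

    h(x) = 1_{(0,\infty)}(x), \quad c(\lambda) = \lambda, \quad w_1(\lambda) = -\lambda, \quad t_1(x) = x.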
Whenever the support of the distribution does not depend on the parameter, the indicator
function related to the support will simply get absorbed into the h(x) function. However,
if the support does depend on the parameter, the indicator function will not be parameter
free. Since we cannot split the indicator function into a function h(x) that depends only
on the data and a function c(θ) that depends only on the parameter, such distributions
will in general not be members of an exponential family.

Example 2.14. Let X ∼ Binomial(k, p), with both k and p unknown. Then the pmf of X
is given by

    g(x \mid k, p) = \binom{k}{x} p^x (1 - p)^{k - x}\, 1_{\{0,1,\dots,k\}}(x).
Since the indicator function cannot be split into an h(x) and c(θ) function, nor can it be
represented by an exponential function, this is not a member of the exponential family.

