Lecture 1

This document provides an overview of univariate probability concepts, including random variables, probability distributions, expected value, variance, and the difference between populations and samples.

Review of Probability and Statistics

Juergen Meinecke

1 / 102
Roadmap

Univariate Probability
Random Variables, Probability Distributions

2 / 102
Definition
The mutually exclusive potential results of a random process are
called outcomes.

Definition
The set of all possible outcomes is called sample space.

Definition
An event is a subset of the sample space.

3 / 102
Example:
random process rolling a die

• outcomes: e.g., rolling ‘five dots’
• sample space: {one dot, two dots, …, six dots}
• event: e.g., {three dots, five dots}

4 / 102
Example:
random process
number of kangaroos spotted during my morning run

(in my local nature reserve)

• outcomes: e.g., five kangaroos


• sample space: {one kangaroo, two kangaroos, …, fifty kangaroos}
(this one’s tricky, what’s the upper limit?)
• example of an event: more than six kangaroos

5 / 102
Definition
A random variable 𝑌 is the numerical representation of an
outcome in a random process.

Rolling a die example

• the outcome ‘one dot’ is represented by the number 1


• the outcome ‘two dots’ is represented by the number 2
and so forth

Note: outcomes can be represented by any number.
For instance, the outcome ‘one dot’ could also be represented by the
number 247; I simply picked the obvious and sensible candidates.

6 / 102
Random variables save us a lot of notation

Consider the event


not less than four but fewer than ten kangaroos
(sounds clumsy, doesn’t it?)
Using random variables, this can be concisely summarized
mathematically as
4 ≤ 𝑌 < 10

7 / 102
Definition
The probability distribution of a random variable 𝑌 is the full
characterization of probabilities for all possible outcomes of a
random process.

(this applies to discrete random variables; the definition for
continuous random variables would be slightly different)
Example

• age of EMET2007 students


• suppose ages vary between 18 and 26
(just to keep things simple; sorry if you are older!)

8 / 102
Example: probability distribution of age

𝑦            18    19    20    21    22    23    24    25    26
Pr(𝑌 = 𝑦)  0.05  0.14  0.24  0.23  0.14  0.15  0.02  0.02  0.01

Note: little 𝑦 is called the realization of the random variable;
it’s merely a placeholder for a number between 18 and 26
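
As an illustration (not part of the original slides), a discrete probability distribution like this can be stored in a few lines of Python; the dictionary name below is arbitrary:

```python
# Hypothetical encoding of the age distribution as {realization: probability}
age_pmf = {18: 0.05, 19: 0.14, 20: 0.24, 21: 0.23, 22: 0.14,
           23: 0.15, 24: 0.02, 25: 0.02, 26: 0.01}

# Probabilities of a valid distribution must be non-negative and sum to 1
assert all(p >= 0 for p in age_pmf.values())
assert abs(sum(age_pmf.values()) - 1.0) < 1e-9

print(age_pmf[20])  # Pr(Y = 20) = 0.24
```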

9 / 102
Example: cumulative probability distribution of age

𝑦            18    19    20    21    22    23    24    25    26
Pr(𝑌 = 𝑦)  0.05  0.14  0.24  0.23  0.14  0.15  0.02  0.02  0.01
Pr(𝑌 ≤ 𝑦)  0.05  0.19  0.43  0.66  0.80  0.95  0.97  0.99  1.00
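
A minimal sketch (illustrative only) of how the cumulative probabilities in the second row follow from the probability distribution in the first row, by taking running sums:

```python
from itertools import accumulate

# Age distribution from the previous slide
age_pmf = {18: 0.05, 19: 0.14, 20: 0.24, 21: 0.23, 22: 0.14,
           23: 0.15, 24: 0.02, 25: 0.02, 26: 0.01}

ages = sorted(age_pmf)
# Running sums of the probabilities give Pr(Y <= y)
age_cdf = dict(zip(ages, accumulate(age_pmf[y] for y in ages)))

print(round(age_cdf[21], 2))  # 0.66
print(round(age_cdf[26], 2))  # 1.0
```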

10 / 102
Frequency plot (histogram)
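
The figure itself is not reproduced here; it presumably shows the age probabilities above as a bar chart. A minimal sketch that produces such a plot, assuming matplotlib is installed:

```python
import matplotlib.pyplot as plt

age_pmf = {18: 0.05, 19: 0.14, 20: 0.24, 21: 0.23, 22: 0.14,
           23: 0.15, 24: 0.02, 25: 0.02, 26: 0.01}

plt.bar(list(age_pmf.keys()), list(age_pmf.values()))
plt.xlabel("age (years)")
plt.ylabel("probability")
plt.title("Age distribution of EMET2007 students")
plt.show()
```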

11 / 102
Review of Probability and Statistics

Juergen Meinecke

12 / 102
Roadmap

Univariate Probability

Expected Value, Standard Deviation, and Variance

13 / 102
Definition
Suppose the random variable 𝑌 takes on 𝑘 possible values 𝑦1, …, 𝑦𝑘.
The expected value is given by

E[𝑌] ∶= ∑_{𝑗=1}^{𝑘} 𝑦𝑗 ⋅ Pr(𝑌 = 𝑦𝑗)    (1)

Occasionally we also call this the population mean, or simply the
mean, or the expectation.
Oftentimes, the expected value is also denoted 𝜇𝑌.

14 / 102
Example: age distribution
Recall

𝑦            18    19    20    21    22    23    24    25    26
Pr(𝑌 = 𝑦)  0.05  0.14  0.24  0.23  0.14  0.15  0.02  0.02  0.01

We have 𝑦1 = 18, 𝑦2 = 19, …, 𝑦9 = 26, therefore

E[𝑌] = ∑_{𝑗=1}^{9} 𝑦𝑗 ⋅ Pr(𝑌 = 𝑦𝑗)
     = 18 ⋅ 0.05 + 19 ⋅ 0.14 + ⋯ + 26 ⋅ 0.01
     = 20.96
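
The same number can be verified in a couple of lines of Python (illustrative only, reusing the age_pmf dictionary from earlier):

```python
age_pmf = {18: 0.05, 19: 0.14, 20: 0.24, 21: 0.23, 22: 0.14,
           23: 0.15, 24: 0.02, 25: 0.02, 26: 0.01}

# E[Y] = sum over all realizations y of y * Pr(Y = y)
expected_age = sum(y * p for y, p in age_pmf.items())
print(round(expected_age, 2))  # 20.96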

15 / 102
Properties of the expected value

1. Let 𝑐 be a constant, then E[𝑐] = 𝑐

2. Let 𝑐 be a constant and 𝑌 be a random variable, then
   E[𝑐 + 𝑌] = 𝑐 + E[𝑌]
   E[𝑐 ⋅ 𝑌] = 𝑐 ⋅ E[𝑌]

   It follows that for two constants 𝑐 and 𝑑,
   E[𝑐 + 𝑑 ⋅ 𝑌] = 𝑐 + 𝑑 ⋅ E[𝑌]

3. Let 𝑋 and 𝑌 be random variables, then
   E[𝑋 + 𝑌] = E[𝑋] + E[𝑌]
   E[𝑋 − 𝑌] = E[𝑋] − E[𝑌]

(Can you prove all of these?)
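
Besides proving them, the properties can also be checked numerically for a discrete distribution; a minimal sketch (not from the slides) using the age example with arbitrarily chosen constants c = 2 and d = 3:

```python
age_pmf = {18: 0.05, 19: 0.14, 20: 0.24, 21: 0.23, 22: 0.14,
           23: 0.15, 24: 0.02, 25: 0.02, 26: 0.01}

def expectation(pmf, g=lambda y: y):
    """E[g(Y)] for a discrete random variable given as a {value: probability} dict."""
    return sum(g(y) * p for y, p in pmf.items())

c, d = 2.0, 3.0                                   # arbitrary constants
lhs = expectation(age_pmf, lambda y: c + d * y)   # E[c + d*Y]
rhs = c + d * expectation(age_pmf)                # c + d*E[Y]
print(round(lhs, 6), round(rhs, 6))               # 64.88 64.88
```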

16 / 102
Definition
The 𝑟-th moment of a random variable 𝑌 is given by
𝑚𝑟(𝑌) ∶= E[𝑌^𝑟], for 𝑟 = 1, 2, 3, …

It is obvious that the first moment and the expected value are the
same

17 / 102
Definition
The population variance is defined by

Var[𝑌] ∶= ∑_{𝑗=1}^{𝑘} (𝑦𝑗 − 𝜇𝑌)² ⋅ Pr(𝑌 = 𝑦𝑗)

Oftentimes, the variance is denoted by 𝜎𝑌².

Definition
The population standard deviation is defined by

StD[𝑌] ∶= √Var[𝑌]

It follows immediately that the population standard deviation is
simply 𝜎𝑌.

18 / 102
Example: age distribution
We have 𝑦1 = 18, 𝑦2 = 19, …, 𝑦9 = 26
Doing the math

Var[𝑌] = ∑_{𝑗=1}^{9} (𝑦𝑗 − 𝜇𝑌)² ⋅ Pr(𝑌 = 𝑦𝑗)
       = (18 − 20.96)² ⋅ 0.05 + (19 − 20.96)² ⋅ 0.14 + ⋯ + (26 − 20.96)² ⋅ 0.01
       = 2.74

Therefore
StD[𝑌] = 1.66
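
Again, a quick numerical check (illustrative only); note that the slide's 1.66 comes from taking the square root of the already rounded variance 2.74:

```python
import math

age_pmf = {18: 0.05, 19: 0.14, 20: 0.24, 21: 0.23, 22: 0.14,
           23: 0.15, 24: 0.02, 25: 0.02, 26: 0.01}

mu = sum(y * p for y, p in age_pmf.items())               # 20.96
var = sum((y - mu) ** 2 * p for y, p in age_pmf.items())  # population variance
std = math.sqrt(var)

print(round(var, 2))  # 2.74
print(round(std, 2))  # 1.65 (the slide's 1.66 is sqrt of the rounded 2.74)
```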

19 / 102
Properties of the variance

1. Let 𝑐 be a constant, then Var[𝑐] = 0

2. Let 𝑐 be a constant and 𝑌 be a random variable, then
   Var[𝑐 + 𝑌] = Var[𝑌]
   Var[𝑐 ⋅ 𝑌] = 𝑐² ⋅ Var[𝑌]

3. Let 𝑋 and 𝑌 be random variables, then
   Var[𝑋 + 𝑌] = Var[𝑋] + Var[𝑌] + 2 ⋅ Cov(𝑋, 𝑌)
   Var[𝑋 − 𝑌] = Var[𝑋] + Var[𝑌] − 2 ⋅ Cov(𝑋, 𝑌)

(Can you prove all of these?)

We haven’t yet defined what we mean by ‘Cov(𝑋, 𝑌)’;
we’ll do this later when we discuss bivariate probability

20 / 102
Review of Probability and Statistics

Juergen Meinecke

21 / 102
Roadmap

Univariate Probability

Population versus Sample

22 / 102
Definition
A population is a well-defined group of subjects.

The population contains all the information on the underlying
probability distribution.
Subjects don’t need to be people.

Examples

• Australian citizens
• kangaroos in Tidbinbilla
• leukocytes in the bloodstream
• protons in an atom
• lactobacilli in yogurt

23 / 102
Definition
The population size 𝑁 is the number of subjects in the population.

We typically think that 𝑁 is ‘very large’

In fact, it is so large that observing the entire population becomes
impossible

Mathematically, we think that 𝑁 = ∞, even though in many
applications this is clearly not the case

Setting 𝑁 = ∞ merely symbolizes that we are not able to observe
the entire population

24 / 102
Example: population of Australian citizens
Clearly, 𝑁 = 26,310,784
For all practical purposes it is so large that it might as well
be 𝑁 = ∞

Example: kangaroos in Tidbinbilla

I have no idea how many kangaroos live in Tidbinbilla
(therefore, I do not know the actual population size)

I could ask the park ranger, but suppose she also doesn’t know
We treat the population size as unimaginably large: 𝑁 = ∞

25 / 102
The point is:
for some reason we are not able to observe the entire population
(too difficult, too big, too costly)
Instead, we only have a random sample of the population

26 / 102
Definition
In a random sample, 𝑛 subjects are selected
(without replacement) at random from the population.

Each subject of the population is equally likely to be included in
the random sample.

Typically, 𝑛 is much smaller than 𝑁

Most importantly, 𝑛 < 𝑁 ≤ ∞
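
As an aside (not from the slides), here is a minimal sketch of drawing a random sample of n subjects without replacement, using Python's standard library; the population of numbered IDs is purely hypothetical:

```python
import random

# Purely hypothetical population of N subjects, identified by an ID number
population = list(range(1, 10_001))    # N = 10,000
n = 30                                 # sample size, much smaller than N

random.seed(1)                         # for reproducibility
sample = random.sample(population, n)  # draws n subjects without replacement
print(len(sample), len(set(sample)))   # 30 30 -- no subject appears twice
```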

27 / 102
The random variable for the 𝑖-th randomly drawn subject is denoted
𝑌𝑖

Definition
Because each subject is equally likely to be drawn and the
distribution is the same for all 𝑖, the random variables 𝑌1 , … , 𝑌𝑛
are independently and identically distributed (i.i.d.)
with mean 𝜇𝑌 and variance 𝜎𝑌².
We write 𝑌𝑖 ∼ i.i.d.(𝜇𝑌, 𝜎𝑌²).

Given a random sample, we observe the 𝑛 realizations 𝑦1, …, 𝑦𝑛 of
the i.i.d. random variables 𝑌1, …, 𝑌𝑛

What do we do with a random sample of i.i.d. data?

28 / 102
Review of Probability and Statistics

Juergen Meinecke

29 / 102
Roadmap

Univariate Probability

Sample Average

30 / 102
In analogy to the mean of a population,
we define the mean of a subset of the population:
Definition
The sample average is the average outcome in the sample:

𝑌̄ ∶= (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑌𝑖

Sometimes we also call the sample average the sample mean.

It should be obvious that this is a sensible definition

31 / 102
Let’s say we are interested in learning about the weights of
kangaroos in Tidbinbilla
We drive to Tidbinbilla and somehow randomly collect 30 roos and
measure their weights
This will give us a random sample of size 30 of kangaroo weights

It’s easy to calculate the average weight of these 30 roos


Suppose we obtain a sample average of 52kg

32 / 102
There is a huge difference between the population mean and the
sample mean
There is only one population, therefore there is only one population
mean
But there are many different random subsets (samples) of the
population, each of which results in a (potentially) different sample
average

Let’s say we drive to Tidbinbilla for a second time, again randomly
collect 30 roos and measure their weights

Should we expect to obtain a sample average of 52kg?

33 / 102
It is unlikely that the second time around we collect exactly the same
30 roos (while it is possible, it is not probable)
If we collect a different subset of 30 kangaroos, chances are that we
come up with a different sample average
Suppose we obtain a sample average of 49kg
And now we collect a third random sample …

…and obtain a sample average of 55kg


And so forth …

34 / 102
This illustrates that the sample average itself is a random variable!
Random variables have statistical distributions
What distribution does the sample average have?

• what is its expected value?
• what is its variance?
• what is its standard deviation?
• what is its shape?
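
One way to see this variability is to simulate it. The sketch below is illustrative only; the normal distribution with mean 52kg and standard deviation 8kg is an assumption, not something from the slides. It draws three hypothetical samples of 30 kangaroo weights and computes their sample averages:

```python
import random

random.seed(2)
mu, sigma, n = 52, 8, 30   # assumed population mean/std of weights (kg) and sample size

for trip in range(1, 4):
    weights = [random.gauss(mu, sigma) for _ in range(n)]
    sample_average = sum(weights) / n
    print(f"trip {trip}: sample average = {sample_average:.1f} kg")
# Each simulated trip to the reserve yields a (slightly) different sample average
```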

35 / 102
Let 𝑌𝑖 ∼ i.i.d.(𝜇𝑌, 𝜎𝑌²) for all 𝑖

We don’t know exactly which distribution generates the 𝑌𝑖, but at
least we know its expected value and its variance
(turns out this is all we need to know!)

Each random variable 𝑌𝑖 has

• population mean 𝜇𝑌
• variance 𝜎𝑌²

36 / 102
Expected value

E[𝑌̄] = E[(1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑌𝑖]
     = (1/𝑛) E[∑_{𝑖=1}^{𝑛} 𝑌𝑖]
     = (1/𝑛) ∑_{𝑖=1}^{𝑛} E[𝑌𝑖]
     = (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝜇𝑌
     = (1/𝑛) ⋅ 𝑛 ⋅ 𝜇𝑌
     = 𝜇𝑌

(all of this follows by the properties of expected values)
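
A quick Monte Carlo check of this result (illustrative only, using the age distribution from earlier and an arbitrary sample size of n = 25):

```python
import random

random.seed(3)
ages  = [18, 19, 20, 21, 22, 23, 24, 25, 26]
probs = [0.05, 0.14, 0.24, 0.23, 0.14, 0.15, 0.02, 0.02, 0.01]
n, reps = 25, 100_000

sample_means = []
for _ in range(reps):
    sample = random.choices(ages, weights=probs, k=n)  # n i.i.d. draws
    sample_means.append(sum(sample) / n)

# Averaging many simulated sample averages recovers the population mean
print(round(sum(sample_means) / reps, 2))  # close to mu_Y = 20.96
```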

37 / 102
Variance

Var[𝑌̄] = Var[(1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑌𝑖]
       = (1/𝑛²) Var[∑_{𝑖=1}^{𝑛} 𝑌𝑖]
       = (1/𝑛²) ∑_{𝑖=1}^{𝑛} Var[𝑌𝑖]
       = (1/𝑛²) ∑_{𝑖=1}^{𝑛} 𝜎𝑌²
       = (1/𝑛²) ⋅ 𝑛 ⋅ 𝜎𝑌²
       = 𝜎𝑌²/𝑛

(all of this follows by the properties of variances,
and realizing that Cov(𝑌𝑖, 𝑌𝑗) = 0 for 𝑖 ≠ 𝑗 (why?))

38 / 102
Standard deviation
StD[𝑌̄] = 𝜎𝑌/√𝑛

(that’s an easy one, given that we know the variance)
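
The variance and standard deviation results can be checked with the same kind of simulation (again illustrative only, with an arbitrary sample size of n = 25):

```python
import math
import random
import statistics

random.seed(4)
ages  = [18, 19, 20, 21, 22, 23, 24, 25, 26]
probs = [0.05, 0.14, 0.24, 0.23, 0.14, 0.15, 0.02, 0.02, 0.01]
mu    = sum(y * p for y, p in zip(ages, probs))
sigma = math.sqrt(sum((y - mu) ** 2 * p for y, p in zip(ages, probs)))
n, reps = 25, 100_000

sample_means = [sum(random.choices(ages, weights=probs, k=n)) / n
                for _ in range(reps)]

# The spread of the simulated sample averages matches sigma_Y / sqrt(n)
print(round(statistics.pstdev(sample_means), 3))  # close to the value below
print(round(sigma / math.sqrt(n), 3))             # 0.331
```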

39 / 102
