Chapter 4 Part 1
The Road to
Statistical Inference
In this unit, we give a high-level overview of the chapter by introducing its overarching
theme -- statistical inference.
1
In a previous chapter
Statistical inference
2
PPDAC
Cycle
In the greater context of the PPDAC cycle, it should be clear by now that the first and
last steps of the cycle are connected with the help of statistical and data-related tools,
tools which are very much emphasized in this course. The high concentration of
specialized tools and techniques relevant to the middle three steps allows these steps to
be recast in more specialized language.
3
Analysis
Plan &
Data
In the previous chapters, we have gone through means of producing data, which fall
under the steps “plan” and “data” in the PPDAC cycle, as well as methods of exploratory
data analysis, which fall under the step “analysis”.
4
Analysis
Picture source: courses.lumenlearning.com
The end goal of the “analysis” step is very often statistical inference. It links the analysis
of sample data with the drawing of conclusions about the population.
5
This Chapter
• Confidence intervals
• Hypothesis tests
Probability and inference will be the main thrust of this chapter, as we build upon the
foundations of probability to arrive at tools required in statistical inference. We will
briefly explore two kinds of tools common in the art of statistical inference:
• confidence intervals, and
• hypothesis tests.
6
Probability:
Setting the Stage
In this unit, we will lay the groundwork for probability by defining some basic terms used
in the subject.
7
Vernacular
Uncertainty
(chance, likelihood)
Mathematical Probability
In the previous chapters, we occasionally and implicitly dealt with, and relied on, the
concept of uncertainty. Whenever we mentioned the word “chance”, or used the phrases
“more likely” or “less likely”, we were appealing to your intuition about things that are not
definite, or do not always hold true. Such terms are common and adequate in day-to-day
communication. However, in order to deal with data at a deeper level, their meanings
need to be made precise, and it is helpful to have a more rigorous framework to ground
uncertainty in.
8
Possibilities
×2 HH TT
HT TH
Let’s define the basic terms of probability using an example. Say, you flip a coin twice,
and observe the results of the two flips. There are four possibilities here. Namely, getting
two heads, getting two tails, getting a head followed by a tail, and getting a tail followed
by a head. In this example, the procedure of flipping the coin twice is called a probability
experiment, and the four possible results are the outcomes of the probability experiment.
Note that a probability experiment is defined more narrowly than the experiments of
Chapter 1, because we must be able both to repeat it as many times as we want, and to
exactly describe, or list, all its outcomes.
Just like how a probability experiment gives rise to a set of outcomes, every example of
mathematical objects we are going to define, going forward, will arise as a result of some
probability experiment. In this sense, probability experiments form the bedrock of the
study of probability.
9
We will use the term sample space to denote the collection of all possible outcomes of a
probability experiment. We will also use the word event to denote a subcollection of the
sample space. In the probability experiment of flipping a coin twice, we have the
following sample space of four elements. An example of an event of this sample space is
getting either two heads or two tails. Colloquially, we may call this event a “two-in-a-
row”.
It turns out that a sample space and an event of that sample space are enough to give
context to the mathematical discussion of probability. In other words, mathematically
speaking, we only ever talk about the probability of an event of a sample space.
Intuitively, this probability measures how likely the outcome of the probability
experiment -- which gives rise to the sample space in the first place -- is an element of
the event. It is useful to note that, in practice, we often regard outcomes as events, so
that we can talk about the probability of outcomes.
10
Example: Die-rolling
Rolling a six-sided die
Sample space: {1, 2, 3, 4, 5, 6}
A possible event: “an even-numbered face” = {2, 4, 6}
What are the basic properties of this probability experiment? Well, for one, its sample
space consists of six elements, one for each face of the die. The sample space can be
written using set notation as follows. A possible event of this sample space is highlighted
here in green. This event can be described as “the die landing on an even-numbered
face”, and we can check that this description corresponds to the subcollection of the
sample space containing exactly the outcomes ‘2’, ‘4’ and ‘6’.
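As a small sketch (not part of the course materials), the sample space and event of this example can be represented directly as Python sets, with the event checked to be a subcollection of the sample space:

```python
# Sketch: the die-rolling probability experiment's sample space and an event
sample_space = {1, 2, 3, 4, 5, 6}  # one element per face of the die

# the event "the die landing on an even-numbered face"
even_face = {outcome for outcome in sample_space if outcome % 2 == 0}

print(even_face)                   # the subcollection {2, 4, 6}
print(even_face <= sample_space)   # True: an event is a subset of the sample space
```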
11
Probabilities
In this unit, we define the class of mathematical objects that is the linchpin of statistics.
12
Probability experiment → Sample space → Events
Probabilities: numerical values between 0 and 1 (inclusive), assigned to events
We have learnt that a probability experiment gives rise to a sample space, as well as
events of the sample space. What is left to do, in modelling our probability experiment, is
to talk about the probabilities themselves. Conventionally, probabilities are numerical
values between 0 and 1 we assign to events. If E is an event that has been assigned a
probability, we use P of E, “the probability of E”, to denote the probability assigned to E.
All that being said, how can we read off probabilities from a probability experiment?
13
Simple Cases: Finite Sample Spaces
For any event E:
1. Repeat the probability experiment a large number (N) of times
2. For each repetition, check if the outcome is in E
Every event can be assigned a probability
Repetitions: 1, 2, 3, ..., N
This is not too difficult, if our probability experiment has a finite sample space, in which
case, every event can be assigned a probability. We do so in the following manner:
choose any event E, repeat the probability experiment a large number of times, say N
times, and for each repetition, check if the outcome is in E.
For example, in the N repetitions, we might see that the first outcome is in E, so we
mark it with a ‘Yes’. The second outcome might not be in E, so we mark it with a ‘No’.
The third repetition might turn out to be a ‘Yes’ again, and so on and so forth.
After we have marked all N repetitions with either ‘Yes’ or ‘No’, we count the number
of ‘Yes’ marks and divide it by N, to get the proportion of E for this particular set of repetitions.
We can assign this proportion to E as its probability. Of course, it is reasonable to believe
that the proportion of E will be different if we repeat the probability experiment another
N or more times.
The point is, all these proportions are estimates of the true probability of E, and such
estimates get more accurate as N gets larger.
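The repetition procedure above can be sketched in a few lines of Python. This is an illustrative simulation, not from the course materials; the die and the event E = “an even-numbered face” are chosen for concreteness, and the true probability being estimated is 0.5:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def estimate_probability(experiment, event, n):
    """Repeat the probability experiment n times and return the
    proportion of repetitions whose outcome lies in the event."""
    hits = sum(1 for _ in range(n) if experiment() in event)
    return hits / n


def roll_die():
    # the probability experiment: one roll of a fair six-sided die
    return random.randint(1, 6)


E = {2, 4, 6}  # event: "an even-numbered face"

# As N grows, the proportion settles near the true probability 0.5.
print(estimate_probability(roll_die, E, 100_000))
```

Running the estimate with larger and larger N shows the proportions clustering ever more tightly around the true probability, which is exactly the point made above.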
14
Simple Cases: Finite Sample Spaces
An Example
Event E = “an even-numbered face”
1. Repeat the probability experiment a large number (N) of times
2. For each repetition, check if the outcome is in E
Every event can be assigned a probability
Rolling a six-sided die
Repetitions: 1, 2, 3, ..., N
Marks: Yes, No, No, ..., Yes
Proportion of E = (number of ‘Yes’ marks) / N → P(E)
15
Rules of Probabilities
For every assignment of probabilities
For finite sample spaces, it is enough to assign probabilities to outcomes so that they add up to 1
When the sample space is finite, we only need to assign probabilities to outcomes so
that these probabilities sum to 1. The probabilities of all other events can then be
derived from there.
16
For finite sample spaces, it is enough to assign probabilities to outcomes so that they add up to 1.
P(1) = 0.1
P(2) = 0.1
P(3) = 0.1
P(4) = 0.1
P(5) = 0.1
P(6) = 0.5
(add up to 1)
Deriving probabilities of other events
For example, in the rolling of a biased six-sided die, after assigning these probabilities to
all 6 outcomes, and checking that they add up to 1, we can derive the probabilities
of other events by invoking the 3rd rule of probability repeatedly, so that the probability
of each event is the sum of the probabilities of its outcomes.
17
If we have two events E and F, where E is the event the die lands on an odd-numbered
face, and F the event the die lands on an even-numbered face, we can compute the
probabilities of E and F as follows. The probability of E will be the sum of the
probabilities of the die landing on 1, 3 and 5, which evaluates to 0.3.
On the other hand, the probability of F will be the sum of the probabilities of the die
landing on 2, 4 and 6, which equals 0.7. For the rest of this chapter, unless specified
otherwise, we will only concern ourselves with sample spaces which are finite.
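The derivation rule above — the probability of an event is the sum of the probabilities of its outcomes — can be sketched in Python using the biased die's assignment (a sketch, not from the course materials):

```python
# Sketch: outcome probabilities of the biased six-sided die
outcome_prob = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}

# check the outcome probabilities add up to 1 (allowing for float rounding)
assert abs(sum(outcome_prob.values()) - 1) < 1e-9


def prob(event):
    """Probability of an event = sum of the probabilities of its outcomes."""
    return sum(outcome_prob[o] for o in event)


E = {1, 3, 5}  # the die lands on an odd-numbered face
F = {2, 4, 6}  # the die lands on an even-numbered face
print(round(prob(E), 10))  # 0.3
print(round(prob(F), 10))  # 0.7
```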
18
Uniform Probabilities and Rates
Uniform
probability
HH TT
HT TH
In the sample space corresponding to flipping a coin twice, there are a total of four
outcomes, so uniform probability over this sample space simply assigns a probability of a
quarter, or 0.25, to each outcome. As you will learn in the subsequent units, uniform
probability in this case, corresponds to two independent flips of a fair coin.
19
Uniform Probabilities and Rates
Uniform
probability
Randomly select a unit
20
Conditional Probabilities
In this unit, we look into the concept of conditional probability, and give a general
method of computing conditional probabilities.
21
Conditional Probability
E and F are events
P(E | F): “probability of E given F”
P(E | F) = P(E ∩ F) / P(F)
(Venn diagram: E = “even number”, F = “can be divided by 3”, with overlap E ∩ F)
The concept of conditional probability deals with probabilities written in the following
notation, which is usually read as the “probability of E given F”. Here, E and F are
events of a particular sample space. Intuitively, the probability of E given F measures
how likely the outcome of the probability experiment – which again, gives rise to the
sample space – is an element of E, if we already know that it is an element of F. To
compute conditional probabilities, we usually invoke the idea of restricting sample
spaces.
So imagine we have a finite sample space in mind, with two events labelled E and F. To
compute the probability of E given F, we restrict our focus to the given event F, which
may contain some overlap with E, denoted “E intersect F”. F acts as a baseline for the
computation, and we can read off its assigned probability. E intersect F is the part of E
we can find in F; it is also an event of the sample space, so we can read off the
probability of E intersect F. Taking the quotient of these two probabilities, we arrive at a
value which is, justifiably, the probability of E given F.
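The restriction idea can be sketched for the diagram's example: a fair die, with E = “even number” and F = “can be divided by 3”. This is an illustrative sketch, not part of the course materials:

```python
from fractions import Fraction

# Sketch: P(E | F) = P(E ∩ F) / P(F) on a fair six-sided die
sample_space = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}  # "even number"
F = {3, 6}     # "can be divided by 3"


def prob(event):
    # uniform probability: each face is equally likely
    return Fraction(len(event), len(sample_space))


# restrict the sample space to F, then take the quotient
p_given = prob(E & F) / prob(F)
print(p_given)  # 1/2
```

Exact fractions are used so the quotient is 1/2 rather than a rounded float.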
22
Conditional Probabilities as Rates
A and B are subgroups
Recall:
Randomly select a unit
In the previous unit, we gave a common manifestation of uniform probability, and that
is, the probability experiment of randomly selecting a unit from a fixed sampling frame.
We saw that the sample space of this probability experiment is equal to the sampling
frame, and that for any subgroup A of the sampling frame, A is an event of the sample
space, with probability of A being equal to the rate of A.
Now, we can ask a follow-up question with respect to this probability experiment: does
the probability of A given B equal the rate of A given B, whenever A and B are subgroups
of the sampling frame? Let’s work this out on the board.
23
Conditional Probabilities as Rates
A and B are subgroups
Recall:
Randomly select a unit

Recall:
• Sample space = sampling frame
• P(A) = rate(A), for any subgroup (event) A

P(A | B) = P(A ∩ B) / P(B)
         = rate(A ∩ B) / rate(B)
         = (size(A ∩ B) / size of sampling frame) / (size(B) / size of sampling frame)
         = size(A ∩ B) / size(B)
         = rate(A | B)    Yes!
We start with the probability of A given B, which equals the probability of A intersect B
divided by the probability of B. This is the equation we derived previously using the idea
of restricting the sample space. Since talking about probabilities is the same as talking
about rates, we can substitute probabilities for rates in the expression on the left to get
the rate of A intersect B divided by the rate of B. Unravelling the definitions of rates as
ratios of two sizes, we get the following expression. Cancelling out the common
denominator in the ratio, we find it equal to the size of A intersect B divided by
the size of B, which is precisely the rate of A given B.
This easy verification leads to an affirmative answer to our question. Indeed, just as
probabilities are equivalent to rates in this probability experiment, so too are conditional
probabilities equivalent to conditional rates.
24
Example: ART for Covid-19
Antigen Rapid Test (ART): randomly select a person → apply ART to test for Covid-19 → check Covid-19 status
Sensitivity (true positive rate): P(+ | Covid-19) = 0.80
Specificity (true negative rate): P(- | no Covid-19) = 0.99
Here, the probabilities are formulated with respect to the probability experiment of
randomly selecting a person from the global population, applying the ART on this person
to test for Covid-19, and then checking his or her Covid-19 infection status in a definitive way.
An outcome of this probability experiment thus comprises a test result and a Covid-19
infection status. Now that we know how to make sense of the conditional probabilities
that define sensitivity and specificity, let’s go back to fill in the blanks. According to
studies, the ART has a sensitivity of 0.80 and a specificity of 0.99.
25
Example: ART for Covid-19
Base rate of Covid-19: P(Covid-19) = 0.01
However, these two conditional probabilities do not directly help the average person,
who has no means of knowing for sure his or her Covid-19 infection status without
jumping through a lot of hoops. What people have access to instead are the results of
their ARTs. So what can we say about a person’s Covid-19 infection status given his or
her ART result? For example, what is the probability a person is, in fact, infected with
Covid-19, given that he or she tested positive? This is an important question, because a
false positive test result can cause a lot of grief and inconvenience to a person.
As it turns out, having just the sensitivity and specificity of the test is insufficient to
answer this question. What we are missing is the base rate, or rate of infection, of Covid-
19, which in this probability experiment, is equal to the probability the person selected
at random is infected with Covid-19. Let us take a conservative stance and assume that
1% of the global population is infected with Covid-19.
26
Example: ART for Covid-19
               +       -       Row total
Covid-19       800     200     1000
No Covid-19    990     98010   99000
Column total   1790    98210   100000

Base rate of Covid-19: P(Covid-19) = 0.01
The rate of Covid-19 infection among those tested positive is thus equal to 800 divided
by 1790, or 0.447 when rounded off to 3 significant figures. By the correspondence
between conditional probabilities and conditional rates, this also answers the question
we posed. That is, if one tests positive for Covid-19 in an ART, there is only about a 45%
probability that he or she is really infected with Covid-19. Such a low conditional
probability indicates that more rigorous tests need to be applied for confirmation,
following the ART.
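The count table for 100,000 randomly selected people can be reproduced directly from the stated sensitivity, specificity and base rate. This Python sketch (not from the course materials) carries out that arithmetic:

```python
# Sketch: rebuilding the slide's contingency table from the given rates
N = 100_000
base_rate, sensitivity, specificity = 0.01, 0.80, 0.99

infected = N * base_rate                         # 1000 people have Covid-19
true_pos = infected * sensitivity                # 800 of them test positive
false_pos = (N - infected) * (1 - specificity)   # 990 test positive without Covid-19

# rate of Covid-19 infection among those who tested positive
p_covid_given_pos = true_pos / (true_pos + false_pos)
print(round(p_covid_given_pos, 3))  # 0.447
```

The same quotient, 800 / 1790, is exactly the conditional rate read off the table.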
27
Independence
In this unit, we define what it means for two events to be independent, and look at the
relation between independence and association.
28
Definition 1
Independence of events A, B: P(A) = P(A | B)

Unpacking: P(A) = P(A ∩ B) / P(B)

Definition 2
Independence of events A, B: P(A) × P(B) = P(A ∩ B)
One way to define independence of two events A and B, is to say that the probability of A
equals the probability of A given B. If we unpack the definition of the probability of A
given B, we get the probability of A intersect B divided by the probability of B, so the
equation can be reformulated as shown. Multiplying both sides of the equation by the
probability of B gives us this new equation. We thus have another, equivalent definition
of what it means for two events to be independent. By our second definition, it is clear
that the order of events does not matter when we talk about independence.
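Definition 2 can be checked concretely on the two-coin-flip experiment with uniform probability. In this sketch (an illustration, not from the course materials), A = “first flip is heads” and B = “second flip is heads”:

```python
from fractions import Fraction
from itertools import product

# Sketch: sample space of two coin flips, e.g. ('H', 'T') for heads then tails
sample_space = set(product("HT", repeat=2))


def prob(event):
    # uniform probability: each of the 4 outcomes is equally likely
    return Fraction(len(event), len(sample_space))


A = {o for o in sample_space if o[0] == "H"}  # first flip is heads
B = {o for o in sample_space if o[1] == "H"}  # second flip is heads

# Definition 2 holds, so A and B are independent; the order of A and B
# clearly does not matter in this product.
print(prob(A) * prob(B) == prob(A & B))  # True
```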
29
Independence as Non-Association
Variable 1: A / not A    Variable 2: B / not B
(2 × 2 table: rows A / not A, columns B / not B)
Randomly select a unit → check if A, check if B
Rate(A) = Rate(A | B) : No Association
P(A) = P(A | B) : Independence
Here, the relevant probability experiment involves randomly selecting one unit from
the population we want to study, followed by checking the values of the selected unit with
regard to variable 1 and variable 2. The equivalence of rates and probabilities
immediately leads us to the conclusion that A and B being independent in this probability
experiment is exactly what it means for the two variables to not be associated.
30
Independent Probability Experiments
Carried out independently
Probability experiments P and Q
Sample space of P: S; sample space of Q: T
Combined sample space: all pairings of an element of S with an element of T
We start with two probability experiments, one of which we label P, which gives rise to
the sample space S. The other probability experiment, which we label Q has sample
space T. If these two probability experiments are independent, then we can view them as
two components of a larger experiment: P coupled with Q. It is easy to see that
this combined probability experiment has, as its sample space, all pairings of possible
outcomes of the two components. That the two components are independent is conveyed by how
probabilities are assigned to outcomes of the combined experiment. In short, we want the
probability of each pairing to obey something analogous to “Definition 2” of independent
events.
31
Independent Probability Experiments
Carried out independently
Probability
experiment ×2
×2
Suppose P is the rolling of a particular six-sided die, which gives rise to the sample space
comprising the six possible faces of the die. Also suppose Q is the tossing of a particular
coin twice, which has the familiar 4-element sample space. Then the sample space of the
combined probability experiment consists of the pairs shown in the slide.
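The combined sample space can be sketched with `itertools.product`, pairing each die face with each two-flip result (an illustrative sketch, not from the course materials):

```python
from itertools import product

# Sketch: combined sample space of the die roll (P) and the
# two-coin-flip experiment (Q) as all pairings of their outcomes
S = {1, 2, 3, 4, 5, 6}        # sample space of P
T = {"HH", "HT", "TH", "TT"}  # sample space of Q

combined = set(product(S, T))
print(len(combined))  # 6 * 4 = 24 pairings, e.g. (3, 'HT')
```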
32
Random Variables
33
Probability experiment → Outcome → Numerical value
Examples
Randomly select a person
Consider a probability experiment such that each of its outcomes is given a numerical
value. One such probability experiment is a game of roulette, played in a casino, such
that every outcome is associated with a payoff, which is of course, numerical in nature.
Another example involves randomly selecting a person from a population, checking his
Covid-19 infection status, then giving a 1 or a 0, depending on the status.
34
Random variable = Numerical variable + Probabilities
Discrete variable (values listed by counting) → Discrete random variable
Continuous variable (all values, non-integers as well) → Continuous random variable
Now, abstracting away the motivating probability experiment and its outcomes and
focusing on what is left, we can say that any numerical variable with probabilities
assigned over its possible values, is a random variable.
If the numerical variable is a discrete variable, we call the random variable a discrete
random variable. On the other hand, if the numerical variable is a continuous variable,
we call the random variable a continuous random variable.
35
Random Variables
model
Data Distributions
• Measures of central tendency
• Measures of dispersion
… can be computed
Random variables and data distributions can be thought of as two sides of the same coin. In
fact, random variables were conceived as a mathematical way to model data
distributions. As a result, common summary statistics, like measures of central tendency
and measures of dispersion, can be computed from random variables, just as they
can be computed from data distributions. These summary statistics can be difficult to
compute for many random variables, and we shall not cover them in our course.
36
HDB Household Size
Check size
A discrete random variable of practical use is usually derived from real-world data and/or
real-world probability experiments. Let’s look at one example. Complementary to this
unit is “Unit_5_household_size.csv”, containing modified Singstats data regarding HDB
households. Here, the size of a household refers to the number of individuals living in the
household.
Consider the probability experiment of randomly selecting a household from all HDB
households. The numerical variable we are interested in is the size of the selected
household. From the data given, equating probability with rate, we can draw up a table
detailing the possible household sizes and their respective probabilities, with respect to
this procedure. Clearly, the household size is a finite discrete variable. This particular
table thus represents a discrete random variable.
37
Visualisation of a Discrete Random Variable
Household size (X)   1      2      3      4      5      6
Probability          0.16   0.226  0.204  0.201  0.119  0.09
• Each point
represents a possible
value of X, indicated
by its x-value
• y-value of a point =
probability that X
assumes its x-value
Labelling this discrete random variable X, we can visualise it using a plot of points, in
which each point represents a possible value of X equal to its x-value, and the y-value of
each point equals the probability that X assumes its x-value.
38
Visualisation of a Discrete Random Variable
Discrete collection
of possible values
of X
Vertical line segments can be added to connect the points in the plot to the x-axis. This
is sometimes done to emphasise the height of each point. Since the points in the plot
are separated by gaps, it is clear that their x-values, which are the possible values X can
assume, are discrete. The points in the plot, as a whole, form a visual representation of X.
39
Visualisation of a Discrete Random Variable
Highest point; Mode
Notes:
• Probabilities of the points add up to 1
• x-value of a highest point is a mode
This representation agrees with the rules of probabilities in the sense that the
probabilities of the points add up to 1. Moreover, analogous to what has been taught
about distribution of data, a mode of a discrete random variable is the x-value of a
highest point. In this plot, there is a single highest point, with x-value 2, so 2 is the mode
of this discrete random variable.
40
Probabilities of a Discrete Random Variable
Probability that a
randomly selected HDB
household has size ≥ 5?
i.e. P(X ≥ 5)?
41
Probabilities of a Discrete Random Variable
P(X ≥ 5) = P(X = 5) + P(X = 6)
         = 0.119 + 0.09
         = 0.209
To compute this probability, we simply add up the probabilities of points with x-values at
least 5, which would give us a value of 0.209.
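The household-size random variable and the P(X ≥ 5) computation can be sketched directly from the table (a sketch, not part of the course materials):

```python
# Sketch: the household-size random variable X as a value-to-probability table
X = {1: 0.16, 2: 0.226, 3: 0.204, 4: 0.201, 5: 0.119, 6: 0.09}

# probabilities of the points add up to 1 (allowing for float rounding)
assert abs(sum(X.values()) - 1) < 1e-9

# P(X >= 5): sum the probabilities of values at least 5
p_at_least_5 = sum(p for size, p in X.items() if size >= 5)
print(round(p_at_least_5, 3))  # 0.209
```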
42
Visualisation of a
Continuous Random Variable
Density curve of
continuous
random variable Y
Continuous range
of possible values
Y can assume
Any continuous random variable Y can be visualised with a density curve on the
standard x- and y-axes. A curve can be viewed as a “continuous series of points”, which
makes it analogous to how a discrete random variable is visualised with a plot of
discrete points. Following this parallel, the x-values under such a curve correspond to the
possible values Y can assume.
43
Visualisation of a
Continuous Random Variable
Highest point; Mode
Density curve of continuous random variable Y
Notes:
• Area under the curve = 1
• x-value of a highest point is a mode
Any density curve of a continuous random variable must have area under it equal to 1.
Similar to the case of discrete random variables, the x-value of a highest point of the
curve is a mode. For the curve of Y, there is a single highest point, with x-value 0.2, so
0.2 is the mode of this continuous random variable.
44
Probabilities of a
Continuous Random Variable
Probability that Y assumes a value between 0.3 and 0.5?
i.e. P(0.3 ≤ Y ≤ 0.5)?
Density curve of continuous random variable Y
As in the case of discrete random variables, we are often curious about probabilities of a
random variable taking on values within a range. For example, what is the probability
that Y assumes a value between 0.3 and 0.5?
45
Probabilities of a
Continuous Random Variable
To compute this probability, we calculate the area under the density curve of Y in the
interval 0.3 to 0.5. This area, shaded in the plot here, evaluates to 0.311.
46
Probabilities of a
Continuous Random Variable
In general
Probability that a continuous random variable takes on a value in the interval [a, b]
= area under its density curve from a to b
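The area-under-the-curve rule can be sketched numerically. The density below, f(y) = 2y on [0, 1], is a made-up stand-in (it is NOT the curve of Y from the slides); it has total area 1, and the exact probability on [0.3, 0.5] is 0.25 − 0.09 = 0.16:

```python
# Sketch: P(a <= Y <= b) as the area under a density curve,
# approximated with the trapezoidal rule
def density(y):
    # hypothetical density: f(y) = 2y on [0, 1], total area 1
    return 2 * y


def prob_interval(a, b, steps=10_000):
    """Approximate the area under the density curve from a to b."""
    h = (b - a) / steps
    total = 0.5 * (density(a) + density(b))
    total += sum(density(a + i * h) for i in range(1, steps))
    return total * h


print(round(prob_interval(0.3, 0.5), 4))  # 0.16, matching b^2 - a^2
```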
47
Normal Distributions
Two normal distributions can only differ by their means or their variances.
Common properties:
• Bell-shaped curve
• Peak of the curve occurs at the mean
• Curve is symmetrical about the mean
(Hence: mean = mode = median)
A particular normal distribution is fully described by its mean and variance. In other
words, any two normal distributions can only differ by their means or their variances.
We use the notation N of x comma y, to denote the normal distribution with mean x and
variance y. Every normal distribution possesses the following properties:
• it is a bell-shaped curve,
• it has a peak that occurs at the mean, and
• it is symmetrical about the mean.
The last two points, in conjunction, imply that the mean, the mode and the median of
any normal distribution are the same.
48
Two normal distributions can only differ by their means or their variances.
Mean = 2
Mean = 0
Now let us have a look at how the mean and variance of a normal distribution manifest
visually.
When the mean is 0 and the variance is 1, the resulting normal distribution is called the
standard normal distribution. Its density curve is shown to the left of the slide. The dotted
line indicates the peak of the curve, which as mentioned, occurs at the mean, 0. The
normal distribution with mean 2 and variance 2 has its density curve plotted on the same
axes, to the right of the slide. Here, the peak of the curve is to the right of the peak of the
standard normal distribution, due to the mean being greater.
Contrasting the shapes of the two curves, we can infer that a smaller variance corresponds
to a thinner bell shape, whereas a greater variance corresponds to a fatter bell shape. Note
that, since the area under the density curve of any continuous random variable must be 1,
a fatter bell shape needs to compensate by being shorter.
49
A real-life example of a normal distribution occurs in intelligence quotient, or IQ for
short. The Wechsler Adult Intelligence Scale is designed such that its IQ scores follow a
normal distribution with mean 100 and standard deviation 15 (hence, a variance of 225).
As a result, approximately 68% of the IQ scores fall within the range 85 to 115,
approximately 95% of the scores fall within the range 70 to 130, and roughly 2% of the
scores fall in each of the ranges 55 to 70 and 130 to 145.
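These percentages can be verified from the normal CDF, which the standard library's `math.erf` gives directly. A sketch (not part of the course materials) for the IQ distribution N(100, 225):

```python
from math import erf, sqrt


def normal_cdf(x, mean, sd):
    # CDF of the normal distribution, via the error function
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


# IQ scores: normal with mean 100 and standard deviation 15
within_1_sd = normal_cdf(115, 100, 15) - normal_cdf(85, 100, 15)
within_2_sd = normal_cdf(130, 100, 15) - normal_cdf(70, 100, 15)

print(round(within_1_sd, 4))  # about 0.6827: the "68%" range 85 to 115
print(round(within_2_sd, 4))  # about 0.9545: the "95%" range 70 to 130
```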
Summary
Sample space
assigned to
Events Probabilities
At this point, a summary of what we have covered so far could be useful. When talking
about formalizing real-life scenarios involving uncertainty, we always start with
describing a probability experiment, which should come equipped with an obvious
sample space. We define events as subcollections of the sample space. The probability
experiment should also give us information about how we can associate probabilities to
events.
51
Sample space
assigned to
Events Probabilities
• Conditional probability
• Independence
• Independent events
• Independent probability experiments
• Random variables
• Discrete random variables
• Continuous random variables
• Normal distributions
All this information allows us, on one hand, to talk about conditional probability and the
related concept of independence, not just in the context of events of one probability
experiment, but also
experiment, but also, among multiple probability experiments. On the other hand, this
information allows us to conceptualise random variables, both discrete and continuous,
including probably the most well-known class of continuous random variables around,
the class of normal distributions.
52