Introduction Random variables Describing distributions Special case Independent events

Chapter 4A: Random Variables & their Distributions

MAST10010 Data Analysis

Department of Mathematics & Statistics


University of Melbourne

Slide 1/67

Outline
Introduction
Randomness
Random variables
Types of Random Variables
Distributions of Random variables
Describing probability distributions
Measures of location
Measures of spread
Special case
Sums of i.i.d.r.v.s and the Central Limit Theorem
Independent events
Exercises
Slide 2/67

References

References:
Utts and Heckard (4th & 5th editions) – Chapter 8
DeVeaux, Velleman, Bock – Chapter 16
Utts and Heckard (3rd edition) – Chapters 7 & 8

Slide 3/67

Learning Outcomes:
At the end of this topic you should be able to:
▶ Understand notation for random variables and related
quantities
▶ Recognise discrete and continuous random variables
▶ Use the pmf, pdf and cdf to calculate probabilities and
quantiles
▶ Describe probability distributions using expected value,
variance and standard deviation (including calculating these in
some cases)
▶ Use properties of expected value and variance to calculate
these for derived (or re-scaled) variables
▶ State the Central Limit Theorem
▶ Understand independence and mutual exclusion for random
variables
Slide 4/67

Road Map

Slide 5/67

Introduction

▶ To date we have focused our attention on sample data.


▶ We have explored the distribution of our sample data and
then, based on the information contained in the sample, we
have conjectured about what could be happening in the
population.
▶ We know that every time we take another sample the sample
mean will differ (think about the Chocs and Blocks!).

Slide 6/67

▶ So, what values for our sample mean (for example) could we
expect to observe, if we took lots and lots (an infinite number)
of random samples and recorded their respective means?
The answers to questions like this come from understanding what
random phenomena (processes) are, and how they behave.

In this chapter we look at hypothetical (population) models
that enable us to model random phenomena, such as the variation
in outcomes from sample to sample.

Slide 7/67

. . . Visually

Slide 8/67

Randomness

Random phenomena

Example:
Flip a fair coin. It’s challenging to guess the outcome of just one
coin toss (one trial) because the outcome is random.

However, if you flip a fair coin many times over you would, in the
long run, expect the proportion of heads to be about 0.5.

It is this behaviour of random events that delights statisticians, namely:
▶ the outcome of a single trial is uncertain but in the long run,
over many trials, the random phenomenon settles and
becomes predictable.
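
The long-run behaviour described above can be explored with a small simulation. This is only a sketch: the function name `proportion_of_heads` and the fixed seed are illustrative choices, not part of the course material.

```python
import random

def proportion_of_heads(n_flips, seed=0):
    """Simulate n_flips tosses of a fair coin and return the proportion of heads."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so that runs are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The proportion is erratic for a few flips and settles near 0.5 for many flips.
for n in (10, 100, 10_000, 1_000_000):
    print(n, proportion_of_heads(n))
```

A single run with 10 flips can land well away from 0.5, while the runs with many flips should sit very close to it.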

Slide 9/67

Randomness

. . . what does it look like?

Slide 10/67

Randomness

https://pollev.com/paulfijn

Slide 11/67

Randomness

https://www.youtube.com/watch?v=JC41M7RPSec

Slide 12/67

Randomness

Quantifying randomness

Probability is a way of:


▶ measuring the chances of observing particular outcomes for a
random phenomenon/experiment.
The randomness may arise from:
▶ the way we assign subjects to treatments
▶ the way we select people for a sample
▶ measurement error
▶ other???

Slide 13/67

Randomness

Probability — a measure of randomness

Several definitions of probability exist.

We use: the probability of an event is the relative frequency of its
occurrence in an infinite-length sequence of experiments.

We represent the probability of an event by

Pr(X = x) = p(x)

Example: Let X = number of heads when a fair coin is tossed
three times. “X = 2” denotes an event; Pr(X = 2) = 3/8.
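
One way to check this value is to list the eight equally likely outcomes of three tosses and count those with exactly two heads. The variable names below are illustrative.

```python
from fractions import Fraction
from itertools import product

# All sequences of three tosses: HHH, HHT, ..., TTT (8 equally likely outcomes).
outcomes = list(product("HT", repeat=3))

# Outcomes belonging to the event "X = 2" (exactly two heads).
favourable = [o for o in outcomes if o.count("H") == 2]

pr_x_equals_2 = Fraction(len(favourable), len(outcomes))
print(favourable)
print(pr_x_equals_2)  # 3/8
```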

Slide 14/67

Random Variables

A random variable is a numerical variable, the value of which
depends on the outcome of a ‘random experiment’.

It has an unknown value before the random experiment, and an
observed numerical value after the experiment.

The observed numerical value is called a realisation of the
random variable.

Notation
▶ for random variables use capital letters (X , Y , Z , . . .)
▶ for observed values use lower-case letters (x, y , z, . . .)

Slide 15/67

Examples

Let L = lead content (parts per million, ppm) of a randomly
chosen mussel.
▶ the random experiment is randomly choosing the mussel
and recording the lead content
▶ the random variable (L) is the lead content (ppm) in a
randomly chosen mussel.
▶ a realisation of the random variable (l) would be the actual
observed lead content (ppm)

Slide 16/67

Examples

Let X = number of heads when a fair coin is tossed three times


▶ the random experiment is tossing the fair coin three times
and recording the outcome
▶ the random variable (X ) is the number of heads in three coin
tosses
▶ a realisation of the random variable could be x = 0, 1, 2 or 3.

Slide 17/67

Probability Distributions

▶ A random variable is completely specified by its probability
distribution.
▶ A probability distribution summarizes the possible outcomes of
a random experiment and their associated probabilities.
▶ A probability distribution specifies how the values of a random
variable are ‘distributed’ among its possible values.

Slide 18/67

Types of Random Variables

Types of Random Variables

There are two types of random variables: discrete and continuous.

Slide 19/67

Types of Random Variables

Discrete Random variables

A discrete random variable can take only a countable (often
finite) number of values. They are usually counts.

For example:
Y = number of individuals in a group who own an iPhone;
Z = number of medical students with CVD (Colour Vision
Deficiency);
V = number of grapevines that are diseased

Slide 20/67

Types of Random Variables

. . . and Continuous

A continuous random variable can take any value in an interval.
They are usually measurements.

For example:
W = weight;
X = age;
T = temperature (°C)

Slide 21/67

Types of Random Variables

Which of the following are discrete?


1. Toss a coin: let X = 0 if tails, 1 if heads.
2. Amount of rainfall in a day.
3. Z = maximum temperature (°C) in Melbourne.
4. W = number of days with rain in a week in Melbourne
5. V = number of slabs of beer sold in a month in Victoria

Slide 22/67

Types of Random Variables

https://pollev.com/paulfijn

Slide 23/67

Distributions of Random variables

Discrete probability distributions

Discrete probability distributions: definition

If X is a discrete random variable, then its distribution is given by the
probability mass function (pmf)

\[ p_X(x) = \Pr(X = x). \]

\(p_X(x)\) may be displayed as a table, a graph or a formula.

Slide 24/67

Distributions of Random variables

Example — table

A fair coin is tossed three times, and we define:

X = the number of heads recorded.


x        0     1     2     3
pX(x)   1/8   3/8   3/8   1/8

\[ p_X(x) = \binom{3}{x}\, 0.5^{x}\, 0.5^{3-x}, \quad x = 0, 1, 2, 3 \]
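
The table and the formula can be checked against each other with a few lines of code. The function name `pmf_heads` is an illustrative choice; exact fractions are used so that the output matches the table.

```python
from fractions import Fraction
from math import comb

def pmf_heads(x, n=3, p=Fraction(1, 2)):
    """Pr(X = x) for the number of heads in n tosses, where Pr(head) = p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Rebuild the table row pX(x) for x = 0, 1, 2, 3.
table = {x: pmf_heads(x) for x in range(4)}
print(table)
print(sum(table.values()))  # the probabilities add up to 1
```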

Slide 25/67

Distributions of Random variables

Example — graph

Slide 26/67

Distributions of Random variables

Properties of pmfs

1. pX (x) ≥ 0 for all values of x;
2. \(\sum_x p_X(x) = 1\), where \(\sum_x\) denotes the sum over the possible
values of x.

Slide 27/67

Distributions of Random variables

Cumulative Distribution Functions

The cumulative distribution function (cdf) of any random
variable (discrete or continuous) X is defined as:

F (x) = Pr (X ≤ x).

To find the probability that a discrete random variable X takes
some value between a and b inclusive, add up all the individual
values of pX (x) for a ≤ x ≤ b. That is,
\[ \Pr(a \le X \le b) = \sum_{x=a}^{b} p_X(x) = p_X(a) + \cdots + p_X(b) \]

For continuous random variables. . . ?

Slide 28/67

Distributions of Random variables

Illustrating. . .

The following is the pmf of a discrete random variable X :

x 1 2 3 4 5
p(x) 2c 3c c 4c 5c

1. What is the value of c?


2. Find Pr(X ≤ 4) and Pr(1 < X < 5).
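
One possible way to check answers to these two questions is sketched below. It relies only on the fact that the probabilities must sum to 1; the dictionary `weights` simply holds the coefficients of c from the table.

```python
from fractions import Fraction

# Coefficients of c in the pmf table: p(1) = 2c, p(2) = 3c, p(3) = c, ...
weights = {1: 2, 2: 3, 3: 1, 4: 4, 5: 5}

# The probabilities sum to 1, so (2 + 3 + 1 + 4 + 5) * c = 1.
c = Fraction(1, sum(weights.values()))
pmf = {x: w * c for x, w in weights.items()}

pr_at_most_4 = sum(p for x, p in pmf.items() if x <= 4)      # Pr(X <= 4)
pr_between = sum(p for x, p in pmf.items() if 1 < x < 5)     # Pr(1 < X < 5)
print(c, pr_at_most_4, pr_between)
```

Note that Pr(1 < X < 5) uses strict inequalities, so only x = 2, 3, 4 contribute.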

Slide 29/67

Distributions of Random variables

Continuous probability distributions

The distribution of a continuous random variable X is given by
the probability density function (pdf), denoted by fX (x).

The probability that a continuous random variable takes a value
between any two values a and b is given by the area under the pdf
curve between a and b.

The total area under the pdf curve is 1.

What about Pr(X = x)?

Slide 30/67

Distributions of Random variables

. . . Pr(X = x)?

Continuous random variables


▶ Arise when dealing with quantities measured on a continuous
scale, such as yield, height, blood pressure, % body fat,
temperature etc.
▶ To be recorded, the values must be rounded at some level.
▶ It is impossible to define the probability distribution by
assigning probabilities to different outcomes — there are too
many of them (an uncountable infinity), and they cannot all
be non-zero.
▶ However, we can ask for the probability that the variable takes
a value in a small interval.

Slide 31/67

Distributions of Random variables

Example: rounding

T : time a student takes to complete a tutorial question.

T is a continuous random variable, but we might measure time
only to the nearest minute.

Then Pr(16.5 < T < 17.5) is taken as the probability that ‘the
student takes 17 minutes’ (rounded to the nearest minute).

Slide 32/67

Distributions of Random variables

Properties of pdfs

The distribution of a continuous random variable X is given by the
probability density function (pdf), denoted by fX (x).
1. fX (x) ≥ 0
2. The total area under the curve is always equal to 1:
\[ \int_{-\infty}^{\infty} f_X(x)\,dx = 1 \]
3. Finding probabilities:
\[ \Pr(a < X < b) = \int_{a}^{b} f_X(x)\,dx. \]
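
The "area under the curve" idea can be illustrated numerically. The density used here, f(x) = 2x on [0, 1], is a hypothetical example chosen for simplicity (it does not come from the slides), and the midpoint rule is just one convenient way of approximating an area.

```python
def pdf(x):
    """Hypothetical density: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area(f, a, b, n=100_000):
    """Midpoint-rule approximation to the area under f between a and b."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

total = area(pdf, 0, 1)      # property 2: the total area should be close to 1
prob = area(pdf, 0.2, 0.5)   # property 3: Pr(0.2 < X < 0.5), exactly 0.25 - 0.04 = 0.21
print(total, prob)
```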

Slide 33/67

Distributions of Random variables

Cumulative distribution functions

The cdf is defined in the same way for both discrete and
continuous random variables.

F (x) = Pr (X ≤ x), (−∞ < x < ∞).

Properties of the CDF:


1. 0 ≤ F (x) ≤ 1
2. F (−∞) = 0
3. F (∞) = 1.

Slide 34/67

Describing probability distributions

The probability distribution of X is specified by the pmf (discrete
r.v.) or pdf (continuous r.v.), but what we want is a simple
summary of the important features of the distribution.

The most important features are the location and spread.

Slide 35/67

Measures of location

Percentiles of a distribution

The pth percentile of the distribution of X is the value of x for
which p% of the probability falls at or below this value.

For example, if the 97.5th percentile of the distribution of X is
1.96, then Pr(X ≤ 1.96) = 0.975.

Continuous distribution: the percentile is the x boundary for a left area of p%.

Discrete distribution: tricky!

Slide 36/67

Measures of location

Some common percentiles


The lower quartile (Q1 ) is the 25th percentile, the upper quartile
(Q3 ) is the 75th percentile and the median M (Q2 ) is the 50th
percentile. Quartiles provide information about both location and
spread.

Slide 37/67

Measures of location

Expectation of a discrete random variable

The expectation of a discrete random variable X is
\[ \mu_X = E(X) = \sum_x x\, p_X(x) \]
where the sum is taken over all possible values of X .


Synonymous terms
▶ mean of X
▶ expectation of X or expected value of X
▶ E (X )
▶ µX or simply µ

Slide 38/67

Measures of location

Expectation of a discrete random variable

Example:
x        0     1     2     3
pX(x)   1/8   3/8   3/8   1/8

\[ E(X) = 0 \times \tfrac{1}{8} + 1 \times \tfrac{3}{8} + 2 \times \tfrac{3}{8} + 3 \times \tfrac{1}{8} = \tfrac{3}{2} = 1.5 \]

Slide 39/67

Measures of location

Expectation of a continuous random variable

The expectation of a continuous random variable X is
\[ \mu_X = E(X) = \int_{-\infty}^{\infty} x\, f_X(x)\,dx. \]

Integration is not covered further in this course.

Interpretation of the mean value

On the graph, it represents the centre of gravity or balance point
of the distribution of X .

The mean can be thought of as the average of the outcomes of
infinitely many repetitions of the experiment.

Slide 40/67

Measures of location

Properties of the mean

The following statements hold true for both discrete and
continuous random variables.
1. E (X ) does not have to be an observable value of X .
2. If the distribution of X is symmetric, then E (X ) is the point
of symmetry on the x-axis.
3. If Y = aX + b, where a and b are known constants, then

E (Y ) = a × E (X ) + b

4. If X and Y are any two random variables, then

E (X + Y ) = E (X ) + E (Y ).

Slide 41/67

Measures of spread

The standard deviation of a random variable

The most important measure of spread is the standard deviation,
or its square, the variance.

Standard deviation gives, very roughly, the average distance of the
values from their mean and has the same units as the original
measurements; variance is easier to work with mathematically.

Slide 42/67

Measures of spread

The standard deviation of a discrete r.v.

The variance of a discrete random variable X is
\[ \mathrm{Var}(X) = \sum_x (x-\mu)^2\, p_X(x) = E\left[(X-\mu)^2\right], \quad \text{where } \mu = E(X). \]

Equivalently (and often better for calculation)
\[ \mathrm{Var}(X) = \sum_x x^2\, p_X(x) - \mu^2. \]

The standard deviation of X is \(\sqrt{\mathrm{Var}(X)}\).

Slide 43/67

Measures of spread

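Standard deviation of a discrete random variable

x        0     1     2     3
pX(x)   1/8   3/8   3/8   1/8

E (X ) = 3/2 (found previously)

\[ \mathrm{Var}(X) = 0 \times \tfrac{1}{8} + 1 \times \tfrac{3}{8} + 4 \times \tfrac{3}{8} + 9 \times \tfrac{1}{8} - \left(\tfrac{3}{2}\right)^2 = \tfrac{24}{8} - \tfrac{9}{4} = \tfrac{3}{4} \]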

Slide 44/67

Measures of spread

Standard deviation of a discrete random variable

For each pmf below, find E (X ) and Var (X ).

1.
x        0     1     2
pX(x)   0.5   0.3   0.2

2. \( p_X(x) = \binom{3}{x} (0.6)^{x} (0.4)^{3-x}, \quad x = 0, 1, 2, 3 \)
3. pX (x) = kx, x = 1, 2, 3, 4, 5.

Slide 45/67

Measures of spread

Standard deviation of a continuous random variable

The variance of a continuous random variable X is
\[ \mathrm{Var}(X) = \int_{-\infty}^{\infty} (x-\mu)^2 f_X(x)\,dx = E\left[(X-\mu)^2\right], \]
where µ = E (X ).

The standard deviation of X is again \(\sqrt{\mathrm{Var}(X)}\).

You won’t be required to use calculus to calculate variances
or standard deviations of continuous random variables.

Slide 46/67

Measures of spread

Properties of variance and standard deviation


The following hold for both discrete and continuous random
variables.
1. Var (X ) ≥ 0 and sd(X ) ≥ 0.
2. Common notation for the variance is Var (X ) or \(\sigma_X^2\) or \(\sigma^2\); and
for the standard deviation is sd(X ) or stdev (X ) or \(\sigma_X\) or \(\sigma\).
3. The standard deviation, and not the variance, is in the same
units as X .
4. For many distributions,

Pr(µ − 2σ < X < µ + 2σ) ≈ 0.95

5. A useful result for finding Var (X ) is that:

Var (X ) = E (X 2 ) − µ2 .
Slide 47/67

Measures of spread

Properties of variance and standard deviation

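6. If Y = aX + b, where a and b are constants, then
\[ \mathrm{Var}(Y) = a^2\, \mathrm{Var}(X), \qquad \mathrm{sd}(Y) = |a|\, \mathrm{sd}(X) \]

7. If X and Y are independent random variables:
\[ \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y). \]
Also
\[ \mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y), \]
and
\[ \mathrm{Var}(aX + bY) = a^2\, \mathrm{Var}(X) + b^2\, \mathrm{Var}(Y). \]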
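
Property 7 can be checked exactly on a small example. The sketch below assumes X and Y are two independent fair six-sided dice (an illustrative choice, not from the slides) and builds the pmf of a combination of X and Y by enumerating all 36 pairs; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def variance(pmf):
    """Var = E(V^2) - (E V)^2 for a pmf given as {value: probability}."""
    mean = sum(v * p for v, p in pmf.items())
    return sum(v**2 * p for v, p in pmf.items()) - mean**2

die = {face: Fraction(1, 6) for face in range(1, 7)}

def pmf_of(g):
    """pmf of g(X, Y) for independent dice X and Y: Pr(x, y) = Pr(x) * Pr(y)."""
    out = {}
    for (x, px), (y, py) in product(die.items(), repeat=2):
        out[g(x, y)] = out.get(g(x, y), 0) + px * py
    return out

var_x = variance(die)
var_sum = variance(pmf_of(lambda x, y: x + y))
var_diff = variance(pmf_of(lambda x, y: x - y))
var_combo = variance(pmf_of(lambda x, y: 2 * x + 3 * y))
print(var_x, var_sum, var_diff, var_combo)
```

The variance of the difference equals the variance of the sum: subtracting an independent variable still adds variability.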

Slide 48/67

Measures of spread

A Particular Result — standardising

8. Standardising
8.1 Let X be a random variable, discrete or continuous, with
expectation µ and standard deviation σ.
From (6) it follows that \(X_S = \frac{X - \mu}{\sigma}\) has expectation 0 and
standard deviation 1.
8.2 Let Y be a random variable, discrete or continuous, with
expectation 0 and standard deviation 1. Again from (6) it
follows that W = σY + µ has expectation µ and standard
deviation σ.

Slide 49/67

Measures of spread

Rescaling — an example

The maximum daily temperature for Melbourne in February has a
mean of 27.5 °C and a standard deviation of 5.2 °C.

What would the mean and standard deviation be for temperature
measured in Fahrenheit (°F), where F = (9/5)C + 32?
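
A short sketch of how the rescaling rules E(aX + b) = aE(X) + b and sd(aX + b) = |a| sd(X) apply here, with a = 9/5 and b = 32. Variable names are illustrative.

```python
mean_c, sd_c = 27.5, 5.2   # mean and sd in degrees Celsius (from the slide)
a, b = 9 / 5, 32           # F = a * C + b

mean_f = a * mean_c + b    # the shift b moves the mean
sd_f = abs(a) * sd_c       # the shift b does not affect the spread

print(round(mean_f, 2), round(sd_f, 2))
```

Only the scale factor a changes the standard deviation; adding 32 moves every value by the same amount and leaves the spread unchanged.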

Slide 50/67

Measures of spread

Example: Time for Medley Relay

Four students from the university swimming team are preparing to
compete in the Inter-varsity 4 × 100 meter medley relay event.
Each swimmer will be swimming 100 meters; each swimmer swims
a different stroke.
Their mean training times to date are tabulated below:

Swimmer   Time (s)   Standard deviation   Variance
1         55.4       0.26
2         60.7       0.24
3         54.0       0.27
4         49.1       0.23

The coach is interested in the time his medley team is likely to
swim ‘on the BIG day’.

Slide 51/67

Measures of spread

Define: T = total time taken to swim the 4 × 100 medley

E (T ) =

V (T ) =

stdev (T ) =

What time should the coach expect?


What assumption have we made? Is it reasonable?
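
A minimal sketch of how the blanks might be filled in. It assumes that the second numeric column of the table is each swimmer's standard deviation, and that the four legs are independent, which is exactly the assumption the last question asks about. Variable names are illustrative.

```python
from math import sqrt

times = [55.4, 60.7, 54.0, 49.1]   # mean time for each leg (s)
sds = [0.26, 0.24, 0.27, 0.23]     # assumed: standard deviation for each leg (s)

variances = [s**2 for s in sds]    # the empty "Variance" column
expected_total = sum(times)        # E(T): means always add
var_total = sum(variances)         # Var(T): variances add only if the legs are independent
sd_total = sqrt(var_total)         # stdev(T)

print(round(expected_total, 1), round(var_total, 4), round(sd_total, 3))
```

Note that it is the variances that are added, not the standard deviations.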

Slide 52/67

Sums of i.i.d.r.v.s and the Central Limit Theorem

Mean and variance for the sum of i.i.d.r.v.s


Let X1 , X2 , . . . , Xn be independent random variables (discrete or
continuous), all having the same distribution with expectation µ
and standard deviation σ, and let
\[ S_n = \sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n \]
Then, from the additivity of expectation (for any variables) and of
variances (for independent variables) it follows that
\[ E(S_n) = \mu + \mu + \cdots + \mu = n\mu \]
\[ \mathrm{Var}(S_n) = \sigma^2 + \sigma^2 + \cdots + \sigma^2 = n\sigma^2 \]
and hence
\[ \mathrm{sd}(S_n) = \sqrt{\mathrm{Var}(S_n)} = \sqrt{n}\,\sigma \]
Slide 53/67

Sums of i.i.d.r.v.s and the Central Limit Theorem

Distribution of the sum of i.i.d.r.v.s


The Central Limit Theorem says that the sum of a large number
of independent random variables with similar (possibly identical)
distributions is asymptotically normally distributed. It is always
the case that for
\[ S_n = \sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n \]
where the Xi are independent, identically distributed random variables
(discrete or continuous), with mean µ and standard deviation σ,
\[ E(S_n) = n\mu, \qquad \mathrm{Var}(S_n) = n\sigma^2 \]
and
\[ \mathrm{sd}(S_n) = \sqrt{n}\,\sigma \]
Slide 54/67

Sums of i.i.d.r.v.s and the Central Limit Theorem

Furthermore, CLT says

The Central Limit Theorem says that if, in addition, n is large then:
\[ S_n \overset{d}{\approx} N(n\mu, \sqrt{n}\,\sigma) \]
where the second argument of N is the standard deviation.
This is an amazing and important result! It is the fundamental
reason for the importance of the normal distribution.

Any variable that can be considered as being composed of the sum
of many small influences will be approximately normally distributed.
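
The result can be explored by simulation. The sketch below is illustrative only: it assumes each Xi is Uniform(0, 1) (so µ = 0.5 and σ² = 1/12), takes n = 30, and checks how often the sum falls within two standard deviations of nµ. For a normal distribution that proportion is about 0.95.

```python
import random
from math import sqrt

def simulate_sums(n, reps, seed=1):
    """Return reps simulated values of S_n, each a sum of n Uniform(0, 1) draws."""
    rng = random.Random(seed)  # fixed seed (arbitrary) for repeatability
    return [sum(rng.random() for _ in range(n)) for _ in range(reps)]

n, reps = 30, 20_000
mu, sigma = 0.5, sqrt(1 / 12)              # mean and sd of a single Uniform(0, 1)
centre, spread = n * mu, sqrt(n) * sigma   # E(S_n) and sd(S_n)

sums = simulate_sums(n, reps)
sample_mean = sum(sums) / reps
within_2sd = sum(abs(s - centre) < 2 * spread for s in sums) / reps

print(sample_mean, within_2sd)
```

The individual draws are flat (uniform), yet their sum behaves very much like a normal variable with mean nµ and standard deviation √n σ.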

Slide 55/67

Sums of i.i.d.r.v.s and the Central Limit Theorem

Corollary — Mean of the means


Let X1 , X2 , . . . , Xn be independent random variables (discrete or
continuous), all having the same distribution with expectation µ
and standard deviation σ, and let
\[ \bar{X} = \frac{1}{n} S_n = \frac{1}{n} \sum_{i=1}^{n} X_i \]

Then
\[ E(\bar{X}) = \mu, \qquad \mathrm{Var}(\bar{X}) = \sigma^2 / n \]
and
\[ \mathrm{sd}(\bar{X}) = \sigma / \sqrt{n} \]
Slide 56/67

Sums of i.i.d.r.v.s and the Central Limit Theorem

Example

What is the expected profit while betting on red in Roulette?


Bet $1, get $2 if you win, lose the $1 if you lose.

\[ X = \text{profit} = \begin{cases} 1, & \text{with probability } 18/37 \\ -1, & \text{with probability } 19/37 \end{cases} \]

Answers (worked solutions will be on the LMS)


▶ E (X ) = −0.027
▶ Var (X ) = 0.9993
▶ sd(X ) = 0.9996 (≈ $1).
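
These answers can be reproduced from the pmf of X with a few lines of code (a sketch, not the worked solution from the LMS). Exact fractions are used and converted to decimals only at the end.

```python
from fractions import Fraction

# pmf of the profit X on a single $1 bet on red.
pmf = {1: Fraction(18, 37), -1: Fraction(19, 37)}

mean = sum(x * p for x, p in pmf.items())              # E(X) = -1/37
var = sum(x**2 * p for x, p in pmf.items()) - mean**2  # E(X^2) - mu^2, where E(X^2) = 1
sd = float(var) ** 0.5

print(round(float(mean), 3), round(float(var), 4), round(sd, 4))
```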

Slide 57/67

Sums of i.i.d.r.v.s and the Central Limit Theorem

Example — 100 games

Suppose we play 100 times for $1 each. Then
▶ E (X1 + X2 + · · · + X100 ) = 100E (X1 ) = −$2.70,
▶ Var (X1 + · · · + X100 ) = Var (X1 ) + · · · + Var (X100 ) = 99.93 ≈ 100
▶ sd(X1 + X2 + · · · + X100 ) ≈ √100 = $10.
Interpretation: In 100 plays, we expect to lose $2.70. The
average deviation from a $2.70 loss is likely to be of the order of $10.

The mean gives the forecast. The standard deviation gives the
accuracy of the forecast.

Slide 58/67

Sums of i.i.d.r.v.s and the Central Limit Theorem

Example: comparing number of plays

When betting on red in Roulette, how do the results for 100 games
of $1 each compare to:
playing 50 games for $2 each?

playing 25 games for $4 each?
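
One way to approach the comparison is sketched below. It assumes the games are independent and that a bet of s dollars has profit sX, where X is the profit on a $1 bet, so that Var(sX) = s² Var(X). The function name `total_profit_summary` is hypothetical.

```python
from math import sqrt

MEAN_1 = -1 / 37            # E(X) for a single $1 bet on red
VAR_1 = 1 - (1 / 37) ** 2   # Var(X) for a single $1 bet on red

def total_profit_summary(n_games, stake):
    """Mean and sd of total profit over n_games independent bets of `stake` dollars."""
    mean = n_games * stake * MEAN_1
    sd = sqrt(n_games * stake**2 * VAR_1)
    return mean, sd

for n_games, stake in [(100, 1), (50, 2), (25, 4)]:
    mean, sd = total_profit_summary(n_games, stake)
    print(n_games, stake, round(mean, 2), round(sd, 2))
```

The total amount staked is $100 in each case, so the expected loss is the same, but fewer and larger bets give a larger standard deviation.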

Slide 59/67

Sums of i.i.d.r.v.s and the Central Limit Theorem

Example: effect of rounding transactions

Define a random variable describing the effect of eliminating one
and two cent coins and rounding transactions to the nearest five
cents. Find its mean, variance and standard deviation.

Slide 60/67

Independence: a key property

Let X and Y be the height and weight, respectively, of a randomly
chosen person from some population. If we found X to be
relatively large then we might expect Y to also be relatively large.

However, if X and Y were the height and weight of different
people, selected at random, then X would tell us nothing about Y
and we say that they are independent. Independence is a very
important concept.

Slide 61/67

Definition of Independence

Two discrete random variables X and Y are independent if, and
only if, for all possible values x of X and y of Y ,

Pr(X = x and Y = y ) = Pr(X = x) × Pr(Y = y )

More generally, two random variables X and Y are independent if,
and only if, for all real values a, b, c and d,

Pr(a ≤ X ≤ b and c ≤ Y ≤ d) = Pr(a ≤ X ≤ b) × Pr(c ≤ Y ≤ d)

Slide 62/67

Hypertension and Cardiac problems

Example: Residents of a retirement village

A retirement village has 1000 residents. Suppose that 200 of them
have hypertension problems and 160 have cardiac problems, as
indicated below. Are the events cardiac problems and hypertension
problems independent?

                        Cardiac (C)   No cardiac (C′)   Total
Hypertension (H)             80            120            200
No Hypertension (H′)         80            720            800
Total                       160            840           1000

Slide 63/67

Hypertension and Cardiac problems

If we focus our attention on the group that has hypertension we
observe that, of these 200 residents, 80 have cardiac problems.
\[ \Pr(C \mid H) = \frac{80}{200} = 0.4 \]
Conditioning on H means restricting the population to just those
(200 residents) who have hypertension.

Using the multiplication rule:
\[ \Pr(C \mid H) = \frac{\Pr(C \text{ and } H)}{\Pr(H)} = \frac{80/1000}{200/1000} = 0.4 \]

Slide 64/67

Hypertension and Cardiac problems. . . dependent?

\[ \Pr(C) = \frac{160}{1000} = 0.16 \ne \Pr(C \mid H) = 0.4 \]

That is, knowing that a resident has hypertension actually
increases the probability that they have cardiac problems.

The events H and C are said to be dependent events.

Furthermore, since Pr (C |H) > Pr (C ), the two events are positively associated.
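
The comparison can be reproduced directly from the counts in the retirement village table. This is a sketch with illustrative names; the dictionary keys simply label the four cells of the table.

```python
from fractions import Fraction

# Cell counts from the two-way table: (hypertension status, cardiac status).
counts = {("H", "C"): 80, ("H", "C'"): 120, ("H'", "C"): 80, ("H'", "C'"): 720}
total = sum(counts.values())

pr_h = Fraction(counts[("H", "C")] + counts[("H", "C'")], total)   # Pr(H)
pr_c = Fraction(counts[("H", "C")] + counts[("H'", "C")], total)   # Pr(C)
pr_c_and_h = Fraction(counts[("H", "C")], total)                   # Pr(C and H)
pr_c_given_h = pr_c_and_h / pr_h                                   # Pr(C | H)

independent = pr_c_and_h == pr_c * pr_h
print(float(pr_c), float(pr_c_given_h), independent)
```

Checking Pr(C and H) = Pr(C) × Pr(H) is equivalent to checking Pr(C | H) = Pr(C); both fail here, so the events are dependent.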

Slide 65/67

If two events are (statistically) independent then knowing that one
event has occurred (or will occur) does not affect the probability of
the other event occurring.

If two events are independent:
\[ \Pr(A \mid B) = \frac{\Pr(A \text{ and } B)}{\Pr(B)} = \frac{\Pr(A) \times \Pr(B)}{\Pr(B)} = \Pr(A) \]

If two events are dependent:
Pr (A|B) > Pr (A) =⇒ events are positively associated.
Pr (A|B) < Pr (A) =⇒ events are negatively associated.

Slide 66/67

Exercises

Exercise: Allergic reactions

Drug A causes an allergic reaction in 3% of adults, drug B in 6%,
while 0.4% are allergic to both. What sort of relationship exists
between allergic reactions to the drugs A and B?

Slide 67/67

Exercises

Exercise: Independence and mutual exclusion

If two events are (statistically) independent, are they also
mutually exclusive?
Example:

Slide 68/67
