
Lecture 4: Random Variables and Distributions

This document provides an overview of random variables and common probability distributions important in genetics and genomics. It discusses random variables, probability distributions, and key concepts like expectation and variance. It then describes several important probability distributions: the binomial distribution, hypergeometric distribution, Poisson distribution, and normal distribution. Examples are provided to illustrate how to work with each distribution.


Lecture 4: Random Variables and Distributions


Goals

• Random Variables
• Overview of discrete and continuous distributions important in genetics/genomics
• Working with distributions in R

Random Variables
A random variable (rv) is any rule (i.e., function) that associates a number with each outcome in the sample space.

[diagram: outcomes in the sample space mapped to the numbers −1, 0, 1]
Two Types of Random Variables

• A discrete random variable has a countable number of possible values
• A continuous random variable takes all values in an interval of numbers
Probability Distributions of RVs
Discrete
Let X be a discrete rv. Then the probability mass function (pmf), f(x), of X is:

f(x) = P(X = x) for x ∈ Ω, and f(x) = 0 for x ∉ Ω

Continuous
Let X be a continuous rv. Then the probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with a ≤ b:

P(a ≤ X ≤ b) = ∫_a^b f(x) dx
Using CDFs to Compute Probabilities
Continuous rv: F(x) = P(X ≤ x) = ∫_−∞^x f(y) dy

[plots: a pdf and its corresponding cdf]

P(a ≤ X ≤ b) = F(b) − F(a)

Expectation of Random Variables
Discrete
Let X be a discrete rv that takes on values in the set D and has a pmf f(x). Then the expected or mean value of X is:

µ_X = E[X] = Σ_{x∈D} x · f(x)

Continuous
The expected or mean value of a continuous rv X with pdf f(x) is:

µ_X = E[X] = ∫_−∞^∞ x · f(x) dx
Variance of Random Variables
Discrete
Let X be a discrete rv with pmf f(x) and expected value µ. The variance of X is:

σ²_X = V[X] = Σ_{x∈D} (x − µ)² · f(x) = E[(X − µ)²]

Continuous
The variance of a continuous rv X with pdf f(x) and mean µ is:

σ²_X = V[X] = ∫_−∞^∞ (x − µ)² · f(x) dx = E[(X − µ)²]
Example of Expectation and Variance
• Let L1, L2, …, Ln be a sequence of n nucleotides and define the rv Xi:

Xi = 1 if Li = A, and Xi = 0 otherwise

• pmf is then: P(Xi = 1) = P(Li = A) = pA
P(Xi = 0) = P(Li = C or G or T) = 1 − pA

• E[Xi] = 1 × pA + 0 × (1 − pA) = pA

• Var[Xi] = E[(Xi − µ)²] = E[Xi²] − µ²
= [1² × pA + 0² × (1 − pA)] − pA²
= pA(1 − pA)
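The calculation above can be checked numerically from the pmf alone; a minimal sketch, assuming an illustrative value pA = 0.3 (not from the lecture):

```python
# Expectation and variance of the indicator rv X_i for "position i is an A",
# computed directly from its pmf. pA = 0.3 is an illustrative value.
pA = 0.3
pmf = {1: pA, 0: 1 - pA}

mean = sum(x * p for x, p in pmf.items())        # E[X] = pA
mean_sq = sum(x**2 * p for x, p in pmf.items())  # E[X^2] = pA for an indicator
var = mean_sq - mean**2                          # pA(1 - pA)

print(mean, var)
```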
The Distributions We’ll Study

1. Binomial Distribution

2. Hypergeometric Distribution

3. Poisson Distribution

4. Normal Distribution
Binomial Distribution
• Experiment consists of n trials
  – e.g., 15 tosses of a coin; 20 patients; 1000 people surveyed
• Trials are identical and each can result in one of the same two outcomes
  – e.g., head or tail in each toss of a coin
  – Generally called "success" and "failure"
  – Probability of success is p, probability of failure is 1 − p
• Trials are independent
• Constant probability for each observation
  – e.g., probability of getting a tail is the same each time we toss the coin
Binomial Distribution
pmf:

P(X = x) = (n choose x) p^x (1 − p)^(n−x)

cdf:

P(X ≤ x) = Σ_{y=0}^{x} (n choose y) p^y (1 − p)^(n−y)

E(X) = np

Var(X) = np(1 − p)
Binomial Distribution: Example 1
• A couple, who are both carriers for a recessive
disease, wish to have 5 children. They want to know
the probability that they will have four healthy kids

P(X = 4) = (5 choose 4) 0.75⁴ × 0.25¹
= 0.395

[bar plot of the pmf p(x) for x = 0, 1, 2, 3, 4, 5]
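The same number can be checked in code; this sketch uses Python's standard library in place of R's dbinom:

```python
import math

# P(X = 4) for X ~ Binomial(n = 5, p = 0.75): four healthy children out of
# five, when each child of two carrier parents is healthy with probability 3/4.
def binom_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

p_four_healthy = binom_pmf(4, 5, 0.75)
print(p_four_healthy)   # ~0.3955, the 0.395 on the slide
```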
Binomial Distribution: Example 2
• Wright-Fisher model: There are i copies of the A allele
in a population of size 2N in generation t. What is the
distribution of the number of A alleles in generation t
+ 1?
p_ij = (2N choose j) (i/2N)^j (1 − i/2N)^(2N−j),   j = 0, 1, …, 2N
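Because the next generation's count is simply a binomial draw, the whole transition row can be tabulated; a sketch with illustrative values N = 10 and i = 5 (not from the lecture):

```python
import math

# Transition probabilities p_ij for one Wright-Fisher generation: with i
# copies of A among 2N alleles, the next count j is Binomial(2N, i/(2N)).
def wf_row(i, N):
    two_n = 2 * N
    p = i / two_n
    return [math.comb(two_n, j) * p**j * (1 - p)**(two_n - j)
            for j in range(two_n + 1)]

row = wf_row(i=5, N=10)                               # illustrative values
expected_j = sum(j * pj for j, pj in enumerate(row))  # mean stays at i = 5
```

The row sums to 1 and its mean equals i, reflecting that drift changes allele frequency only through sampling noise, not in expectation.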
Hypergeometric Distribution

• Population to be sampled consists of N finite individuals, objects, or elements
• Each individual can be characterized as a success or failure; there are m successes in the population
• A sample of size k is drawn and the rv of interest is X = number of successes
Hypergeometric Distribution
• Similar in spirit to Binomial distribution, but from a finite
population without replacement

[urn: 20 white balls out of 100 balls]

If we randomly sample 10 balls, what is the probability that 7 or more are white?
Hypergeometric Distribution
• pmf of a hypergeometric rv:

P(X = i | n, m, k) = (m choose i)(n choose k−i) / (m+n choose k),   for i = 0, 1, 2, 3, …

Where,
k = Number of balls selected
m = Number of balls in urn considered "success"
n = Number of balls in urn considered "failure"
m + n = Total number of balls in urn
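With the pmf in hand, the urn question above (7 or more white among 10 sampled) is a short sum; a sketch using the standard library rather than R's phyper:

```python
import math

# P(X >= 7) when k = 10 balls are drawn without replacement from an urn with
# m = 20 white ("success") and n = 80 non-white ("failure") balls.
def hyper_pmf(i, m, n, k):
    return math.comb(m, i) * math.comb(n, k - i) / math.comb(m + n, k)

p_seven_or_more = sum(hyper_pmf(i, 20, 80, 10) for i in range(7, 11))
print(p_seven_or_more)   # a very small probability
```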
Hypergeometric Distribution
• Extensively used in genomics to test for "enrichment":

[Venn diagram: the overlap between the set of genes of interest and the set of genes with a given annotation is the number of genes of interest with the annotation; the universe is the total number of annotated genes]


Poisson Distribution
• Useful in studying rare events
• Poisson distribution also used in situations where "events" happen at certain points in time
• Poisson distribution approximates the binomial distribution when n is large and p is small
Poisson Distribution
• A rv X follows a Poisson distribution if the pmf of X is:

P(X = i) = e^(−λ) λ^i / i!,   for i = 0, 1, 2, 3, …

• λ is frequently a rate per unit time:
λ = αt = expected number of events in an interval of length t, where α is the rate per unit time

• Safely approximates a binomial experiment when n > 100, p < 0.01, and np = λ < 20

• E(X) = Var(X) = λ
Poisson RV: Example 1

• The number of crossovers, X, between two markers is X ~ Poisson(λ = d)

P(X = i) = e^(−d) d^i / i!

P(X = 0) = e^(−d)

P(X ≥ 1) = 1 − e^(−d)
Poisson RV: Example 2

• Recent work in Drosophila suggests the spontaneous rate of deleterious mutations is ~1.2 per diploid genome. Thus, let's tentatively assume X ~ Poisson(λ = 1.2) for humans. What is the probability that an individual has 12 or more spontaneous deleterious mutations?

P(X ≥ 12) = 1 − Σ_{i=0}^{11} e^(−1.2) 1.2^i / i!
= 6.17 × 10⁻⁹
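The tail sum is easy to verify; a sketch with Python's standard library in place of R's ppois:

```python
import math

# P(X >= 12) for X ~ Poisson(lambda = 1.2), via the complement of the
# first twelve pmf terms.
lam = 1.2
p_le_11 = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(12))
p_ge_12 = 1 - p_le_11
print(p_ge_12)   # ~6.17e-09, matching the slide
```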
Poisson RV: Example 3

• Suppose that a rare disease has an incidence of 1 in 1000 people per year. Assuming that members of the population are affected independently, find the probability of k cases in a population of 10,000 (followed over 1 year) for k = 0, 1, 2.

The expected value (mean) = λ = 0.001 × 10,000 = 10

P(X = 0) = 10⁰ e^(−10) / 0! = 0.0000454

P(X = 1) = 10¹ e^(−10) / 1! = 0.000454

P(X = 2) = 10² e^(−10) / 2! = 0.00227
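These three values can be reproduced directly from the pmf:

```python
import math

# P(X = k) for X ~ Poisson(lambda = 10), k = 0, 1, 2, as on the slide.
lam = 10
probs = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(3)]
print(probs)   # ~[4.54e-05, 4.54e-04, 2.27e-03]
```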
Normal Distribution

• "Most important" probability distribution
• Many rv's are approximately normally distributed
• Even when they aren't, their sums and averages often are (CLT)
Normal Distribution
• pdf of normal distribution:

f(x; µ, σ²) = (1 / (√(2π) σ)) e^(−(x−µ)² / 2σ²)

• standard normal distribution (µ = 0, σ² = 1):

f(z; 0, 1) = (1 / √(2π)) e^(−z² / 2)

• cdf of Z:

P(Z ≤ z) = ∫_−∞^z f(y; 0, 1) dy
Standardizing Normal RV

• If X has a normal distribution with mean µ and standard deviation σ, we can standardize to a standard normal rv:

Z = (X − µ) / σ
I Digress: Sampling Distributions
• Before data is collected, we regard observations as random variables (X1, X2, …, Xn)
• This implies that until data is collected, any function (statistic) of the observations (mean, sd, etc.) is also a random variable
• Thus, any statistic, because it is a random variable, has a probability distribution, referred to as a sampling distribution
• Let's focus on the sampling distribution of the mean, X̄
Behold The Power of the CLT
• Let X1, X2, …, Xn be an iid random sample from a distribution with mean µ and standard deviation σ. If n is sufficiently large:

X̄ ~ N(µ, σ/√n)   (approximately), where σ/√n is the standard deviation of X̄
Example
• If the mean and standard deviation of serum iron values from healthy men are 120 and 15 mgs per 100 ml, respectively, what is the probability that a random sample of 50 normal men will yield a mean between 115 and 125 mgs per 100 ml?

First, calculate the mean and sd of X̄ to standardize (120 and 15/√50 ≈ 2.12)

P(115 ≤ X̄ ≤ 125) = P((115 − 120)/2.12 ≤ Z ≤ (125 − 120)/2.12)
= P(−2.36 ≤ Z ≤ 2.36)
= P(Z ≤ 2.36) − P(Z ≤ −2.36)
= 0.9909 − 0.0091
= 0.9818
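The same standardization can be done in code; the standard normal cdf is written here with math.erf rather than a table:

```python
import math

# P(115 <= Xbar <= 125) for Xbar ~ N(120, 15/sqrt(50)), by standardizing.
def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

se = 15 / math.sqrt(50)            # ~2.12
z = (125 - 120) / se               # ~2.36
p = norm_cdf(z) - norm_cdf(-z)
print(p)   # ~0.982; the slide's 0.9818 uses table values rounded to z = 2.36
```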
R
• Understand how to calculate probabilities from probability distributions
  – Normal: dnorm and pnorm
  – Poisson: dpois and ppois
  – Binomial: dbinom and pbinom
  – Hypergeometric: dhyper and phyper
• Exploring relationships among distributions
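As a reminder of how the d- and p- prefixes relate, a small sketch (in Python, mirroring R's dbinom/pbinom): the p- function is the running sum, or integral, of the d- function:

```python
import math

# pbinom(x, n, p) is the cumulative sum of dbinom(y, n, p) for y = 0..x;
# the same d-/p- relationship holds for dpois/ppois and dhyper/phyper,
# and (with an integral) for dnorm/pnorm.
def dbinom(y, n, p):
    return math.comb(n, y) * p**y * (1 - p)**(n - y)

def pbinom(x, n, p):
    return sum(dbinom(y, n, p) for y in range(x + 1))

print(pbinom(4, 5, 0.75))   # P(X <= 4) = 1 - P(X = 5) = 1 - 0.75**5
```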
