06 Random Variable Math
2024-02-12
A random variable Y is used as a model for a random outcome from some single situation, but often we care about some alteration of that random variable.
For example, X could be the number of three-pointers an NBA player makes in a game, but maybe what we care about is the number of points they get from threes in that game, which would be Y = 3X. This is a function or transformation of the original random variable, Y = f(X).
Or suppose you want to know how long it takes you to drive 100 miles away from Tandon. Let T be the random time in hours a single trip takes.
Now we pick a model for this situation, so let's choose
$$f(t) = \begin{cases} \dfrac{2}{t^3} & t > 1 \\ 0 & \text{else} \end{cases}$$
Using this model we can determine the probability of having to travel certain amounts of time or our “expected” travel
time.
CDF:
$$F_T(t) = \int_1^t \frac{2}{x^3}\,dx = \left[-\frac{1}{x^2}\right]_1^t = 1 - \frac{1}{t^2}$$
$$F_T(t) = 0.9 \;\rightarrow\; 1 - \frac{1}{t^2} = 0.9 \;\rightarrow\; t^2 = 10 \;\rightarrow\; t = \sqrt{10} \approx 3.16 \text{ hrs}$$
Average trip time:
$$E(T) = \int_1^\infty t \cdot \frac{2}{t^3}\,dt = \int_1^\infty \frac{2}{t^2}\,dt = \left[-\frac{2}{t}\right]_1^\infty = 0 - (-2) = 2 \text{ hrs}$$
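A quick numerical check of the CDF, the 90th percentile, and the mean computed above. This is a sketch, not part of the original notes; it assumes numpy and scipy are available.

```python
import numpy as np
from scipy.integrate import quad

pdf = lambda t: 2 / t**3  # f(t) = 2/t^3 for t > 1

# CDF at t = 2: integrate f from 1 to t, compare with the closed form 1 - 1/t^2
print(quad(pdf, 1, 2)[0], 1 - 1 / 2**2)          # both ~0.75

# 90th percentile: solve 1 - 1/t^2 = 0.9  ->  t = sqrt(10)
print(np.sqrt(10))                                # ~3.16 hrs

# Mean: E(T) = integral of t * f(t) over (1, infinity)
print(quad(lambda t: t * pdf(t), 1, np.inf)[0])   # ~2.0 hrs
```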
Finding a Density Function (NOT TESTED)
[Figure: density f(t) of the travel time over 1–5 hrs, with E(T) = 2 and the 90th percentile marked.]
However, what if you are pretty confident in the probability model for your travel time (maybe through repeated trips), but what you actually care about is the average velocity you travel during this trip? (Recall distance = velocity × time.)
$$V = \frac{100}{T}$$
This says that average velocity is a transformation of the time random variable. What can we learn about this?
For example, since E(T) = 2, does that mean E(V) = 100/2 = 50? (The answer is NO.)
Well, since V = 100/T is a random variable, it has a pdf. Let's try to find it.
What would f(v) be? One method we can use is to find the CDF of V first (THIS IS NOT TESTED):
$$F_V(v) = P\left(T \ge \frac{100}{v}\right) = 1 - P\left(T < \frac{100}{v}\right) = 1 - F_T\left(\frac{100}{v}\right) = 1 - \left(1 - \frac{1}{(100/v)^2}\right) = \frac{v^2}{10000}$$
Now using the fact that the pdf is the derivative of the cdf, we get:
$$f(v) = \frac{dF_V}{dv} = \frac{v}{5000}, \quad 0 < v \le 100$$
And finally, let's find the expected average velocity:
$$E(V) = \int_0^{100} \frac{v^2}{5000}\,dv = \left[\frac{v^3}{15000}\right]_0^{100} = \frac{100^3}{15000} = 66.67 \text{ mph}$$
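A minimal simulation sketch (not from the notes; numpy assumed) confirming that E(100/T) is about 66.67 mph rather than 100/E(T) = 50 mph. T is sampled by inverting its CDF.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=1_000_000)
t = 1 / np.sqrt(1 - u)   # inverse CDF: F(t) = 1 - 1/t^2  =>  t = 1/sqrt(1 - u)
v = 100 / t

print(t.mean())   # roughly 2 (E(T)), though noisy: T has a heavy tail with infinite variance
print(v.mean())   # ~66.7 mph (E(V)), not 100/2 = 50
```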
[Figure: density f(v) of the average velocity over 0–100 mph, with E(V) ≈ 66.67 marked.]
The Difficulty
The main takeaway here is that this process is not fun, and you should be happy you don't have to learn it.
Things get worse as the pdfs get worse. There are other techniques for getting these pdfs, but you don't have to worry about those either.
The question now is, “can we learn anything without finding the full density function?”
A Partial Solution
The answer is yes: we can learn the expectation of V using the pdf of T only.
For a transformation W = g(Y),
$$E(W) = E(g(Y)) = \int_{-\infty}^{\infty} g(y) f_Y(y)\,dy \quad \text{or} \quad \sum_y g(y) f_Y(y)$$
Which means: just integrate (or sum) the original pdf multiplied by the function you want, to get ONLY THE AVERAGE of W.
Or more loosely: this rule says that rather than go through all that cdf nonsense, AS LONG AS WE ONLY CARE ABOUT THE AVERAGE we can just work with the pdf we have.
Example
In the travel time example,
$$V = g(T) = \frac{100}{T}$$
And so we can calculate the average velocity easily as:
$$E(V) = E\left(\frac{100}{T}\right) = \int_1^\infty \frac{100}{t} \cdot \frac{2}{t^3}\,dt = \int_1^\infty \frac{200}{t^4}\,dt = \left[-\frac{200}{3t^3}\right]_1^\infty = \frac{200}{3} = 66.67 \text{ mph}$$
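The same integral can be checked symbolically; a sketch assuming sympy is available, not part of the original notes.

```python
import sympy as sp

t = sp.symbols("t", positive=True)
f_T = 2 / t**3                                      # pdf of T on (1, oo)

E_V = sp.integrate((100 / t) * f_T, (t, 1, sp.oo))  # E(g(T)) with g(T) = 100/T
print(E_V)                                          # 200/3, i.e. 66.67 mph
```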
Example
The amount of sadness you will experience this semester is given by the random variable S with pdf
$$f(s) = \begin{cases} \dfrac{8}{s^3} & 2 \le s < \infty \\ 0 & \text{elsewhere} \end{cases}$$
where 1 sadness unit is equivalent to eating 4 slices of pineapple pizza off a subway floor.
1. How much sadness can you expect this semester?
$$E(S) = \int_2^\infty s \cdot \frac{8}{s^3}\,ds = \int_2^\infty \frac{8}{s^2}\,ds = \left[-\frac{8}{s}\right]_2^\infty = 0 - (-4) = 4$$
2. If I'm having a good year and promise to halve the sadness of all who take my class this semester, how much sadness can you expect for a student in my class?
$$E\left(\frac{S}{2}\right) = \int_2^\infty \frac{s}{2} \cdot \frac{8}{s^3}\,ds = \int_2^\infty \frac{4}{s^2}\,ds = \left[-\frac{4}{s}\right]_2^\infty = 2$$
Notice:
$$E\left(\frac{S}{2}\right) = \frac{1}{2}E(S) = \frac{4}{2} = 2$$
Example 2
3. If my year has been crappy and I vow to square the sadness of all who take my class this semester, how much sadness can you expect for a student in my class?
$$E(S^2) = \int_2^\infty s^2 \cdot \frac{8}{s^3}\,ds = \int_2^\infty \frac{8}{s}\,ds = \left[8\ln(s)\right]_2^\infty = \infty$$
Notice:
$$E(S^2) = \infty \ne E(S)^2 = 4^2 = 16$$
There will be rules for these.
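All three sadness expectations can be verified symbolically; a sketch assuming sympy, not from the notes.

```python
import sympy as sp

s = sp.symbols("s", positive=True)
f_S = 8 / s**3                                    # pdf of S on [2, oo)

print(sp.integrate(s * f_S, (s, 2, sp.oo)))       # 4   -> E(S)
print(sp.integrate((s / 2) * f_S, (s, 2, sp.oo))) # 2   -> E(S/2)
print(sp.integrate(s**2 * f_S, (s, 2, sp.oo)))    # oo  -> E(S^2) diverges
```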
Example 3
For any random variable Y with pdf f(y) and mean E(Y) = $\mu_Y$, determine
$$E(Y - \mu_Y)$$
which is the average deviation (the ± distance from the random variable to its mean).
$$E(Y - \mu_Y) = \int_{-\infty}^{\infty} (y - \mu_Y) f(y)\,dy = \int_{-\infty}^{\infty} y f(y)\,dy - \mu_Y \int_{-\infty}^{\infty} f(y)\,dy = E(Y) - \mu_Y(1) = \mu_Y - \mu_Y = 0$$
Note: The mean µ or E(Y) is a constant and so can be treated as any other number.
The Variance
An important use of the previous rule is to determine the variance of a random variable: the variance is the expected squared deviation, $V(Y) = E[(Y - \mu_Y)^2]$, i.e. the rule applied with $g(Y) = (Y - \mu_Y)^2$.
The variance and standard deviation are measures of how compact the probability distribution is: a larger variance means the probability is spread out more widely over the support.
Multiple Random Variables
A transformation of a random variable takes Y and turns it into something else, g(Y); in this class it is used primarily to get the variance formula.
More commonly, we need to take two or more random variables Y1, Y2 and combine them in some way, usually by addition, Y1 + Y2, or subtraction, Y1 − Y2.
This also creates a new problem, which leads to a "sort of" solution that works for this class and many other applications of probability.
Joint Distributions
The probability functions we have been looking at (pmfs or pdfs) tell you "how the random variable moves", or "where the outcomes should be".
Once you have 2 or more random variables, Y1, Y2, ..., Yk, you need a function that tells you how all of them move together.
The Difficulty
Working with such a joint probability function means multivariable calculus: probabilities now come from double (or higher) integrals over the joint distribution.
A Partial Solution
As was hinted at, there is a simplification that makes this problem go away: independence.
Translation: as long as we assume we are dealing with independent random variables, all the multivariate calculus stuff can be broken into single-variable stuff. This is what we will do. For independent random variables the joint distribution factors,
$$f_{Y_1,Y_2}(y_1, y_2) = f_{Y_1}(y_1)\, f_{Y_2}(y_2)$$
where $f_{Y_1,Y_2}(y_1, y_2)$ is the joint distribution from before, and the usual single-variable functions $f_{Y_1}(y_1)$ and $f_{Y_2}(y_2)$ are called Marginal Probability Functions.
The ideal way of dealing with random variables is to have the pdf, pmf, or joint distribution, and then you can figure out anything you want.
So if we have the random variable T = Y1 + Y2 + Y3 + Y4 + Y5 and we know the joint distribution of Y1, ..., Y5, then we can calculate probabilities, means, variances, etc.
However, this joint distribution is terrible to figure out, and you may need Calculus 3 to do anything with it.
In general, joint distributions of combined random variables will be off limits, just like the distributions of functions of random variables were off limits.
However, expectations and variances will be easy to figure out.
1. T = a for a constant.
• E(T ) = E(a) = a
• V (T ) = V (a) = 0
2. T = aY for a constant a and a random variable Y.
• E(T ) = E(aY ) = aE(Y )
• V (T ) = V (aY ) = a2 V (Y )
3. T = X + Y for X and Y independent.
• E(T ) = E(X + Y ) = E(X) + E(Y )
• V (T ) = V (X + Y ) = V (X) + V (Y )
4. T = aX + bY + c for X and Y independent random variables and a, b, c constants (checked in the simulation sketch after this list).
• E(T ) = E(aX + bY + c) = aE(X) + bE(Y ) + c
• V (T ) = V (aX + bY + c) = a2 V (X) + b2 V (Y )
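A simulation sketch of rule 4 (not from the notes; the distributions and constants here are arbitrary choices for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
X = rng.normal(loc=13, scale=3, size=n)   # E(X) = 13, V(X) = 9
Y = rng.exponential(scale=2, size=n)      # E(Y) = 2,  V(Y) = 4
a, b, c = 2, -3, 5

T = a * X + b * Y + c
print(T.mean(), a * 13 + b * 2 + c)       # both ~25: aE(X) + bE(Y) + c
print(T.var(),  a**2 * 9 + b**2 * 4)      # both ~72: a^2 V(X) + b^2 V(Y)
```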
Sums Summed Up
All rolled up into one rule and extended to n random variables, this gives you
$$E(T) = E\left(a_0 + \sum_{i=1}^n a_i Y_i\right) = a_0 + \sum_{i=1}^n a_i E(Y_i)$$
$$V(T) = V\left(a_0 + \sum_{i=1}^n a_i Y_i\right) = \sum_{i=1}^n a_i^2 V(Y_i)$$
$$\sigma^2 = V(Y) = E(Y^2) - \mu^2$$
where
$$E(Y^2) = \begin{cases} \sum_y y^2 f_Y(y) & Y \text{ discrete} \\ \displaystyle\int_{-\infty}^{\infty} y^2 f_Y(y)\,dy & Y \text{ continuous} \end{cases}$$
Examples
Example 1
RV Expectation Variance
W 13 9
X -4 25
Y 2 144
Z 0 49
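The questions that went with this table did not survive the PDF extraction. As an illustration of the rules above (a made-up combination, assuming the variables are independent):
$$E(2W - X + 3) = 2(13) - (-4) + 3 = 33$$
$$V(2W - X + 3) = 2^2(9) + (-1)^2(25) = 61, \qquad \sigma = \sqrt{61} \approx 7.81$$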
Example 2
$$E(W) = \int_0^1 w(2 - 3w^2)\,dw = \left[w^2 - 0.75w^4\right]_0^1 = 0.25$$
$$E(Y) = \int_{-10}^{10} y(0.05)\,dy = \left[0.025y^2\right]_{-10}^{10} = 0$$
$$E(W^2) = \int_0^1 w^2(2 - 3w^2)\,dw = \left[\frac{2}{3}w^3 - \frac{3}{5}w^5\right]_0^1 = \frac{1}{15}$$
$$E(Y^2) = \int_{-10}^{10} y^2(0.05)\,dy = \left[\frac{0.05}{3}y^3\right]_{-10}^{10} = \frac{50}{3} + \frac{50}{3} = \frac{100}{3} = 33.3$$
$$E(R^2) = 0^2(0.2) + 2^2(0.2) + 10^2(0.2) + 20^2(0.2) + 100^2(0.2) = 2100.8$$
$$V(W) = \frac{1}{15} - (0.25)^2 = 0.004$$
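The extract stops at V(W); for completeness (a step not in the original), the same shortcut applied to Y gives
$$V(Y) = E(Y^2) - E(Y)^2 = \frac{100}{3} - 0^2 = 33.3$$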
Warnings (Important)
1. When dealing with joint distributions, an important property is the Covariance (or similarly the Correlation)
of the random variables. This is basically a measure that tells you how “in sync” the two random variables
are, or how much information about one of the random variables is contained in the other random variable.
Essentially all you need to know about this is that you can ignore it for almost all of the class.
2. DON'T DO THIS A big mistake people make is to incorrectly combine the rules, such as
$$\text{WRONG (BUT WORKS)} \rightarrow E(Y_1 + Y_2 + Y_3 + Y_4 + Y_5) = E(5Y) = 5E(Y)$$
$$\text{WRONG (DON'T DO)} \rightarrow V(Y_1 + Y_2 + Y_3 + Y_4 + Y_5) = V(5Y) = 25V(Y)$$
Confusion seems to happen here because this does actually work for expectation, mathematically, but it wildly inflates the variance. The reason is that doing something 5 times and adding the results up is not the same as doing something once and multiplying the answer by 5 (see the simulation sketch after this list).
3. Notice that there are NO RULES FOR STANDARD DEVIATION, so you always have to work with variances first before you get the standard deviation.
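A minimal simulation sketch of warning 2 (not from the notes; the normal distribution here is an arbitrary choice): summing five independent copies of Y gives variance 5·V(Y), while 5·Y gives 25·V(Y).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
Y = rng.normal(loc=10, scale=2, size=(n, 5))  # five independent copies, V(Y) = 4

sum5 = Y.sum(axis=1)     # Y1 + Y2 + Y3 + Y4 + Y5
times5 = 5 * Y[:, 0]     # 5 * Y1

print(sum5.mean(), times5.mean())  # both ~50: expectation cannot tell them apart
print(sum5.var())                  # ~20  = 5  * V(Y)
print(times5.var())                # ~100 = 25 * V(Y)
```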
Main Points
• $$E(T) = E\left(a_0 + \sum_{i=1}^n a_i Y_i\right) = a_0 + \sum_{i=1}^n a_i E(Y_i)$$
• $$V(T) = V\left(a_0 + \sum_{i=1}^n a_i Y_i\right) = \sum_{i=1}^n a_i^2 V(Y_i)$$
• $$V(Y) = E(Y^2) - \mu^2$$
• You have to determine the variance of sums before you can get the standard deviation.