Matlab Assignment

1) The document reviews concepts of probability and random variables, including discrete and continuous random variables. 2) When modeling experiments with outcomes that can be measured to high accuracy, continuous random variables may be better than discrete as they do not require changing the model with increased accuracy of measurements. 3) Common continuous distributions like the uniform distribution are used when all outcomes are equally likely, while the exponential distribution applies when events occur continuously at a constant average rate.

Uploaded by Taral jain

Probability and Statistics - Section B

A Review of Some Concepts, and Matlab Assignment

March 19, 2020


Review

We started off by looking at a probability model to describe a given experiment in terms of the following:

A sample space S, consisting of all possible outcomes
A collection of admissible subsets of S, which we call events
A probability measure P which assigns a probability to each event of interest

Given such a model, we next looked at random variables, which we defined as “measurable” functions from S → R.

Can all functions X : S → R be random variables? Please think about this for a few seconds before moving to the next slide.
The Answer is No
A function X : S → R can only be a random variable if, for every real number x, the set X⁻¹((−∞, x]) is an event with a well-defined probability assigned to it. This is what it means to be “measurable”.

But we will not go into details, because it is a deep subject which you will learn about in more advanced courses. It is possible to construct functions which are not random variables, but this is too complicated to do right now. If you’re interested in this question, you could try reading the book Probability through Problems, by Capinski and Zastawniak.
A random variable X : S → R assigns a number to each outcome. The set of all these numbers is called SX. It is the range of the function X. This gives us a new probability model for the same experiment.

New Probability Model


We can now think of SX as the new sample space and subsets of SX as events. A subset B ⊂ SX is an event in the new model whenever X⁻¹(B) is an event in the old model. The probability measure in the new model is called PX. The probability measure in the old model is P.

If B ⊂ SX then
$$P_X(B) = P\left(X^{-1}(B)\right).$$
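A short Python sketch of how PX is built from P. The old model here is a fair die, and X (1 for an odd roll, 0 for an even roll) is a hypothetical random variable chosen only for illustration. PX(B) is computed literally as P(X⁻¹(B)): first find the preimage, then add up the old probabilities.

```python
from fractions import Fraction

# Old model: a fair six-sided die, S = {1, ..., 6}, each outcome with probability 1/6.
S = [1, 2, 3, 4, 5, 6]
P = {s: Fraction(1, 6) for s in S}

# Hypothetical random variable X: 1 if the roll is odd, 0 if it is even.
def X(s):
    return s % 2

def preimage(B):
    """Return X^{-1}(B) = {s in S : X(s) in B}."""
    return {s for s in S if X(s) in B}

def P_X(B):
    """New model: P_X(B) = P(X^{-1}(B))."""
    return sum((P[s] for s in preimage(B)), Fraction(0))

print(P_X({1}))     # 1/2
print(P_X({0, 1}))  # 1
```

Once P_X is available, the die itself is no longer needed to answer questions about X, which is the point of the next slide.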
Which Model is Better? New model or Old model?
If we know PX then we can answer most (if not all) questions that interest us. So unless the question is very, very complicated, we can focus on the new model and put the old one on the shelf.

PX is a Set Function
PX acts on sets of numbers. It gives us the information that we
are looking for if we can formulate the question in the language of
sets. We think of a point as a singleton set - a set consisting of
exactly one element.

What kinds of sets of numbers do we know? Finite sets? Countable sets? Uncountable sets? Do you think it’s useful to think of uncountable sets in terms of intervals? Or in terms of geometric curves, shapes, regions or solids?
Discrete RVs
When SX is a countable or finite set, we call PX the Probability
Mass Function of X . In this situation points in SX can be thought
of as point masses.

Non-discrete RVs
When SX is not countable it’s no longer useful to think of the
points in SX as being particulate. In this case two situations arise:
Continuous RVs: No point of SX has any mass; in other words,
$$P_X(x) = 0 \quad \text{for all } x \in S_X.$$
Mixed RVs: Some points in SX have mass and the remaining
points don’t have any mass. This situation will be discussed
later.
Discrete Random Variables

X is called a discrete RV when SX is finite or countable.


PX on Singletons
We know PX completely once we know how PX acts on single-element sets.
This is because every subset of SX is a countable disjoint union of singleton sets, and the probabilities of countably many disjoint events add up. So if B ⊂ SX, then all we have to do to find PX (B) is add up the point masses of all the elements inside B.
The PMF
Since we’re only interested in the action of PX on singletons, we emphasize this by identifying singleton sets with points. So we write
$$P_X(x) = P_X(\{x\}) \quad \text{for all } x \in S_X.$$
When B is a finite subset of SX, summing the PMF over the elements of B is straightforward. When B is countably infinite, the sum
$$P_X(B) = \sum_{x \in B} P_X(x)$$
is an infinite series. Its terms are non-negative and add up to at most 1, so the series converges, and we can evaluate it because we know how to add the terms of a convergent series.
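As a Python sketch of summing a PMF over a countably infinite set, take a geometric RV on {1, 2, 3, ...} with PMF (1 − p)^(k−1) p (chosen only as an illustration, with an assumed p) and let B be the set of even numbers. The partial sums of the series approach the closed-form value (1 − p)/(2 − p), obtained from the geometric series formula.

```python
# Illustrative example: X ~ Geometric(p) on S_X = {1, 2, 3, ...},
# with PMF P_X(k) = (1 - p)^(k - 1) * p.
def geometric_pmf(k, p):
    return (1 - p) ** (k - 1) * p

def prob_even(p, n_terms=200):
    """Partial sum of the PMF over the first n_terms elements of B = {2, 4, 6, ...}."""
    return sum(geometric_pmf(2 * j, p) for j in range(1, n_terms + 1))

def prob_even_exact(p):
    # Geometric series: sum_{j>=1} (1-p)^(2j-1) p = p(1-p) / (1 - (1-p)^2) = (1 - p) / (2 - p)
    return (1 - p) / (2 - p)

p = 0.5
print(prob_even(p), prob_even_exact(p))  # both approximately 1/3
```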

Examples of such series may be reviewed. Six typical families of RVs have been discussed in class, out of which three were described using infinite sequences and series.
Non-discrete Random Variables
When SX is uncountable, adding values of PX at points no longer makes sense, because uncountably many numbers cannot be added using a series. In this situation we look at intervals instead.
PX on Intervals
We know PX completely once we know how PX acts on intervals.
This is because every event of interest (also known as a “measurable” set) can be obtained from intervals. How we can actually do this is outside the scope of this class. Interested students may refer to the Capinski-Zastawniak text mentioned earlier.
The action of PX on intervals is described by the Cumulative
Distribution Function.
The CDF

$$F_X(x) = P_X\left((-\infty, x]\right).$$


We also express this relationship as FX (x) = P(X ≤ x).

Since the CDF tells us how PX acts on intervals, once we know the CDF we know everything we need in order to calculate probabilities of events of interest.
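For example, the probability of any interval (a, b] follows directly from the CDF as F_X(b) − F_X(a). The Python sketch below assumes, only for illustration, an exponential RV with rate 0.5, whose CDF is 1 − e^(−0.5x) for x ≥ 0; the helper prob_interval works for any CDF.

```python
import math

def exp_cdf(x, rate):
    """CDF of an exponential RV with the given rate (assumed example)."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def prob_interval(cdf, a, b):
    """P(a < X <= b) = F_X(b) - F_X(a), valid for any CDF."""
    return cdf(b) - cdf(a)

F = lambda x: exp_cdf(x, rate=0.5)
print(prob_interval(F, 1.0, 3.0))   # e^{-0.5} - e^{-1.5}, about 0.383
print(1.0 - F(2.0))                 # P(X > 2) = e^{-1}
```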

When the CDF is continuous we say that X is a continuous random variable.
Continuous Random Variables

A continuous random variable X is said to be absolutely continuous if there exists a non-negative function fX such that
$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt \quad \text{for all } x.$$
All continuous random variables that interest us can safely be assumed to be absolutely continuous.
The PDF of a Continuous RV
The function fX is called the probability density function of X .

PDF Defined up to Measure Zero


The pdf is not unique. Two functions are considered to be the
same pdf if they differ on a set of measure zero.
Geometric Interpretation of an Integral

Intuitively, a set of measure zero is a set which has zero length - for
example a finite or countable set of points. All conditions imposed
on a pdf need only be satisfied outside of a set of measure zero.
The idea is that a set of measure zero does not contribute to
the integral. A point has zero length, so the slice above the point
has zero area.
Paradigm Shift
The definite integral of a function is the area under its curve.
In high school we are used to thinking of integrals as antiderivatives. However, thinking of definite integrals as areas not only helps our intuition, it also lets us approximate an integral when no antiderivative is available.
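The standard normal density is a typical case: it has no elementary antiderivative, yet the area under it is easy to approximate. A minimal Python sketch using the trapezoidal rule (one simple choice of method among many) is below; the reference value uses the error function from Python's math module.

```python
import math

def normal_pdf(x):
    """Standard normal density; it has no elementary antiderivative."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def trapezoid_area(f, a, b, n=1000):
    """Approximate the area under f on [a, b] by summing n trapezoids."""
    h = (b - a) / n
    total = (f(a) + f(b)) / 2 + sum(f(a + i * h) for i in range(1, n))
    return total * h

approx = trapezoid_area(normal_pdf, -1.0, 1.0)
reference = math.erf(1 / math.sqrt(2))   # P(-1 <= Z <= 1) via the error function
print(approx, reference)                 # both about 0.6827
```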
The relationship between the CDF and the PDF of a continuous random variable X is given by
$$f_X(x) = \frac{dF_X}{dx}(x).$$
As the pdf is only defined up to a set of measure zero, FX is only required to be differentiable at all points outside a set of measure zero. This condition is usually met quite easily; the reasons behind it will be explained in more advanced courses.
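To see fX = dFX/dx numerically, a central difference of the CDF can be compared with the pdf. The sketch assumes an exponential RV with a hypothetical rate of 2, whose CDF and pdf are known in closed form.

```python
import math

rate = 2.0  # assumed rate, for illustration only

def F(x):
    """CDF of Exponential(rate)."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def f(x):
    """PDF of Exponential(rate)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def numerical_derivative(g, x, h=1e-5):
    """Central difference (g(x+h) - g(x-h)) / (2h) approximates g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

for x in [0.25, 1.0, 2.0]:
    print(x, numerical_derivative(F, x), f(x))
```

At x = 0 this CDF has a corner (slope 0 on the left, slope 2 on the right), so it is not differentiable there. A single point has measure zero, so the value chosen for the pdf at 0 does not matter.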
Caution
Geometrically, the PDF is the slope of the CDF. The slope measures the rate at which probability accumulates (probability per unit length), and should not be confused with the actual probability at any given point! For a continuous RV, the probability of any single point is 0.
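A quick way to see that a density is not a probability: a uniform RV on [0, 0.5] (an illustrative choice) has density 2 on that interval, which is larger than any probability can be. Probabilities still come from the CDF, as this Python sketch shows.

```python
def uniform_pdf(x, a, b):
    """Density of Uniform[a, b]: constant 1/(b - a) on the interval, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """CDF of Uniform[a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

a, b = 0.0, 0.5
print(uniform_pdf(0.25, a, b))                           # 2.0: a density, not a probability
print(uniform_cdf(0.2, a, b) - uniform_cdf(0.1, a, b))  # P(0.1 < X <= 0.2), about 0.2
```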
Practical Applications
We’ve already discussed examples of discrete random variables at length, and as far as I know, this concept is clear to most of you. I did, however, get the impression that the same is not true for continuous RVs. What follows is an attempt to answer the following question:
Question
What practical situations are best modelled using the families of
continuous random variables that we’ve seen?

The First Decision


How do we decide whether the set of outcomes of our experiment
is better approximated by a model which has countable outcomes
or uncountable outcomes? Which is better, a discrete RV or a
continuous RV, for a given situation?
The Choice: Discrete or Continuous?
I thought the following might be important.
Factors to Consider
Accuracy
Adaptability to Changing Conditions.

Suppose my experiment consists of measuring the length of a randomly chosen object. I can model this as a discrete random
variable if I am using an instrument which can only measure length
in centimetres. But if tomorrow somebody presents me with an
instrument which can measure length in nanometres, I would have
to change my model drastically. My PMF would have to be
completely redefined.
However if I use a continuous random variable from the beginning,
I will not have to change my model to incorporate the increased
accuracy. Can you think of other factors?
Next Choice: Which Family of Continuous RVs?

Uniform Distribution
This is typically used to model a situation in which all outcomes
are equally likely. Examples are:
A meteor strikes the earth. The location where it lands is
observed.
An office worker glances at his watch. He observes the minute
hand.
Questions: In the second example, if the worker observes the hour
hand, would all outcomes still be (approximately) equally likely?
Should separate models be used for digital and analog watches?
Question: Consider an experiment in which an arrow is shot at a
target and the position where it lands is observed. Under what
conditions would we model this using a uniform distribution?
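For the minute-hand example, one possible model (an assumption, ignoring how a real watch ticks) is that the minute-hand position M is uniform on [0, 60). Then the probability of any stretch of the dial is its length divided by 60. The Python sketch compares this with a seeded simulation.

```python
import random

def minute_hand_prob(a, b):
    """Assumed model: M ~ Uniform[0, 60), so P(a <= M <= b) = (b - a) / 60."""
    return (b - a) / 60.0

def simulate(a, b, trials=100_000, seed=0):
    """Estimate P(a <= M <= b) by drawing uniform minute-hand positions."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if a <= rng.uniform(0, 60) <= b)
    return hits / trials

print(minute_hand_prob(10, 25))   # 0.25
print(simulate(10, 25))           # close to 0.25
```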
A Thought Experiment
Consider the following thought experiment. We measure the amount of time that elapses from 1859, when the Riemann hypothesis was first conjectured, until it is proved. Is the probability that the problem remains open for a total of more than 200 years less than, greater than, or equal to the probability that, given it is still open today, it remains open for another 200 years? Mathematically, we can express this question in the following manner.
Let X be the amount of time elapsed until the Riemann hypothesis
is proved, starting from 1859. The current year is 2020. So
approximately 161 years have elapsed so far. Is

$$P(X > 361 \mid X > 161) = P(X > 200)?$$

To tell you the truth, I have no idea what the answer to this
question is. But if the above probabilities were equal then we
would say that X is memoryless.
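Since {X > 361} is contained in {X > 161}, the left-hand side equals P(X > 361)/P(X > 161). For an exponential RV with rate λ, P(X > t) = e^(−λt), so this ratio is e^(−200λ) = P(X > 200) exactly. The Python sketch checks this numerically and contrasts it with a uniform model, which is not memoryless. The rate 1/150 and the bound 500 are hypothetical values; nothing here says how long the Riemann hypothesis will actually remain open.

```python
import math

def exp_survival(t, rate):
    """P(X > t) = e^{-rate * t} for X ~ Exponential(rate), t >= 0."""
    return math.exp(-rate * t)

def uniform_survival(t, upper):
    """P(X > t) for X ~ Uniform[0, upper]."""
    return min(1.0, max(0.0, (upper - t) / upper))

def conditional_survival(survival, s, t):
    """P(X > s + t | X > s) = P(X > s + t) / P(X > s), since {X > s + t} lies inside {X > s}."""
    return survival(s + t) / survival(s)

rate = 1 / 150     # hypothetical rate per year, for illustration only
exp_lhs = conditional_survival(lambda u: exp_survival(u, rate), 161, 200)
exp_rhs = exp_survival(200, rate)
print(exp_lhs, exp_rhs)   # equal up to rounding: memoryless

upper = 500.0      # hypothetical upper bound for a uniform model
uni_lhs = conditional_survival(lambda u: uniform_survival(u, upper), 161, 200)
uni_rhs = uniform_survival(200, upper)
print(uni_lhs, uni_rhs)   # about 0.41 versus 0.6: not memoryless
```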
Exponential Distribution

Whether an observed quantity is actually memoryless may be a non-trivial question to answer. However, it is an assumption which is often very useful in modelling real-life situations. Such random variables are described using the exponential distribution. The example of the length of a phone call was mentioned in class. Questions that may help:
Do I lose anything by assuming that the observed quantity is
memoryless?
If I were to discretize the observation, that is, if I round it down to the greatest integer not exceeding it, then can successive integer values be thought of as the number of independent trials of an experiment as modelled by a Geometric random variable?
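One way to explore the last question: if X is exponential with rate λ, then P(⌊X⌋ = k) = F(k + 1) − F(k) = e^(−λk)(1 − e^(−λ)), which is exactly the PMF of a geometric RV with success probability p = 1 − e^(−λ), counting the trials up to the first success (⌊X⌋ + 1 = k + 1 trials). The Python sketch compares the two formulas and a seeded simulation; the rate 0.7 is an assumed value.

```python
import math
import random

rate = 0.7                      # assumed rate, for illustration only
p = 1 - math.exp(-rate)         # success probability of the matching geometric RV

def floor_exp_pmf(k):
    """P(floor(X) = k) = F(k + 1) - F(k) for X ~ Exponential(rate)."""
    return math.exp(-rate * k) - math.exp(-rate * (k + 1))

def geometric_pmf(k):
    """P(N = k + 1) for N ~ Geometric(p) counting trials up to the first success."""
    return (1 - p) ** k * p

rng = random.Random(1)
samples = [math.floor(rng.expovariate(rate)) for _ in range(100_000)]
for k in range(4):
    empirical = samples.count(k) / len(samples)
    print(k, floor_exp_pmf(k), geometric_pmf(k), empirical)
```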
Normal Distribution

There are experiments in which it is completely obvious that neither the uniform distribution nor the exponential distribution is of help to us. For instance, let the lifespan of a randomly chosen micro-organism be measured.
If the species of the organism were fixed, then we could perhaps consider the exponential distribution. But if we consider all possible species in our experiment, then the exponential distribution will not be appropriate.
This is because every additional species adds randomness to the
data. The increase in uncertainty leads us to choose the normal
distribution. We will come back to this question when we study
the Central Limit Theorem.
The Rest of the Iceberg

There are a lot more distributions than the three just mentioned.
Students are encouraged to find out for themselves how to choose
between them, as further investigation is presently beyond our
scope.
Most of the practice problems in the text are based on pre-defined
models. It’s a great idea to solve all of them, to be well-acquainted
with the formulas and techniques and how to use them. A few
examples of how to do this have been presented in class. At the
same time it is important not to forget the bigger picture - the
question of how these models were arrived at.
Matlab Assignment - 10% of your grade, due 15th April

Experiment 1
Observe the time gaps between your next 30 WhatsApp
messages. (You may replace WhatsApp with any instant
messaging app of your choice.)
Plot a histogram of your data using Matlab.
Fit a density function onto your histogram, using the appropriate Matlab tools.
Based on your distribution, find the probability that the time
elapsed until your next message is less than the expected time
gap.
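The assignment asks for Matlab; purely to illustrate the steps, here is a Python sketch under one assumption the assignment does not make: that the exponential family is the one fitted. Its maximum-likelihood rate is 1/(sample mean). The sketch bins the gaps, compares observed bin counts with counts predicted by the fitted density, and computes P(next gap < expected gap). The 30 gaps are simulated placeholders, not real measurements. Under an exponential fit this last probability is always 1 − e^(−1) ≈ 0.632, whatever the rate.

```python
import math
import random

def fit_exponential(gaps):
    """Maximum-likelihood fit of an exponential density: rate = 1 / sample mean."""
    return len(gaps) / sum(gaps)

def exponential_cdf(x, rate):
    return 1 - math.exp(-rate * x) if x >= 0 else 0.0

def histogram(data, n_bins):
    """Return bin edges and counts for equal-width bins covering [0, max(data)]."""
    width = max(data) / n_bins
    edges = [i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for x in data:
        i = min(int(x / width), n_bins - 1)   # put the maximum into the last bin
        counts[i] += 1
    return edges, counts

# Placeholder data: 30 simulated gaps (minutes). Replace with your own recorded gaps.
rng = random.Random(42)
gaps = [rng.expovariate(1 / 5) for _ in range(30)]

rate = fit_exponential(gaps)
edges, counts = histogram(gaps, n_bins=6)
for lo, hi, c in zip(edges, edges[1:], counts):
    fitted = len(gaps) * (exponential_cdf(hi, rate) - exponential_cdf(lo, rate))
    print(f"[{lo:6.2f}, {hi:6.2f})  observed {c:2d}  fitted {fitted:5.2f}")

mean_gap = 1 / rate
p_less = exponential_cdf(mean_gap, rate)   # 1 - e^{-1}, about 0.632
print("fitted rate:", rate, "P(next gap < expected gap):", p_less)
```

In Matlab, the `histogram` function and `fitdist(gaps, 'Exponential')` from the Statistics and Machine Learning Toolbox are natural places to start; other families can be tried in the same way.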
Assignment - continued.

Experiment 2
Repeat Experiment 1, but this time, record the time gaps between
messages from one person - a person you communicate with
sufficiently often.

Do the following for both experiments:

Observe the time gaps between the next 10 messages you receive. If a time gap is less than the expected time gap, mark the observation as Heads, otherwise as Tails.
Record the number of Heads and Tails.
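The Heads/Tails rule can be sketched in a few lines of Python. The ten gaps and the expected gap of 5 minutes below are made-up numbers for illustration only. If the exponential model fits and the gaps are independent (both assumptions), each gap is Heads with probability 1 − e^(−1), so the number of Heads in 10 gaps is Binomial(10, 1 − e^(−1)), with mean about 6.32.

```python
import math

def heads_tails(gaps, expected_gap):
    """Heads if a gap is shorter than the expected gap, Tails otherwise."""
    heads = sum(1 for g in gaps if g < expected_gap)
    return heads, len(gaps) - heads

def expected_heads(n, p_heads=1 - math.exp(-1)):
    """Under the assumed exponential model, each gap is Heads with probability 1 - e^{-1}."""
    return n * p_heads

# Made-up gaps (minutes) and a made-up expected gap, for illustration only.
gaps = [0.5, 7.2, 1.1, 3.0, 12.4, 0.2, 4.9, 2.6, 9.8, 1.7]
print(heads_tails(gaps, expected_gap=5.0))   # (7, 3)
print(expected_heads(len(gaps)))             # about 6.32
```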
Upload your code as a text file into Google Classroom. Make sure
the name of your text file includes your name and roll
number, otherwise your assignment will not be graded. Type
up the rest of your answer separately and upload that as well.
