Simulating Continuous and Non-Continuous Distributions
1.1 Part 1: Simulation in SciPy
The stats module of SciPy is familiar to you by now. For any of the well-known distributions, you
can use stats to simulate values of a random variable with that distribution. The general call is
stats.distribution_name.rvs(size=n), where rvs stands for "random variates" and n is the number
of independent replications you want.
Every statistical system has conventions for how to specify the parameters of a distribution. In this lab
we will tell you the specifications for a few distributions. Later you will be able to see a general pattern in
the specifications.
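As an illustration of the general call, here is a minimal sketch for three distributions used later in the lab. The sample size n = 5 and the random_state seed are arbitrary choices for the example, not part of the lab; the parameter conventions shown (loc and scale) are SciPy's.

```python
from scipy import stats

n = 5  # number of independent replications (arbitrary for this sketch)

# Uniform on (0, 1): SciPy's defaults are loc=0 and scale=1.
uniform_draws = stats.uniform.rvs(size=n, random_state=0)

# Exponential with rate 0.5: SciPy is parameterized by scale = 1 / rate.
expon_draws = stats.expon.rvs(scale=2, size=n, random_state=0)

# Normal with mean 10 and SD 5: loc is the mean, scale is the SD.
normal_draws = stats.norm.rvs(loc=10, scale=5, size=n, random_state=0)

print(uniform_draws)
print(expon_draws)
print(normal_draws)
```

Note that the exponential distribution is specified by its scale (the mean), not its rate, which is an easy place to slip.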
(iii) Histograms represent percents by area, so the answers to (i) and (ii) imply that the percent of simulated
values in each bin is approximately 4%.
(iv) Let the random variable U have the Uniform(0, 1) distribution, and let B be any bin of the histogram.
The answer to (i) implies that P(U ∈ B) = 4%.
(v) If instead of bins=25 we had used bins=20 as the option to hist, then the answer to (iv) would have
been 5%.
(i) 1/25; (ii) 100 percent, unit; (iii) area, 4; (iv) 4; (v) 5
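The answers above can be checked empirically. The sketch below is not part of the lab; it assumes a fresh sample of 100,000 uniform values (with an arbitrary seed) and uses np.histogram in place of the plotting call to recover the bin width, the percent in each bin, and the bar heights in percent per unit.

```python
import numpy as np
from scipy import stats

# Assumed sample: 100,000 simulated Uniform(0, 1) values.
u = stats.uniform.rvs(size=100_000, random_state=1)

# Count the values falling in each of 25 equal bins on [0, 1].
counts, edges = np.histogram(u, bins=25, range=(0, 1))

bin_width = edges[1] - edges[0]          # width of each bin: 1/25
percents = 100 * counts / counts.sum()   # percent of values in each bin
heights = percents / bin_width           # bar height in percent per unit

print(bin_width)
print(percents.round(2))
print(heights.round(1))
```

Each entry of percents should be close to 4 and each height close to 100, since area = height × width.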
Compare the numbers on the horizontal axes of the two histograms, and fill in the blank.
The value −1 on the horizontal axis of Histogram 1 is the same as the value 5 on the horizontal axis of
Histogram 2 expressed in standard units.
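To make the conversion concrete, here is a small sketch of the standard-units calculation. The mean and SD of Histogram 2 are not stated in this text; the values 10 and 5 below are assumptions, chosen because they send the value 5 to −1. The helper name standard_units is also hypothetical.

```python
def standard_units(x, mean, sd):
    """Number of SDs by which x lies above (positive) or below (negative) the mean."""
    return (x - mean) / sd

# Assumed parameters for Histogram 2 (not given in the text).
mu, sigma = 10, 5

z = standard_units(5, mu, sigma)
print(z)  # (5 - 10) / 5 = -1.0
```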
In [8]: np.average(sim_expon)
Out[8]: 2.0095740428535898
2.1 Part 2. The Idea
How are all these random numbers generated? In the rest of the lab we will develop the method that
underlies all the simulations above, by considering examples of increasing complexity.
Our starting point is a distribution on just four values.
Suppose X has the distribution dist_X below.
i. 0.3
ii. 0.1
iii. 0.2
iv. 0.4
There should be a function that takes in an object representing the distribution of U and randomly generates a
number based on the probabilities associated with the various values of U.
3.1 Part 3. Visualizing the Idea
The method plot_discrete_cdf takes a distribution as its argument and plots a graph of the cdf.
Run the cell below to get a graph of the cdf of the random variable X in Part 1.
By definition, F_X(2.57) = P(X ≤ 2.57) is the sum of the probabilities of the values of X at or below 2.57.
The points with a jump are the values that X can take on, namely -2, 1, 4, and 7. The sizes of the jumps
are 0.3, 0.1, 0.2, and 0.4, respectively.
This does exactly what the method I proposed in Part 1 does: it randomly generates values of X with the
probabilities associated with those values.
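The jump picture can be turned into code. The following is a minimal sketch, not the lab's own implementation: it draws a uniform number u on the vertical axis and returns the value of X whose jump covers u. The function name simulate_discrete and the seed are hypothetical; the values and probabilities are the ones listed above.

```python
import numpy as np

values = np.array([-2, 1, 4, 7])
probs = np.array([0.3, 0.1, 0.2, 0.4])
cdf = np.cumsum(probs)  # heights of the cdf after each jump

def simulate_discrete(n, rng):
    """Simulate n values of X by locating uniform draws among the cdf jumps."""
    u = rng.uniform(size=n)
    # Index of the first jump whose top is at or above u.
    idx = np.searchsorted(cdf, u, side="left")
    # Guard against floating-point rounding in the last cumulative sum.
    idx = np.minimum(idx, len(values) - 1)
    return values[idx]

sim_X = simulate_discrete(100_000, np.random.default_rng(0))
for v in values:
    print(v, np.mean(sim_X == v))
```

The printed proportions should be close to 0.3, 0.1, 0.2, and 0.4, because the chance that u lands in a given jump equals the size of that jump.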
4.1 Part 4. Extension to Continuous Distributions
Now suppose you want to generate a random variable that has a specified continuous distribution. Let’s
start with the exponential(λ) distribution.
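lamb = 0.5  # assumed rate: 1/2, matching the mean of 2 in the function name

def expon_mean2_cdf(x):
    """Cdf of the exponential distribution with rate lamb."""
    if x < 0:
        return 0
    return 1 - np.exp(-1*lamb*x)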
The method should randomly generate a probability between 0 and 1 and return the value of
x in the graph above at which the cdf equals that probability.
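For the exponential cdf, that value of x can be found by solving p = 1 − exp(−λx), which gives x = −log(1 − p)/λ. The sketch below applies this to uniform draws; the function name expon_inverse_cdf, the seed, and the use of NumPy's random generator are choices made for the example, and the rate 0.5 is assumed to match the mean of 2 used above.

```python
import numpy as np

def expon_inverse_cdf(p, rate):
    """Return the x at which the exponential(rate) cdf equals p."""
    return -np.log(1 - p) / rate

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)           # probabilities picked uniformly in [0, 1)
sim_x = expon_inverse_cdf(u, rate=0.5)  # simulated exponential values

print(sim_x.mean())  # should be near the mean 1 / rate = 2
```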
5.1 Part 5. Empirical Verification that the Method Works
5.1.1 a) The Initial Values
Create a table called sim (for simulation) that consists of one column called Uniform containing the
values of 100,000 i.i.d. Uniform(0, 1) random variables.
In [19]: N = 100000
         u = stats.uniform.rvs(0, 1, size=N)
         sim = Table().with_column("Uniform", u)
         sim
Out[19]: Uniform
0.652742
0.626008
... Omitting 5 lines ...
0.986858
0.315918
... (99990 rows omitted)
In [22]: def norm_ppf(k):
             """Optional helper function."""
             return stats.norm.ppf(k)
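The helper is the inverse of the standard normal cdf, so applying it to uniform values plays the same role that the inverse exponential cdf played earlier. A minimal sketch, assuming a fresh uniform sample with an arbitrary seed rather than the lab's sim table:

```python
from scipy import stats

# Assumed sample of Uniform(0, 1) values (the lab stores these in a table column).
u = stats.uniform.rvs(size=100_000, random_state=0)

# ppf is SciPy's name for the inverse cdf ("percent point function").
z = stats.norm.ppf(u)

print(z.mean(), z.std())  # should be near 0 and 1
```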
In [24]: z = sim.column('Sim. Standard Normal')
         sim = sim.with_column('Sim. Normal (Mu=10, Sigma=5)', stats.norm.rvs(10, 5, size=100000))
         sim
Out[24]: Uniform | Sim. Exponential (rate 0.5) | Sim. Standard Normal | Sim. Normal (Mu=10, Sigma=5)
0.652742 | 2.11537 | 0.392734 | 13.5822
0.626008 | 1.96704 | 0.3213 | 4.17318
... Omitting 5 lines ...
0.986858 | 8.6639 | 2.22199 | 18.1448
0.315918 | 0.759355 | -0.479144 | 8.05223
... (99990 rows omitted)
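The cell above draws the last column directly with stats.norm.rvs, so its values are not computed from the Uniform column. An alternative sketch, not the method used in the cell, is to rescale simulated standard normal values: if Z is standard normal, then 10 + 5Z is normal with mean 10 and SD 5. The sample and seed below are assumptions made for the example.

```python
import numpy as np
from scipy import stats

u = stats.uniform.rvs(size=100_000, random_state=0)  # assumed uniform sample
z = stats.norm.ppf(u)                                # simulated standard normal values

# Convert from standard units back to original units: mean 10, SD 5.
x = 10 + 5 * z

# Equivalent route: ask ppf for the normal(10, 5) inverse cdf directly.
x_direct = stats.norm.ppf(u, loc=10, scale=5)

print(x.mean(), x.std())
print(np.allclose(x, x_direct))
```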
6.1 Part 6. Radial Distance
You can apply the general method developed above to simulate values of any continuous random variable.
Here is an example.
Consider a point (X, Y) picked uniformly on the unit disc {(x, y) : x^2 + y^2 ≤ 1}. That's the disc with
radius 1 centered at the origin (0, 0).
Let R be the distance between the point (X, Y) and the center (0, 0).
The point (X, Y) is random, so the radial distance R is random as well and has a density.
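One way to apply the general method here, offered as a sketch rather than the lab's intended solution: for 0 ≤ r ≤ 1, the chance that R ≤ r is the area of the disc of radius r divided by the area of the unit disc, so the cdf of R is r^2 and its inverse is the square root. The code below simulates R that way and compares it with distances of points drawn directly from the disc by discarding points of the surrounding square that fall outside it. All names and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Inverse-cdf method: F_R(r) = r**2 on [0, 1], so R = sqrt(U).
r_inverse = np.sqrt(rng.uniform(size=n))

# Direct check: pick points uniformly in the square [-1, 1] x [-1, 1]
# and keep only those that land inside the unit disc.
xy = rng.uniform(-1, 1, size=(4 * n, 2))
inside = xy[(xy ** 2).sum(axis=1) <= 1][:n]
r_direct = np.sqrt((inside ** 2).sum(axis=1))

# Both averages should be near E(R) = 2/3.
print(r_inverse.mean(), r_direct.mean())
```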
In [ ]:
sim
Out[35]: Uniform | Sim. Exponential (rate 0.5) | Sim. Standard Normal | Sim. Normal (Mu=10, Sigma=5) |
0.652742 | 2.11537 | 0.392734 | 13.5822 |
0.626008 | 1.96704 | 0.3213 | 4.17318 |
... Omitting 5 lines ...
0.986858 | 8.6639 | 2.22199 | 18.1448 |
0.315918 | 0.759355 | -0.479144 | 8.05223 |
... (99990 rows omitted)