Simulating Continuous and Non-Continuous Distributions

The document appears to be notes from a statistics simulation lab. It includes: 1) Simulations of random variables from common distributions like uniform, standard normal, and exponential using SciPy. 2) Explanations of how to read histograms from simulations and relate bin widths to probabilities. 3) Development of the general method for simulating random variables by mapping uniformly distributed random numbers to the desired distribution based on its CDF. 4) Examples applying this method to specific distributions and empirical verification that the simulations match the theoretical distributions.

Uploaded by

Cameron Mandley

Notebook

October 19, 2018

Local date & time is : 10/19/2018 14:52:15 PDT

1.1 Part 1: Simulation in SciPy
The stats module of SciPy is familiar to you by now. For any of the well known distributions, you
can use stats to simulate values of a random variable with that distribution. The general call is
stats.distribution_name.rvs(size = n) where rvs stands for "random variates" and n is the number
of independent replications you want.
Every statistical system has conventions for how to specify the parameters of a distribution. In this lab
we will tell you the specifications for a few distributions. Later you will be able to see a general pattern in
the specifications.

In [4]: sim_uniform = stats.uniform.rvs(0, 1, size=100000)


sim_uniform_tbl = Table().with_column(
'Simulated Uniform (0, 1)', sim_uniform
)
sim_uniform_tbl.hist(bins=25)

1.1.1 1b) Reading the Scales of the Histogram


The unit on the horizontal axis is any unit of length; you can think of it as centimeters if you want, but we
will just refer to it as the "unit". Fill in the blanks below and provide units where appropriate. Some units
have been provided for you.

(i) The width of each bin is 1/25 units.

(ii) The height of each bar is approximately 100 percent per unit.

(iii) Histograms represent percents by area, so the answers to (i) and (ii) imply that the percent of simulated
values in each bin is approximately 4%.

(iv) Let the random variable U have the Uniform(0, 1) distribution, and let B be any bin of the histogram.
The answer to (i) implies that P(U ∈ B) = 4%.

(v) If instead of bins=25 we had used bins=20 as the option to hist, then the answer to (iv) would have
been 5%.

Answers: (i) 1/25; (ii) 100, percent per unit; (iii) area, 4; (iv) 4; (v) 5.
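The arithmetic behind these answers (height times width equals the percent in a bin) can be checked numerically. The sketch below is only an illustration: it uses NumPy's histogram function rather than the Table.hist call in the lab, and the seed and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen only for reproducibility
sample = rng.uniform(0, 1, size=100_000)

bins = 25
# density=True returns heights in "proportion per unit",
# so height * width is the proportion of values in each bin.
heights, edges = np.histogram(sample, bins=bins, range=(0, 1), density=True)
widths = np.diff(edges)
percent_per_bin = 100 * heights * widths

print(widths[0])                 # width of one bin, in units
print(100 * heights.mean())      # average bar height, in percent per unit
print(percent_per_bin.mean())    # average percent of values per bin
```

With bins=20 the same calculation gives bins of width 1/20 that each hold about 5% of the values.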

In [5]: sim_std_norm = stats.norm.rvs(0, 1, size=100000)


sim_std_norm_tbl = Table().with_column(
'Simulated Standard Normal', sim_std_norm
)
sim_std_norm_tbl.hist(bins = 25)
plt.xticks(np.arange(-4, 4.1))
plt.title('Histogram 1');

In [6]: sim_norm = stats.norm.rvs(10, 5, size=100000)


sim_norm_tbl = Table().with_column(
'Simulated Normal (mu 10, sigma 5)', sim_norm
)
sim_norm_tbl.hist(bins = 25)
plt.title('Histogram 2');

Compare the numbers on the horizontal axes of the two histograms, and fill in the blank.
The value -1 on the horizontal axis of Histogram 1 is the same as the value 5 on the horizontal axis of
Histogram 2 expressed in standard units.
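The conversion behind this answer is z = (x - mu) / sigma. A minimal sketch follows; the function names are hypothetical helpers and are not part of the lab.

```python
def to_standard_units(x, mu, sigma):
    """Return how many standard deviations x lies above the mean mu."""
    return (x - mu) / sigma


def from_standard_units(z, mu, sigma):
    """Convert a value in standard units back to the original scale."""
    return mu + sigma * z


# Histogram 2 uses mu = 10 and sigma = 5.
print(to_standard_units(5, 10, 5))     # -1.0
print(from_standard_units(-1, 10, 5))  # 5
```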

In [8]: np.average(sim_expon)

Out[8]: 2.0095740428535898
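The cell that defines sim_expon does not appear above. A sample mean near 2 is consistent with an exponential distribution with rate 0.5 (mean 2), which is the distribution used later in the lab. The sketch below is an assumption about how such values could be drawn. SciPy's expon is parametrized by scale, which is 1 divided by the rate.

```python
import numpy as np
from scipy import stats

rate = 0.5
# Assumption: the original cell is not shown, so this is only one plausible
# way sim_expon could have been generated. scale = 1 / rate = mean.
sim_expon = stats.expon.rvs(scale=1 / rate, size=100_000, random_state=0)

print(np.average(sim_expon))   # should be close to the mean 1 / rate = 2
```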

2.1 Part 2. The Idea
How are all these random numbers generated? In the rest of the lab we will develop the method that
underlies all the simulations above, by considering examples of increasing complexity.
Our starting point is a distribution on just four values.
Suppose X has the distribution dist_X below.

i. 0.3
ii. 0.1
iii. 0.2
iv. 0.4

There should be a function that takes in an object representing the distribution of X and randomly generates a
number based on the probabilities associated with the various values of X.
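One way to sketch such a function is with NumPy's built-in weighted sampling. The possible values -2, 1, 4, 7 and their probabilities are the ones read off the cdf in Part 3; the function name simulate_discrete is hypothetical.

```python
import numpy as np

# Values and probabilities of X, taken from the cdf jumps described in Part 3.
values = np.array([-2, 1, 4, 7])
probs = np.array([0.3, 0.1, 0.2, 0.4])


def simulate_discrete(values, probs, n, seed=None):
    """Draw n i.i.d. values with P(X = values[i]) = probs[i]."""
    rng = np.random.default_rng(seed)
    return rng.choice(values, size=n, p=probs)


draws = simulate_discrete(values, probs, n=100_000, seed=0)

# The empirical proportions should be close to probs.
proportions = np.array([np.mean(draws == v) for v in values])
print(proportions)
```

This uses a library routine as a black box. The rest of the lab develops what such a routine can do internally, starting from a single Uniform(0, 1) number.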

3.1 Part 3. Visualizing the Idea
The method plot_discrete_cdf takes a distribution as its argument and plots a graph of the cdf.
Run the cell below to get a graph of the cdf of the random variable X in Part 2.
By definition, F_X(2.57) = P(X ≤ 2.57): the sum of the probabilities of all values of X that are less than or
equal to 2.57. Here that is 0.3 + 0.1 = 0.4.
The points with a jump are the values that X can take on, namely -2, 1, 4, and 7. The sizes of the jumps
are 0.3, 0.1, 0.2, and 0.4, respectively.
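The definition of the cdf can be written directly as code. This is a minimal sketch using the values and jump sizes listed above; discrete_cdf is a hypothetical name and is not the lab's plot_discrete_cdf.

```python
import numpy as np

values = np.array([-2, 1, 4, 7])
probs = np.array([0.3, 0.1, 0.2, 0.4])


def discrete_cdf(x):
    """F_X(x) = P(X <= x): add up the jumps at or to the left of x."""
    return float(probs[values <= x].sum())


print(discrete_cdf(2.57))   # jumps at -2 and 1 are included
print(discrete_cdf(-3))     # no jumps to the left of -3
```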

3.1.1 3c) From the Unit Interval to Values of X


The function unit_interval_to_discrete takes a distribution as its argument and displays an animation
of a method that takes a number on the unit interval and returns one value of a random variable that has
the given distribution.
Run the cell below. Move the slider around and see how the returned value changes depending on
the starting value in the unit interval. How is the method that is being displayed related to the one you
proposed in Part 2?

The slider takes on values between 0 and 1 on the vertical axis of the cdf graph, and the method returns the
value of X whose jump in the cdf covers that height.

In [13]: plot_discrete_cdf(dist_X, stats.uniform.rvs(0, 1))

This does exactly what the method I proposed in Part 2 does: it randomly generates values of X with the
probabilities associated with those values.
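The mapping shown by the slider can be sketched in a few lines: find the first jump of the cdf whose height is at least the starting value u. The name unit_interval_to_value is hypothetical, and this is only an illustration of the idea, not the lab's animation code.

```python
import numpy as np

values = np.array([-2, 1, 4, 7])
probs = np.array([0.3, 0.1, 0.2, 0.4])
cum_probs = np.cumsum(probs)   # height of the cdf just after each jump


def unit_interval_to_value(u):
    """Return the first value of X whose cdf height is at least u."""
    idx = np.searchsorted(cum_probs, u, side='left')
    idx = np.minimum(idx, len(values) - 1)   # guard against rounding in cum_probs
    return values[idx]


for u in (0.25, 0.35, 0.5, 0.9):
    print(u, unit_interval_to_value(u))

# Applied to many uniform numbers, the returned values have the right proportions.
rng = np.random.default_rng(0)
simulated = unit_interval_to_value(rng.uniform(0, 1, size=100_000))
print([float(np.mean(simulated == v)) for v in values])
```

A uniform number lands in an interval of length 0.3, 0.1, 0.2, or 0.4 on the vertical axis, so each value of X is returned with exactly its probability.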

4.1 Part 4. Extension to Continuous Distributions
Now suppose you want to generate a random variable that has a specified continuous distribution. Let’s
start with the exponential(λ) distribution.

In [14]: # don't use "lambda" as that means something else in Python
lamb = 0.5

def expon_mean2_cdf(x):
    if x < 0:
        return 0
    return 1 - np.exp(-1*lamb*x)

The method should involve randomly generating a probability between 0 and 1 and returning the value of
x at which the cdf in the graph above equals that probability.
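For the exponential distribution this "go from a height on the cdf back to x" step has a closed form: solving u = 1 - exp(-λx) for x gives x = -log(1 - u) / λ. The sketch below checks the algebra with a round trip. The function names are hypothetical, and the cdf is written with np.where so that it accepts arrays, unlike the scalar version in the cell above.

```python
import numpy as np

lamb = 0.5   # rate, as in the cell above


def expon_cdf(x):
    """cdf of the exponential distribution with rate lamb, for scalars or arrays."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, 0.0, 1 - np.exp(-lamb * x))


def expon_inverse_cdf(u):
    """Solve u = 1 - exp(-lamb * x) for x."""
    return -np.log(1 - u) / lamb


u = np.array([0.1, 0.5, 0.9])
x = expon_inverse_cdf(u)
print(x)
print(expon_cdf(x))   # recovers u up to rounding
```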

In [17]: plot_continuous_cdf((-1, 8), expon_mean2_cdf, stats.uniform.rvs(0, 1))

In [18]: plot_continuous_cdf((-3, 3), stats.norm.cdf, stats.uniform.rvs(0, 1))

5.1 Part 5. Empirical Verification that the Method Works
5.1.1 a) The Initial Values
Create a table that is called sim for simulation and consists of one column called Uniform that contains the
values of 100,000 i.i.d. Uniform(0, 1) random variables.

In [19]: N = 100000
u = stats.uniform.rvs(0, 1, size=N)
sim = Table().with_column("Uniform", u)
sim

Out[19]: Uniform
0.652742
0.626008
... Omitting 5 lines ...
0.986858
0.315918
... (99990 rows omitted)

In [20]: def uniform_to_exponential_mean2(u):
    # Inverse of the exponential (rate 0.5) cdf: x = -log(1 - u) / 0.5
    return (-1/0.5)*np.log(-1*(u-1))

exponential_mean2 = sim.apply(uniform_to_exponential_mean2, 'Uniform')


sim = sim.with_column('Sim. Exponential (rate 0.5)', exponential_mean2)
sim

Out[20]: Uniform | Sim. Exponential (rate 0.5)


0.652742 | 2.11537
0.626008 | 1.96704
... Omitting 5 lines ...
0.986858 | 8.6639
0.315918 | 0.759355
... (99990 rows omitted)

In [21]: sim.hist('Sim. Exponential (rate 0.5)', bins=25)


plt.xticks(np.arange(0, 21, 2));
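Besides looking at the histogram, the simulated values can be compared with known properties of the exponential distribution with rate 0.5: its mean is 2 and P(X ≤ 2) = 1 - exp(-1). The sketch below is a stand-alone check that uses plain NumPy arrays instead of the sim table. The seed and variable names are assumptions.

```python
import numpy as np
from scipy import stats

rate = 0.5
u = stats.uniform.rvs(0, 1, size=100_000, random_state=0)
x = -np.log(1 - u) / rate    # same transformation as uniform_to_exponential_mean2

# Compare empirical quantities with their theoretical counterparts.
empirical_mean = x.mean()                              # theory: 1 / rate = 2
empirical_prob = np.mean(x <= 2)                       # theory: 1 - exp(-1)
theoretical_prob = stats.expon.cdf(2, scale=1 / rate)

print(empirical_mean)
print(empirical_prob, theoretical_prob)
```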

In [22]: def norm_ppf(k):
    """Optional helper function."""
    return stats.norm.ppf(k)

standard_normal = sim.apply(norm_ppf, 'Uniform')


sim = sim.with_column('Sim. Standard Normal', standard_normal)
sim

Out[22]: Uniform | Sim. Exponential (rate 0.5) | Sim. Standard Normal


0.652742 | 2.11537 | 0.392734
0.626008 | 1.96704 | 0.3213
... Omitting 5 lines ...
0.986858 | 8.6639 | 2.22199
0.315918 | 0.759355 | -0.479144
... (99990 rows omitted)

In [23]: sim.hist('Sim. Standard Normal', bins=25)
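The normal cdf has no closed-form inverse, which is why the cell above relies on stats.norm.ppf (SciPy's name for the inverse cdf, the "percent point function"). As a small check, the sketch below applies it to the first two Uniform values shown in the table and then maps the results back through the cdf.

```python
import numpy as np
from scipy import stats

# First two rows of the Uniform column in the table above.
u = np.array([0.652742, 0.626008])

z = stats.norm.ppf(u)       # inverse of the standard normal cdf
print(z)                    # compare with the 'Sim. Standard Normal' column

# Round trip: applying the cdf to z gives back u.
print(stats.norm.cdf(z))
```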

In [24]: z = sim.column('Sim. Standard Normal')
sim = sim.with_column('Sim. Normal (Mu=10, Sigma=5)', stats.norm.rvs(10, 5, size=100000))
sim

Out[24]: Uniform | Sim. Exponential (rate 0.5) | Sim. Standard Normal | Sim. Normal (Mu=10, Sigma=5)
0.652742 | 2.11537 | 0.392734 | 13.5822
0.626008 | 1.96704 | 0.3213 | 4.17318
... Omitting 5 lines ...
0.986858 | 8.6639 | 2.22199 | 18.1448
0.315918 | 0.759355 | -0.479144 | 8.05223
... (99990 rows omitted)

In [25]: sim.hist('Sim. Normal (Mu=10, Sigma=5)', bins=25)
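The cell above draws the Normal (10, 5) values directly with stats.norm.rvs and leaves z unused. An alternative that stays within the method developed in this lab is to rescale the simulated standard normal values, x = mu + sigma * z, so that every column is derived from the same Uniform column. The sketch below is a stand-alone illustration on plain arrays. The seed and variable names are assumptions.

```python
import numpy as np
from scipy import stats

mu, sigma = 10, 5
u = stats.uniform.rvs(0, 1, size=100_000, random_state=0)

z = stats.norm.ppf(u)     # simulated standard normal values
x = mu + sigma * z        # convert from standard units to the original scale

print(x.mean(), x.std())  # should be close to 10 and 5

# Equivalent one-step version: the inverse cdf of Normal(mu, sigma) applied to u.
x_direct = stats.norm.ppf(u, loc=mu, scale=sigma)
print(np.allclose(x, x_direct))
```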

6.1 Part 6. Radial Distance
You can apply the general method developed above to simulate values of any continuous random variable.
Here is an example.
Consider a point (X, Y) picked uniformly on the unit disc {(x, y) : x^2 + y^2 ≤ 1}. That’s the disc with
radius 1 centered at the origin (0, 0).
Let R be the distance between the point (X, Y) and the center (0, 0).
The point (X, Y) is random, so the radial distance R is random as well and has a density.

6.1.1 6a) Visualization


Run the cell below. The figure on the left shows simulated i.i.d. copies of the point. On the right you have
the empirical histogram of the simulated distances. Move the slider to increase the number of simulations.


In [35]: # For a uniform point on the unit disc, P(R <= r) = r**2 for 0 <= r <= 1,
# so the inverse of the cdf is sqrt(u).
radial_distance = np.sqrt(sim.column('Uniform'))
sim = sim.with_column('Sim. Radial Distance', radial_distance)

sim

Out[35]: Uniform | Sim. Exponential (rate 0.5) | Sim. Standard Normal | Sim. Normal (Mu=10, Sigma=5) |
0.652742 | 2.11537 | 0.392734 | 13.5822 |
0.626008 | 1.96704 | 0.3213 | 4.17318 |
... Omitting 5 lines ...
0.986858 | 8.6639 | 2.22199 | 18.1448 |
0.315918 | 0.759355 | -0.479144 | 8.05223 |
... (99990 rows omitted)
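For a point chosen uniformly on the unit disc, P(R ≤ r) is the area of the disc of radius r divided by the area of the unit disc, which is r^2 for 0 ≤ r ≤ 1. The general method therefore gives R = sqrt(U). The sketch below compares that with a direct simulation of points on the disc by rejection sampling. It is a stand-alone illustration, and the seed and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Method 1: inverse cdf. F_R(r) = r**2 on [0, 1], so R = sqrt(U).
r_inverse = np.sqrt(rng.uniform(0, 1, size=100_000))

# Method 2: rejection sampling. Draw points uniformly in the square [-1, 1]^2
# and keep only those that land inside the unit disc.
xy = rng.uniform(-1, 1, size=(200_000, 2))
inside = (xy ** 2).sum(axis=1) <= 1
r_rejection = np.sqrt((xy[inside] ** 2).sum(axis=1))

# Both should have mean close to E[R] = 2/3 and P(R <= 0.5) close to 0.25.
print(r_inverse.mean(), r_rejection.mean())
print(np.mean(r_inverse <= 0.5), np.mean(r_rejection <= 0.5))
```

The density of R is the derivative of r^2, namely 2r on [0, 1], so the histogram of either set of distances should rise linearly from 0 to 2.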
