Computational Science For Engineers - Unit-IV - DataDrivenModels - Simulations, Random Numbers and Random Walk

Simulations are computer programs that model real-world processes. They allow experimenting with scenarios that may be too difficult, costly, or dangerous to perform in reality. Simulations involve random numbers to introduce probabilistic elements. Common uses of simulations include modeling things like nuclear reactions, molecular interactions, weather patterns, and more. Key aspects of simulations discussed in the document include how random numbers are generated, measuring the quality of simulation results, and techniques for restricting random values to specific ranges.

Uploaded by

rt.cse
Data Driven Models

Functions – Empirical Models


Simulating with Randomness
Simulations
Random Numbers from various Distributions
Random Walk
Data Driven Models
Simulations
Random Numbers from various Distributions
Introduction – Simulations
 Modelling is the application of methods to analyze complex, real-world problems in order to make predictions about
what might happen with various actions.
 When it is too difficult, time-consuming, costly, or dangerous to perform experiments, the modeler might resort to
computer simulation, or having a computer program imitate reality, in order to study situations and make decisions.
 By simulating a process, the modeler can consider various scenarios and test the effect of each.
 For example,
 Scientists might simulate the effects of ozone depletion on global warming.
 Scientists at Los Alamos National Laboratory used simulations to predict the behaviour of nuclear reactions
before physically testing a nuclear bomb during World War II (LANL 2012).
 Lawrence Livermore National Laboratory scientists have used molecular dynamics simulations to study the
total energy and other quantities associated with molecules as they interact with one another.
 At the same laboratory, they have studied the greenhouse effect, making predictions based on levels of various
pollutants (LLNL 2012).
 Before the Gulf War, military experts simulated a number of scenarios to test preparedness.
 The National Oceanic and Atmospheric Administration performs simulations to predict the paths and
intensities of hurricanes (NOAA 2012).
 The Boeing Company designed the Boeing 777 airplane completely using computer-aided design and tested the
designs using computer simulations before construction began.
 Also, flight simulators allow pilots to practice emergency situations under safe conditions (Boeing 2012).
Introduction – Simulations
Simulations are preferred when:
It is not feasible to do the actual experiment, as in the study of the greenhouse effect.
The cost in money, time, or danger of the actual experiment is prohibitive, as with the study of nuclear
reactions.
The system does not exist yet, as in the development of an airplane.
We want to test various alternatives, as with hurricane predictions.

Disadvantages of Computer Simulations:


The simulation may be expensive in time or money to develop.
Because it is impossible to test every alternative, a simulation can provide good solutions but cannot guarantee the best solution.
The results may be difficult to verify because often we do not have real-world data.
We cannot be sure we understand what the simulation actually does.
When a simulation is probabilistic, involving an element of chance, we should be careful of our
conclusions.
Simulations – The Element of Chance
At the core of most simulations is random number generation. The
computer generates a sequence of numbers, called random numbers, or
pseudo random numbers.
An algorithm actually produces the numbers, so they are not really random,
but they appear to be random. Because of the element of chance, we often
call such a simulation a Monte Carlo simulation, named after Monte Carlo,
the gambling district of Monaco.
A Monte Carlo simulation is a probabilistic model involving an element of
chance.
Hence, such a simulation is not deterministic but is probabilistic or
stochastic, and the results of each execution can vary from those of other
runs.
Simulations – The Element of Chance
 To illustrate the difference between a
Monte Carlo simulation model and a
purely mathematical model, consider
a problem of finding the area
between the curve f(x) and the x-axis
on the interval from x = 0 to x = 2:
 The area is certainly deterministic;
exactly one answer exists for the
area. Moreover, if the function f is
integrable on the interval from 0 to 2, then we
can determine the area by evaluating
the definite integral $\int_0^2 f(x)\,dx$.
Simulations – The Element of Chance
 Pick a rectangle of an arbitrary height.
 Then, hypothetically throw darts at the rectangle,
counting the total number of darts thrown and the
number of darts that hit below the graph.
 To simulate a dart throw, generate a random floating-
point number, randomX, between 0.0 and 2.0 for the x-
coordinate and a random floating-point number,
randomY, between 0 and 1.5 for the y-coordinate.
 If the simulated dart hits on the curve, then its y-
coordinate, randomY, would be f(randomX); while
randomY < f(randomX) if and only if the dart strikes
the board below the curve.

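For example, if 778 of 1000 darts land below the curve, then with a rectangle of area 2.0 × 1.5 = 3.0 the area under the curve (AUC) is approximately
AUC ≈ 3.0 × 778/1000 = 2.334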
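The dart-throwing procedure can be sketched in Python as follows. The integrand appears only in the slide's figure, so f below is a stand-in assumption (any function with 0 ≤ f(x) ≤ 1.5 on [0, 2] would serve), and the names estimate_area and num_darts are illustrative rather than taken from the slides.

```python
import math
import random

def f(x):
    # Stand-in integrand (an assumption): stays between 0 and 1.5 on [0, 2].
    return math.sqrt(math.cos(x) ** 2 + 1.0)

def estimate_area(num_darts, x_max=2.0, y_max=1.5):
    """Estimate the area under f on [0, x_max] by throwing random darts."""
    hits = 0
    for _ in range(num_darts):
        random_x = random.uniform(0.0, x_max)
        random_y = random.uniform(0.0, y_max)
        if random_y < f(random_x):      # the dart landed below the curve
            hits += 1
    # Fraction of darts below the curve, times the area of the rectangle.
    return x_max * y_max * hits / num_darts

random.seed(1)
print(estimate_area(1000))
```

Each run with a different seed gives a slightly different estimate, which is the probabilistic behavior described above.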


Simulations – Measure of Quality
 For the area problem, we can obtain a better estimate by throwing a larger
number of darts.
 In this case, we may either define the number of darts in one simulation to be
larger or run the simulation many times, taking the mean (average) of the area
estimates from all the runs.
 The latter technique has the advantage of enabling us to use the standard
deviation (σ) of the estimates from the different executions as a measure of the
quality of the overall estimate.
 About 68.3% of the estimates are within ±σ of the mean. Thus, for a mean of
6.2838 and a standard deviation of 0.442276, about 68.3% of the area estimates are
between 6.2838 – 0.442276 = 5.84152 and 6.2838 + 0.442276 = 6.72608.
 A small standard deviation relative to the mean indicates a certain consistency
for most of the simulations and gives us more confidence in the mean as an
estimate of the area.
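A minimal sketch of the second technique, running the simulation many times and reporting the mean and standard deviation of the estimates. The integrand f is again a stand-in assumption, and summarize, num_runs, and num_darts are hypothetical names.

```python
import math
import random
import statistics

def f(x):
    # Stand-in integrand (an assumption; the slides show it only in a figure).
    return math.sqrt(math.cos(x) ** 2 + 1.0)

def estimate_area(num_darts):
    """One dart-throwing simulation over the rectangle [0, 2] x [0, 1.5]."""
    hits = sum(
        1 for _ in range(num_darts)
        if random.uniform(0.0, 1.5) < f(random.uniform(0.0, 2.0))
    )
    return 3.0 * hits / num_darts

def summarize(num_runs, num_darts):
    """Return the mean and sample standard deviation of repeated estimates."""
    estimates = [estimate_area(num_darts) for _ in range(num_runs)]
    return statistics.mean(estimates), statistics.stdev(estimates)

random.seed(2)
mean_area, sigma = summarize(num_runs=100, num_darts=1000)
print(f"mean = {mean_area:.4f}, sigma = {sigma:.4f}")
```

Increasing num_darts should shrink sigma, which is one way to see the gain in quality from larger simulations.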
Simulations – Multiplicative Linear Congruential
Method
 In 1949, D. H. Lehmer presented one of the best techniques for generating uniformly distributed pseudo
random numbers, the linear congruential method.
 One simple linear congruential random number generator that generates values between 0 and 10,
inclusive, is as follows:
$r_0 = 10$
$r_n = (7 r_{n-1} + 1) \bmod 11$, for $n > 0$
 The initial value in the sequence of random numbers, r0 = 10, is the seed. The mod function returns the
remainder.
 For example, 71 mod 11 is 5, the remainder in the division of 11 into 71. Thus, substituting r0 = 10 on the
right-hand side of the second line of the definition, the generating function, we calculate
r1 = (7 · 10 + 1) mod 11 = 5.
 After we calculate one “random number,” to evaluate the next, we substitute that value into the expression on
the right-hand side.
 Consequently, the next random number is r2 = (7 · 5 + 1) mod 11 = 36 mod 11 = 3
 Continuing in this fashion, we obtain ten pseudorandom numbers 5, 3, 0, 1, 8, 2,
4, 7, 6, 10 before the sequence starts repeating.
 A maximum of 11 nonnegative integers is generated for computation with mod
11.
 Should we desire floating-point numbers between 0 and 1, we divide each number
in the sequence by the modulus, 11, to obtain the following sequence:

5/11, 3/11, 0/11, 1/11, 8/11, 2/11, 4/11, 7/11, 6/11, 10/11
≈ 0.454545, 0.272727, 0.0, 0.0909091, 0.727273, 0.181818, 0.363636, 0.636364, 0.545455, 0.909091
 For this computation, the smallest possible pseudo random floating-point number
is 0.0 and the largest is (modulus – 1)/modulus = 10/11. Thus, floating-point
numbers that we generate by dividing by the modulus are in the interval [0.0, 1.0),
or the interval between the two values that includes 0.0 but not 1.0.
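The generator above can be sketched in Python as follows. The function name lcg and its parameter names are illustrative assumptions; the constants are the ones used in the example.

```python
from itertools import islice

def lcg(seed, multiplier, increment, modulus):
    """Yield pseudorandom integers r1, r2, ... of a linear congruential generator."""
    r = seed
    while True:
        r = (multiplier * r + increment) % modulus
        yield r

# The example above: r0 = 10, r_n = (7 * r_(n-1) + 1) mod 11.
integers = list(islice(lcg(seed=10, multiplier=7, increment=1, modulus=11), 10))
floats = [r / 11 for r in integers]   # scale into [0.0, 1.0)

print(integers)                       # [5, 3, 0, 1, 8, 2, 4, 7, 6, 10]
print([round(x, 6) for x in floats])
```

Requesting more than ten values shows the sequence repeating from 5 again.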
 Much research has been done to discover choices for multiplier and modulus that
give the largest possible sequence that appears random.
 For built-in random number generators, modulus is often the largest integer a
computer can store, such as $2^{31} - 1$ = 2,147,483,647 on some machines.
 For this modulus, a multiplier of 16,807 and an increment of 0 produce a
sequence of $2^{31} - 2$ elements.

Different Ranges of Random Numbers

 Suppose that rand is a uniformly distributed random floating-point number from
0.0 up to 1.0.
 And that we need a random floating-point number from 0.0 up to 5.0.
 Because the length of this interval is 5.0, we multiply rand by this value, 5.0, to
stretch the interval of numbers. Mathematically, we have the following:
0.0 ≤ 5.0 rand < 5.0
 If the lower bound of the range is different from 0, we add that bound. For
example, if we need a random floating-point number from 2.0 up to 7.0, we
multiply by the length of the interval, 7.0 – 2.0 = 5.0, to expand the range.
 Then, we add the lower bound, 2.0, to shift, or translate, the result so that the
following inequalities hold:
2.0 ≤ (7.0 – 2.0) rand + 2.0 < 7.0
or
2.0 ≤ 5.0 rand + 2.0 < 7.0

In general
min ≤ (max – min) rand + min < max
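As a brief illustration of the general formula, the sketch below stretches and shifts a uniform number from [0.0, 1.0). The function name random_float is an assumption; random.random() is Python's built-in uniform generator on [0.0, 1.0).

```python
import random

def random_float(min_value, max_value):
    """Return a uniform random float from min_value up to max_value."""
    rand = random.random()                           # uniform in [0.0, 1.0)
    return (max_value - min_value) * rand + min_value

random.seed(3)
print([random_float(2.0, 7.0) for _ in range(5)])    # five values from 2.0 up to 7.0
```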
 Frequently, we need a more-restricted range of random integers than from 0 up to modulus. For
example, a simulation might require random integer temperatures between 0 and 99, inclusive.
 One method of restricting the range is to multiply a floating-point random number between 0.0 and
1.0 by 100 (the number of integers from 0 through 99, or 99 + 1) and then return the integer part
(the number before the decimal point).
For example, suppose rand is 0.692871.
Multiplying by 100, we obtain 100 · 0.692871 = 69.2871.
Truncating, we obtain an integer (69) between 0 and 99.
 Sometimes we want the range of random integers to have a lower bound other than 0, for example,
from 100 to 500, inclusive. Because we include 100 and 500 as options, the number of integers
from 100 to 500 is one more than the difference in these values, (500 – 100 + 1) = 401.
 As with the last example, we multiply this value by rand to expand the range. Then, we add the
lower bound, 100, to the product to translate the range to start at 100 as follows:
100 ≤ INT(401 rand + 100) < 501
or
100 ≤ INT(401 rand + 100) ≤ 500
In general
min ≤ INT( (max – min + 1)·rand + min) ≤ max
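The integer version can be sketched the same way. The name random_int is an assumption. The sketch truncates before adding the lower bound, which matches INT((max – min + 1)·rand + min) for nonnegative lower bounds and also behaves correctly when the lower bound is negative.

```python
import random

def random_int(min_value, max_value):
    """Return a uniform random integer from min_value to max_value, inclusive."""
    rand = random.random()                  # uniform in [0.0, 1.0)
    count = max_value - min_value + 1       # number of integers in the range
    return min_value + int(count * rand)    # int() drops the fractional part

print(int(100 * 0.692871))                  # 69, as in the worked example

random.seed(4)
print([random_int(100, 500) for _ in range(5)])
```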
Simulations – Random Numbers from Various
Distributions
 Monte Carlo simulation requires the use
of unbiased random numbers. The
distribution of these numbers is a
description of the proportion of times each
possible outcome or each possible range
of outcomes occurs on the average over a
great many trials.
 However, the distribution that a simulation
requires depends on the problem.
 Now we will discuss the algorithms for
generating random numbers from several
types of distributions.
Suppose a specified range is partitioned into intervals of the same length. With
a uniform distribution, the generator is just as likely to return a value in any
of the intervals.
Equivalently, in a list of many such random numbers, on the average each
interval contains the same number of generated values.
For example, the figure presents a histogram with 10 intervals of length 0.1 of
a table of 10,000 random floating-point numbers, uniformly distributed from
0.0 up to 1.0.
As expected, approximately one-tenth of the 10,000, or 1000, numbers appears
in each subdivision.
Thus, the curve across the tops of the bars is virtually a horizontal line of
height 1000.
As we will see, methods for generating random numbers in other distributions
depend on our ability to produce random numbers with a uniform distribution.
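The histogram experiment described above can be reproduced with a short sketch. The helper histogram_counts is a hypothetical name, and Python's built-in generator stands in for whichever generator produced the figure.

```python
import random

def histogram_counts(values, num_bins=10):
    """Count how many values in [0.0, 1.0) fall in each equal-width bin."""
    counts = [0] * num_bins
    for v in values:
        index = min(int(v * num_bins), num_bins - 1)   # guard against rounding up
        counts[index] += 1
    return counts

random.seed(5)
numbers = [random.random() for _ in range(10_000)]
print(histogram_counts(numbers))    # each count should be close to 1000
```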
Simulations – Discrete vs. Continuous Distributions
 A distribution can be discrete or continuous. To illustrate the difference between the
terms discrete and continuous, a digital clock shows time in a discrete manner, from
one minute to the next, while a clock with two hands indicates time in a continuous,
unbroken way.
 Similarly, as you pass the time-and-temperature sign in front of a bank, one moment
it might register 28°C, the next it might jump to 29°C. As a continuous counterpart,
a thermometer outside a house might have a column of liquid (mercury), smoothly
rising and falling to indicate the temperature.
 In a simulation of pollution, we might generate a random integer to indicate the
number of dust particles in a cubic meter of air. The distribution of such values is
discrete.
 In the same simulation, for the velocities of the particles, we might generate
random floating-point values that have a continuous distribution.
 For a discrete distribution, a probability mass function returns the probability of
occurrence of a particular argument value. For example, P(1382) might be the
probability that the random number generator returns 1382, indicating 1382 dust
particles.
 However, if a distribution is continuous, the probability of occurrence of any
particular value is zero. Thus, for a continuous distribution, a probability density
function indicates the probability that a given outcome falls inside a specific range
of values.
 The integral of the probability density function from the lower to the upper bound of the
range, which is the area under that portion of the curve, gives the probability that
the outcome is in that range.
 For example, the probability that the random velocity in the x-direction of a dust
particle is between 3.0 and 4.0 mm/s is the integral of the probability density
function from 3.0 to 4.0.
Simulations - Normal Distributions
A normal, or Gaussian, distribution, which statistics frequently
employs, has a probability density function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where μ is the mean and σ is the standard deviation. The figure on
the next slide displays a histogram of a set of 1000 random numbers
in the Gaussian distribution with mean 0 and standard deviation 1.
Without getting into a formal definition of standard deviation, about 68.3%
of the values in a normal distribution are within ±σ of the mean, μ;
95.5% are within ±2σ of μ; and 99.7% are within ±3σ of μ.
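The percentages above can be checked empirically with a short sketch. It relies on Python's built-in random.gauss rather than any generator from the slides, and fraction_within is a hypothetical helper name.

```python
import random

def fraction_within(values, mu, sigma, k):
    """Fraction of values lying within k standard deviations of the mean."""
    inside = sum(1 for v in values if abs(v - mu) <= k * sigma)
    return inside / len(values)

random.seed(6)
mu, sigma = 0.0, 1.0
values = [random.gauss(mu, sigma) for _ in range(100_000)]
for k in (1, 2, 3):
    print(k, round(fraction_within(values, mu, sigma, k), 3))
```

With this many samples the three printed fractions should land close to the quoted percentages.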
Simulations - Exponential Distributions
 A model for unconstrained growth or decay employs an
exponential function $e^{rt}$, where t is time and r is the
growth rate or –r the decay rate, respectively.
 Functions of the form $f(t) = |r|e^{rt}$ with r < 0 and t > 0 or
$f(t) = |r|e^{rt}$ with r > 0 and t < 0 are probability density
functions in which the area under each curve is 1.
 To obtain a number in such a distribution, the
exponential method divides the natural logarithm of a
uniformly distributed random number from 0.0 to 1.0 by
the rate constant (r), that is, ln(rand)/r, where rand is
random between 0 and 1.
 For example, to generate numbers in the distribution
$f(t) = 2e^{-2t}$, we calculate ln(rand)/(–2). The figure
displays a histogram of 1000 such exponentially distributed
random numbers.
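A minimal sketch of the exponential method for a rate r < 0. The function name random_exponential is an assumption, and the sketch draws rand from (0.0, 1.0] instead of [0.0, 1.0) so that the logarithm is always defined.

```python
import math
import random

def random_exponential(r):
    """Return a random number with density |r| e^(r t), t > 0, for a rate r < 0."""
    rand = 1.0 - random.random()     # uniform in (0.0, 1.0], so log(rand) is defined
    return math.log(rand) / r

random.seed(7)
samples = [random_exponential(-2.0) for _ in range(1000)]
print(sum(samples) / len(samples))   # sample mean; the density 2e^(-2t) has mean 1/2
```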
 In many applications, several methods for generating random
numbers in other specific distributions are employed.
 When these techniques do not apply, however, we can
employ the rejection method.
 First, we obtain a uniformly distributed random number,
randInterval, in the requested interval.
 If the probability density function at randInterval is greater
than a uniform random number from 0.0 to an upper bound
for the function, we return randInterval.
 Otherwise, we repeat the process.
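The rejection method described above can be sketched as follows. The density triangle_pdf is an illustrative assumption chosen because its mean, 2/3, is easy to check, and rejection_sample and pdf_max are hypothetical names.

```python
import random

def rejection_sample(pdf, lower, upper, pdf_max):
    """Draw one value from pdf on the interval from lower to upper.

    pdf_max must be an upper bound for pdf on that interval.
    """
    while True:
        rand_interval = random.uniform(lower, upper)
        if pdf(rand_interval) > random.uniform(0.0, pdf_max):
            return rand_interval          # accept; otherwise loop and try again

def triangle_pdf(x):
    # Illustrative density (an assumption): f(x) = 2x on [0, 1], bounded above by 2.
    return 2.0 * x

random.seed(8)
samples = [rejection_sample(triangle_pdf, 0.0, 1.0, 2.0) for _ in range(10_000)]
print(sum(samples) / len(samples))        # should be near the true mean, 2/3
```

A tighter upper bound wastes fewer draws, since fewer candidates are rejected.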
Data Driven Models
Random Walk
Simulations – Random Walk
 Random walk is one technique of Monte Carlo simulations that has many applications in
the sciences. Random walk refers to the apparently random movement of an entity.
 In a time-driven simulation, we depict the entity in a cell on a rectangular grid.
 At any time step, the entity can move, perhaps under certain constraints, at random to a
neighboring cell.
 A certain type of computer simulation involving grids is a cellular automaton.
 Cellular automata are dynamic computational models that are discrete in space, state,
and time. We picture space as a one-, two-, or three-dimensional grid, or array, or lattice.
 A site, or cell, of the grid has a state, and the number of states is finite.
 Rules or transition rules, specifying local relationships and indicating how cells are to
change state, regulate the behavior of the system.
 An advantage of such grid-based models is that we can visualize the progress of events
through informative animations.
 For example, we can view a simulation of the movement of ants toward a food source,
the spread of fire, or the motion of gas molecules in a container.
Simulations – Random Walk
 A random walk cellular automaton can model Brownian
motion, which is the behavior of a molecule suspended in
a liquid.
 The phenomenon bears the name of the Scottish botanist
Robert Brown.
 In 1827, he observed that the rapid, random motion of pollen
particles in a liquid could not occur because of life within
the pollen, as some conjectured.
 A generation later, the physicists Maxwell, Clausius, and
Einstein explained the phenomenon as invisible liquid
particles (molecules) striking the visible particles, causing
small movements.
 Because diffusion of many things, such as pollutants in
the atmosphere and calcium in living bone tissue, exhibits
Brownian motion, simulations using random walks can
also model these processes (according to Encyclopaedia
Britannica 1997; Exploratorium 1995).
Simulations – Random Walk
 In genetics, random walks have been used to simulate mutation of genes.
 As another example, scientists use the method polymerase chain reaction (PCR) to make
many copies of particular pieces of DNA.
 A strand of DNA contains sequences of four bases, A, T, C, and G.
 Using the random walk technique in simulations, computational scientists can determine
good proportions of these bases in solution to speed replication of the DNA.
Random Walk - Algorithm
 At each time step of a particular random walk
simulation, suppose an entity moves in a
random, diagonal direction - NE, NW, SE, or
SW.
 To go in such a direction, the entity walks east
or west one unit and north or south one unit,
covering a diagonal distance of $\sqrt{2}$ units.
 After calling randomWalkPoints to generate the
list containing the points of a path, we can
create and display a graphic representing the
random walk.
 For example, we might show all the random
walk locations as colored dots, the path as line
segments, and the first and last points as black
dots.
 Because the walk is random, each run of the
function randomWalkPoints will very probably
produce a different walk.
 For the random walk in the above Figure, 5.09902 units is the distance between the final point and the
initial one, which are the two black dots.
 However, because the walks are random, great variation can exist in both the paths and in the final
distances from the starting point.
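The slides show the randomWalkPoints algorithm only as a figure, so the following is a sketch under assumptions: the walk starts at the origin, and each diagonal step is built from one east or west move and one north or south move.

```python
import math
import random

def randomWalkPoints(n):
    """Return the list of n + 1 points visited by a diagonal random walk from (0, 0)."""
    x, y = 0, 0
    points = [(x, y)]
    for _ in range(n):
        x += random.choice((-1, 1))   # one unit east or west
        y += random.choice((-1, 1))   # one unit north or south
        points.append((x, y))
    return points

random.seed(9)
path = randomWalkPoints(25)
x_end, y_end = path[-1]
print(path[:5])
print(math.hypot(x_end, y_end))       # distance between the final and initial points
```

The returned list can be passed to any plotting routine to draw the dots and line segments described above.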
Random Walk – Distance Algorithm
Sometimes, we may wish to obtain
an estimate of a typical distance
between the starting and ending
points of a random walk of n steps.
Then, we should run the simulation
many times and take the average of
all the distances.
In such a case, we are not interested
in viewing a random walk, so we first
define another function, randomWalkDistance,
that is similar to
randomWalkPoints, but which
returns the desired distance instead of
the list of points in a walk.
Random Walk – Mean Distance Algorithm
 For a function meanRandomWalkDistance, which
returns the average distance travelled over numTests
random walks of n steps each, we place a call
to randomWalkDistance(n) in a loop that iterates numTests
times.
 A variable, sumDist, accumulates the distances
covered by the random walks. Before the loop,
sumDist is initialized to zero; after the loop, this
sum is divided by numTests to return the average
distance.
 One run of meanRandomWalkDistance(25, 100)
might return an average distance of 5.75278 units
for 100 simulations of random walks of 25 steps.
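A possible Python rendering of the two functions, offered as a sketch rather than the slides' own code. Since only the end point matters, this randomWalkDistance sums the east-west and north-south moves separately instead of storing every point; that shortcut is an assumption of the sketch.

```python
import math
import random

def randomWalkDistance(n):
    """Distance between the start and end of an n-step diagonal random walk."""
    x = sum(random.choice((-1, 1)) for _ in range(n))   # net east-west displacement
    y = sum(random.choice((-1, 1)) for _ in range(n))   # net north-south displacement
    return math.hypot(x, y)

def meanRandomWalkDistance(n, numTests):
    """Average end-to-end distance over numTests random walks of n steps each."""
    sumDist = 0.0                        # accumulates the distances
    for _ in range(numTests):
        sumDist += randomWalkDistance(n)
    return sumDist / numTests

random.seed(10)
print(meanRandomWalkDistance(25, 100))
```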
Random Walk - Relationship between Number of
Steps and Distance Covered
 To discern a relationship between the number of steps, n, and average distance
covered in a random walk, we can execute meanRandomWalkDistance(n, 100) for
values of n from 1 to 50 and store each average distance in a list or array, listDist.
 Then, we may employ the techniques of the module “Empirical Models” to determine
the relationship.
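The "Empirical Models" module is not reproduced here, so the sketch below assumes one plausible technique: a least-squares fit of a power law, distance ≈ c·n^k, on log-transformed data. The helper fit_power_law is a hypothetical name, and the compact meanRandomWalkDistance assumes diagonal steps of one unit in each direction.

```python
import math
import random

def meanRandomWalkDistance(n, numTests):
    """Average end-to-end distance of numTests diagonal random walks of n steps."""
    total = 0.0
    for _ in range(numTests):
        x = sum(random.choice((-1, 1)) for _ in range(n))
        y = sum(random.choice((-1, 1)) for _ in range(n))
        total += math.hypot(x, y)
    return total / numTests

def fit_power_law(ns, dists):
    """Least-squares fit of ln(dist) = ln(c) + k ln(n); returns (c, k)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(d) for d in dists]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    k = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    c = math.exp(y_bar - k * x_bar)
    return c, k

random.seed(11)
ns = list(range(1, 51))
listDist = [meanRandomWalkDistance(n, 100) for n in ns]
c, k = fit_power_law(ns, listDist)
print(f"distance ~ {c:.3f} * n^{k:.3f}")
```

An exponent k near 0.5 would suggest that the average distance grows roughly like the square root of the number of steps.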
Simulations – Programming Demo
