Event-Driven Simulation: 14.1 Some Queueing Definitions
Having covered how to generate random variables in the previous chapter, we are
now in good shape to move on to the topic of creating an event-driven simulation.
The goal of simulation is to predict the performance of a computer system under
various workloads. A big part of simulation is modeling the computer system as
a queueing network. Queueing networks will be revisited in much more detail in
Chapter 27, where we analytically address questions of performance and stability
(analysis is easier to do after covering Markov chains and hence is deferred until
later).
Figure 14.1 depicts a queue. The circle represents the server (you can think
of this as a CPU). The red rectangles represent jobs. You can see that one of
the jobs is currently being served (it is in the circle) and three other jobs are
queueing, waiting to be served, while three more jobs have yet to arrive to the
system. The red rectangles have different heights. The height of the rectangle
is meant to represent the size of a job, where size indicates the job’s service
requirement (number of seconds needed to process the job). You can see that
some jobs are large, while others are small. Once the job finishes serving (being
processed) at the server, it leaves the system, and the next job starts serving. We
assume that new jobs arrive over time. The time between arrivals is called the
interarrival time. Unless otherwise stated, we assume that jobs are served in
first-come-first-served (FCFS) order.
Question: If the arrival process to a queue is a Poisson process, what can we say
about the interarrival times?
Answer: The interarrival times are i.i.d. random variables, each distributed Exp(𝜆), where 𝜆 is the rate of the Poisson process.
We will generally assume a stochastic setting where all quantities are i.i.d.
random variables. We will denote a job’s size by the random variable (r.v.) 𝑆.
For example, if 𝑆 ∼ Uniform(0, 10), then jobs each require independent service
times ranging between 0 and 10 seconds. The interarrival times between jobs are
denoted by the r.v. 𝐼, where again we assume that these are independent. For
example, if 𝐼 ∼ Exp(𝜆), where 𝜆 = 0.1, then the average time between arrivals
is 10 seconds. When running a simulation based on distributions for interarrival
times and job sizes, we are assuming that these distributions are reasonable
approximations of the observed workloads in the actual computer system being
simulated.
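To make this concrete, here is one way to generate instances of 𝐼 ∼ Exp(0.1) and 𝑆 ∼ Uniform(0, 10) via inverse-transform sampling, as covered in Chapter 13. This is a minimal sketch; the function names are our own choices:

```python
import math
import random

def gen_interarrival(lam=0.1):
    """Generate an instance of I ~ Exp(lam) by inverting the CDF.

    The mean interarrival time is 1/lam (10 seconds for lam = 0.1).
    """
    u = random.random()
    return -math.log(1.0 - u) / lam

def gen_size(lo=0.0, hi=10.0):
    """Generate an instance of S ~ Uniform(lo, hi)."""
    return lo + (hi - lo) * random.random()
```

Each call produces a fresh independent instance, which matches the i.i.d. assumption above.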
However, it is also possible to assume that job sizes and interarrival times are
taken from a trace. In that case, the simulation is often referred to as a trace-
driven simulation. The trace typically includes information collected about the
system over a long period of time, say a few months or a year.
Question: What are some advantages of using a trace to drive the simulation as
opposed to generating inputs from distributions?
We define the response time of a job, typically denoted by r.v. 𝑇, to be the time
from when the job first arrives until it completes service. We can also talk about
the waiting time (a.k.a. delay) of a job, denoted by r.v. 𝑇𝑄 , which is the time from
when the job first arrives until it first receives service. We define the number of
jobs in the system, denoted by r.v. 𝑁, to be the total number of jobs in the system, whether queueing or in service.
We define the server utilization, denoted by 𝜌, as the long-run fraction of time
that the server is busy.
Let 𝑇_1 denote the response time of the first job, 𝑇_2 the response time of the second job, etc. Then,

E[𝑇] = (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑇_𝑖,
where it is assumed that 𝑛 is sufficiently large that the mean response time is
not changing very much. Thus, to get the mean response time, we can imagine
having each of the first 𝑛 jobs record its response time, where we then average
over all of these.
Imagine that we want to simulate the queue shown in Figure 14.1, where the
interarrival times are i.i.d. instances of r.v. 𝐼 and the job sizes (service require-
ments) are i.i.d. instances of some r.v. 𝑆. Assume that we know how to generate
instances of 𝐼 and 𝑆 using the techniques described in Chapter 13.
The whole point is to be able to process millions of arrivals in just a few hours. To
do this, we use an event-driven simulation. The idea is to maintain the system
state at all times and also maintain a global clock. Then we ask,
“What is the next event that will cause a change in the system state?”
We then increase the time on the global clock by the time until this next event,
and we update the system state to reflect the next event. We also update the times
until the next events. We then repeat this process, stepping through events in
near-zero time.
For example, let’s consider an event-driven simulation of the queue in Figure 14.1.
The interarrival times will need to be generated according to r.v. 𝐼. The job sizes
(service requirements) will need to be generated according to r.v. 𝑆.
Question: Do we generate all the arrival times and all the job sizes for the whole
simulation in advance and store these in a large array?
Let’s run through how this works. We are going to maintain four variables: the system State (the number of jobs in the system), the global Clock, the Time-to-next-arrival, and the Time-to-next-completion.
The simulation starts here: State is 0 jobs. Clock = 0. There’s no job serving,
so Time-to-next-completion = ∞. To determine the time to the next arrival, we
generate an instance of 𝐼, let’s say 𝐼 = 5.3, and set Time-to-next-arrival = 5.3.
We ask which event will happen first. Since min(∞, 5.3) = 5.3, we know the
next event is an arrival.
We now update everything as follows: State is 1 job. Note that this job starts
serving immediately. Clock = 5.3. To determine the time to the next completion,
we generate an instance of 𝑆 representing the service time of the job in service,
say 𝑆 = 10, and set Time-to-next completion = 10. To determine the next arrival
we generate an instance of 𝐼, say 𝐼 = 2, and set Time-to-next-arrival = 2.
We again ask which event will happen first. Since min(10, 2) = 2, we know the
next event is an arrival.
We now update everything as follows: State is 2 jobs. Clock = 5.3 + 2 = 7.3. The job in service still needs 10 − 2 = 8 seconds, so Time-to-next-completion = 8. To determine the next arrival we generate a new instance of 𝐼, say 𝐼 = 9.5, and set Time-to-next-arrival = 9.5.
We again ask which event will happen first. Since min(8, 9.5) = 8, we know the
next event is a completion.
We continue in this manner, with updates to the state happening only at job
arrival times or completions. Note that we only generate new instances of 𝐼 or 𝑆
as needed.
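The walkthrough above can be sketched as an event-driven simulation in Python. This is a minimal sketch, not a definitive implementation: the function names, and the assumption that gen_I() and gen_S() are generator functions producing fresh instances of 𝐼 and 𝑆, are our own choices.

```python
import math
import random

def simulate_fcfs_queue(num_jobs, gen_I, gen_S, seed=0):
    """Event-driven simulation of a single FCFS queue.

    gen_I() returns a fresh interarrival time; gen_S() returns a fresh
    job size. Returns the mean response time over the first num_jobs
    jobs to complete.
    """
    random.seed(seed)
    clock = 0.0
    arrivals = []                  # arrival instants of jobs now in system (FCFS order)
    next_arrival = gen_I()         # Time-to-next-arrival
    next_completion = math.inf     # Time-to-next-completion (no job serving yet)
    total_resp = 0.0
    done = 0

    while done < num_jobs:
        if next_arrival <= next_completion:
            # Next event is an arrival: advance the clock to it.
            clock += next_arrival
            next_completion -= next_arrival
            arrivals.append(clock)
            if len(arrivals) == 1:         # arriving job starts serving immediately
                next_completion = gen_S()
            next_arrival = gen_I()         # generate a new instance of I, as needed
        else:
            # Next event is a completion: advance the clock to it.
            clock += next_completion
            next_arrival -= next_completion
            total_resp += clock - arrivals.pop(0)   # this job's response time
            done += 1
            next_completion = gen_S() if arrivals else math.inf
    return total_resp / done
```

For example, with 𝐼 ∼ Exp(0.5) and 𝑆 ∼ Exp(1) (an M/M/1 queue at load 𝜌 = 0.5), the returned mean response time should converge to 1/(𝜇 − 𝜆) = 2 seconds as num_jobs grows.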
Question: At what times do we generate a new instance of 𝐼?
Answer: There are two times: The main time we generate a new instance of 𝐼 is
immediately after a new job arrives. However, we also generate a new instance
of 𝐼 at the very start of the simulation, when there are 0 jobs.
Question: What changes if we instead want to run a trace-driven simulation?
Answer: Nothing, really. The same approach is used, except that rather than
generating a new instance of 𝐼 or 𝑆 when we need it, we just read the next value
from the trace.
So now you have your simulation running. How do you figure out the mean
response time? We propose two methods, the first of which we already discussed
briefly.
Method 1: Every job records the clock time when it arrives and then records the
clock time when it completes. Taking the difference of these gives us the job’s
response time. We now just need to average the response time over all the jobs.
Question: Should we write each job’s response time into a file and then take the
average at the end of our simulation?
Answer: No, the writing wastes time in our simulation. You should be able to
maintain a running average. Let 𝑇̄_𝑛 denote the average over the first 𝑛 jobs:

𝑇̄_𝑛 = (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑇_𝑖.
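A little algebra shows that the running average can be updated one job at a time, storing only two numbers: 𝑇̄_𝑛 = 𝑇̄_{𝑛−1} + (𝑇_𝑛 − 𝑇̄_{𝑛−1})/𝑛. A sketch of such an accumulator (the class name is our own choice):

```python
class RunningMean:
    """Maintain the running average of a stream of response times
    without storing them all."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def add(self, t):
        # Incremental update: mean_n = mean_{n-1} + (t_n - mean_{n-1}) / n.
        self.n += 1
        self.mean += (t - self.mean) / self.n
```

Each completing job calls add() with its response time, and mean holds 𝑇̄_𝑛 at all times.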
If one runs a simulation for long enough, it really doesn’t matter whether one
uses Method 1 or Method 2, assuming that your system is well behaved.¹ This
brings us to another question.
Question: How long do we need to run the simulation?
Answer: We want to run the simulation until the metric of interest, in this case
mean response time, appears to have stabilized (it’s not going up or down sub-
stantially). There are many factors that increase the time it takes for a simulation
to converge. These include load, number of servers, and any type of variability,
either in the arrival process or the job service times. It is not uncommon to need
to run a simulation with a billion arrivals before results stabilize.
Now suppose the goal is not the mean response time, but rather the mean number
of jobs in the system, E[𝑁]. Specifically, we define the mean number as a time-average,
as follows: Let 𝑀(𝑠) denote the number of jobs in the system at time 𝑠. Then,

E[𝑁] = lim_{𝑡→∞} (1/𝑡) ∫_{𝑠=0}^{𝑠=𝑡} 𝑀(𝑠) 𝑑𝑠.   (14.1)

Think of this as summing the number of jobs in the system over every moment of
time 𝑠 from 𝑠 = 0 to 𝑠 = 𝑡 and then dividing by 𝑡 to create an average. Obviously
we’re not really going to take 𝑡 to infinity in our simulation, but rather just run to
some high enough time 𝑡 that the mean number of jobs stabilizes.

¹ Technically, by well behaved we mean that the system is “ergodic.” It suffices that the system
empties infinitely often. For a more detailed discussion of ergodicity, see Chapter 25 and
Section 27.7.
Question: But how do we get E [𝑁] from our simulation? We’re not going to
look at the number at every single time 𝑠. Which times do we use? Can we simply
measure the number of jobs in the system as seen by each arrival and average all
of those?
Answer: This is an interesting question. It turns out that if the arrival process
is a Poisson process, then we can simply record the number of jobs as seen by
each arrival. This is due to a property called PASTA (Poisson arrivals see time
averages), explained in [35, section 13.3]. Basically this works because of the
memoryless property of a Poisson process, which says that the next arrival can
come at any time, which can’t in any way be predicted. Thus the arrival times of
a Poisson process are good “random” points for sampling the current number of
jobs.
Unfortunately, if the arrival process is not a Poisson process, then having each
arrival track the number of jobs that it sees can lead to very wrong results.
Question: Can you provide an example for what goes wrong when we average
over what arrivals see?
Answer: Suppose that 𝐼 ∼ Uniform(1, 2). Suppose that 𝑆 = 1. Then every arrival
finds an empty system, and thus we would conclude that the mean number of jobs
is 0, when in reality the mean number of jobs is (2/3) · 1 + (1/3) · 0 = 2/3. (The
server is busy for 1 second out of every E[𝐼] = 1.5 seconds on average, so there
is 1 job in the system two-thirds of the time.)
Question: So how do we measure the mean number of jobs in the system if the
arrival process is not a Poisson process?
Answer: We track the time-average in (14.1) directly. Since 𝑀(𝑠) is constant
between events, at every event (arrival or completion) we add to a running sum
the number of jobs in the system multiplied by the time elapsed since the previous
event; dividing this sum by the total elapsed time gives the time-average number
of jobs.
Figure 14.2 shows a router with finite (bounded) buffer space. There is room for
𝑛 = 6 packets, one in service (being transmitted) and the others waiting to be
transmitted. Note that all the packets are purposely depicted as having the same
size, as is typical for packets. When a packet arrives and doesn’t find space, it is
dropped.
Figure 14.2: A finite-buffer FCFS queue; arrivals that don’t fit are dropped.
In terms of running the simulation, nothing changes. The system state is still
the number of packets in the system. As before we generate packet sizes and
interarrival times as needed. One of the common reasons to simulate a router
with finite buffer space is to understand how the buffer space affects the fraction
of packets that are dropped. We will investigate this in Exercise 14.4.
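As a sketch of such an experiment (the function and variable names are our own choices), the event-driven loop needs only one change: an arrival that finds the buffer full is counted as dropped instead of joining the queue.

```python
import math
import random

def simulate_finite_buffer(num_arrivals, buffer_size, gen_I, gen_S, seed=0):
    """Event-driven simulation of an FCFS queue with room for buffer_size
    packets total (including the one in service). gen_I() and gen_S()
    generate interarrival times and packet sizes. Returns the fraction
    of arrivals that are dropped."""
    random.seed(seed)
    clock = 0.0
    num_in_system = 0
    next_arrival = gen_I()
    next_completion = math.inf
    arrived = dropped = 0
    while arrived < num_arrivals:
        if next_arrival <= next_completion:
            # Next event is an arrival.
            clock += next_arrival
            next_completion -= next_arrival
            next_arrival = gen_I()
            arrived += 1
            if num_in_system == buffer_size:
                dropped += 1            # no space: packet is dropped
            else:
                num_in_system += 1
                if num_in_system == 1:  # starts transmitting immediately
                    next_completion = gen_S()
        else:
            # Next event is a completion.
            clock += next_completion
            next_arrival -= next_completion
            num_in_system -= 1
            next_completion = gen_S() if num_in_system > 0 else math.inf
    return dropped / arrived
```

As a sanity check, for an M/M/1 queue with 𝜆 = 𝜇 = 1 and room for just one packet, the long-run drop fraction is 𝜆/(𝜆 + 𝜇) = 1/2.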
Question: Suppose we are trying to understand mean response time in the case
of the router with finite buffer space. What do we do with the dropped packets?
Answer: Only the response times of packets that enter the system are counted.
Figure 14.3 shows a network of three queues, where all queues are unbounded
(infinite buffer space). A packet may enter either from queue 1 or from queue
2. If the packet enters at queue 2, it will serve at queue 2 and leave without
joining any other queues. A packet entering at queue 1 will serve at queue 1 and
then move to either queue 2 or queue 3, each with probability 0.5. We might
be interested here in the response time of a packet entering at queue 1, where
response time is the time from when the packet arrives at queue 1 until it leaves
the network (either at server 2 or at server 3).
Question: What is the system state for this network?
Answer: The system state is the number of packets at each of the three queues.
Figure 14.3: A network of three queues. External arrivals enter at queue 1 and at
queue 2. A packet completing service at queue 1 moves next to queue 2 or to
queue 3, each with probability 0.5.
Question: Which events do we need to track in this network?
Answer: We need to track five possible events. For queue 1, we need to track
Time-to-next-arrival and Time-to-next-completion. For queue 3, we only need
to track Time-to-next-completion. The arrival times at queue 3 are determined
by flipping a fair coin after each completion at queue 1. Likewise, for queue
2, the internal arrival times at queue 2 are determined by flipping a fair coin
after each completion at queue 1. However, queue 2 also has external arrivals.
These external arrivals need to be tracked. Thus, for queue 2 we need to track
the Time-to-next-external-arrival and Time-to-next-completion.
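The bookkeeping described above might be sketched as follows. The event names, the example time values, and the function names are all our own inventions:

```python
import math
import random

# The five event clocks for the network of Figure 14.3, each holding a
# time-to-next-event. The numeric values here are arbitrary examples.
times = {
    "arrival_q1": 1.2,           # Time-to-next-arrival at queue 1
    "completion_q1": math.inf,   # Time-to-next-completion at queue 1 (idle)
    "ext_arrival_q2": 0.7,       # Time-to-next-external-arrival at queue 2
    "completion_q2": math.inf,   # queue 2 idle
    "completion_q3": math.inf,   # queue 3 idle
}

def next_event(clocks):
    """Return the name of the event with the smallest time-to-event."""
    return min(clocks, key=clocks.get)

def route_from_q1():
    """On a completion at queue 1, flip a fair coin: the packet moves to
    queue 2 or to queue 3, each with probability 0.5."""
    return "q2" if random.random() < 0.5 else "q3"
```

At each step the simulation advances the clock by the smallest time-to-event, updates the state, and refreshes only the clocks affected by that event, just as in the single-queue case.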
Question: Do calls necessarily complete in the order in which they arrived?
Answer: No. Calls enter service in the order that they arrived, but some calls
might be shorter than others, and hence may leave sooner, even though they
entered later.
Question: What is the system state here?
Answer: The system state is the total number of jobs in the system (we do
not need to differentiate between those in service and those queued), plus the
remaining service time for each of the jobs in service.
Figure: External arrivals to an FCFS queue.
14.5 Exercises
the x-axis. Note that a job of size 0 may still experience a queueing
time, even though its service time is 0.
(e) What happens when 𝐶²_𝑆 (the squared coefficient of variation of job
size) increases? Why do you think this is? Think about it from the
perspective of the time that the average job waits.
Figure: Random dispatching, where each incoming job is sent to one of two FCFS
queues with probability 0.5 each.
Figure: SITA dispatching, where incoming jobs are split by size: short jobs are
sent to one FCFS queue and long jobs to the other.