BCSP 801 AOS Lab File
Lab 1 - Getting Started with Kernel Tracing - I/O
The benchmark
Our I/O benchmark is straightforward: it performs a series of read() or write() I/O
system calls in a loop using configurable buffer and total I/O sizes. Optionally, an
fsync() system call can be issued at the end of the I/O loop to ensure that buffered data
is written to disk. The benchmark samples the time before and after the I/O loop,
optionally displaying an average bandwidth. The lab bundle will build two versions
of the benchmark: io-static and io-dynamic. The former is statically linked (i.e.,
without use of the run-time linker or dynamic libraries), whereas the latter is
dynamically linked.
# cd io
# make
# cd ..
Running the benchmark
Once built, you can run the benchmark binaries as follows, with command-line
arguments specifying various benchmark parameters:
# io/io-static
or:
# io/io-dynamic
The benchmark must be run in one of three operational modes: create, read, or
write, specified by flags. In addition, the target file must be specified. If you run the
io-static or io-dynamic benchmark without arguments, a small usage statement will
be printed, which will also identify the default buffer and total I/O sizes configured
for the benchmark.
In your experiments, you will need to be careful to hold most variables constant
in order to isolate the effects of specific variables; for example, you may wish to
hold the total I/O size constant as you vary the buffer size. You may wish to
experiment initially using /dev/zero – the kernel’s special device node providing an
unlimited source of zeros – but you will also want to run the benchmark against a file
in the filesystem.
-c Create the specified file using the default (or requested) total I/O size.
-r Benchmark read() of the target file, which must already have been created.
-w Benchmark write() of the target file, which must already have been created.
-B Run in bare mode: no preparatory activities before the I/O loop (i.e., without
quiescing prior to the benchmark).
-d Bypass the buffer cache using direct I/O. When switching from buffered mode, the
first measurement using -d should be discarded, as some cached data may still be used.
-s Synchronous mode causes the benchmark to call fsync() after writing the file to cause
all buffered writes to complete before the benchmark terminates – and in particular
before the final timestamp is taken.
-q Quiet mode suppresses all terminal output from the benchmark, which is preferred
when performing whole-program benchmarking.
-v Verbose mode causes the benchmark to print additional information, such as the
time measurement, buffer size, and total I/O size.
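For example, one might first create the target file and then benchmark reads from it. This is a sketch; the -b (buffer size) and -t (total size) flags shown here are assumed to match the usage statement printed by your build of the benchmark:
# io/io-static -c iofile
# io/io-static -r -v -b 16384 -t 16777216 iofile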
# /usr/bin/time -p io/io-static -r -B -d -q iofile
real 1.31
user 0.00
sys 0.31
However, it may be desirable to use DTrace to collect more granular timestamps by
instrumenting return from execve() and entry of exit() for the program under test.
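A minimal sketch of such a script, assuming a single instance of io-static runs at a time (so a global variable suffices):
# dtrace -n 'syscall::execve:return /execname == "io-static"/ { start = timestamp; }
syscall::exit:entry /execname == "io-static" && start != 0/ { printf("%d ns", timestamp - start); }'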
This run of io-static, reading the data file /data/iofile, bypassing the buffer cache,
and running in bare mode (i.e., without quiescing prior to the benchmark) took 1.31
seconds of wall-clock time to complete. Of this, (roughly) 0.00 seconds were spent
in userspace, and (roughly) 0.31 seconds were spent in the kernel on behalf of the
process. From this output, it is unclear where the remaining 1.00 seconds were
spent, but presumably a substantial fraction was spent blocked on (slow) SD Card
I/O. Time may also have been spent running other processes – and lower-precision
time measurement, such as that provided by time, may suffer from non-trivial rounding
error.
/dev/zero The zero device: an infinite source of zeroes, and also an infinite sink for
any data you write to it.
/data/iofile A writable file in a journalled UFS filesystem on the SD card. You will
need to create the file using -c before performing read() or write() benchmarks.
Jupyter
The Jupyter Notebook is a web application that allows you to create and
share documents that contain live code, equations, visualisations and
explanatory text. Uses include: data cleaning and transformation,
numerical simulation, statistical modelling, machine learning and much
more. (jupyter.org)
The laboratory work requires students to bring together a diverse range of skills and
knowledge: from shell commands, scripting languages and DTrace to knowledge of
microarchitectural features and statistical analysis. The course’s aim is to focus on
the intrinsic complexity of the subject matter and not the extrinsic complexity
resulting from integrating disparate tools and platforms. Jupyter Notebooks support
this goal by providing a unified environment for:
• Executing benchmarks.
• Measuring the performance of these benchmarks with DTrace.
• Post-processing performance measurements.
• Plotting performance measurements.
• Performing statistical analysis on performance measurements.
Further information about the Jupyter Notebooks can be found at the project’s
website: jupyter.org.
Template
The BeagleBone Black comes preinstalled with a template Jupyter Notebook, 2017-
2018-l41-lab-template.ipynb. This template is designed to give working examples of all
the features necessary to complete the first laboratory: including measuring the
performance of the benchmarks using DTrace, producing simple graphs with matplotlib
and performing basic statistics on performance measurements using pandas.
Details of working with the Jupyter Notebook template are given in the L41: Setup
guide.
Notes on benchmark
This benchmark calculates average I/O rate for a particular run. Be sure to run the
benchmark more than once (ideally, perhaps a dozen times), and discard the first
output, which may otherwise be affected by prior benchmark runs (e.g., if data is left
in the buffer cache and the benchmark is not yet in the steady state). Do be sure that
terminal I/O from the benchmark is not included in tracing or time measurements.
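For example, a simple shell loop can collect a dozen samples, the first of which would then be discarded during analysis (a sketch; adjust paths and flags to your configuration):
# io/io-static -c iofile
# for i in 1 2 3 4 5 6 7 8 9 10 11 12; do io/io-static -r -v iofile; done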
Experimental questions
Your lab report will compare several configurations of the benchmark, exploring
(and explaining) performance differences between them. Do ensure that your
experimental setup quiesces other activity on the system, and also use a suitable
number of benchmark runs. The following questions are with respect to the
benchmark reading a file through the buffer cache:
• Holding the total I/O size constant (16MB), how does varying I/O buffer size
affect IO-loop performance?
• Holding the buffer size constant (16K) and varying total I/O size, how does
static vs. dynamic linking affect whole-program performance?
• At what file-size threshold does the performance difference between static and
dynamic linking fall below 5%? At what file-size threshold does the
performance difference fall below 1%?
• Consider the impact of the probe effect on your causal investigation.
For the purposes of performance graphs, plot measured performance (the dependent
variable, or Y axis) in terms of I/O bandwidth rather than literal execution time.
This will make it easier to analyse relative I/O efficiency (per unit of data) as file
and buffer sizes vary.
Lab 2 - Kernel Implications of IPC
Pipes are used most frequently between pairs of processes in a UNIX process pipeline:
a chain of processes started by a single command line, whose output and input file
descriptors are linked. Although pipes can be set up between unrelated processes,
the primary means of acquiring a pipe is through inheritance across fork(), meaning
that they are used between closely related processes (e.g., with a common parent
process).
Sockets are used when two processes are created in independent contexts and must later
rendezvous – e.g., via the filesystem, but also via TCP/IP. In typical use, each
endpoint process creates a socket via the socket() system call; the two endpoints are then
interconnected through use of bind(), listen(), connect(), and accept(). However,
there is also a socketpair() system call that returns a pair of interconnected
endpoints in the same style as pipe() – convenient for us as we wish to compare the
two side-by-side.
Both pipes and sockets can be used to transmit ordered byte streams: a sequence of bytes
sent via one file descriptor that will be received reliably on the other without loss or
reordering. As with file I/O, the read() and write() system calls can be used to read and write
data on file descriptors for pipes and sockets. It is useful to know that these system calls
are permitted to return partial reads and partial writes: i.e., a buffer of some size (e.g.,
1k) might be passed as an argument, but only a subset of the requested bytes may be
received or sent, with the actual size returned via the system call’s return value. This
may happen if the in-kernel buffers for the IPC object are too small for the full amount,
or if non-blocking I/O is enabled. When analysing traces of IPC behaviour, it is
important to consider both the size of the buffer passed and the number of bytes returned
in evaluating the behaviour of the system call.
You may wish to read the FreeBSD pipe(2) and socketpair(2) manual pages to learn
more about these APIs before proceeding with the lab.
The benchmark
As with our earlier I/O benchmark, the IPC benchmark is straightforward: it sets up a
pair of IPC endpoints referencing a shared pipe or socket, and then performs a series of
write() and read() system calls on the file descriptors to send (and then receive) a total
number of bytes of data. Data will be sent using a smaller userspace buffer size –
although as hinted above, there is no guarantee that a full user buffer will be sent or
received in any individual call. Also as with the I/O benchmark, there are several modes
of operation: sending and receiving within a single thread, a pair of threads in the same
process, or between two threads in two different processes.
The benchmark will set up any necessary IPC objects, threads, and processes, sample
the start time using the clock_gettime() system call, perform the IPC loop (perhaps split
over two threads), and then sample the finish time using the clock_gettime() system call.
Optionally, both the average bandwidth across the IPC object, and also more verbose
information about the benchmark configuration, may be displayed. Both statically and
dynamically linked versions of the binary are provided: ipc-static and ipc-dynamic.
Required operation argument
Specify the mode in which the benchmark should operate:
1thread Run the benchmark entirely within one thread; note that, unlike other
benchmark configurations, this mode interleaves the IPC calls and must place the
file descriptors into non-blocking mode or risk deadlock. This may have observable
effects on the behaviour of the system calls with respect to partial reads or writes.
2thread Run the benchmark between two threads within one process: one as a ‘sender’
and the other as a ‘receiver’, with the sender capturing the first timestamp, and the
receiver capturing the second. System calls are blocking, meaning that if the in-
kernel buffer fills during a write(), then the sender thread will sleep; if the in-kernel
buffer empties during a read(), then the receiver thread will sleep.
2proc As with the 2thread configuration, run the benchmark in two threads – however,
those threads will be in two different processes. The benchmark creates a second
process using fork() that will run the sender. System calls in this variation are
likewise blocking.
The following optional flags help ensure that benchmark output does not interfere with
measurement, or that tracing and benchmarking only occur during a period of program
execution unaffected by terminal I/O:
-q Quiet mode suppresses all terminal output from the benchmark, which is preferred
when performing whole-program benchmarking.
-v Verbose mode causes the benchmark to print additional information, such as the time
measurement, buffer size, and total IPC size.
Notes on using DTrace
On the whole, this lab will be concerned with just measuring the IPC loop, rather than
whole-program behaviour. As in the last lab, it is useful to know that the clock_gettime()
system call is invoked immediately before, and immediately after, the IPC loop. In
this benchmark, these events may occur in different threads or processes, as the sender
performs the initial timestamp before transmitting the first byte over IPC, and the
receiver performs the final timestamp after receiving the last byte over IPC. You may
wish to bracket tracing between a return probe for the former, and an entry probe for
the latter; see the notes from the last lab for an example.
As with the last lab, you will want to trace the key system calls of the benchmark:
read() and write(). For example, it may be sensible to inspect quantize() results for both
the execution time distributions of the system calls, and the amount of data returned by
each (via arg0 in the system-call return probe). You will also want to investigate
scheduling events using the sched provider. This provider instruments a variety of
scheduling-related behaviours, but it may be of particular use to instrument its on-cpu
and off-cpu events, which reflect threads starting and stopping execution on a CPU. You
can also instrument sleep and wakeup probes to trace where threads go to sleep waiting
for new data in an empty kernel buffer (or for space to place new data in a full buffer).
When tracing scheduling, it is useful to inspect both the process ID (pid) and thread ID
(tid) to understand where events are taking place.
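The following sketch pulls these ideas together; it assumes the statically linked binary name ipc-static, and that the benchmark makes exactly two clock_gettime() calls around the IPC loop (so the first return marks loop entry and the second marks loop exit):
# dtrace -n '
syscall::clock_gettime:return /execname == "ipc-static"/ { calls++; }
syscall::read:return /execname == "ipc-static" && calls == 1/ { @bytes = quantize(arg0); }
sched:::on-cpu /execname == "ipc-static" && calls == 1/ { @oncpu = count(); }'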
By its very nature, the probe effect is hard to investigate, as the probe effect does, of
course, affect investigation of the effect itself! However, one simple way to approach
the problem is to analyse the results of performance benchmarking with and without
DTrace scripts running. When exploring the probe effect, it is important to consider not
just the impact on bandwidth average/variance, but also on systemic behaviour: for
example, when performing more detailed tracing, causing the runtime of the benchmark
to increase, does the number of context switches increase, or the distribution of read()
return values? In general, our interest will be in the overhead of probes rather than the
overhead of terminal I/O from the DTrace process – you may wish to suppress that
output during the benchmark run so that you can focus on probe overhead.
Notes on benchmark
As with the prior lab, it is important to run benchmarks more than once to collect a
distribution of values, allowing variance to be analysed. You may wish to discard the
first result in a set of benchmark runs as the system will not yet have entered its steady
state. Do be sure that terminal I/O from the benchmark is not included in tracing or time
measurements (unless that is the intent).
Experimental questions (part 1/2)
You will receive a separate handout during the next lab describing Lab Report 2;
however, this description will allow you to begin to prepare for the assignment, which
will also depend on the outcome of the next lab. Your lab report will compare several
configurations of the IPC benchmark, exploring (and explaining) performance
differences between them. Do ensure that your experimental setup suitably quiesces
other activity on the system, and also use a suitable number of benchmark runs; you
may wish to consult the FreeBSD Benchmarking Advice wiki page linked to from the
module’s reading list for other thoughts on configuring the benchmark setup. The
following questions are with respect to a fixed total IPC size with a statically linked
version of the benchmark, and refer only to IPC-loop, not whole-program, analysis.
Using 2thread and 2proc modes, explore how varying the IPC model (pipes, sockets, and
sockets with -s) and the IPC buffer size affects performance (an example parameter
sweep is sketched below):
• How does increasing IPC buffer size uniformly change performance across IPC
models – and why?
• Is using multiple threads faster or slower than using multiple processes?
Graphs and tables should be used to illustrate your measurement results. Ensure that,
for each question, you present not only results, but also a causal explanation of those
results – i.e., why the behaviour in question occurs, not just that it does. For the purposes
of graphs in this assignment, use achieved bandwidth, rather than total execution time,
for the Y axis, in order to allow you to more directly visualise the effects of
configuration changes on efficiency.
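One possible shape for such a sweep, assuming the binaries are built in an ipc/ directory and that the flag syntax matches the benchmark’s usage statement; repeat with -i local (and -i local -s), and with the 2proc mode:
# for b in 4096 16384 65536 262144; do ipc/ipc-static -v -i pipe -b $b 2thread; done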
Lab 3 - Micro-architectural Implications of IPC
Hardware performance counters (2/2)
• Optimisation is therefore not just about reducing instruction count
• Optimisation must take into account microarchitectural effects
• TLB/cache effects tricky as they vary with memory footprint
• How can we tell when the cache overflows?
• Hardware performance counters let us directly ask the processor about
architectural and micro-architectural events
• #instructions, #memory accesses, #cache misses, DRAM traffic...
Optional flags:
-B Run in bare mode: no preparatory activities
-i pipe|local Select pipe or socket for IPC (default: pipe)
-P l1d|l1i|l2|mem|tlb|axi Enable hardware performance counters
-q Just run the benchmark, don't print stuff out
-s Set send/receive socket-buffer sizes to buffersize
-v Provide a verbose benchmark description
-b buffersize Specify a buffer size (default: 131072)
-t totalsize Specify total I/O size (default: 16777216)
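The sample output below might be produced by an invocation along these lines (a sketch – the ipc/ path and exact flag spellings are assumptions to check against your build):
# ipc/ipc-static -v -i local -P mem -b 65536 2thread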
ipctype: socket
time: 0.084140708
pmctype: mem
INSTR_EXECUTED: 25463397
CLOCK_CYCLES: 46233168
CLOCK_CYCLES/INSTR_EXECUTED: 1.815672
MEM_READ: 8699699
MEM_READ/INSTR_EXECUTED: 0.341655
MEM_READ/CLOCK_CYCLES: 0.188170
MEM_WRITE: 7815423
MEM_WRITE/INSTR_EXECUTED: 0.306928
MEM_WRITE/CLOCK_CYCLES: 0.169044
194721.45 KBytes/sec
• Can we reach causal conclusions about the scalability of pipes vs. sockets from
processor performance counters?
• Remember to consider the hypotheses the experimental questions are exploring.
• Ensure that you directly consider the impact of the probe effect on your causal
investigation.
Lab 4 - The TCP State Machine
TCP identifies every byte in one direction of a connection via a sequence number.
Data segments contain a starting sequence number and length, describing the range of
transmitted bytes. Acknowledgment packets contain the sequence number of the byte
that follows the last contiguous byte they are acknowledging. Acknowledgments are
piggybacked onto data segments traveling in the opposite direction to the greatest extent
possible to avoid additional packet transmissions. In slow start, TCP performance is
directly limited by latency, as the congestion window can be opened only by receiving
ACKs – which require successive round trips. These periods are referred to as latency
bound for this reason, and network latency is a critical factor in the effective utilisation
of path bandwidth.
The benchmark
Our IPC benchmark also supports a tcp socket IPC type which requests use of TCP over
the loopback interface on port 10141. Use of a fixed port number makes it easy to
identify and classify experimental packets on the loopback interface using packet-
sniffing tools such as tcpdump, and also via DTrace predicates. You are advised to
minimise network activity during the running of TCP-related benchmarks, and when
using DTrace, to reduce the degree of interference both from the perspective of
analysing behaviour, and for reasons of the probe effect.
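For example, experimental packets can be observed on the loopback interface (assumed here to be named lo0) with a simple port filter:
# tcpdump -i lo0 -n port 10141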
fbt::syncache_expand:entry
FBT probe when a TCP packet converts a pending SYN cookie or SYN cache
connection into a full TCP connection. The third argument (args[2]) is a pointer to a
struct tcphdr.
fbt::tcp_do_segment:entry
FBT probe when a TCP packet is received in the ‘steady state’. The second argument
(args[1]) is a pointer to a struct tcphdr that describes the TCP header (see RFC 793).
You will want to classify packets by port number to ensure that you are collecting data
only from the flow of interest (port 10141), and associating collected data with the right
direction of the flow. Do this by checking the TCP header fields th_sport (source port) and
th_dport (destination port) in your DTrace predicate. In addition, the fields th_seq
(sequence number in transmit direction), th_ack (ACK sequence number in return
direction), and th_win (TCP advertised window) will be of interest. The fourth argument
(args[3]) is a pointer to a struct tcpcb that describes the active connection.
fbt::tcp_state_change:entry
FBT probe that fires when a TCP state transition takes place. The first argument
(args[0]) is a pointer to a struct tcpcb that describes the active connection. The tcpcb
field t_state is the previous state of the connection. Access to the connection’s port
numbers at this probe point can be achieved by following t_inpcb->inp_inc.inc_ie, which
has fields ie_fport (foreign, or remote, port) and ie_lport (local port) for the connection.
The second argument is the new state to be assigned.
When analysing TCP states, the D array tcp_state_string can be used to convert an integer
state to a human-readable string (e.g., 0 to TCPS_CLOSED). For these probes, the port
number will be in network byte order; the D function ntohs() can be used to convert to
host byte order when printing or matching values in th_sport, th_dport, ie_lport, and ie_fport.
Note that sequence and acknowledgment numbers are cast to unsigned integers. When
analysing and graphing data, be aware that sequence numbers can (and will) wrap due
to the 32-bit sequence space.
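A sketch that prints the header fields of interest for packets on the experimental flow – recall that the ports remain in network byte order at this probe, while the sequence and ACK numbers are already in host byte order:
# dtrace -n 'fbt::tcp_do_segment:entry
/ntohs(args[1]->th_sport) == 10141 || ntohs(args[1]->th_dport) == 10141/ {
    printf("seq %u ack %u win %u", (uint32_t)args[1]->th_seq,
        (uint32_t)args[1]->th_ack, ntohs(args[1]->th_win));
}'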
Trace state transitions, printing the local and foreign port numbers for the
connection experiencing the transition:
# dtrace -n 'fbt::tcp_state_change:entry {
    trace(ntohs(args[0]->t_inpcb->inp_inc.inc_ie.ie_lport));
    trace(ntohs(args[0]->t_inpcb->inp_inc.inc_ie.ie_fport));
    trace(tcp_state_string[args[0]->t_state]);
    trace(tcp_state_string[args[1]]);
}'
These scripts can be extended to match flows on port 10141 in either direction as
needed.
Exploratory questions
These questions are intended to help you understand the TCP state machine and
behaviour of TCP with simulated latencies, and should help provide supporting
evidence for your experimental questions. However, they are just suggestions – feel free
to approach the problem differently! These questions do not need to be addressed in
your lab report.
1. Exploring the TCP state machine:
• Trace state transitions occurring for your test TCP connections.
• Using DTrace’s stack() function, determine which state transitions are
triggered by packets received over the network (e.g., passing via tcp_input()) vs.
those that are triggered by local system calls (see the sketch after this list).
2. Baseline benchmark performance analysis:
• As you vary one-way latency between 0ms and 40ms, with 5ms intervals, what
is the net effect on performance?
• Plot an effective (i.e., as measured) TCP state-transition diagram for the two
directions of a single TCP connection: states will be nodes, and transitions will be
edges. Where state transitions diverge between the two directions, be sure to label
edges indicating ‘client’ vs. ‘server’.
• Extend the diagram to indicate, for each edge, the TCP header flags of the received
packet triggering the transition, or the local system call (or other event – e.g., timer)
that triggers the transition.
• Compare the graphs you have drawn with the TCP state diagram in RFC 793.
• Using DUMMYNET, explore the effects of simulated latency at 5ms intervals
between 0ms and 40ms (an example configuration appears after this list). What
observations can we make about state-machine transitions as latency increases?
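For the stack() exploration, a minimal sketch that prints the new state and the kernel stack at each transition:
# dtrace -n 'fbt::tcp_state_change:entry { trace(tcp_state_string[args[1]]); stack(); }'
One way to configure a simulated latency with DUMMYNET might be the following (a sketch – it assumes ipfw and dummynet are enabled on your system; vary the delay per experiment):
# ipfw pipe 1 config delay 5ms
# ipfw add 100 pipe 1 tcp from any to any dst-port 10141 via lo0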
In Lab 5, we will extend our analysis using knowledge of TCP’s congestion-control
model, illustrating behaviours using time–sequence-number diagrams. Be sure, in your
lab report, to describe any apparent simulation or probe effects.
Lab 5 - TCP Latency and Bandwidth
The congestion window is ‘opened’ gradually as available bandwidth is probed. The name
‘slow start’ is initially confusing, as the algorithm actually performs an exponential
ramp-up. However, it is in fact slow compared to the original TCP algorithm, which had
no notion of congestion and overfilled the network immediately!
When congestion is detected (i.e., because the congestion window has grown beyond the
available bandwidth, triggering a loss), a cycle of congestion recovery and avoidance is
entered. The congestion window will be reduced, and then the window will be more
slowly reopened, causing the congestion window to continually (gently) probe for
additional available bandwidth, (gently) falling back when it re-exceeds the limit. In the
event a true timeout is experienced – i.e., significant packet loss – then the congestion
window will be cut substantially and slow start will be re-entered.
The steady state of TCP is therefore responsive to the continual arrival and departure of
other flows, as well as changes in routes or path bandwidth, as it detects newly available
bandwidth, and reduces use as congestion is experienced due to over-utilisation.
TCP composes these two windows by taking the minimum: it will neither send too much
data for the remote host, nor for the network itself. One limit is directly visible in the
packets themselves (the advertised window from the receiver), but the other must either
be intuited from wire traffic or, preferably, monitored using end-host
instrumentation. Two further informal definitions will be useful:
Latency is the time it takes a packet to get from one endpoint to another. TCP
implementations measure Round-Trip Time (RTT) in order to tune the timeouts used to
detect packet loss. More subtly, RTT also limits the rate at which TCP will grow the
congestion window, especially during slow start: the window can grow only as data is
acknowledged, which requires round-trip times as ACKs are received.
Bandwidth is the throughput capacity of a link (or network path) to carry data, typically
measured in bits or bytes per second. TCP attempts to discover the available bandwidth
by iteratively expanding the congestion-control window until congestion is experienced,
and then backing off. While bandwidth and latency are notionally independent of one
another, they are entangled in TCP as the protocol relies on acknowledgments to control
the rate at which the congestion window is expanded, which is dependent upon round-
trip time.
You may also wish to mark events such as packet-loss detection or a transition out of
slow start. Rather than directly overlaying,
which can be visually confusing, a better option may be to “stack” the graphs: place
them on the same X axis (time), horizontally aligned but vertically stacked. Possible
additional data points (and Y axes) might include advertised and congestion-window
sizes in bytes.
The benchmark
This lab uses the same IPC benchmark as prior labs. You will run the benchmark both
with, and without, setting the socket-buffer size, allowing you to explore the effects of
manual versus automatic socket-buffer tuning. The benchmark continues to send its data
on the accepted server-side socket on port 10141. This means that data segments
carrying benchmark data from the sender to the receiver will have a source port of
10141, and acknowledgements from the receiver to the sender will have a destination
port of 10141. Do ensure that, as in Lab 2, you have increased the kernel’s maximum
socket-buffer size.
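For example, the maximum socket-buffer size is controlled by the kern.ipc.maxsockbuf sysctl; the value below is purely illustrative, so use whatever Lab 2 prescribed:
# sysctl kern.ipc.maxsockbuf=33554432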
DTrace probes
As in Lab 4, you will utilise the tcp_do_segment FBT probe to track TCP input. However,
you will now take advantage of access to the TCP control block (the tcpcb structure –
args[3] to the tcp_do_segment FBT probe) to gain additional insight into TCP behaviour.
The following fields may be of interest:
snd_wnd On the sender, the last received advertised flow-control window.
snd_cwnd On the sender, the current calculated congestion-control window.
snd_ssthresh On the sender, the current slow-start threshold – if snd_cwnd is less than or
equal to snd_ssthresh, then the connection is in slow start; otherwise, it is in congestion
avoidance.
When writing DTrace scripts to analyse a flow in a particular direction, you can use the
port fields in the TCP header to narrow analysis to only the packets of interest. For
example, when instrumenting tcp_do_segment to analyse received acknowledgments, it
will be desirable to use a predicate of /args[1]->th_dport == htons(10141)/ to select only
packets being sent to the server port (e.g., ACKs), and the similar (but subtly different)
/args[1]->th_sport == htons(10141)/ to select only packets being sent from the server
port (e.g., data). Note that you will wish to take care to ensure that you are reading fields
from within the tcpcb at the correct end of the connection – the ‘send’ values, such as
the last received advertised window and the congestion window, are properties of the
server, and not the client, side of this benchmark, and hence can only be accessed from
instances of tcp_do_segment that are processing server-side packets.
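A sketch that samples these sender-side fields each time a segment arrives at the server port (field types vary across FreeBSD versions, so adjust formats as needed):
# dtrace -n 'fbt::tcp_do_segment:entry /args[1]->th_dport == htons(10141)/ {
    printf("%d %d %d %d", timestamp, args[3]->snd_wnd,
        args[3]->snd_cwnd, args[3]->snd_ssthresh);
}'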
To calculate the length of a segment in the probe, you can use the tcp:::send probe to
trace the ip_plength field in the ipinfo_t structure (args[2]):
typedef struct ipinfo {
uint8_t ip_ver; /* IP version (4, 6) */
uint16_t ip_plength; /* payload length */
string ip_saddr; /* source address */
string ip_daddr; /* destination address */
} ipinfo_t;
As noted in the DTrace documentation for this probe, ip_plength is the expected IP
payload length, so no further corrections need be applied.
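For instance, the distribution of transmitted segment sizes for the benchmark flow might be gathered as follows – note that, unlike the FBT probes, the DTrace tcp provider presents port numbers in host byte order (an assumption to verify against your system’s tcp.d translators):
# dtrace -n 'tcp:::send /args[4]->tcp_sport == 10141/ { @len = quantize(args[2]->ip_plength); }'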
Data for the two types of graphs described above is typically gathered at (or close to)
one endpoint in order to provide timeline consistency – i.e., the viewpoint of just the
client or the server, not some blend of the two timelines. As we will be measuring not
just data from packet headers, but also from the TCP implementation itself, we
recommend gathering most data close to the sender. As described here, it may seem
natural to collect information on data-carrying segments on the receiver (where they are
processed by tcp_do_segment), and to collect information on ACKs on the server (where
they are similarly processed). However, given a significant latency between client and
server, and a desire to plot points coherently on a unified real-time X axis, capturing
both at the same endpoint will make this easier.
It is similarly worth noting that tcp_do_segment’s entry FBT probe is invoked before the
ACK or data segment has been processed – so access to the tcpcb will take into account
only state prior to the packet now being processed, not the effects of that packet itself. For
example, if the received packet is an ACK, then printed tcpcb fields will not take that
ACK into account.
Experimental questions (part 2)
These questions supplement the experimental questions in the Lab 4 handout. Configure
the benchmark as follows: