Fingerprinting 802.11 Rate Adaptation Algorithms

Abstract—The effectiveness of rate adaptation algorithms is an important determinant of 802.11 wireless network performance. The diversity of algorithms that has resulted from efforts to improve rate adaptation has introduced a new dimension of variability into 802.11 wireless networks, further complicating the already difficult task of understanding and debugging 802.11 performance. To assist with this task, in this paper we present and evaluate a methodology for accurately fingerprinting 802.11 rate adaptation algorithms. Our approach uses a Support Vector Machine (SVM)-based classifier that requires only simple passive measurements of 802.11 traffic. We demonstrate that careful conversion of raw packet traces into input features for SVM is necessary for achieving high classification accuracy. We tested our classifier on the four rate adaptation algorithms available in MadWifi, the most popular open source driver for commodity wireless cards. The classifier performs with an accuracy of 95%-100%. We also show that the classifier is robust over a variety of network conditions if the training data includes a sufficient sampling of the range of an algorithm's behavior.

This work was supported in part by NSF grants CNS-0716460, CNS-0831427, CNS-0905186, CNS-0747177, CNS-0855201, CNS-1040648 and CNS-0916955. Any opinions, findings, conclusions or other recommendations expressed in this material are those of the authors and do not necessarily reflect the view of the NSF.

I. INTRODUCTION

802.11 supports multiple data transmission rates at the physical layer to allow senders to maximize throughput based on channel conditions. The modulation schemes used to encode data at lower rates are more robust to channel noise than those used for higher rates. If the channel quality is good, i.e., the signal-to-noise ratio (SNR) is high, then higher data rates will maximize throughput because the bit-error rate (BER) will be low. If the channel is noisy, lower data rates will maximize throughput because the high BER at higher data rates will lead to increased loss and MAC-layer backoffs, resulting in poor throughput.

Designing algorithms that allow wireless senders to converge to the optimal rate for prevailing channel conditions in a timely fashion is challenging due to the difficulty of determining the cause of packet loss [1], limitations of the PHY/MAC interface in commodity wireless cards [2], and the fact that the assumption that a higher transmission rate always results in higher loss in a given RF environment does not always hold [3]. Many attempts have been made to address this challenge, resulting in a large number of rate adaptation algorithms, with different algorithms performing best under different network conditions.

In this paper, we describe a method for identifying or fingerprinting the rate adaptation algorithms used in an 802.11 environment. We envision this capability as being part of a toolkit for automated performance analysis and debugging of production networks. The need for automated analysis and debugging has become increasingly urgent as 802.11 networks have grown to support large user populations. Client devices set their own configurations and connect and disconnect at will. Wireless network administrators have little control over, and knowledge of, network configurations, and cannot rely on cooperation from clients for performance analysis and debugging. Thus, practical performance analysis and debugging efforts for large-scale wireless networks such as [4], [5] are typically based entirely on passive monitoring, which requires no support or participation from clients. The presence of multiple rate adaptation algorithms introduces a new dimension of variability into 802.11 wireless networks. However, to the best of our knowledge, none of the passive monitoring-based performance analysis and debugging efforts in the literature consider the impact of 802.11 rate adaptation algorithms, despite the fact that the choice of rate adaptation algorithm can have a major impact on network throughput. The rate adaptation algorithm fingerprinting capability will provide additional information to passive monitoring systems to facilitate wireless network performance analysis.

We begin by investigating the details of the four open source rate adaptation algorithms from the popular MadWifi driver that constitute the test cases for our study. Manual examination of implementations shows that the algorithms can result in many possible rate change permutations depending on the timing and pattern of packet transmissions and losses. The large space of permutations precludes a fingerprinting approach based on explicitly enumerating all possible cases of an algorithm's behavior and suggests the need for a learning-based approach for algorithm classification.

We develop a rate adaptation classifier using Support Vector Machines, a state-of-the-art machine learning technique, using carefully selected input features from passive packet traces. We then conduct extensive experiments in a laboratory environment to identify combinations of features that result in the most accurate classification capability. Our results show that a classifier trained with a robust set of features can exhibit classification accuracy as high as 95%-100%. We show that careful selection of input features is
necessary for achieving high classification accuracy. We also demonstrate that a classifier generated in one set of network conditions can identify algorithms in a different set of network conditions as long as the training data includes a sufficiently broad sampling of an algorithm's behavior.

II. RELATED WORK

Given the potentially large impact rate adaptation algorithms can have on 802.11 network throughput, it is not surprising that many research efforts over the past decade have focused on these algorithms. Rate adaptation algorithms fall into two categories: those that use physical layer information such as signal-to-noise ratio (SNR) [6]–[9], and those that use frame level information such as packet loss and throughput [3], [10], [11]. Specialized rate adaptation algorithms have also been developed for vehicular wireless networks, e.g., [12], [13].

A number of projects such as [3], [8], [14] have focused on characterizing the performance of rate adaptation algorithms, analyzing whether particular algorithm design choices result in optimal throughput in a particular type of RF environment. Our work is complementary to these projects because it can be used to identify the algorithms deployed, and the knowledge of algorithm performance characteristics garnered from these other projects can be used to determine whether the rate adaptation algorithm is the cause of performance problems.

Several machine learning techniques, including SVMs, have been used to select optimal modulation and coding schemes based on physical layer parameters for MIMO systems (e.g., [15]). Such efforts are complementary to our work because they use SVMs to optimize throughput while our work uses SVMs for algorithm identification.

III. SUPPORT VECTOR MACHINES

This section introduces Support Vector Machines; for more details see [16]. We are interested in predicting the identity of rate adaptation algorithms based on their observed characteristics. In statistical machine learning, this can be cast into a classification problem.

Each run of a particular algorithm produces a series of feature vectors. We call each such feature vector x an instance. Each instance is represented by a d-dimensional real-valued vector x ∈ R^d. The label y ∈ {1, . . . , C} of an instance x is the identity of the algorithm underlying this run, where C is the number of distinct algorithms. For example, if a run of algorithm 2 produces 1000 feature vectors, we would have the following instance-label pairs: (x_1, y_1 = 2), (x_2, y_2 = 2), . . . , (x_1000, y_1000 = 2).

We have a training set which consists of multiple runs of all the algorithms under different conditions. The complete training set can be represented as a collection of n instance-label pairs {(x_i, y_i)}_{i=1}^n. A "test run" of an algorithm with unknown identity (but known to be one of those C algorithms) can be represented by its m instances {x_i}_{i=n+1}^{n+m}. Our goal is to infer the class labels y_{n+1} . . . y_{n+m} in the test run. Note by definition y_{n+1} = . . . = y_{n+m}. However, for computational convenience we adopt a two-stage procedure: in the first stage, we use a Support Vector Machine (SVM, discussed below) to predict the instance labels y_{n+1} . . . y_{n+m}, which may be inconsistent (not all the same). In the second stage, we compute the single consensus label of the test run by a majority vote of the predicted instance labels. That is, the consensus label is the one which appears most frequently in y_{n+1} . . . y_{n+m}.

To predict the instance labels, we train an SVM, which can be understood as a function R^d → {1, . . . , C} that attempts to map any instance x to its class label y. For simplicity, we describe the linear, binary-label case C = 2, and refer the reader to the literature for the multi-label case. In this case, it is customary to encode the labels equivalently as {−1, 1} instead of {1, 2}. A linear binary-label SVM estimates a real-valued function f : R^d → R with parameters w ∈ R^d and b ∈ R:

    f(x) = w^T x + b,

and predicts the class label according to sign(f(x)). Training amounts to selecting an f that performs the best on the training set {(x_i, y_i)}_{i=1}^n. Performance here is measured by the so-called hinge loss function

    L(f(x), y) = max(1 − y f(x), 0),

which is to be minimized. The hinge loss is a surrogate (and in fact, a convex upper bound) of the 0-1 loss, which is one if the prediction sign(f(x)) differs from the true label y, and zero otherwise. The hinge loss is preferred over the 0-1 loss because the former is easier to optimize due to its convexity.

Given the training set, training an SVM involves finding the function f̂ that minimizes the hinge loss on the training set, plus a regularization term:

    f̂ = argmin_f ∑_{i=1}^n L(f(x_i), y_i) + λ ||f||^2.    (1)

The first term is the total loss on the training set. The function that minimizes this total loss "does best" on the training set. Such a function, however, may not be the one that produces the most accurate predictions on future test instances. This is because minimizing training set loss has the danger of overfitting the training data. One way to prevent overfitting is to regularize f by its norm ||f||^2, with the intuition that we prefer a smoother function. The scalar λ balances training set loss and smoothness of f. The solution to the optimization problem (1) can be found efficiently using a quadratic program. The basic idea can be readily extended to multi-label cases. We use the multi-label linear SVM software SVMmulticlass [17].
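As an illustration of the objective above, the linear binary-label case can be trained with a few lines of NumPy. This is a simplified sketch only (subgradient descent on an averaged hinge loss with an L2 penalty, per Eq. (1)), not the SVMmulticlass implementation used in our experiments; all function and parameter names are illustrative.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Minimize the averaged hinge loss plus lam*||w||^2 by subgradient
    descent. Labels y must be encoded as {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # instances with nonzero hinge loss
        grad_w = -(y[active, None] * X[active]).sum(axis=0) / n + 2 * lam * w
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, X):
    """Predict the class label as sign(f(x)) with f(x) = w^T x + b."""
    return np.where(X @ w + b >= 0, 1, -1)
```

On linearly separable data this sketch recovers a separating hyperplane; a quadratic-program solver, as noted above, is the standard approach for the exact problem.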
We have used SVM-based methods in our earlier work on TCP throughput prediction for wireline and wireless environments [18], [19]. The use of SVM in this case is considerably different because in the earlier work, the input feature set was small (less than 5 features) and fairly obvious (e.g., path properties such as loss rate and queuing delay), and the time scale of interest was on the order of several seconds. However, in this case, the feature set is large and non-obvious as described in Section V-C, and the time scale of interest is on the order of tens of milliseconds.

IV. RATE ADAPTATION ALGORITHMS

The development of our classifier is based on the four 802.11 rate adaptation algorithms implemented by the latest version (0.9.4) of the popular open-source MadWifi driver [20]. We chose an open source driver so we could gain insight into algorithm behavior and validate the results of the classifier based on manual algorithm inspection. In this section we summarize the MadWifi rate adaptation algorithms. Our objective is to highlight the complexities of an algorithm's behavior, in particular the fact that an algorithm's behavior can change dramatically depending on the prevailing network conditions. The need for a learning-based classifier arises because the large number of packet rate and retransmission patterns that can occur with a given algorithm would be very difficult to enumerate explicitly.

The MadWifi driver is designed for wireless cards using Atheros chips, which implement multi-rate retries [21]. The Hardware Abstraction Layer (HAL) exports a retry chain consisting of 4 ordered pairs of rate/count values, r0/c0 through r3/c3. The hardware makes c0 attempts to transmit a given packet at rate r0, c1 attempts to transmit the packet at rate r1, and so on. Once the packet is successfully transmitted, the remainder of the retry chain is discarded.

The rate adaptation algorithms have three tasks: (a) to select rate r and count c values for the retry chain, (b) to determine the conditions under which the retry chain values are updated, and (c) to determine how often to check for the update condition. In the remainder of this section, we outline how the four algorithms perform these tasks. We present the algorithms in increasing order of complexity.

Onoe [22] tries to maximize throughput by selecting the highest transmission rate that results in a loss rate below a certain threshold. It uses a system of credits to decide whether to change the current rate r0. The credit associated with r0 is increased by one if less than 10% of packets in the last interval need retries, and r0 is increased to the next highest rate when the credit exceeds 10. The credit associated with r0 is decreased if more than 10% of packets need retries. r0 is decreased to the next lowest rate if the average number of retries per packet exceeds one. The interval for evaluating loss rate and updating credits is 0.5–1.0 seconds. r1 and r2 are set to the two rates consecutively below the current r0, and r3 is set to the lowest possible rate (6 Mbps in 802.11a/g). c0 is set to 4, and c1, c2 and c3 are set to 2. Since r0 is updated indirectly based on credits, and the retry chain is 10 packets long, Onoe is rather slow to adapt to changing network conditions.

AMRR [11], like Onoe, tries to maximize throughput by selecting the highest transmission rate that results in a loss rate below a certain threshold. If less than 10% of packets are lost in the last interval, the current r0 is increased to the next highest rate; if greater than 30% of packets are lost, it is decreased to the next lowest rate; otherwise it remains unchanged. Loss rate is evaluated every 10 packets. If a rate increase is attempted and it results in a loss, the interval for attempting the next increase is enlarged exponentially, up to a maximum of 50 packets. This is done to prevent unnecessary losses if the current transmission rate is the highest possible for the target loss rate. r1 and r2 are set to the two rates consecutively below the current r0, and r3 is set to the lowest possible rate. c0, c1, c2 and c3 are all set to 1 to make the retry chain shorter and the algorithm more responsive compared to Onoe.

Sample Rate [3] selects r0 by explicitly computing the rate most likely to maximize throughput in the prevailing network conditions, unlike Onoe and AMRR, which use the combination of loss minimization and rate maximization to estimate the best rate. Onoe and AMRR assume that a higher transmission rate will always result in a higher loss rate in a given environment. However, [3] shows this assumption to be incorrect, and shows that loss rate at a higher transmission rate may be lower depending on the modulation and encoding of the rates and the amount of noise in the RF environment. Motivated by these observations, Sample Rate explicitly computes throughput for a given rate based on the number of successful and failed transmissions and 802.11 parameters such as inter-frame spacing and ACK transmission time. Sample Rate changes r0 when another rate begins to yield better throughput. Since a rate other than the next highest or next lowest from the current r0 may yield the best throughput, the algorithm has to periodically sample all other rates. 10% of transmission time is used for sampling alternate rates. Rates for sampling are selected intelligently, with rates more likely to improve throughput selected more frequently. r0 is changed if sampling indicates that another rate will result in higher throughput. r1 and r2 are no longer set to the two next lowest rates after r0; rather, they are set to the rates with the next lowest throughputs. Throughput is reevaluated for the current r0 and other candidate rates periodically and smoothed using EWMA, with 5% of the weight coming from the last evaluation interval. r0 is changed if another rate's EWMA throughput is greater.

Minstrel [23] is the most advanced rate adaptation algorithm implemented by the MadWifi driver. It improves on two aspects of Sample Rate. First, it sets c0, c1, c2 and c3 based on r0, r1, r2 and r3 such that the retry chain
completes within 26 ms, a time limit selected to minimize TCP performance deterioration in case of losses. Second, it was noted that even with Sample Rate's intelligent selection, sampling alternate rates resulted in use of low rates and low throughput. Minstrel tries to avoid this problem by more sophisticated sampling rate selection, the complete details of which can be found in [23]. The throughput is calculated in a manner similar to Sample Rate. It is reevaluated for the current r0 and other candidate rates every 100ms. The value is smoothed using EWMA, with 25% of the weight coming from the latest 100ms interval, and r0 is changed if another rate's EWMA throughput is greater.

Fig. 1: Laboratory testbed used to generate wireless network traces for rate adaptation algorithm classification.

V. EXPERIMENTAL SETUP

In this section we describe our experimental testbed, how we used it to generate packet traces, and how we processed the traces for SVM-based algorithm classification.

A. Experimental Environment

Figure 1 illustrates the experimental testbed that we used to collect traces for classification. There are four primary components to the setup: wireline nodes, wireless nodes, one commodity access point (AP), and a monitor node. The wireline and wireless nodes are connected in a dumbbell topology via the AP and a switch. The AP is a Cisco AP1200, running IOS version 12.3(8), with a single Rubber Duck antenna and an integrated 802.11a module/antenna. The switch is a commodity LinkSys 10/100 16-port Workgroup Hub. The wireline nodes and switch are connected to the AP via 100Mbps Ethernet connections. The maximum data rate for 802.11a is 54Mbps. Having 100Mbps wireline links ensures that the wireless, rather than the wireline, part of the network is the throughput bottleneck.

The wireline and wireless nodes are identically configured Sun 4200 AMD Opteron 275 (dual core) nodes, with 4 GB RAM, Intel 82546EB (e1000) chips, running CentOS 5.2. The wireless nodes are installed with R52-350 mini-PCI cards (Atheros 5414 chip). We used default wireless interface configurations for our experiments. These were: (a) no RTS/CTS, and (b) no MAC layer fragmentation. Communication between wireless and wireline nodes is pairwise, i.e., during an experiment, each wireless node sends data to a single, pre-assigned, wireline node and vice versa. One wireline-wireless pair is designated the measurement pair. Algorithm classification is done for packet traces from the measurement node pair. The other two node pairs generate background traffic. We refer to them as background pair one and background pair two. The monitor, an AirPcapNx adapter [24], is located next to the measurement node for all experiments. We use 802.11a channel 36 for all our experiments.

B. Experimental Protocol

The measurement node transferred 8MB files with 5 seconds between transfers. There were three levels of background traffic: no background traffic, one node pair generating background traffic, and two node pairs generating background traffic. The first background node pair transfers 4MB files with an interval of 2 seconds between transfers, and the second pair transfers 512KB files with 1 second between transfers. Both background nodes use Sample Rate, the default MadWifi rate adaptation algorithm.

For each background traffic level, 100 files were transferred for each of the four rate adaptation algorithms. The file transfers for the different algorithms were interleaved in sets of 25 to compensate for possible external interference that could affect the integrity of classification.

The training set consists of 10% of samples selected uniformly at random from the first half of the transfers at each background traffic level. The test set consists of all samples from the second half of transfers at each background traffic level. We constructed 5 different training sets via random sampling, and tested all of them using the second half of the transfers. The classification accuracy values in Section VI are the averages of the five runs.

C. Feature Selection for Algorithm Classification

We process packet traces for rate adaptation algorithm classification in the following manner. We generate a training/test feature vector for every instance where the transmission rate of the kth and (k+1)th 802.11 data packet transmitted by the measurement node is different. The rate transition is the center of the packet window over which we compute features. The feature values were normalized by subtracting the mean of each feature from the respective feature values and then dividing by the standard deviation before training and testing.

We use two different feature sets, detailed below. The difference in the feature sets is that they represent information at varying levels of granularity for training and
testing. Feature Set 1 aggregates the information in the feature vector window, while Feature Set 2 exposes detailed packet trace information. The results in Section VI will show that having packet trace information at varying levels of granularity is essential for accurate classification.

Feature Set 1: The following features are computed for a window of size m1 around a rate transition event. Features (1)–(4) are computed once for the entire window, and (5)–(12) are computed individually for each of the eight 802.11a data rates. Each feature in the list below corresponds to two distinct features, one computed for a window of size m1 before the rate transition event, and the second for a window of size m1 after the rate transition event.

(1) The number of packets in the window.
(2) The packet reception probability, defined as the number of non-retry packets divided by the total number of packets in the window.
(3) The fraction of packets in the window that are unique, defined as the number of distinct 802.11 sequence numbers observed divided by the total number of packets.
(4) The trace accuracy, defined as 1 − (# of missing 802.11 sequence numbers / total # of packets).
(5) The number of packets with each rate in the window.
(6) The number of packets with each rate that are retries in the window.
(7,8,9) The minimum, median, and maximum distance in packets for packets with each rate from the center of the window. Distance in packets is calculated in terms of the number of packets in the trace rather than packet sequence numbers. When a packet of a particular rate does not occur in a window, distance is set to a constant high value.
(10,11,12) The minimum, median, and maximum distance in packets for a retry packet with each rate from the center of the window.

Hence, for any value of m1, there are a total of 136 features: 2 ∗ 4 for the four features (1–4) before and after the center of the window, and 8 ∗ 8 ∗ 2 for the eight features (5–12) for each of the eight rates before and after the center of the window.

We consider the following different values of m1: 100ms, 200ms, 300ms, 400ms, 500ms, 10 packets, 20 packets, 30 packets, 40 packets, 50 packets, and 1-5 802.11 retries around a packet transition event.

Feature Set 2: For this set, the following three features corresponding to each packet in a window of size m2 are included in the feature vector:

(1) The transmission rate of the packet.
(2) A binary value indicating whether the packet is a retry.
(3) The difference in sequence numbers between the packet and the center of the window.

We use the following values of m2: 5, 10, 15, 20, 25, 30, 35, 40, 45 and 50 packets. In this case, since there are 3 features per packet and the window size is m2 packets before and after a rate transition, the total number of features per feature vector is 3 ∗ ((2 ∗ m2) + 1).¹

Finally, we constructed an all features vector, which is the concatenation of feature vectors for all window sizes for both feature sets, and contains 3720 features. In Section VI, due to space considerations, we present classification accuracy results for all features, all window sizes for Set 2, and 10, 20, 30, 40 and 50 packet window sizes for Set 1, because these feature sets yielded the most interesting results.

VI. RESULTS

We conducted two sets of experiments one week apart, which we refer to as Experiment 1 and Experiment 2, both using the protocol described in Section V-B. For results presented in Section VI-A, training and test sets, constructed as described in Section V-B, were drawn from the same experiment. For results presented in Section VI-B, the classifier trained on data from Experiment 1 was tested on data from Experiment 2 and vice versa to investigate the portability and robustness of the classifier.

We considered two classification metrics, transfer classification accuracy and sample classification accuracy. The first metric indicates whether the rate adaptation algorithm for the whole transfer is classified correctly. The classification for the whole transfer is determined by the majority of the classifications of the individual samples (feature vectors) in the transfer. We report the fraction of transfers classified correctly and incorrectly for each rate adaptation algorithm. The second metric is defined as the percentage of samples classified as a particular algorithm for each transfer. This is a measure of the confidence we can have in the classification of a transfer. We used this metric to guard against accuracy inflation, because a transfer classification is considered correct whether 100% of the samples from the transfer are classified correctly or whether 25.1% of the samples are classified correctly and the other 74.9% of the samples are split evenly between the remaining three classes. We found that sample classification accuracy closely followed transfer classification accuracy. Due to space considerations, we only present transfer classification accuracy in this paper.

A. Classification Accuracy

Figure 2 illustrates the algorithm classification accuracy for the case where the training and test sets are drawn from the same experiments. We present results from two different runs of the same experiment, conducted in the same laboratory environment but one week apart, because wireless network conditions are difficult to replicate due to external interference effects. Table I presents the distribution of transmission rates for the two experiments, and it can be seen that there is a significant difference in the distributions in many cases.

¹ +1 for the packet at the center of the window.
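The Feature Set 2 construction described in Section V-C can be sketched as follows. This is an illustrative simplification: it centers each window on the kth packet of a rate transition, skips windows truncated at trace boundaries, and assumes a per-packet (rate, is_retry, sequence number) trace representation; the names are ours, not from any particular implementation.

```python
def feature_set2_vectors(trace, m2=5):
    """trace: list of (rate_mbps, is_retry, seq) tuples, one per captured
    802.11 data packet. Returns one flat vector per rate transition,
    covering m2 packets on each side plus the center packet, i.e.
    3 * (2*m2 + 1) features (rate, retry flag, sequence offset)."""
    vectors = []
    for k in range(len(trace) - 1):
        if trace[k][0] == trace[k + 1][0]:
            continue  # no rate transition between packets k and k+1
        lo, hi = k - m2, k + m2 + 1  # window centered on packet k
        if lo < 0 or hi > len(trace):
            continue  # skip windows truncated by the trace edges
        center_seq = trace[k][2]
        vec = []
        for rate, retry, seq in trace[lo:hi]:
            vec.extend([rate, int(retry), seq - center_seq])
        vectors.append(vec)
    return vectors
```

Each resulting vector would then be z-score normalized, as described above, before SVM training and testing.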
Fig. 2: Stacked histograms illustrating the fraction of transfers classified for each algorithm for Experiment 1 and Experiment 2, conducted one week apart in the same laboratory setting. In each case, the training and test set was drawn from the same experiment. The individual figure labels indicate the experiment and the correct algorithm (panels include (b) Experiment 2, AMRR and (d) Experiment 2, Minstrel). The x-axis indicates the feature set (1 or 2) and window size used. The y-axis indicates the fraction of transfers classified for a certain algorithm and feature set; e.g., Set 1, 20 pkts for Experiment 2, Onoe shows that approximately 70% of transfers were classified correctly as Onoe, 25% were classified incorrectly as Minstrel and 5% were classified incorrectly as AMRR. Algorithms are indicated in the legends.
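The transfer classification reported in Figure 2 is determined by a majority vote over the per-sample SVM predictions, as described in Section VI. A minimal sketch (the helper names are illustrative, not from any particular implementation):

```python
from collections import Counter

def consensus_label(predicted):
    """Consensus label of one transfer: the algorithm predicted for the
    most samples (feature vectors) wins the majority vote."""
    return Counter(predicted).most_common(1)[0][0]

def transfer_accuracy(runs, truth):
    """Fraction of transfers whose consensus label matches ground truth."""
    correct = sum(consensus_label(r) == t for r, t in zip(runs, truth))
    return correct / len(runs)
```

For example, a transfer whose samples are predicted 70% Onoe, 25% Minstrel and 5% AMRR is classified as Onoe.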
TABLE I: Distribution of packets across transmission rates for four target algorithms. The first column indicates the algorithm
(algorithms are identified by the first letter in their names), the background traffic level (nobg for no background traffic,
bg1 for one background node pair generating traffic, and bg2 for two background node pairs generating traffic), and the
experiment number (1 or 2). The columns labeled with rates indicate the percentage of packets in each experiment transmitted
at that rate. Rate Change indicates the percentage of times two consecutive packets were transmitted at different rates. Retry
indicates the percentage of packets that were retries. Consecutive Retry indicates the percentage of times the k+1th packet
was a retry given that the kth packet was a retry. This is an estimate of how far down the retry chain the algorithm had to
go in case of a loss. The last three values are measures of the information content of a packet trace. Algorithms that have
higher values are likely to be classified with higher accuracy because they provide SVM with a larger amount of information
over a given time window, enabling SVM to generate a better distinguishing signature. All values are aggregates for all
transfers in a given experiment.
Experiment 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps Rate Change Retry Consec. Retry
a, nobg, 1 0.3 0.0 0.1 0.7 9.6 78.1 11.1 0.0 31.8 16.5 6.2
a, nobg, 2 0.4 0.1 0.6 1.3 15.3 77.2 5.0 0.0 30.3 15.7 6.2
a, bg1, 1 0.3 0.1 1.3 6.7 37.7 51.2 2.8 0.0 28.6 15.4 9.8
a, bg1, 2 0.6 0.6 2.8 8.7 35.7 50.0 1.6 0.0 29.1 15.7 10.9
a, bg2, 1 0.3 0.7 1.4 5.4 31.3 55.4 5.5 0.0 30.7 16.4 9.2
a, bg2, 2 0.6 0.5 2.4 8.1 48.7 39.4 0.3 0.0 31.6 17.0 11.0
m, nobg, 1 0.0 0.0 0.1 0.2 0.5 92.4 4.0 2.8 7.4 14.2 42.9
m, nobg, 2 0.3 0.0 0.0 0.6 2.8 89.7 3.9 2.5 10.3 23.0 46.8
m, bg1, 1 0.2 0.1 0.1 0.3 2.8 88.9 5.1 2.6 13.1 24.6 44.9
m, bg1, 2 0.5 0.1 0.2 1.5 15.5 75.9 3.9 2.4 13.4 27.8 47.8
m, bg2, 1 0.4 0.0 0.1 0.5 1.9 89.5 4.9 2.6 10.5 22.7 41.3
m, bg2, 2 0.6 0.1 0.3 1.2 12.6 78.5 4.3 2.4 14.9 30.3 50.8
o, nobg, 1 0.0 0.0 0.0 0.0 4.9 94.8 0.2 0.0 4.1 10.9 32.8
o, nobg, 2 0.2 0.0 0.0 0.5 46.7 52.6 0.0 0.0 5.8 15.1 38.7
o, bg1, 1 0.2 0.0 0.0 0.7 33.0 66.2 0.0 0.0 7.7 18.8 30.4
o, bg1, 2 0.3 0.0 0.1 1.8 81.8 16.0 0.0 0.0 5.4 17.3 26.8
o, bg2, 1 0.3 0.0 0.0 1.4 56.1 42.1 0.0 0.0 5.8 17.2 26.6
o, bg2, 2 0.3 0.0 0.1 1.9 82.4 15.2 0.0 0.0 5.1 17.2 26.7
s, nobg, 1 0.2 0.0 0.2 2.9 41.5 54.5 0.7 0.1 9.7 8.2 23.9
s, nobg, 2 0.4 0.0 0.5 11.9 61.7 25.2 0.2 0.0 10.7 9.0 25.9
s, bg1, 1 1.1 0.0 2.9 18.9 49.2 27.4 0.4 0.1 11.6 16.8 25.5
s, bg1, 2 0.8 0.0 4.5 24.7 57.2 12.7 0.1 0.0 11.3 16.3 25.3
s, bg2, 1 1.7 0.0 4.6 17.1 40.3 36.0 0.2 0.1 13.2 17.9 29.7
s, bg2, 2 0.9 0.0 4.7 22.5 54.6 17.0 0.1 0.0 11.8 16.6 25.8
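The last three columns of Table I can be computed directly from a packet trace. The sketch below is an illustrative helper, assuming a per-packet (rate, is_retry) representation of the trace:

```python
def trace_stats(trace):
    """trace: list of (rate_mbps, is_retry) per packet. Returns the three
    information-content measures from Table I, as percentages: rate changes
    between consecutive packets, retries, and retries that immediately
    follow another retry (conditional on the previous packet's retry)."""
    n = len(trace)
    rate_change = sum(trace[k][0] != trace[k + 1][0] for k in range(n - 1)) / (n - 1)
    retry = sum(r for _, r in trace) / n
    followups = sum(trace[k][1] and trace[k + 1][1] for k in range(n - 1))
    prior_retries = sum(r for _, r in trace[:-1])
    consec = followups / prior_retries if prior_retries else 0.0
    return 100 * rate_change, 100 * retry, 100 * consec
```

Algorithms with higher values of these measures expose more behavioral signal per time window, which is consistent with the table's observation about classification accuracy.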
Figure 2 shows that for all algorithms except Onoe (2e, 2f), the classification accuracy for all features is high, ranging from 95% to 100%. Also, for all cases except Experiment 1, Onoe (2e), the accuracy for all features is very close to that of the highest accuracy individual feature set for that experiment. Both observations support our learning-based approach. Their combination means two things. First, classification accuracy is high. Second, feature set and win-