
1 - 6 - RPA A FlexibleScheduling Algorithm For Input Buffered Switches

The document discusses the RPA (Reservation with Preemption and Acknowledgment) scheduling algorithm for input buffered switches, which aims to optimize data transfer efficiency by allowing input ports to indicate their most urgent transfer needs. RPA is designed to overcome throughput limitations associated with traditional input queuing methods and can support multiple traffic classes with strict priority disciplines. The paper evaluates RPA's performance through simulations and compares it with other scheduling algorithms, demonstrating its flexibility and effectiveness under various traffic conditions.

RPA: A flexible scheduling algorithm for input buffered switches

Article in IEEE Transactions on Communications · January 2000


DOI: 10.1109/26.809713 · Source: IEEE Xplore



IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 47, NO. 12, DECEMBER 1999 1921

RPA: A Flexible Scheduling Algorithm for Input Buffered Switches
Marco Ajmone Marsan, Fellow, IEEE, Andrea Bianco, Member, IEEE, Emilio Leonardi, and Luigi Milia

Paper approved by P. E. Rynes, the Editor for Switching Systems of the IEEE Communications Society. Manuscript received June 30, 1998; revised April 19, 1999 and April 28, 1999. This work was supported in part by a research contract between Politecnico di Torino and CSELT, and in part by the Italian Ministry for University and Research. This paper was presented in part at the IEEE Symposium on Computers and Communications (ISCC), Athens, Greece, June 1998, and also in part at the SCS Symposium on Performance Evaluations of Computer and Telecommunications Systems (SPECTS), Chicago, IL, July 1999.
M. Ajmone Marsan, A. Bianco, and E. Leonardi are with the Dipartimento di Elettronica, Politecnico di Torino, 10129 Turin, Italy.
L. Milia is with Centro Ricerche Fiat, 10043 Orbassano, Italy.
Publisher Item Identifier S 0090-6778(99)09774-3.
0090–6778/99$10.00 © 1999 IEEE

Abstract: This paper presents and evaluates a quasi-optimal scheduling algorithm for input buffered cell-based switches, named reservation with preemption and acknowledgment (RPA). RPA is based on reservation rounds where the switch input ports indicate their most urgent data transfer needs, possibly overwriting less urgent requests by other input ports, and an acknowledgment round to allow input ports to determine what data they can actually transfer toward the desired switch output port. RPA must be executed during every cell time to determine which cells can be transferred during the following cell time. RPA is shown to be as simple as the simplest proposals of input queuing scheduling, efficient in the sense that no admissible traffic pattern was found under which RPA shows throughput limitations, and flexible, allowing the support of packet-mode operations and different traffic classes with either strict priority discipline or bandwidth guarantee requirements. The effectiveness of RPA is assessed with detailed simulations in uniform as well as unbalanced traffic conditions and its performance is compared with output queuing switches and the optimal maximum weighted matching (MWM) algorithm for input-buffered switches. A bound on the performance difference between the heuristic weight matching adopted in RPA and MWM is analytically computed.

Index Terms: Input buffering, scheduling algorithms, switch architectures.

I. INTRODUCTION

Switch designs with output buffers are very popular because they provide optimal performance. However, output buffered switch designs require a switching fabric speedup equal to $N$, i.e., an internal data transfer rate $N$ times higher than the external link speed, $N$ being the number of switch input–output ports. Since data rates on point-to-point fiber links keep growing very rapidly, providing within switching fabrics the required speedup for output buffered designs is becoming harder. Moreover, when adopting the output buffered architecture, within a cell time, $N$ cells could need to be transferred to the same output port, thus requiring either a memory with access rate $N$ times the link speed, or complex parallel memory architectures. For these reasons, nonoutput buffered switch designs have recently received a lot of attention.

Input buffered switches are designed to operate with a switching fabric running at an internal rate equal to the external link speed; unfortunately, when using a first-in first-out (FIFO) queuing discipline at input queues, due to the head-of-the-line (HoL) blocking phenomenon, they provide a maximum achievable throughput limited to about 60% of the link speed, under uniform traffic conditions [1]. In order to either reduce, or completely overcome, the throughput penalty, separate queues are required at each input port for the storage of cells directed to different output ports.

Once cells have been sorted into separate queues, the performance of an input buffered switch essentially depends on its scheduling algorithm. With the term scheduling algorithm, we refer in this paper to the algorithm required to select the cells that must be transferred within a cell time from input queues to output ports, with the constraint of selecting not more than one cell from each input port and not more than one cell directed to each output port. These algorithms must be efficient, to provide throughput comparable with that of output buffered switches, and simple, since they must be executed within a cell time in high-speed switches.

A different issue is the ability to deal with the requirements of different traffic classes in input buffered switches, i.e., providing the means to give priority to cells belonging to classes of traffic with more stringent quality-of-service (QoS) requirements. Unfortunately, the algorithms used for this purpose in the literature are also called scheduling algorithms; we will use the term QoS-aware scheduling for such algorithms [2], [3].

Several scheduling algorithms were presented in the technical literature to address these issues (mostly trying to overcome HoL blocking) in input buffered switches under uniform traffic conditions [4]–[7]. Most of them focus on architectures where separate queues are available at each input port, each one storing cells directed toward a different output port. However, those algorithms do not succeed in providing the maximum achievable throughput under nonuniform traffic patterns. Some of them can even lead to starvation in some traffic scenarios.

To the best of our knowledge, the first proposal of an algorithm that leads to the optimal and fair exploitation of the switch bandwidth under every admissible traffic pattern appeared recently (see the maximum weighted matching (MWM) algorithm of [8]). It is based on solving a bipartite graph
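The HoL blocking limit quoted in the Introduction can be reproduced with a short simulation. The sketch below is not taken from the paper: it assumes a saturated switch (every input always has a cell at the head of its single FIFO queue), destinations drawn uniformly at random, and a random choice among the inputs contending for the same output. The function name `hol_saturation_throughput` and its parameters are illustrative only.

```python
import random


def hol_saturation_throughput(n_ports: int, n_slots: int, seed: int = 0) -> float:
    """Estimate the per-port throughput of a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # Under saturation every input always has a head-of-line (HoL) cell;
    # only its destination matters, drawn uniformly over the outputs.
    hol_dest = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group the inputs by the output requested by their HoL cell.
        contenders = {}
        for inp, out in enumerate(hol_dest):
            contenders.setdefault(out, []).append(inp)
        # Each requested output serves one contender chosen at random;
        # the losers stay blocked behind their HoL cell.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            hol_dest[winner] = rng.randrange(n_ports)  # next cell reaches the head
            delivered += 1
    return delivered / (n_ports * n_slots)
```

For a 16-port switch, `hol_saturation_throughput(16, 20000)` should return a value close to the roughly 60% limit mentioned above; the exact figure depends on the port count and on the random seed.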

matching problem, where one node partition is associated with input ports and the other node partition with output ports; edges are labeled with a number representing the edge weight according to a suitable criterion, and the algorithm selects the matching that provides the maximum total weight. Unfortunately, these algorithms are not simple to implement, requiring a computational complexity $O(N^3 \log N)$, and they do not allow the management of different traffic classes. As a consequence, they may not be adequate for the solution of all problems in the implementation of large high-speed switches.

Simpler solutions can be based on a maximum size matching (MSM) algorithm (that can be seen as a MWM with weight set to 1 on all edges), whose complexity is $O(N^{5/2})$. However, MSM algorithms can lead to starvation under inadmissible traffic (i.e., with traffic patterns not satisfying the relations $\sum_i \lambda_{ij} \le 1$, $\sum_j \lambda_{ij} \le 1$, where $i, j$ represent, respectively, the input and output port indices, and $\lambda_{ij}$ is the average load from input port $i$ to output port $j$), and instability, i.e., throughput loss, under admissible traffic.

In this paper, we describe reservation with preemption and acknowledgment (RPA), a scheduling algorithm previously presented and examined by the authors in [9] and [10], whose computational complexity is only $O(N^2)$. RPA leads to an efficient and fair exploitation of the switch bandwidth under several critical traffic conditions; at the same time, it provides simple approaches to deal with different traffic classes and packet-mode (as opposed to cell-by-cell) operations.

The novel contribution of this paper is threefold. First, we show how the original RPA algorithm [9] can be adapted to deal with multiple traffic classes providing a strict priority discipline, presenting detailed simulation results under several traffic patterns and comparing RPA with output buffered architectures and with MWM. Part of these results was presented in [10], although here we consistently use confidence intervals in all simulation runs. Second, we extend RPA to jointly deal with cell scheduling and QoS-aware scheduling, although in this context, performance is less satisfactory than that obtained in output buffered switches using well-known QoS-aware scheduling [3]. Third, we prove that the weight of the matching obtained with RPA in the worst case cannot be less than half the maximum weight, which is obtained with the MWM algorithm.

II. RELATED WORK

Several scheduling algorithms for input buffered switches were recently presented. Among them are iSLIP in [6], 2DRR in [11], iLQF and iOCF in [12], RPA in [9], MUCS in [13], and iLPF in [14].

Their characteristics can be examined from several points of view; for a detailed comparison in terms of throughput, delay, complexity, and flexibility, see [15]. We provide in this paper a brief taxonomy and highlight only the peculiar characteristics of RPA.

Given the complexity of the MWM (and MSM) algorithm, most of the algorithms proposed in the literature rely on heuristics to approximate the behavior of an exact MWM or MSM algorithm. The heuristic algorithm characterizes and distinguishes the various proposals.

Whereas iSLIP and 2DRR propose heuristics to mimic MSM, all other proposals (including RPA) try to emulate the more efficient (but more complex) MWM algorithm. RPA, as detailed in the following sections, requires ordered decisions at each input port; decisions are taken on the basis of some local information and a limited amount of global information, stored in a reservation array. This approach can be implemented in distributed architectures, but it makes RPA difficult to run on parallel processors, although a time-pipeline can be envisioned.

Another important aspect is related to the metrics used by the algorithms to choose a subset of cells to be transmitted in each cell time (via the edge weights). Excluding the MSM-like algorithms, whose admissible edge weights are only 0 (no cell to transfer) and 1 (at least one cell to transfer), we can identify the following metrics: 1) queue length (QL) metrics, where edge weights are associated with the queue length, used in iLQF and RPA; 2) port occupation (PO) metrics, where edge weights are based on input and output port occupation, used in iLPF; 3) MUCS metrics, a peculiar metric defined and used in MUCS; and 4) oldest cell (OC) metrics, where edge weights are computed as cell queuing delays, adopted in iOCF. Of course, each metric has its advantages (simplicity for QL, good delay control for OC) and disadvantages (potential starvation for lightly loaded queues for QL, implementation complexity for OC and MUCS, inefficiency under nonuniform traffic for PO). Note that although RPA and iLQF share the same metrics, since they are based on different heuristic algorithms, they provide (slightly) different performance.

Concerning algorithmic complexity, measured as the number of operations to be executed on a single central processor running the scheduling algorithm, we showed in [15] that MUCS has complexity […], although it has been efficiently implemented in analog hardware; iSLIP, iLQF, and iOCF have complexity $O(N^2 \log N)$, adopting the suggested number of iterations, $\log N$ [6]; and RPA, iLPF, and 2DRR have complexity $O(N^2)$. As a consequence, RPA shares with 2DRR and iLPF the lowest complexity; however, RPA, 2DRR, and iLPF cannot be run on parallel processors, a nice feature presented by iSLIP, iLQF, and iOCF, which could reduce the algorithmic complexity, if measured as the number of operations to be executed on each parallel processor. Recall that the scheduling algorithm must be executed during every cell time to determine which cells must be transferred during the following cell time. It is thus very important to keep complexity at a minimum.

RPA can be easily adapted to run in packet-mode operation, it can support multiple classes of traffic in a strict priority discipline, and it can support QoS-aware scheduling algorithms. The only previous proposal of QoS-aware scheduling algorithm for IQ switches that appeared in the literature so far is called weighted probabilistic iterative matching (WPIM) [16]. WPIM allows the allocation of the bandwidth on each output channel to connections entering the switch at different inputs. However, WPIM cannot separately consider the bandwidth allocations to flows with the same input and output
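To make the two matching problems discussed in the Introduction concrete, the following sketch computes an MWM and an MSM by exhaustive search over all input-to-output permutations. This is only practical for very small switches and is not the algorithm of [8]; the function names, and the use of queue lengths as edge weights in the example, are assumptions made for illustration.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Return (total_weight, matching) found by trying every permutation.

    weights[i][j] is the weight of the edge from input i to output j
    (0 means there is no cell to transfer).
    """
    n = len(weights)
    best_weight, best_perm = -1, None
    for perm in permutations(range(n)):
        total = sum(weights[i][perm[i]] for i in range(n))
        if total > best_weight:
            best_weight, best_perm = total, perm
    # Zero-weight edges carry no cell, so they are left out of the matching.
    matching = [(i, j) for i, j in enumerate(best_perm) if weights[i][j] > 0]
    return best_weight, matching


def max_size_matching(weights):
    """MSM seen as an MWM in which every nonempty queue has weight 1."""
    unit = [[1 if w > 0 else 0 for w in row] for row in weights]
    return max_weight_matching(unit)


# Example with queue lengths as weights on a 3x3 switch.
queue_lengths = [[9, 2, 0],
                 [8, 0, 0],
                 [0, 0, 3]]
print(max_weight_matching(queue_lengths))  # (13, [(0, 1), (1, 0), (2, 2)])
print(max_size_matching(queue_lengths))    # (3, [(0, 1), (1, 0), (2, 2)])
```

The search visits $N!$ permutations, which is why practical schedulers rely on the heuristics surveyed in Section II rather than on an exact solution.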

ports (WPIM implements what the authors call a ‘connection-level’ scheduling, as opposed to a ‘flow-level’ scheduling). Moreover, the random choices on which WPIM is based limit the maximum throughput on each output channel to about 90%.

Finally, RPA can be also modified to handle multicast traffic, using an approach similar to that adopted in the iPOINT multicast contention resolution algorithm (iMCRA), since both scheduling algorithms are based on the use of a vector where the information for contention resolution is stored [17]. We do not tackle the multicast problem in this paper, whereas iMCRA mainly focuses on this issue. Other multicast scheduling algorithms for input-queued switches were presented in the literature, for example in [18].

III. RPA SCHEDULING

RPA stands for reservation with preemption and acknowledgment. The name indicates that the scheduling algorithm is based on a reservation round where the switch input ports can indicate their most urgent cell transfer needs, possibly overwriting less urgent requests by other input ports, and an acknowledgment round to allow input ports to determine what cell they can actually transfer toward the desired switch output port.

Denote by $N$ the number of input and output ports of the switch, and label input and output ports with a subscript $i$. We shall denote by $I_i$ and $O_i$, respectively, the $i$th input and output ports.

In the description of RPA, we first concentrate on the case of one traffic class, in Section III-A, providing some indications about the extensions required to implement packet-mode operations. Then, in Section III-B, we illustrate priority-RPA (P-RPA), the version of RPA that is capable of supporting multiple traffic classes with a strict priority discipline. Finally, in Section III-C, we describe fair-RPA (F-RPA), the version of RPA supporting QoS-aware scheduling.

A. RPA with One Class of Traffic

RPA requires, at each input port, the availability of $N$ separate FIFO queues for the storage of cells directed to different output ports. More precisely, the $j$th queue at $I_i$, denoted by $Q_{ij}$, contains cells directed from the considered input port toward $O_j$.

RPA operates on an array where reservations are written by input ports at each cell time. This reservation array will be denoted by $R$; it contains $N$ elements, denoted by $R[j]$, that refer to the output ports in order ($R[j]$ refers to $O_j$).

Each element comprises three fields.
1) The address field of $R[j]$ contains the address of the input port trying to reserve a cell transmission to $O_j$.
2) The urgency field of $R[j]$ contains the urgency (a value […]) of the cell transfer from the reserving input port to $O_j$. The urgency defines the importance of the cell transfer; several approaches may be adopted for the computation of the urgency values; packet-mode operations as well as multiple classes of traffic can be managed with clever definitions of the urgency, as we shall see.
3) The busy field of $R[j]$ is set to 1 during the acknowledgment round by the input port being granted the right to transfer a cell to $O_j$.

The three fields are initialized to null values at the beginning of every reservation round.

Input ports access array $R$ following a preestablished order. For the description of the RPA operations, suppose that, within some arbitrary cell time, $I_0$ is selected as the first input port in the access order; the other input ports follow, for example, in ascending subscript order.

In the reservation round, $I_0$ selects its most urgent cell (i.e., the cell with the highest urgency among the cells at the head of its input queues $Q_{0j}$) and issues a reservation for a cell transfer toward the output port of this most urgent cell. If the output port to which the most urgent cell must be transferred is $O_j$, port $I_0$ records its index in the address field of $R[j]$, and the urgency of the selected cell in the urgency field of $R[j]$.

The reservation array is accessed next by $I_k$, the input port following $I_0$ in the access order (if the order is by increasing index, $k = 1$). Input port $I_k$ first evaluates a weight function for each output port $O_j$

$$W_k(j) = U(Q_{kj}) - R[j].\mathrm{urgency}$$

where $U(Q_{kj})$ is the urgency of the cell at the head of queue $Q_{kj}$. Then $I_k$ computes $\max_j W_k(j)$ and records the value $j^*$ at which the function reaches its maximum. If $W_k(j^*) > 0$, $I_k$ is allowed to issue a reservation for a cell transfer toward $O_{j^*}$, writing its index in the address field of $R[j^*]$ and $U(Q_{kj^*})$ in the urgency field of $R[j^*]$; this implies that $I_k$ may overwrite a previous reservation for a cell transfer toward $O_{j^*}$, thus preempting it. If $W_k(j^*) \le 0$, no reservation can be issued.

The reservation round then continues; array $R$ is next processed by the input port that follows $I_k$ in the access order, which executes the same algorithm. After all input ports have processed array $R$, the reservation round terminates.

At this point, the acknowledgment round immediately starts, and array $R$ is processed for the second time by all input ports in the same order followed during the reservation round.

First, $I_0$ checks whether its reservation toward $O_j$ has been overwritten by any other input port with more urgent cell transfers toward $O_j$. If the reservation has not been overwritten, $I_0$ is granted access to $O_j$, and the busy field of $R[j]$ is set to 1. Otherwise, $I_0$ cannot transfer a cell to $O_j$; however, if at least one “idle” output port is found, i.e., an output port toward which no reservation has been made (its address field is null), $I_0$ is allowed to transfer a cell toward any one of the idle output ports. In practice, $I_0$ selects the idle output port for which it has the most urgent cell and sets to 1 the corresponding busy field. Note that in an $N \times N$ switch, if a reservation has been overwritten, at least one idle output port exists. The same may not be true when the number of output ports is smaller than the number of input ports.

Next, port $I_k$ processes array $R$ to check whether its reservation toward $O_{j^*}$ has been overwritten; if not, it can access the desired output port, and it sets the busy field to
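The reservation and acknowledgment rounds of single-class RPA (Section III-A) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the weight of an output is taken as the input's own urgency minus the urgency already stored for that output, ties are broken in favor of the lowest output index, an urgency of 0 stands for both an empty queue and a null reservation, and an output counts as idle when it holds no reservation and has not been granted yet. The names `rpa_schedule`, `addr`, `urg`, `busy`, and `wanted` are hypothetical.

```python
def rpa_schedule(urgency, order=None):
    """Return {input: output} grants for one cell time.

    urgency[i][j] is the urgency of the cell at the head of queue Q_ij
    (0 when the queue is empty); order is the access order of the inputs.
    """
    n = len(urgency)
    order = list(range(n)) if order is None else list(order)
    addr = [None] * n    # input holding the reservation for each output
    urg = [0] * n        # urgency of that reservation (0 plays the role of null)
    busy = [False] * n   # set when an output is granted in the acknowledgment round
    wanted = {}          # output for which each input issued its reservation

    # Reservation round: each input picks the output maximizing its weight
    # and overwrites the stored reservation only if that weight is positive.
    for k in order:
        best = max(range(n), key=lambda j: urgency[k][j] - urg[j])
        if urgency[k][best] - urg[best] > 0:
            addr[best], urg[best] = k, urgency[k][best]
            wanted[k] = best

    # Acknowledgment round: same access order as the reservation round.
    grants = {}
    for k in order:
        j = wanted.get(k)
        if j is not None and addr[j] == k:
            busy[j] = True              # reservation survived: grant it
            grants[k] = j
            continue
        # No surviving reservation: fall back to an idle output, if any,
        # choosing the one for which this input has the most urgent cell.
        idle = [o for o in range(n)
                if addr[o] is None and not busy[o] and urgency[k][o] > 0]
        if idle:
            chosen = max(idle, key=lambda o: urgency[k][o])
            busy[chosen] = True
            grants[k] = chosen
    return grants


# Queue lengths used as urgencies: input 1 preempts input 0 on output 0,
# and input 0 then falls back to the idle output 1.
print(rpa_schedule([[4, 1, 0], [6, 0, 0], [0, 0, 2]]))  # {0: 1, 1: 0, 2: 2}
```

Passing a rotated `order` at every cell time gives a simple dynamic variant of the access order, while leaving it unset corresponds to a static order.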

1. Otherwise, it checks whether any idle output port exists; if some idle output ports are found, $I_k$ is granted access toward the output port for which it has the most urgent cell, and the appropriate busy field is set to 1.

This acknowledgment algorithm is orderly executed by all input ports; when all input ports have processed array $R$, the scheduling algorithm terminates. One cell can now be transferred from each input port to the output port toward which it has been granted access.

The ordering of input ports in their access to the reservation vector can either be the same in any cell time (in our example, this means that $I_0$ always is the first input port, $I_1$ the second, and so on), or it can change. We call static the former version of the scheduling and dynamic the latter. In the presentation of numerical results, we shall consider the static version, as well as a dynamic version where a round-robin selection of the starting point of the cyclic rounds is performed. Of course, the static version leads to some unfairness (only for delays, not for throughputs), due to the fact that input ports always have the same position within the round. The simple dynamic version that we consider somewhat remedies such unfairness.

Several metrics can be adopted to quantify the cell urgency. When just one class of traffic flows enters the switch, the number of cells stored at each input queue can be chosen as the urgency for the cell at the head of the queue; this is the urgency metric that we used in our simulation experiments. With this metric, we prove in Section V that the weight of the matching obtained with RPA in the worst case cannot be less than half the maximum weight, produced by the MWM algorithm.

1) RPA in Packet-Mode Operation: In order to obtain a packet-mode operation, which may be quite useful if the switch should, for example, offer a service consisting in the routing of IP packets, it is possible to dynamically alter priorities whenever the first cell of a packet is transferred from the switch input port to the switch output port, so that the transfer of a sequence of cells corresponding to one packet cannot be interrupted.

More formally, whenever input port $I_i$ has been granted access toward output port $O_j$ for the first cell of a packet, it sets the urgency value of the queue $Q_{ij}$ to $U_{\max}$, the maximum admissible value for the urgency metrics. This value is kept until the last cell of the packet has been transferred toward output port $O_j$.

Note that this allows a contiguous transfer of cells belonging to the same packet, thus avoiding any requirement of packet reassembly at output ports; cut-through output transmission could be envisioned in this context, to reduce the packet transit delay.

B. P-RPA: Multiclass RPA with Strict Priority Discipline

The support of several different traffic classes in a strict priority framework requires some modifications to the RPA algorithm: we call this version priority-RPA (P-RPA). First of all, denoting by $K$ the number of traffic classes, each input port must be equipped with a number of queues equal to $KN$, in order to separate cells directed to a specific output port within each traffic class. Second, the reservation and acknowledgment algorithms must be revised so as to separate the service of different traffic classes. Finally, the definition of the urgency metrics must be changed aiming at the support of different types of QoS requirements.

In this section, we first describe the modifications to be introduced in the RPA algorithms, and later we focus on the urgency metrics that allows the implementation of a static priority service discipline among traffic classes.

When $K$ traffic classes are present at the switch input ports, in order to achieve acceptable performance, the RPA reservation round must comprise multiple reservation cycles to be executed in sequence, before the acknowledgment round. This is necessary because, as we shall see, successive overwriting of reservations made by different input ports toward the same output port can occur, if higher priority cells must be transferred by input ports that access the reservation array later in the round (according to the preestablished ordering). This could unnecessarily forbid transmissions of cells destined toward different output ports from input ports accessing the reservation array early in the round. A small number of reservation cycles is sufficient to reduce this undesirable phenomenon.

During the whole reservation round, like in single-class RPA, each input can reserve the transmission of no more than one cell toward one output port. During each one of the reservation cycles, each input port that has not yet been successful in its reservation can overwrite previous reservations of other input ports if it has more urgent cells at the heads of its queues. At the end of the last reservation cycle, the reservation round terminates, and an acknowledgment round starts, where each input checks whether its reservation has been overwritten by any other input port. If the reservation has not been overwritten, the input port is granted access to the desired output. Otherwise, the input port cannot transfer any cell toward any output. Note that in P-RPA, reservations of cell transfers toward idle output ports (i.e., ports toward which no reservation has been made) are not allowed during the acknowledgment round; this slight modification is required in order to guarantee a static priority service discipline among traffic classes.

Consider now the definition of an urgency metric that allows the implementation of a static priority service discipline among traffic classes. In a multiclass traffic scenario, the urgency value must be defined for each cell at the head of every transmission queue ($KN$ urgencies for each input port). Since we wish to transmit cells destined to the same output port according to a static priority discipline, the transmission schedule of higher priority cells must be completely independent of the lower priority traffic load. This implies that the urgency of every higher priority class cell must be greater than the urgency of any lower class priority cell. Hence, the urgency function must return values ranging in disjoint intervals for different traffic class cells. However, this condition is not sufficient; for the implementation of a priority service, it is necessary to guarantee that a reservation for a lower priority cell directed to output port $O_j$ is impossible if a higher priority cell directed to output port $O_l$, with urgency larger than the urgency stored in $R[l]$, is present at the same input. Thus, the minimum urgency value returned for cells belonging to traffic class
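The packet-mode rule of Section III-A (keep a queue at the maximum urgency from the first to the last cell of a packet) can be illustrated with a small queue model. This is a sketch under assumptions: the queue length is used as the ordinary urgency, whole packets are enqueued at once, and `U_MAX`, `PacketModeQueue`, and its methods are hypothetical names, with `U_MAX` set to an arbitrary large constant.

```python
from collections import deque

U_MAX = 10**9  # hypothetical maximum admissible urgency value


class PacketModeQueue:
    """One input queue Q_ij holding the cells of variable-length packets."""

    def __init__(self):
        self.cells = deque()      # True marks the last cell of a packet
        self.in_packet = False    # True while a packet is partially transferred

    def enqueue_packet(self, n_cells):
        """Append a packet made of n_cells cells (n_cells >= 1)."""
        self.cells.extend([False] * (n_cells - 1) + [True])

    def urgency(self):
        if not self.cells:
            return 0
        # Once the first cell of a packet has gone through, the queue keeps
        # the maximum urgency until the last cell of that packet is sent.
        return U_MAX if self.in_packet else len(self.cells)

    def transfer_cell(self):
        """Remove the head cell after a grant; return True if it ended a packet."""
        is_last = self.cells.popleft()
        self.in_packet = not is_last
        return is_last
```

A scheduler would call `urgency()` when building its reservations and `transfer_cell()` for every granted cell, so that a packet in progress cannot be preempted by ordinary queue-length urgencies.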

$k$ must be greater than twice the maximum value returned for lower priority cells. Moreover, any difference among values returned by the urgency function for $k$-class cells must exceed the maximum value taken by lower priority cells.

Let $Q_{ij}^{k}$ be the queue at input port $I_i$ storing cells directed to output $O_j$ and belonging to traffic class $k$. Let […] be the capacity of queues at input ports storing cells of traffic class $k$. The urgency value assigned to the cell at the head of queue $Q_{ij}^{k}$ depends on the traffic class $k$ and the number of cells in the transmission queue $Q_{ij}^{k}$. A possible expression of the cell urgency is

[…]

where […].

Observe that the modification of the reservation round and of the urgency metrics with respect to the original RPA algorithm allows a strict priority service discipline to be obtained; lower priority cells never access the switching matrix prior to higher priority cells. This occurs independently of the number of reservation cycles; increasing the number of reservation cycles only improves the overall switch performance.

C. F-RPA: Multiclass RPA with Fair Bandwidth Allocation

By changing the urgency metrics, RPA can be extended to deal with QoS requirements. We present fair-RPA (F-RPA), an adaptation of RPA that allows a good bandwidth separation among different traffic flows to be obtained.

According to the self-clocked weighted fair queueing (SC-WFQ) algorithm proposed in [3] for OQ architectures, when the $n$th cell belonging to flow $f$ (that enters the switch at $I_i$ and exits from $O_j$) arrives at the (output) queue, a tag $T_f^n$ is computed and associated with the cell

$$T_f^n = \max\{T_f^{n-1}, v_j\} + \frac{1}{r_f} \tag{1}$$

where $r_f$ is the negotiated rate for flow $f$, and the virtual time $v_j$ associated with the output queue is set equal to the tag of the last cell that has been transmitted on $O_j$. Tag $T_f^n$ represents the time stamp of the cell, and cells are transmitted according to increasing tag values.

In the case of IQ switches, the same formulation can be used, setting the virtual time $v_j$ equal to the tag of the last cell that has been transferred (from any input) toward $O_j$; note that this cell will be transmitted on the output link with no delay, since no queuing at output ports is required in IQ switches.

QoS-aware scheduling in IQ switch architectures must select the cells to be transferred from input to output ports, considering two requirements: 1) throughput optimization; 2) QoS provision. If priority is given to throughput optimization, QoS cannot be guaranteed; conversely, if priority is given to QoS provision, unacceptable switch performance is obtained. Trying to achieve a compromise between the two extremes implies that some sacrifice is necessary in performance and that the implementation of WFQ scheduling is not perfect; this means that cells directed toward any output are not always transmitted in strictly increasing time-stamp order. As a consequence, since the algorithms used in IQ switches cannot guarantee that cells directed to a specific output are transmitted in increasing tag value order, results obtained in IQ switches are different from those of OQ switches, even if the same SC-WFQ algorithm is used.

We assume that the value of $v_j$ can be known at all inputs simply by observing the cells that are transferred through the switching matrix. Should this not be possible, the values of $v_j$ can be broadcast to all inputs by adding a virtual-time field to array $R$. The value of this field in $R[j]$ at step $n$ gives the value of $v_j$ at step $n+1$.

In F-RPA, it is necessary to couple the tag definition with the urgency metrics characteristics of RPA. This is a peculiar requirement introduced by the IQ architecture. We define the cell urgency $U(Q_{ij})$, associated with the cell at the head of the queue storing cells directed toward output port $O_j$ at input port $I_i$, as the smallest tag associated with cells belonging to the queue.

The initial value for the urgency field of each element $R[j]$ of the array (i.e., the urgency value associated with no reservation) is set equal to […], where […] is the smallest rate allocated to flows directed to output $O_j$.

Given the above assumptions, the F-RPA algorithm can be executed as the normal RPA algorithm, with only a minor modification: since with our definition, smaller tags correspond to more urgent cells, the $\max$ function must be replaced by a $\min$ function. Moreover, if the virtual-time field is used, during the acknowledgment round, input $I_i$ that can transfer a cell to $O_j$ must set the value of this field in $R[j]$ equal to the urgency of the cell to be transferred. This value will be read by all inputs during the following reservation round.

IV. COMPLEXITY

We now prove that the computational complexity of the RPA scheduling algorithm is $O(N^2)$, as claimed in Section I. We first focus on the single traffic class RPA, then we also discuss P-RPA and F-RPA.

During the reservation round, $N$ operations are required at each input port to compute the weight function $W_k(j)$ for each $j$; in addition, $N$ operations are required to determine the output port for which $W_k(j)$ is maximized; one operation is sufficient to verify whether $W_k(j^*) > 0$.

The computational complexity of the reservation round is thus $O(N^2)$, since it is composed of $N$ sequential steps, each one of complexity $O(N)$.

During the acknowledgment round, $O(1)$ operations are required at each input port to verify whether the reservation has been overwritten; in addition, $O(N)$ operations may be necessary to obtain the set of idle outputs and to determine the one for which the input port has the most urgent cell.

As a consequence, the computational complexity of the acknowledgment round is also $O(N^2)$. Hence, the total complexity of the scheduling algorithm is $O(N^2)$. Recall that this should be compared with the complexity of the MWM algorithm that has been shown to be $O(N^3 \log N)$ in [8].

The computational complexity of P-RPA, the multiclass RPA with strict priority, is […]. Indeed, during each one of the reservation rounds, […] operations are required,
1926 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 47, NO. 12, DECEMBER 1999
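The SC-WFQ tag rule recalled above for F-RPA can be illustrated with a short sketch. It assumes the standard self-clocked fair queueing update of [3] for fixed-size cells of unit length, i.e., tag = max(previous tag of the same flow, virtual time of the output) + 1/rate, with the virtual time of an output set to the tag of the last cell transferred toward it, as described for IQ switches. The class name `ScfqTagger` and its method names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


class ScfqTagger:
    """Assigns SC-WFQ time stamps to fixed-size cells (illustrative sketch)."""

    def __init__(self, rates):
        # rates[(i, j)]: negotiated rate of the flow from input i to output j
        # (assumption: rates are expressed in cells per slot)
        self.rates = rates
        self.last_tag = defaultdict(float)      # last tag issued to each flow
        self.virtual_time = defaultdict(float)  # virtual time of each output

    def on_arrival(self, i, j):
        """Return the tag of a cell of flow (i, j) that arrives now."""
        start = max(self.last_tag[(i, j)], self.virtual_time[j])
        tag = start + 1.0 / self.rates[(i, j)]
        self.last_tag[(i, j)] = tag
        return tag

    def on_transfer(self, j, tag):
        """Record that a cell carrying this tag was transferred toward output j."""
        self.virtual_time[j] = tag


tagger = ScfqTagger({(0, 0): 0.5, (1, 0): 0.25, (2, 0): 0.5})
a1 = tagger.on_arrival(0, 0)   # first cell of flow (0, 0)
a2 = tagger.on_arrival(0, 0)   # second cell queues behind the first
b1 = tagger.on_arrival(1, 0)   # slower flow, larger tag spacing
tagger.on_transfer(0, a1)      # output 0 virtual time advances to a1
c1 = tagger.on_arrival(2, 0)   # a newly active flow starts from the virtual time
```

A scheduler would then serve cells in increasing tag order; in an IQ switch this ordering can only be approximated, as noted in the text.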

while during the final acknowledgment round only operations are needed.

Finally, with F-RPA, since only a new urgency definition is required, the complexity remains .

V. WORST-CASE WEIGHT DIFFERENCE BETWEEN RPA AND MWM

In this section, we prove that the worst-case weight of any matching obtained with RPA is at least half the weight of the corresponding matching generated by the MWM algorithm. The result is formally expressed by the following theorem.

Theorem: For any given occupancy of the virtual output queues of an input-queued switch, the sum of the weights in the matching obtained with the RPA algorithm cannot be less than half the sum of the weights in the matching obtained with the optimal MWM algorithm.

Proof: We consider an input-queued switch, where separate virtual output queues exist at each input port, organized according to the VOQ architecture.

Let represent a set of queues, where element is the queue at input storing packets directed to output . Each queue has an associated weight, which, according to the metrics adopted by RPA, coincides with the queue length.

Let be the set of queues in the matching obtained at the end of the first reservation round of the RPA algorithm. Let be the set of queues in the matching obtained with the MWM algorithm. Note that . Also, according to the constraints imposed by the input-queued switch architecture, both and can contain at most one queue from each input and at most one queue storing cells directed toward each output.

Consider a set of queues such that only one of its elements, , stores data units directed to output , and such that only one of its elements, , stores data units at input .

Let be the function defined on that returns ; if no queue containing data units directed to is found in .

Let be the function defined on that returns ; if no queue at input is found in .

The set is incrementally created starting with an empty set. Each input can modify the set following the same cyclic ordering adopted during the reservation round; let us assume that input 0 is the first input in the reservation round, while input is the last. Let be the set of queues included in the matching by the RPA algorithm, after input has made its reservation; obviously, . Note that is not guaranteed to be either a subset of or of , since some queues belonging to can be removed later by other inputs.

Let be a function defined on that returns the sum of the weights associated with all queues in . By definition, .

Let be the weight increment obtained with the reservation performed by input in the RPA algorithm. Note that the RPA algorithm guarantees that .

Let us compare and , and suppose that ; otherwise, trivially , and the theorem is proved. Note that in the comparison, we will use the sets defined during the cyclic processing performed by each input.

Let us examine all inputs , in the same order followed during the RPA reservation round, and let be an input for which

(2)

Let be the queue selected at input by the MWM algorithm, and be the queue selected at input by the RPA algorithm; i.e., , and . Note that it is possible that either or .

Define

the set that would be obtained by adding instead of to and, if necessary, removing the queue in directed to output (the sign is used to indicate removal from the set).

Since has been chosen at input according to the RPA algorithm, it holds

However, since

we get

thus

The RPA algorithm guarantees that

so, in conclusion

(3)

By repeating the same process over all inputs for which , we can prove that (3) holds for each input satisfying (2), and obviously for all other inputs.

By summing over all inputs, we finally obtain

VI. SIMULATION SCENARIO

For our simulation experiments, we considered 8 × 8 as well as 16 × 16 switch configurations, with two different traffic patterns. Both traffic patterns assume that the loads of all input ports are equal; however, in the first case (uniform load), the loads of all output ports are equal, whereas in the second (hot-spot load), one of the output ports (the hot spot) is subject to twice the load of all others.

Concerning the input traffic characterization, we consider the following four types of statistics.
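The bound of Section V can be checked numerically on small switches. The sketch below is a simplified single reservation round written under stated assumptions, not the authors' implementation: urgency is taken to be the queue length, inputs act in the order 0, 1, ..., N-1, and each input overwrites the reservation on the output that maximizes its weight increment, provided the increment is positive. The acknowledgment round is omitted. The function names `rpa_reservation_round`, `mwm_weight`, and `check_bound` are hypothetical, and the maximum weight matching is found by exhaustive search, which is practical only for very small N.

```python
import itertools
import random


def rpa_reservation_round(q):
    """One reservation round; q[i][j] is the queue length at input i for output j.

    Returns (weight, owner), where owner[j] is the input holding output j or None.
    """
    n = len(q)
    reserved = [0] * n   # urgency currently reserved on each output
    owner = [None] * n   # input that holds each reservation
    for i in range(n):   # inputs act sequentially
        # Weight increment obtained by taking over each output.
        gains = [q[i][j] - reserved[j] for j in range(n)]
        best = max(range(n), key=lambda j: gains[j])
        if gains[best] > 0:  # overwrite only a less urgent reservation
            reserved[best] = q[i][best]
            owner[best] = i
    return sum(reserved), owner


def mwm_weight(q):
    """Weight of a maximum weight matching, by exhaustive search."""
    n = len(q)
    return max(sum(q[i][p[i]] for i in range(n))
               for p in itertools.permutations(range(n)))


def check_bound(trials=200, seed=1):
    """Return True if 2 * W(RPA) >= W(MWM) on every random instance tried."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(2, 5)
        q = [[rng.randint(0, 20) for _ in range(n)] for _ in range(n)]
        w_rpa, _ = rpa_reservation_round(q)
        if 2 * w_rpa < mwm_weight(q):
            return False
    return True
```

In this sketch each of the N inputs scans N outputs once, so a round costs a number of elementary operations proportional to N squared, whereas the exhaustive MWM search grows factorially and serves only as a reference.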
MARSAN et al.: RPA: A FLEXIBLE SCHEDULING ALGORITHM 1927

Fig. 1. Worst-case cell access delays: (a) average and (b) standard deviation versus the normalized switch load, for static single class RPA in an 8 × 8 switch, under uniform traffic load.
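The Bernoulli and ON–OFF sources used in this section are easy to prototype. The sketch below is a minimal generator under stated assumptions: ON and OFF period lengths are geometric with a minimum of one slot, the mean ON duration is 100 cells, and the mean OFF duration is derived from the target load as mean_on × (1 − load) / load. The function names `geometric`, `bernoulli_source`, and `on_off_source` are hypothetical, and destination selection (per cell, or per ON period for batch and packet traffic) is left out.

```python
import math
import random


def geometric(mean, rng):
    """Sample a geometric period length (support 1, 2, ...) with the given mean."""
    p = 1.0 / mean
    if p >= 1.0:
        return 1
    u = 1.0 - rng.random()  # uniform in (0, 1]
    return 1 + int(math.log(u) / math.log(1.0 - p))


def bernoulli_source(load, n_slots, rng):
    """Return a 0/1 list: a cell arrives in each slot with probability load."""
    return [1 if rng.random() < load else 0 for _ in range(n_slots)]


def on_off_source(load, n_slots, rng, mean_on=100.0):
    """Return a 0/1 list: cells arrive in every slot of an ON period."""
    mean_off = mean_on * (1.0 - load) / load
    if mean_off < 1.0:
        raise ValueError("load too high for OFF periods of at least one slot")
    slots = []
    while len(slots) < n_slots:
        slots.extend([1] * geometric(mean_on, rng))
        slots.extend([0] * geometric(mean_off, rng))
    return slots[:n_slots]
```

With a mean ON duration of 100 cells, the one-slot minimum on OFF periods limits this sketch to loads up to about 0.99, which covers the range explored in the figures.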

• Bernoulli input traffic: cells arrive at input ports according to a Bernoulli process; the cell output ports are selected with random, independent choices.
• ON–OFF input traffic: cells continuously arrive at input ports during geometrically distributed ON periods, whose average duration is 100 cells; no cells arrive during geometrically distributed OFF periods, whose average duration is selected so as to obtain the desired load; the cell output ports are selected with random, independent choices.
• Batch input traffic: similar to ON–OFF input traffic, but now the cell output port is the same for all cells arriving during one ON period, and is selected with a random choice at the beginning of the batch.
• Packet input traffic: similar to batch input traffic, but now batches logically correspond to packets.

From the viewpoint of input traffic characterization, batch and packet arrivals are identical; we distinguish the two cases because we assume that within the switch, batch traffic is transferred with a cell-by-cell approach, whereas packet traffic uses the packet-mode version of RPA.

The queue capacity is set to 10 000 cells for each input queue, so that a total of 80 000 cells per input port can be stored in an 8 × 8 switch. This limitation of queue capacities is obviously responsible for a saturation of delay curves; this saturation is not shown in the figures presented in Section VII.

All numerical results were obtained by stopping simulation runs when a 5% confidence interval width was reached with 95% confidence level.

VII. SIMULATION RESULTS

In this section, we briefly overview a sample of the simulation results that were obtained in the performance investigation of RPA. We only illustrate results obtained for an 8 × 8 switch configuration because no significant difference was observed for 16 × 16 switches.

We use as performance metrics the curves of the mean and standard deviation of the cell and packet access delays as functions of the normalized load of input ports (a load equal to 1 corresponds to a saturated input channel). Note that due to the uniformity assumption for input loads, the horizontal axes of our plots also indicate the normalized load of the whole switch.

We mainly consider the cell access delay, defined as the time between the cell arrival at the input port and the successful reservation of the cell transfer toward the desired output port. The packet access delay is defined as the time between the arrival of the first cell of the packet at the input port and the successful reservation of the transfer of the same first cell toward the desired output port. Since the packet-mode operation of RPA guarantees that when the transfer of the first cell of the packet is successfully reserved, all subsequent cells follow with no interruption, the access delay is the same for all cells of the same packet. Nevertheless, when using packet-mode RPA, both cell and packet access delays will be presented, since the variable number of cells per packet induces a (small) difference between the two numerical results.

When plotting numerical results for the static version of RPA, if not otherwise stated, we choose the worst-case mean and standard deviation, i.e., the mean and standard deviation experienced by the least favored input port. Instead, when considering the dynamic version of RPA, we present the value obtained by averaging over all input ports.

We first present performance results for single-class RPA, then we compare RPA with pure output queueing and with the MWM of [8]. Then, we illustrate results for P-RPA in the case of two classes of traffic (i.e., ). Finally, we compare F-RPA with the performance of a classical WFQ algorithm in output buffered architectures.

A. Results for Single-Class RPA

To begin with, we present in Fig. 1 the curves of the cell (and packet, for packet input traffic) access delay mean and standard deviation versus the normalized load of the switch for static single-class RPA in the case of the least favored input port under the uniform load pattern.

First of all, it should be noted that the access delay mean and standard deviation curves in the cases of Bernoulli and

Fig. 2. Worst-case cell access delays: (a) average and (b) standard deviation versus the normalized switch load, for static single class RPA in an 8 × 8 switch, under hot-spot traffic load.

Fig. 3. Worst and best cell access delays: (a) average and (b) standard deviation versus the normalized switch load, for static single class RPA in an 8 × 8 switch, under hot-spot traffic load.

ON–OFF traffic are not very sensitive to the normalized switch load, and that for all traffic statistics, they remain bounded also for a load equal to 99% (see the right-most marker of each curve).

Moreover, it is interesting to observe that in the case of packet input traffic (black circle markers, cell delays shown as solid lines, packet delays as dashed lines), we obtain performance indices very close to those of batch input traffic (white circle markers); this means that forcing the contiguity of cells belonging to the same packet does not jeopardize switch performance. Bernoulli and ON–OFF input traffic provide better, and very similar, performance; increasing the load makes the two input traffic patterns almost identical, and as a consequence, performance results tend to become closer.

Finally, it is worth noting that standard deviation values are comparable with means.

In Fig. 2, we present the same performance indices for the hot-spot loading pattern. Curves show the performance obtained either when cells are directed toward the hot-spot output (dashed lines) or toward one of the other outputs (solid lines). Since the horizontal axis refers to the normalized load of the switch as a whole, and the load of the hot-spot output is twice that of normal outputs, the curves referring to the hot spot saturate at load 0.5625 (this is the value of input port load that gives 1 when multiplied by 8 to obtain the total load of the switch, and further multiplied by 2/9 to account for the fact that the hot-spot output receives twice the load of the other seven output ports).

Again, we see that the performance obtained with Bernoulli input traffic is best, somewhat better than that yielded by ON–OFF input traffic and much better than the performance obtained with either batch or packet input traffic. These numerical results lead us to the conclusion that also with the hot-spot loading pattern, the packet-mode operations of RPA yield very similar performance to the cell-mode operations, even providing a beneficial impact on the performance indices when considering a normalized load higher than 0.5.

In Fig. 3, we present the worst (circle markers) and best (square markers) cell access delay curves for Bernoulli (white markers) and packet (black markers) input traffic, with the

Fig. 4. (a) Worst and best cell and (b) packet access delays: averages and standard deviation versus the normalized switch load, for static and dynamic RPA in an 8 × 8 switch, under hot-spot packet traffic load.

Fig. 5. Comparison among dynamic RPA, output buffer and MWM: (a) average and (b) standard deviation of cell access delays versus the normalized switch load, in an 8 × 8 switch, under uniform traffic load.

hot-spot loading pattern. The differences between the best and worst performance are not very significant, for both the mean and the standard deviation.

In Fig. 4, we present the worst and best cell and packet access delays for packet input traffic in the hot-spot loading pattern for both the static and dynamic RPA versions. The dynamic version allows a somewhat better fairness to be obtained, achieving only marginally worse performance when considering the hot-spot output. It is fair to mention that even with the dynamic RPA, some slight unfairness among the switch input ports still exists.

Finally, it must be emphasized that no throughput limitation was observed for any configuration, and no cell losses were experienced, as long as the load of all input and output ports was kept less than 1. This cannot be completely ascribed to the rather large buffer sizes that we used in the simulation experiments, and it is one of the main results of RPA (see [9]). Indeed, different scheduling algorithms were observed to produce both throughput limitations and cell losses with the same buffer sizes. However, it must be recognized that the behavior of RPA (like that of more complex switch algorithms [8]) becomes quasi-optimal (in the sense that it can guarantee a full utilization of the switch bandwidth) only under asymptotic conditions, i.e., with infinite queue capacities.

B. Single-Class RPA versus Output Queuing and MWM

In this section, we compare the performance achieved by 8 × 8 switches adopting either the dynamic RPA version, or the output queuing architecture, or the MWM algorithm of [8].

Figs. 5 and 6 refer, respectively, to uniform and hot-spot traffic loads; in both cases, we consider Bernoulli (solid lines) as well as batch (dashed lines) input traffic. Results for dynamic RPA are presented with round markers, results for output queuing as square markers, and results for MWM as triangular markers.

In both figures, we see that the performance achieved with the three algorithms is quite similar. However, as expected, when differences become noticeable, we see that output queuing performs best, MWM is second, and RPA performs worst. However, the small performance differences are largely compensated for by the simplicity of RPA.

Moreover, we see that for all algorithms, the performance with batch input traffic is much worse than with Bernoulli input traffic (the ratio between averages and standard deviations of cell access delays are close to one hundred).

Fig. 6. Comparison among dynamic RPA, output buffer and MWM: (a) average and (b) standard deviation of cell access delays versus the normalized switch load, in an 8 × 8 switch, under hot-spot traffic load.

C. Results for P-RPA: Multiclass RPA with Strict Priority

The performance achievable with multiclass RPA with strict priority is illustrated by the curves in Fig. 7 that refer to average and standard deviation of cell access delays in an 8 × 8 switch with hot-spot batch traffic load using dynamic RPA with two different traffic classes. The low-priority curves are plotted with solid lines, while the high-priority curves are plotted with dashed lines. Square markers refer to the hot-spot output, whereas circular markers refer to lightly loaded output ports. The high-priority normalized load is always equal to 0.4, and the low-priority normalized load is increased from 0 to 0.6, so that the total normalized load varies in the range from 0.4 to 1.0, as indicated by the horizontal axis. The number of reservation rounds is .

Fig. 7. Dynamic RPA with two different traffic classes: (a) average and (b) standard deviation of cell access delays versus the normalized switch load, in an 8 × 8 switch, under hot-spot batch traffic load.

Numerical results clearly show that the high-priority traffic performance is insensitive with respect to changes in the low-priority traffic load, as desired. Instead, as expected, the performance of low-priority cells heavily depends on the total switch load.

Fig. 8. F-RPA with one fixed load (0.4) traffic class and one variable load (0.0–0.8): average cell access delays versus the normalized switch load, in an 8 × 8 switch, under uniform Bernoulli traffic load.

We have also reported the delay mean and standard deviation, averaged over the two priorities in the two-class scenario (black markers) for comparison with the single-class scenario (dotted lines). It can be seen that RPA manages efficiently the multiclass scenario with strict priority, since

Fig. 9. F-RPA with (a) one traffic class with fixed load (0.2) and (b) one traffic class with variable load (0.1–0.8): average cell access delays versus the normalized switch load, in an 8 × 8 switch, under hot-spot Bernoulli traffic load.

the means averaged over the two classes are quite close to the means obtained with just one class. Standard deviations instead are obviously higher, as expected when enforcing priorities.

Finally, in Tables I and II, we report average delays versus the number of reservation cycles, for several traffic patterns. For uniform traffic, the total load is 0.99, 0.72 being the load of high-priority traffic and 0.27 of low-priority traffic. For hot-spot traffic, the total load is 0.55, 0.40 being the load of high-priority traffic and 0.15 of low-priority traffic. Results show that a significant average delay reduction is obtained by increasing the number of reservation cycles from 1 to 2. A marginal improvement is observed under uniform traffic pattern for , while increasing to 8 does not provide additional benefits.

TABLE I
DYNAMIC RPA WITH TWO TRAFFIC CLASSES AND BERNOULLI INPUT TRAFFIC, WITH A VARIABLE NUMBER R OF RESERVATION CYCLES

TABLE II
DYNAMIC RPA WITH TWO TRAFFIC CLASSES AND BATCH INPUT TRAFFIC, WITH A VARIABLE NUMBER R OF RESERVATION CYCLES

D. Results for F-RPA: Multiclass RPA with Fair Bandwidth Allocation

We now study the performance of F-RPA with two traffic classes and two different traffic patterns: uniform and hot-spot load.

In the first scenario, at each input port, cells belonging to both classes arrive according to a Bernoulli process with parameter with a uniform distribution toward output ports. The load of the first traffic class is kept constant , whereas the load of the second class ranges from 0.0 to 0.8. A nominal bandwidth is assigned to each flow at any input port for both traffic classes; as a consequence, a nominal load equal to 1 is assigned at any output port. In this scenario, traffic flows of class 1 represent well-behaved sources, whereas traffic flows of class 2 represent sources trying to exploit more than their reserved fair share (for ).

Results not reported show that a good bandwidth separation is achieved by F-RPA. However, a significant penalty has to be paid in terms of delay protection of well-behaved flows with respect to badly-behaved flows, as can be observed by comparing the two curves with black circle markers in Fig. 8.

In the hot-spot scenario, the load of the first class is kept constant, , whereas the load of the second class, , ranges from 0.1 to 0.8. Also, in this case, a nominal bandwidth equal to 0.5 is assigned to each flow.

In the left-hand plot of Fig. 9, the average delay of cells directed to the hot spot is shown, while the right-hand picture presents the average delay for cells directed to other outputs. Results show that also in this case, a good bandwidth separation can be achieved with F-RPA.

As a final observation, recall that WFQ algorithms in general are aimed at providing bandwidth separation, which we indeed

obtain successfully, not at delay guarantees; however, the increase in the average delay due to badly-behaved sources is not a nice property of F-RPA. Still, the fact that F-RPA exhibits acceptable behavior in the case of unbalanced switch loads, coupled with the observation that traffic hardly ever is uniform within switches, is a point in favor of the applicability of the algorithm.

VIII. CONCLUSION

RPA, a scheduling algorithm for input buffered switches, was described and evaluated, showing that it can provide performance close to that of optimal algorithms.

RPA is simple (as simple as the simplest among previous proposals of efficient input queuing scheduling algorithms), efficient, and flexible, allowing the support of different traffic classes and packet-mode operations.

The effectiveness of RPA was assessed with detailed simulations in uniform, as well as unbalanced, traffic conditions.

RPA is capable of supporting the uninterrupted flow of cells belonging to one packet with marginal performance penalties with respect to the case in which cells of different packets are multiplexed within the switch.

RPA has been shown to be able to deal with multiple traffic classes, enforcing a strict priority among classes, without a significant performance degradation. A limited number of reservation cycles is sufficient to obtain good overall performance.

RPA has been adapted to provide bandwidth guarantees to traffic flows; whereas a good bandwidth separation has been successfully obtained, performance results in terms of delay are less satisfactory if compared to fair queuing algorithms applied at output ports in output buffered switches. This remains an interesting and open field for further research.

Finally, we proved that the worst-case weight of any matching obtained with RPA is at least half the weight of the corresponding matching generated by the MWM algorithm.

REFERENCES

[1] M. Karol, M. Hluchyj, and S. Morgan, “Input versus output queueing on a space division switch,” IEEE Trans. Commun., vol. COM-35, pp. 1347–1356, Dec. 1987.
[2] A. Demers, S. Keshav, and S. Shenker, “Analysis and simulation of a fair queueing algorithm,” J. Internetworking: Res. Experience, vol. 1, no. 1, pp. 3–26, Oct. 1990.
[3] S. Golestani, “A self-clocked fair queueing scheme for broadband applications,” in Proc. IEEE INFOCOM’94, Toronto, ON, Canada, June 1994, vol. 2, pp. 636–646.
[4] M. Karol, K. Eng, and H. Obara, “Improving the performance of input-queued ATM packet switches,” in Proc. IEEE INFOCOM’92, Firenze, Italy, May 1992, vol. 1, pp. 110–115.
[5] T. Anderson, S. Owicki, J. Saxe, and C. Thacker, “High speed switch scheduling for local area networks,” ACM Trans. Comput. Syst., vol. 11, no. 4, pp. 319–352, Nov. 1993.
[6] N. McKeown, P. Varaiya, and J. Walrand, “Scheduling cells in an input-queued switch,” Electron. Lett., vol. 29, no. 25, pp. 2174–2175, Dec. 1993.
[7] M. Chen, N. D. Georganas, and O. W. W. Yang, “A fast algorithm for multi-channel/port traffic assignment,” in Proc. IEEE ICC’94, New Orleans, LA, May 1994, vol. 1, pp. 96–100.
[8] N. McKeown, V. Anantharam, and J. Walrand, “Achieving 100% throughput in an input-queued switch,” in Proc. IEEE INFOCOM’96, San Francisco, CA, Mar. 1996, vol. 1, pp. 296–302.
[9] M. Ajmone Marsan, A. Bianco, and E. Leonardi, “RPA: A simple, efficient and flexible policy for input buffered ATM switches,” IEEE Commun. Lett., vol. 1, pp. 83–86, May 1997.
[10] M. Ajmone Marsan, A. Bianco, E. Leonardi, and L. Milia, “Quasi-Optimal algorithms for input buffered ATM switches,” in Proc. IEEE ISCC’98 Workshop, Athens, Greece, June 1998, pp. 336–342.
[11] R. LaMaire and D. Serpanos, “Two dimensional round-robin schedulers for packet switches with multiple input queues,” IEEE/ACM Trans. Networking, vol. 2, pp. 471–482, Oct. 1994.
[12] N. McKeown and A. Mekkittikul, “A starvation free algorithm for achieving 100% throughput in an input queued switch,” in Proc. ICCCN’96, Rockville, MA, Oct. 1996, pp. 226–229.
[13] H. Duan, J. Lockwood, S. Kang, and J. Will, “A high performance OC12/OC48 queue design prototype for input buffered ATM switches,” in Proc. IEEE INFOCOM’97, Kobe, Japan, Mar. 1997, vol. 1, pp. 20–28.
[14] N. McKeown and A. Mekkittikul, “A practical scheduling algorithm to achieve 100% throughput in input-queued switches,” in Proc. IEEE INFOCOM’98, San Francisco, CA, Apr. 1998, vol. 2, pp. 792–799.
[15] M. Ajmone Marsan, A. Bianco, E. Filippi, P. Giaccone, E. Leonardi, and F. Neri, “On the behavior of input queuing switch architectures,” Eur. Trans. Telecommun., vol. 10, no. 2, pp. 111–124, Mar./Apr. 1999.
[16] D. Stiliadis and A. Varma, “Providing bandwidth guarantees in an input-buffered crossbar switch,” in Proc. IEEE INFOCOM’95, Boston, MA, Apr. 1995, vol. 3, pp. 960–968.
[17] J. W. Lockwood, “Design and implementation of a multicast, input-buffered ATM switch for the iPOINT testbed,” Ph.D. dissertation, Univ. of Illinois at Urbana-Champaign, 1995.
[18] R. Ahuja, B. Prabhakar, and N. McKeown, “Multicast scheduling for input-queued switches,” IEEE J. Select. Areas Commun., vol. 15, pp. 855–866, May 1996.

Marco Ajmone Marsan (S’76–M’78–SM’86–F’99) received the Dr. Ing. degree in electronic engineering from Politecnico di Torino, Torino, Italy, and the M.S. degree from the University of California at Los Angeles (UCLA).
Currently, he is a full Professor at the Electronics Department of Politecnico di Torino, where from November 1975 to October 1987, he was first a Researcher and then an Associate Professor. From November 1987 to October 1990, he was a full Professor at the Computer Science Department, University of Milan, Milan, Italy. During the summers of 1980 and 1981, he was with the Research in Distributed Processing Group, Computer Science Department, UCLA. During the summer of 1998, he was an Erskine Fellow at the Computer Science Department of the University of Canterbury, New Zealand. He has co-authored over 200 journal and conference papers in the areas of communications and computer science, as well as the books, Performance Models of Multiprocessor Systems (Boston, MA: MIT Press, 1986) and Modelling with Generalized Stochastic Petri Nets (New York: Wiley, 1995). His current research interests include the performance evaluation of communication networks and their protocols.
Prof. Marsan received the Best Paper Award at the Third International Conference on Distributed Computing Systems, in 1982.

Andrea Bianco (M’99) was born in Torino, Italy, in 1962. He received the Dr. Ing. degree in electronics engineering, in 1986, and the Ph.D. degree in telecommunications engineering, in 1993, both from the Politecnico di Torino, Torino, Italy.
Since 1994, he has been an Assistant Professor at the Politecnico di Torino, first with the Dipartimento di Sistemi di Produzione and later with the Dipartimento di Elettronica. In 1993, he worked with Hewlett-Packard Labs, Palo Alto, CA. His current research interests include the fields of protocols for all-optical networks and switch architectures for high-speed networks.

Emilio Leonardi received the Dr. Ing. degree in electronics engineering, in 1991, and a Ph.D. degree in telecommunications engineering, in 1996, both from the Politecnico di Torino, Torino, Italy.
In 1995, he spent one year with the Computer Science Department at the University of California at Los Angeles, where he was involved in the Supercomputer-SuperNet (SSN) project. Currently, he is an Assistant Professor with the Electronics Department at Politecnico di Torino. His research interests include the fields of all-optical networks, high-speed wormhole routing networks, and high-speed switches.

Luigi Milia received the Dr. Ing. degree in electronics engineering in 1998 from the Politecnico di Torino, Torino, Italy.
In 1998, he joined the Software Development Department as a Researcher at Centro Ricerche Fiat, Orbassano, Italy. His research interests include real-time software for automotive embedded systems, from fuel injection systems to on-vehicle data acquisition and communications systems.
