Real-Time Performance of Dynamic Memory Allocation Algorithms

Isabelle Puaut
INSA/IRISA, Campus de Beaulieu, 35042 Rennes Cédex, France
e-mail: [email protected], Tel: +33 2 99 84 73 10; Fax: +33 99 84 25 29

Abstract

Dynamic memory management is an important aspect of modern software engineering techniques. However, developers of real-time systems avoid using it because they fear that the worst-case execution time of the dynamic memory allocation routines is not bounded or is bounded with an excessively large bound. The degree to which this concern is valid is quantified in this paper, by giving detailed average and worst-case measurements of the timing performance of a comprehensive panel of dynamic memory allocators. For each allocator, we compare its worst-case behavior obtained analytically with the worst timing behavior observed by executing real and synthetic workloads, and with its average timing performance. The results provide a guideline to developers of real-time systems to choose whether to use dynamic memory management or not, and which dynamic allocation algorithm should be preferred from the viewpoint of predictability.

Published in the 14th Euromicro Conference on Real-Time Systems, Vienna, Austria, June 2002, pages 41-49.

1. Introduction

Dynamic memory allocation has been a fundamental part of most computer systems since the sixties, and many different dynamic memory allocation algorithms have been developed to make dynamic memory allocation fast and memory-efficient (see [14] for a survey). A dynamic memory allocator must keep track of which parts of memory are in use and which parts are free. One of the problems the allocator must address is that the application program may allocate and free blocks of any size in any order, creating free blocks ("holes") amid used ones. If these holes are too numerous and small, they cannot be used to satisfy future requests for larger blocks (the fragmentation problem). The goal when designing an allocator is usually to minimize this wasted space without undue time cost. The time and memory performance of dynamic memory allocators is very often evaluated using simulation, with either real traces [1, 7] or synthetic traces [5, 15]. Most efforts on the evaluation of dynamic memory allocators have focused on evaluating the average behavior of allocators, with respect to allocation times and wasted memory due to fragmentation.

While the average execution time of a program is suited as a performance measure for non real-time applications, different performance criteria apply to real-time systems. In hard real-time systems, it must be guaranteed that all tasks meet their deadlines. In such systems, knowledge of the tasks' worst-case execution times (WCETs) is an essential input parameter of the schedulability analysis used to validate the system's temporal correctness. Safe estimations of the worst-case execution times of programs can be obtained using static analysis of their code [9]. In contrast, in soft real-time systems, a limited fraction of tasks may miss their deadlines. In such systems, the safety requirement on the determination of tasks' worst-case execution times is less stringent than in hard real-time systems, and a probabilistic knowledge of tasks' worst-case execution times is sufficient.

Developers of real-time systems avoid the use of dynamic memory management because they fear that the worst-case execution time of dynamic memory allocation routines is not bounded or is bounded with an excessively large bound. While some studies have been undertaken to identify the worst-case memory needs of dynamic memory allocators despite fragmentation [10, 6], most work concerning the timing behavior of dynamic memory allocators has focused on optimizing their timing behavior in the average case. An exception to this rule is the work presented in [7], which gives detailed measurements of the worst-case allocation and deallocation times observed by executing a set of programs on a simulated architecture with different dynamic memory allocators. These figures are compared with average allocation/deallocation times. Compared to [7], the work presented in this paper is not restricted to the worst-case allocation/deallocation times observed by executing real programs, but also gives worst-case figures obtained analytically (i.e. the worst possible allocation/deallocation times).

This paper gives detailed average and worst-case measurements of the timing performance of a comprehensive panel of dynamic memory allocators. The allocators studied are representative of the different classes of existing dynamic memory allocators (sequential fits, indexed fits, segregated fits, buddy systems). For every allocator, we compare its worst-case behavior obtained analytically with (i) the worst timing behavior actually observed by executing real and synthetic workloads, and (ii) its average timing performance. Both performance measures are obtained by executing the workload (real application, synthetic workload or worst-case workload) on a test platform, with careful attention paid to the experimental conditions so as to be able to compare the real and synthetic workloads.

The main contributions of our work are: (i) an analytical and quantitative evaluation of the worst-case allocation/deallocation times for a given hardware platform; (ii) a quantification of the impact that considering worst-case allocation/free times obtained analytically could have on the applications' end-to-end execution time. The results given in this paper should provide a guideline to developers of real-time systems to choose whether to use dynamic memory management or not, and which dynamic allocation algorithm should be preferred from the viewpoint of predictability.

The remainder of this paper is structured as follows. Section 2 details the dynamic memory allocation algorithms we have studied. Section 3 describes the experimental conditions used for the performance evaluation. Section 4 details the experimental results. In Section 5, we summarize the results so as to provide guidance to select an allocator under a set of constraints (e.g. heap size) and to integrate its timing costs into a schedulability analysis. We conclude in Section 6.

2. Dynamic Memory Allocation Algorithms

A wide variety of dynamic memory allocators have been implemented and used in computer systems [14]. They are typically categorized by the mechanisms they use for recording which areas of memory are free, and for merging adjacent free blocks into larger blocks (coalescing). The algorithms we have studied share a number of features (§ 2.1), but differ in the way the free blocks are identified and searched for: sequential fits (§ 2.2), indexed fits (§ 2.3), segregated fits (§ 2.4) and buddy systems (§ 2.5).

2.1. Common characteristics of the algorithms

The algorithms we have implemented share a set of common features:

- Allocation in real memory. Since we focus on real-time systems, we have restricted our study to algorithms that allocate areas of real memory, and assume that no address translation nor paging is used.
- No block relocation for memory defragmentation.
- Immediate coalescing. A block is merged with its neighboring free blocks as soon as it is freed.
- Minimum block size. The size of a small allocated block is rounded up to a minimum block size.
- Splitting threshold. Upon allocation of a block, when a free block larger than the requested size exists, it is split only if the remainder is larger than a known constant, named the splitting threshold.

The values for the minimum block sizes and the splitting threshold are allocator-dependent. They depend on the size of the allocator data structures in the case where they are stored in the free blocks. The values actually used are given in § 3.4.

Some of the algorithms we have implemented use boundary tags to coalesce free blocks into larger blocks. This technique, introduced by Knuth in [5], uses a header and a footer for every block indicating whether the block is free or not (the header is at the beginning of the block and the footer at the end). The size of the block is also stored in both header and footer. When a block is freed, coalescing can be achieved easily by looking at the header of the following block and the footer of the preceding block.

2.2. Sequential fits: first-fit and best-fit

Sequential fits are based on the use of a unique linear list of all the free blocks in memory, whatever their size.

The first-fit and best-fit allocators we have implemented use a doubly-linked list to chain the free blocks. The pointers that implement the list of free blocks (the free list) are embedded in the free blocks. The first-fit allocator searches the free list and selects the first block that is at least as large as the requested size, while the best-fit allocator selects the block that fits the request best (the block that generates the smallest remainder). Knuth's boundary tags (§ 2.1) are used for block coalescing. We have selected a LIFO strategy for inserting newly freed blocks into the free list upon block deallocation for the first-fit allocator, and a FIFO strategy for the best-fit allocator.

2.3. Indexed fits

Indexed fits are an alternative to sequential fits in which the free blocks are linked together by a data structure more sophisticated than the linear list used in the sequential fits. We have implemented two algorithms of this class.

Ordered binary tree best-fit. This allocator uses a binary tree ordered by block size to link free blocks with each other. There is one node in the tree per size of free blocks, containing a doubly-linked list of the free blocks of that size. When a block allocation request is issued, the tree is traversed until a block with the best size (generating the smallest remainder) is found. The pointers used to implement the tree and list data structures are embedded in the free blocks. By construction of the algorithm, the binary tree that chains the free blocks is not balanced. Knuth's boundary tags (§ 2.1) are used to coalesce blocks.
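As an illustration, the best-fit lookup in such a size-ordered tree can be sketched in C. This is a minimal sketch with hypothetical, simplified structures: the allocator described above embeds the tree pointers in the free blocks themselves and attaches a doubly-linked list of same-sized blocks to each node, which is omitted here.

```c
#include <stddef.h>

/* Hypothetical, simplified tree node: one node per free-block size. */
struct size_node {
    size_t size;
    struct size_node *left;   /* smaller sizes */
    struct size_node *right;  /* larger sizes */
};

/* Best-fit lookup: descend the size-ordered tree, remembering the
 * smallest node large enough for the request (smallest remainder). */
struct size_node *best_fit(struct size_node *root, size_t request)
{
    struct size_node *best = NULL;
    while (root != NULL) {
        if (root->size == request)
            return root;            /* exact fit: remainder is zero */
        if (root->size > request) { /* fits; look left for a tighter fit */
            best = root;
            root = root->left;
        } else {                    /* too small; only larger sizes help */
            root = root->right;
        }
    }
    return best;                    /* NULL if no block can satisfy it */
}
```

Because the tree is not rebalanced, the length of this descent is what grows with heap fragmentation, which is exactly what the worst-case analysis of § 3.1.1 bounds.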
Fast-fit (Cartesian tree best-fit). The fast-fit allocator, introduced in [11], uses a Cartesian tree, sorted both by size and by address. A Cartesian tree [12] encodes two-dimensional information in a binary tree, using two constraints on the tree shape. The tree is effectively sorted on a primary key and a secondary key. The tree nodes are totally ordered with respect to the primary key (here, the addresses of the free blocks). With respect to the secondary key (here, the sizes of the free blocks), the tree is partially ordered, with each node having a greater value than its descendants. This dual constraint limits the ability to re-balance the tree, because the shape of the tree is highly constrained by the dual indexing keys.

The Cartesian tree is used to implement an approximation of the best-fit policy: when allocating a block, the tree is traversed according to the "size" secondary key until the best free block large enough to satisfy the request is found. When a block is freed, the "address" primary key is used for block coalescing. The tree structure is embedded in the free blocks.

2.4. Segregated fits: quick-fit

The principle of segregated fits is to use separate data structures for free blocks of different sizes.

The segregated fits algorithm we have implemented, named quick-fit in the following, adapts the Gnu libc allocator to work in real memory only. It is based on the quick-fit allocator as introduced in [13]. This allocator is a hybrid algorithm in the sense that it uses two different allocation algorithms depending on the size of the requested block.

For allocation of a block larger than a page (a page is a fixed-size chunk of memory, here 4 KB), the allocator uses a next-fit allocation strategy. Next-fit is a common optimization of first-fit which does not scan the free list from the beginning at every allocation. Instead, the scan is started from the position where the last search in the free list was satisfied. The size granted for large blocks is rounded up to the upper integral number of pages. There is one header per page, used to implement the next-fit allocation policy (the free list is an external data structure).

Blocks smaller than a page are allocated from page-sized chunks, within which all blocks are the same size (powers of two). The page header maintains a count of the number of free blocks within the page as well as a doubly-linked list of all free blocks. When the last free block of a page is freed, the page is immediately freed.

2.5. Binary and Fibonacci buddy systems

In buddy systems, the heap is recursively divided up into smaller and smaller blocks. At every level of the hierarchy, there are two possible blocks, which are further subdivided into two blocks, and so on. When an allocation request is made, the requested size is rounded up to the closest possible block size in the hierarchy of blocks. A block's buddy is the other block at the same level of the hierarchy. When a block is freed, it can only be coalesced with its buddy. Note that the buddy is considered allocated as soon as there is an allocated block within it.

We have implemented two allocators in the class of buddy systems: a binary buddy allocator, in which all block sizes are a power of two, and a Fibonacci buddy allocator, in which block sizes are members of a Fibonacci series. Both allocators use a doubly-linked list of free blocks for every legal block size (e.g. powers of two for the binary buddy allocator). The pointers implementing the lists are embedded in the free blocks.

3. Experiment Description

In this section, we give information concerning the workloads used to evaluate the performance of the dynamic memory allocators: the workloads exhibiting the worst-case allocation/deallocation times (§ 3.1), real applications (§ 3.2) and a synthetic workload (§ 3.3). The measurement method used for the performance evaluation is given in § 3.4.

3.1. Worst-case allocation/deallocation workloads

Identifying the worst-case execution times of dynamic allocation algorithms can be achieved either by using static WCET analysis [9], which returns an upper bound on the time required to execute a program using static analysis of its source code, or by a worst-case complexity analysis of the algorithm. Using static WCET analysis on memory allocation algorithms is not possible without an in-depth knowledge of the allocation algorithms, because the time required to allocate a block does not depend only on the parameters of the memory allocation routines. It also depends on the allocator's internal state (e.g. free lists), which itself depends on the history of past allocation requests, which is in general unknown statically.

Let us consider for instance the first-fit allocator. The worst-case allocation time is obtained for the longest size of the free list. The actual size of the free list is unknown, unless the history of allocations/deallocations is known, which is not the case in general. However, knowing the minimum block size M and the heap size H, and noticing that the longest free list is obtained when the heap alternates between free blocks and busy blocks of size M, we see that the maximum length of the free list is H/(2M), which allows us to bound the duration of a block allocation for the first-fit algorithm. We see from this example that the WCET analysis techniques that use flow analysis to obtain loop bounds automatically (e.g. [3]) are not able to derive automatically the worst-case number of iterations for this algorithm.

As a consequence, for all the dynamic memory allocators studied, the worst-case allocation and deallocation times have been obtained through a worst-case complexity analysis of the allocation algorithms instead of using static WCET
analysis. In the following, we present the worst-case behavior of the allocators studied in an informal manner (giving formal proofs of such worst-case behaviors is outside the scope of this paper).

To obtain quantitative data on the worst-case execution times of memory allocation/deallocation, we have measured the execution times of allocation/deallocation on workloads that are representative of the worst-case scenarios, described below (§ 3.1.1 and § 3.1.2).

3.1.1 Worst-case allocation scenarios

The worst-case allocation scenarios for the algorithms studied are the following (in the text, H denotes the heap size):

Sequential fits. The worst time taken to allocate a block is when the free list has the maximum size, i.e. when the heap alternates between busy and free blocks of minimum size M. The maximum number of pointer traversals is then H/(2M).

Indexed fits. For the binary tree best fit, the worst time for block allocation is when the tree that chains the free blocks is unbalanced and has the longest possible length, which occurs when the heap is fragmented. Since free blocks of identical sizes are linked in a single node of the tree, the maximum length of the chain is reached when the memory alternates between busy blocks of the minimum size M and free blocks of every possible size. The maximum number of node traversals is then ⌈c + √(2H/M)⌉, with c a constant. A similar worst-case allocation scenario occurs for the fast-fit allocator, the difference being that free blocks of identical sizes are chained into different tree nodes since they have a different address. The worst-case number of node traversals is then identical to that of the sequential fits (H/(2M)).

Segregated fits. The worst-case allocation time for the quick-fit allocator (with H = 4 MB) occurs when an allocation of the smallest possible size is requested and the corresponding free list is empty. In this situation, a page has to be allocated using the next-fit allocation strategy. It is then split into blocks of the smallest size, and the corresponding free list is initialized. More generally, for any heap size H and page size P, the worst-case number of node traversals is max(P/M, H/P). The first term corresponds to the allocation of the smallest possible block in a page, and the second term corresponds to the allocation of a large block (set of pages).

Buddy systems. The worst-case allocation time for buddy systems is when the heap is initially entirely empty and an allocation of the smallest possible size M is requested. In this situation, the free lists for all possible sizes have to be updated. The number of list updates is ⌈log2(H/M)⌉ for the binary buddy and ⌈logφ(H/M)⌉ for the Fibonacci buddy, where φ is the golden number (1+√5)/2.

3.1.2 Worst-case deallocation scenarios

The worst-case deallocation scenarios for the algorithms studied are sketched below.

Sequential fits. Due to the use of boundary tags for block coalescing, the worst situation that can occur is when the two blocks neighboring the block to be freed are free, causing the three blocks to be merged into a single big block that is inserted in the free list.

Indexed fits. For the binary tree best fit, the worst time taken to free a node is when its two neighboring nodes are free. Checking whether neighboring nodes are free is achieved in constant time thanks to the use of boundary tags. However, block merging needs tree reorganizations to keep the free blocks sorted according to their sizes. A similar worst-case deallocation scenario occurs for the fast-fit allocator, except for the identification of neighboring free nodes, which uses the Cartesian tree structure instead of the boundary tags.

Segregated fits. The worst time taken to free a block for the quick-fit allocator with a 4 MB heap is when the last fragment of a page is freed, causing the page to be freed and coalesced with the neighboring pages.

Buddy systems. The worst-case deallocation time for the buddy system allocators occurs when a block of the smallest possible size is freed in a heap in which only this block is allocated.

3.2. Real applications

Since designers of real-time systems avoid the use of dynamic memory management, it is very hard to find task sets that are both representative of actual real-time workloads and use dynamic memory management. In this work, we have used three applications, the first one being a soft real-time application and the two other ones¹ being non real-time applications used in a previous work on dynamic memory management [7].

mpg123. mpg123 [4] is a player of MPEG level 3 streams. It has been executed with a 128 Kbps audio stream as input.

Cfrac. Cfrac is a program to factor large integers using the continued fraction method. Input was the 22-digit number 1,000,000,001,930,000,000,057, which is the product of two primes.

Espr. Espr is a logic optimization program. The largest file distributed with the release code was used as input to the program.

A summary of the application characteristics is given in Table 1. The last two columns of the table give respectively

¹ Downloaded from ftp://ftp.cs.colorado.edu, directory /pub/cs/misc/malloc-benchmarks
Appli       # alloc   # free    largest alloc (B)   mean alloc (bytes)   Inter-arrival (cycles)   Alloc/free time (% of total, quick-fit)
mpg123      5987      5981      65656               143                  8852358                  0.20
cfrac       227091    227087    266                 18                   26608                    35.86
espr        25585     24757     4608                41                   60947                    12.18
synthetic   25585     25585     345                 40                   60952                    10.80

Table 1. Workload characteristics

                  mpg123            cfrac             espr              synthetic         analytical
                  worst     mean    worst     mean    worst     mean    worst     mean    worst
first-fit         6316      6283    13936     6482    22856     7349    93646     16647   61769847
best-fit          6790      6766    11060     7615    26010     13727   54265     28136   64410985
btree-best-fit    8770      8733    19279     8284    34300     12678   47890     11657   898391
fast-fit          6619      6550    32449     11277   62656     12240   53091     18438   62919925
quick-fit         66412     10752   248634    5735    248581    4327    244411    4212    275550
buddy-bin         43883     6583    65145     5785    62046     6376    62748     4830    66696
buddy-fibo        36994     8966    51567     7361    51807     7689    48310     5455    55090

Table 2. Worst-case and average performance of memory allocation (processor cycles)

the mean delay between allocation requests and the mean time spent performing dynamic memory management when using the quick-fit allocator.

3.3. Synthetic workload

The synthetic workload application we have used is the MEAN model of memory allocation evaluated in [15]. This simple model characterizes the behavior of a program by computing three means: the mean block size, the mean block lifetime and the mean time between successive allocation requests. These three values are then used to generate random values according to an exponential distribution. This model is slightly less accurate than other models of memory allocation [15], but performs well on relatively simple allocation algorithms and requires much less coding effort and storage than more sophisticated models. For the experiments, the model parameters have been selected to mimic the behavior of the Espr application, as shown in Table 1.

3.4. Measurement method

Measurements were conducted on a Pentium 90 MHz machine. The code is loaded and monitored using the Pentane tool, which allows the non-intrusive monitoring of applications (there is no activity other than the module to be tested). Pentane provides control over the hardware (e.g. enabling/disabling of caches), and allows counting the number of occurrences of a number of events (e.g. number of processor cycles). All the timing measures given in the next section are expressed in numbers of processor cycles. In order to be able to compare the timing of memory allocation using different workloads having very different locality properties (e.g. a real workload versus a synthetic workload that does not actually execute application code), all performance-enabling features (instruction and data caches, branch prediction, super-scalar execution) were disabled.

An initially empty 4 MB heap is used in the experiments. The smallest value for the minimum block size and splitting threshold possible for all allocators (16 bytes) is used.

4. Results

4.1. Real-time performance of memory allocation

Table 2 gives, for the three classes of workloads (real, synthetic and worst-case), the allocation times measured. For the real and synthetic workloads, Table 2 gives both the average and the worst measured allocation times. The remainder of this section is devoted to a detailed analysis of the contents of Table 2, which is illustrated by a set of diagrams built from the contents of the table.

Subfigure 1(a) depicts the average allocation times measured during the execution of the real and the synthetic workloads. The allocator that exhibits the best average performance is the quick-fit allocator, followed by the binary buddy and the first-fit allocators. It can be noted that the quick-fit allocator has modest average performance for the mpg123 application. This is because mpg123 allocates and frees a small block every time a frame is decoded. This causes a page to be split into fragments at allocation time, and the same page to be freed at deallocation time, which has an important time overhead. Deferred coalescing would be required to improve the average allocation time of such an allocator. Also remark that most allocators perform better on the real applications than on the synthetic workload. The reason is that the number of different sizes allocated by the synthetic workload is greater than that of the real applications, thus causing more fragmentation and increasing the average allocation time.

Subfigure 1(b) depicts the worst-case allocation times (obtained analytically) for the allocators studied. The worst
[Two bar charts, one bar group per allocator (first-fit, best-fit, btree-best-fit, fast-fit, quick-fit, buddy-bin, buddy-fibo); y-axis in processor cycles, logarithmic in panel (b); panel (a) shows one bar per workload (mpg123, cfrac, espr, synthetic).]

(a) Average allocation times. (b) Worst-case allocation times obtained analytically.

Figure 1. Average vs worst-case allocation times
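A quick cross-check (our own arithmetic, not from the paper): dividing the first-fit analytical worst case of Table 2 (61,769,847 cycles) by the maximal scan length H/(2M) = 131,072 from § 3.1.1 yields the implied cost of one free-list step on the test platform.

```c
/* Implied per-iteration cost: worst-case cycles / worst-case loop bound.
 * The inputs are taken from Table 2 and § 3.1.1 (H = 4 MB, M = 16 B). */
double cycles_per_step(double worst_cycles, double scan_length)
{
    return worst_cycles / scan_length;
}
```

This lands at roughly 470 cycles per traversal step, which is plausible for a few memory accesses per iteration on a Pentium with all caches disabled (§ 3.4).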

[Two bar charts of the ratio, one bar group per allocator and one bar per workload (mpg123, cfrac, espr, synthetic); y-axis logarithmic in panel (b).]

(a) Worst-case actually observed. (b) Worst-case allocation times obtained analytically.

Figure 2. Ratio between worst and average allocation times

time required for block allocation in sequential fits is very large, because at worst the free list covers the entire memory and has to be scanned entirely to find a free block (see § 3.1). Sequential fits, while exhibiting reasonably good average performance, have very poor worst-case performance. In contrast, buddy systems, which have a worse average timing behavior than sequential fits, have a much better worst-case behavior.

Figure 2 depicts the ratio between the worst-case allocation times and the average allocation times for the allocators studied. In the left part of the figure, the worst-case allocation times used to compute the ratio are the worst allocation times actually observed while executing the real and synthetic workloads, while in the right part of the figure, the worst-case allocation times are those obtained analytically (see § 3.1).

The ratio between actually observed worst-case allocation times and average allocation times (subfigure 2(a)) is rather small (less than six) for most allocators, whatever workload is executed. When analytical worst-case allocation times are used instead of actually measured worst-case allocation times (subfigure 2(b)), the ratio becomes much larger, especially for the sequential-fits and fast-fit allocators. However, this ratio stays reasonable for some allocators like the buddy systems, for which the ratio is lower than ten.

An important issue to address when using dynamic memory allocation in real-time systems, especially in hard real-time systems in which all deadlines must be met, is to select the upper bound of the time for memory allocation/deallocation to be used in the system schedulability analysis. As said in the introduction, the average allocation time is clearly not appropriate because it underestimates the time required for block allocation, as shown in Figure 2. Two upper bounds can then be selected: the worst-case allocation times measured on a specific application, or the worst-case allocation times identified analytically. Figure 3 examines the impact of the selection of one of these two upper bounds on the application execution time for the mpg123 and Espr applications (subfigures 3(a) and 3(b)). The figure depicts the
[Two bar charts, one bar group per allocator with three bars each (average, worst-measured, worst-analytical); y-axis: impact (%), from 0% to 100%.]

(a) Mpg123 workload. (b) Espr workload.

Figure 3. Impact of worst-case allocation times on application execution time

                  mpg123            cfrac             espr              synthetic         analytical
                  worst     mean    worst     mean    worst     mean    worst     mean    worst
first-fit         11073     9720    11049     8884    10929     9060    10963     8783    10958
best-fit          13419     10635   13481     9139    12878     9198    13269     8581    13567
btree-best-fit    11487     8844    72283     8798    75570     15279   176274    14983   1262860
fast-fit          16092     13153   156900    23854   243349    45495   252109    47150   1467178556
quick-fit         13953     6666    107468    3810    107478    3219    108196    3168    108177
buddy-bin         7524      5261    38959     5364    38861     5887    55711     4377    68715
buddy-fibo        7729      6549    42122     6305    45874     6602    57085     4576    93822

Table 3. Worst-case and average performance of memory deallocation (processor cycles)

percentage of the workload execution time spent allocating lyze the schedulability of the mpg123 application, which is
and freeing memory, with three durations for block alloca- not allocation intensive, would be equivalent to consider that
tion and deallocation requests: average times; (ii) worst-case more than 90% of the workload execution time is devoted to
times measured on the application and (iii) worst-case times memory allocation. We can also observe that the impact of
obtained analytically. the choice of an upper bound for dynamic memory allocation
A first observation from the figure is that when using ac- is influenced by the type of application executed: the impact
tually measured worst-case allocation times, their impact on on the Espr workload is always higher than the one on the
the workload execution times is reasonable for almost all al- mpg123 application, because in the former workload, mem-
locators. This value can then be used by the system schedu- ory allocation requests are more frequent and more varied in
lability analysis without introducing excessive pessimism, as size than in the latter. An interesting result is that for some
far as the measurement conditions used to obtain the alloca- allocators, like for instance the buddy systems and the quick-
tion times do not lead to allocation times lower than those fit allocators, considering the worst possible allocation time
that could be measured on other executions of the appli- is realistic, at least for non memory intensive applications
cation. Such unsafe measurements can come for instance like mpg123 and the selected heap size (4 MB).
from the application control-flow that varies from one ex-
4.2. Real-time performance of memory deallocation
ecution to another (for instance because of different input
data), causing potentially different orders of memory alloca- Table 3 gives the average and worst-case deallocation
tion requests. Also note that the worst-measured allocation times measured on the real and synthetic workloads, as well
times are application-dependent, in contrast to the analyti- as the worst-case deallocation times obtained analytically.
cally obtained worst-case allocation time. Thus they should One can notice the very long worst-case deallocation
be determined every time a different application is built and times obtained analytically for the indexed-fits allocators,
only used on a per-application basis. that come from the tree restructuring operations that occur
If we now consider analytically-obtained worst-case al- at block deallocations (see x 3.1). Another remark is that
location times, one conclusion is that using them, albeit the allocators that use boundary tags for block coalescing
safe and application-independent, should not be used for the (sequential-fits) have low and predictable deallocation times.
sequential-fit allocators. For instance, using them to ana- This block merging technique, while space consuming, ex-
300000000 1200000

Worst-case allocatime time (cycles)

Worst-case allocation time (cycles)


250000000 first-fit 1000000
best-fit
200000000 fast-fit 800000

150000000 600000
btree-best-fit
quick-fit
100000000 400000 buddy-bin
buddy-fibo
50000000 200000

0 0
0 5000 10000 15000 20000 0 5000 10000 15000 20000
Heap size (KBytes) Heap size (KBytes)

(a) Sequential-fits and fast-fit (b) Other allocators

Figure 4. Evolution of worst-case allocation times with heap size

hibits low and predictable deallocation times. Buddy sys- ranging from 128 KBytes to 16 MBytes. Because of the
tems also have reasonable worst-case deallocation times. large variety of worst-case allocation times, the algorithms
are divided into two subfigures: subfigure 4(a), for sequen-
5. Discussion tial fits and fast fit, for which the worst-case allocation time
increases linearly with the heap size ; and subfigure 4(b) for
We classify below the studied allocators with respect to the other algorithms. We see on the figure that the allocator
their worst-case allocation times identified analytically in with the lower worst-case allocation time is different for dis-
section 3.1, and examine how their worst-case allocation tinct heap sizes. The heap size is thus an important parameter
time varies with the heap size (x 5.1). Then, we discuss in the process of selecting a suitable allocator according to
which real-time bound should be used in which context (hard its worst-case timing behavior.
or soft real-time system) (x 5.2).
5.2. Which metric for real-time performance of al-
5.1. Impact of heap size locators ? In which context ?
The following table summarizes the pros and cons of us-
The algorithms we have studied can be classified in four ing each of the two metrics of real-time performance evalu-
categories with respect to their worst-case allocation times
obtained analytically (x 3.1):
ated in this paper, and the context of use for which they are
best suited.
? sequential fits and fast-fit, for which the worst time re- (a) Worst-case (b) Worst-case measured
quired to allocate a block is linear with the number of obtained analytically on specific application
Pros Safety Reduced pessimism compared
free blocks, which at worst increases linearly with the to (a)
heap size. Context-independent
? the binary tree best fit allocator, for which the worst- Cons Pessimism Context-dependent
case allocation time increases as the square root of the Safety conditioned by the rep-
resentativity of testing condi-
heap size.
? buddy systems, for which the worst-case allocation time Context Hard real-time sys-
tions
Hard real-time systems (if test-
is proportional to the number of different block sizes, of use tems ing conditions correct)
which increases logarithmically with the heap size. Soft real-time systems
? quick-fit allocator, which has a different worst-case al-
location behavior depending on the heap size: for small The clear advantage of using the worst-case execution
heaps, the worst-case allocation time is when a page is time obtained analytically is that it is safe and context-
fragmented into the smallest block size; for large heaps, independent (it is an upper bound on the allocation times for
the worst-case allocation time is when the list of pages any application). However, as shown earlier in the paper, the
has to be scanned entirely to find a free block. values obtained can be very high depending on the allocator
and the heap size. Thus, the best suited context of use of
Figure 4 confirms this classification with measured values such a value is the one of hard real-time systems, in which
of the worst-case allocation times for different heap sizes, safety primes over pessimism reduction.
In contrast, using actually measured worst-case allocation times yields lower values than the worst-case allocation times obtained analytically. However, the obtained value is context-dependent; in particular, it depends on the allocation pattern of the application. In addition, it is safe only if the measurement conditions actually lead to the worst-case allocation/deallocation scenario for this application. For instance, if the execution of the application is not deterministic and can thus lead to different traces of memory allocation requests across executions, the worst-case allocation time of a given execution is not necessarily the worst-case allocation time of any execution.

Instead of using one of the two above performance metrics, one could consider using a probabilistic real-time bound m together with a confidence level on this bound, indicating the likelihood that the worst-case execution time is greater than or equal to m [8, 2]. Establishing such a probabilistic real-time bound requires in particular that the different tests of execution times be independent [2]. Generating independent tests for the kind of algorithms studied in this paper seems practically difficult to us: the execution time of the algorithms depends not only on the allocation request parameters but also on the system state (e.g. the contents of the allocator data structures), the latter being difficult in practice to generate in a random manner.

6. Conclusion

This paper has given a detailed average and worst-case timing analysis of a comprehensive panel of dynamic memory allocators. For every allocator, we have compared its worst-case behavior obtained analytically with the worst timing behavior observed by executing real and synthetic workloads, and with its average timing performance. A result of our performance analysis is the identification of a discrepancy between the allocators with the best average performance and the allocators with the best worst-case performance. We have shown that for applications with low allocation rates, the worst-case allocation/deallocation times obtained analytically can be used without an excessive impact on the application execution times for the most predictable allocators (buddy systems and quick-fit). We have also examined the impact of the heap size on the worst-case behavior of the memory allocators, and discussed the contexts in which analytical and actually measured worst-case allocation times are best suited.

Our work can be extended in several directions. First, it would be interesting to study the impact of parameters such as the minimum block size, the splitting threshold, and the distribution of the sizes of allocated blocks on the allocators' timing performance. Another important issue is finding the best trade-off between time and memory consumption, the most predictable allocators not being necessarily the most space-efficient. The impact of hardware performance-enhancing features (e.g. caching) on the worst-case performance of the allocators also has to be considered. Finally, we are currently studying the use of optimization methods such as genetic algorithms to automatically obtain approximations of the allocators' worst-case execution times.

Acknowledgments. The author would like to thank M. Bellengé and K. Perros for their work on the performance measurements, as well as D. Decotigny (IRISA), P. Chevochot (Sogitec), and G. Bernat (Univ. of York) for their comments on earlier drafts of this paper.

References

[1] A. Diwan, D. Tarditi, and E. Moss. Memory system performance of programs with intensive heap allocation. ACM Transactions on Computer Systems, 13(3):244–273, Aug. 1995.
[2] S. Edgar and A. Burns. Statistical analysis of WCET for scheduling. In Proceedings of the 22nd IEEE Real-Time Systems Symposium (RTSS01), pages 215–224, London, UK, Dec. 2001.
[3] A. Ermedahl and J. Gustafsson. Deriving annotations for tight calculation of execution time. In Proc. Euro-Par'97 Parallel Processing, volume 1300 of Lecture Notes in Computer Science, pages 1298–1307. Springer-Verlag, Aug. 1997.
[4] M. Hipp. mpg123 real-time MPEG audio player (mpg123 0.59r). Free MPEG audio player available at http://www-ti.informatik.uni-tuebingen.de/~hippm/mpg123.html, 1999.
[5] D. E. Knuth. The Art of Computer Programming, chapter Information Structures (chapter 2 of volume 1), pages 228–463. Addison-Wesley, 1973.
[6] E. L. Lloyd and M. C. Loui. On the worst case performance of buddy systems. Acta Informatica, 22:451–473, 1985.
[7] K. Nilsen and H. Gao. The real-time behavior of dynamic memory management in C++. In Proceedings of the 1995 Real-Time Technology and Applications Symposium, pages 142–153, Chicago, Illinois, June 1995.
[8] P. Puschner and A. Burns. Time-constrained sorting: a comparison of different algorithms. In Proc. of the 11th Euromicro Conference on Real-Time Systems, pages 78–85, York, UK, June 1999.
[9] P. Puschner and A. Burns. A review of worst-case execution-time analysis. Real-Time Systems, 18(2-3):115–128, May 2000. Guest Editorial.
[10] J. M. Robson. Worst case fragmentation of first fit and best fit storage allocation strategies. The Computer Journal, 20(3):242–244, 1977.
[11] C. J. Stephenson. Fast fits: new methods for dynamic storage allocation. In Proc. of the 9th Symposium on Operating Systems Principles, pages 30–32, Bretton Woods, New Hampshire, Oct. 1983.
[12] J. Vuillemin. A unifying look at data structures. Communications of the ACM, 23(4):229–239, Apr. 1980.
[13] C. B. Weinstock and W. A. Wulf. Quickfit: an efficient algorithm for heap storage allocation. ACM SIGPLAN Notices, 23(10):141–144, Oct. 1988.
[14] P. Wilson, M. Johnstone, M. Neely, and D. Boles. Dynamic storage allocation: a survey and critical review. In International Workshop on Memory Management, volume 986 of Lecture Notes in Computer Science, Kinross, UK, 1995.
[15] B. Zorn and D. Grunwald. Evaluating models of memory allocation. ACM Transactions on Modeling and Computer Simulation, 4(1):107–131, Jan. 1994.
