Real-Time Performance of Dynamic Memory Allocation Algorithms
Isabelle Puaut
INSA/IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France
e-mail: [email protected]; Tel: +33 2 99 84 73 10; Fax: +33 99 84 25 29
the mean delay between allocation requests and the mean time spent performing dynamic memory management when using the quick-fit allocator.

3.3. Synthetic workload

The synthetic workload application we have used is the MEAN model of memory allocation evaluated in [15]. This simple model characterizes the behavior of a program by computing three means: the mean block size, the mean block lifetime and the mean time between successive allocation requests. These three values are then used to generate random values according to an exponential distribution. This model is slightly less accurate than other models of memory allocation [15], but performs well on relatively simple allocation algorithms and requires much less coding effort and storage than more sophisticated models. For the experiments, the model parameters have been selected to mimic the behavior of the Espr application, as shown in Table 1.
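To make the model concrete, the sketch below shows how such a MEAN-style driver can be written in C. It is only an illustration of the technique described above, not the generator used in the experiments: the function names (draw_exp, run_mean_workload) and the event-loop structure are our own assumptions, and the three means would be set to the Espr-like values of Table 1.

    #include <stdlib.h>
    #include <math.h>

    /* Draw a value from an exponential distribution with the given mean
     * (inverse-transform sampling). */
    static double draw_exp(double mean)
    {
        double u = (rand() + 1.0) / (RAND_MAX + 2.0);   /* u in (0,1) */
        return -mean * log(u);
    }

    /* Hypothetical MEAN-style driver: block sizes, lifetimes and
     * inter-allocation delays are all exponentially distributed. */
    void run_mean_workload(double mean_size, double mean_lifetime,
                           double mean_interarrival, long n_requests)
    {
        struct live { void *ptr; double death; } *live;
        long n_live = 0;
        double now = 0.0;

        live = malloc(n_requests * sizeof *live);

        for (long i = 0; i < n_requests; i++) {
            now += draw_exp(mean_interarrival);

            /* Free every block whose lifetime has expired. */
            for (long j = 0; j < n_live; ) {
                if (live[j].death <= now) {
                    free(live[j].ptr);
                    live[j] = live[--n_live];
                } else {
                    j++;
                }
            }

            /* Allocate a new block; sizes below 16 bytes are rounded up,
             * matching the minimum block size used in the experiments. */
            size_t size = (size_t)draw_exp(mean_size);
            if (size < 16) size = 16;
            live[n_live].ptr = malloc(size);
            live[n_live].death = now + draw_exp(mean_lifetime);
            n_live++;
        }

        while (n_live > 0) free(live[--n_live].ptr);
        free(live);
    }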
3.4. Measurement method

Measurements were conducted on a Pentium 90 MHz machine. The code is loaded and monitored using the Pentane tool, which allows the non-intrusive monitoring of applications (there is no activity other than the module to be tested). Pentane provides control over the hardware (e.g. enabling/disabling of caches), and allows counting the occurrences of a number of events (e.g. number of processor cycles). All the timing measures given in the next section are expressed in numbers of processor cycles. In order to be able to compare the timing of memory allocation using different workloads with very different locality properties (e.g. a real workload versus a synthetic workload that does not actually execute application code), all performance-enabling features (instruction and data caches, branch prediction, super-scalar execution) were disabled. An initially empty 4 MB heap is used in the experiments. The smallest value of the minimum block size and splitting threshold possible for all allocators (16 bytes) is used.
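The paper does not detail how Pentane obtains its cycle counts. As a rough, hypothetical illustration of cycle-level timing on a Pentium-class processor, one can read the time-stamp counter around each request, as sketched below; the function names are ours, and the real measurements additionally rely on Pentane to disable caches and to run the measured module in isolation.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Read the processor's time-stamp counter (available since the Pentium).
     * This only illustrates cycle-level timing; it is not the Pentane tool. */
    static inline uint64_t read_tsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    /* Time a single allocation request in processor cycles. */
    uint64_t time_one_malloc(size_t size, void **out)
    {
        uint64_t start = read_tsc();
        *out = malloc(size);
        uint64_t end = read_tsc();
        return end - start;
    }

    int main(void)
    {
        void *p;
        uint64_t cycles = time_one_malloc(64, &p);
        printf("allocation of 64 bytes took %llu cycles\n",
               (unsigned long long)cycles);
        free(p);
        return 0;
    }

The overhead of reading the counter itself would have to be calibrated and subtracted from such measurements.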
4. Results

4.1. Real-time performance of memory allocation

Table 2 gives, for the three classes of workloads (real, synthetic and worst-case), the allocation times measured. For the real and synthetic workloads, table 2 gives both the average and the worst measured allocation times. The remainder of this section is devoted to a detailed analysis of the contents of table 2, which is illustrated by a set of diagrams built from the contents of the table.

Subfigure 1(a) depicts the average allocation times measured during the execution of the real and the synthetic workloads. The allocator that exhibits the best average performance is the quick-fit allocator, followed by the binary buddy and the first-fit allocators. It can be noted that the quick-fit allocator has modest average performance for the mpg123 application. This is because mpg123 allocates and frees a small block every time a frame is decoded. This causes a page to be split into fragments at allocation time, and the same page to be freed at deallocation time, which has an important time overhead. Deferred coalescing would be required to improve the average allocation time of such an allocator. Also note that most allocators perform better on the real applications than on the synthetic workload. The reason is that the number of different sizes allocated by the synthetic workload is greater than that of the real applications, thus causing more fragmentation and increasing the average allocation time.
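Deferred coalescing, the remedy mentioned above, can be sketched as follows: freed blocks are kept on per-size "quick lists" and handed back to later requests of the same size, so a block that is repeatedly allocated and freed (as in mpg123) no longer triggers a page split and a page merge on every request. This is an illustrative sketch under our own simplifications (the caller passes the block size to qf_free, and the backing allocator is plain malloc/free), not the quick-fit implementation evaluated in the paper.

    #include <stdlib.h>

    #define NUM_CLASSES      32     /* one quick list per 16-byte size class */
    #define FLUSH_THRESHOLD 128     /* defer release until a list gets this long */

    struct qblock { struct qblock *next; };

    static struct qblock *quick_list[NUM_CLASSES];
    static size_t         quick_len[NUM_CLASSES];

    static size_t size_class(size_t size) { return (size + 15) / 16; }

    /* Allocate: reuse a cached block of the same size class when possible;
     * otherwise fall back to the underlying allocator (malloc here). */
    void *qf_alloc(size_t size)
    {
        size_t k = size_class(size);
        if (k < NUM_CLASSES && quick_list[k] != NULL) {
            struct qblock *b = quick_list[k];   /* fast path: no page splitting */
            quick_list[k] = b->next;
            quick_len[k]--;
            return b;
        }
        return malloc(k * 16);                  /* slow path */
    }

    /* Free: do not merge or return pages immediately; keep the block on its
     * quick list so the next allocation of this size is cheap.  Only when a
     * list grows past the threshold is it flushed back in one batch. */
    void qf_free(void *ptr, size_t size)
    {
        size_t k = size_class(size);
        if (k >= NUM_CLASSES) { free(ptr); return; }

        struct qblock *b = ptr;
        b->next = quick_list[k];
        quick_list[k] = b;

        if (++quick_len[k] >= FLUSH_THRESHOLD) {
            while (quick_list[k] != NULL) {     /* deferred, batched release */
                struct qblock *n = quick_list[k]->next;
                free(quick_list[k]);
                quick_list[k] = n;
            }
            quick_len[k] = 0;
        }
    }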
[Figure 1. Allocation times (in processor cycles) of the first-fit, best-fit, btree-best-fit, fast-fit, quick-fit, buddy-bin and buddy-fibo allocators for the mpg123, cfrac, espr and synthetic workloads: (a) average allocation times; (b) worst-case allocation times obtained analytically.]
[Figure 2. Ratio between worst-case and average allocation times, per allocator, for the mpg123, cfrac, espr and synthetic workloads: (a) worst case actually observed; (b) worst-case allocation times obtained analytically.]
Subfigure 1(b) depicts the worst-case allocation times (obtained analytically) for the allocators studied. The worst time required for block allocation in sequential fits is very large because, at worst, the free list covers the entire memory and has to be scanned entirely to find a free block (see § 3.1). Sequential fits, while exhibiting reasonably good average performance, have very poor worst-case performance. In contrast, buddy systems, which have a worse average timing behavior than sequential fits, have a much better worst-case behavior.
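The linear worst case is easy to see on a generic first-fit loop (a sketch of the textbook technique, not the implementation measured here): an allocation request may have to traverse the whole free list, and the free list may contain one entry per minimum-sized block of the heap.

    #include <stddef.h>

    /* A singly linked free list, as used by sequential-fit allocators. */
    struct free_block {
        size_t             size;    /* usable bytes in this free block */
        struct free_block *next;
    };

    /* First fit: walk the free list until a block of sufficient size is found.
     * In the worst case the whole list is traversed, and the list may hold one
     * entry per minimum-sized block in the heap, which is why the worst-case
     * allocation time grows linearly with the heap size. */
    struct free_block *first_fit(struct free_block **head, size_t size)
    {
        struct free_block **link = head;
        for (struct free_block *b = *head; b != NULL; b = b->next) {
            if (b->size >= size) {      /* found: unlink and return it       */
                *link = b->next;        /* (splitting the remainder omitted) */
                return b;
            }
            link = &b->next;            /* not large enough: keep scanning   */
        }
        return NULL;                    /* allocation failure */
    }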
Figure 2 depicts the ratio between the worst-case allocation times and the average allocation times for the allocators studied. In the left part of the figure, the worst-case allocation times used to compute the ratio are the worst allocation times actually observed while executing the real and synthetic workloads, while in the right part of the figure, the worst-case allocation times are those obtained analytically (see § 3.1).

The ratio between the actually observed worst-case allocation times and the average allocation times (subfigure 2(a)) is rather small (less than six) for most allocators, whatever workload is executed. When the analytical worst-case allocation times are used instead of the actually measured worst-case allocation times (subfigure 2(b)), the ratio becomes much larger, especially for the sequential-fit and fast-fit allocators. However, this ratio stays reasonable for some allocators, such as the buddy systems, for which the ratio is lower than ten.

An important issue to address when using dynamic memory allocation in real-time systems, especially in hard real-time systems in which all deadlines must be met, is to select the upper bound on the time for memory allocation/deallocation to be used in the system schedulability analysis. As said in the introduction, the average allocation time is clearly not appropriate because it underestimates the time required for block allocation, as shown in figure 2. Two upper bounds can then be selected: the worst-case allocation times measured on a specific application, or the worst-case allocation times identified analytically.
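As a hypothetical illustration of how such an upper bound enters schedulability analysis (the notation below is ours, not the paper's), the worst-case execution time of a task can be inflated by its number of allocation and deallocation requests:

    % C_i: WCET of task tau_i without memory management;
    % n_i^a, n_i^f: allocation / deallocation requests per job;
    % \hat{T}_{alloc}, \hat{T}_{free}: the chosen upper bounds
    % (worst-measured or worst-analytical).
    C_i' = C_i + n_i^{a} \cdot \hat{T}_{\mathrm{alloc}}
               + n_i^{f} \cdot \hat{T}_{\mathrm{free}}

The choice between the measured and the analytical bound for these two terms is exactly the trade-off examined in the rest of this section.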
[Figure 3. Percentage of the workload execution time spent in dynamic memory management when charging allocation/deallocation requests at their average, worst-measured and worst-analytical times: (a) mpg123; (b) Espr.]
Figure 3 examines the impact of the selection of one of these two upper bounds on the application execution time for the mpg123 and Espr applications (subfigures 3(a) and 3(b)). The figure depicts the percentage of the workload execution time spent allocating and freeing memory, with three durations for block allocation and deallocation requests: (i) average times; (ii) worst-case times measured on the application; and (iii) worst-case times obtained analytically.

A first observation from the figure is that when using actually measured worst-case allocation times, their impact on the workload execution times is reasonable for almost all allocators. This value can then be used by the system schedulability analysis without introducing excessive pessimism, as long as the measurement conditions used to obtain the allocation times do not lead to allocation times lower than those that could be measured on other executions of the application. Such unsafe measurements can come, for instance, from an application control flow that varies from one execution to another (for instance because of different input data), causing potentially different orders of memory allocation requests. Also note that the worst-measured allocation times are application-dependent, in contrast to the analytically obtained worst-case allocation times. Thus they should be determined every time a different application is built and only used on a per-application basis.

If we now consider analytically-obtained worst-case allocation times, one conclusion is that, albeit safe and application-independent, they should not be used for the sequential-fit allocators. For instance, using them to analyze the schedulability of the mpg123 application, which is not allocation intensive, would be equivalent to considering that more than 90% of the workload execution time is devoted to memory allocation. We can also observe that the impact of the choice of an upper bound for dynamic memory allocation is influenced by the type of application executed: the impact on the Espr workload is always higher than on the mpg123 application, because in the former workload memory allocation requests are more frequent and more varied in size than in the latter. An interesting result is that for some allocators, like for instance the buddy systems and the quick-fit allocator, considering the worst possible allocation time is realistic, at least for non-memory-intensive applications like mpg123 and the selected heap size (4 MB).

4.2. Real-time performance of memory deallocation

Table 3 gives the average and worst-case deallocation times measured on the real and synthetic workloads, as well as the worst-case deallocation times obtained analytically. One can notice the very long worst-case deallocation times obtained analytically for the indexed-fit allocators, which come from the tree restructuring operations that occur at block deallocation (see § 3.1). Another remark is that the allocators that use boundary tags for block coalescing (sequential fits) benefit from this block merging technique, which, while space consuming, exhibits low and predictable deallocation times. Buddy systems also have reasonable worst-case deallocation times.
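The predictability of boundary-tag coalescing can be illustrated as follows (a generic sketch with our own block layout and macro names, not the allocators measured in the paper): because each block stores its size and status both in a header and in a footer, the two neighbours of a freed block are reachable in a fixed number of memory accesses, independently of the heap size. Block splitting, free-list bookkeeping and heap-boundary sentinels are omitted.

    #include <stddef.h>
    #include <stdbool.h>

    /* Boundary tag placed at both ends of every block; "size" is the total
     * block size in bytes, tags included. */
    struct tag { size_t size; bool free; };

    #define FTR(h)      ((struct tag *)((char *)(h) + (h)->size - sizeof(struct tag)))
    #define NEXT_HDR(h) ((struct tag *)((char *)(h) + (h)->size))
    #define PREV_FTR(h) ((struct tag *)((char *)(h) - sizeof(struct tag)))
    #define PREV_HDR(h) ((struct tag *)((char *)(h) - PREV_FTR(h)->size))

    /* Coalesce a freed block with its free neighbours; every step below is a
     * constant number of memory accesses, independent of the heap size. */
    static struct tag *coalesce(struct tag *h)
    {
        struct tag *next = NEXT_HDR(h);
        if (next->free)                  /* merge with the block after  */
            h->size += next->size;

        if (PREV_FTR(h)->free) {         /* merge with the block before */
            struct tag *prev = PREV_HDR(h);
            prev->size += h->size;
            h = prev;
        }

        h->free = true;
        FTR(h)->size = h->size;          /* keep the footer consistent  */
        FTR(h)->free = true;
        return h;                        /* (free-list re-insertion omitted) */
    }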
[Figure 4. Worst-case allocation times as a function of the heap size (in KBytes): (a) sequential fits and fast fit; (b) btree-best-fit, quick-fit, buddy-bin and buddy-fibo.]
5. Discussion

We classify below the studied allocators with respect to their worst-case allocation times identified analytically in section 3.1, and examine how their worst-case allocation times vary with the heap size (§ 5.1). Then, we discuss which real-time bound should be used in which context (hard or soft real-time system) (§ 5.2).

5.1. Impact of heap size

The algorithms we have studied can be classified in four categories with respect to their worst-case allocation times obtained analytically (§ 3.1); the corresponding growth rates are restated compactly after the list:

• sequential fits and fast fit, for which the worst time required to allocate a block is linear in the number of free blocks, which at worst increases linearly with the heap size;

• the binary tree best-fit allocator, for which the worst-case allocation time increases as the square root of the heap size;

• buddy systems, for which the worst-case allocation time is proportional to the number of different block sizes, which increases logarithmically with the heap size;

• the quick-fit allocator, which has a different worst-case allocation behavior depending on the heap size: for small heaps, the worst-case allocation time occurs when a page is fragmented into blocks of the smallest size; for large heaps, it occurs when the list of pages has to be scanned entirely to find a free block.
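Restating the first three categories compactly (the notation is ours, not the paper's), with H the heap size and s_min the minimum block size (16 bytes in the experiments):

    % H: heap size; s_min: minimum block size.
    T^{wc}_{\mathrm{seq.\ fits,\ fast\ fit}}(H) = O(H / s_{\min}), \qquad
    T^{wc}_{\mathrm{btree\ best\ fit}}(H) = O(\sqrt{H}), \qquad
    T^{wc}_{\mathrm{buddy}}(H) = O(\log_2 (H / s_{\min}))

The quick-fit allocator does not fit a single expression: its worst case is governed by page fragmentation for small heaps and by the scan of the page list for large heaps, as described in the last item of the list above.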
Figure 4 confirms this classification with measured values of the worst-case allocation times for different heap sizes, ranging from 128 KBytes to 16 MBytes. Because of the large variety of worst-case allocation times, the algorithms are divided into two subfigures: subfigure 4(a) for the sequential fits and fast fit, for which the worst-case allocation time increases linearly with the heap size; and subfigure 4(b) for the other algorithms. We see in the figure that the allocator with the lowest worst-case allocation time is different for distinct heap sizes. The heap size is thus an important parameter in the process of selecting a suitable allocator according to its worst-case timing behavior.

5.2. Which metric for real-time performance of allocators? In which context?

The following table summarizes the pros and cons of using each of the two metrics of real-time performance evaluated in this paper, and the context of use for which they are best suited.

                     (a) Worst-case              (b) Worst-case measured
                     obtained analytically       on a specific application
    Pros             Safety;                     Reduced pessimism
                     context-independent         compared to (a)
    Cons             Pessimism                   Context-dependent;
                                                 safety conditioned by the
                                                 representativity of the
                                                 testing conditions
    Context of use   Hard real-time systems      Hard real-time systems (if the
                                                 testing conditions are correct);
                                                 soft real-time systems

The clear advantage of using the worst-case execution time obtained analytically is that it is safe and context-independent (it is an upper bound on the allocation times for any application). However, as shown earlier in the paper, the values obtained can be very high depending on the allocator and the heap size. Thus, the best-suited context of use for such a value is that of hard real-time systems, in which safety takes precedence over pessimism reduction.
In contrast, using actually measured worst-case allocation times yields smaller values than the worst-case allocation times obtained analytically. However, the obtained value is context-dependent; in particular, it depends on the allocation pattern of the application. In addition, it is safe only if the measurement conditions actually lead to the worst-case allocation/deallocation scenario for this application. For instance, if the execution of the application is not deterministic and thus can lead to different traces of memory allocation requests for its different executions, the worst-case allocation time of a given execution is not necessarily the worst-case allocation time of any execution.

Instead of using one of the two above performance metrics, one could consider using a probabilistic real-time bound m together with a confidence level on this bound indicating the likelihood that the worst-case execution time is greater than or equal to m [8, 2]. Establishing such a probabilistic real-time bound requires in particular that the different tests of execution times be independent [2]. Generating independent tests on the kind of algorithms studied in this paper seems to us practically difficult: the execution time of the algorithms depends not only on the allocation request parameters but also on the system state (e.g. the contents of the allocator data structures), the latter being difficult in practice to generate in a random manner.

6. Conclusion

This paper has given a detailed average and worst-case timing analysis of a comprehensive panel of dynamic memory allocators. For every allocator, we have compared its worst-case behavior obtained analytically with the worst timing behavior observed by executing real and synthetic workloads, and with its average timing performance. A result of our performance analysis is the identification of a discrepancy between the allocators with the best average performance and the allocators with the best worst-case performance. We have shown that for applications with low allocation rates, the worst-case allocation/deallocation times obtained analytically can be used without an excessive impact on the application execution times for the most predictable allocators (buddy systems and quick-fit). We have also examined the impact of the heap size on the worst-case behavior of the memory allocators, and discussed the contexts in which analytical and actually measured worst-case allocation times are best suited.

Our work can be extended in several directions. First, it would be interesting to study the impact of parameters such as the minimum block size, the splitting threshold and the distribution of the sizes of allocated blocks on the allocators' timing performance. Another important issue to work on is finding the best trade-off between time and memory consumption, the most predictable allocators not necessarily being the most space-efficient. The impact of hardware performance-enabling features (e.g. caching) on the worst-case performance of the allocators also has to be considered. Finally, we are currently studying the use of optimization methods such as genetic algorithms to automatically obtain approximations of the allocators' worst-case execution times.

Acknowledgments. The author would like to thank M. Bellengé and K. Perros for their work on the performance measurements, as well as D. Decotigny (IRISA), P. Chevochot (Sogitec), and G. Bernat (Univ. of York) for their comments on earlier drafts of this paper.

References

[1] A. Diwan, D. Tarditi, and E. Moss. Memory system performance of programs with intensive heap allocation. ACM Transactions on Computer Systems, 13(3):244–273, Aug. 1995.
[2] S. Edgar and A. Burns. Statistical analysis of WCET for scheduling. In Proceedings of the 22nd IEEE Real-Time Systems Symposium (RTSS01), pages 215–224, London, UK, Dec. 2001.
[3] A. Ermedahl and J. Gustafsson. Deriving annotations for tight calculation of execution time. In Proc. Euro-Par'97 Parallel Processing, volume 986 of Lecture Notes in Computer Science, pages 1298–1307. Springer-Verlag, Aug. 1997.
[4] M. Hipp. Mpg123 real time MPEG audio player (mpg123 0.59r). Free MPEG audio player available at http://www-ti.informatik.uni-tuebingen.de/~hippm/mpg123.html, 1999.
[5] D. E. Knuth. The Art of Computer Programming, chapter Information Structures (chapter 2 of volume 1), pages 228–463. Addison-Wesley, 1973.
[6] E. L. Lloyd and M. C. Loui. On the worst case performance of buddy systems. Acta Informatica, 22:451–473, 1985.
[7] K. Nilsen and H. Gao. The real-time behavior of dynamic memory management in C++. In Proceedings of the 1995 Real-Time Technology and Applications Symposium, pages 142–153, Chicago, Illinois, June 1995.
[8] P. Puschner and A. Burns. Time-constrained sorting - a comparison of different algorithms. In Proc. of the 11th Euromicro Conference on Real-Time Systems, pages 78–85, York, UK, June 1999.
[9] P. Puschner and A. Burns. A review of worst-case execution-time analysis. Real-Time Systems, 18(2-3):115–128, May 2000. Guest editorial.
[10] J. M. Robson. Worst case fragmentation of first fit and best fit storage allocation strategies. The Computer Journal, 20(3):242–244, 1975.
[11] C. J. Stephenson. Fast fits: new methods for dynamic storage allocation. In Proc. of the 9th Symposium on Operating Systems Principles, pages 30–32, Bretton Woods, New Hampshire, Oct. 1983.
[12] J. Vuillemin. A unifying look at data structures. Communications of the ACM, 23(4):229–239, Apr. 1980.
[13] C. B. Weinstock and W. A. Wulf. Quickfit: An efficient algorithm for heap storage allocation. ACM SIGPLAN Notices, 23(10):141–144, Oct. 1988.
[14] P. Wilson, M. Johnstone, M. Neely, and D. Boles. Dynamic storage allocation: A survey and critical review. In International Workshop on Memory Management, volume 986 of Lecture Notes in Computer Science, Kinross, UK, 1995.
[15] B. Zorn and D. Grunwald. Evaluating models of memory allocation. ACM Transactions on Modeling and Computer Simulation, 4(1):107–131, Jan. 1994.