
Smart Dynamic Memory Allocator for Embedded Systems

Ramakrishna M, Jisung Kim, Woohyong Lee and Youngki Chung
Embedded Software Platform, System LSI Division, Samsung Semiconductor Business
Abstract—Dynamic memory (DM) allocation is one of the most crucial components of modern software engineering. It offers the greatest flexibility in the design of software systems; nevertheless, developers of real-time systems often avoid dynamic memory allocation because of its problems, such as unbounded or long response times and memory fragmentation. However, modern complex applications such as multimedia streaming and network applications have made dynamic memory allocation mandatory in application design. The major challenges for a memory allocator are minimizing fragmentation, providing good response times, and maintaining good locality among the memory blocks. This paper introduces a new smart dynamic memory allocator, particularly for embedded systems that have limited memory and processing power, aimed at addressing the major challenges of dynamic memory allocation. The smart allocator predicts which objects are short-lived and allocates those objects on one side of the heap memory, and the remaining objects on the other side, for effective utilization of the memory footprint. The allocator is implemented with an enhanced multilevel segregated mechanism using lookup tables and hierarchical bitmaps that ensures very good response times and reliable timing performance. The proposed algorithm has shown excellent experimental results, with respect to both response time and memory footprint usage, compared to well-known algorithms. In addition, this paper presents a memory intensive synthetic (MIS) workload, which can model the allocation behavior of most real applications.

Index Terms—Dynamic memory allocation, Embedded systems, Memory workload

I. INTRODUCTION

Dynamic memory allocation has been one of the most active research areas in computer systems for over four decades, and a large number of allocation algorithms have been proposed in the literature. In particular, some algorithms achieve good and reliable timing response at the cost of a high memory footprint. However, DM allocators are still considered unreliable for embedded systems because of their unbounded or long response times and inefficient memory usage. DM allocators for embedded systems must be implemented within constrained operating systems, taking the limited available resources into account. Hence, such allocators should provide both optimum memory footprint usage and good response times simultaneously.

The most important issue in the design of a DM allocator is response time. However, issues such as fragmentation, locality, false sharing and mutual exclusion are equally important [13]. Ideally, a memory allocator targeted at a wide range of applications should take constant time, independent of the number of free blocks and the length of execution; should use a small memory footprint; and should provide good locality, placing blocks allocated close in time, as well as blocks of similar size, close together in space.

The majority of memory blocks used in any real application are small fragments, typically smaller than a few kilobytes [1, 2]. Moreover, most memory blocks used in any application last for short durations [3]. If all memory requests are allocated from a single memory chunk (heap) without considering their life-spans, the heap memory will be splintered over time by the short-lived blocks. Since short-lived blocks are most likely small blocks, their effect on memory fragmentation is more severe than that of long-lived objects. If the memory requirements and average life-span of the objects were known, using a special chunk of memory for short-lived objects would solve the problem. Unfortunately, the memory requirements of recent applications such as multimedia streaming and wireless applications are unpredictable, and moreover the average memory requirement varies widely from one configuration to another [11]. Hence, dedicating a special memory chunk sized for the worst-case memory requirements would lead to high memory-space overhead. The problem addressed by this memory allocator is handling short-lived and long-lived objects effectively, without incurring extra memory overhead, while still providing the advantage of a special memory chunk.

In this paper, a new allocation methodology is introduced which allows designing a custom dynamic memory allocation mechanism with reduced memory fragmentation and excellent response times. In addition, the allocation mechanism ensures that the worst-case response time is always bounded and almost independent of the application's execution time.

The remainder of the paper is organized as follows. Section II describes the background and related work. Section III introduces the design of the smart memory allocator. Section IV describes the workloads used for the performance evaluation. Section V presents the performance of the proposed allocator and other state-of-the-art allocators. Section VI presents the conclusion.

II. BACKGROUND

A vast number of DM allocators have been proposed in the literature. The basic algorithms are first-fit, best-fit, next-fit, worst-fit, segregated-fit, bitmapped algorithms and buddy systems; the rest are variants and combinations of these basic algorithms. However, describing an exhaustive list of all algorithms is beyond the

978-1-4244-2881-6/08/$25.00 ©2008 IEEE
scope of this paper. An overview of DM allocators is available in the literature [4, 5, 12].

The major issues for any memory allocator are response time, fragmentation, locality and cache pollution. Other issues, such as mutual exclusion and synchronization, are not described here, since the proposed algorithm is aimed at minimizing fragmentation and serving requests in a bounded response time. However, the smart allocator framework can be used with any fine-grained locking mechanism for mutual exclusion.

Fragmentation denotes the wasted space in memory. It occurs in two forms: internal and external fragmentation. Internal fragmentation occurs when memory space is allocated to a program without ever being intended for use, such as headers, footers and padding around an allocated object. External fragmentation is the phenomenon in which free space is splintered into numerous small blocks over time, blocks which cannot serve any future request for a large block. Unlike internal fragmentation, external fragmentation is difficult to quantify, since it depends entirely on future memory requests. External fragmentation is usually measured as the proportion of total free memory space available at the time of memory overflow.

Cache pollution is another issue in the design of a memory allocator. An allocator may degrade cache performance by accessing several objects before finding a suitable free block for an allocation request. If several objects are accessed for each request, the allocator may cause several cache misses [6].

Memory allocation algorithms are basically classified into the following categories [5]: sequential fits (first-fit, next-fit, best-fit and worst-fit), segregated fits, buddy systems [13] and bitmaps.

III. DESIGN OF SMART MEMORY ALLOCATOR

In this section, the design of a new smart dynamic memory allocator is presented. The smart memory allocator predicts the short-lived objects and allocates those objects on one side of the heap memory, and the remaining objects on the other side, to reduce memory fragmentation. The allocator is implemented with an enhanced multilevel segregated mechanism using lookup tables and hierarchical bitmaps that ensures very good response time as well as reliable timing performance.

The allocator reduces memory fragmentation by using directional adaptive allocation based on predicted object lifetimes. Lifetimes can be predicted using several parameters, such as block size, the number of instructions executed between a block's allocation and deallocation, the total number of bytes allocated between a block's allocation and its deallocation, and the number of allocation events between a block's allocation and its deallocation [8, 10]. The number of instructions executed is not particularly suitable in this context, since memory management events are the only events of interest here. The total number of bytes allocated is not effective, since object sizes vary from a few bytes to a few megabytes. Our experiments suggested that a combination of block size and the number of allocation events is a good measure for object prediction. The proposed allocator therefore uses a combination of object size and number of allocation events to predict object lifetimes.

Fig. 1 Data Structures of Free and Allocated Blocks

Fig.1 shows the structure of free and used data blocks. The memory allocator inserts header information into each used and free block. The header of a free block holds BS (32 bits, of which the last two are always zero, since block sizes are always multiples of 4), which specifies the size of the block; BT (1 bit), which specifies the block type; AV (1 bit), which specifies the block status; Prev_Physical_BlkPtr and Next_Physical_BlkPtr, which are required for identifying the status of the physically adjacent blocks; and Prev_FreeListPtr and Next_FreeListPtr, which are required for locating the previous and next free block in a segregated free-list. The header of a used block holds all the fields of the free-block header except the free-list pointers, which are not required because used blocks are not linked into any segregated free-list. However, the used block's header holds an extra field called BlkAllocStat, which keeps the block allocation statistics for the block prediction algorithm. The header overhead is accounted as internal fragmentation.

Figure 2 Free-Block-Lists Organization

The smart memory allocator uses a large number of free-lists, where each list keeps free blocks of a size within a predefined range that belong to a specific block-type. To handle short-lived blocks separately from long-lived blocks without causing extra search overhead, the free-lists are organized into three-level arrays as shown in Fig.2. The first level divides the free-lists into free-list-classes (for example, block sizes between 2^14 and 2^15-1 come under the 14th free-list-class); each class is further subdivided into different free-list-sets, with each set keeping free blocks of a size within a predefined range (the dynamic range of a class is linearly subdivided among all its free-list-sets). Finally, each set is divided into two free-list-ways, one corresponding to short-lived and the other corresponding to long-lived objects.
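The header layout of Fig.1 and the class/set mapping just described can be sketched in C. Everything below is an illustrative assumption rather than the paper's implementation: the bit-field widths follow the Fig.1 description (pointer fields omitted), and k = 3 set-index bits reproduce the 8 sets per class of the paper's example configuration.

```c
#include <stdint.h>

/* Hypothetical rendering of the Fig.1 block header fields. */
struct blk_header {
    uint32_t bs : 30;   /* block size / 4 (low two size bits are always 0) */
    uint32_t bt : 1;    /* block type: 1 = short-lived, 0 = long-lived */
    uint32_t av : 1;    /* block status: 1 = free, 0 = allocated */
};

enum { K_SET_BITS = 3 };  /* 2^3 = 8 free-list-sets per class */

/* free-list-class = position of the most significant set bit of the size */
static int list_class(uint32_t size) {
    int msb = 31;
    while (msb > 0 && !(size & (1u << msb))) msb--;
    return msb;
}

/* free-list-set = the next k bits below the MSB, linearly subdividing
 * the class's size range among its sets */
static int list_set(uint32_t size, int cls) {
    if (cls < K_SET_BITS) return 0;            /* tiny classes: single set */
    return (int)((size >> (cls - K_SET_BITS)) & ((1u << K_SET_BITS) - 1u));
}
```

With this mapping, every size between 2^14 and 2^15-1 lands in class 14, and the class's range is spread linearly across its 8 sets, matching the Fig.2 organization.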
Fig.3 shows the two-level bit masks used by the allocator to identify available free blocks. Two 32-bit fields are used as first-level masks, one for short-lived blocks and another for long-lived blocks. Each bit of a first-level mask indicates the availability of free blocks in the corresponding free-list-class (32 bits correspond to 32 classes). Based on the memory block-type, one or both of the 32-bit fields are used to find the free-list-class that has free blocks of the closest suitable size for the request. An 8-bit second-level mask (configurable) is maintained for each bit of the first-level masks; in total, 64 8-bit masks are used for identifying the free blocks in the free-list-sets.

Figure 3 Bit-masks Organization

Figure 4 Sample Heap Memory Organization & Lookup Tables

The allocator handles short-lived blocks separately from long-lived blocks. As shown in Fig.4, short-lived blocks are allocated from the heap in the direction of bottom to top and long-lived blocks from top to bottom, so the used space of the heap grows from both sides. For example, let the heap size be 200 bytes, with each of the short- and long-lived object pools holding one free block of 100 bytes. In response to a memory request of 8 bytes (for instance, a short-lived object), the bottom 8 bytes of the free block corresponding to short-lived blocks are allocated. In contrast, upon a memory request of 32 bytes (assume a long-lived object), the top 32 bytes of the free block corresponding to long-lived blocks are allocated. Equivalently, the directions can be swapped: short-lived blocks allocated from top to bottom and long-lived blocks from bottom to top.

Initially the whole heap memory is free, and there is only one free block each for the short-lived and long-lived memory pools. The heap space is initially split into two blocks (with no physical boundary, only a virtual memory-pool boundary) in predefined proportions, one for short-lived and the other for long-lived objects. Since the heap grows from both sides, the boundary between the short-lived and long-lived memory pools can easily adapt to the run-time memory requirements. For instance, when the long-lived memory pool has insufficient free memory for some request and the short-lived memory pool has a large free block at the memory-pool boundary, the boundary free block will be split in two and one block submitted to the long-lived memory pool. In this way, the memory space of the short-lived and long-lived memory pools adapts to the run-time requirements.

Fig.5 describes the proposed memory allocation technique using a simple case model, assuming 32 free-list-classes, each containing 8 free-list-sets (the number of free-list-sets is configurable). In response to a memory request of a specific block size, the appropriate free block is identified by the following steps. First, the allocator computes the first-level-index, i.e. the most significant non-zero bit position of the block size. It is computed using the lookup table LTB1 (Fig.4), without any bit-search overhead, as in the following example:

    BitShift = 24; Byte = BlkSize >> BitShift; first-level-index = LTB1[Byte];
    while(first-level-index == 0xFF){
        BitShift -= 8; Byte = (BlkSize >> BitShift) & 0xFF;
        first-level-index = LTB1[Byte];}    //lookup table
    first-level-index += BitShift; N = first-level-index;

Once the first-level-index is found, the block-type is predicted using a 32-bit BlkPredMask. If the Nth bit of the mask is one, the block will be treated as a short-lived block; otherwise, it will be treated as a long-lived block.

The BlkPredMask is initially set to the predefined value 1023 (the first 10 free-list-classes are treated as short-lived classes), and whenever a block is deallocated, the mask is adapted based on the life-span of that block. The life-span of a block is defined as the number of blocks that have been allocated since that block was allocated. Whenever a block is deallocated, the BlkPredMask is adapted as in the following example:

    Blk_LifeTime = Global_AllocNum - Blk_AllocNum;  //from block statistics:
        //number of blocks allocated since this block was allocated
    if(Blk_LifeTime < (Blk_Max_LifeTime/2)){
        ModeCnt[Class]++;}                          //updating count
    else{ModeCnt[Class]--;
        Blk_Max_LifeTime = MAX(Blk_Max_LifeTime, Blk_LifeTime);}
    if(ModeCnt[Class] > 0){                         //mode decision
        BlkPredMask = BlkPredMask | (1 << Class);}
    else{BlkPredMask = BlkPredMask & (0xFFFFFFFF ^ (1 << Class));}

Blk_Max_LifeTime is initially set to 16. If the life time of a block is smaller than half of the maximum life time, the corresponding ModeCnt is increased by one; otherwise it is decreased by one. If Blk_LifeTime is greater than Blk_Max_LifeTime, the maximum life time is updated with the current block's life time. Finally, if the block size belongs to the Jth class, the Jth bit of BlkPredMask is set or reset based on the value of ModeCnt[J].

Once the block-type is predicted, the first-level mask is initialized with either the 32-bit S_MASK, which corresponds to short-lived blocks, or the 32-bit L_MASK, which corresponds to long-lived blocks, based on the block-type. If the block type is short-lived, the block-splitting flag is disabled.
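The prediction and adaptation rules above can be rendered as self-contained, runnable C. The global variable names mirror the paper's pseudocode; the function names and the per-class counter layout are assumptions made for illustration.

```c
#include <stdint.h>

/* Globals mirroring the paper's pseudocode state. */
static uint32_t BlkPredMask     = 1023;  /* classes 0..9 start as short-lived */
static int      ModeCnt[32];             /* per-class mode counters */
static unsigned Blk_Max_LifeTime = 16;   /* paper's initial value */
static unsigned Global_AllocNum;         /* incremented on every allocation */

/* Predicted type for a free-list-class: 1 = short-lived, 0 = long-lived. */
static int predict_short_lived(int cls) {
    return (int)((BlkPredMask >> cls) & 1u);
}

/* Called on free: blk_alloc_num is the Global_AllocNum value recorded in
 * the block header (BlkAllocStat) when the block was allocated. */
static void adapt_on_free(int cls, unsigned blk_alloc_num) {
    unsigned life = Global_AllocNum - blk_alloc_num; /* life-span in allocations */
    if (life < Blk_Max_LifeTime / 2) {
        ModeCnt[cls]++;                              /* block died young */
    } else {
        ModeCnt[cls]--;
        if (life > Blk_Max_LifeTime)                 /* track maximum life-span */
            Blk_Max_LifeTime = life;
    }
    if (ModeCnt[cls] > 0)
        BlkPredMask |= 1u << cls;                    /* class is short-lived */
    else
        BlkPredMask &= ~(1u << cls);                 /* class is long-lived */
}
```

A class flips to long-lived as soon as more of its recently freed blocks have outlived half of Blk_Max_LifeTime than have died before it, which is the mode decision the text describes.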
If the block type is long-lived, the splitting flag is enabled. Based on the split-flag status, the allocator decides whether or not to split the block in two when the available free block chosen for the request is larger than the requested size.

Figure 5 Flow chart of memory-allocation mechanism

The availability of free blocks in the Nth free-list-class is identified using the first-level mask. If free blocks are available in the Nth free-list-class, the first-level-index is set to N, and the next k MSBs of the block size (where the number of free-list-sets is 2^k) are taken as the second-level-index. When the Nth free-list-class has no free blocks, or none of the suitable free-list-sets in the Nth class have free blocks, the first-level-index is set to the next non-zero MSB position above the Nth bit of the first-level mask, and the second-level-index is set to zero. The non-zero position is located using the lookup table LTB2, without any bit-level search, as in the following example:

    first-level-index++; Mask = FirstLevMask >> first-level-index;
    Temp = LTB2[Mask & 0xFF];
    while(Temp == 0xFF){ Mask = Mask >> 8;
        if(Mask == 0){ /*Out of memory. Get new memory block from OS.*/ }
        Temp = LTB2[Mask & 0xFF]; first-level-index += 8;}
    second-level-index = LTB2[SecondLevMask[first-level-index]];
    M = second-level-index;

Using the second-level mask corresponding to the first-level-index, the allocator checks the availability of free blocks in the Mth free-list-set. If free blocks are available in that free-list-set, then, based on the block-type flag, the free block at the top of the free-list-S-way or free-list-L-way is allocated for the block request. Otherwise, when no free blocks are available in the Mth free-list-set, or the free block at the top of the list is smaller than the desired block size, the first non-zero MSB location above the Mth bit of the second-level mask is taken as the new second-level-index. It is computed as in the following example:

    Mask = SecondLevMask >> (M+1);
    second-level-index += LTB2[Mask];

Finally, the top free block of one of the free-list-ways, indexed by the second-level-index, is used for the block request based on the BT flag.

If the block type is long-lived and the free block chosen for the request is larger than the requested block size, the free block is split in two and the remaining fragment is inserted into a list based on the fragment size. When the block type is short-lived, on the other hand, the block is split based on the status of the split-flag and the free block size. If the free block chosen for the request and the requested block size fall in the same free-list-class, the entire free block is allocated for the request, irrespective of the available block size. If the requested block size and the free block at the top of the list are from different classes, the free block is split in two and the remaining fragment is inserted into the corresponding list. This split condition for short-lived blocks (whose sizes are most likely a few bytes to a few kilobytes) prevents the memory from being splintered into very small blocks.

In response to a memory-block free request, the status of the physically adjacent blocks is identified with the help of the doubly linked list. If one or both neighbors are free, all the free segments are coalesced to form a larger free block. The free-list-class and free-list-set index are computed from the block size (the coalesced size, if adjacent blocks were coalesced) using the lookup table LTB1. Finally, the free block is inserted into the free-list-way corresponding to either long-lived or short-lived objects, based on the BT status, and the bit masks are updated. The following sections show the performance of the proposed dynamic memory allocation mechanism.

IV. WORKLOAD GENERATION

This section describes two synthetic workloads used for the performance evaluation of the DM allocators: one is a memory intensive synthetic (MIS) workload that can model the allocation behavior of real applications, and the other is a workload that exhibits the worst-case allocation and deallocation times [5].
In this work, the MIS workload, a slightly modified version of JGCW [9], is used for the average response time and memory footprint measurements. The MIS workload attempts to model the dynamic object behavior of real applications. It implements a binary tree data structure, treating each memory block request as a node. The benchmark can be configured with the number of blocks, the maximum and minimum block sizes, the boundary between large and small objects, and the weightage of the number of small blocks relative to large blocks. The workload tries to control the block life-span: large objects, as classified by the boundary size, tend to have a longer life span than small objects.

The MIS workload initially creates a binary tree with the specified number of nodes and randomly generated object sizes, maintaining the specified weightage between small and large objects. The boundary between large and small objects is taken as the root node size: any object of a size smaller than the root is located to the left of the root node; otherwise it is located to the right. The objects in the left sub-tree of the root node are thus smaller, and those in the right sub-tree larger, than the root object.

Once the initial tree is created with the specified number of objects, the workload starts to free and reallocate the objects (at the same sizes) in the left sub-tree, from the bottom level to the top level recursively. For instance, suppose the maximum depth of the left sub-tree is four. All objects below depth 3 are freed and reallocated in the first iteration, and all objects below depth 2 (depths 3 and 4) are freed and reallocated in the second iteration. The procedure continues until the minimum depth is reached. The same method is then repeated for the right sub-tree, except that in each iteration, in addition to the objects in the right sub-tree below the designated depth, all the objects in the left sub-tree are reallocated. In this way the MIS workload creates more small objects, reallocates the small objects more frequently than the large objects (shorter life spans), and maintains constant stress on the memory allocator. In addition, the workload effectively simulates the temporal locality of object creation.

The second workload used in our experiments is a synthetic workload for measuring the worst-case response times. The set of allocators used in the experiments is: first-fit, next-fit, best-fit, worst-fit, binary-buddy (bitmap), Doug Lea's malloc (version 2.8.3, USE_LOCKS = 0) [7], and smart malloc. A detailed explanation of the worst-case scenarios of all the test allocators can be found in [5]. The performance of the smart allocator and the binary-buddy mechanism is almost independent of the number of free blocks; the small variations in execution times are due to different execution branches in the code. The worst-case scenarios for the smart allocator and the binary-buddy mechanism are the same: for allocation, the worst case occurs when the application requests a small block while the entire memory is free; for an object free request, it occurs when the released block must be coalesced with its neighbor blocks.

V. EXPERIMENTAL RESULTS

All the experiments were carried out on a Pentium D 3.4 GHz CPU with 1 GB RAM. The results shown are average values over a set of 100 simulations for each DM allocator.

Table-1 gives the average response times of the allocation and deallocation operations of all the test allocators for the MIS workload. The workload was configured with maximum and minimum object sizes of 16384 and 64 respectively, a boundary size of 2048, and a small-block weightage of 60%. The average response times of the test allocators are tabulated for varying total numbers of objects in the workload.

Fig.6 and Table-1 depict the average response times during the execution of the MIS workload. The allocator with the best average response time was the binary-buddy allocator with the bitmap mechanism, followed by the smart memory allocator and DL's malloc. The response times of the smart memory allocator, binary-buddy and DL's malloc were almost independent of the total number of blocks allocated. In contrast, the response times of the first-fit, next-fit, best-fit and worst-fit allocators increased almost exponentially with the total number of blocks allocated.

Table-2 gives the maximum footprint used by all the test allocators for the MIS workload. Optimum memory footprint usage is one of the most important requirements for DM allocators in embedded systems. Fig.7 depicts the maximum footprint used by the test allocators. The allocator with the minimum footprint was the smart malloc, followed by DL's malloc and best-fit. The first-fit, next-fit and binary-buddy allocators used the largest memory footprints, and moreover their footprint usage grew with the total number of blocks allocated, which is certainly unacceptable for embedded systems. The results in Table-2 were obtained with the MIS workload's small-block weightage set to 60%; the memory usage of the smart memory allocator improves further at higher weightages of small blocks (the ratio of small to large objects).

Table 1 Average response time (Micro seconds)

Allocator / Num. of objects   10000        25000       50000        75000        100000
(total malloc events)         (0.46 Meg)   (1.8 Meg)   (5.74 Meg)   (11.6 Meg)   (19.74 Meg)
First-Fit                     0.39         0.54        0.66         0.91         5.27
Next-Fit                      0.43         0.61        0.75         0.95         46.27
Best-Fit                      16.91        41.21       97.10        215.4        416.3
Worst-Fit                     13.40        33.84       83.16        183.4        370.8
BinaryBuddy-Bitmap            0.34         0.48        0.57         0.59         0.58
DL's Malloc                   0.48         0.62        0.68         0.68         0.66
Smart Malloc                  0.39         0.52        0.60         0.62         0.62

Table-3 gives the average internal fragmentation of the test allocators. Since all the test allocators were implemented with the same coalescing and rounding schemes, the internal fragmentation was almost the same for all the allocators except the binary-buddy allocator, whose internal fragmentation was very high; this also hurts data-cache performance. The internal fragmentation of DL's malloc is not included in the results due to implementation differences.
Table-4 gives the worst-case response times of the test allocators. The worst-case response times for object allocation were very high for the first-fit, next-fit, best-fit and worst-fit (sequential fits) allocators. In contrast, the binary-buddy and smart memory allocators had much better worst-case behavior. The response times of these two allocators were almost the same, independent of the number of blocks allocated and the length of the application's execution time.
Figure 6 Average Response Time per Malloc & Free operation (micro seconds)

Table 2 Maximum Memory Footprint used (Mega Bytes)

Allocator / Num. of objects   5K           10K         20K          35K          50K
(total allocated memory)      (227 MB)     (538 MB)    (1328 MB)    (2239 MB)    (4798 MB)
First-Fit                     12.90        25.93       51.63        77.03        128.16
Next-Fit                      13.96        27.82       53.32        74.71        132.89
Best-Fit                      8.86         17.81       35.41        52.78        87.81
BinaryBuddy-Bitmap            13.51        27.16       53.05        78.73        134.32
DL's Malloc                   8.85         17.76       35.32        52.63        87.49
Smart Malloc                  8.27         16.84       33.23        49.44        82.23

Fig. 7 Maximum memory footprints used (Mega Bytes)

Table 3 Average Internal fragmentation per allocation (in Bytes)

                        First-Fit   Next-Fit   Best-Fit   Worst-Fit   BinaryBuddy-Bitmap   Smart Malloc
Internal fragmentation
(bytes/allocation)      12.38       12.4       12.6       12.3        548                  12.24

Table 4 Worst case response Times (Micro Seconds)

        First-Fit   Next-Fit   Best-Fit   Worst-Fit   BinaryBuddy-Bitmap   DL's Malloc   Smart Malloc
malloc  67.57       71.23      67.33      65.4        0.468                9.821         0.581
free    0.201       0.213      0.208      0.22        0.220                0.379         0.266

VI. CONCLUSION

This paper introduced a new smart dynamic memory allocator, particularly for embedded systems that have limited memory and processing power. In addition, a memory intensive synthetic workload, which can model the allocation behavior of real applications, was presented. The performance of the proposed algorithm was compared with basic and state-of-the-art DM allocators. The proposed algorithm showed excellent performance with respect to average response time, worst-case response time and maximum memory footprint usage compared to DL's malloc (version 2.8.3): the average response time of smart malloc is 7 to 20% shorter than DL's malloc, its worst-case malloc response time is about 16 times shorter, and its maximum memory footprint usage is 5 to 6.5% smaller. Compared to the binary-buddy malloc (with bitmap segregation and lookup-table optimizations incorporated), the smart malloc is just 5 to 10% slower but almost 40% more efficient in memory usage.

REFERENCES

[1] M. S. Johnstone and P. R. Wilson, "The Memory Fragmentation Problem: Solved?," ISMM, 1998, pp. 26-36.
[2] Dougles James, "Dynamic Memory Allocation in a Computer using a Bitmap Index," US Patent 5784699.
[3] Yusuf Hasan and Morris Chang, "A Hybrid Allocator," Proceedings of the 2003 IEEE ISPASS, 2003.
[4] P. R. Wilson, M. S. Johnstone, M. Neely and D. Boles, "Dynamic Storage Allocation: A Survey and Critical Review," Proceedings of the International Workshop on Memory Management (IWMM95), Sep 1995, pp. 1-126.
[5] I. Puaut, "Real-Time Performance of Dynamic Memory Allocation Algorithms," Proceedings of the 14th Euromicro Conference on Real-Time Systems (ECRTS'02), June 2002.
[6] F. Sebek, "Cache memories in real-time systems," Technical Report 01/37, Mälardalen Real-Time Research Centre, Department of Computer Engineering, Mälardalen University, Sweden, September 29, 2001.
[7] Doug Lea, "A Memory Allocator," Unix/Mail, 6/96, 1996.
[8] David R. Hanson, "Fast allocation and deallocation of memory based on object lifetimes," Software—Practice and Experience, Vol. 20(1), Jan 1990, pp. 5-12.
[9] Woo Hyong Lee, "A Java Garbage Collection Workload," 7th Joint Conference on Information Sciences, Sep 2003.
[10] David A. Barrett and Benjamin Zorn, "Using Lifetime Predictors to Improve Memory Allocation Performance," SIGPLAN'93 Conference on Programming Language Design and Implementation, 1993, pp. 187-196.
[11] M. Leeman, D. Atienza and G. Deconinck, "Methodology for Refinement and Optimization of Dynamic Memory Management for Embedded Systems in Multimedia Applications," Signal Processing Systems, Aug 2003, pp. 369-374.
[12] M. Masmano, I. Ripoll and A. Crespo, "TLSF: A new dynamic memory allocator for real-time systems," 16th Euromicro Conference on Real-Time Systems, pp. 79-88, Catania, Italy, July 2004.
[13] Lecture: Dynamic Memory Allocation-1, Rutgers University, http://camino.rutgers.edu/cs211/lecture13.pdf
