C++ STL For Embedded System
Kazuhiro Ondo
VSX Development/CDMA System Division/GTSS
IL75/[email protected]
ABSTRACT
The C++ Standard Template Library (STL) is a convenient set of reusable algorithms and containers available with most
C++ distributions. In particular, its templated containers (such as "list", "map", etc.) improve software
development productivity to a high degree.
Despite its advantages, there are some pitfalls that STL may cause when it is used for embedded system development
where typically there are restrictions and limitations on the capacity, the performance and the development tool set.
The paper will cover those potential issues of STL in combination with embedded systems, and present one flavor of an
STL implementation which is intended to make up for those disadvantages. The paper will also share the quantified
result of the comparison between the prototyped “embedded STL” and the original STL implementation.
Due to the nature of C++ templates, the use of STL can result in code size bloat. Its data containers' memory usage scheme assumes that plenty of system resources are always available, so it is not well suited to embedded systems. STL provides several conveniences during the coding phase, but it can cause significant problems during the system debug phase if no sophisticated memory/performance analysis tools are available. The use of STL is not recommended for embedded systems development unless there is sufficient capacity.

Figure 1 shows an example of the coding-phase convenience: a hash map built with the STL map.

#include <iostream>
#include <map>
using namespace std;

class ClassA { /* application data */ };

int main(){
    map<int, ClassA*> hash_a;
    hash_a[100] = new ClassA;
    hash_a[3200] = new ClassA;
    map<int, ClassA*>::iterator iter;
    if ((iter = hash_a.find(2000)) == hash_a.end()){
        cout << "The item not found at index of 2000" << endl;
    }
    // program continues
}

Figure 1 : Example – hash map with STL map

Underneath the STL data containers, the list and hash data structures are implemented using ordinary techniques such as the "doubly linked list", the "red-black tree", etc.; however, they have been made available as C++ templates so that they accept generic data types.
Figure 2 and Figure 3 show the details of how STL lists and STL maps are constructed internally.

Figure 2 : Node association - STL List (diagram: the list<T> object's header references a Head Node; data nodes Node 0 through Node 3 each carry a node header plus the stored data T0 through T3, chained together)

Figure 3 : Node association – STL map (diagram: the map<Key,T> object's header, which records the number of nodes, references a Head Node; data nodes Node 0 through Node 4 each carry a node header with parent and left/right links, a Key, and the stored data T0 through T4, arranged as a tree)

In the typical implementation, the list or the map itself has a reference to the Head Node, which does not carry data itself; instead, it provides a reference to the beginning (or the end) of the data nodes. The rest of the data nodes are linked by the doubly-linked list or the red-black tree starting from the Head Node. Each node is a separate memory chunk, and a node is expected to be allocated dynamically during insertion into, and released during deletion from, the STL container.

Downside of using STL in embedded environments

As seen previously, the implementation of the STL data containers seems highly scalable and flexible. But these benefits might become disadvantages if they are applied to an embedded systems development environment that has memory and performance restrictions. The following is a list of the potential problem areas:

Code Size

As many C++ compilers automatically instantiate C++ template code in-line within the object code, the same operational code may be repeated in multiple places if the template code with the same type is used in multiple locations. While this is more of a C++ template implementation characteristic, the frequent use of STL by an application (from a convenience standpoint) can easily lead to code size bloat.
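As a rough illustration (this snippet is not from the paper), each distinct container instantiation carries its own copy of the container machinery in the generated code, and the same instantiation used from several translation units may be emitted in each of them until the linker folds the duplicates:

#include <list>
#include <map>
using namespace std;

struct Msg { int id; };

// Each distinct instantiation below causes the compiler to generate a
// separate copy of that container's code (insert, erase, and for the map
// the tree-rebalancing logic as well).
list<int>      intList;    // code generated for list<int>
list<Msg*>     msgList;    // another copy, specialized for list<Msg*>
map<int, Msg*> msgTable;   // yet another copy for map<int, Msg*>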
Memory Overhead

As shown in Figure 2 and Figure 3, each node used in the STL data containers typically requires header information to support the doubly linked list or tree algorithm. In the case of the "list" (which utilizes the doubly linked list), there is a pointer to the next node and another pointer to the previous node, which together cost 8 bytes on a system with a 32-bit CPU. Similarly, the "map" requires 16 bytes per node, as it utilizes the red-black tree algorithm ("node color", "parent node ptr", "left node ptr" and "right node ptr"). If the STL data containers are used to store data that is, for instance, 4 bytes long, the overhead introduced by the node header is very significant compared to the data carried on the node. If the application needs to store a huge number of elements (such as 1000 to 10000+), a significant amount of memory is consumed by the STL node overhead.
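As an illustration of the overhead described above (the exact layout differs between STL implementations), the per-node bookkeeping on a 32-bit CPU looks roughly like this:

// Sketch of typical node layouts; not taken from any particular STL source.
struct ListNodeInt {          // node behind list<int>
    ListNodeInt* next;        // 4 bytes on a 32-bit CPU
    ListNodeInt* prev;        // 4 bytes
    int          data;        // 4 bytes of actual payload
};                            // 8 bytes of header for 4 bytes of data

struct MapNodeIntInt {        // node behind map<int, int> (red-black tree)
    int            color;     // node color (red/black)
    MapNodeIntInt* parent;    // 4 bytes
    MapNodeIntInt* left;      // 4 bytes
    MapNodeIntInt* right;     // 4 bytes
    int            key;       // payload: the key
    int            value;     // payload: the mapped value
};                            // 16 bytes of header per stored entry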
Number of dynamic memory allocations

One of the problematic areas in embedded systems is the uncertainty, in terms of performance, of dynamic memory allocation. Many embedded systems developers avoid dynamic memory allocation as much as possible by using statically allocated memory. Oftentimes, teams prefer to develop and use their own memory management scheme rather than the one provided by the OS, so as to minimize the performance degradation of dynamic memory allocation. STL data containers, by design, rely heavily on dynamic memory allocation: whenever a container is manipulated, a memory chunk is dynamically allocated (or de-allocated) for the data node. STL uses the OS's default memory calls (i.e. "malloc" and "free") to allocate memory; this is not usually recommended because of the uncertainty in performance and the memory fragmentation that may be encountered [1]. Instead, custom memory allocation APIs can be specified through the STL declarations. However, even with this modification, a large number of memory calls still need to go through the custom memory manager, one for each node addition or deletion.
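For example, with a standards-conforming STL the custom allocation routines are named in the container declaration itself. The sketch below uses the minimal (C++11-style) allocator form and routes requests to hypothetical pool_alloc/pool_free functions, backed here by malloc/free only so that the example is self-contained; note that every node operation still results in one call into the custom memory manager:

#include <cstdlib>
#include <functional>
#include <list>
#include <map>

// Hypothetical project pool routines; a real system would use its own manager.
inline void* pool_alloc(std::size_t bytes) { return std::malloc(bytes); }
inline void  pool_free(void* p)            { std::free(p); }

// Minimal custom allocator funnelling every STL node request into the pool.
template <class T>
struct PoolAllocator {
    typedef T value_type;
    PoolAllocator() {}
    template <class U> PoolAllocator(const PoolAllocator<U>&) {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(pool_alloc(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { pool_free(p); }
};
template <class T, class U>
bool operator==(const PoolAllocator<T>&, const PoolAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const PoolAllocator<T>&, const PoolAllocator<U>&) { return false; }

// The allocator is specified through the STL declaration; each node addition
// or deletion still triggers an allocate()/deallocate() call.
typedef std::list<int, PoolAllocator<int> > PoolList;
typedef std::map<int, int, std::less<int>,
                 PoolAllocator<std::pair<const int, int> > > PoolMap;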
Error Handling

STL relies on C++'s exception mechanism to handle errors that occur during memory allocation. While this works fine for normal C++ programs, it presents a problem when it comes to embedded systems. Specifically, exceptions may have been intentionally turned "off" in the compiler to minimize code size, or as a design decision (because exceptions do not provide sufficiently flexible error handling in the embedded application code). Once again, in such cases the use of vanilla STL would cause problems.

Figure 4 is a snapshot of the code in the "list" implementation that creates a brand new STL node. The null pointer check is not performed, on the assumption that an exception is thrown before the pointer is accessed.
link_type get_node() {
    return list_node_allocator::allocate();
}
link_type create_node(const T& x) {
    link_type p = get_node();
    __STL_TRY {
        construct(&p->data, x);
    }
    __STL_UNWIND(put_node(p));
    return p;
}

Figure 4 : Node creation in the STL list implementation

Memory Tracing/Debug Capability

Unlike Windows or Unix, many embedded systems still do not have a good tool set or a good debugging environment for performing memory-related traces. Even if a good tool set is available, those tools might not work with the target system due to certain resource restrictions.

In many cases, embedded systems developers implement their own memory managers to compensate for the lack of a tool set. Because memory allocation is hidden inside the STL implementation, it is difficult to identify the ultimate client of a given memory block (see Figure 5). Also, if the number of STL nodes in use becomes large, it typically hits the limitation of the memory manager due to the excessive number of dynamic memory blocks to track.

(Figure 5 diagram: the memory manager reports a 20-byte block at 0x45000 as allocated from list<int>::mem_allocator; both Class A and Class B use a list<int>, so it cannot be determined which class ultimately owns the block.)
Prototyping Embedded STL - An example design

The STL implementation prototyped in this section is just one flavor of the possible extensions to the existing STL implementation. The design policy employed in this custom prototype features:

- Re-use of the existing algorithm.
- Minimizing the number of memory allocations.
- Error handling without relying on exceptions.
- Reduced number of memory blocks used.

While achieving these design goals, some sacrifices must be made to the flexibility of STL. These are:

- Limited growth of the STL containers.
- Inhibition of some interfaces which rely on exception error handling.

Figure 6 and Figure 7 are the implementation diagrams of the prototyped STL code.

Figure 6 : Image of Embedded STL list implementation (diagram: a list<T, n> object holds the number of in-use nodes, the free-node index range (start, end) and an array of n nodes; entry 0 is the Head Node, and each entry carries a "prev idx", a "next idx" and the stored data T)

Figure 7 : Image of Embedded STL map implementation (diagram: a map<Key, T, n> object holds the number of in-use nodes, the free-node index range (start, end) and an array of n nodes; entry 0 is the Head Node, and each entry carries a color, "P idx", "L idx" and "R idx" links, a Key and the stored data T)

With this design, only one memory block is allocated for one STL instance, and there is no dynamic memory allocation when manipulating the data in the STL container. The number of elements to be used in the STL container is specified through the STL container declaration. Finally, all the nodes are pre-allocated and reserved as empty nodes. When one is needed, a new node is picked up from the unused nodes of the pre-allocated node list inside the STL instance, and a node is marked as unused again once it is deleted from the STL linked list or tree.
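A minimal sketch of that idea follows; the names and the interface are illustrative only (the paper's actual code is not reproduced). It shows the single node array owned by the container and the free-node bookkeeping, leaving out the list-linking logic itself:

// Illustrative skeleton, not the paper's implementation.
template <class T, unsigned short N>
class embedded_list_sketch {
public:
    embedded_list_sketch() : size_(0), free_head_(0) {
        // Chain all N slots into a free list up front; no heap use afterwards.
        for (unsigned short i = 0; i < N; ++i)
            nodes_[i].next = static_cast<unsigned short>(i + 1);
    }
    // Grab an unused slot; returns N (an invalid index) when the pool is
    // full, so the caller can react without exceptions.
    unsigned short allocate_node() {
        if (size_ == N) return N;
        unsigned short idx = free_head_;
        free_head_ = nodes_[idx].next;
        ++size_;
        return idx;
    }
    // Put a slot back on the free list once it is unlinked from the list.
    void free_node(unsigned short idx) {
        nodes_[idx].next = free_head_;
        free_head_ = idx;
        --size_;
    }
private:
    struct Node {
        unsigned short prev;    // 2-byte index links instead of 4-byte pointers
        unsigned short next;
        T              data;
    };
    Node           nodes_[N];   // the one and only memory block for all nodes
    unsigned short size_;       // number of in-use nodes
    unsigned short free_head_;  // index of the first unused slot
};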
Re-use of the algorithm

From the logical perspective, the prototyped STL implementations are equivalent to those in Figure 2 and Figure 3 (i.e. use of a Head Node, with all the nodes linked by "doubly linked lists" or a "red-black tree"). Obviously, the intention was to re-use the existing, proven logic without modification; the prototyped implementation was only modified where STL nodes were allocated.

Most of the operations were also kept compatible with the ordinary implementation, except for some cases described later.

Reduction of the memory overhead

Instead of using a pointer to link one node with another, the index of the array element is used. If the number of elements used by the STL container is less than 64K, the size of the index can be 2 bytes (compared to 4 bytes in the pointer-based approach). In the prototyped implementation, the overhead was reduced by 50% for both lists and maps.
Improved error handling, Limited operations

The error handling of a node allocation failure was enhanced so that it does not rely on the C++ exception handler. In addition to this, STL operations where it is difficult to capture the failure reason without C++ exceptions were hidden from public use. Figure 8 provides an example of such an operation.

The subscript operator will perform a node insertion, which could potentially suffer a memory allocation failure, but it is difficult to capture the error from the code written below without the use of C++ exceptions.
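The original Figure 8 is not reproduced here; the fragment below, written against the ordinary STL map, illustrates the kind of operation meant. The subscript operator may allocate a new node internally, and the caller receives no return value through which that allocation failure could be reported once exceptions are unavailable:

#include <map>
using namespace std;

void record_sample(map<int, long>& samples, int key, long value) {
    // If "key" is not present yet, operator[] silently inserts a new node
    // here. A failed node allocation can only surface as an exception;
    // there is no error code for this call site to test.
    samples[key] = value;
}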
With the prototyped implementation, the "insert" method may be called and the error checked against its return code to detect any memory allocation failure for the example above.
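The prototype's actual class and method signatures are not given in the paper, so the stand-in below only demonstrates the return-code style: a fixed-capacity table (flat arrays rather than the prototype's red-black tree) whose insert reports pool exhaustion through its return value:

#include <cstdio>

// Stand-in for the prototyped fixed-capacity map; illustrative only.
template <class Key, class T, unsigned short N>
class embedded_map_sketch {
public:
    embedded_map_sketch() : used_(0) {}
    // Returns false, instead of throwing, when all N pre-allocated
    // node slots are already occupied.
    bool insert(const Key& k, const T& v) {
        if (used_ == N) return false;  // pool exhausted: caller decides what to do
        keys_[used_]   = k;
        values_[used_] = v;
        ++used_;
        return true;
    }
private:
    Key            keys_[N];
    T              values_[N];
    unsigned short used_;
};

int main() {
    embedded_map_sketch<int, long, 50> table;
    if (!table.insert(2000, 12345L)) {
        std::printf("insert failed: table is full\n");
    }
    return 0;
}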
Better memory tracing

As there will be only one memory block associated with one STL instance, the number of memory blocks used is significantly reduced compared to the original implementation. As the block numbers go down, the memory tracing/memory debugging will likely get easier and be possible with a simpler set of tools or even by using an in-house memory manager.
Easy memory size estimation

With the prototyped implementation, all the memory that is expected to be used by an STL container is reserved as one memory block within the container itself. The size of that block follows directly from the element type and the number of elements given in the container declaration, so the memory consumption of each container can be estimated up front.
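A back-of-the-envelope example of such an estimate, using the overhead figures quoted earlier (illustrative numbers, not the paper's measurements):

// An embedded list sized for 1000 4-byte values, with 2-byte prev/next
// indexes per node, needs roughly one block of
//   1000 * (4 + 2 + 2) = 8000 bytes (plus a small container header),
// whereas a pointer-based STL list spends
//   1000 * (4 + 4 + 4) = 12000 bytes
// spread across 1000 separate heap blocks, each with its own heap overhead.
enum { kElems = 1000, kDataSz = 4, kIdxSz = 2, kPtrSz = 4 };
enum { kEmbeddedBytes = kElems * (kDataSz + 2 * kIdxSz) };  //  8000, one block
enum { kOriginalBytes = kElems * (kDataSz + 2 * kPtrSz) };  // 12000, 1000 blocks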
(Diagram and code listing for the test setup: a RandomEvtGenerator, with a generateEvent():void operation, drives a single EventManager through the itsEventManager association, which in turn dispatches to EventClient instances; the EventClient implementation includes "my_list.h" and stores each value via storeValue(value).)

With reference to this, the RandomEvtGenerator will generate a key (0-100) and a value (32 bits) randomly, and the EventManager is provided this information. The EventManager will then locate the EventClient based on the key passed in, and the data will be stored in the EventClient for a certain number of entries (in this case, 50). The EventManager has an STL map for the EventClient instances, keyed by an integer. Likewise, the EventClient employs an STL list to store the integer values passed in.
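The evaluation code itself is not included in the paper; the sketch below mirrors the described setup with ordinary STL containers (the class and member names follow the text, but the details are assumed):

#include <cstdlib>
#include <list>
#include <map>
using namespace std;

// Keeps the most recent values passed in, up to 50 entries as described.
class EventClient {
public:
    void storeValue(long value) {
        if (values_.size() >= 50)
            values_.pop_front();
        values_.push_back(value);      // STL list of integer values
    }
private:
    list<long> values_;
};

// Owns one EventClient per integer key, looked up through an STL map.
class EventManager {
public:
    void handleEvent(int key, long value) {
        clients_[key].storeValue(value);
    }
private:
    map<int, EventClient> clients_;
};

// RandomEvtGenerator: produces a random key (0-100) and a 32-bit value
// and hands them to the EventManager.
void generateEvent(EventManager& manager) {
    int  key   = rand() % 101;
    long value = static_cast<long>(rand());
    manager.handleEvent(key, value);
}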
References
[1] Dov Bulka, David Mayhew, Efficient C++, Addison-
Wesley, 1999
[2] Robert Sedgewick, Algorithms in C++, Addison-
Wesley, 1998
[3] Stanley B. Lippman, Josee Lajoie, C++ Primer,
Addison-Wesley
[4] WindRiver Systems Staff, Tornado 2.0 Users
Manual, Wind River Systems