
2004 Motorola S3 Symposium July 12-15, 2004

C++ STL for Embedded System

Kazuhiro Ondo
VSX Development/CDMA System Division/GTSS
IL75/[email protected]

ABSTRACT
The C++ Standard Template Library (STL) is a convenient set of reusable algorithms and containers available with most C++ distributions. Its templated containers (such as "list" and "map") in particular improve software development productivity to a high degree.
Despite these advantages, there are pitfalls when STL is used for embedded system development, where there are typically restrictions and limitations on capacity, performance, and the development tool set.
This paper covers those potential issues of using STL in embedded systems, and presents one flavor of STL implementation intended to make up for those disadvantages. The paper also shares quantified results comparing the prototyped "embedded STL" against the original STL implementation.

Introduction

C++ Standard Template Library (STL) is a collection of generic algorithms and abstract data containers described using C++ templates. STL supports most algorithms and data containers typically used in any type of application. Adopting STL into software development maximizes work productivity through software re-use.

As the use of the C++ language becomes popular in the development of embedded systems, the use of STL is one of many choices to maximize code re-use, since STL is available in many standard C++ distributions. On the other hand, the concept of STL (or of C++ templates themselves) does not necessarily align with the development environment for embedded systems, where there are constraints on memory, CPU performance, and the development tool set. It is important for the embedded systems developer to understand how STL works and whether the system being designed has enough capacity and a sufficient performance budget for the use of STL.

Due to the nature of C++ templates, the use of STL can result in code size bloat. The memory usage scheme of its data containers assumes that plenty of system resources are always available, so it is not well suited to embedded systems. STL provides several conveniences during the coding phase, but can cause significant problems during the system debug phase if no sophisticated memory/performance analysis tools are available.

The use of STL is not recommended for embedded systems development unless there is sufficient capacity in the system, or there is an expert who is intimately familiar with and can understand/handle all the issues around STL. However, there are alternate choices to modify the STL code so that it fits in the environment despite its constraints.

STL data Containers

STL data containers are the most popular components. For instance, "list", "map", "vector" and "set" (and often "multimap" and "multiset") are frequently used in C++ based systems utilizing STL. The data is represented as a "template type" in the containers, and a variety of container manipulation functions (insert, erase, size of the container, look-up, etc.) are provided. Most applications can easily adopt STL without developing a linked list or a hash map for a particular data type. Figure 1 is a code example with an STL data container (map). With the help of STL, there is no need to create another hashing mechanism for the type "ClassA*" (or a void pointer).

    #include <map>
    #include "ClassA.h"

    int main() {
        map<int, ClassA*> hash_a;
        hash_a[100]  = new ClassA;
        hash_a[3200] = new ClassA;
        map<int, ClassA*>::iterator iter;
        if ((iter = hash_a.find(2000)) == hash_a.end()) {
            cout << "The item not found at index of 2000" << endl;
        }
        // program continues
    }

Figure 1 : Example - hash map with STL map

Underneath the STL data containers, the list and hash data structures are implemented using ordinary techniques such as the "doubly linked list", the "red-black tree", etc. - however, they have been made available as C++ templates to accept generic data types.

Motorola Internal Use Only Page 1 of 7



Figure 2 and Figure 3 show the details of how STL lists and STL maps are constructed internally.

Figure 2 : Node association - STL List

Figure 3 : Node association - STL map

In the typical implementation, the list or the map itself has a reference to the Head Node, which does not carry data itself - instead, it provides a reference to the beginning (or the end) of the data nodes. The rest of the data nodes are linked by the doubly-linked list or the red-black tree starting from the Head Node. Each node is a separate memory chunk, and a node is expected to be retrieved dynamically during insertion into or deletion from the STL container.

Downside of using STL in embedded environments

As seen previously, the implementation of STL data containers is highly scalable and flexible. But these benefits might become disadvantages when applied to an embedded systems development environment that has memory and performance restrictions. The following are the potential problem areas:

Code Size

As many C++ compilers automatically instantiate C++ template code in-lined within the object, the same operational code may be repeated in multiple places if template code with the same type is used in multiple locations. While this is more a characteristic of C++ template implementations, the frequent use of STL by an application (from a convenience standpoint) can easily lead to code size bloat.

Memory Overhead

As shown in Figure 2 and Figure 3, each node used in STL data containers typically requires header information to support the doubly linked list or tree algorithm. In the case of the "list" (which utilizes a doubly linked list), there is a pointer to the next node and another pointer to the previous node - which costs 8 bytes on a system with a 32-bit CPU. Similarly, the "map" requires 16 bytes per node, as it utilizes the red-black tree algorithm ("node color", "parent node ptr", "left node ptr" and "right node ptr"). If STL data containers are used to store data that is, for instance, 4 bytes long, the overhead introduced by the node header is very significant compared to the data carried on the node. If the application wants to store a huge number of blocks (such as 1000-10000+), a significant amount of memory will be consumed by the STL node overhead.

Number of dynamic memory allocations

One of the problem areas in embedded systems is the uncertainty in performance during dynamic memory allocation. Many embedded systems developers avoid dynamic memory allocation as much as possible by using statically allocated memory. Often, teams prefer to develop and use their own memory management scheme rather than the one provided by the OS, so as to minimize the performance degradation of dynamic memory allocation. STL data containers rely heavily on dynamic memory allocation by design. Whenever a container is manipulated, a memory chunk is dynamically allocated (or de-allocated) for the data node. STL uses the OS's default memory calls (i.e. "malloc" and "free") to allocate memory - this is not usually recommended due to the uncertainty in performance and the memory fragmentation that may be encountered.[1] Instead, custom memory allocation APIs can be specified through STL declarations. However, even with this modification, a large number of memory calls need to go through the custom memory manager for each node addition or deletion.


Error Handling

STL relies on C++'s exception mechanism to handle errors that occur during memory allocation. While this works fine for normal C++ programs, it presents a problem for embedded systems. Specifically, exceptions may intentionally have been turned off in the compiler to minimize code size, or due to design decisions (exceptions may not provide sufficiently flexible error handling for the embedded application code). In such cases, the use of vanilla STL would cause problems.

Figure 4 is a snapshot from the "list" implementation that creates a brand new STL node. No null pointer check is performed, on the assumption that an exception is thrown before the pointer is accessed.

    link_type get_node() {
        return list_node_allocator::allocate();
    }
    link_type create_node(const T& x) {
        link_type p = get_node();
        __STL_TRY {
            construct(&p->data, x);
        }
        __STL_UNWIND(put_node(p));
        return p;
    }

Figure 4 : Error handling for the node allocation

In order to avoid unpleasant surprises, an engineer using STL needs to be aware of its error handling scheme before deciding to adopt it into his/her code.

Memory Estimation

If the amount of available memory on the target system is restricted, it is critical to estimate the memory usage of each application.

The best (and simplest) way to estimate memory usage is to compute the "size" of an application class and multiply it by the number of object instances instantiated at run time. However, if the application classes use STL, a number of memory blocks are used silently during execution, as shown in Figure 2 and Figure 3. It is important for the developer to know the maximum number of nodes used by each STL type employed in the code. Moreover, the amount of memory overhead depends on the type of the data container, as seen before. It is therefore difficult to foresee the amount of memory used until the application is actually loaded on the system. (Or, of course, sometimes even after the application is loaded on the system, depending on the board/system's debugging capabilities.)

Memory Tracing/Debug Capability

Unlike Windows or Unix, many embedded systems still do not have a good tool set or a good debugging environment for memory-related traces. Even when a good tool set is available, those tools might not work with the target system due to certain resource restrictions.

In many cases, embedded systems developers implement their own memory managers to compensate for the lack of a tool set. Because memory allocation is hidden in the STL implementation, it is difficult to identify the ultimate client of a given memory block (see Figure 5). Also, as the number of STL nodes in use grows, it typically hits the limitations of the memory manager due to the excessive number of dynamic memory blocks to track.

Figure 5 : Difficulty to trace a memory block

Problem solution

The problem of code size bloat can be mitigated or even resolved completely by changing the compiler flag that determines C++'s template code instantiation policy. (Note, however, that some compilers may not support this capability.) With this change, the C++ template code for a specific data type is instantiated in only one place instead of being in-lined and repeated in the client object code for each occurrence.

On the other hand, other STL-oriented problems are difficult to address unless someone on the development team is very familiar with those issues. It might be a better choice not to adopt STL at all in an embedded system project with a capacity-constrained environment. But it would also be nice if those STL resources could be re-used in the project, given STL's strong re-usability, flexibility, and well-proven implementation.

Since STL code is typically provided as a set of header files containing all the C++ template implementations, and its license is a free public license, improving the STL code through customization is another choice to address the issues.


Prototyping Embedded STL - An example

The STL implementation prototyped in this section is just one flavor of the possible extensions to the existing STL implementation. The design policy employed in this custom prototype features:

- Re-use of the existing algorithms.
- A minimized number of memory allocations.
- Error handling without relying on exceptions.
- A reduced number of memory blocks used.

While achieving these design goals, some sacrifices must be made in the flexibility of STL. These are:

- Limited growth of STL containers.
- Inhibition of some interfaces which rely on exception-based error handling.

Figure 6 and Figure 7 are the implementation diagrams of the prototyped STL code.

Figure 6 : Image of Embedded STL list implementation

Figure 7 : Image of Embedded STL map implementation

Reduction of the number of memory chunks used

In the prototyped design, all the nodes used by an STL data container are kept as an array in the STL instance itself. With this, only one memory block is allocated per STL instance, and there is no dynamic memory allocation when manipulating the data in the STL container. The number of elements to be used in the STL container is specified through the container declaration. Finally, all the nodes are pre-allocated and reserved as empty nodes.

When needed, a new node is picked up from the unused nodes in the pre-allocated node list in the STL instance, and a node is marked as unused once it is deleted from the STL linked list or tree.

Re-use of the algorithm

From the logical perspective, the prototyped STL implementations are equivalent to those in Figure 2 and Figure 3 (i.e. use of a Head Node, with all the nodes linked by "doubly linked lists" or a "red-black tree"). The intention was to re-use the existing, proven logic without modification; the prototyped implementation was only modified where STL nodes were allocated.

Most of the operations were also kept compatible with the ordinary implementation, except for some cases described later.

Reduction of the memory overhead

Instead of using pointers to link one node to another, indexes into the array elements are used. If the elements used by the STL container consume less than 64K of memory, the size of the index can be 2 bytes (compared to 4 bytes for the pointer-based approach). In the prototyped implementation, the overhead was reduced by 50% for both lists and maps.

Improved error handling, limited operations

The error handling of a node allocation failure was enhanced so that it does not rely on the C++ exception handler. In addition, STL operations where it is difficult to capture the failure reason without C++ exceptions were hidden from public use. Figure 8 provides an example of such an operation. The subscript operator performs a node insertion which could potentially suffer a memory allocation failure, but it is difficult to capture the error from code written as below without the use of C++ exceptions.

    T& map::operator[](const key_type& k)

    map<int, void*> mymap;

Figure 8 : Example - subscript operator

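The scheme described above (a node array embedded in the container, 2-byte index links, a free-node chain, and bool-returning operations in place of exceptions) can be sketched as follows. This is a simplified illustration of the idea, not the actual prototype code; the name FixedList and its interface are invented here.

```cpp
#include <cstdint>

// Simplified sketch of the index-linked, fixed-capacity list idea: all N
// nodes live inside the container object, linked by 16-bit indexes, and
// push_back() reports allocation failure by returning false.
template <typename T, std::uint16_t N>
class FixedList {
    static constexpr std::uint16_t NIL = N;   // sentinel index ("no node")
    struct Node {
        std::uint16_t prev, next;             // 2-byte links, not pointers
        T             data;
    };
    Node          nodes_[N];
    std::uint16_t head_ = NIL, tail_ = NIL;
    std::uint16_t free_ = 0;                  // head of the free-node chain
    std::uint16_t size_ = 0;

public:
    FixedList() {
        // Chain every node into the free list; the last one ends at NIL.
        for (std::uint16_t i = 0; i < N; ++i)
            nodes_[i].next = static_cast<std::uint16_t>(i + 1);
    }
    bool push_back(const T& v) {
        if (free_ == NIL) return false;       // capacity exhausted: no exception
        std::uint16_t n = free_;              // pick up an unused node
        free_ = nodes_[n].next;
        nodes_[n] = {tail_, NIL, v};
        if (tail_ != NIL) nodes_[tail_].next = n; else head_ = n;
        tail_ = n;
        ++size_;
        return true;
    }
    std::uint16_t size() const { return size_; }
    const T& front() const { return nodes_[head_].data; }  // precondition: not empty
};
```

Because the whole node array lives inside the object, sizeof(FixedList<int, 4>) already states the container's full memory cost at compile time, and a fifth push_back on such a list simply returns false, mirroring the behavior shown later in Figure 9.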

With the prototyped implementation, the "insert" method may be called and the error checked against its return code to detect any memory allocation failure for the example above.

Better memory tracing

As there is only one memory block associated with each STL instance, the number of memory blocks used is significantly reduced compared to the original implementation. As the block count goes down, memory tracing/debugging will likely get easier and become possible with a simpler set of tools, or even with an in-house memory manager.

Easy memory size estimation

With the prototyped implementation, all the memory expected to be used by an STL container is reserved as one memory block within the container itself. The size can be determined at compilation time, and the developer can easily determine the amount of memory to be used without any detailed knowledge of the STL node structure. And of course, there are no "silent" memory allocations happening inside the STL that the developer needs to be concerned about.

Limited flexibility

The maximum number of elements (in the prototype) is limited, whereas the original implementation supported an unlimited number (memory permitting). Also, the STL user needs to know or estimate the number of elements required (for each STL container used) at the time of its declaration. Insufficient memory reservation will be signaled via the method calls on the STL container. The developer also needs to ensure that sufficient error handling is performed in all cases to detect the need for more memory, should the situation arise.

    #include "my_list.h"

    int main() {
        list<int, 4> myList;  // 4 elements allowed

        for (int i = 0; i < 10; i++) {
            if (myList.push_back(i) == false) {
                // push_back will fail on i=4
                cout << "error while adding the data" << endl;
                break;
            }
        }
        // program continues
    }

Figure 9 : Sample Code with Embedded STL

A Benchmark of the Embedded STL

A benchmarking program was created to evaluate the prototyped embedded STL against the ordinary STL implementation. The program was modeled to simulate a typical event driven application (Figure 10). The RandomEvtGenerator generates a key (0-100) and a value (32 bits) randomly, and provides this information to the EventManager. The EventManager then locates the EventClient based on the key passed in, and the data is stored in the EventClient for a certain number of entries (in this case, 50). The EventManager has an STL map of the EventClient instances with an integer key. Likewise, the EventClient employs an STL list to store the integer values passed in.

Figure 10 : OMD for benchmark program

Figure 11 and Figure 12 illustrate the sequence charts of the prototyped application. If the EventClient is not found for a particular key, an EventClient is instantiated and data is stored until the EventClient reaches capacity. Once capacity is reached on the EventClient, the EventClient is destroyed and the hash-map entry in the EventManager is cleared for the key.

Figure 11 : Benchmark - Sequence Diagram #1


Figure 12 : Benchmark - Sequence Diagram #2

A total of 100K events were generated by the benchmark program. The following items were measured for both the ordinary STL implementation and the (enhanced) embedded STL implementation:

- The maximum memory usage.
- The maximum number of memory blocks used.
- The number of memory allocations and de-allocations.
- Total CPU time required for the entire operation.

The benchmark was run on a target board with the following configuration:

- CPU: Motorola PPC 7410
- OS: VxWorks 5.4
- Ordinary STL: Packaged with the Tornado 2.0 development environment (SGI implementation).
- Compiler: Tornado 2.0 PPC cross-compiler (GCC 2.7.2 based).

                                                  Ordinary STL    Embedded STL
    Max memory used (bytes)                       48964           43632
    Theoretical max memory required (bytes)       64852           43632
    Max number of memory blocks used              4047            102
    Theoretical max memory blocks required        5303            102
    Number of memory allocations/de-allocations   208236          4016
    Total CPU time (us)                           611424          110262

Table 1 : Benchmark Result

The results show the advantages of the Embedded STL implementation in each category. In particular, the number of memory blocks used and the number of memory allocations/de-allocations were significantly reduced. As the number of memory allocations/de-allocations decreased, the CPU time consumed by the program was also significantly reduced.

The maximum memory usage did not improve as much as the other categories. This is because the events were not equally distributed across all the event clients. Of course, one could compare this result with the theoretical maximum (instead of the observed value) and conclude that the Embedded STL did offer a good improvement.

One notable point is that the actual memory usage and the actual number of memory blocks match the theoretical numbers exactly. This will help embedded software developers estimate the memory usage of their applications accurately.

The maximum number of memory blocks used was kept low in the Embedded STL implementation. This will ease memory-related debugging activities when problems are found.

In general, the results show that the Embedded STL has significant benefits in performance, memory tracing, and memory estimation due to the smaller number of memory blocks used. If the application uses a statistical memory usage model, a disadvantage of the proposed scheme is that memory may be wasted by pre-allocation if the maximum size is not used. (For instance, if each instance reserves a maximum of 3 elements but it is known that only 50 elements are used in total amongst 100 instances, this would be highly wasteful.)

Conclusion

The prototyped STL code will help create applications more suitable for the embedded systems development environment while re-using STL resources. While it may not cover all the potential STL issues for some applications, significant advantages were observed in certain areas. It is therefore important, at a minimum, to understand the prototyped Embedded STL and to know whether the given application model fits its policy (and can therefore benefit from it).

Noteworthy is the fact that the prototyped Embedded STL is very strong in its performance and maintenance capability even though there are some limitations on its flexibility. Given this, while it would be difficult to solve all the issues of the original STL at one time, it might be a good idea to have different flavors of embedded STL implementations, such as "Flexibility Focused" and "Memory Usage Focused".

The use of STL with the ordinary implementation is not really recommended on embedded systems due to the environment's resource constraints.


However, these limitations can be managed to produce quality software by following the guidelines outlined in this paper.

References
[1] Dov Bulka and David Mayhew, Efficient C++, Addison-Wesley, 1999.
[2] Robert Sedgewick, Algorithms in C++, Addison-Wesley, 1998.
[3] Stanley B. Lippman and Josee Lajoie, C++ Primer, Addison-Wesley.
[4] Wind River Systems Staff, Tornado 2.0 User's Manual, Wind River Systems.
