Cell Broadband Engine
Software Development Kit 2.1 Programmer's Guide, Version 2.1
SC33-8325-01
Note: Before using this information and the product it supports, read the general information in Appendix A, “Notices,” on page 57.
Preface
The Software Development Kit 2.1 (SDK) for the Cell Broadband Engine™
(Cell/B.E.™) is a complete package of tools to enable you to program applications
for the Cell/B.E. processor. The SDK is composed of development tool chains,
software libraries and sample source files, a system simulator, and a Linux® kernel,
all of which fully support the capabilities of the Cell/B.E. processor.
Supported platforms
Cell/B.E. applications can be developed on the following platforms:
• x86
• x86-64
• 64-bit PowerPC® (PPC64)
• BladeCenter QS20
Supported languages
The supported languages are:
• C/C++
• Assembler
Note: Although C++ is supported, take care when you write code for the
Synergistic Processing Units (SPUs) because many of the C++ libraries are
too large for the memory available.
Getting support
The SDK is supported through the Cell/B.E. architecture forum on the
developerWorks® Web site at
http://www.ibm.com/developerworks/power/cell/
There is also support for the Full-System Simulator and XL C/C++ Compiler
through their individual alphaWorks® forums. If in doubt, start with the Cell/B.E.
architecture forum.
GDB is supported through many different forums on the Web, but primarily at the
GDB Web site
http://www.gnu.org/software/gdb/gdb.html
This version (2.1) of the Cell/B.E. SDK supersedes all previous versions of the
SDK.
Related documentation
For a full list of documentation available on the SDK 2.1 ISO image, see Appendix
B. Related documentation.
Chapter 1. SDK overview
This section describes the contents of the SDK, where it is installed on the system,
and how the various components work together.
The GCC compiler also contains a separate SPE cross-compiler that supports the
standards defined in the following documents:
• C/C++ Language Extensions for Cell BE Architecture V2.4. The GCC compiler
shipped in SDK 2.1 supports all language extensions described in the
specification except for the following:
– The GCC compilers currently do not support alignment of stack variables
greater than 16 bytes as described in section 1.3.1.
– The GCC compilers currently do not support the optional alternate vector
literal format specified in section 1.4.6.
– The GCC compilers currently support mapping between SPU and VMX
intrinsics as defined in section 5 only in C++ code.
– The PPU GCC compiler does not support the PPU VMX intrinsics vec_extract,
vec_insert, vec_promote, and vec_splats as defined in section 7. (The other
eight intrinsics in that section are supported.)
– The recommended vector printf format controls specified in section 8.1.1
are not supported.
– The SPU GCC compiler and library do not fully conform to the behavior of
floating-point operators and standard library functions as documented in
section 9.3.
– The GCC compilers support operator overloading for vector data types as
described in section 10 only for the following set of operators: unary +, -, ~;
binary +, +=, -, -=, *, *=, /, /=, &, &=, |, |=, ^, ^=.
• Application Binary Interface (ABI) Specification V1.7
• SPU Instruction Set Architecture V1.2
The associated assembler and linker additionally support the SPU Assembly
Language Specification V1.5. The assembler and linker are common to both the
GCC and XL C/C++ compilers. GDB support is provided for both PPU and SPU
debugging, and the debugger client can be in the same process or a remote
process. GDB also supports combined (PPU and SPU) debugging.
On a non-PPC system, the install directory for the GNU tool chain is /opt/cell.
There is a single bin subdirectory, which contains both PPU and SPU tools.
On a PPC64 or BladeCenter QS20, both tool chains are installed into /usr. See
“System root directories” on page 15 for further information.
IBM XL C/C++ supports the revised 2003 International C++ Standard ISO/IEC
14882:2003(E), Programming Languages -- C++ and the ISO/IEC 9899:1999,
Programming Languages -- C standard, also known as C99. The compiler also
supports the C89 Standard and K & R style of programming, as well as language
extensions for vector programming and language extensions for SPU
programming. In addition, the compiler supports numerous GCC C and C++
extensions to help users port their applications from GCC.
The XL C/C++ compiler provided in SDK 2.1 supports the language extensions as
specified by the C/C++ Language Extensions for Cell BE Architecture V2.4
specification except:
• Alignment of greater than 16 bytes for automatic variables as described in
section 1.3.1 is not currently supported.
• Most, but not all, GCC inline assembly capabilities are supported as described
in section 1.7.
• Operator overloading for vector data, as described in section 10, is not
currently supported.
• The PPU VMX intrinsics vec_extract, vec_insert, vec_promote, and vec_splats
specified in section 7 are not currently supported.
The compiler invocation commands for the PPU perform all necessary steps to
compile C source files with ppuxlc (or C++ source with ppuxlc++) into .o files
and to link the object files and libraries with ppu-ld into an executable
program.
The compiler invocation command for the SPU performs all necessary steps to
compile C source files with spuxlc (or C++ source with spuxlc++) into .s files,
assemble the .s files into .o files with spu-as, and link the object files and
libraries into an executable program with spu-ld. The ppu-embedspu tool that is
part of the GNU tool chain is used to link PPU object files and an SPU
executable program into a single PPU executable program.
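As an illustration, a hedged sketch of this build flow using the command names above; the source and output file names are placeholders, and exact options may differ on your installation:

```shell
# Compile and link the SPU program with the XL SPU compiler.
spuxlc -o simple_spu simple_spu.c
# Wrap the SPU executable into a linkable PPU object (GNU tool chain).
ppu-embedspu simple_spu simple_spu simple_spu-embed.o
# Compile the PPU code and link in the embedded SPU binary.
ppuxlc -o simple simple.c simple_spu-embed.o -lspe
```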
• -O0: almost no optimization
• -O2: strong, low-level optimization that benefits most programs
• -O3: intense, low-level optimization analysis with basic loop optimization
• -O4: all of -O3, plus detailed loop analysis and good whole-program analysis
at link time
• -O5: all of -O4, plus detailed whole-program analysis at link time
The simulator for SDK 2.1 provides additional support for performance
simulation. This is described in the IBM Full-System Simulator User's Guide.
The system root image for the simulator must be located either in the current
directory when you start the simulator or in the default
/opt/ibm/systemsim-cell/images/cell directory. The cellsdk script automatically
puts the system root image into the default directory.
You can mount the system root image to see what it contains. Assuming a mount
point of /mnt/cell-sdk-sysroot, which is the mount point used by the cellsdk
script, the command to mount the system root image is:
mount -o loop /opt/ibm/systemsim-cell/images/cell/sysroot_disk /mnt/cell-sdk-sysroot/
Do not attempt to mount the image on the host system while the simulator is
running. You should always unmount the system root image before you start the
simulator. You should not mount the system root image to the same point as the
root on the host server because the system can become corrupted and fail to boot.
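For example, assuming the mount point named above, the image can be unmounted before starting the simulator with:

```shell
umount /mnt/cell-sdk-sysroot
```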
You can change files on the system root image disk in the following ways:
• Mount it as described above. Then change directory (cd) to the mount point
directory or below and modify the file using host system tools, such as vi or cp.
However, do not attempt to use the rpm utility on an x86 platform to install
packages to the sysroot disk, because the rpm database formats are not
compatible between the x86 and PPC platforms.
• Use the ./cellsdk synch command to synchronize the system root image with
the /opt/ibm/cell-sdk/prototype/sysroot directory for libraries and samples
(see “System root directories” on page 15) that have been cross-compiled and
linked on a host system and need to be copied to the target system.
• Use the callthru mechanism (see “The callthru utility” on page 17) to source or
sink the host system file while the simulator is running. This is the only method
that can be used while the simulator is running.
The source is distributed under the GPL license and the system root image is
available on the BSC Web site
http://www.bsc.es/projects/deepcomputing/linuxoncell
Linux kernel
A number of patches have been made to the Linux 2.6.18 kernel to provide the
services that are required to support the hardware facilities of the Cell/B.E.
processor.
For the BladeCenter QS20, the kernel is installed into the /boot directory,
yaboot.conf is modified and a reboot is required to activate this kernel. The
cellsdk install task (see SDK 2.1 Installation Guide) provides an option,
--nokernel, not to install this kernel.
Note: To avoid leaving the system in an unusable state, the cellsdk uninstall
command does not automatically uninstall the kernel.
The kernel image for the simulator must be located either in the current
directory when you start the simulator or in the default
/opt/ibm/systemsim-cell/images/cell directory. The cellsdk script automatically
puts the kernel image into the default directory.
The patches for the 2.6.18 kernel are distributed under the GPL license and are
available on the BSC Web site
http://www.bsc.es/projects/deepcomputing/linuxoncell
Cell/B.E. libraries
The libraries listed here have been function tested and are considered ready for
Cell/B.E. applications development. Any problems with these libraries should be
reported in:
http://www.alphaworks.ibm.com/tech/cellsw/forum
For the BladeCenter QS20, the SDK installs the libspe headers, libraries, and
binaries into the /usr directory and the standalone SPE executive, elfspe, is
registered with the kernel during boot by commands added to /etc/rc.d/init.d
using the binfmt_misc facility.
For the simulator, the libspe and elfspe binaries and libraries are preinstalled in
the same directories in the system root image and no further action is required at
install time.
The source for the SPE runtime management library is distributed under the GPL
license and available on the BSC Web site
http://www.bsc.es/projects/deepcomputing/linuxoncell
The SIMD math library provides short vector versions of a subset of the traditional
math functions. The MASS library provides long vector versions. These vector
versions conform as closely as possible to the specifications set out by the scalar
standards. However, fundamental differences between scalar architectures and the
Cell/B.E. Architecture require some deviations, including the handling of
rounding, error conditions, floating-point exceptions, and special operands, such as
NaN and infinities.
The SIMD math library is provided by the SDK as both a linkable library archive
and as a set of inlinable headers. Names of the SIMD math functions are
differentiated from their scalar counterparts by a vector type suffix that is
appended to the standard scalar function name. For example, the SIMD version of
fabsf(), which acts on a vector float, is called fabsf4(). Similarly, a SIMD version
of a standard scalar function that acts on a vector double has d2 appended to the
name, for example, fabsd2(). Inlinable versions of functions are prefixed with the
character “_” (underscore), so the inlinable version of fabsf4() is called _fabsf4().
Both versions require the inclusion of the primary header file, simdmath.h, and
linking against the libsimdmath.a archive. Additionally, the inlinable versions
require inclusion of a distinct header file for each function used. For example, to
use the inlinable function _fabsf4(), the fabsf4.h header file needs to be included
in addition to the simdmath.h header file.
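A minimal sketch of both usage styles for the SPU, assuming the headers and naming conventions described above; the wrapper function names are illustrative only:

```c
#include <simdmath.h>   /* primary header, required by both versions */
#include <fabsf4.h>     /* per-function header for the inlinable version */

/* Linkable version: calls into libsimdmath.a (link with -lsimdmath). */
vector float abs_linkable(vector float v)
{
    return fabsf4(v);
}

/* Inlinable version: expanded in place, avoiding a function-call branch. */
vector float abs_inlinable(vector float v)
{
    return _fabsf4(v);
}
```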
The linkable library archive is more convenient, requiring the inclusion of only
a single header file, but it produces slower, larger binaries due to limitations
of the linker and the branching inherent to function calls. The inlinable
headers require a distinct header file to be included for each math function
used, but they produce faster, smaller binaries because the compiler is able to
reduce branching and often achieves better dual-issue rates and optimization. In
general, most developers should use the inlinable versions whenever possible.
For the PPU, the SIMD math library header file simdmath.h is located in the
/usr/include directory, with the inlinable headers located in the
/usr/include/simdmath directory, and the library archive libsimdmath.a located in
the /usr/lib directory.
For the SPU, the SIMD math library header file simdmath.h is located in the
/usr/spu/include directory, with the inlinable headers located in the
/usr/spu/include/simdmath directory, and the library archive libsimdmath.a
located in the /usr/spu/lib directory.
For more information about the SIMD math library, refer to SIMD Math Library
Specification for Cell Broadband Engine Architecture.
Note: Some of the functions documented in the specification are not yet available.
The source code and the man pages document the functions that are
currently supported.
These libraries:
• Include both scalar and vector functions
• Are thread-safe
• Support both 32- and 64-bit compilations
• Offer improved performance over the corresponding standard system library
routines
• Are intended for use in applications where slight differences in accuracy or
handling of exceptional values can be tolerated
You can find information about using these libraries on the MASS Web site:
http://www.ibm.com/software/awdtools/mass
The MASS and MASS/V libraries are distributed under a modified ILAR license.
Prototype code
The functions in these packages are prototype or sample code that you can
use for experimentation. This code may change in future releases of the SDK.
Table 1. Subdirectories for the libraries and samples RPM (continued)
Subdirectory Description
src/samples The samples directory contains examples of Cell/B.E. programming techniques. Each program
shows a particular technique, or set of related techniques, in detail. You may review these
programs when you want to perform a specific task, such as performing double-buffered DMA
transfers to and from a program, performing local operations on an SPU, or providing SPU
programs with access to main memory objects.
Some subdirectories contain multiple programs. The sync subdirectory has examples of various
synchronization techniques, including mutex operations and atomic operations.
The spulet model is intended to encourage testing and refinement of programs that need to be
ported to the SPUs; it also provides an easy way to build filters that take advantage of the huge
computational capacity of the SPUs, while reading and writing standard input and output.
The IDL tool has its own samples which show how to offload some processing to the SPU. The
IDL’s PPU stub code supports dynamic allocation of multiple SPUs to handle simultaneous
offloaded functions, and multiple functions can be loaded on a single SPU, if they are small
enough. Some features are still under development, such as double buffering support.
src/workloads The workloads directory provides a handful of examples that can be used to better understand
the performance characteristics of the Cell/B.E. processor. There are four sample programs,
which contain insights into how real-world code should run.
Note: Running these examples using the simulator takes much longer than on native
Cell/B.E.-based hardware. The performance characteristics in wall-clock time using the
simulator are extremely inaccurate, especially when running on multiple SPUs. You need to
examine the simulator CPU cycle counts instead.
For example, the matrix_mul program lets you perform matrix multiplications on one or more
SPUs. Matrix multiplication is a good example of a function which the SPUs can accelerate
dramatically.
Unlike some of the other sample programs, these examples have been tuned to get the best
performance. This makes them harder to read and understand, but it gives an idea of the type
of performance code that you can write for the Cell/B.E. processor.
sysroot Contains some of the headers and libraries used during cross-compiling and contains the
compiled results of the libraries and samples. This can be synched up with the system root
image by using the command: /opt/ibm/cell-sdk/prototype/cellsdk synch
src/benchmarks The benchmarks directory contains sample benchmarks for various operations that are
commonly performed in Cell/B.E. applications. The intent of these benchmarks is to guide you
in the design, development, and performance analysis of applications for systems based on the
Cell/B.E. processor. The benchmarks are provided in source form to allow you to understand in
detail the actual operations that are performed in the benchmark. This also provides you with a
basis for creating your own benchmark codes to characterize performance for operations that
are not currently covered in the provided set of benchmarks.
ALF assumes a natural division of labor between the two types of processing
elements in a hybrid system: the host element and the accelerator element. Within
ALF, two different types of tasks are defined in a typical parallel program: the
control task and the compute task. These tasks are assigned to the different
processing elements in the hybrid system. The control task typically resides on the
host element, while the compute task typically resides on the accelerator element.
This division of labor enables programmers to specialize in different parts of a
given parallel workload.
ALF defines three different types of work that can be assigned to three different
types of programmers. At the highest level, application developers only program at
the host level. Application programmers can use the provided accelerated libraries
without understanding the inner workings of the hybrid system. The second type
of programmer is the accelerated library developer. Using the provided ALF APIs,
the library developers provide the library wrappers to invoke the computational
kernels on the accelerators. Library developers are responsible for breaking the
problem into the control process running on the host and the compute kernel
running on the accelerators. Library developers then partition the input and output
into work blocks which ALF can schedule to run on different accelerators. At the
accelerator level, the computational kernel developers write optimized accelerator
code. The ALF API provides a common interface for the compute task to be
invoked automatically by the framework.
The SPU timing tool is distributed as an RPM under the IBM ILAR license and is
located in the /opt/ibm/cell-sdk/prototype/bin directory.
Feedback Directed Program Restructuring (FDPR-Pro)
FDPR-Pro is a performance-tuning utility that reduces the runtime of
user-level application programs. The tool optimizes the executable image of a
program by collecting information about the program's behavior under a typical
workload, and creating a new version of the program that is optimized for that
workload. The new program generated by the post-link optimizer typically runs
faster than the original program. The FDPR-Pro utility is distributed as an RPM
under the IBM ILAR.
The PPE profile is collected in .nprof files, and the SPE profile in .mprof files.
3. Optimize the program:
$ fdprpro -a opt myprog [optimization options...] -f myprog.nprof
OProfile
OProfile is a tool for profiling user and kernel level code. It uses the hardware
performance counters to sample the program counter every N events. You specify
the value of N as part of the event specification. The system enforces a minimum
value on N to ensure the system does not get completely swamped trying to
capture a profile.
Make sure you select a large enough value of N to ensure the overhead of
collecting the profile is not excessively high. The opreport tool produces the output
report. Reports can be generated based on the file names that correspond to the
samples, symbol names or annotated source code listings. The basic use of OProfile
and the postprocessing tool is described in the user manual available at
http://oprofile.sourceforge.net/doc/
The current SDK 2.1 version of OProfile for Cell/B.E. supports profiling of
POWER™ processor events and SPU cycle profiling. These events include cycles as
well as the various processor, cache, and memory events. It is possible to profile
up to four events simultaneously on the Cell/B.E. system. There are restrictions on
which of the PPU events can be measured simultaneously. (The tool now verifies
that the multiple events specified can be profiled simultaneously; in the previous
release it was up to the user to verify that.) When using SPU cycle profiling,
events must be within the same group due to restrictions in the underlying
hardware.
There is one set of performance counters for each node, shared between the
two CPUs on the node. For a given profile period, only half of the time is spent
collecting data for the even CPUs and half of the time for the odd CPUs. You may
need to allow more time to collect the profile data across all CPUs.
Notes:
1. Before you issue an opcontrol --start, you should issue the following
command:
opcontrol --start-daemon
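A hedged sketch of a typical session built around this note; the application name is a placeholder, and your event specification and report options may differ:

```shell
opcontrol --init              # load the oprofile kernel module
opcontrol --start-daemon      # issue this before --start, as noted above
opcontrol --start             # begin collecting samples
./my_cell_app                 # run the workload to be profiled
opcontrol --stop              # stop collection
opcontrol --dump              # flush sampled data to disk
opreport -l ./my_cell_app     # produce a symbol-level report
```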
2. To produce a report with Linux kernel symbol information you should install
the corresponding Kernel debuginfo RPM which is available from the BSC Web
site.
With --separate=cpu, the image and corresponding symbols can be displayed for
each SPU. You can use the opreport --merge command to create a single report
for all SPUs that shows the counts for each symbol in the various embedded SPU
binaries. By default, opreport does not display the app name column when it
reports samples for a single application, such as when it profiles a single SPU
application. For opreport to attribute samples to a binary image, the opcontrol
script defaults to using --separate=lib when profiling SPU applications so that the
image name column is always displayed in the generated reports.
With SPU profiling, opreport’s --long-filenames option may not print the full path
of the SPU binary image for which samples were collected. Short image names are
used for SPU applications that employ the technique of embedding SPU images in
another file (executable or shared library). The embedded SPU ELF data contains
only the filename and no path information to the SPU binary file being embedded
because this file may not exist or be accessible at runtime. You must have sufficient
knowledge of the application’s build process to be able to correlate the SPU binary
image names found in the report to the application’s source files.
Tip
Compile the application with -g and generate the OProfile report with -g to
facilitate finding the right source file(s) to focus on.
Generally, when the report contains information about a single application,
opreport does not include the report column for the application name. It is
assumed that the performance analyst knows the name of the application being
profiled.
Known restrictions
Currently there are two known issues with the Cell/B.E. OProfile code.
• The first issue occurs when you use the opreport tool to generate XML output
with details for SPE embedded applications, specifically with the command
opreport --xml --details. The command is supposed to include the binary code
in the XML output, but the binary code is missing.
• The second issue is with the opannotate tool for SPE embedded applications.
The opannotate tool reports the samples in the wrong source code file.
opannotate works fine for PPU applications and where the SPE code is not
embedded.
OProfile is distributed under the GPL license and is available on the BSC Web site
http://www.bsc.es/projects/deepcomputing/linuxoncell
Cell-perf-counter tool
The cell-perf-counter (cpc) tool is used for setting up and using the hardware
performance counters in the Cell/B.E. processor. These counters allow you to see
how many times certain hardware events are occurring, which is useful if you are
analyzing the performance of software running on a Cell/B.E. system. Hardware
events are available from all of the logical units within the Cell/B.E. processor,
including the PPE, SPEs, interface bus, and memory and I/O controllers. Four
32-bit counters, which can also be configured as pairs of 16-bit counters, are
provided in the Cell/B.E. performance monitoring unit (PMU) for counting these
events.
CPC also makes use of the hardware sampling capabilities of the Cell/B.E. PMU.
This feature allows the hardware to collect very precise counter data at
programmable time intervals. The accumulated data can be used to monitor the
changes in performance of the Cell/B.E. system over longer periods of time.
The cpc tool provides a variety of output formats for the counter data. Simple text
output is shown in the terminal session, HTML output is available for viewing in a
Web browser, and XML output can be generated for use by higher-level analysis
tools such as the Visual Performance Analyzer (VPA).
You can find details in the documentation and manual pages included with the
cellperfctr-tools package, which can be found in the /usr/share/doc/cellperfctr-
<version>/ directory after you have installed the package.
The IBM Eclipse IDE for Cell/B.E. SDK is available from the SDK ISO image and
is distributed under the IBM ILAR license.
Chapter 2. Programming with the SDK
This section is a short introduction about programming with the SDK. Refer to the
Cell BE Programming Tutorial, the Full-System Simulator User’s Guide, and other
documentation for more details.
The systemsim script found in the simulator's bin directory launches the simulator,
and the -g parameter starts the graphical user interface.
Notes:
1. You must be on a graphical console, or at least have the DISPLAY environment
variable pointed to an X server to run the simulator's graphical user interface
(GUI).
2. If an error message about libtk8.4.so is displayed, you must load the TK
package as described in SDK 2.1 Installation Guide.
Note: To make the simulator run in fast mode, you can click Mode and then Fast.
This forces the simulator to bypass its standard analysis and statistics
collection features. Fast mode is useful if you want to advance the simulator
through setup or initialization functions that are not the focus of analysis,
such as the Linux boot processing. You should disable fast mode when you
reach the point at which you wish to do detailed analysis or debug the
application. You can also select Simple or Cycle mode.
You can use the simulator's GUI to get a better understanding of the Cell/B.E.
architecture. For example, the simulator shows two sets of PPE state. This is
because the PPE processor core is dual-threaded and each thread has its own
registers and context. You can also look at the state of the SPEs, including the
state of their Memory Flow Controller (MFC).
where:
Parameter       Description
-f <filename>   Specifies an initial run script (TCL file).
-g              Specifies GUI mode; otherwise the simulator starts in command-line mode.
-n              Specifies that the simulator should not open a separate console window.
You can find documentation about the simulator including a user’s guide in the
/opt/ibm/systemsim-cell/doc directory.
Redirecting appropriately lets you copy files to and from the host. For example,
when the simulator is running on the host, you could copy a Cell/B.E. application
into /tmp:
cp matrix_mul /tmp
Then, in the console window of the simulated system, you could access it as
follows:
callthru source /tmp/matrix_mul > matrix_mul
chmod +x matrix_mul
./matrix_mul
To specify that you want to update the sysroot image file with any changes made in
the simulator session, change the newcow parameter on the mysim bogus disk init
command in .systemsim.tcl to rw (specifying read/write access) and remove the
last two parameters. The following is the changed line from .systemsim.tcl:
mysim bogus disk init 0 $sysrootfile rw
When the simulator is started, it has access to 16 SPEs across two Cell/B.E.
processors.
Specifying the processor architecture
Many of the tools provided in SDK 2.1 support multiple implementations of the
CBEA. These include the Cell/B.E. processor and a future processor. This future
processor is a CBEA-compliant processor with a fully pipelined, enhanced double
precision SPU.
The processor supports five optional instructions added to the SPU Instruction Set
Architecture. These include:
• DFCEQ
• DFCGT
• DFCMEQ
• DFCMGT
• DFTSV
Detailed documentation for these instructions is provided in version 1.2 (or later)
of the Synergistic Processor Unit Instruction Set Architecture specification. The future
processor also supports improved issue and latency for all double precision
instructions.
The SDK compilers support compilation for either the Cell/B.E. processor or the
future processor.
Table 3. spu-gcc compiler options
Options Description
-march=<cpu type> Generate machine code for the SPU architecture specified by
the CPU type. Supported CPU types are either cell (default)
or celledp, corresponding to the Cell/B.E. processor or
future processor, respectively.
-mtune=<cpu type> Schedule instructions according to the pipeline model of the
specified CPU type. Supported CPU types are either cell
(default) or celledp, corresponding to the Cell/B.E. processor
or future processor, respectively.
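For example, a compile line targeting the future processor might look like the following hedged sketch; the source file name is a placeholder:

```shell
# Generate and schedule code for the enhanced double-precision SPU.
spu-gcc -march=celledp -mtune=celledp -o mycode mycode.c
```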
The simulator also supports simulation of the future processor. The simulator
installation provides a tcl run script to configure it for such simulation. For
example, the following sequence of commands starts the simulator configured for
the future processor with a graphical user interface:
export PATH=$PATH:/opt/ibm/systemsim/bin
systemsim -g -f config_edp_smp.tcl
Almost all of the samples run both within the simulator and on the BladeCenter
QS20. Some samples include SPU-only programs that can be run on the simulator
in standalone mode.
The source code, which is specific to a given Cell/B.E. processor unit type, is in the
corresponding subdirectory within a given sample's directory:
• ppu for code compiled to run on the PPE
• ppu64 for code specifically compiled for the 64-bit ABI on the PPE
• spu for code compiled to run on an SPE
• spu_sim for code compiled to run on an SPE under the system simulator in
standalone environment
The cellsdk script contains a task which automatically switches the compiler, does
a make clean, and then a make which rebuilds all of the samples and libraries. The
syntax of this command is:
./cellsdk build [-xlc | -gcc]
where the -xlc or -x flag selects the XL C/C++ compiler and the -gcc or -g flag
selects the GCC compiler. The default, if unspecified, is to compile the samples
with the GCC compiler.
After you have selected a particular compiler, that same compiler is used for all
future builds, unless it is specifically overridden by the shell environment variables
SPU_COMPILER, PPU_COMPILER, PPU32_COMPILER, or PPU64_COMPILER.
Building and running a specific program
You do not need to build all the sample code at once; you can build each program
separately. To start from scratch, issue a make clean using the Makefile in
the /opt/ibm/cell-sdk/prototype/src directory or anywhere in the path to a
specific library or sample.
If you have performed a make clean at the top level, you need to rebuild the
include files and libraries first before you compile anything else. To do this run a
make in the src/include and src/lib directories.
The example below shows the steps required to create the executable program
simple which contains SPU code, simple_spu.c, and PPU code, simple.c.
1. Compile and link the SPE executable.
/usr/bin/spu-gcc -g -o simple_spu simple_spu.c
2. Run embedspu to wrap the SPU binary in a CESOF (CBE Embedded SPE
Object Format) linkable file. This file contains additional PPE symbol information.
/usr/bin/ppu32-embedspu simple_spu simple_spu simple_spu-embed.o
3. Compile the PPE side and link it together with the embedded SPU binary.
/usr/bin/ppu32-gcc -g -o simple simple.c simple_spu-embed.o -lspe
Notes:
1. This section only highlights 32-bit ABI compilation. To compile for the 64-bit
ABI, use ppu-gcc (instead of ppu32-gcc) and ppu-embedspu (instead of ppu32-embedspu).
2. You are strongly advised to use the -g switch as shown in the examples. This
embeds extra debugging information into the code for later use by the GDB
debuggers supplied with the SDK. See Chapter 3, “Debugging Cell/B.E.
applications,” on page 25 for more information.
3. The GCC compiler does not support the optional AltiVec style of vector literal
construction using parentheses ("(" and ")"). Use the standard C method of
array initialization with curly braces ("{" and "}") instead.
To configure the Cell/B.E.-based blade server for 20 huge pages (320 MB), run the
following commands:
mkdir -p /huge
echo 20 > /proc/sys/vm/nr_hugepages
mount -t hugetlbfs nodev /huge
To verify the large memory allocation, run the command cat /proc/meminfo. The
output is similar to:
MemTotal: 1010168 kB
MemFree: 155276 kB
. . .
HugePages_Total: 20
HugePages_Free: 20
Hugepagesize: 16384 kB
For example, to create a personal version of the tutorial sample, do the following:
cp -r /opt/ibm/cell-sdk/prototype/src/samples/tutorial/simple mysimple
export CELL_TOP=/opt/ibm/cell-sdk/prototype
cd mysimple
<modify the sample as desired>
make
Note: Some of the samples may attempt to install the built binaries. Errors result
when the install attempts to write private builds to system directories,
unless the user has root authority. To avoid this, either create a complete
sandbox (as previously recommended) without setting CELL_TOP or modify
the Makefile to not install the built binary.
Alternatively, users can copy the sysroot image file to their own sandbox area and
then mount this version with read/write permissions to make persistent updates to
the image.
Introduction
GDB is the standard command-line debugger available as part of the GNU
development environment. GDB has been modified to allow debugging in a
Cell/B.E. processor environment, and this section describes how to debug Cell/B.E.
software using the new and extended features of the versions of GDB supplied
with SDK 2.1.
There are three versions of GDB which can be installed on a BladeCenter QS20:
v gdb, which is installed with Fedora Core 6, for debugging PowerPC applications.
You should NOT use this debugger for Cell/B.E. applications.
v ppu-gdb for debugging PPE code or for debugging combined PPE and SPE code.
This is the combined debugger.
v spu-gdb for debugging SPE code only. This is the standalone debugger.
This section also describes how to run applications under gdbserver. The
gdbserver program allows remote debugging.
For more information about compiling with GCC, see “Compiling and linking with
the GNU tool chain” on page 21.
Whichever method you choose, after you have started the application under
ppu-gdb, you can use the standard GDB commands available to debug the
application. The GDB manual is available at the GNU Web site
http://www.gnu.org/software/gdb/gdb.html
and there are many other resources available on the World Wide Web.
Note: Do not use gdb, which is the version of GDB which comes with the
operating system. Use ppu-gdb instead.
Note: You can use either spu-gdb or ppu-gdb to debug SPE only programs. In this
section spu-gdb is used.
The examples in this section use a standalone SPE (spulet) program, simple.c,
whose source code and Makefile are given below:
Source code:
#include <stdio.h>
#include <spu_intrinsics.h>
unsigned int
fibn(unsigned int n)
{
if (n <= 2)
return 1;
return (fibn (n-1) + fibn (n-2));
}
int
main(int argc, char **argv)
{
unsigned int c;
c = fibn (8);
printf ("c=%u\n", c);
return 0;
}
Note: Recursive SPE programs are generally not recommended due to the limited
size of local storage. An exception is made here because such a program can
be used to illustrate the backtrace command of GDB.
Makefile:
simple: simple.c
spu-gcc simple.c -g -o simple
Debugging in the Cell/B.E. environment
To debug combined code, that is code containing both PPE and SPE code, you
must use ppu-gdb.
On many operating systems, a single program can have more than one thread. The
ppu-gdb program allows you to debug programs with one or more threads. The
debugger shows all threads while your program runs, but whenever the debugger
runs a debugging command, the user interface shows the single thread involved.
This thread is called the current thread. Debugging commands always show
program information from the point of view of the current thread. For more
information about GDB support for debugging multithreaded programs, see the
sections ’Debugging programs with multiple threads’ and ’Stopping and starting
multi-thread programs’ of the GDB User’s Manual, available at
http://www.gnu.org/software/gdb/gdb.html
The info threads command displays the set of threads that are active for the
program, and the thread command can be used to select the current thread for
debugging.
Note: The source code for the program simple.c used in the examples below
comes with the SDK and can be found at /opt/ibm/cell-sdk/prototype/
src/samples/tutorial/simple.
Debugging architecture
On the Cell/B.E. processor, a thread can run on either the PPE or on an SPE. A
program typically starts as single thread running on the PPE which can then
spawn new threads that run on either the PPE or on an SPE. When you choose a
thread to debug, the debugger automatically uses the correct architecture for the
thread. If the thread is running on the PPE, the debugger uses the PowerPC
architecture. If the thread is running on an SPE, the debugger uses the SPU
architecture.
To see which architecture the debugger is using, use the show architecture
command.
The example below shows the results of the show architecture command at two
different breakpoints in a program. At breakpoint 1 the program is executing in the
original PPE thread, where the show architecture command indicates that
architecture is powerpc:common. The program then spawns an SPE thread which
will execute the SPU code in simple_spu.c. When the debugger detects that the
SPE thread has reached breakpoint 3, it switches to this thread and sets the
architecture to spu:256K. For more information about breakpoint 2, see “Setting
pending breakpoints” on page 32.
Note: The source code for the example below can be found at
/opt/ibm/cell-sdk/prototype/src/samples/tutorial/simple.
The debugger sees SPE executable programs as shared libraries. The info
sharedlibrary command shows all the shared libraries including the SPE
executables when running SPE threads.
The example below shows the results of the info sharedlibrary command at two
breakpoints on one thread. At breakpoint 1, the thread is running on the PPE, at
breakpoint 3 the thread is running on the SPE. For more information about
breakpoint 2, see “Setting pending breakpoints” on page 32.
(gdb) break main
Breakpoint 1 at 0x1801654: file simple.c, line 23.
(gdb) r
Starting program: /home/user/md/simple/simple
[Thread debugging using libthread_db enabled]
[New Thread 4160655360 (LWP 2528)]
[Switching to Thread 4160655360 (LWP 2528)]
0x0f291cc0 0x0f2970e0 Yes /lib/librt.so.1
(gdb) break simple_spu.c:5
No source file named simple_spu.c.
Make breakpoint pending on future shared library load? (y or [n]) y
GDB creates a unique name for each shared library entry representing SPE code.
That name consists of the SPE executable name, followed by the location in PPE
memory where the SPE is mapped (or embedded into the PPE executable image),
and the SPE ID of the SPE thread where the code is loaded.
Using scheduler-locking
Scheduler-locking is a feature of GDB that simplifies multithread debugging by
enabling you to control the behavior of multiple threads when you single-step
through a thread. By default scheduler-locking is off, and this is the recommended
setting.
There are situations where you can safely set scheduler-locking on, but you should
do so only when you are sure there are no deadlocks.
You can use breakpoints for both PPE and SPE portions of the code. There are
some instances, however, where GDB must defer insertion of a breakpoint because
the code containing the breakpoint location has not yet been loaded into memory.
This occurs when you wish to set the breakpoint for code that is dynamically
loaded later in the program. If ppu-gdb cannot find the location of the breakpoint it
sets the breakpoint to pending. When the code is loaded, the breakpoint is inserted
and the pending breakpoint deleted.
You can use the set breakpoint command to control the behavior of GDB when it
determines that the code for a breakpoint location is not loaded into memory. The
syntax for this command is:
set breakpoint pending <on|off|auto>
where
v on specifies that GDB should set a pending breakpoint if the code for the
breakpoint location is not loaded.
v off specifies that GDB should not create pending breakpoints; break
commands for a breakpoint location that is not loaded result in an error.
v auto specifies that GDB should prompt the user to determine whether a pending
breakpoint should be set if the code for the breakpoint location is not loaded.
This is the default behavior.
The example below shows the use of pending breakpoints. Breakpoint 1 is a
standard breakpoint set for simple.c, line 23. When the breakpoint is reached,
the program stops running for debugging. After set breakpoint pending is set to
off, GDB cannot set breakpoint 2 (break simple_spu.c:5) and generates the error
message No source file named simple_spu.c. After set breakpoint pending is
changed to auto, GDB sets a pending breakpoint for the location simple_spu.c:5.
At the point where GDB can resolve the location, it sets the next breakpoint,
breakpoint 3.
(gdb) break main
Breakpoint 1 at 0x1801654: file simple.c, line 23.
(gdb) r
Starting program: /home/user/md/simple/simple
[Thread debugging using libthread_db enabled]
[New Thread 4160655360 (LWP 2651)]
[Switching to Thread 4160655360 (LWP 2651)]
Note: The example above shows one of the ways to use pending breakpoints. For
more information about other options, see the documentation available at
http://www.gnu.org/software/gdb/gdb.html
Note: The set spu stop-on-load command has no effect in the SPU standalone
debugger spu-gdb. To let an SPU standalone program proceed to its "main"
function, you can use the start command in spu-gdb.
To check the status of spu stop-on-load, use the show spu stop-on-load command.
If you are working in GDB, you can access help for these new commands. To
access help, type help info spu followed by the info spu subcommand name. This
displays full documentation. Command name abbreviations are allowed if
unambiguous.
Note: For more information about the various output elements, refer to Cell
Broadband Engine Architecture available at
http://www.ibm.com/developerworks/power/cell/
Note: In the following section, gdbserver is used as the generic term for both
versions. Similarly GDB is used to refer to the two different debuggers.
This section describes how to set up remote debugging for the Cell/B.E. processor
and the simulator. It covers the following topics:
v “Remote debugging overview” on page 36
v “Using remote debugging” on page 36
v “Starting remote debugging” on page 37
The connection between GDB and gdbserver can either be through a traditional
serial line or through TCP/IP. For example, you can run gdbserver on a
BladeCenter QS20 and GDB on an Intel® x86 platform, which then connects to the
gdbserver using TCP/IP.
Note: IDEs such as Eclipse do not directly communicate with gdbserver. However,
an IDE can communicate with GDB running on the same host which can
then in turn communicate with gdbserver running on a remote machine.
To use remote debugging, you need a version of the program for the target
platform and network connectivity. The gdbserver program comes packaged with
GDB and is installed with the SDK 2.1.
Note: To connect through the network to the simulator, you must enable bogusnet
support in the simulator. This creates a special Ethernet device that uses a
"call-thru" interface to send and receive packets to the host system. See the
simulator documentation for details about how to enable bogusnet.
Note: If you use ppu-gdbserver as shown here then you must use ppu-gdb on
the client.
2. Start GDB from the client system (if you are using the simulator this is the host
system of the simulator).
For the simulator this is:
/opt/cell/bin/ppu-gdb myprog
For the BladeCenter QS20 this is:
/usr/bin/ppu-gdb myprog
You should have the source and compiled executable version for myprog on the
host system. If your program links to dynamic libraries, GDB attempts to locate
these when it attaches to the program. If you are cross-debugging, you need to
direct GDB to the correct versions of the libraries; otherwise it tries to load the
libraries from the host platform. The default path is /opt/cell/sysroot. For the
Cell/B.E. SDK 2.1, issue the following GDB command to point GDB at the
directory containing the correct versions of the libraries:
set solib-absolute-prefix <path to target libraries>
Chapter 4. SPU code overlays
This section describes how to use the overlay facility to overcome the physical
limitations on code and data size in the SPU.
An overlay is a program segment which is not loaded into SPU local storage
before the main program begins to execute, but is instead left in Cell main storage
until it is required. When the SPU program calls code in an overlay segment, this
segment is transferred to local storage where it can be executed. This transfer will
usually overwrite another overlay segment which is not immediately required by
the program.
In an overlay structure the local storage is divided into a root segment, which is
always in storage, and one or more overlay regions, where overlay segments are
loaded when needed. Any given overlay segment will always be loaded into the
same region. A region may contain more than one overlay segment, but a segment
will never cross a region boundary.
(A segment is the smallest unit which can be loaded as one logical entity during
execution. Segments contain program sections such as functions and data areas.)
The overlay feature is supported for Cell SPU programming (but not for PPU
programming) on a native BladeCenter QS20 or on the simulator hosted on an x86
or PowerPC machine.
Ideally all data sections are kept in the root segment which is never overlaid. If the
data size is too large for this then sections for transient data may be included in
overlay regions, but the implications of this must be carefully considered.
Overview
The structure of an overlay SPU program module depends on the relationships
between the segments within the module. Two segments which do not have to be
in storage at the same time may share an address range. These segments can be
assigned the same load addresses, as they are loaded only when called. For
example, segments that handle error conditions or unusual data are used
infrequently and need not occupy storage until they are required.
Program sections which are required at any time are grouped into a special
segment called the root segment. This segment remains in storage throughout the
execution of a program.
Some overlay segments may be called by several other overlay segments. This can
be optimized by placing the called and calling segments in separate regions.
To design an overlay structure you should start by identifying the code sections or
stubs which receive control at the beginning of execution, and also any code
sections which should always remain in storage. These together form the root
segment. The rest of the structure is developed by checking the links between the
remaining sections and analyzing these to determine which sections can share the
same local storage locations at different times during execution.
Sizing
Because the minimum indivisible code unit is at the function level, the minimum
size of the overlay region is the size of the largest overlaid function. If this function
is so large that the generated SPU program does not fit in local storage then a
warning is issued by the linker. The user must address this problem by splitting
the function into two or more smaller functions.
Scaling considerations
Even with overlays there are limits on how large an SPE executable can become.
An infrastructure of manager code, tables, and stubs is required to support
overlays and this infrastructure itself cannot reside in an overlay. For a program
with s overlay segments in r regions, making cross-segment calls to f functions, this
infrastructure requires the following amounts of local storage:
v manager: about 400 bytes,
v tables: s * 16 + r * 4 bytes,
v stubs: f * 8 + s * 8 bytes.
This allows a maximum available code size of about 512 megabytes, split into 4096
overlay sections of 128 kilobytes each. (This assumes a single entry point into each
section and no global data segment or stack.)
Except for the local storage memory requirements described above, this design
does not impose any limitations on the numbers of overlay segments or regions
supported.
The relationship between segments can be shown with a tree structure. This
graphically shows how segments can use local storage at different times. It does
not imply the order of execution (although the root segment is always the first to
receive control). Figure 2 shows the tree structure for this program. The structure
includes five segments:
The position of the segments in an overlay tree structure does not imply the
sequence in which the segments are executed; in particular sections in the root
segment may be called from any segment. A segment can be loaded and overlaid
as many times as the logic of the program requires.
If the program did not use overlays, it would require 320 KB of local storage (the
sum of all its sections). With overlays, however, the storage needed for the program is
the sum of all overlay regions, where the size of each region is the size of its
largest segment. In this structure the maximum is formed by segments 0, 4, and 2;
these being the largest segments in regions 0, 1, and 2. The sum of the regions is
then 200 KB, as shown in Figure 3.
Note: The sum of all regions is not the minimum requirement for an overlay
program. When a program uses overlays, extra programming and tables are
used and their storage requirements must also be considered. The storage
required by these is described in “Scaling considerations” on page 41.
Segment origin
The linker typically assigns the origin of the root segment (the origin of the
program) to address 0x80. The relative origin of each segment is determined by the
length of all previously defined regions. For example, the origin of segments 2 and
3 is equal to the root origin plus 80 KB (the length of region 1 and segment 4) plus
50 KB (the length of the root segment), or 0x80 plus 130 KB. The origins of all the
segments are as follows:
Table 6. Segment origins
Segment Origin
0 0x80 + 0
1 0x80 + 50 KB
2 0x80 + 130 KB
3 0x80 + 130 KB
4 0x80 + 50 KB
The segment origin is also called the load point, because it is the relative location
where the segment is loaded. Figure 4 shows the segment origin for each segment
and the way storage is used by the sample program. The vertical bars indicate
segment origin; two segments with the same origin can use the same storage area.
This figure also shows that the longest path is that for segments 0, 4, and 2.
Overlay processing
The overlay processing is initiated when a section in local storage calls a section
not in storage. The function which determines when an overlay is to occur is the
overlay manager. This checks which segment the called section is in and, if that
segment is not already in local storage, causes it to be loaded into its region
before control is passed to the called section.
The overlay manager uses special stubs and tables to determine when an overlay is
necessary. These stubs and tables are generated by the linker and are part of the
output program module. The special stubs are used for each inter-segment call.
The tables generated are the overlay segment table and the overlay region table.
Figure 5 shows the location of the call stubs and the segment and region tables in
the root segment in the sample program.
The size of these tables must be considered when planning the use of local storage.
Call stubs
There is one call stub for each function in an overlay segment which is called from
a different segment. No call stub is needed if the function is called within the same
segment. All call stubs are in the root segment. During execution the call stub
specifies (to the overlay manager) the segment to be loaded, and the segment offset
to transfer control to, to invoke the function after it is loaded.
Overlay graph structure
If sections must be available to several segments that would otherwise
overlay each other, the program might be described as an overlay graph
structure (as opposed to an overlay tree structure) and it should use multiple
regions.
With multiple regions each segment has access to both the root segment and other
overlay segments in other regions. Therefore regions are independent of each other.
Figure 6 shows the relationship between the sections in the sample program and
two new sections: SH and SI. The two new sections are each used by two other
sections in different segments. Placing SH and SI in the root segment makes the
root segment larger than necessary, because SH and SI can overlay each other. The
two sections cannot be duplicated in two paths, because the linker automatically
deletes the duplicates.
However, if the two sections are placed in another region they can be in local
storage when needed, regardless of the segments executed in the other regions.
Figure 7 on page 46 shows the sections in a four-region structure. Either segment
in region 3 can be in local storage regardless of the segments being executed in
regions 0, 1, or 2. Segments in region 3 can cause segments in region 0, 1 or 2 to be
loaded without being overlaid themselves.
The relative origin of region 3 is determined by the length of the preceding regions
(200 KB). Region 3, therefore, begins at the origin plus 200 KB.
The local storage required for the program is determined by adding the lengths of
the longest segment in each region. In Figure 7 if SH is 40 KB and SI is 30 KB the
storage required is 240 KB plus the storage required by the overlay manager, its
call stubs and its overlay tables. Figure 8 on page 47 shows the segment origin for
each segment and the way storage is used by the sample program.
Figure 8. Overlay graph segment origin and use of storage
The input sequence of control statements and sections should reflect the sequence
of the segments in the overlay structure (for example the graph in Figure 7 on page
46), region by region, from top to bottom and from left to right. This sequence is
illustrated in later examples.
The origin of every region is specified with an OVERLAY statement. Each OVERLAY
statement defines a load point at the end of the previous region. That load point is
logically assigned a relative address at the quadword boundary that follows the
last byte of the largest segment in the preceding region. Subsequent segments
defined in the same region have their origin at the same load point.
Note: By implication sections SA and SB are associated with the root segment
because they are not specified in the OVERLAY statements.
In the sample overlay graph program, as shown in Figure 6 on page 45, one more
load point is assigned to the origin of the last OVERLAY statement and its region.
Segments 5 and 6 are at the third load point.
The following linker script statements add to the sequence for the overlay tree
program creating the structure shown in Figure 7 on page 46:
.
.
.
OVERLAY {
.segment5 {./si.o(.text)}
.segment6 {./sh.o(.text)}
}
Migration/Co-Existence/Binary-Compatibility Considerations
This feature will work with both IPA and non-IPA code, though the partitioning
algorithm will generate better overlays with IPA code.
Compiler options
Table 7. Compiler options
Option Description
-qipa=overlay Specifies that the compiler should
automatically create code overlays. The
-qipa=partition={small|medium|large}
option is used to control the size of the
overlay buffer. The overlay buffer will be
placed after the text segment of the linker
script.
-qipa=nooverlay Specifies that the compiler should not
automatically create code overlays. This is
the default behavior for the dual source
compiler.
-qipa=overlayproc=<names_list> Specifies a comma-separated list of functions
that should be in the same overlay. Multiple
overlayproc suboptions may be present to
specify multiple overlay groups. If a
procedure is listed in multiple groups, it will
be cloned for each group referencing it. C++
function names must be mangled.
-qipa=nooverlayproc=<names_list> Specifies a comma-separated list of functions
that should not be overlaid. These will
always be resident in the local store. C++
function names must be mangled.
Examples:
# Compile and link without overlays.
xlc foo.c bar.c
xlc foo.c bar.c -qipa=nooverlay
# Compile and link with automatic overlays and ensure that foo and bar are
# in the same overlay. The main function is always resident.
xlc foo.c bar.c -qipa=overlay:overlayproc=foo,bar:nooverlayproc=main
# Compile and link with automatic overlays and a custom linker script.
xlc foo.c bar.c -qipa=overlay -Wl,-Tmyldscript
The SPU program is organized as an overlay program with two regions and three
segments. The first region is the non-overlay region containing the root segment
(segment 0). This root segment contains the spu_main function along with overlay
support programming and tables (not shown). The second region is an overlay
region and contains segments 1 and 2. In segment 1 are the code sections of
functions o1_test1 and o1_test2, and in segment 2 are the code sections of
functions o2_test1 and o2_test2, as shown in Figure 10.
Combining these figures yields the following diagram showing the structure of the
SPU program.
The physical view of this sample (Figure 12) shows one region containing the
non-overlay root segment, and a second region containing one of two overlay
segments. Because the functions in these two overlay segments are quite similar
their lengths happen to be the same.
The spu_main program calls its sub-functions multiple times. Specifically the
spu_main program first calls two functions, o1_test1 and o1_test2, passing in an
integer value (101 and 102 respectively) and upon return it expects an integer
result (1 and 2 respectively). Next spu_main calls the two other functions, o2_test1
and o2_test2 passing in an integer value (201 and 202 respectively) and upon
return it expects an integer result (11 and 12 respectively). Finally spu_main calls
again the first two functions, o1_test1 and o1_test2 passing in an integer value
(301 and 302 respectively) and upon return it expects an integer result (1 and 2
respectively). Between each pair of calls, the overlay manager loads the
appropriate segment into the appropriate region. In this case, for the first pair it
loads segment 1 into region 1 then for the second pair it loads segment 2 into
region 1, and for the last pair it reloads segment 1 back into region 1. See Figure 13
on page 52.
Note: To simplify the linker scripts only the affected statements are shown in this
and the following examples.
...
.text :
{
*( EXCLUDE_FILE(./olay1/test.o ./olay2/test.o)
.text .stub .text.* .gnu.linkonce.t.*)
*(.gnu.warning)
}
OVERLAY {
.segment1 {./olay1/test.o(.text)}
.segment2 {./olay2/test.o(.text)}
}
...
The sample consists of a single SPU main program. The main program calls the SA
function which in turn calls the SB function. These three functions are all located in
the root segment (segment 0) and cannot be overlaid.
The SB function calls the SC and SG functions. These are in two segments which are
both located in region 1 and overlay each other.
SC calls SD and SF. SD in turn calls SE. The SD and SE functions are in segment 2
and the SF function is in segment 3. These two segments are both located in region
2 and overlay each other.
The physical view of this sample (Figure 8 on page 47) shows the four regions; one
region containing a single non-overlay root segment and three regions containing
six overlay segments.
OVERLAY :
{
.segment1 {./sc.o(.text)}
.segment4 {./sg.o(.text) }
}
OVERLAY :
{
.segment2 {./sd.o(.text) ./se.o(.text)}
.segment3 {./sf.o(.text)}
}
OVERLAY :
{
.segment5 {./sh.o(.text)}
.segment6 {./si.o(.text)}
}
...
The physical view of this sample in Figure 15 shows three regions; one containing
a single non-overlay root segment, and two containing twelve overlay segments.
The linker commands are in linker.script in /opt/ibm/cell-sdk/prototype/src/
samples/overlay/large_matrix.
Note: This is a subset of all the functions in the large_matrix library. Only those
needed by the test case driver, large_matrix.c, are used in this example.
The management data structures generated include two overlay tables in a .ovtab
section. The first of these is a table with one entry per overlay segment. This table
is read-only to the overlay manager, and should never change during execution of
the program. It has the format:
struct {
u32 vma; // SPU local store address that the section is loaded to.
u32 size; // Size of the overlay in bytes.
u32 offset; // Offset in SPE executable where the section can be found.
u32 buf; // One-origin index into the _ovly_buf_table.
} _ovly_table[];
The second table has one entry per overlay region. This table is read-write to the
overlay manager, and changes to reflect the current overlay mapping state. The
format is:
struct {
u32 mapped; // One-origin index into _ovly_table for the
// currently loaded overlay. 0 if none.
} _ovly_buf_table[];
Note: These tables, all stubs, and the overlay manager itself must reside in the root
(non-overlay) segment.
Whenever the overlay manager loads a segment into a region, it updates the
mapped field in the _ovly_buf_table entry for the region with the one-origin index
of the segment entry in the _ovly_table.
The overlay manager may be provided by the user as a library containing the
entries __ovly_load and _ovly_debug_event. (It is an error for the user to provide
_ovly_debug_event without also providing __ovly_load.) If these entries are not
provided, the linker uses a built-in overlay manager, which contains these
symbols in the .stub section.
Appendix A. Notices
This information was developed for products and services offered in the U.S.A.
The manufacturer may not offer the products, services, or features discussed in this
document in other countries. Consult the manufacturer’s representative for
information on the products and services currently available in your area. Any
reference to the manufacturer’s product, program, or service is not intended to
state or imply that only that product, program, or service may be used. Any
functionally equivalent product, program, or service that does not infringe any
intellectual property right of the manufacturer may be used instead. However, it is
the user’s responsibility to evaluate and verify the operation of any product,
program, or service.
The following paragraph does not apply to the United Kingdom or any other
country where such provisions are inconsistent with local law: THIS
INFORMATION IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may
not apply to you.
Any references in this information to Web sites not owned by the manufacturer are
provided for convenience only and do not in any manner serve as an endorsement
of those Web sites. The materials at those Web sites are not part of the materials for
this product and use of those Web sites is at your own risk.
The manufacturer may use or distribute any of the information you supply in any
way it believes appropriate without incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose
of enabling: (i) the exchange of information between independently created
programs and other programs (including this one) and (ii) the mutual use of the
information which has been exchanged, should contact the manufacturer.
The licensed program described in this information and all licensed material
available for it are provided by IBM under the terms of the IBM Customer Agreement.
All statements regarding the manufacturer’s future direction or intent are subject to
change or withdrawal without notice, and represent goals and objectives only.
This information contains examples of data and reports used in daily business
operations. To illustrate them as completely as possible, the examples include the
names of individuals, companies, brands, and products. All of these names are
fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
If you are viewing this information in softcopy, the photographs and color
illustrations may not appear.
Edition notice
© Copyright International Business Machines Corporation 2005. All rights
reserved.
Trademarks
The following terms are trademarks of International Business Machines
Corporation in the United States, other countries, or both:
alphaWorks
BladeCenter
developerWorks
IBM
POWER
Power PC®
PowerPC
PowerPC Architecture™
Intel, MMX, and Pentium® are trademarks of Intel Corporation in the United
States, other countries, or both.
UNIX® is a registered trademark of The Open Group in the United States and
other countries.
Java™ and all Java-based trademarks and logos are trademarks or registered
trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Red Hat, the Red Hat “Shadow Man” logo, and all Red Hat-based trademarks and
logos are trademarks or registered trademarks of Red Hat, Inc., in the United
States and other countries.
XDR is a trademark of Rambus Inc. in the United States and other countries.
Appendix B. Related documentation
All of the documentation listed in this section is available on the ISO image. The
latest versions of some documents may be available from the referenced web pages
or on your system after installing components of the SDK.
Cell/B.E. processor
There is a set of tutorial and reference documentation for the Cell/B.E. stored in
the IBM online technical library at:
http://www.ibm.com/chips/techlib/techlib.nsf/products/Cell_Broadband_Engine
v Cell Broadband Engine Architecture
v Cell Broadband Engine Programming Handbook
v Cell Broadband Engine Registers
v C/C++ Language Extensions for Cell Broadband Engine Architecture
v Synergistic Processor Unit (SPU) Instruction Set Architecture
v SPU Application Binary Interface Specification
v Assembly Language Specification
v Cell Broadband Engine Linux Reference Implementation Application Binary Interface
Specification
After you have installed the SDK, you can also find the following PDFs in the
/opt/ibm/cell-sdk/prototype/docs directory:
v SDK Sample Library documentation
v IDL compiler documentation
http://www.ibm.com/chips/techlib/techlib.nsf/products/Cell_Broadband_Engine
v Cell Broadband Engine Programming Tutorial documentation
v SPE Runtime Management library documentation Version 1.2
v SPE Runtime Management library documentation Version 2.1
v SPE Runtime Management library Version 1.2 to Version 2.0 Migration Guide
After you have installed the SDK, you can find the following PDFs in the
/opt/ibmcmp/xlc/8.2/doc directory.
v Getting Started with IBM XL C/C++ Compiler
v IBM XL C/C++ Compiler Language Reference
After you have installed the SDK, you can also find the following PDFs in the
/opt/ibm/systemsim-cell/doc directory.
v IBM Full-System Simulator Users Guide
v IBM Full-System Simulator Command Reference
v Performance Analysis with the IBM Full-System Simulator
v IBM Full-System Simulator BogusNet HowTo
PowerPC Base
The following documents can be found on the developerWorks Web site at:
http://www.ibm.com/developerworks/eserver/library
v PowerPC Architecture Book, Version 2.02
– Book I: PowerPC User Instruction Set Architecture
– Book II: PowerPC Virtual Environment Architecture
– Book III: PowerPC Operating Environment Architecture
v PowerPC Microprocessor Family: Vector/SIMD Multimedia Extension Technology
Programming Environments Manual Version 2.07c
Glossary
This glossary contains terms and abbreviations used in Cell/B.E. systems.

ABI. Application Binary Interface. This is the standard that a program follows to ensure that code generated by different compilers (and perhaps linking with various, third-party libraries) will run correctly on the Cell Broadband Engine. The ABI defines data types, register use, calling conventions, and object formats.

ALF. Accelerated Library Framework. An API that provides a set of functions to help programmers solve data-parallel problems on a hybrid system. ALF supports the single-program-multiple-data (SPMD) programming style with a single program running on all accelerator elements at one time. ALF offers programmers an interface to partition data across a set of parallel processes without requiring architecturally dependent code.

atomic operation. A set of operations, such as read-write, that are performed as an uninterrupted unit.

Auto-SIMDize. To automatically transform scalar code to vector code.

Barcelona Supercomputing Center. Spanish National Supercomputing Center, supporting BladeCenter and Linux on Cell.

BSC. See Barcelona Supercomputing Center.

BE. Broadband Engine.

Broadband Engine. See CBEA.

C++. Derived from C, C++ is an object-oriented programming language.

cache. High-speed memory close to a processor. A cache usually contains recently-accessed data or instructions, but certain cache-control instructions can lock, evict, or otherwise modify the caching of data or instructions.

call stub. A small piece of code used as a link to other code which is not immediately accessible.

CBEA. Cell Broadband Engine Architecture. A new architecture that extends the 64-bit PowerPC Architecture. The CBEA and the Cell Broadband Engine are the result of a collaboration between Sony, Toshiba, and IBM, known as STI, formally started in early 2001.

Cell/B.E. Cell Broadband Engine. See CBEA.

Cell Broadband Engine Linux application. An application running on the PPE and SPE. Each such application has one or more Linux threads and some number of SPE threads. All the Linux threads within the application share the application's resources, including access to the SPE threads.

Cell Broadband Engine program. A PPE program with one or more embedded SPE programs.

code section. A self-contained area of code, in particular one which may be used in an overlay segment.

compiler. A program that translates a high-level programming language, such as C++, into executable code.

CPL. Common Public License.

Cycle-accurate simulation. See Performance simulation.

cycle. Unless otherwise specified, one tick of the PPE clock.

DMA. Direct Memory Access. A technique for using a special-purpose controller to generate the source and destination addresses for a memory or I/O transfer.

DMA command. A type of MFC command that transfers or controls the transfer of a memory location containing data or instructions. See MFC.

ELF. Executable and Linking Format. The standard object format for many UNIX operating systems, including Linux. Originally defined by AT&T and placed in the public domain. Compilers generate ELF files. Linkers link to files with ELF files in libraries. Systems run ELF files.

elfspe. The SPE support that allows an SPE program to run directly from a Linux command prompt without needing a PPE application to create an SPE thread and wait for it to complete.

ext3. Extended file system 3. One of the file system options available for Linux partitions.

Fedora Core. An operating system built entirely from open-source software and therefore freely available. Often, but mistakenly, known as Fedora Linux.

FDPR-Pro. Feedback Directed Program Restructuring. A feedback-based post-link optimization tool.

FFT. Fast Fourier Transform.

firmware. A set of instructions contained in ROM, usually used to enable peripheral devices at boot.

FSF. Free Software Foundation. Organization promoting the use of open-source software such as Linux.

FSS. IBM Full-System Simulator. IBM's tool which simulates the Cell processor environment on other host computers.

GCC. GNU C compiler.

GDB. GNU application debugger. A modified version of gdb, ppu-gdb, can be used to debug a Cell Broadband Engine program. The PPE component runs first and uses system calls, hidden by the SPU programming library, to move the SPU component of the Cell Broadband Engine program into the local store of the SPU and start it running. A modified version of gdb, spu-gdb, can be used to debug code executing on SPEs.

GPL. GNU General Public License. Guarantees freedom to share, change, and distribute free software.

GNU. GNU is Not Unix. A project to develop free Unix-like operating systems such as Linux.

graph structure. A program design in which each child segment is linked to one or more parent segments.

GUI. Graphical User Interface. User interface for interacting with a computer which employs graphical images and widgets in addition to text to represent the information and actions available to the user. Usually the actions are performed through direct manipulation of the graphical elements.

HTTP. Hypertext Transfer Protocol. A method used to transfer or convey information on the World Wide Web.

I/O device. Input/output device. From the viewpoint of software, I/O devices exist as memory-mapped registers that are accessed in main-storage space by load/store instructions.

IDE. Integrated Development Environment. Integrates the Cell/B.E. GNU tool chain, compilers, the Full-System Simulator, and other development components to provide a comprehensive, Eclipse-based development platform that simplifies Cell/B.E. development.

IDL. Interface definition language. Not the same as CORBA IDL.

ILAR. IBM International License Agreement for early release of programs.

initrd. An initial RAM disk image read at boot.

ISO image. Commonly a disk image which can be burnt to CD. Technically it is a disk image of an ISO 9660 file system.

kernel. The core of an operating system, which provides services for other parts of the operating system and provides multitasking. In Linux or UNIX operating systems, the kernel can easily be rebuilt to incorporate enhancements, which then become operating-system wide.

K&R programming. A reference to a well-known book on programming written by Brian Kernighan and Dennis Ritchie.

L1. Level-1 cache memory. The closest cache to a processor, measured in access time.

L2. Level-2 cache memory. The second-closest cache to a processor, measured in access time. An L2 cache is typically larger than an L1 cache.

latency. The time between when a function (or instruction) is called and when it returns. Programmers often optimize code so that functions return as quickly as possible; this is referred to as the low-latency approach to optimization. Low-latency designs often leave the processor data-starved, and performance can suffer.

libspe. An SPU-thread runtime management library.

Linux. An open-source Unix-like computer operating system.

LGPL. Lesser General Public License. Similar to the GPL, but does less to protect the user's freedom.

local store. The 256-KB local store associated with each SPE. It holds both instructions and data.

LS. See local store.

LSA. Local Store Address. An address in the local store of an SPU through which programs running in the SPU, and DMA transfers managed by the MFC, access the local store.

main memory. See main storage.

main storage. The effective-address (EA) space. It consists physically of real memory (whatever is external to the memory-interface controller, including both volatile and nonvolatile memory), SPU LSs, memory-mapped registers and arrays, memory-mapped I/O devices (all I/O is memory-mapped), and pages of virtual memory that reside on disk. It does not include caches or execution-unit register files. See also local store.

Makefile. A descriptive file used by the make command in which the user specifies: (a) the target program or library, (b) rules about how the target is to be built, (c) dependencies which, if updated, require that the target be rebuilt.

Mambo. Pre-release name of the IBM Full-System Simulator. See FSS.

MASS. Mathematical Acceleration Subsystem. The MASS and MASS/V libraries contain scalar and vector operations not in the standard C math library.

MFC. Memory Flow Controller. Part of an SPE which provides two main functions: it moves data via DMA between the SPE's local store (LS) and main storage, and it synchronizes the SPU with the rest of the processing units in the system.

netboot. Command to boot a device from another on the same network. Requires a TFTP server.

NUMA. Non-uniform memory access. In a multiprocessing system such as the Cell/B.E., memory is configured so that it can be shared locally, thus giving performance benefits.

overlay segment. Code that is dynamically loaded and executed by a running SPU program. A segment contains one or more code sections.

overlay region. An area of storage, with a fixed address range, into which overlay segments are loaded. A region only contains one segment at any time.

page table. A table that maps virtual addresses (VAs) to real addresses (RAs) and contains related protection parameters and other information about memory locations.

PDF. Portable document format.

Performance simulation. Simulation by the IBM Full System Simulator for the Cell Broadband Engine in which both the functional behavior of operations and the time required to perform the operations is simulated. Also called cycle-accurate simulation.

pipelining. A technique that breaks operations, such as instruction processing or bus transactions, into smaller stages so that a subsequent stage in the pipeline can begin before the previous stage has completed.

plugin. Code that is dynamically loaded and executed by a running SPU program. Plugins facilitate code overlays.

PowerPC. Of or relating to the PowerPC Architecture or the microprocessors that implement this architecture.

PPC-64. 64-bit implementation of the PowerPC Architecture.

PowerPC Architecture. A computer architecture that is based on the third generation of RISC processors. The PowerPC architecture was developed jointly by Apple, Motorola, and IBM.

PPC. See PowerPC.

PPE. PowerPC Processor Element. The general-purpose processor in the Cell.

program section. See code section.

proxy. Allows many network devices to connect to the internet using a single IP address. Usually a single server, often acting as a firewall, connects to the internet, behind which other network devices connect using the IP address of that server.

region. See overlay region.

RPM. Originally an acronym for Red Hat Package Manager; an RPM file is a packaging format for one or more files used by many Linux systems when installing software programs.

root segment. Code that is always in storage when an SPU program runs. The root segment contains overlay control sections and may also contain code sections and data areas.

Sandbox. Safe place for running programs or scripts without affecting other users or programs.

SDK. Software development kit. A complete package of tools for application development. The Cell/B.E. SDK includes sample software for the Cell Broadband Engine.

section. See code section.

segment. See overlay segment and root segment.

SIMD. Single Instruction Multiple Data. Processing in which a single instruction operates on multiple data elements that make up a vector data type. Also known as vector processing. This style of programming implements data-level parallelism.

SIMDize. To transform scalar code to vector code.

SPE. Synergistic Processor Element. Extends the PowerPC 64 architecture by acting as cooperative offload processors (synergistic processors), with the direct memory access (DMA) and synchronization mechanisms to communicate with them (memory flow control), and with enhancements for real-time management. There are 8 SPEs on each Cell processor.

SPE thread. A thread scheduled and run on an SPE. A program has one or more SPE threads. Each such thread has its own SPU local store (LS), 128 x 128-bit register file, program counter, and MFC Command Queues, and it can communicate with other execution units (or with effective-address memory through the MFC channel interface).

SPU. Synergistic Processor Unit. The part of an SPE that executes instructions from its local store (LS).

spulet. 1) A standalone SPU program that is managed by a PPE executive. 2) A programming model that allows legacy C programs to be compiled and run on an SPE directly from the Linux command prompt.

tag group. A group of DMA commands. Each DMA command is tagged with a 5-bit tag group identifier. Software can use this identifier to check or wait on the completion of all queued commands in one or more tag groups. All DMA commands except getllar, putllc, and putlluc are associated with a tag group.

Tcl. Tool Command Language. An interpreted script language used to develop GUIs, application prototypes, Common Gateway Interface (CGI) scripts, and other scripts. Used as the command language for the Full System Simulator.

vector. An instruction operand containing a set of data elements packed into a one-dimensional array. The elements can be fixed-point or floating-point values. Most Vector/SIMD Multimedia Extension and SPU SIMD instructions operate on vector operands. Vectors are also called SIMD operands or packed operands.

virtual memory. The address space created using the memory management facilities of a processor.

virtual storage. See virtual memory.

VMA. Virtual memory address. See virtual memory.

workload. A set of code samples in the SDK that characterizes the performance of the architecture, algorithms, libraries, tools, and compilers.

XDR. Rambus Extreme Data Rate DRAM memory technology.

XLC. The IBM optimizing C/C++ compiler.

x86. Generic name for Intel-based processors.

yaboot. Linux utility which is a boot loader for PowerPC-based hardware.
Index
Special characters
__ovly_load 56
_ovly_debug_event 56
–extra-overlay-stubs
  linker command 55

A
address
  load 40
archive library 54
  directory 54

B
best practices 22
BogusNet 18
bogusnet support 37
breakpoints
  setting pending 32

C
call stub 39, 44, 55
callthru utility 17
cell-perf-counter 13
command
  linker 52, 53, 55
compiler
  changing 20
compiling and linking
  GNU tool chain 21
control statement
  linker 47

D
data
  transient 40
debugging
  architecture 29
  commands 34
  compiling with GCC 25
  compiling with XLC 25
  GDB 25
  info spu dma 35
  info spu event 35
  info spu mailbox 35
  info spu proxydma 36
  info spu signal 35
  multithreaded code 29
  pending breakpoints 32
  PPE code 26
  remotely 36
  scheduler-locking 31
  SPE code 26
  SPE registers 28
  starting remote 37
  using the combined debugger 32
directory
  archive library 54
directory structure
  libraries and samples 8
  programming sample 20
  system root 15
DMA 40
documentation 61

E
elfspe 6
example
  overlay graph structure 44

F
fast mode
  Full-System Simulator 17
FDPR-Pro 11
flags
  linker 52, 53, 54
Full-System Simulator
  callthru utility 17
  description 3
  fast mode 17
  running 16
  system root image 4, 18
  systemsim 16
function 40

G
GCC compiler 1
GNU SPU linker 55
GNU tool chain 1
  compiling and linking 21

I
IDE 13
info spu dma 35
info spu event 35
info spu mailbox 35
info spu proxydma 36
info spu signal 35
Integrated Development Environment 13

K
kernel 5

L
large matrix sample 53
libraries and samples
  cell-perf-counter 13
  Cell/B.E. library 5
  FDPR-Pro 11
  libspe version 2.1 5
  MASS 7
  OProfile 11
  prototype 7
  SIMD math library 6
  SPU timing tool 10
  subdirectories 8
  support libraries 10
library
  archive 54
  overlay manager 39
libspe
  version 2.1 5
linker 39, 44, 45, 54
  command 52, 55
  commands 53
  control statement 47
  flags 52, 53, 54
  GNU 55
  OVERLAY statement 55
  script 52, 53, 55
linker command
  –extra-overlay-stubs 55
linker statement
  OVERLAY 47
Linux
  kernel 5
load address 40
load point 43, 47, 48

M
makefile
  for samples 20
manager
  overlay 43, 44, 46, 56
MASS library 7

N
native debugging
  setting up 36

O
Oprofile
  restrictions 13
  SPU profiling restrictions 12
  SPU report anomalies 12
OProfile 11
origin
  segment 43, 44, 46
overlay 39
  graph structure example 44
  length of an overlay program 42
  manager 43, 44, 46, 51, 56
  manager library 39
Readers’ Comments — We’d Like to Hear from You
Cell Broadband Engine
Software Development Kit 2.1
Programmer's Guide
Version 2.1
We appreciate your comments about this publication. Please comment on specific errors or omissions, accuracy,
organization, subject matter, or completeness of this book. The comments you send should pertain to only the
information in this manual or product and the way in which the information is presented.
For technical questions and information about products and prices, please contact your IBM branch office, your
IBM business partner, or your authorized remarketer.
When you send comments to IBM, you grant IBM a nonexclusive right to use or distribute your comments in any
way it believes appropriate without incurring any obligation to you. IBM or any other organizations will only use
the personal information that you supply to contact you about the issues that you state on this form.
Comments: