Sprui04b - TMS320C6000 Optimizing Compiler v8.2.x
User's Guide
Preface
1 Introduction to the Software Development Tools
1.1 Software Development Tools Overview
1.2 Compiler Interface
1.3 ANSI/ISO Standard
1.4 Output Files
1.5 Utilities
2 Getting Started with the Code Generation Tools
2.1 How Code Composer Studio Projects Use the Compiler
2.2 Compiling from the Command Line
3 Using the C/C++ Compiler
3.1 About the Compiler
3.2 Invoking the C/C++ Compiler
3.3 Changing the Compiler's Behavior with Options
3.3.1 Linker Options
3.3.2 Frequently Used Options
3.3.3 Miscellaneous Useful Options
3.3.4 Run-Time Model Options
3.3.5 Selecting Target CPU Version (--silicon_version Option)
3.3.6 Symbolic Debugging and Profiling Options
3.3.7 Specifying Filenames
3.3.8 Changing How the Compiler Interprets Filenames
3.3.9 Changing How the Compiler Processes C Files
3.3.10 Changing How the Compiler Interprets and Names Extensions
3.3.11 Specifying Directories
3.3.12 Assembler Options
3.3.13 Dynamic Linking
3.4 Controlling the Compiler Through Environment Variables
3.4.1 Setting Default Compiler Options (C6X_C_OPTION)
3.4.2 Naming One or More Alternate Directories (C6X_C_DIR)
3.5 Controlling the Preprocessor
3.5.1 Predefined Macro Names
3.5.2 The Search Path for #include Files
3.5.3 Support for the #warning and #warn Directives
3.5.4 Generating a Preprocessed Listing File (--preproc_only Option)
3.5.5 Continuing Compilation After Preprocessing (--preproc_with_compile Option)
3.5.6 Generating a Preprocessed Listing File with Comments (--preproc_with_comment Option)
3.5.7 Generating Preprocessed Listing with Line-Control Details (--preproc_with_line Option)
3.5.8 Generating Preprocessed Output for a Make Utility (--preproc_dependency Option)
3.5.9 Generating a List of Files Included with #include (--preproc_includes Option)
3.5.10 Generating a List of Macros in a File (--preproc_macros Option)
3.6 Passing Arguments to main()
3.7 Understanding Diagnostic Messages
3.7.1 Controlling Diagnostic Messages
3.7.2 How You Can Use Diagnostic Suppression Options
5.6.4 Using the .mdep Directive to Identify Specific Memory Dependencies
5.6.5 Memory Alias Examples
6 Linking C/C++ Code
6.1 Invoking the Linker Through the Compiler (-z Option)
6.1.1 Invoking the Linker Separately
6.1.2 Invoking the Linker as Part of the Compile Step
6.1.3 Disabling the Linker (--compile_only Compiler Option)
6.2 Linker Code Optimizations
6.2.1 Conditional Linking
6.2.2 Generating Function Subsections (--gen_func_subsections Compiler Option)
6.2.3 Generating Aggregate Data Subsections (--gen_data_subsections Compiler Option)
6.3 Controlling the Linking Process
6.3.1 Including the Run-Time-Support Library
6.3.2 Run-Time Initialization
6.3.3 Global Object Constructors
6.3.4 Specifying the Type of Global Variable Initialization
6.3.5 Specifying Where to Allocate Sections in Memory
6.3.6 A Sample Linker Command File
7 TMS320C6000 C/C++ Language Implementation
7.1 Characteristics of TMS320C6000 C
7.1.1 Implementation-Defined Behavior
7.2 Characteristics of TMS320C6000 C++
7.3 Using MISRA C 2004
7.4 Data Types
7.4.1 Size of Enum Types
7.4.2 Vector Data Types
7.5 Keywords
7.5.1 The complex Keyword
7.5.2 The const Keyword
7.5.3 The __cregister Keyword
7.5.4 The __interrupt Keyword
7.5.5 The __near and __far Keywords
7.5.6 The restrict Keyword
7.5.7 The volatile Keyword
7.6 C++ Exception Handling
7.7 Register Variables and Parameters
7.8 The __asm Statement
7.9 Pragma Directives
7.9.1 The CALLS Pragma
7.9.2 The CHECK_MISRA Pragma
7.9.3 The CLINK Pragma
7.9.4 The CODE_SECTION Pragma
7.9.5 The DATA_ALIGN Pragma
7.9.6 The DATA_MEM_BANK Pragma
7.9.7 The DATA_SECTION Pragma
7.9.8 The Diagnostic Message Pragmas
7.9.9 The FUNC_ALWAYS_INLINE Pragma
7.9.10 The FUNC_CANNOT_INLINE Pragma
7.9.11 The FUNC_EXT_CALLED Pragma
7.9.12 The FUNC_INTERRUPT_THRESHOLD Pragma
7.9.13 The FUNC_IS_PURE Pragma
7.9.14 The FUNC_IS_SYSTEM Pragma
7.9.15 The FUNC_NEVER_RETURNS Pragma
List of Figures
1-1. TMS320C6000 Software Development Flow
4-1. Software-Pipelined Loop
5-1. 4-Bank Interleaved Memory
5-2. 4-Bank Interleaved Memory With Two Memory Spaces
8-1. Char and Short Data Storage Format
8-2. 32-Bit Data Storage Format
8-3. Single-Precision Floating-Point Char Data Storage Format
8-4. 40-Bit Data Storage Format Signed __int40_t
8-5. Unsigned 40-bit __int40_t
8-6. 64-Bit Data Storage Format Signed 64-bit long
8-7. Unsigned 64-bit long
8-8. Double-Precision Floating-Point Data Storage Format
8-9. Bit-Field Packing in Big-Endian and Little-Endian Formats
8-10. Register Argument Conventions
8-11. Autoinitialization at Run Time
8-12. Initialization at Load Time
8-13. Constructor Table
List of Tables
2-1. Steps for Creating a CCS Project
3-1. Processor Options
3-2. Optimization Options
3-3. Advanced Optimization Options
3-4. Debug Options
3-5. Include Options
3-6. Control Options
3-7. Language Options
3-8. Parser Preprocessing Options
3-9. Predefined Symbols Options
3-10. Diagnostic Message Options
3-11. Run-Time Model Options
3-12. Entry/Exit Hook Options
3-13. Feedback Options
3-14. Assembler Options
3-15. File Type Specifier Options
3-16. Directory Specifier Options
3-17. Default File Extensions Options
3-18. Dynamic Linking Support Compiler Options
3-19. Command Files Options
3-20. MISRA-C 2004 Options
3-21. Performance Advisor Options
3-22. Linker Basic Options
3-23. File Search Path Options
3-24. Command File Preprocessing Options
3-25. Diagnostic Message Options
3-26. Linker Output Options
3-27. Symbol Management Options
Notational Conventions
This document uses the following conventions:
• Program listings, program examples, and interactive displays are shown in a special typeface.
Interactive displays use a bold version of the special typeface to distinguish commands that you enter
from items that the system displays (such as prompts, command output, error messages, etc.).
Here is a sample of C code:
#include <stdio.h>
int main(void)
{
    printf("Hello World\n");
    return 0;
}
• In syntax descriptions, instructions, commands, and directives are in a bold typeface and parameters
are in an italic typeface. Portions of a syntax that are in bold should be entered as shown; portions of a
syntax that are in italics describe the type of information that should be entered.
• Square brackets ( [ and ] ) identify an optional parameter. If you use an optional parameter, you specify
the information within the brackets. Unless the square brackets are in the bold typeface, do not enter
the brackets themselves. The following is an example of a command that has an optional parameter:
• Braces ( { and } ) indicate that you must choose one of the parameters within the braces; you do not
enter the braces themselves. This is an example of a command with braces that are not included in the
actual syntax but indicate that you must specify either the --rom_model or --ram_model option:
• In assembler syntax statements, the leftmost column is reserved for the first character of a label or
symbol. If the label or symbol is optional, it is usually not shown. If a label or symbol is a required
parameter, it is shown starting against the left margin of the box, as in the example below. No
instruction, command, directive, or parameter, other than a symbol or label, can begin in the leftmost
column.
• Some directives, such as the .byte directive, can have a varying number of parameters. This
syntax is shown as [, ..., parameter].
• This document describes support for the C64+, C6740, and C6600 variants of the TMS320C6000™
processor series. The C6200, C6400, C6700, and C6700+ variants are not supported in v8.0 and later
versions of the TI Code Generation Tools. If you are using one of these legacy devices, please use
v7.4 of the Code Generation Tools and refer to SPRU187 and SPRU186 for documentation.
Related Documentation
You can use the following books to supplement this user's guide:
ANSI X3.159-1989, Programming Language - C (Alternate version of the 1989 C Standard), American
National Standards Institute
ISO/IEC 9899:1989, International Standard - Programming Languages - C (The 1989 C Standard),
International Organization for Standardization
ISO/IEC 9899:1999, International Standard - Programming Languages - C (The 1999 C Standard),
International Organization for Standardization
ISO/IEC 14882-2003, International Standard - Programming Languages - C++ (The 2003 C++
Standard), International Organization for Standardization
The C Programming Language (second edition), by Brian W. Kernighan and Dennis M. Ritchie,
published by Prentice-Hall, Englewood Cliffs, New Jersey, 1988
The Annotated C++ Reference Manual, Margaret A. Ellis and Bjarne Stroustrup, published by Addison-
Wesley Publishing Company, Reading, Massachusetts, 1990
C: A Reference Manual (fourth edition), by Samuel P. Harbison, and Guy L. Steele Jr., published by
Prentice Hall, Englewood Cliffs, New Jersey
Programming Embedded Systems in C and C++, by Michael Barr, Andy Oram (Editor), published by
O'Reilly & Associates; ISBN: 1565923545, February 1999
Programming in C, Steve G. Kochan, Hayden Book Company
The C++ Programming Language (second edition), Bjarne Stroustrup, published by Addison-Wesley
Publishing Company, Reading, Massachusetts, 1990
Tool Interface Standards (TIS) DWARF Debugging Information Format Specification Version 2.0,
TIS Committee, 1995
DWARF Debugging Information Format Version 3, DWARF Debugging Information Format Workgroup,
Free Standards Group, 2005 (http://dwarfstd.org)
DWARF Debugging Information Format Version 4, DWARF Debugging Information Format Workgroup,
Free Standards Group, 2010 (http://dwarfstd.org)
System V ABI specification (http://www.sco.com/developers/gabi/)
OpenCL™ Specification version 1.2 (https://www.khronos.org/opencl/)
The TMS320C6000™ is supported by a set of software development tools, which includes an optimizing
C/C++ compiler, an assembly optimizer, an assembler, a linker, and assorted utilities.
This chapter provides an overview of these tools and introduces the features of the optimizing C/C++
compiler. The assembly optimizer is discussed in Chapter 5. The assembler and linker are discussed in
detail in the TMS320C6000 Assembly Language Tools User's Guide.
The following list describes the tools that are shown in Figure 1-1:
• The assembly optimizer allows you to write linear assembly code without being concerned with the
pipeline structure or with assigning registers. It accepts assembly code that has not been register-
allocated and is unscheduled. The assembly optimizer assigns registers and uses loop optimization to
turn linear assembly into highly parallel assembly that takes advantage of software pipelining. See
Chapter 5.
• The compiler accepts C/C++ source code and produces C6000 assembly language source code. See
Chapter 3.
• The assembler translates assembly language source files into machine language relocatable object
files. See the TMS320C6000 Assembly Language Tools User's Guide.
• The linker combines relocatable object files into a single absolute executable object file. As it creates
the executable file, it performs relocation and resolves external references. The linker accepts
relocatable object files and object libraries as input. See Chapter 6 for an overview of the linker. See
the TMS320C6000 Assembly Language Tools User's Guide for details.
• The archiver allows you to collect a group of files into a single archive file, called a library. The
archiver allows you to modify such libraries by deleting, replacing, extracting, or adding members. One
of the most useful applications of the archiver is building a library of object files. See the
TMS320C6000 Assembly Language Tools User's Guide.
• The run-time-support libraries contain the standard ISO C and C++ library functions, compiler-utility
functions, floating-point arithmetic functions, and C I/O functions that are supported by the compiler.
See Chapter 9.
The library-build utility automatically builds the run-time-support library if compiler and linker options
require a custom version of the library. See Section 9.4. Source code for the standard run-time-support
library functions for C and C++ is provided in the lib\src subdirectory of the directory where the
compiler is installed.
• The hex conversion utility converts an object file into other object formats. You can download the
converted file to an EPROM programmer. See the TMS320C6000 Assembly Language Tools User's
Guide.
• The C++ name demangler is a debugging aid that converts names mangled by the compiler back to
their original names as declared in the C++ source code. As shown in Figure 1-1, you can use the C++
name demangler on the assembly file that is output by the compiler; you can also use this utility on the
assembler listing file and the linker map file. See Chapter 10.
• The disassembler decodes object files to show the assembly instructions that they represent. See the
TMS320C6000 Assembly Language Tools User's Guide.
• The main product of this development process is an executable object file that can be executed in a
TMS320C6000 device. You can use an XDS emulator when refining and correcting your code.
1.5 Utilities
These features are compiler utilities:
• Library-build utility
The library-build utility lets you custom-build object libraries from source for any combination of run-
time models. For more information, see Section 9.4.
• C++ name demangler
The C++ name demangler (dem6x) is a debugging aid that translates each mangled name it detects in
compiler-generated assembly code, disassembly output, or compiler diagnostic messages to its
original name found in the C++ source code. For more information, see Chapter 10.
• Hex conversion utility
For stand-alone embedded applications, the compiler has the ability to place all code and initialization
data into ROM, allowing C/C++ code to run from reset. The ELF files output by the compiler can be
converted to EPROM programmer data files by using the hex conversion utility, as described in the
TMS320C6000 Assembly Language Tools User's Guide.
This chapter provides an overview of the procedure for creating a Code Composer Studio project that
uses the C6000 Code Generation Tools. It also introduces the command-line interface for
the compiler and linker.
After you have created a CCS project, you can use the Properties dialog for the project to see how the
compiler and linker will be used and modify the command-line options used when compiling and linking.
To open this dialog, select the project in the Project Explorer and choose Project > Properties from the
menus. Expand the category tree to select Build > C6000 Compiler and Build > C6000 Linker. You can
learn more about any command-line options you see in this dialog in Chapter 3.
18 Getting Started with the Code Generation Tools SPRUI04B – May 2017
Submit Documentation Feedback
Copyright © 2017, Texas Instruments Incorporated
Chapter 3
The compiler translates your source program into machine language object code that the TMS320C6000
can execute. Source code must be compiled, assembled, and linked to create an executable object file. All
of these steps are executed at once by using the compiler.
For a complete description of the assembler and the linker, see the TMS320C6000 Assembly Language
Tools User's Guide.
(1)
Note: Machine-specific options (see Table 3-11) can also affect optimization.
--c_src_interlist Invokes the interlist feature, which interweaves original C/C++ source
with compiler-generated assembly language. The interlisted C
statements may appear to be out of sequence. You can use the interlist
feature with the optimizer by combining the --optimizer_interlist and
--c_src_interlist options. See Section 4.15. The --c_src_interlist option can
have a negative performance and/or code size impact.
--cmd_file=filename Appends the contents of a file to the option set. You can use this option
to avoid limitations on command line length or C style comments
imposed by the host operating system. Use a # or ; at the beginning of a
line in the command file to include comments. You can also include
comments by delimiting them with /* and */. To specify options, surround
hyphens with quotation marks. For example, "--"quiet.
You can use the --cmd_file option multiple times to specify multiple files.
For instance, the following indicates that file3 should be compiled as
source and file1 and file2 are --cmd_file files:
cl6x --cmd_file=file1 --cmd_file=file2 file3
--compile_only Suppresses the linker and overrides the --run_linker option, which
specifies linking. The --compile_only option's short form is -c. Use this
option when you have --run_linker specified in the C6X_C_OPTION
environment variable and you do not want to link. See Section 6.1.3.
--define=name[=def] Predefines the constant name for the preprocessor. This is equivalent to
inserting #define name def at the top of each C source file. If the
optional [=def] is omitted, the name is set to 1. The --define option's short
form is -D.
If you want to define a quoted string and keep the quotation marks, do
one of the following:
• For Windows, use --define=name="\"string def\"". For example,
--define=car="\"sedan\""
• For UNIX, use --define=name='"string def"'. For example,
--define=car='"sedan"'
• For Code Composer Studio, enter the definition in a file and include
that file with the --cmd_file option.
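As a minimal sketch of how source code might consume such definitions, the following C fragment assumes that the build may pass --define=car='"sedan"' (the quoted-string example above) and --define=debug_level (a hypothetical name with no =def, so it would be set to 1). The helper functions car_is() and debug_level_value() are illustrative only and are not part of the compiler or its libraries. Fallback definitions keep the file compilable when neither option is given.

```c
#include <assert.h>
#include <string.h>

/* car mirrors the --define=car='"sedan"' example; the fallback below is an
   assumption so the file still builds when the option is not passed. */
#ifndef car
#define car "sedan"
#endif

/* debug_level is a hypothetical name. Passing --define=debug_level with no
   =def would set it to 1; without the option it defaults to 0 here. */
#ifndef debug_level
#define debug_level 0
#endif

/* Returns 1 if the string baked in at compile time equals name. */
int car_is(const char *name)
{
    assert(name != NULL);
    return strcmp(car, name) == 0;
}

/* Returns the debug level selected at compile time. */
int debug_level_value(void)
{
    return debug_level;
}
```

Because the values are fixed at preprocessing time, changing them requires recompiling with different --define options rather than changing anything at run time.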
--help Displays the syntax for invoking the compiler and lists available options.
If the --help option is followed by another option or phrase, detailed
information about the option or phrase is displayed. For example, to see
information about debugging options use --help debug.
--include_path=directory Adds directory to the list of directories that the compiler searches for
#include files. The --include_path option's short form is -I. You can use
this option several times to define several directories; be sure to
separate the --include_path options with spaces. If you do not specify a
directory name, the preprocessor ignores the --include_path option. See
Section 3.5.2.1.
--keep_asm Retains the assembly language output from the compiler or assembly
optimizer. Normally, the compiler deletes the output assembly language
file after assembly is complete. The --keep_asm option's short form is -k.
--quiet Suppresses banners and progress information from all the tools. Only
source filenames and error messages are output. The --quiet option's
short form is -q.
--run_linker Runs the linker on the specified object files. The --run_linker option and
its parameters follow all other options on the command line. All
arguments that follow --run_linker are passed to the linker. The
--run_linker option's short form is -z. See Section 6.1.
--skip_assembler Compiles only. The specified source files are compiled but not
assembled or linked. The --skip_assembler option's short form is -n. This
option overrides --run_linker. The output is assembly language output
from the compiler.
--src_interlist Invokes the interlist feature, which interweaves optimizer comments or
C/C++ source with assembly source. If the optimizer is invoked
(--opt_level=n option), optimizer comments are interlisted with the
assembly language output of the compiler, which may rearrange code
significantly. If the optimizer is not invoked, C/C++ source statements are
interlisted with the assembly language output of the compiler, which
allows you to inspect the code generated for each C/C++ statement. The
--src_interlist option implies the --keep_asm option. The --src_interlist
option's short form is -s.
--tool_version Prints the version number for each tool in the compiler. No compiling
occurs.
--undefine=name Undefines the predefined constant name. This option overrides any
--define options for the specified constant. The --undefine option's short
form is -U.
--wchar_t={32|16} Sets the size (in bits) of the C/C++ type wchar_t. By default the
compiler generates 16-bit wchar_t. 16-bit wchar_t objects are not
compatible with 32-bit wchar_t objects; an error is generated if they
are combined. When the --linux option is specified, it implies
--wchar_t=32 since Linux uses 32-bit extended characters.
--profile:breakpt Disables optimizations that would cause incorrect behavior when using a
breakpoint-based profiler.
--symdebug:dwarf (Default) Generates directives that are used by the C/C++ source-level
debugger and enables assembly source debugging in the assembler.
The --symdebug:dwarf option's short form is -g. See Section 4.16.
For more information on the DWARF debug format, see The DWARF
Debugging Standard.
--symdebug:dwarf_version={2|3} Specifies the DWARF debugging format version (2 or 3) to be generated
when --symdebug:dwarf (the default) is specified. By default, the
compiler generates DWARF version 3 debug information. For more
information on TI extensions to the DWARF language, see The Impact of
DWARF on TI Object Files (SPRAAB5).
--symdebug:none Disables all symbolic debugging output. This option is not recommended;
it prevents debugging and most performance analysis capabilities.
For information about how you can alter the way that the compiler interprets individual filenames, see
Section 3.3.8. For information about how you can alter the way that the compiler interprets and names the
extensions of assembly source and object files, see Section 3.3.11.
You can use wildcard characters to compile or assemble multiple files. Wildcard specifications vary by
system; use the appropriate form listed in your operating system manual. For example, to compile all of
the files in a directory with the extension .cpp, enter the following:
cl6x *.cpp
For example, if you have a C source file called file.s and an assembly language source file called assy,
use the --asm_file and --c_file options to force the correct interpretation:
cl6x --c_file=file.s --asm_file=assy
The following example assembles the file fit.rrr and creates an object file named fit.o:
cl6x --asm_extension=.rrr --obj_extension=.o fit.rrr
The period (.) in the extension is optional. You can also write the example above as:
cl6x --asm_extension=rrr --obj_extension=o fit.rrr
--asm_define=name[=def] Predefines the constant name for the assembler; produces a .set directive
for a constant or an .arg directive for a string. If the optional [=def] is
omitted, the name is set to 1. If you want to define a quoted string and
keep the quotation marks, do one of the following:
• For Windows, use --asm_define=name="\"string def\"". For example:
--asm_define=car="\"sedan\""
• For UNIX, use --asm_define=name='"string def"'. For example:
--asm_define=car='"sedan"'
• For Code Composer Studio, enter the definition in a file and include
that file with the --cmd_file option.
--asm_dependency Performs preprocessing for assembly files, but instead of writing
preprocessed output, writes a list of dependency lines suitable for input to
a standard make utility. The list is written to a file with the same name as
the source file but with a .ppa extension.
--asm_includes Performs preprocessing for assembly files, but instead of writing
preprocessed output, writes a list of files included with the #include
directive. The list is written to a file with the same name as the source file
but with a .ppa extension.
--asm_listing Produces an assembly listing file.
--asm_undefine=name Undefines the predefined constant name. This option overrides any
--asm_define options for the specified name.
--asm_listing_cross_reference Produces a symbolic cross-reference in the listing file.
--include_file=filename Includes the specified file for the assembly module; acts like an .include
directive. The file is included before source file statements. The included
file does not appear in the assembly listing files.
--machine_regs Displays reg operands as machine registers in the assembly file for
debugging purposes.
--no_compress Prevents compression in the assembler. Compression changes 32-bit
instructions to 16-bit instructions, where possible/profitable.
--no_reload_errors Turns off all reload-related loop buffer error messages in assembly code.
--strip_coff_underscore Aids in transitioning hand-coded assembly from COFF to EABI. Although
the COFF output is no longer supported, this option remains available as
a COFF ABI to ELF EABI migration aid. For COFF ABI, the compiler
prepended an underscore to the beginning of all C/C++ identifiers. For
EABI, the link-time symbol is the same as the C/C++ identifier name. This
option removes the underscore prefix from legacy symbol references as
needed.
NOTE: C_OPTION and C_DIR -- The C_OPTION and C_DIR environment variables are
deprecated. Use device-specific environment variables instead.
Environment variable options are specified in the same way and have the same meaning as they do on
the command line. For example, if you want to always run quietly (the --quiet option), enable C/C++
source interlisting (the --src_interlist option), and link (the --run_linker option) for Windows, set up the
C6X_C_OPTION environment variable as follows:
set C6X_C_OPTION=--quiet --src_interlist --run_linker
In the following examples, each time you run the compiler, it runs the linker. Any options following
--run_linker on the command line or in C6X_C_OPTION are passed to the linker. Thus, you can use the
C6X_C_OPTION environment variable to specify default compiler and linker options and then specify
additional compiler and linker options on the command line. If you have set --run_linker in the environment
variable and want to compile only, use the compiler --compile_only option. These additional examples
assume C6X_C_OPTION is set as shown above:
cl6x *.c ; compiles and links
cl6x --compile_only *.c ; only compiles
cl6x *.c --run_linker lnk.cmd ; compiles and links using a command file
cl6x --compile_only *.c --run_linker lnk.cmd
; only compiles (--compile_only overrides --run_linker)
For details on compiler options, see Section 3.3. For details on linker options, see the Linker Description
chapter in the TMS320C6000 Assembly Language Tools User's Guide.
The pathnames are directories that contain input files. The pathnames must follow these constraints:
• Pathnames must be separated with a semicolon.
• Spaces or tabs at the beginning or end of a path are ignored. For example, the space before and after
the semicolon in the following is ignored:
set C6X_C_DIR=c:\path\one\to\tools ; c:\path\two\to\tools
• Spaces and tabs are allowed within paths to accommodate Windows directories that contain spaces.
For example, the pathnames in the following are valid:
set C6X_C_DIR=c:\first path\to\tools;d:\second path\to\tools
The environment variable remains set until you reboot the system or reset the variable by entering:
You can use the names listed in Table 3-33 in the same manner as any other defined name. For example,
printf ( "%s %s" , __TIME__ , __DATE__);
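As a minimal sketch of how these names behave, the function below formats __TIME__ and __DATE__ into a caller-supplied buffer. The function name build_stamp is hypothetical. The sketch relies on __TIME__ and __DATE__ expanding to string literals of the form "hh:mm:ss" and "Mmm dd yyyy", as standard C specifies.

```c
#include <stdio.h>

/* Hypothetical helper: writes "hh:mm:ss Mmm dd yyyy" into buf.          */
/* Returns the number of characters that the full string needs           */
/* (excluding the terminating null), as reported by snprintf().          */
int build_stamp(char *buf, size_t len)
{
    return snprintf(buf, len, "%s %s", __TIME__, __DATE__);
}
```

The values are fixed when the file is compiled, so the string reflects the build time rather than the time at which the program runs.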
3.5.2.1 Adding a Directory to the #include File Search Path (--include_path Option)
The --include_path option names an alternate directory that contains #include files. The --include_path
option's short form is -I. The format of the --include_path option is:
--include_path=directory1 [--include_path= directory2 ...]
There is no limit to the number of --include_path options per invocation of the compiler; each
--include_path option names one directory. In C source, you can use the #include directive without
specifying any directory information for the file; instead, you can specify the directory information with the
--include_path option.
For example, assume that a file called source.c is in the current directory. The file source.c contains the
following directive statement:
#include "alt.h"
Assume that the complete pathname for alt.h is:
UNIX /tools/files/alt.h
Windows c:\tools\files\alt.h
The table below shows how to invoke the compiler. Select the command for your operating system:
The .args section is loaded with the following data for non-SYS/BIOS-based executables, where each
element in the argv[] array contains a string corresponding to that argument:
Int argc;
Char * argv[0];
Char * argv[1];
...
Char * argv[n];
For SYS/BIOS-based executables, the elements in the .args section are as follows:
Int argc;
Char ** argv; /* points to argv[0] */
Char * envp; /* ignored by loadProg command */
Char * argv[0];
Char * argv[1];
...
Char * argv[n];
For more details, see the "Scripting Console" topic in the TI Processors Wiki.
By default, the source code line is not printed. Use the --verbose_diagnostics compiler option to display
the source line and the error position. The above example makes use of this option.
The message identifies the file and line involved in the diagnostic, and the source line itself (with the
position indicated by the ^ character) follows the message. If several diagnostic messages apply to one
source line, each diagnostic has the form shown; the text of the source line is displayed several times,
with an appropriate position indicated each time.
Long messages are wrapped to additional lines, when necessary.
You can use the --display_error_number command-line option to request that the diagnostic's numeric
identifier be included in the diagnostic message. When displayed, the diagnostic identifier also indicates
whether the diagnostic can have its severity overridden on the command line. If the severity can be
overridden, the diagnostic identifier includes the suffix -D (for discretionary); otherwise, no suffix is
present. For example:
Because errors are determined to be discretionary based on the severity in a specific context, an error can
be discretionary in some cases and not in others. All warnings and remarks are discretionary.
For some messages, a list of entities (functions, local variables, source files, etc.) is useful; the entities are
listed following the initial error message:
"test.c", line 4: error: more than one instance of overloaded function "f"
matches the argument list:
function "f(int)"
function "f(float)"
argument types are: (double)
f(1.5);
^
In some cases, additional context information is provided. Specifically, the context information is useful
when the front end issues a diagnostic while doing a template instantiation or while generating a
constructor, destructor, or assignment operator function. For example:
"test.c", line 7: error: "A::A()" is inaccessible
B x;
^
detected during implicit generation of "B::B()" at line 7
Without the context information, it is difficult to determine to what the error refers.
--emit_warnings_as_errors Treats all warnings as errors. This option cannot be used with the
--no_warnings option. The --diag_remark option takes precedence over this
option. This option takes precedence over the --diag_warning option.
--issue_remarks Issues remarks (non-serious warnings), which are suppressed by default.
--no_warnings Suppresses diagnostic warnings (errors are still issued).
--section_sizes={on|off} Generates section size information, including sizes for sections containing
executable code and constants, constant or initialized data (global and static
variables), and uninitialized data. Section size information is output during
both the assembly and linking phases. This option should be placed on the
command line with the compiler options (that is, before the --run_linker or -z
option).
--set_error_limit=num Sets the error limit to num, which can be any decimal value. The compiler
abandons compiling after this number of errors. (The default is 100.)
--verbose_diagnostics Provides verbose diagnostic messages that display the original source with
line-wrap and indicate the position of the error in the source line. Note that this
command-line option cannot be used within the Code Composer Studio IDE.
--write_diagnostics_file Produces a diagnostic message information file with the same source file
name with an .err extension. (The --write_diagnostics_file option is not
supported by the linker.) Note that this command-line option cannot be used
within the Code Composer Studio IDE.
If you invoke the compiler with the --quiet option, this is the result:
"err.c", line 9: warning: statement is unreachable
"err.c", line 12: warning: statement is unreachable
Because it is standard programming practice to include break statements at the end of each case arm to
avoid the fall-through condition, these warnings can be ignored. Using the --display_error_number option,
you can find out the diagnostic identifier for these warnings. Here is the result:
[err.c]
"err.c", line 9: warning #111-D: statement is unreachable
"err.c", line 12: warning #111-D: statement is unreachable
Next, you can use the diagnostic identifier of 111 as the argument to the --diag_remark option to treat this
warning as a remark. This compilation now produces no diagnostic messages (because remarks are
disabled by default).
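For illustration, source of the following shape produces warnings of this kind: each break is unreachable because the return before it always leaves the function. The function names are hypothetical and this is not the original err.c, so the line numbers in the messages above will not necessarily match.

```c
/* Hypothetical helper called from one case arm. */
static int one(void)
{
    return 1;
}

/* Each 'break' below follows a 'return', so it can never execute. */
/* A compiler that checks reachability may flag both statements.   */
int select_value(int i)
{
    switch (i) {
    case 1:
        return one();
        break;          /* unreachable */
    default:
        return 0;
        break;          /* unreachable */
    }
}
```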
NOTE: You can suppress any non-fatal errors, but be careful to make sure you only suppress
diagnostic messages that you understand and are known not to affect the correctness of
your program.
The --gen_preprocessor_listing option also includes diagnostic identifiers as defined in Table 3-35.
S One of the identifiers in Table 3-35 that indicates the severity of the diagnostic
filename The source file
line number The line number in the source file
column number The column number in the source file
diagnostic The message text for the diagnostic
Diagnostic messages after the end of file are indicated as the last line of the file with a column number of
0. When diagnostic message text requires more than one line, each subsequent line contains the same
file, line, and column information but uses a lowercase version of the diagnostic identifier. For more
information about diagnostic messages, see Section 3.7.
The semantics of the inline keyword in C code follow the C99 standard. The semantics of the inline
keyword in C++ code follow the C++ standard.
The inline keyword is supported in all C++ modes, in relaxed ANSI mode for all C standards, and in strict
ANSI mode for C99. It is disabled in strict ANSI mode for C89, because it is a language extension that
could conflict with a strictly conforming program. If you want to define inline functions while in strict ANSI
C89 mode, use the alternate keyword __inline.
Compiler options that affect inlining are: --opt_level, --auto_inline, --remove_hooks_when_inlining,
--opt_for_space, and --opt_for_speed.
In most cases, inlining will reduce the code size by a small amount.
/*****************************************************************************/
/* string.h vx.xx (Excerpted) */
/* Copyright (c) 1993-2011 Texas Instruments Incorporated */
/*****************************************************************************/
#ifdef _INLINE
#define _IDECL static inline
#else
#define _IDECL extern _CODE_ACCESS
#endif
#ifdef _INLINE
/****************************************************************************/
/* strlen */
/****************************************************************************/
static inline size_t strlen(const char *string)
{
    size_t n = (size_t)-1;
    const char *s = string - 1;

    do n++; while (*++s);   /* count characters up to the terminating null */
    return n;
}
#endif
/****************************************************************************/
/* strlen */
/****************************************************************************/
#undef _INLINE
#include <string.h>
_CODE_ACCESS size_t strlen(const char *string)
{
    size_t n = (size_t)-1;
    const char *s = string - 1;

    do n++; while (*++s);   /* count characters up to the terminating null */
    return n;
}
RTS Library Files Are Not Built with the --interrupt_threshold Option
NOTE: The run-time-support library files provided with the compiler are not built with the interrupt
flexibility option. Refer to the readme file to see how the run-time-support library files were
built for your release. See Section 9.4 to build your own run-time-support library files with the
interrupt flexibility option.
The --c_src_interlist option prevents the compiler from deleting the interlisted assembly language output
file. The output assembly file, function.asm, is assembled normally.
When you invoke the interlist feature without the optimizer, the interlist runs as a separate pass between
the code generator and the assembler. It reads both the assembly and C/C++ source files, merges them,
and writes the C/C++ statements into the assembly file as comments.
Using the --c_src_interlist option can cause performance and/or code size degradation.
Example 3-4 shows a typical interlisted assembly file.
For more information about using the interlist feature with the optimizer, see Section 4.15.
_main:
--entry_hook[=name] Enables entry hooks. If specified, the hook function is called name. Otherwise,
the default entry hook function name is __entry_hook.
--entry_parm{=name|address|none} Specifies the parameters to the hook function. The name parameter specifies
that the name of the calling function is passed to the hook function as an
argument. In this case the signature for the hook function is:
void hook(const char *name);
The address parameter specifies that the address of the calling function is
passed to the hook function. In this case the signature for the hook function is:
void hook(void (*addr)());
The none parameter specifies that the hook is called with no parameters. This
is the default. In this case the signature for the hook function is: void
hook(void);
--exit_hook[=name] Enables exit hooks. If specified, the hook function is called name. Otherwise,
the default exit hook function name is __exit_hook.
--exit_parm{=name|address|none} Specifies the parameters to the hook function. The name parameter specifies
that the name of the calling function is passed to the hook function as an
argument. In this case the signature for the hook function is:
void hook(const char *name);
The address parameter specifies that the address of the calling function is
passed to the hook function. In this case the signature for the hook function is:
void hook(void (*addr)());
The none parameter specifies that the hook is called with no parameters. This
is the default. In this case the signature for the hook function is: void
hook(void);
The presence of the hook options creates an implicit declaration of the hook function with the given
signature. If a declaration or definition of the hook function appears in the compilation unit compiled with
the options, it must agree with the signatures listed above.
In C++, the hooks are declared extern "C". Thus you can define them in C (or assembly) without being
concerned with name mangling.
Hooks can be declared inline, in which case the compiler tries to inline them using the same criteria as
other inline functions.
Entry hooks and exit hooks are independent. You can enable one but not the other, or both. The same
function can be used as both the entry and exit hook.
You must take care to avoid recursive calls to hook functions. The hook function should not call any
function which itself has hook calls inserted. To help prevent this, hooks are not generated for inline
functions, or for the hook functions themselves.
You can use the --remove_hooks_when_inlining option to remove entry/exit hooks for functions that are
auto-inlined by the optimizer.
See Section 7.9.24 for information about the NO_HOOKS pragma.
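The following sketch shows hook definitions that match the name-parameter signature given above. It assumes a build with --entry_hook=my_entry --entry_parm=name --exit_hook=my_exit --exit_parm=name. The function names my_entry and my_exit and the two bookkeeping variables are hypothetical. The hooks only update variables, so they do not call any function that could itself have hook calls inserted.

```c
#include <stddef.h>

/* Hypothetical bookkeeping: nesting depth of instrumented calls. */
int hook_depth = 0;

/* Hypothetical bookkeeping: name most recently passed to the entry hook. */
const char *hook_last_entered = NULL;

/* Entry hook: signature void hook(const char *name). */
void my_entry(const char *name)
{
    hook_last_entered = name;
    hook_depth++;
}

/* Exit hook: same signature; the name argument is not used here. */
void my_exit(const char *name)
{
    (void)name;
    hook_depth--;
}
```

Keeping the hook bodies this small is a deliberate choice: the less a hook does, the lower the risk of recursion and the smaller its effect on the timing of the instrumented code.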
The compiler tools can perform many optimizations to improve the execution speed and reduce the size of
C and C++ programs by simplifying loops, software pipelining, rearranging statements and expressions,
and allocating variables into registers.
This chapter describes how to invoke different levels of optimization and describes which optimizations are
performed at each level. This chapter also describes how you can use the Interlist feature when
performing optimization and how you can profile or debug optimized code.
The levels of optimizations described above are performed by the stand-alone optimization pass. The
code generator performs several additional optimizations, particularly processor-specific optimizations. It
does so regardless of whether you invoke the optimizer. These optimizations are always enabled,
although they are more effective when the optimizer is used.
Pipelined-loop prolog:
A1
B1 A2
C1 B2 A3
D1 C2 B3 A4
Kernel:
E1 D2 C3 B4 A5
Pipelined-loop epilog:
E2 D3 C4 B5
E3 D4 C5
E4 D5
E5
If you enter comments on instructions in your linear assembly input file, the compiler moves the comments
to the output file along with additional information. It attaches a 2-tuple <x, y> to the comments to specify
the iteration and cycle of the loop an instruction is on in the software pipeline. The zero-based number x
represents the iteration the instruction is on during the first execution of the loop kernel. The zero-based
number y represents the cycle that the instruction is scheduled on within a single iteration of the loop.
For more information about software pipelining, see the TMS320C6000 Programmer's Guide.
*----------------------------------------------------------------------------*
The terms defined below appear in the software pipelining information. For more information on each
term, see the TMS320C6000 Programmer's Guide.
• Loop unroll factor. The number of times the loop was unrolled specifically to increase performance
based on the resource bound constraint in a software pipelined loop.
• Known minimum trip count. The minimum number of times the loop will be executed.
• Known maximum trip count. The maximum number of times the loop will be executed.
• Known max trip count factor. Factor that would always evenly divide the loop's trip count. This
information can be used to possibly unroll the loop.
• Loop label. The label you specified for the loop in the linear assembly input file. This field is not
present for C/C++ code.
• Loop carried dependency bound. The distance of the largest loop carry path. A loop carry path
occurs when one iteration of a loop writes a value that must be read in a future iteration. Instructions
that are part of the loop carry bound are marked with the ^ symbol.
• Initiation interval (ii). The number of cycles between the initiation of successive iterations of the loop.
The smaller the initiation interval, the fewer cycles it takes to execute a loop.
• Resource bound. The most used resource constrains the minimum initiation interval. If four
instructions require a .D unit, they require at least two cycles to execute (4 instructions/2 parallel .D
units).
• Unpartitioned resource bound. The best possible resource bound values before the instructions in
the loop are partitioned to a particular side.
• Partitioned resource bound (*). The resource bound values after the instructions are partitioned.
• Resource partition. This table summarizes how the instructions have been partitioned. This
information can be used to help assign functional units when writing linear assembly. Each table entry
has values for the A-side and B-side registers. An asterisk is used to mark those entries that determine
the resource bound value. The table entries represent the following terms:
– .L units is the total number of instructions that require .L units.
– .S units is the total number of instructions that require .S units.
– .D units is the total number of instructions that require .D units.
– .M units is the total number of instructions that require .M units.
– .X cross paths is the total number of .X cross paths.
– .T address paths is the total number of address paths.
– Long read path is the total number of long read port paths.
– Long write path is the total number of long write port paths.
– Logical ops (.LS) is the total number of instructions that can use either the .L or .S unit.
– Addition ops (.LSD) is the total number of instructions that can use the .L, .S, or .D unit.
• Bound(.L .S .LS). The resource bound value as determined by the number of instructions that use the
.L and .S units. It is calculated with the following formula:
Bound(.L .S .LS ) = ceil((.L + .S + .LS) / 2)
• Bound(.L .S .D .LS .LSD). The resource bound value as determined by the number of instructions that
use the .D, .L, and .S units. It is calculated with the following formula:
Bound(.L .S .D .LS .LSD) = ceil((.L + .S + .D + .LS + .LSD) / 3)
• Minimum required memory pad. The number of bytes that are read if speculative execution is
enabled. See Section 4.2.3 for more information.
• Loop carried dependency bound too large. If the loop has complex loop control, try
--speculate_loads according to the recommendations in Section 4.2.3.2.
• Cannot identify trip counter. The loop trip counter could not be identified or was used incorrectly in
the loop body.
• Unsafe schedule for irregular loop. "Irregular" loops are non-downcounting loops without a known
number of iterations, such as a while loop. Irregular loops may require transformations that execute
instructions more times than called for by the loop. This error means the compiler was unable to find a
schedule with instructions that are safe to over-execute, are guarded with a predicate, or have their
effects undone after the loop. Try to rewrite the loop as a down-counting loop. You may also try
increasing the --speculate_loads (-mh) option.
This example shows that on cycle 0 (first execute packet) of the loop kernel, registers A0, A1, A2, A6, A7,
A8, A9, B0, B1, B2, B4, B5, B6, B7, B8, and B9 are all live during this cycle.
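The two resource bound formulas given in the list of terms, Bound(.L .S .LS) and Bound(.L .S .D .LS .LSD), can be sketched in C with integer arithmetic. The function names and the sample counts are hypothetical; the divisors 2 and 3 come directly from the formulas.

```c
/* Integer ceiling of a / b for non-negative a and positive b. */
static int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

/* Bound(.L .S .LS) = ceil((.L + .S + .LS) / 2) */
int bound_l_s_ls(int l, int s, int ls)
{
    return ceil_div(l + s + ls, 2);
}

/* Bound(.L .S .D .LS .LSD) = ceil((.L + .S + .D + .LS + .LSD) / 3) */
int bound_l_s_d_ls_lsd(int l, int s, int d, int ls, int lsd)
{
    return ceil_div(l + s + d + ls + lsd, 3);
}
```

For example, with two .L, one .S, and two .LS instructions, bound_l_s_ls(2, 1, 2) evaluates ceil(5 / 2) and returns 3.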
4.2.3 Collapsing Prologs and Epilogs for Improved Performance and Code Size
When a loop is software pipelined, a prolog and epilog are generally required. The prolog is used to pipe
up the loop and the epilog is used to pipe down the loop.
In general, a loop must execute a minimum number of iterations before the software-pipelined version can
be safely executed. If the minimum known trip count is too small, either a redundant loop is added or
software pipelining is disabled. Collapsing the prolog and epilog of a loop can reduce the minimum trip
count necessary to safely execute the pipelined loop.
Collapsing can also substantially reduce code size. Some of the code size growth caused by software
pipelining is due to the redundant loop. The remainder is due to the prolog and epilog.
The prolog and epilog of a software-pipelined loop consist of up to p-1 stages of length ii, where p is the
number of iterations that are executed in parallel during the steady state and ii is the cycle time for the
pipelined loop body. During prolog and epilog collapsing the compiler tries to collapse as many stages as
possible. However, over-collapsing can have a negative performance impact. Thus, by default, the
compiler attempts to collapse as many stages as possible without sacrificing performance. When the
--opt_for_space=0 or --opt_for_space=1 options are invoked, the compiler increasingly favors code size
over performance.
If the minimum safe trip count is greater than the minimum known trip count, use of --speculate_loads is
highly recommended, not only for code size, but for performance.
When using --speculate_loads, you must ensure that potentially speculated loads will not cause illegal
reads. This can be done by padding the data sections and/or stack, as needed, by the required memory
pad in both directions. The required memory pad for a given software-pipelined loop is also provided in the
comment block for that loop.
;* Minimum required memory pad : 5 bytes
For safety, the example loop requires that array data referenced within this loop be preceded and followed
by a pad of at least 5 bytes. This pad can consist of other program data. The pad will not be modified. In
many cases, the threshold value (namely, the minimum value of the argument to --speculate_loads that is
needed to achieve a particular schedule and level of collapsing) is the same as the pad. However, when it
is not, the comment block will also include the minimum threshold value. In the case of this loop, the
threshold value must be at least 7 to achieve this level of collapsing.
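One way to picture the padding requirement in C is a structure that places guard bytes on both sides of the array. This is only a sketch under stated assumptions: the type name, the member names, and the array length are hypothetical, and padding is often arranged through section placement instead. The value 5 matches the 5-byte pad discussed above.

```c
#include <stddef.h>

#define PAD_BYTES 5      /* the 5-byte pad discussed in the text      */
#define N_SAMPLES 64     /* hypothetical array length                 */

/* Hypothetical layout: guard bytes on both sides of the array, so a  */
/* load that runs up to PAD_BYTES past either end of 'data' still     */
/* reads memory that belongs to this object. The loop never writes    */
/* the guard bytes.                                                   */
struct padded_samples {
    unsigned char pre[PAD_BYTES];
    unsigned char data[N_SAMPLES];
    unsigned char post[PAD_BYTES];
};

struct padded_samples samples;
```

All members are byte arrays here, so the compiler has no alignment reason to insert gaps between them.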
The compiler and linker can provide automatic load speculation via the auto argument to the
--speculate_loads option (that is, --speculate_loads=auto or -mh=auto). Use of the auto argument makes it
easier to use and benefit from speculative load optimizations. This option can generate speculative loads
of up to 256 bytes beyond memory that the compiler can prove to be allocated.
In addition, the compiler communicates information to the linker to help automatically ensure the required
pre- and post-padding:
• If the symbol of the speculatively loaded buffer is known at compile time, the linker ensures the object
pointed to by the symbol has the required padding to let the speculative load access legal memory.
• If the symbol information is not known during compile time, the linker will ensure that the placement of
data sections will allow legal accessing beyond the boundaries of the data sections. The linker does
this by simply padding the start and end of the memory range(s) where the data sections are placed.
However, you can also specify the speculative loads threshold explicitly via the --speculate_loads=n
option, where n is at least the minimum required pad (as explained earlier), but you also need to consider
whether a larger threshold value would facilitate additional collapsing. This information is also provided, if
applicable. For example, in the above comment block, a threshold value of 14 might facilitate further
collapsing. If you choose the auto argument to --speculate_loads, the compiler will consider the larger
threshold value automatically.
A
B A
C B A ←Three iterations in parallel = minimum trip count
C B
C
When the C6000 tools cannot determine the trip count for a loop, then by default two loops and control
logic are generated. The first loop is not pipelined, and it executes if the run-time trip count is less than the
loop's minimum trip count. The second loop is the software pipelined loop, and it executes when the run-
time trip count is greater than or equal to the minimum trip count. At any given time, one of the loops is a
redundant loop. For example:
foo(N) /* N is the trip count */
{
for (I=0; I < N; I++) /* I is the trip counter */
}
After finding a software pipeline for the loop, the compiler transforms foo() as below, assuming the
minimum trip count for the loop is 3. Two versions of the loop would be generated and the following
comparison would be used to determine which version should be executed:
foo(N)
{
    if (N < 3)
    {
        for (I=0; I < N; I++) /* Unpipelined version */
    }
    else
    {
        for (I=0; I < N; I++) /* Pipelined version */
    }
}
foo(50); /* Execute software pipelined loop */
foo(2); /* Execute loop (unpipelined)*/
You may be able to help the compiler avoid producing redundant loops with the use of
--program_level_compile --opt_level=3 (see Section 4.7) or the use of the MUST_ITERATE pragma (see
Section 7.9.21).
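As a sketch of the pragma approach, MUST_ITERATE can state a lower bound on the trip count so that the compiler does not need to keep an unpipelined copy of the loop for small counts. The function name, its arguments, and the bounds 8 and 64 are hypothetical. The pragma takes the form MUST_ITERATE(min, max, multiple) and is TI-specific; a compiler that does not recognize it ignores it.

```c
/* Sums n elements. The caller must guarantee that n is between 8 and 64 */
/* and is a multiple of 8; otherwise the promise made by the pragma is   */
/* broken and the generated code may be incorrect.                       */
int sum_block(const short *x, int n)
{
    int i;
    int sum = 0;

    /* Trip count is at least 8, at most 64, and a multiple of 8. */
    #pragma MUST_ITERATE(8, 64, 8)
    for (i = 0; i < n; i++)
        sum += x[i];

    return sum;
}
```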
In certain circumstances, the compiler reverts to a different --call_assumptions level from the one you
specified, or it might disable program-level optimization altogether. Table 4-5 lists the combinations of
--call_assumptions levels and conditions that cause the compiler to revert to other --call_assumptions
levels.
In some situations when you use --program_level_compile and --opt_level=3, you must use a
--call_assumptions option or the FUNC_EXT_CALLED pragma. See Section 4.7.2 for information about
these situations.
Solution — Try both of these solutions and choose the one that works best with your code:
• Compile with --program_level_compile --opt_level=3 --call_assumptions=1.
• Add the volatile keyword to those variables that may be modified by the assembly functions and
compile with --program_level_compile --opt_level=3 --call_assumptions=2.
Situation — Your application consists of C/C++ source code and assembly source code. The assembly
functions are interrupt service routines that call C/C++ functions; the C/C++ functions that the
assembly functions call are never called from C/C++. These C/C++ functions act like main: they
function as entry points into C/C++.
Solution — Add the volatile keyword to the C/C++ variables that may be modified by the interrupts. Then,
you can optimize your code in one of these ways:
• You achieve the best optimization by applying the FUNC_EXT_CALLED pragma to all of the
entry-point functions called from the assembly language interrupts, and then compiling with
--program_level_compile --opt_level=3 --call_assumptions=2. Be sure that you use the pragma
with all of the entry-point functions. If you do not, the compiler might remove the entry-point
functions that are not preceded by the FUNC_EXT_CALLED pragma.
• Compile with --program_level_compile --opt_level=3 --call_assumptions=3. Because you do not
use the FUNC_EXT_CALLED pragma, you must use the --call_assumptions=3 option, which is
less aggressive than the --call_assumptions=2 option, and your optimization may not be as
effective.
Keep in mind that if you use --program_level_compile --opt_level=3 without additional options, the
compiler removes the C functions that the assembly functions call. Use the FUNC_EXT_CALLED
pragma to keep these functions.
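The interrupt situation above can be sketched in C as follows. The names tick_count, timer_isr_entry, and ticks_elapsed are hypothetical, and the pragma is shown in its C form (with the function name as an argument). It is guarded so that the sketch also compiles on a host compiler.

```c
#ifdef __TI_COMPILER_VERSION__
/* Must appear before any declaration of or reference to the function. */
#pragma FUNC_EXT_CALLED(timer_isr_entry)
#endif

/* Modified from interrupt context, so it is declared volatile. */
volatile unsigned int tick_count = 0;

/* Entry point into C: called only from an assembly interrupt routine. */
void timer_isr_entry(void)
{
    tick_count++;
}

/* Ordinary C code that reads the shared variable. */
unsigned int ticks_elapsed(void)
{
    return tick_count;
}
```

Without the pragma, a program-level compile sees no C caller of timer_isr_entry() and may remove it.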
--gen_profile_info tells the compiler to add instrumentation code to collect profile information. When
the program executes the run-time-support exit() function, the profile data is
written to a PDAT file. This option applies to all the C/C++ source files being
compiled on the command-line.
If the environment variable TI_PROFDATA on the host is set, the data is written
into the specified file. Otherwise, it uses the default filename: pprofout.pdat. The
full pathname of the PDAT file (including the directory name) can be specified
using the TI_PROFDATA host environment variable.
By default, the RTS profile data output routine uses the C I/O mechanism to write
data to the PDAT file. You can install a device handler for the PPHNDL device to
re-direct the profile data to a custom device driver routine. For example, this could
be used to send the profile data to a device that does not use a file system.
Feedback directed optimization requires you to turn on at least some debug
information when using the --gen_profile_info option. This enables the compiler to
output debug information that allows pdd6x to correlate compiled functions and
their associated profile data.
--use_profile_info specifies the profile information file(s) to use for performing phase 2 of feedback
directed optimization. More than one profile information file can be specified on the
command line; the compiler uses all input data from multiple information files. The
syntax for the option is:
--use_profile_info=file1[, file2, ..., filen]
If no filename is specified, the compiler looks for a file named pprofout.prf in the
directory where the compiler is invoked.
-a Computes the average of the data values in the data sets instead of
accumulating data values
-e exec.out Specifies exec.out is the name of the application executable.
-o application.prf Specifies application.prf is the formatted profile feedback file that is used as the
argument to --use_profile_info during recompilation. If no output file is specified,
the default output filename is pprofout.prf.
filename.pdat Is the name of the profile data file generated by the run-time-support function.
This is the default name and it can be overridden by using the host environment
variable TI_PROFDATA.
The run-time-support function and pdd6x append to their respective output files and do not overwrite
them. This enables collection of data sets from multiple runs of the application.
--gen_profile_info Adds instrumentation to the compiled code. Execution of the code results in
profile data being emitted to a PDAT file.
--use_profile_info=file.prf Uses profile information for optimization and/or generating code coverage
information.
--analyze=codecov Generates a code coverage information file and continues with profile-based
compilation. Must be used with --use_profile_info.
--analyze_only Generates only a code coverage information file. Must be used with
--use_profile_info. You must specify both --analyze=codecov and
--analyze_only to do code coverage analysis of the instrumented application.
API
Files Created
*.pdat Profile data file, which is created by executing an instrumented program and
used as input to the profile data decoder
*.prf Profiling feedback file, which is created by the profile data decoder and
used as input to the re-compilation step
4.9 Using Profile Information to Get Better Program Cache Layout and Analyze Code
Coverage
There are two different types of analysis information you can get from the path profiler: code coverage
information and call graph information.
The program cache layout tool helps you to develop better program instruction cache efficiency into your
applications. Program cache layout is the process of controlling the relative placement of code sections
into memory to minimize the occurrence of conflict misses in the program instruction cache.
Code coverage conveys the execution count of each line of source code in the file being compiled, using
data collected during profiling.
You can specify two environment variables to control the destination of the code-coverage information file.
• The TI_COVDIR environment variable specifies the directory where the code-coverage file should be
generated. The default is the directory where the compiler is invoked.
• The TI_COVDATA environment variable specifies the name of the code-coverage data file generated
by the compiler. The default is filename.csv where filename is the base-name of the file being compiled.
For example, if foo.c is being compiled, the default code-coverage data file name is foo.csv.
If the code-coverage data file already exists, the compiler appends the new dataset at the end of the file.
Code-coverage data is a comma-separated list of data items that can be conveniently handled by data-
processing tools and scripting languages. The following is the format of code-coverage data:
"filename-with-full-path","funcname",line#,column#,exec-frequency,"comments"
The full filename, function name, and comments appear within quotation marks ("). For example:
"/some_dir/zlib/c64p/deflate.c","_deflateInit2_",216,5,1,"( strm->zalloc )"
Other tools, such as a spreadsheet program, can be used to format and view the code coverage data.
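Because each record follows a fixed six-field layout, a small C routine can read it as well. The following is a sketch only: the CovRecord type, the buffer sizes, and the function name parse_cov_line are assumptions made for illustration, and the parser assumes that every quoted field is non-empty and contains no embedded quotation marks.

```c
#include <stdio.h>

/* One record of code-coverage data; field sizes are arbitrary assumptions. */
typedef struct {
    char file[256];
    char func[64];
    int line;
    int column;
    unsigned long count;
    char comment[256];
} CovRecord;

/* Returns 1 if all six fields were parsed, 0 otherwise.
   Assumes quoted fields are non-empty and hold no embedded quotes. */
int parse_cov_line(const char *text, CovRecord *rec)
{
    int fields = sscanf(text,
                        "\"%255[^\"]\",\"%63[^\"]\",%d,%d,%lu,\"%255[^\"]\"",
                        rec->file, rec->func, &rec->line, &rec->column,
                        &rec->count, rec->comment);
    return fields == 6;
}
```

Applied to the deflate.c record shown above, the routine yields line 216, column 5, and an execution frequency of 1.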
For further information about profile-based optimization and a more detailed description of the profiling
infrastructure, see Section 4.8.
--analyze=callgraph Instructs the compiler to generate weighted call graph analysis information.
--analyze=codecov Instructs the compiler to generate code coverage analysis information. This
option replaces the previous --codecov option.
--analyze_only Halts compilation after generation of analysis information is completed.
TI_WCGDATA Allows you to specify a single output CSV file for all weighted call graph analysis
information. New information is appended to the CSV file identified by this
environment variable, if the file already exists.
TI_ANALYSIS_DIR Specifies the directory in which the output analysis file will be generated. The
same environment variable can be used for both code coverage information and
weighted call graph information (all analysis files generated by pprof6x will be
written to the directory specified by the TI_ANALYSIS_DIR environment variable).
4.9.4.5 Linker
The compiler prioritizes the placement of a function relative to others based on the order in which
--preferred_order options are encountered during the linker invocation. The syntax is:
--preferred_order=function specification
unordered()
Using pdd6x produces a .prf file which is then fed into the re-compile of the application that uses the
profile information to generate weighted call graph input data.
The use of -mo instructs the compiler to generate code for each function into its own subsection. This
option provides the linker with the means to directly control the placement of the code for a given
function.
The compiler generates a CSV file containing weighted call graph information for each source file that
is specified on the command line. If such a CSV file already exists, then new call graph analysis
information will be appended to the existing CSV file. These CSV files are then input to the cache
layout tool (clt6x) to produce a preferred function order command file for your application.
For more details on the content of the CSV files (containing weighted call graph information) generated
by the compiler, see Section 4.9.6.
The output of clt6x is a text file containing a sequence of --preferred_order=function specification options.
By default, the name of the output file is forder.cmd, but you can specify your own file name with the -o
option. The order in which functions appear in this file is their preferred function order as determined by
the clt6x.
In general, the proximity of one function to another in the preferred function order list is a reflection of how
often the two functions call each other. If two functions are very close to each other in the list, then the
linker interprets this as a suggestion that the two functions should be placed very near to one another.
Functions that are placed close together are less likely to create a cache conflict miss at run time when
both functions are active at the same time. The overall effect should be an improvement in program
instruction cache efficiency and performance.
The preferred function order command file, forder.cmd, contains a list of --preferred_order=function
specification options. The linker prioritizes the placement of functions relative to each other in the order
that the --preferred_order options are encountered during the linker invocation.
Each --preferred_order option contains a function specification. A function specification can describe
simply the name of the function for a global function, or it can provide the path name and source file name
where the function is defined. A function specification that contains path and file name information is used
to distinguish one static function from another that has the same function name.
The --preferred_order options are interpreted by the linker as suggestions to guide the placement of
functions relative to each other. They are not explicit placement instructions. If an object file or input
section is explicitly mentioned in a linker command file SECTIONS directive, then the placement
instruction specified in the linker command file takes precedence over any suggestion from a
--preferred_order option that is associated with a function that is defined in that object file or input section.
This precedence can be relaxed by applying the unordered() operator to an output specification as
described in Section 4.9.7.
4.9.6 Comma-Separated Values (CSV) Files with Weighted Call Graph (WCG) Information
The format of the CSV files generated by the compiler under the --analyze=callgraph --use_profile_info
option combination is as follows:
"caller","callee","weight" [CR][LF]
caller spec,callee spec,call frequency [CR][LF]
caller spec,callee spec,call frequency [CR][LF]
caller spec,callee spec,call frequency [CR][LF]
...
*(.text)
} > PMEM
...
}
In this SECTIONS directive, the specification of .text explicitly dictates the order in which functions are laid
out in the output section. Thus by default, the linker will layout func_a through func_h in exactly the order
that they are specified, regardless of any other placement priority criteria (such as a preferred function
order list that is enumerated by --preferred_order options).
The unordered() operator can be used to relax this constraint on the placement of the functions in the
'.text' output section so that placement can be guided by other placement priority criteria.
The unordered() operator can be applied to an output section as in Example 4-2.
SECTIONS
{
.text: unordered()
{
file.obj(.text:func_a)
file.obj(.text:func_b)
file.obj(.text:func_c)
file.obj(.text:func_d)
file.obj(.text:func_e)
file.obj(.text:func_f)
file.obj(.text:func_g)
file.obj(.text:func_h)
*(.text)
} > PMEM
...
}
output                                   attributes/
section   page   origin      length      input sections
--------  ----   ----------  ----------  ----------------
.text     0      00000020    00000120
                 00000020    00000020    file.obj (.text:func_g:func_g)
                 00000040    00000020    file.obj (.text:func_b:func_b)
                 00000060    00000020    file.obj (.text:func_d:func_d)
                 00000080    00000020    file.obj (.text:func_a:func_a)
                 000000a0    00000020    file.obj (.text:func_c:func_c)
                 000000c0    00000020    file.obj (.text:func_f:func_f)
                 000000e0    00000020    file.obj (.text:func_h:func_h)
                 00000100    00000020    file.obj (.text:func_e:func_e)
SECTIONS
{
.text: unordered()
{
file.obj(.text:func_a)
file.obj(.text:func_b)
file.obj(.text:func_c)
file.obj(.text:func_d)
. += 0x100;
file.obj(.text:func_e)
file.obj(.text:func_f)
file.obj(.text:func_g)
file.obj(.text:func_h)
*(.text)
} > PMEM
...
}
In Example 4-4, a dot (.) expression, ". += 0x100;", separates the explicit specification of two groups of
functions in the output section. In this case, the linker will honor the specified position of the dot (.)
expression with respect to the functions on either side of the expression. That is, the unordered() operator
will allow the preferred function order list to guide the placement of func_a through func_d relative to each
other, but none of those functions will be placed after the hole that is created by the dot (.) expression.
Likewise, the unordered() operator allows the preferred function order list to influence the placement of
func_e through func_h relative to each other, but none of those functions will be placed before the hole
that is created by the dot (.) expression.
SECTIONS
{
GROUP
{
.grp1: {
file.obj(.grp1:func_a)
file.obj(.grp1:func_b)
file.obj(.grp1:func_c)
file.obj(.grp1:func_d)
} unordered()
.grp2: {
file.obj(.grp2:func_e)
file.obj(.grp2:func_f)
file.obj(.grp2:func_g)
file.obj(.grp2:func_h)
}
.text: { *(.text) }
} > PMEM
...
}
The SECTIONS directive in Example 4-5 applies the unordered() operator to the first member of the
GROUP. The .grp1 output section layout can then be influenced by other placement priority criteria (like
the preferred function order list), whereas the .grp2 output section will be laid out as explicitly specified.
The unordered() operator cannot be applied to an entire GROUP or UNION. Attempts to do so will result
in a linker command file syntax error and the link will be aborted.
4.10.1 Use the --aliased_variables Option When Certain Aliases are Used
The compiler, when invoked with optimization, assumes that if the address of a local variable is passed to
a function, the function changes the local variable by writing through the pointer. This makes the local
variable's address unavailable for use elsewhere after returning. For example, the called function cannot
assign the local variable's address to a global variable or return the local variable's address.
If your code uses aliases in this way and uses optimization, you must use the --aliased_variables option.
For example, suppose your code is similar to the following, in which the address of the local variable x is
passed to the function f(), which aliases glob_ptr to that address and returns the address. If this example
were to be compiled with optimization, the --aliased_variables option would be needed in order for the
function f() to be able to successfully perform its actions.
int *glob_ptr;
g()
{
int x = 1;
int *p = f(&x);
*p = 5; /* p aliases x */
*glob_ptr = 10; /* glob_ptr aliases x */
h(x);
}
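For reference, a self-contained version of this pattern might look like the sketch below. The bodies of f() and h() are not given in the fragment above, so the ones here are assumptions chosen to match the description: f() stores the address in glob_ptr and returns it, and h() is a stand-in that returns its argument.

```c
int *glob_ptr;

/* Stores the address of the caller's local in a global and returns it.
   This is the aliasing pattern that requires --aliased_variables. */
int *f(int *p)
{
    glob_ptr = p;
    return p;
}

/* Stand-in for h(): simply returns its argument. */
int h(int value)
{
    return value;
}

int g(void)
{
    int x = 1;
    int *p = f(&x);

    *p = 5;          /* p aliases x */
    *glob_ptr = 10;  /* glob_ptr aliases x */
    return h(x);     /* 10 when both aliases are honored */
}
```

If the optimizer assumed that the address of x did not outlive the call to f(), it could pass a stale value of x to h().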
4.10.2 Use the --no_bad_aliases Option to Indicate That These Techniques Are Not Used
The --no_bad_aliases option informs the compiler that it can make certain assumptions about how aliases
are used in your code. These assumptions allow the compiler to improve optimization. The
--no_bad_aliases option also specifies that loop-invariant counter increments and decrements are non-zero.
Loop invariant means the value of an expression does not change within the loop.
• The --no_bad_aliases option indicates that your code does not use the aliasing technique described in
Section 4.10.1. If your code uses that technique, do not use the --no_bad_aliases option. You must
compile with the --aliased_variables option.
Do not use the --aliased_variables option with the --no_bad_aliases option. If you do, the
--no_bad_aliases option overrides the --aliased_variables option.
• The --no_bad_aliases option indicates that a pointer to a character type does not alias (point to) an
object of another type. That is, the special exception to the general aliasing rule for these types given
in Section 3.3 of the ISO specification is ignored. If you have code similar to the following example, do
not use the --no_bad_aliases option:
{
long l;
char *p = (char *) &l;
p[2] = 5;
}
• The --no_bad_aliases option indicates that indirect references on two pointers, P and Q, are not
aliases if P and Q are distinct parameters of the same function activated by the same call at run time.
If you have code similar to the following example, do not use the --no_bad_aliases option:
g(int j)
{
int a[20];
int g()
{
return f(5, -4); /* -4 is a negative index */
return f(0, 96); /* 96 exceeds 20 as an index */
return f(4, 16); /* This one is OK */
}
If an Advice file is requested but there is no advice, the Advice file is not created.
Instead, the compiler prints a message to stdout:
"filename.c": advice #27004: No Performance Advice is generated.
Note that Advice to prevent Software Pipeline Disqualification (such as that presented above) will also be
printed in the .asm file. So, func.asm will contain :
;*----------------------------------------------------------------------------*
;* SOFTWARE PIPELINE INFORMATION
;* Disqualified loop: Loop contains a call
;* Loop at line 8 cannot be scheduled efficiently as it contains a
;* function call ("_init"). Try making "_init" an inline function.
;* Disqualified loop: Loop contains non-pipelinable instructions
;* Disqualified loop: Loop contains a call
;* Loop at line 8 cannot be scheduled efficiently as it contains a
;* function call ("_calculate"). Try to inline call or consider
;* rewriting loop.
;* Disqualified loop: Loop contains non-pipelinable instructions
;*----------------------------------------------------------------------------*
If --advice_dir option and full pathname are specified together, --advice:performance_dir option is ignored, and
the advice is generated in the full pathname advice file. Also, note that directory "mydir" must already exist for
an advice file to be created in there.
Your compilation is being done without any optimization options (-o0 and above). This prevents the
compiler from using its most powerful optimization techniques, since the -o (--opt_level) options are the
foundations for most other optimizations. You could get substantially better performance using -o2 (or
above) optimization. For C6000, optimization option -o2 is required for the software pipelining loop
optimization, which is crucial to getting good performance.
The C/C++ compiler is able to perform various optimizations, but you need to specify optimization options
on the command line so that these optimizations are performed. The easiest way to invoke optimization is
to specify the --opt_level=n option on the compiler command line. You can use -On to alias the --opt_level
option. The n denotes the level of optimization (0, 1, 2, and 3), which controls the type and degree of
optimization. See "Invoking Optimization" in Section 4.1 for more information on Optimization Options.
Your compilation uses low-level optimization options (-o1 and below), which prevents the compiler from
using its most powerful optimization techniques.
The C/C++ compiler is able to perform various optimizations, but you can control the level of these
optimizations. High-level optimizations are performed in the optimizer and low-level, target-specific
optimizations occur in the code generator. You must use high-level optimizations to achieve optimal code.
You can invoke optimization by specifying the --opt_level=n option on the compiler command line.
See "Invoking Optimization" in Section 4.1 for more information on Optimization Options. Also see
information for Advice #27000 in Section 4.14.1.
Your compilation is being done using -mu, which turns off software-pipelining. Software-pipelining is a key
optimization for achieving good performance. This Advice is issued to alert you to NOT use compiler
option -mu. -mu is a good option for debugging, but it is recommended that this option not be used for
production code because of the negative performance implications.
In general, to achieve maximal performance, avoid using the following in production code :
• -g: Compiling with debug information no longer affects the ability to optimize code. However, high
levels of optimization do make it more difficult to debug code due to code restructuring and other
transformations. If you are still at the debugging stage, you may want to use a lower level of
optimization. For production code, you can use a high level of optimization with or without disabling the
inclusion of debug information.
• -ss: Interlist source code into assembly file. As with -g, this option can negatively impact performance.
• -mu: Turns off software-pipelining, which is a key optimization for achieving good performance. This is
a good option for debugging, but is not recommended for use in production code due to negative
performance implications.
This advice was provided in earlier versions in which the inclusion of debug information impacted the
ability to optimize code. Debug information no longer impacts optimization, and the --optimize_with_debug
option has been deprecated. Also see Advice #27002 in Section 4.14.3.
The compiler detects that your compilation is being done using --advice:performance option, but the
compiler has no Advice to report. This Advice is issued to alert you to the fact that no Advice is being
emitted, and an Advice file will not be created (if one was requested).
The compiler attempts to perform the software pipeline loop optimization at optimization level --opt_level=3
(or -O3). If there is a call in the loop, the compiler will attempt to completely inline the called function, but
sometimes this is not possible. If the compiler cannot inline the called function, software pipelining cannot
be performed. This can severely reduce the performance of the loop.
In the test case below, the call to the function "func2" prevents software pipelining. Inlining function "func2"
or rewriting the loop to avoid a function call can avoid pipeline disqualification. If the loop pipelines
successfully you may see performance improvement.
void func1(int *p, int *q, int n)
{
unsigned int i;
p[i] = q[i] + t;
}
}
The compiler can insert calls to special functions in the run-time support library (RTS) to support
operations that are not natively supported by the ISA. For instance, while C6000 floating-point ISAs
support instructions to convert between floating-point and signed integer values, they don't support
conversion between floating-point and unsigned integer values. If you use unsigned variables in floating
point expressions, the compiler will generate a call to an RTS routine to carry out this function. Such a call
will disable software pipelining.
You can change the unsigned variables in your code to signed variables and prevent this from happening.
The compiler will then be able to use the native hardware instead of adding the special function call, so
you may get better performance.
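The change can be as small as the type of the loop counter. In the sketch below, the function names ramp_unsigned and ramp_signed and the 0.5 scale factor are illustrative assumptions. The signed form is a valid substitute only when the count fits in an int.

```c
/* Unsigned counter: converting i to float may require an RTS call. */
void ramp_unsigned(float *out, unsigned int n)
{
    unsigned int i;
    for (i = 0; i < n; i++)
        out[i] = (float)i * 0.5f;
}

/* Signed counter: the conversion can use the native instruction. */
void ramp_signed(float *out, int n)
{
    int i;
    for (i = 0; i < n; i++)
        out[i] = (float)i * 0.5f;
}
```

Both functions produce the same results for counts in the int range.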
An asm statement inserted in a C code loop will disqualify the loop for software pipelining. Software-
pipelining is a key optimization for achieving good performance. You may see reduced performance
without software pipelining.
Replace the asm() statement with native C or an intrinsic function call to prevent this from happening.
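As a sketch of the intrinsic approach, the loop below performs a saturating accumulate through the _sadd intrinsic rather than an asm() statement. The SAT_ADD macro, the function names, and the portable fallback are assumptions added so that the example also builds on a host compiler; they are not part of the TI tools.

```c
#include <limits.h>

#ifdef __TI_COMPILER_VERSION__
#define SAT_ADD(a, b) _sadd((a), (b))   /* C6000 saturating-add intrinsic */
#else
/* Portable stand-in with saturating behavior, for host builds only. */
static int sat_add_fallback(int a, int b)
{
    long long s = (long long)a + (long long)b;
    if (s > INT_MAX) return INT_MAX;
    if (s < INT_MIN) return INT_MIN;
    return (int)s;
}
#define SAT_ADD(a, b) sat_add_fallback((a), (b))
#endif

/* Accumulates with saturation; the loop body contains no asm statement. */
int sat_sum(const int *data, int n)
{
    int i;
    int acc = 0;
    for (i = 0; i < n; i++)
        acc = SAT_ADD(acc, data[i]);
    return acc;
}
```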
Your code contains a complex conditional expression, possibly a large "if" clause, within a loop, which is
preventing optimization. The C6000 compiler will optimize small “if” statements (“if” statements with “if”
and “else” blocks that are short or empty). The compiler will not optimize large "if" statements, and such
large if statements within the loop body will disqualify the loop for software pipelining. Software-pipelining
is a key optimization; you may see reduced performance without it.
In the examples below, Example 1 will pipeline, but Example 2 won't :
Example 1:
for (i=0; i < N; i++)
{
if (!flag) {
//statements
}
else {
x[i] = y[i];
}
}
Example 2:
for (i = 0; i < n; i++)
{
if (!flag) {
//statements
}
else {
if (flag == 1) x[i] = y[i];
}
}
Example 1 will have significantly better performance than Example 2 because it pipelines successfully. But
Example 2 can be pipelined if the code is modified to eliminate the nested "if" :
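One possible flattened form is sketched below, where the inner test is merged into an else-if. The function wrapper and its name are assumptions added to make the fragment compilable. Whether the loop then pipelines still depends on the rest of the loop body.

```c
/* Flattened form of Example 2. The body of the first branch is elided in
   the original examples, so it is left as a comment here. */
void copy_when_flag_is_one(int *x, const int *y, int n, int flag)
{
    int i;
    for (i = 0; i < n; i++)
    {
        if (!flag) {
            /* statements */
        }
        else if (flag == 1) {
            x[i] = y[i];
        }
    }
}
```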
There is a switch statement within the loop. A switch statement in a loop will disqualify the loop for
software pipelining. Software-pipelining is a key optimization; you may see reduced performance without
it.
Try to rewrite the loop without a switch statement.
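When each case of the switch only selects a value, a table lookup is one way to remove the switch. The sketch below assumes a hypothetical four-entry gain table and a per-element mode value. The names gain_table and apply_gain are illustrative.

```c
/* Hypothetical gain table indexed by a mode value in the range 0..3. */
static const int gain_table[4] = { 1, 2, 4, 8 };

/* Switch-free loop: each mode selects its gain through a table lookup.
   The mode is masked to 0..3 so that the index is always in range. */
void apply_gain(int *out, const int *in, const unsigned char *mode, int n)
{
    int i;
    for (i = 0; i < n; i++)
        out[i] = in[i] * gain_table[mode[i] & 3];
}
```

This approach fits only when the cases differ in data rather than in control flow.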
The compiler can insert calls to special functions in the run-time support library (RTS) to support
operations that are not natively supported by the ISA. For example, the compiler calls __c6xabi_divi()
function to perform 32-bit integer divide operation. Such functions are called compiler helper functions,
and result in a function call within the loop body. In the example below, the compiler will accomplish the
division operation by calling the compiler helper function "_divi" :
void func(float *p, float n)
{
int i;
For improved performance, at optimization levels --opt_level=2 (-O2) and --opt_level=3 (-O3), the compiler
attempts to software pipeline your loops. Sometimes the compiler may not be able to inline a function call
that is in a loop. Because the compiler could not inline the function call, the loop could not be software
pipelined, and the loop could not be efficiently scheduled.
For example, in the test case below, call to function "func2" prevents software pipelining:
void func1(int *p, int *q, int n)
{
unsigned int i;
; other operations
}
}
int func2() { . . . }
However if function func2 is inlined, it saves the overhead of a function call. The compiler is free to
optimize the function in context with surrounding code. Automatic inlining is controlled by the "inline"
keyword; use it to allow inlining of specific functions :
inline int func2() { . . . }
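A complete version of this pattern might look like the sketch below. The body of func2, its parameter, and the loop body of func1 are assumptions, since the fragments above elide them. The helper is declared static inline so that its definition is visible at the call site.

```c
/* Small helper marked for inlining; its body is a hypothetical stand-in. */
static inline int func2(int v)
{
    return v * 3;
}

void func1(int *p, const int *q, int n)
{
    int i;
    for (i = 0; i < n; i++)
    {
        int t = func2(i);  /* candidate for inline expansion */
        p[i] = q[i] + t;
    }
}
```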
The compiler inserts calls to special functions in the run-time support library (RTS) to support operations
that are not natively supported by the instruction set architecture (ISA). For example, C6000 fixed point
ISAs do not support floating-point instructions and the compiler will generate a call to an RTS routine to
carry out the floating point operation. In the test case below, the floating-point multiplication is unavailable
for a fixed-point device:
void func(float *p, float *q, int n)
{
unsigned int i;
If compiled for C6400+ (compiler option -mv6400+) the compiler will use an RTS call to carry out the
operation. Such a call will disable software pipelining. You can rewrite the operation, or use a fixed point
operation to prevent this.
Also see Advice #30001 in Section 4.14.7.
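As one illustration of using a fixed-point operation instead, the sketch below multiplies values held in Q15 format (1 sign bit, 15 fractional bits). The Q15 representation and the names q15_mul and scale_q15 are assumptions for this example. Rounding and saturation are omitted, and the code relies on an arithmetic right shift for negative products.

```c
#include <stdint.h>

/* Multiplies two Q15 values and returns a Q15 result.
   Rounding and saturation are omitted for brevity. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}

/* Fixed-point counterpart of an element-wise floating-point multiply. */
void scale_q15(int16_t *p, const int16_t *q, int16_t gain, int n)
{
    int i;
    for (i = 0; i < n; i++)
        p[i] = q15_mul(q[i], gain);
}
```

Here 16384 represents 0.5, so scaling 16384 by a gain of 16384 gives 8192, which represents 0.25.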
To help the compiler determine memory dependencies, you can qualify a pointer, reference, or array with
the restrict keyword. The restrict keyword is a type qualifier that can be applied to pointers, references,
and arrays. Its use represents a guarantee by you, the programmer, that within the scope of the pointer
declaration the object pointed to can be accessed only by that pointer. Any violation of this guarantee
renders the program undefined.
To see more information on using restrict, refer to Section 7.5.6.
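A minimal sketch of the qualifier in use follows. The function name vec_add is illustrative, and the guarantee that the three arrays never overlap is the caller's responsibility.

```c
/* The restrict qualifiers promise that dst, a, and b never overlap, so
   loads from a and b need not wait for earlier stores to dst. */
void vec_add(int *restrict dst, const int *restrict a,
             const int *restrict b, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Passing overlapping arrays, for example vec_add(a + 1, a, b, n), violates the guarantee and makes the behavior undefined.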
The C6000 architecture is partitioned into two nearly symmetric halves. The resource breakdown
displayed in the software pipelining information in the asm file, is computed after the compiler has
partitioned instructions to either the A-side or the B-side. If the resources are imbalanced (that is, some
resources on one side are used more than resources on the other), software pipelining is resource-bound,
and the loop cannot be efficiently scheduled. If the compiler has information about the trip-count for the
loop, it can unroll the loop to balance resource usage, and get better pipelining. You can give loop trip-
count information to the compiler using the "MUST_ITERATE" pragma.
To see more information on using the MUST_ITERATE pragma, refer to Section 7.9.21.
Most loops have memory access instructions. The compiler attempts to use wider load instructions, and
aligned memory accesses instead of non-aligned memory accesses to reduce/balance out resources used
for the memory access instructions. One of the ways to let the compiler know that it is safe to use "wider"
loads is to use the keyword "_nassert".
To find out more on using the _nassert keyword, see Section 8.6.11.
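A sketch of the usual pattern follows: the function asserts that both pointers are 8-byte aligned before the loop. The function name copy_words and the 8-byte alignment are assumptions for illustration. On a host compiler, where _nassert does not exist, the sketch defines it as a no-op.

```c
#include <stdint.h>

#ifndef __TI_COMPILER_VERSION__
/* Host builds have no _nassert; make it a no-op (assumption). */
#define _nassert(cond) ((void)0)
#endif

/* The caller guarantees that src and dst are 8-byte aligned. */
void copy_words(int *dst, const int *src, int n)
{
    int i;

    _nassert(((uintptr_t)src & 0x7) == 0);
    _nassert(((uintptr_t)dst & 0x7) == 0);

    for (i = 0; i < n; i++)
        dst[i] = src[i];
}
```

The assertion is a promise to the compiler rather than a run-time check, so passing a misaligned pointer can produce incorrect code.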
When you use the --c_src_interlist and --optimizer_interlist options with optimization, the compiler inserts
its comments and the interlist feature runs before the assembler, merging the original C/C++ source into
the assembly file.
Example 4-10 shows the function from Example 4-9 compiled with the optimization (--opt_level=2) and the
--c_src_interlist and --optimizer_interlist options. The assembly file contains compiler comments and C
source interlisted with assembly code.
Example 4‑9. The Function From Example 3-4 Compiled With the -O2 and --optimizer_interlist Options
_main:
;** 5  -----------------------    printf("Hello, world\n");
;** 6  -----------------------    return 0;
           STW     .D2     B3,*SP--(12)
        .line   3
           B       .S1     _printf
           NOP             2
           MVKL    .S1     SL1+0,A0
           MVKH    .S1     SL1+0,A0
||         MVKL    .S2     RL0,B3
           STW     .D2     A0,*+SP(4)
||         MVKH    .S2     RL0,B3
RL0:       ; CALL OCCURS
        .line   4
           ZERO    .L1     A4
        .line   5
           LDW     .D2     *++SP(12),B3
           NOP             4
           B       .S2     B3
           NOP             5
           ; BRANCH OCCURS
Example 4‑10. The Function From Example 3-4 Compiled with the --opt_level=2, --optimizer_interlist, and
--c_src_interlist Options
_main:
;** 5  -----------------------    printf("Hello, world\n");
;** 6  -----------------------    return 0;
           STW     .D2     B3,*SP--(12)
;------------------------------------------------------------------------------
;   5 | printf("Hello, world\n");
;------------------------------------------------------------------------------
           B       .S1     _printf
           NOP             2
           MVKL    .S1     SL1+0,A0
           MVKH    .S1     SL1+0,A0
||         MVKL    .S2     RL0,B3
           STW     .D2     A0,*+SP(4)
||         MVKH    .S2     RL0,B3
RL0:       ; CALL OCCURS
;------------------------------------------------------------------------------
;   6 | return 0;
;------------------------------------------------------------------------------
           ZERO    .L1     A4
           LDW     .D2     *++SP(12),B3
           NOP             4
           B       .S2     B3
           NOP             5
           ; BRANCH OCCURS
Debug information increases the size of object files, but it does not affect the size of code or data on the
target. If object file size is a concern and debugging is not needed, use --symdebug:none to disable the
generation of debug information.
If you are having trouble debugging loops in your code, you can use the --disable_software_pipelining
option to turn off software pipelining. See Section 4.2.1 for more information.
--opt_for_space    --opt_for_speed
none               =4
=0                 =3
=1                 =2
=2                 =1
=3                 =0
Optimization See
Cost-based register allocation Section 4.18.1
Alias disambiguation Section 4.18.1
Branch optimizations and control-flow simplification Section 4.18.3
Data flow optimizations Section 4.18.4
• Copy propagation
• Common subexpression elimination
• Redundant assignment elimination
Expression simplification Section 4.18.5
Inline expansion of functions Section 4.18.6
Function Symbol Aliasing Section 4.18.7
Induction variable optimizations and strength reduction Section 4.18.8
Loop-invariant code motion Section 4.18.9
Loop rotation Section 4.18.10
Instruction scheduling Section 4.18.11
For this example, the compiler makes aaa an alias of bbb, so that at link time all calls to function aaa
should be redirected to bbb. If the linker can successfully redirect all references to aaa, then the body of
function aaa can be removed and the symbol aaa is defined at the same address as bbb.
For information about using the GCC function attribute syntax to declare function aliases, see
Section 7.14.2.
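As a minimal sketch of the kind of C source that function symbol aliasing applies to, assume that aaa does nothing except pass its arguments to bbb. Only the names aaa and bbb come from the text above; the parameter types and the body of bbb are hypothetical.

```c
#include <string.h>

/* Hypothetical worker function; its body is illustrative only. */
int bbb(int n, const char *str)
{
    return n + (int)strlen(str);
}

/* aaa only forwards its arguments to bbb, so the compiler may treat
 * aaa as an alias of bbb instead of keeping a separate function body. */
int aaa(int n, const char *str)
{
    return bbb(n, str);
}
```

A call such as aaa(3, "abc") returns the same value as bbb(3, "abc"), which is why redirecting references from aaa to bbb at link time preserves behavior.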
The assembly optimizer allows you to write assembly code without being concerned with the pipeline
structure of the C6000 or with assigning registers. It accepts linear assembly code, which is unscheduled
assembly code that may or may not have had register allocation performed. The assembly optimizer assigns
registers and uses loop optimizations to turn linear assembly into highly parallel assembly.
[Figure: software development flow. In each phase the code is built and profiled; if it is efficient
enough, development is complete. Phase 2: refine C/C++ code, compile, and profile, repeating while more
C/C++ optimizations are available. Phase 3: write or refine linear assembly, run the assembly optimizer,
and profile, repeating until the code is efficient enough.]
• TMS320C6000 instructions
When you are writing your linear assembly, your code does not need to indicate the following:
– Pipeline latency
– Register usage
– Which unit is being used
As with other code generation tools, you might need to modify your linear assembly code until you are
satisfied with its performance. When you do this, you will probably want to add more detail to your
linear assembly. For example, you might want to partition or assign some registers.
label[:] Labels are optional for all assembly language instructions and for most (but not all)
assembly optimizer directives. When used, a label must begin in column 1 of a source
statement. A label can be followed by a colon.
[ register ] Square brackets ([ ]) enclose conditional instructions. The machine-instruction
mnemonic is executed based on the value of the register within the brackets; valid
register names are A0, A1, A2, B0, B1, B2, or symbolic.
mnemonic The mnemonic is a machine instruction (such as ADDK, MVKH, B) or an assembly
optimizer directive (such as .proc, .trip).
unit specifier The optional unit specifier enables you to specify the functional unit operand. Only the
specified unit side is used; other specifications are ignored. The preferred method is
specifying register sides.
operand list The operand list is not required for all instructions or directives. The operands can be
symbols, constants, or expressions and must be separated by commas.
comment Comments are optional. Comments that begin in column 1 must begin with a
semicolon or an asterisk; comments that begin in any other column must begin with a
semicolon.
The C6000 assembly optimizer reads up to 200 characters per line. Any characters beyond 200 are
truncated. Keep the operational part of your source statements (that is, everything other than comments)
less than 200 characters in length for correct assembly. Your comments can extend beyond the character
limit, but the truncated portion is not included in the .asm file.
Follow these guidelines in writing linear assembly code:
• All statements must begin with a label, a blank, an asterisk, or a semicolon.
• Labels are optional; if used, they must begin in column 1.
• One or more blanks must separate each field. Tab characters are interpreted as blanks. You must
separate the operand list from the preceding field with a blank.
• Comments are optional. Comments that begin in column 1 can begin with an asterisk or a semicolon (*
or ;) but comments that begin in any other column must begin with a semicolon.
• If you set up a conditional instruction, the register must be surrounded by square brackets.
• A mnemonic cannot begin in column 1 or it is interpreted as a label.
Refer to the TMS320C6000 Assembly Language Tools User's Guide for information on the syntax of
C6000 instructions, including conditional instructions, labels, and operands.
loop: .trip 25
LDW *a_0++[2], val0 ; load a[0-1]
LDW *b_0++[2], val1 ; load b[0-1]
MPY val0, val1, prod1 ; a[0] * b[0]
MPYH val0, val1, prod2 ; a[1] * b[1]
ADD prod1, prod2, tmp0 ; sum0 += (a[0]*b[0]) +
ADD tmp0, sum0, sum0 ; (a[1]*b[1])
.return sum
.endproc
int sum, I;
The old method of partitioning registers indirectly by partitioning instructions can still be used. Side and
functional unit specifiers can still be used on instructions. However, functional unit specifiers (.L/.S/.D/.M)
are ignored. Side specifiers are translated into partitioning constraints on the corresponding symbolic
names, if any. For example:
MV .1 x, y ; translated to .REGA y
LDW .D2T2 *u, v:w ; translated to .REGB u, v, w
In the linear assembler, you can also specify register pairs using the .cproc and/or .reg directive as in
Example 5-3:
.global foopair
foopair: .cproc q1:q0,s0
.reg r1:r0
ADD q1:q0, s0, r1:r0
.return r1:r0
.endproc
In Example 5-3, the expression "q1:q0" means that the first argument to the linear assembly function is a
register pair. By the C calling conventions, the symbols "q1:q0" are mapped to register pair "a5:a4".
When register pair syntax is used as the argument to a .reg directive (as shown with "r1:r0"), the two
register symbols are constrained to an aligned register pair when the compiler processes the linear
assembly source and allocates actual registers.
The compiler supports a register quad syntax (C6600 only) in order to specify 128-bit operands of 128-bit
capable instructions in linear assembly and assembly source code. Example 5-4 illustrates how you can
specify register quads:
.global fooquad
fooquad: .cproc q3:q2:q1:q0, s3:s2:s1:s0
.reg r3:r2:r1:r0
QMPY32 s3:s2:s1:s0, q3:q2:q1:q0, r3:r2:r1:r0
.return r3:r2:r1:r0
.endproc
In Example 5-4, the expression "q3:q2:q1:q0" means that the first argument to the linear assembly
function is a register quad. By the C calling conventions, the symbols "q3:q2:q1:q0" are mapped to
register quad "a7:a6:a5:a4". When register quad syntax is used as the argument to a .reg directive (as
shown with "r3:r2:r1:r0"), the four register symbols are constrained to an aligned register quad when the
compiler processes the linear assembly source and allocates actual registers.
There are several ways to enter the unit specifier field in linear assembly. Of these, only the specific
register side information is recognized and used:
• You can specify the particular functional unit (for example, .D1).
• You can specify the .D1 or .D2 functional unit followed by T1 or T2 to specify that the nonmemory
operand is on a specific register side. T1 specifies side A and T2 specifies side B. For example:
LDW .D1T2 *A3[A4], B3
LDW .D1T2 *src, dst
• You can specify only the data path (for example, .1), and the assembly optimizer assigns the functional
type (for example, .L1).
For more information on functional units refer to the TMS320C6000 CPU and Instruction Set Reference
Guide.
.reg t0,t1,p,i,sh:sl
MVK 100,i
ZERO sh
ZERO sl
.return sh:sl
.endproc
To disable this format with symbolic names and display assembly instructions with actual registers
instead, compile with the --machine_regs option.
Description Use the .call directive to call a function. Optionally, you can specify a register that is
assigned the result of the call. The register can be a symbolic or machine register. The
.call directive adheres to the same register and function calling conventions as the
C/C++ compiler. For information, see Section 8.3 and Section 8.4. There is no support
for alternative register or function calling conventions.
You cannot call a function that has a variable number of arguments, such as printf. No
error checking is performed to ensure the correct number and/or type of arguments is
passed. You cannot pass or return structures through the .call directive.
Following is a description of the .call directive parameters:
By default, the compiler generates near calls and the linker utilizes trampolines if the
near call will not reach its destination. To force a far call, you must explicitly load the
address of the function into a register, and then issue an indirect call. For example:
MVK func,reg
MVKH func,reg
.call reg(op1) ; forcing a far call
If you want to use * for indirection, you must abide by C/C++ syntax rules, and use the
following alternate syntax:
.call [ret_reg =] (* ireg)([arg1, arg2,...])
For example:
.call (*driver)(op1, op2) ; indirect call
.reg driver
.call driver(op1, op2) ; also an indirect call
Here are other valid examples that use the .call syntax.
.call fir(x, h, y) ; void function
Since you can use machine register names anywhere you can use symbolic registers, it
may appear you can change the function calling convention. For example:
.call A6 = compute()
It appears that the result is returned in A6 instead of A4. This is incorrect. Using machine
registers does not override the calling convention. After returning from the compute
function with the returned result in A4, a MV instruction transfers the result to A6.
Description The .circ directive assigns a symbolic register name to a machine register and declares
the symbolic register as available for circular addressing. The compiler then assigns the
variable to the register and ensures that all code transformations are safe in this
situation. You must insert setup/teardown code for circular addressing.
The compiler assumes that it is safe to speculate any load using an explicitly declared
circular addressing variable as the address pointer and may exploit this assumption to
perform optimizations.
When a symbol is declared with the .circ directive, it is not necessary to declare that
symbol with the .reg directive.
The .circ directive is equivalent to using .map with a circular declaration.
Example Here the symbolic name Ri is assigned to actual machine register Mi and Ri is declared
as potentially being used for circular addressing.
.CIRC R1/M1, R2/M2 ...
Description Use the .cproc/.endproc directive pair to delimit a section of your code that you want
the assembly optimizer to optimize and treat as a C/C++ callable function. This section is
called a procedure. The .cproc directive is similar to the .proc directive in that you use
.cproc at the beginning of a section and .endproc at the end of a section. In this way, you
can set off sections of your assembly code that you want to be optimized, like functions.
The directives must be used in pairs; do not use .cproc without the corresponding
.endproc. Specify a label with the .cproc directive. You can have multiple procedures in a
linear assembly file.
The .cproc directive differs from the .proc directive in that the compiler treats the .cproc
region as a C/C++ callable function. The assembly optimizer performs some operations
automatically in a .cproc region in order to make the function conform to the C/C++
calling conventions and to C/C++ register usage conventions.
These operations include the following:
• When you use save-on-entry registers (A10 to A15 and B10 to B15), the assembly
optimizer saves the registers on the stack and restores their original values at the
end of the procedure.
• If the compiler cannot allocate machine registers to symbolic register names specified
with the .reg directive (see the .reg topic) it uses local temporary stack variables. With
.cproc, the compiler manages the stack pointer and ensures that space is allocated
on the stack for these variables.
For more information, see Section 8.3 and Section 8.4.
Use the optional argument to represent function parameters. The argument entries are
very similar to parameters declared in a C/C++ function. The arguments to the .cproc
directive can be of the following types:
• Machine-register names. If you specify a machine-register name, its position in the
argument list must correspond to the argument passing conventions for C (see
Section 8.4). For example, the C/C++ compiler passes the first argument to a
function in register A4. This means that the first argument in a .cproc directive must
be A4 or a symbolic name. Up to ten arguments can be used with the .cproc
directive.
• Variable names. If you specify a variable name, then the assembly optimizer ensures
that either the variable name is allocated to the appropriate argument passing
register or the argument passing register is copied to the register allocated for the
variable name. For example, the first argument in a C/C++ call is passed in register
A4, so if you specify the following .cproc directive:
frame .cproc arg1
The assembly optimizer either allocates arg1 to A4, or arg1 is allocated to a different
register (such as B7) and an MV A4, B7 is automatically generated.
• Register pairs. A register pair is specified as arghi:arglo and represents a 40-bit
argument or a 64-bit type double argument.
For example, the .cproc defined as follows:
_fcn: .cproc arg1, arg2hi:arg2lo, arg3, B6, arg5, B9:B8
...
.return res
...
.endproc
corresponds to a C function declared as:
int fcn(int arg1, long arg2, int arg3, int arg4, int arg5, long arg6);
In this example, the fourth argument of .cproc is register B6. This is allowed since the
fourth argument in the C/C++ calling conventions is passed in B6. The sixth
argument of .cproc is the actual register pair B9:B8. This is allowed since the sixth
argument in the C/C++ calling conventions is passed in B8 or B9:B8 for longs.
• Register quads (C6600 only). A register quad is specified as r3:r2:r1:r0 and
represents a 128-bit type, __x128_t. See Example 5-4.
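The positional rule for machine-register arguments can be sketched in C. The text above states only that the first argument is passed in A4, the fourth in B6, and the sixth in B8 (or B9:B8 for a pair); the alternating A/B pattern used here for the remaining positions is an assumption extrapolated from those cases and should be checked against Section 8.4. The function name arg_register is hypothetical.

```c
#include <stdio.h>

/* Writes into buf the name of the register (or hi:lo pair) assumed to
 * carry the argument at 1-based `position`. Returns 0 on success, or -1
 * if the position is outside the ten arguments that .cproc accepts.
 * Assumption: positions alternate A side, B side, starting at register 4
 * and stepping by 2 (A4, B4, A6, B6, ...). */
int arg_register(int position, int is_pair, char *buf, size_t len)
{
    if (position < 1 || position > 10)
        return -1;

    char side = (position % 2 == 1) ? 'A' : 'B';
    int  lo   = 4 + 2 * ((position - 1) / 2);

    if (is_pair)
        snprintf(buf, len, "%c%d:%c%d", side, lo + 1, side, lo);
    else
        snprintf(buf, len, "%c%d", side, lo);
    return 0;
}
```

Under this assumption, position 4 gives B6 and position 6 as a pair gives B9:B8, which agrees with the _fcn example above.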
If you are calling a procedure from C++ source, you must use the appropriate linkname
for the procedure label. Otherwise, you can force C naming conventions by using the
extern "C" declaration. See Section 7.12 and Section 8.6 for more information.
When .endproc is used with a .cproc directive, it cannot have arguments. The live out set
for a .cproc region is determined by any .return directives that appear in the .cproc
region. (A value is live out if it has been defined before or within the procedure and is
used as an output from the procedure.) Returning a value from a .cproc region is
handled by the .return directive. The return branch is automatically generated in a .cproc
region. See the .return topic for more information.
Only code within procedures is optimized. The assembly optimizer copies any code that
is outside of procedures to the output file and does not modify it. See Section 5.4.1 for a
list of instruction types that cannot appear in a .cproc region.
LOOP:
AND cword,mask,cond ; cond = codeword & mask
[cond] MVK 1,cond ; !(!(cond))
CMPEQ theta,cond,if ; (theta == !(!(cond)))
LDH *a++,ai ; a[i]
[if] ADD sum,ai,sum ; sum += a[i]
[!if] SUB sum,ai,sum ; sum -= a[i]
SHL mask,1,mask ; mask = mask << 1
[cntr] ADD -1,cntr,cntr ; decrement counter
[cntr] B LOOP ; for LOOP
.return sum
.endproc
Description The .map directive assigns symbol names to machine registers. Symbols are stored in
the substitution symbol table. The association between symbolic names and actual
registers is wiped out at the beginning and end of each linear assembly function. The
.map directive can be used in assembly and linear assembly files.
When a symbol is declared with the .map directive, it is not necessary to declare that
symbol with the .reg directive.
Example Here the .map directive is used to assign x to register A6 and y to register B7. The
symbols are used with a move statement.
.map x/A6, y/B7
MV x, y ; equivalent to MV A6, B7
The symbol used to name a memory reference has the same syntax restrictions as any
assembly symbol. (For more information about symbols, refer to the TMS320C6000
Assembly Language Tools User's Guide.) It is in the same name space as the symbolic
registers. You cannot use the same name for both a symbolic register and a memory
reference annotation.
The .mdep directive tells the assembly optimizer that there is a dependence between
two memory references.
The .mdep directive is valid only within procedures; that is, within occurrences of the
.proc and .endproc directive pair or the .cproc and .endproc directive pair.
Example Here is an example in which .mdep is used to indicate a dependence between two
memory references.
.mdep ld1, st1
Description The .mptr directive associates a register with the information that allows the assembly
optimizer to determine automatically whether two memory operations have a memory
bank conflict. If the assembly optimizer determines that two memory operations have a
memory bank conflict, then it does not schedule them in parallel.
A memory bank conflict occurs when two accesses to a single memory bank in a given
cycle result in a memory stall that halts all pipeline operation for one cycle while the
second value is read from memory. For more information on memory bank conflicts,
including how to use the .mptr directive to prevent them, see Section 5.5.
Following are descriptions of the .mptr directive parameters:
variable|memref The name of the register symbol or memory reference used to identify
a load or store involved in a dependence.
base A symbolic address that associates related memory accesses
offset The offset in bytes from the starting base symbol. The offset is an
optional parameter and defaults to 0.
stride The register loop increment in bytes. The stride is an optional
parameter and defaults to 0.
The .mptr directive tells the assembly optimizer that when the symbol or memref is used
as a memory pointer in an LD(B/BU)(H/HU)(W) or ST(B/H/W) instruction, it is initialized
to point to base + offset and is incremented by stride each time through the loop.
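The pointer model that .mptr describes can be sketched in C as follows. This is only an illustration of the arithmetic stated above; the function name mptr_address is hypothetical, and base_addr is a numeric stand-in for the symbolic base.

```c
#include <assert.h>

/* Byte address assumed for a pointer declared with
 * ".mptr ptr, base+offset, stride" on a given pass through the loop.
 * iteration 0 is the first pass; base_addr is a hypothetical numeric
 * value standing in for the symbolic base name. */
long mptr_address(long base_addr, long offset, long stride, int iteration)
{
    assert(iteration >= 0);
    return base_addr + offset + stride * (long)iteration;
}
```

With an offset of 4 and a stride of 16, the pointer is assumed to be at base + 4 on the first pass and base + 20 on the second.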
The .mptr directive is valid within procedures only; that is, within occurrences of the .proc
and .endproc directive pair or the .cproc and .endproc directive pair.
The symbolic addresses used for base symbol names are in a name space separate
from all other labels. This means that a symbolic register or assembly label can have the
same name as a memory bank base name. For example:
.mptr Darray,Darray
Example Here is an example in which .mptr is used to avoid memory bank conflicts.
_blkcp: .cproc I
loop: .trip 50
; potential conflict
LDW *ptr1++, tmp1 ; load *0, bank 0
STW tmp1, *ptr2++{foo} ; store *8, bank 0
.endproc
Syntax .no_mdep
Description The .no_mdep directive tells the assembly optimizer that no memory dependencies
occur within that function, with the exception of any dependencies pointed to with the
.mdep directive.
There is no guarantee that the symbol will be assigned to any register in the specified
group. The compiler may ignore the preference.
When a symbol is declared with the .pref directive, it is not necessary to declare that
variable with the .reg directive.
Description Use the .proc/.endproc directive pair to delimit a section of your code that you want the
assembly optimizer to optimize. This section is called a procedure. Use .proc at the
beginning of the section and .endproc at the end of the section. In this way, you can set
off sections of unscheduled assembly instructions that you want optimized by the
compiler. The directives must be used in pairs; do not use .proc without the
corresponding .endproc. Specify a label with the .proc directive. You can have multiple
procedures in a linear assembly file.
Use the optional variable parameter in the .proc directive to indicate which registers are
live in, and use the optional register parameter of the .endproc directive to indicate which
registers are live out for each procedure. The variable can be an actual register or a
symbolic name. For example:
.PROC x, A5, y, B7
...
.ENDPROC y
A value is live in if it has been defined before the procedure and is used as an input to
the procedure. A value is live out if it has been defined before or within the procedure
and is used as an output from the procedure. If you do not specify any registers with the
.endproc directive, it is assumed that no registers are live out.
Only code within procedures is optimized. The assembly optimizer copies any code that
is outside of procedures to the output file and does not modify it.
See Section 5.4.1 for a list of instruction types that cannot appear in a .proc region.
Example Here is a block move example in which .proc and .endproc are used:
move .proc A4, B4, B0
.no_mdep
loop:
LDW *B4++, A1
MV A1, B1
STW B1, *A4++
ADD -4, B0, B0
[B0] B loop
.endproc
Description The .reg directive allows you to use descriptive names for values that are stored in
registers. The assembly optimizer chooses a register for you such that its use agrees
with the functional units chosen for the instructions that operate on the value.
The .reg directive is valid within procedures only; that is, within occurrences of the .proc
and .endproc directive pair or the .cproc and .endproc directive pair.
Declaring register pairs (or register quads for C6600) explicitly is optional. Doing so is
necessary only if the registers must be allocated as a pair but are not used that way in the
code. It is a best practice to declare register pairs and register quads with the pair/quad
syntax. Here is an example of declaring a register pair:
.reg A7:A6
Example 1 This example uses the same code as the block move example shown for .proc/.endproc
but the .reg directive is used:
move .cproc dst, src, cnt
Notice how this example differs from the .proc example: symbolic registers declared with
.reg are allocated as machine registers.
Example 2 The code in the following example is invalid, because a variable defined by the .reg
directive cannot be used outside of the defined procedure:
move .proc A4
.reg top
LDW *A4++, top
MV top, B5
.endproc
MV top, B6 ; WRONG: top is invalid outside of the procedure
Description Registers can be directly partitioned through two directives. The .rega directive is used
to constrain a symbol name to A-side registers. The .regb directive is used to constrain
a symbol name to B-side registers. For example:
.REGA y
.REGB u, v, w
MV x, y
LDW *u, v:w
The .rega and .regb directives are valid within procedures only; that is, within
occurrences of the .proc and .endproc directive pair or the .cproc and .endproc directive
pair.
When a symbol is declared with the .rega or .regb directive, it is not necessary to declare
that symbol with the .reg directive.
The old method of partitioning registers indirectly by partitioning instructions can still be
used. Side and functional unit specifiers can still be used on instructions. However,
functional unit specifiers (.L/.S/.D/.M) and crosspath information are ignored. Side
specifiers are translated into partitioning constraints on the corresponding symbol
names, if any. For example:
MV .1X z, y ; translated to .REGA y
LDW .D2T2 *u, v:w ; translated to .REGB u, v, w
Description The .reserve directive prevents the assembly optimizer from using the specified register
in a .proc or .cproc region.
If a reserved register is explicitly assigned in a .proc or .cproc region, then the assembly
optimizer can also use that register. For example, the variable tmp1 can be allocated to
register A7, even though it is in the .reserve list, since A7 was explicitly defined in the
ADD instruction:
.cproc
.reserve a7
.reg tmp1
....
ADD a6, b4, a7
....
.endproc
Example 1 The .reserve in this example guarantees that the assembly optimizer does not use A10
to A13 or B10 to B13 for the variables tmp1 to tmp5:
test .proc a4, b4
.reg tmp1, tmp2, tmp3, tmp4, tmp5
.reserve a10, a11, a12, a13, b10, b11, b12, b13
.....
.endproc a4
Example 2 The assembly optimizer may generate less efficient code if the available register pool is
overly restricted. In addition, it is possible that the available register pool is constrained
such that allocation is not possible and an error message is generated. For example, the
following code generates an error since all of the conditional registers have been
reserved, but a conditional register is required for the variable tmp:
.cproc ...
.reserve a1,a2,b0,b1,b2
.reg tmp
....
[tmp] ....
....
.endproc
Description The .return directive function is equivalent to the return statement in C/C++ code. It
places the optional argument in the appropriate register for a return value as per the
C/C++ calling conventions (see Section 8.4).
The optional argument can have the following meanings:
• Zero arguments implies a .cproc region that has no return value, similar to a void
function in C/C++ code.
• An argument implies a .cproc region that has a 32-bit return value, similar to an int
function in C/C++ code.
• A register pair of the format hi:lo implies a .cproc region that has a 40-bit long, a 64-
bit long long, or a 64-bit type double return value; similar to a long/long long/double
function in C/C++ code.
Arguments to the .return directive can be either symbolic register names or machine-
register names.
All return statements in a .cproc region must be consistent in the type of the return value.
It is not legal to mix a .return arg with a .return hi:lo in the same .cproc region.
The .return directive is unconditional. To perform a conditional .return, simply use a
conditional branch around a .return. The assembly optimizer removes the branch and
generates the appropriate conditional code. For example, to return if condition cc is true,
code the return as:
[!cc] B around
.return
around:
Example This example uses a symbolic register, tmp, and a machine-register, A5, as .return
arguments:
.cproc ...
.reg tmp
...
.return tmp ; legal symbolic name
...
.return a5 ; legal actual name
Description The .trip directive specifies the value of the trip count. The trip count indicates how
many times a loop iterates. The .trip directive is valid within procedures only. Following
are descriptions of the .trip directive parameters:
label The label represents the beginning of the loop. This is a required
parameter.
minimum value The minimum number of times that the loop can iterate. This is a
required parameter. The default is 1.
maximum value The maximum number of times that the loop can iterate. The
maximum value is an optional parameter.
factor The factor used, along with minimum value and maximum value, to
determine the number of times that the loop can iterate. In the
following example, the loop executes some multiple of 8, between 8
and 48, times:
loop: .trip 8, 48, 8
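The constraint that the three .trip parameters place on the trip count can be sketched in C. The function name trip_count_allowed is hypothetical; the check simply restates the minimum, maximum, and factor rule described above.

```c
#include <stdbool.h>

/* Returns true if `count` is a trip count permitted by
 * ".trip minimum, maximum, factor": within the bounds and a
 * multiple of the factor. */
bool trip_count_allowed(int count, int minimum, int maximum, int factor)
{
    if (factor <= 0)
        return false;
    return count >= minimum && count <= maximum && count % factor == 0;
}
```

For ".trip 8, 48, 8", the counts 8, 16, 24, 32, 40, and 48 pass this check, while 12 and 56 do not.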
If the assembly optimizer cannot ensure that the trip count is large enough to pipeline a
loop for maximum performance, a pipelined version and an unpipelined version of the
same loop are generated. This makes one of the loops a redundant loop. The pipelined
or the unpipelined loop is executed based on a comparison between the trip count and
the number of iterations of the loop that can execute in parallel. If the trip count is
greater than or equal to the number of parallel iterations, the pipelined loop is executed;
otherwise, the unpipelined loop is executed. For more information about redundant
loops, see Section 4.3.
You are not required to specify a .trip directive with every loop; however, you should use
.trip if you know that a loop iterates at least some minimum number of times. This generally
means that redundant loops are not generated (unless the minimum value is very small), saving
code size and execution time.
If you know that a loop always executes the same number of times whenever it is called,
define maximum value (where maximum value equals minimum value) as well. The
compiler may now be able to unroll your loop thereby increasing performance.
When you are compiling with the interrupt flexibility option (--interrupt_threshold=n),
using a .trip maximum value allows the compiler to determine the maximum number of
cycles that the loop can execute. Then, the compiler compares that value to the
threshold value given by the --interrupt_threshold option. See Section 3.12 for more
information.
Example The .trip directive states that the loop will execute 16, 24, 32, 40 or 48 times when the
w_vecsum routine is called.
w_vecsum: .cproc ptr_a, ptr_b, ptr_c, weight, cnt
.reg ai, bi, prod, scaled_prod, ci
.no_mdep
Description The .volatile directive allows you to designate memory references as volatile. Volatile
loads and stores are not deleted. Volatile loads and stores are not reordered with
respect to other volatile loads and stores.
If the .volatile directive references a memory location that may be modified during an
interrupt, compile with the --interrupt_threshold=1 option to ensure all code referencing
the volatile memory location can be interrupted.
.proc
.if
...
.endif
.endproc
These .if/.endif examples are illegal because each block is partly inside and partly outside a .cproc or .proc region:
.if
.cproc
.endif
.endproc
.proc
.if
...
.else
.endproc
.endif
• The following assembly instructions cannot be used from linear assembly:
– EFI
– SPLOOP, SPLOOPD and SPLOOPW and all other loop-buffer related instructions
– ADDKSP and DP-relative addressing
SPRUI04B – May 2017 Using the Assembly Optimizer 127
Copyright © 2017, Texas Instruments Incorporated
Avoiding Memory Bank Conflicts With the Assembly Optimizer www.ti.com
[Figure: byte addresses mapped to four memory banks]
Bank 0          Bank 1          Bank 2          Bank 3
0      1        2      3        4      5        6      7
8      9        10     11       12     13       14     15
...
8N     8N + 1   8N + 2 8N + 3   8N + 4 8N + 5   8N + 6 8N + 7
For devices that have more than one memory space (Figure 5-2), an access to bank 0 in one memory
space does not interfere with an access to bank 0 in another memory space, and no pipeline stall occurs.
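The bank layout and the memory space rule can be modeled with a short C sketch. It assumes four banks that are each two bytes wide, as the byte numbering in the figure suggests; other devices may differ. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { BANK_COUNT = 4, BANK_WIDTH_BYTES = 2 };  /* assumed from the figure */

/* Bank that holds the byte at address `addr`. */
unsigned bank_of(unsigned addr)
{
    return (addr / BANK_WIDTH_BYTES) % BANK_COUNT;
}

/* Bit mask of the banks touched by an access of `size` bytes at `addr`. */
unsigned banks_touched(unsigned addr, unsigned size)
{
    unsigned mask = 0;
    assert(size > 0);
    for (unsigned i = 0; i < size; i++)
        mask |= 1u << bank_of(addr + i);
    return mask;
}

/* Two accesses in the same cycle are treated as conflicting when they
 * share a bank within the same memory space. Accesses to different
 * memory spaces are treated as never conflicting. */
bool bank_conflict(unsigned space_a, unsigned addr_a, unsigned size_a,
                   unsigned space_b, unsigned addr_b, unsigned size_b)
{
    if (space_a != space_b)
        return false;
    return (banks_touched(addr_a, size_a) &
            banks_touched(addr_b, size_b)) != 0;
}
```

In this model, two 4-byte loads at addresses 0 and 8 in the same memory space conflict because both touch banks 0 and 1, while the same two loads issued to different memory spaces do not.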
[Figure 5-2: memory with more than one memory space. The row shown for one space holds bytes 8N
through 8N + 7; the row shown for memory space 1 holds bytes 8M through 8M + 7.]
For example:
.mptr a_0,a+0,16
.mptr a_4,a+4,16
LDW *a_0++[4], val1 ; base=a, offset=0, stride=16
LDW *a_4++[4], val2 ; base=a, offset=4, stride=16
.mptr dptr,D+0,8
LDH *dptr++, d0 ; base=D, offset=0, stride=8
LDH *dptr++, d1 ; base=D, offset=2, stride=8
LDH *dptr++, d2 ; base=D, offset=4, stride=8
LDH *dptr++, d3 ; base=D, offset=6, stride=8
In this example, the offset for dptr is updated after every memory access. The offset is updated only when
the pointer is modified by a constant. This occurs for the pre/post increment/decrement addressing modes.
See the .mptr topic for more information.
Example 5-6 shows loads and stores extracted from a loop that is being software pipelined.
Example 5-6. Load and Store Instructions That Specify Memory Bank Information
.mptr Ain,IN,-16
.mptr Bin,IN-4,-16
.mptr Aco,COEF,16
.mptr Bco,COEF+4,16
.mptr Aout,optr+0,4
.mptr Bout,optr+2,4
_dot: .cproc a, b
.reg sum0, sum1, I
.reg val1, val2, prod1, prod2
loop: .trip 50
LDW *a++,val1 ; load a[0-1] bank0
LDW *b++,val2 ; load b[0-1] bank2
MPY val1,val2,prod1 ; a[0] * b[0]
MPYH val1,val2,prod2 ; a[1] * b[1]
ADD prod1,sum0,sum0 ; sum0 += a[0] * b[0]
ADD prod2,sum1,sum1 ; sum1 += a[1] * b[1]
It is not always possible to fully control how arrays and other memory objects are aligned. This is
especially true when a pointer is passed into a function and that pointer may have different alignments
each time the function is called. A solution to this problem is to write a dot product routine that cannot
have memory bank conflicts. This would eliminate the need for the arrays to use different memory banks.
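For reference, the computation performed by the dot product loop can be written in C roughly as follows. This is a sketch that assumes 16-bit array elements, an even element count, and little-endian halfword order within each loaded word; the function name dotp is hypothetical.

```c
/* Dot product of two arrays of 16-bit values, two elements per pass,
 * mirroring the two accumulators (sum0, sum1) used in the linear
 * assembly loop. Assumes `count` is even. */
int dotp(const short *a, const short *b, int count)
{
    int sum0 = 0;
    int sum1 = 0;

    for (int i = 0; i + 1 < count; i += 2) {
        sum0 += a[i]     * b[i];       /* corresponds to MPY  */
        sum1 += a[i + 1] * b[i + 1];   /* corresponds to MPYH */
    }
    return sum0 + sum1;
}
```

Each pass consumes one word from each array, which is why the alignment of a and b relative to the memory banks decides whether the two loads can be issued in the same cycle.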
If the dot product loop kernel is unrolled once, then four LDW instructions execute in the loop kernel.
Assuming that nothing is known about the bank alignment of arrays a and b (except that they are word
aligned), the only safe assumptions that can be made about the array accesses are that a[0-1] cannot
conflict with a[2-3] and that b[0-1] cannot conflict with b[2-3]. Example 5-10 shows the unrolled loop
kernel.
Example 5-10. Dot Product From Example 5-8 Unrolled to Prevent Memory Bank Conflicts
ADD 4,a_0,a_4
ADD 4,b_0,b_4
MVK 25,i ; I = 100/4
ZERO sum0 ; multiply result = 0
ZERO sum1 ; multiply result = 0
.mptr a_0,a+0,8
.mptr a_4,a+4,8
.mptr b_0,b+0,8
.mptr b_4,b+4,8
loop: .trip 25
The goal is to find a software pipeline in which the following instructions are in parallel:
LDW *a0++[2],val1 ; load a[0-1] bankx
|| LDW *a2++[2],val2 ; load a[2-3] bankx+2
LDW *b0++[2],val1 ; load b[0-1] banky
|| LDW *b2++[2],val2 ; load b[2-3] banky+2
Without the .mptr directives in Example 5-10, the loads of a[0-1] and b[0-1] are scheduled in parallel, and
the loads of a[2-3] and b[2-3] might be scheduled in parallel. This results in a 50% chance that a memory
conflict will occur on every cycle. However, the loop kernel shown in Example 5-11 can never have a
memory bank conflict.
In Example 5-8, if .mptr directives had been used to specify that a and b point to different bases, then the
assembly optimizer would never find a schedule for a 1-cycle loop kernel, because there would always be
a memory bank conflict. However, it would find a schedule for a 2-cycle loop kernel.
.mptr a,RS
.mptr b,RS
.mptr c,XY
.mptr d,XY+2
LDW *a++[i0a],A0 ; a and b always conflict with each other
LDW *b++[i0b],B0 ;
STH A1,*c++[i1a] ; c and d never conflict with each other
STH B2,*d++[i1b] ;
The directive to indicate a specific memory dependence in the previous example is as follows:
.mdep ld1, st1
This means that whenever ld1 accesses memory at location X, some later time in code execution, st1 may
also access location X. This is equivalent to adding a dependence between these two instructions. In
terms of the software pipeline, these two instructions must remain in the same order. The ld1 reference
must always occur before the st1 reference; the instructions cannot even be scheduled in parallel.
It is important to note the directional sense of the directive from ld1 to st1. The opposite, from st1 to ld1, is
not implied. In terms of the software pipeline, while every ld1 must occur before every st1, it is still legal to
schedule the ld1 from iteration n+1 before the st1 from iteration n.
Example 5-14 is a picture of the software pipeline with the instructions from two different iterations in
different columns. In the actual instruction sequence, instructions on the same horizontal line are in
parallel.
STW { st1 }
If that schedule does not work because the iteration n st1 might write a value the iteration n+1 ld1 should
read, then you must note a dependence relationship from st1 to ld1.
.mdep st1, ld1
Both directives together force the software pipeline shown in Example 5-15.
Example 5-15. Software Pipeline Using .mdep st1, ld1 and .mdep ld1, st1
...
STW { st1 }
LDW { ld1 }
...
STW { st1 }
Indexed addressing, *+base[index], is a good example of an addressing mode where you typically do not
know anything about the relative sequence of the memory accesses, except they sometimes access the
same location. To correctly model this case, you need to note the dependence relation in both directions,
and you need to use both directives.
.mdep ld1, st1
.mdep st1, ld1
.return tmp
.endproc
• Example 2
Here, .mdep r2, r1 indicates that STW must occur before LDW. Since STW is after LDW in the code,
the dependence relation is across loop iterations. The STW instruction writes a value that may be read
by the LDW instruction on the next iteration. In this case, a 6-cycle recurrence is created.
fn: .cproc dst, src, cnt
.reg tmp
.no_mdep
.mdep r2, r1
.endproc
Volatile References
NOTE: For volatile references, use .volatile rather than .mdep.
The C/C++ compiler and assembly language tools provide two methods for linking your programs:
• You can compile individual modules and link them together. This method is especially useful when you
have multiple source files.
• You can compile and link in one step. This method is useful when you have a single source module.
This chapter describes how to invoke the linker with each method. It also discusses special requirements
of linking C/C++ code, including the run-time-support libraries, specifying the type of initialization, and
allocating the program into memory. For a complete description of the linker, see the TMS320C6000
Assembly Language Tools User's Guide.
6.1 Invoking the Linker Through the Compiler (-z Option)
6.2 Linker Code Optimizations
6.3 Controlling the Linking Process
When you specify a library as linker input, the linker includes and links only those library members that
resolve undefined references. The linker uses a default allocation algorithm to allocate your program into
memory. You can use the MEMORY and SECTIONS directives in the linker command file to customize
the allocation process. For information, see the TMS320C6000 Assembly Language Tools User's Guide.
You can link a C/C++ program consisting of object files prog1.obj, prog2.obj, and prog3.obj, with an
executable object file filename of prog.out with the command:
cl6x --run_linker --rom_model prog1 prog2 prog3 --output_file=prog.out
--library=rts6600.lib
The --run_linker option divides the command line into the compiler options (the options before
--run_linker) and the linker options (the options following --run_linker). The --run_linker option must follow all
source files and compiler options on the command line.
All arguments that follow --run_linker on the command line are passed to the linker. These arguments can
be linker command files, additional object files, linker options, or libraries. These arguments are the same
as described in Section 6.1.1.
All arguments that precede --run_linker on the command line are compiler arguments. These arguments
can be C/C++ source files, assembly files, linear assembly files, or compiler options. These arguments are
described in Section 3.2.
You can compile and link a C/C++ program consisting of source files prog1.c, prog2.c, and prog3.c, with an
executable object file filename of prog.out with the command:
cl6x prog1.c prog2.c prog3.c --run_linker --rom_model --output_file=prog.out --library=rts6600.lib
remark: linking in "libc.a"
remark: linking in "rts64plus.lib" in place of "libc.a"
When you link your program, you must specify where to allocate the sections in memory. In general,
initialized sections are linked into ROM or RAM; uninitialized sections are linked into RAM. With the
exception of code sections, the initialized and uninitialized sections created by the compiler cannot be
allocated into internal program memory.
The linker provides MEMORY and SECTIONS directives for allocating sections. For more information
about allocating sections into memory, see the TMS320C6000 Assembly Language Tools User's Guide.
The MEMORY directive, and possibly the SECTIONS directive, might require modification to work with your
system. See the TMS320C6000 Assembly Language Tools User's Guide for more information on these
directives.
--rom_model
--heap_size=0x2000
--stack_size=0x0100
--library=rts64plus.lib
MEMORY
{
VECS: o = 0x00000000 l = 0x000000400 /* reset & interrupt vectors */
PMEM: o = 0x00000400 l = 0x00000FC00 /* intended for initialization */
BMEM: o = 0x80000000 l = 0x000010000 /* .bss, .sysmem, .stack, .cinit */
}
SECTIONS
{
vectors > VECS
.text > PMEM
.data > BMEM
.stack > BMEM
.bss > BMEM
.sysmem > BMEM
.cinit > BMEM
.const > BMEM
.cio > BMEM
.far > BMEM
}
The C/C++ compiler supports the C/C++ language standard that was developed by a committee of the
American National Standards Institute (ANSI) and subsequently adopted by the International Organization
for Standardization (ISO).
The C++ language supported by the C6000 is defined by the ANSI/ISO/IEC 14882:2003 standard with
certain exceptions.
alternate definitions of main. The alternate definitions are rejected in strict ANSI mode. (5.1.2.2.1)
• If space is provided for program arguments at link time with the --args option and the program is run
under a system that can populate the .args section (such as CCS), argv[0] will contain the filename of
the executable, argv[1] through argv[argc-1] will contain the command-line arguments to the program,
and argv[argc] will be NULL. Otherwise, the values of argv and argc are undefined. (5.1.2.2.1)
• Interactive devices include stdin, stdout, and stderr (when attached to a system that honors CIO
requests). Interactive devices are not limited to those output locations; the program may access
hardware peripherals that interact with the external state. (5.1.2.3)
• Signals are not supported. The function signal is not supported. (7.14) (7.14.1.1)
• The library function getenv is implemented through the CIO interface. If the program is run under a
system that supports CIO, the system performs getenv calls on the host system and passes the result
back to the program. Otherwise the operation of getenv is undefined. No method of changing the
environment from inside the target program is provided. (7.20.4.5)
• The system function is not supported. (7.20.4.6).
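The argv layout described in the list above (argv[0] holds the executable name, argv[1] through argv[argc-1] hold the arguments, and argv[argc] is NULL) can be illustrated with a short sketch. The function count_args is a hypothetical helper, and it applies only when the arguments were actually populated, for example through --args and a host such as CCS.

```c
#include <assert.h>
#include <stddef.h>

/* Walk an argv-style array up to its NULL terminator and return the
   number of entries, which equals argc when argv[argc] is NULL. */
int count_args(char *const *argv)
{
    assert(argv != NULL);
    int n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```

Because argv and argc are undefined when no arguments were provided, a program should not call a helper like this unless it knows the .args section was filled in.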
J.3.3 Identifiers
• The compiler does not support multibyte characters in identifiers. (6.4.2)
• The number of significant initial characters in an identifier is unlimited. (5.2.4.1, 6.4.2)
J.3.4 Characters
• The number of bits in a byte (CHAR_BIT) is 8. See Section 7.4 for details about data types. (3.6)
• The execution character set is the same as the basic execution character set: plain ASCII. (5.2.1)
• The values produced for the standard alphabetic escape sequences are as follows: (5.2.2)
• The value of a char object into which any character other than a member of the basic execution
character set has been stored is the ASCII value of that character. (6.2.5)
• Plain char is identical to signed char. (6.2.5, 6.3.1.1)
• The source character set and execution character set are both plain ASCII, so the mapping between
them is one-to-one. The compiler does accept multibyte characters in comments, string literals, and
character constants. (6.4.4.4, 5.1.1.2)
• The compiler currently supports only one locale, "C". (6.4.4.4).
• The compiler currently supports only one locale, "C". (6.4.5).
J.3.5 Integers
• C6000 supports the additional integer types __int40_t and unsigned __int40_t, which are signed and
unsigned 40-bit integer types. (6.2.5)
• Integer types are represented as two's complement, and there are no trap representations. (6.2.6.2)
• The rank of __int40_t and unsigned __int40_t is less than the rank for long long. The rank of __int40_t
and unsigned __int40_t is greater than the rank for long. (6.3.1.1)
• When an integer is converted to a signed integer type which cannot represent the value, the value is
truncated (without raising a signal) by discarding the bits which cannot be stored in the destination
type; the lowest bits are not modified. (6.3.1.3)
• Right shift of a signed integer value performs an arithmetic (signed) shift. The bitwise operations other
than right shift operate on the bits in exactly the same way as on an unsigned value. That is, after the
usual arithmetic conversions, the bitwise operation is performed without regard to the format of the
integer type, in particular the sign bit. (6.5)
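The conversion and shift rules in this list can be modeled in portable C. The sketch below does not rely on how the host compiler handles out-of-range conversions or shifts of negative values; it spells out the bit manipulation that the list describes. The function names truncate_to_signed and arithmetic_shift_right are illustrative and are not part of any TI header.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the low `bits` bits of `value` and reinterpret them as a
   two's-complement signed number of that width. */
int64_t truncate_to_signed(int64_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    uint64_t mask = (UINT64_C(1) << bits) - 1;   /* low `bits` bits set */
    uint64_t low  = (uint64_t)value & mask;      /* discard the high bits */
    uint64_t sign = UINT64_C(1) << (bits - 1);   /* sign bit of the narrow type */
    if (low & sign)
        return (int64_t)low - (int64_t)mask - 1; /* low - 2^bits */
    return (int64_t)low;
}

/* Arithmetic right shift: shift the bit pattern, then fill the vacated
   high bits with copies of the sign bit. */
int32_t arithmetic_shift_right(int32_t value, unsigned count)
{
    assert(count < 32);
    uint32_t bits = (uint32_t)value >> count;    /* logical shift */
    if (value >= 0)
        return (int32_t)bits;
    bits |= (uint32_t)~(UINT32_MAX >> count);    /* replicate the sign bit */
    return -(int32_t)(uint32_t)~bits - 1;        /* back to a negative value */
}
```

For example, truncating 200 to 8 bits keeps the pattern 0xC8, which reads as -56 in two's complement, and shifting -8 right by one position gives -4.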
J.3.6 Floating point
• The accuracy of floating-point operations (+ - * /) is bit-exact. The accuracy of library functions that
return floating-point results is not specified. (5.2.4.2.2)
• The compiler does not provide non-standard values for FLT_ROUNDS. (5.2.4.2.2)
• The compiler does not provide non-standard negative values of FLT_EVAL_METHOD. (5.2.4.2.2)
• The rounding direction when an integer is converted to a floating-point number is IEEE-754 "round to
even". (6.3.1.4)
• The rounding direction when a floating-point number is converted to a narrower floating-point number
is IEEE-754 "round to even". (6.3.1.5)
• For floating-point constants that are not exactly representable, the implementation uses the nearest
representable value. (6.4.4.2)
• The compiler does not contract float expressions. (6.5)
• The default state for the FENV_ACCESS pragma is off. (7.6.1)
• The TI compiler does not define any additional float exceptions. (7.6, 7.12)
• The default state for the FP_CONTRACT pragma is off. (7.12.2)
• The "inexact" floating-point exception cannot be raised if the rounded result equals the mathematical
result. (F.9)
• The "underflow" and "inexact" floating-point exceptions cannot be raised if the result is tiny but not
inexact. (F.9)
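The "round to even" behavior listed above for integer-to-float and double-to-float conversions can be checked with values that fall exactly halfway between two representable floats. The sketch below assumes a host whose float type is IEEE-754 single precision and whose rounding mode is the default round-to-nearest-even; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Convert through a volatile object so the result is really stored as a float. */
float int_to_float(int32_t value)
{
    volatile float result = (float)value;
    return result;
}

float double_to_float(double value)
{
    volatile float result = (float)value;
    return result;
}

/* Ties go to the neighbor whose last mantissa bit is 0 (the "even" one). */
void demo_round_to_even(void)
{
    /* 2^24 + 1 lies halfway between 2^24 and 2^24 + 2. */
    assert(int_to_float(16777217) == 16777216.0f);
    /* 2^24 + 3 lies halfway between 2^24 + 2 and 2^24 + 4. */
    assert(int_to_float(16777219) == 16777220.0f);
    /* 1 + 2^-24 lies halfway between 1 and 1 + 2^-23. */
    assert(double_to_float(0x1.000001p0) == 1.0f);
    /* 1 + 3*2^-24 lies halfway between 1 + 2^-23 and 1 + 2^-22. */
    assert(double_to_float(0x1.000003p0) == 0x1.000004p0f);
}
```

In each pair of cases the tie is resolved once downward and once upward, which distinguishes round-to-even from simple truncation or round-half-up.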
J.3.7 Arrays and pointers
• When converting a pointer to an integer or vice versa, the pointer is considered an unsigned integer of
the same size, and the normal integer conversion rules apply.
• When converting a pointer to an integer or vice versa, if the bitwise representation of the destination
can hold all of the bits in the bitwise representation of the source, the bits are copied exactly. (6.3.2.3)
• The size of the result of subtracting two pointers to elements of the same array is the size of ptrdiff_t,
which is defined in Section 7.4. (6.5.6)
J.3.8 Hints
• When the optimizer is used, the register storage-class specifier is ignored. When the optimizer is not
used, the compiler will preferentially place register storage class objects into registers to the extent
possible. The compiler reserves the right to place any register storage class object somewhere other
than a register. (6.7.1)
• The inline function specifier is ignored unless the optimizer is used. For other restrictions on inlining,
see Section 3.11.5. (6.7.4)
J.3.9 Structures, unions, enumerations, and bit-fields
• A "plain" int bit-field is treated as a signed int bit-field. (6.7.2, 6.7.2.1)
• In addition to _Bool, signed int, and unsigned int, the compiler allows char, signed char, unsigned char,
signed short, unsigned short, signed long, unsigned long, signed long long, unsigned long long, and
enum types as bit-field types. (6.7.2.1)
• Bit-fields may not straddle a storage-unit boundary. (6.7.2.1)
• Bit-fields are allocated in endianness order within a unit. See Section 8.2.2. (6.7.2.1)
• Non-bit-field members of structures are aligned as specified in Section 8.2.1. (6.7.2.1)
• The integer type underlying each enumerated type is described in Section 7.4.1. (6.7.2.2)
J.3.10 Qualifiers
• The TI compiler does not shrink or grow volatile accesses. It is the user's responsibility to make sure
the access size is appropriate for devices that only tolerate accesses of certain widths. The TI compiler
does not change the number of accesses to a volatile variable unless absolutely necessary. This is
significant for read-modify-write expressions such as += ; for an architecture which does not have a
corresponding read-modify-write instruction, the compiler will be forced to use two accesses, one for
the read and one for the write. Even for architectures with such instructions, it is not guaranteed that
the compiler will be able to map such expressions to an instruction with a single memory operand. It is
not guaranteed that the memory system will lock that memory location for the duration of the
instruction. In a multi-core system, some other core may write the location after a RMW instruction
reads it, but before it writes the result. The TI compiler will not reorder two volatile accesses, but it may
reorder a volatile and a non-volatile access, so volatile cannot be used to create a critical section. Use
some sort of lock if you need to create a critical section. (6.7.3)
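The point about read-modify-write expressions can be made concrete by writing the two accesses out explicitly. The sketch below is plain C and is not specific to any device; increment_shared is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* The expression (*reg)++ or *reg += 1 on a volatile object may be
   compiled as the two separate accesses shown here. */
void increment_shared(volatile uint32_t *reg)
{
    assert(reg != 0);
    uint32_t value = *reg;   /* first volatile access: the read */
    value += 1;
    /* Another core or an interrupt could write *reg at this point,
       and that update would then be overwritten by the store below. */
    *reg = value;            /* second volatile access: the write */
}
```

Writing the accesses separately does not make the update atomic. As the text says, a lock or a similar mechanism is still required if the update must not be interrupted.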
J.3.11 Preprocessing directives
• Include directives may have one of two forms, " " or < >. For both forms, the compiler will look for a
real file on-disk by that name using the include file search path. See Section 3.5.2. (6.4.7).
• The value of a character constant in a constant expression that controls conditional inclusion matches
the value of the same character constant in the execution character set (both are ASCII). (6.10.1).
• The compiler uses the file search path to search for an included < > delimited header file. See
Section 3.5.2. (6.10.2).
• The compiler uses the file search path to search for an included " " delimited header file. See
Section 3.5.2. (6.10.2).
• There is no arbitrary nesting limit for #include processing. (6.10.2).
• See Section 7.9 for a description of the recognized non-standard pragmas. (6.10.6).
• The date and time of translation are always available from the host. (6.10.8).
J.3.12 Library functions
• Almost all of the library functions required for a hosted implementation are provided by the TI library,
with exceptions noted in Section 7.13.1. (5.1.2.1).
• The format of the diagnostic printed by the assert macro is "Assertion failed, (assertion macro
argument), file file, line line". (7.2.1.1).
• No strings other than "C" and "" may be passed as the second argument to the setlocale function.
(7.11.1.1).
• No signal handling is supported. (7.14.1.1).
• The +INF, -INF, +inf, -inf, NAN, and nan styles can be used to print an infinity or NaN. (7.19.6.1,
7.24.2.1).
• The output for %p conversion in the fprintf or fwprintf function is the same as %x of the appropriate
size. (7.19.6.1, 7.24.2.1).
• The abort, exit, and _Exit functions do not return a termination status to the host environment.
(7.20.4.1, 7.20.4.3, 7.20.4.4).
• The system function is not supported. (7.20.4.6).
J.3.13 Architecture
• The values or expressions assigned to the macros specified in the headers float.h, limits.h, and stdint.h
are described, along with the sizes and formats of integer types, in Section 7.4. (5.2.4.2,
7.18.2, 7.18.3)
• The number, order, and encoding of bytes in any object are described in Section 8.2.1. (6.2.6.1)
• The value of the result of the sizeof operator is the storage size for each type, in terms of bytes. See
Section 8.2.1. (6.5.3.4)
--check_misra={all|required|advisory|none|rulespec}
#pragma CHECK_MISRA ("{all|required|advisory|none|rulespec}")
#pragma RESET_MISRA ("{all|required|advisory|rulespec}")
--misra_advisory={error|warning|remark|suppress}
--misra_required={error|warning|remark|suppress}
The additional types from C, C99 and C++ are defined as synonyms for standard types:
For C++ and relaxed C89/C99, the compiler allows enumeration constants up to the largest integral type
(64 bits). The default, which is recommended, is for the underlying type to be the first type in the following
list in which all the enumerated constant values can be represented: int, unsigned int, long long, unsigned
long long.
If you use the --small_enum option, the smallest possible byte size for the enumeration type is used. The
underlying type is the first type in the following list in which all the enumerated constant values can be
represented: signed char, unsigned char, short, unsigned short, int, unsigned int, long long, unsigned long
long.
The following enum uses 8 bits instead of 32 bits when the --small_enum option is used.
enum example_enum {
first = -128,
second = 0,
third = 127
};
The following enum fits into 16 bits instead of 32 when the --small_enum option is used.
enum a_short_enum {
bottom = -32768,
middle = 0,
top = 32767
};
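The selection rule for --small_enum can be sketched as a function that walks the candidate list in order and returns the first type that can hold both the smallest and the largest enumerator. This is an illustration of the rule as stated, not the compiler's code. It assumes the C6000 widths used in this section (8-bit char, 16-bit short, 32-bit int, 64-bit long long), and it omits unsigned long long because the parameters are passed as long long. The names small_type and small_enum_type are hypothetical.

```c
#include <assert.h>

typedef enum {
    SE_SIGNED_CHAR, SE_UNSIGNED_CHAR,
    SE_SHORT, SE_UNSIGNED_SHORT,
    SE_INT, SE_UNSIGNED_INT,
    SE_LONG_LONG
} small_type;

/* Return the first candidate type that can represent every value in
   the range [min, max], following the order given for --small_enum. */
small_type small_enum_type(long long min, long long max)
{
    assert(min <= max);
    if (min >= -128 && max <= 127)                       return SE_SIGNED_CHAR;
    if (min >= 0 && max <= 255)                          return SE_UNSIGNED_CHAR;
    if (min >= -32768 && max <= 32767)                   return SE_SHORT;
    if (min >= 0 && max <= 65535)                        return SE_UNSIGNED_SHORT;
    if (min >= -2147483647LL - 1 && max <= 2147483647LL) return SE_INT;
    if (min >= 0 && max <= 4294967295LL)                 return SE_UNSIGNED_INT;
    return SE_LONG_LONG;
}
```

Applied to the two enums above, the range -128 to 127 selects signed char and the range -32768 to 32767 selects short, which agrees with the 8-bit and 16-bit sizes stated in the text.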
NOTE: Do not link object files compiled with the --small_enum option with object files that were
compiled without it. If you use the --small_enum option, you must use it with all of your
C/C++ files; otherwise, you will encounter errors that cannot be detected until run time.
All of the vector data types and related built-in functions that are supported in the C6000 programming
model are specified in the "c6x_vec.h" header file in the "include" sub-directory where your C6000 CGT
was installed. Any C/C++ source file that uses vector data types or any of the related built-in functions
must contain the following in that source file:
#include <c6x_vec.h>
A vector type name concatenates an element type name and a number representing the vector length.
The resulting vector consists of the specified number of elements of the specified type.
The C6x implementation of vector data types and operations follows the OpenCL C language specification
closely. For a detailed description of OpenCL vector data types and operations, please see "The OpenCL
Specification" version 1.2, which is available from the Khronos OpenCL Working Group. Section 6.1.2 of
"The OpenCL Specification" version 1.2 provides a detailed description of the built-in vector data types
supported in the OpenCL C programming language. The C6x programming model provides the following
built-in vector data types:
Type Description
charn A vector of n 8-bit signed integer values.
ucharn A vector of n 8-bit unsigned integer values.
shortn A vector of n 16-bit signed integer values.
ushortn A vector of n 16-bit unsigned integer values.
intn A vector of n 32-bit signed integer values.
uintn A vector of n 32-bit unsigned integer values.
longlongn A vector of n 64-bit signed integer values.
ulonglongn A vector of n 64-bit unsigned integer values.
floatn A vector of n 32-bit single-precision floating-point values.
doublen A vector of n 64-bit double-precision floating-point values.
NOTE: To avoid confusion between C6000's definition of long (32 bits) and 64-bit definitions of long,
vector types with a base type of "long" and unsigned long ("ulong") are not provided. If you
want to use the standard long and ulong types, you can create a simple preprocessor macro
such as: #define long2 longlong2 or #define long2 int2, depending on the element type and size
you want to use.
The C6x Code Generation Tools also provide an extension for representing vectors of complex types. A
prefix of 'c' is used to indicate a complex type name. Each complex type vector element contains a real
part and an imaginary part with the real part occupying the lower address in memory. Thus, the complex
vector types are as follows:
Type Description
ccharn A vector of n pairs of 8-bit signed integer values.
cshortn A vector of n pairs of 16-bit signed integer values.
cintn A vector of n pairs of 32-bit signed integer values.
clonglongn A vector of n pairs of 64-bit signed integer values.
cfloatn A vector of n pairs of 32-bit floating-point values.
cdoublen A vector of n pairs of 64-bit floating-point values.
where n can be a vector length of 2, 3, 4, or 8. Note that 16 is not a valid vector length for complex vector
types. For example, a "cfloat2" is a vector of 2 complex floats. Its length is 2 and its size is 128 bits. Each
"cfloat2" vector element contains a real float and an imaginary float.
For information about operators and built-in functions used with vector data types, see Section 7.15.
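The size arithmetic in the cfloat2 example (2 elements, each a pair of 32-bit values, for 128 bits in total) generalizes as shown in the sketch below. The function vector_size_bits is hypothetical and does not use c6x_vec.h. It assumes power-of-two vector lengths only, so it makes no claim about how a 3-element vector is laid out, and it rejects a length of 16 for complex types as the text requires.

```c
#include <assert.h>

/* Size in bits of a vector of `n` elements of `element_bits` bits each.
   A complex element holds a real part and an imaginary part, so it
   occupies twice the element width. */
unsigned vector_size_bits(unsigned element_bits, unsigned n, int is_complex)
{
    assert(n == 2 || n == 4 || n == 8 || n == 16);  /* assumption: no padding cases */
    assert(!(is_complex && n == 16));               /* 16 is not valid for complex vectors */
    return element_bits * n * (is_complex ? 2u : 1u);
}
```

With these assumptions, a cfloat2 and a float4 have the same size, but only the first is interpreted as two real and imaginary pairs.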
7.5 Keywords
The C6000 C/C++ compiler supports all of the standard C89 keywords, including const, volatile, and
register. It also supports all of the standard C99 keywords, including inline and restrict. It also supports TI
extension keywords __interrupt, __near, __far, __cregister, and __asm. Some keywords are not available
in strict ANSI mode.
The following keywords may appear in other target documentation and require the same treatment as the
interrupt and restrict keywords:
• trap
• reentrant
• cregister
Using the const keyword, you can define large constant tables and allocate them into system ROM. For
example, to allocate a ROM table, you could use the following definition:
far const int digits[] = {0,1,2,3,4,5,6,7,8,9};
The additional control registers listed in Table 7-5 are used for floating-point operations on C6740 and
C6600 devices:
The __cregister keyword can be used only in file scope. The __cregister keyword is not allowed on any
declaration within the boundaries of a function. It can only be used on objects of type integer or pointer.
The __cregister keyword is not allowed on objects of any floating-point type or on any structure or union
objects.
The __cregister keyword does not imply that the object is volatile. If the control register being referenced
is volatile (that is, can be modified by some external control), then the object must be declared with the
volatile keyword also.
To use the control registers in Table 7-4, you must declare each register as follows. The c6x.h include file
defines all the control registers through this syntax:
Once you have declared the register, you can use the register name directly. See the
TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide (SPRU732), the TMS320C66x DSP
CPU and Instruction Set Reference Guide (SPRUGH7), or the TMS320C674x DSP CPU and Instruction
Set Reference Guide (SPRUFE8) for detailed information on the control registers.
See Example 7-1 for an example that declares and uses control registers.
The name c_int00 is the C/C++ entry point. This name is reserved for the system reset interrupt. This
special interrupt routine initializes the system and calls the main() function. Because it has no caller,
c_int00 does not save any registers.
__near keyword The compiler assumes that the data item can be accessed relative to the data page
pointer. For example:
LDW *+dp(_address),a0
__far keyword The compiler cannot access the data item via the DP. This can be required if the
total amount of program data is larger than the offset allowed (32K) from the DP.
For example:
MVKL _address, a1
MVKH _address, a1
LDW *a1,a0
Be consistent with near and far declarations. If an object is defined to be far, all external declarations of
this object in other C files or headers must also contain the __far keyword, or you will likely get compiler or
linker errors. If an object is defined to be near, you can safely declare it as __far in other C files or
headers, but you will have slower data access for that variable.
If you use the DATA_SECTION pragma, the object is indicated as a far variable, and this cannot be
overridden. If you reference this object in another file, then you need to use extern __far when declaring
this object in the other source file. This ensures access to the variable, since the variable might not be in
the .bss section. For details, see Section 7.9.7.
When data objects do not have the __near or __far keyword specified, the compiler will use far accesses
to aggregate data and near accesses to non-aggregate data. For more information on the data memory
model and ways to control accesses to data, see Section 8.1.4.1.
__near keyword The compiler assumes that the destination of the call is within ± 1 M word of the caller.
Here the compiler uses the PC-relative branch instruction.
B _func
__far keyword You tell the compiler that the call is not within ± 1 M word of the caller.
MVKL _func, a1
MVKH _func, a1
B a1
By default, the compiler generates small-memory model code, which means that every function call is
handled as if it were declared near, unless it is actually declared far. For more information on function
calls, see Section 8.1.5.
Example 7-3 illustrates using the restrict keyword when passing arrays to a function. Here, the arrays c
and d must not overlap, nor may c and d point to the same array.
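As an illustration of the restriction described here (this is a sketch in standard C99, not a reproduction of Example 7-3), the function below declares both parameters with restrict. The name add_then_bump is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Both pointers are restrict-qualified: the caller promises that the
   n elements reached through c and the n elements reached through d
   do not overlap, so the compiler may reorder the loads and stores. */
void add_then_bump(int *restrict c, int *restrict d, size_t n)
{
    assert(c != d);  /* catches only identical pointers, not partial overlap */
    for (size_t i = 0; i < n; i++) {
        c[i] += d[i];
        d[i] += 1;
    }
}
```

Calling this function with overlapping arrays breaks the promise made by restrict, and the result is then undefined even if the code appears to work without optimization.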
However, in this example, *ctrl is a loop-invariant expression, so the loop is optimized down to a single
memory read. To get the desired result, define ctrl as:
volatile unsigned int *ctrl;
Here the *ctrl pointer is intended to reference a hardware location, such as an interrupt flag.
The volatile keyword must also be used when accessing memory locations that represent memory-
mapped peripheral devices. Such memory locations might change value in ways that the compiler cannot
predict. These locations might change if accessed, or when some other memory location is accessed, or
when some signal occurs.
Volatile must also be used for local variables in a function which calls setjmp, if the value of the local
variables needs to remain valid if a longjmp occurs.
#include <setjmp.h>
#include <stdio.h>
void setup(void); /* defined elsewhere; assumed to call longjmp(context, ...) */
jmp_buf context;
void function()
{
    volatile int x = 3;
    switch(setjmp(context))
    {
        case 0: setup(); break;
        default:
        {
            /* We can only reach here if longjmp has occurred; because x's
               lifetime begins before the setjmp and lasts through the longjmp,
               the C standard requires x be declared "volatile" */
            printf("x == %d\n", x);
            break;
        }
    }
}
The compiler copies the argument string directly into your output file. The assembler text must be
enclosed in double quotes. All the usual character string escape codes retain their definitions. For
example, you can insert a .byte directive that contains quotes as follows:
__asm("STR: .byte \"abc\"");
The inserted code must be a legal assembly language statement. Like all assembly language statements,
the line of code inside the quotes must begin with a label, a blank, a tab, or a comment (asterisk or
semicolon). The compiler performs no checking on the string; if there is an error, the assembler detects it.
For more information about the assembly language statements, see the TMS320C6000 Assembly
Language Tools User's Guide.
The __asm statements do not follow the syntactic restrictions of normal C/C++ statements. Each can
appear as a statement or a declaration, even outside of blocks. This is useful for inserting directives at the
very beginning of a compiled module.
The __asm statement does not provide any way to refer to local variables. If your assembly code needs to
refer to local variables, you will need to write the entire function in assembly code.
For more information, refer to Section 8.6.5.
For pragmas that apply to functions or symbols, the syntax differs between C and C++.
• In C, you must supply the name of the object or function to which you are applying the pragma as the
first argument. Because the entity operated on is specified, a pragma in C can appear some distance
away from the definition of that entity.
• In C++, pragmas are positional. They do not name the entity on which they operate as an argument.
Instead, they always operate on the next entity defined after the pragma.
Note that in C++, the arguments to the CALLS pragma must be the full mangled names for the functions
that can be indirectly called from the calling function.
The GCC-style "calls" attribute syntax, which has the same effect as the CALLS pragma, is as follows:
__attribute__((calls("function_1","function_2",..., "function_n")))
The rulespec parameter is a comma-separated list of rule numbers. See Section 7.3 for details.
The RESET_MISRA pragma can be used to reset any CHECK_MISRA pragmas; see Section 7.9.27.
#pragma CLINK
The RETAIN pragma has the opposite effect of the CLINK pragma. See Section 7.9.28 for more details.
The CODE_SECTION pragma is useful if you have code objects that you want to link into an area
separate from the .text section.
int fn(int x)
{
return x;
}
.sect "my_sect"
.global _fn
;******************************************************************************
;* FUNCTION NAME: _fn *
;* *
;* Regs Modified : SP *
;* Regs Used : A4,B3,SP *
;* Local Frame Size : 0 Args + 4 Auto + 0 Save = 4 byte *
;******************************************************************************
_fn:
;** --------------------------------------------------------------------------*
RET .S2 B3 ; |6|
SUB .D2 SP,8,SP ; |4|
STW .D2T1 A4,*+SP(4) ; |4|
ADD .S2 8,SP,SP ; |6|
NOP 2
; BRANCH OCCURS ; |6|
Both global and local variables can be aligned with the DATA_MEM_BANK pragma. The
DATA_MEM_BANK pragma must reside inside the function that contains the local variable being aligned.
The symbol can also be used as a parameter in the DATA_SECTION pragma.
When optimization is enabled, the tools may or may not use the stack to store the values of local
variables.
The DATA_MEM_BANK pragma allows you to align data on any data memory bank that can hold data of
the type size of the symbol. This is useful if you need to align data in a particular way to avoid memory
bank conflicts in your hand-coded assembly code, rather than padding with zeros and having to account for the
padding in your code.
This pragma increases the amount of space used in data memory by a small amount as padding is used
to align data onto the correct bank.
A value of 0 for the constant argument to the DATA_MEM_BANK pragma causes the last five bits of the
starting address to be 0x00. For a value of 2, the last five bits of the starting address will be 0x08
(0b01000). For a value of 4, the last five bits of the starting address will be 0x10 (0b10000). For a value of
6, the last five bits of the starting address will be 0x18 (0b11000).
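The values listed above amount to 4 bytes per bank number. As a minimal host-side sketch of that arithmetic (the helper names bank_offset and is_on_bank are hypothetical and are not part of the compiler or its run-time library), the expected low five address bits can be computed and checked like this:

```c
#include <stdint.h>

/* Expected value of the low five address bits for a DATA_MEM_BANK
 * argument, following the values listed above:
 * 0 -> 0x00, 2 -> 0x08, 4 -> 0x10, 6 -> 0x18 (4 bytes per bank number). */
unsigned bank_offset(unsigned bank)
{
    return (bank * 4u) & 0x1Fu;
}

/* Returns 1 if addr starts on the requested bank, 0 otherwise. */
int is_on_bank(uintptr_t addr, unsigned bank)
{
    return (unsigned)(addr & 0x1Fu) == bank_offset(bank);
}
```

A check such as is_on_bank((uintptr_t)y, 4) could be used while debugging to confirm that an array landed where the pragma requested.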
The code in Example 7-7 uses the DATA_MEM_BANK pragma to specify the alignment of the x, y, z, w,
and zz arrays. It then assigns values to all the array elements and prints the starting address of each
array.
#include <stdio.h>
void main()
{
int i;
#pragma DATA_MEM_BANK (y, 4)
The DATA_SECTION pragma is useful if you have data objects that you want to link into an area separate
from the .bss section. If you allocate a global variable using a DATA_SECTION pragma and you want to
reference the variable in C code, you must declare the variable as extern far.
Example 7-8 through Example 7-10 demonstrate the use of the DATA_SECTION pragma.
char bufferA[512];
#pragma DATA_SECTION("my_sect")
char bufferB[512];
.global _bufferA
.bss _bufferA,512,4
.global _bufferB
The syntax of the diag_suppress, diag_remark, diag_warning, and diag_error pragmas in C is:
#pragma FUNC_ALWAYS_INLINE
#pragma FUNC_CANNOT_INLINE
#pragma FUNC_EXT_CALLED
Except for _c_int00, which is the name reserved for the system reset interrupt for C/C++ programs, the
name of the interrupt (the func argument) does not need to conform to a naming convention.
When you use program-level optimization, you may need to use the FUNC_EXT_CALLED pragma with
certain options. See Section 4.7.2.
#pragma FUNC_IS_PURE
#pragma FUNC_IS_SYSTEM
#pragma FUNC_NEVER_RETURNS
#pragma FUNC_NO_GLOBAL_ASG
#pragma FUNC_NO_IND_ASG
Supported options for this pragma are --opt_level, --auto_inline, --code_state, --opt_for_space, and
--opt_for_speed. In order to use --opt_level and --auto_inline with the FUNCTION_OPTIONS pragma, the
compiler must be invoked with some optimization level (that is, at least --opt_level=0).
#pragma INTERRUPT
void func( void )
The code for the function will return via the IRP (interrupt return pointer).
#pragma LOCATION(address )
int x
The NOINIT pragma may be used in conjunction with the LOCATION pragma to map variables to special
memory locations; see Section 7.9.23.
The arguments min and max are programmer-guaranteed minimum and maximum trip counts. The trip
count is the number of times a loop iterates. The trip count of the loop must be evenly divisible by multiple.
All arguments are optional. For example, if the trip count could be 5 or greater, you can specify the
argument list as follows:
#pragma MUST_ITERATE(5)
However, if the trip count could be any nonzero multiple of 5, the pragma would look like this:
#pragma MUST_ITERATE(5, , 5) /* Note the blank field for max */
It is sometimes necessary for you to provide min and multiple in order for the compiler to perform
unrolling. This is especially the case when the compiler cannot easily determine how many iterations the
loop will perform (that is, the loop has a complex exit condition).
When specifying a multiple via the MUST_ITERATE pragma, results of the program are undefined if the
trip count is not evenly divisible by multiple. Also, results of the program are undefined if the trip count is
less than the minimum or greater than the maximum specified.
If no min is specified, zero is used. If no max is specified, the largest possible number is used. If multiple
MUST_ITERATE pragmas are specified for the same loop, the smallest max and largest min are used.
In this example, the compiler attempts to generate a software pipelined loop even without the pragma.
However, if MUST_ITERATE is not specified for a loop such as this, the compiler generates code to
bypass the loop, to account for the possibility of 0 iterations. With the pragma specification, the compiler
knows that the loop iterates at least once and can eliminate the loop-bypassing code.
MUST_ITERATE can specify a range for the trip count as well as a factor of the trip count. The following
example tells the compiler that the loop executes between 8 and 48 times and the trip_count variable is a
multiple of 8 (8, 16, 24, 32, 40, 48). The multiple argument allows the compiler to unroll the loop.
#pragma MUST_ITERATE(8, 48, 8)
for(i = 0; i < trip_count; i++) { ...
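For illustration, here is a complete function built around the same pragma. This is a sketch only: the function name sum_samples is hypothetical, and the pragma is guarded with the TI-defined __TI_COMPILER_VERSION__ macro so that the file still compiles with other compilers.

```c
/* Sums n samples. The caller guarantees that n is one of
 * 8, 16, 24, 32, 40, or 48; with the pragma active, any other
 * value makes the results of the program undefined. */
int sum_samples(const short *samples, int n)
{
    int i;
    int sum = 0;

#ifdef __TI_COMPILER_VERSION__   /* only the TI compiler understands this pragma */
#pragma MUST_ITERATE(8, 48, 8)
#endif
    for (i = 0; i < n; i++) {
        sum += samples[i];
    }
    return sum;
}
```

The guarantee is a contract with the caller, not something the compiler checks, so it is worth documenting next to the function as shown.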
You should consider using MUST_ITERATE for loops with complicated bounds. In the following example,
the compiler would have to generate a divide function call to determine, at run time, the number of
iterations performed.
for(i2 = ipos[2]; i2 < 40; i2 += 5) { ...
The compiler will not generate such a divide call. In this case, using MUST_ITERATE to specify that the loop always
executes eight times allows the compiler to attempt to generate a software pipelined loop:
#pragma MUST_ITERATE(8, 8)
Typically, if the MUST_ITERATE pragma is used to optimize loop execution, a DINT instruction is
prepended to the optimized code, the loop code is executed, and then an RINT instruction is executed
when the loop is terminated.
#pragma NMI_INTERRUPT
The code generated for the function returns via the NRP, rather than via the IRP as it does for a function declared with
the interrupt keyword or INTERRUPT pragma.
Except for _c_int00, which is the name reserved for the system reset interrupt for C programs, the name
of the interrupt (function) does not need to conform to a naming convention.
NOTE: When using these pragmas in non-volatile FRAM memory, the memory region could be
protected against unintended writes through the device's Memory Protection Unit. Some
devices have memory protection enabled by default. Please see the information about
memory protection in the datasheet for your device. If the Memory Protection Unit is enabled,
it first needs to be disabled before modifying the variables.
If you are using non-volatile RAM, you can define a persistent variable with an initial value of zero loaded
into RAM. The program can increment that variable over time as a counter, and that count will not
disappear if the device loses power and restarts, because the memory is non-volatile and the boot
routines do not initialize it back to zero. For example:
#pragma PERSISTENT(x)
#pragma location = 0xC200 // memory address in RAM
int x = 0;
void main() {
    run_init();
    while (1) {
        run_actions(x);
        __delay_cycles(1000000);
        x++;
    }
}
#pragma NOINIT (x )
int x;
#pragma PERSISTENT (x )
int x=10;
#pragma NOINIT
int x;
#pragma PERSISTENT
int x=10;
int x __attribute__((noinit));
int x __attribute__((persistent)) = 0;
#pragma NO_HOOKS
The above form of the pack pragma affects all class, struct, or union type declarations that follow this
pragma in a file. It forces the maximum alignment of each field to be the value specified by n. Valid values
for n are 1, 2, 4, 8, and 16 bytes.
The above form of the pack pragma affects only class, struct, and union type declarations between push
and pop directives. (A pop directive with no prior push results in a warning diagnostic from the compiler.)
The maximum alignment of all fields declared is n. Valid values for n are 1, 2, 4, 8, and 16 bytes.
The above form of the pack pragma sends a warning diagnostic to stderr to record the current state of the
pack pragma stack. You can use this form while debugging.
For more about packed fields, see Section 7.14.4.
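The push and pop forms can be sketched as follows. The struct and function names are hypothetical, and the offsets in the comments assume a 4-byte int whose natural alignment is 4 bytes.

```c
#include <stddef.h>

#pragma pack(push, 2)       /* fields declared below are aligned to at most 2 bytes */
struct sample_packed2 {
    char tag;               /* offset 0 */
    int  value;             /* offset 2 instead of the natural 4 */
};
#pragma pack(pop)           /* restore the previous packing state */

struct sample_natural {
    char tag;               /* offset 0 */
    int  value;             /* offset 4, after 3 bytes of padding */
};

size_t packed2_value_offset(void)
{
    return offsetof(struct sample_packed2, value);
}

size_t natural_value_offset(void)
{
    return offsetof(struct sample_natural, value);
}
```

Only the declaration between the push and pop directives is affected; sample_natural keeps the default layout.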
Where min and max are the minimum and maximum trip counts of the loop in the common case. The trip
count is the number of times a loop iterates. Both arguments are optional.
For example, PROB_ITERATE could be applied to a loop that executes for eight iterations in the majority
of cases (but sometimes may execute more or fewer than eight iterations):
#pragma PROB_ITERATE(8, 8)
If only the minimum expected trip count is known (say it is 5), the pragma would look like this:
#pragma PROB_ITERATE(5)
If only the maximum expected trip count is known (say it is 10), the pragma would look like this:
#pragma PROB_ITERATE(, 10) /* Note the blank field for min */
The rulespec parameter is a comma-separated list of rule numbers. See Section 7.3 for details.
#pragma RETAIN
The CLINK pragma has the opposite effect of the RETAIN pragma. See Section 7.9.3 for more details.
In Example 7-11 x and y are put in the section mydata. To reset the current section to the default used by
the compiler, a blank parameter should be passed to the pragma. An easy way to think of the pragma is
that it is like applying the CODE_SECTION or DATA_SECTION pragma to all symbols below it.
#pragma SET_DATA_SECTION("mydata")
int x;
int y;
#pragma SET_DATA_SECTION()
The pragmas apply to both declarations and definitions. If applied to a declaration and not the definition,
the pragma that is active at the declaration is used to set the section for that symbol. Here is an example:
#pragma SET_CODE_SECTION("func1")
extern void func1();
#pragma SET_CODE_SECTION()
...
void func1() { ... }
In Example 7-12 func1 is placed in section func1. If conflicting sections are specified at the declaration
and definition, a diagnostic is issued.
The current CODE_SECTION and DATA_SECTION pragmas and GCC attributes can be used to override
the SET_CODE_SECTION and SET_DATA_SECTION pragmas. For example:
In Example 7-13 x is placed in x_data and y is placed in mydata. No diagnostic is issued for this case.
The pragmas work for both C and C++. In C++, the pragmas are ignored for templates and for implicitly
created objects, such as implicit constructors and virtual function tables.
This pragma guarantees that the alignment of the named type or the base type of the named typedef is at
least equal to that of the expression. (The alignment may be greater as required by the compiler.) The
alignment must be a power of 2. The type must be a type or a typedef name. If a type, it must be either a
structure tag or a union tag. If a typedef, its base type must be either a structure tag or a union tag.
Note that while the top-level object of a type (or a typedef of that type) will be aligned as requested, the
type will not be padded to the alignment (as is usual for a struct), nor does the alignment propagate to
derived types such as arrays and parent structs. If you want to pad a structure or union so that individual
elements are also aligned and/or cause the alignment to apply to derived types, use the "aligned" type
attribute described in Section 7.14.4.
Since ANSI/ISO C declares that a typedef is simply an alias for a type (for example, a struct), this pragma can be
applied to the struct, the typedef of the struct, or any typedef derived from them, and affects all aliases of
the base type.
This example aligns any st_tag structure variables on a page boundary:
typedef struct st_tag
{
int a;
short b;
} st_typedef;
Any use of STRUCT_ALIGN with a basic type (int, short, float) or a variable results in an error.
#pragma UNROLL( n )
If possible, the compiler unrolls the loop so there are n copies of the original loop. The compiler only
unrolls if it can determine that unrolling by a factor of n is safe. In order to increase the chances the loop is
unrolled, the compiler needs to know certain properties:
• The loop iterates a multiple of n times. This information can be specified to the compiler via the
multiple argument in the MUST_ITERATE pragma.
• The smallest possible number of iterations of the loop
• The largest possible number of iterations of the loop
The compiler can sometimes obtain this information itself by analyzing the code. However, sometimes the
compiler can be overly conservative in its assumptions and therefore generates more code than is
necessary when unrolling. This can also lead to not unrolling at all. Furthermore, if the mechanism that
determines when the loop should exit is complex, the compiler may not be able to determine these
properties of the loop. In these cases, you must tell the compiler the properties of the loop by using the
MUST_ITERATE pragma.
Specifying #pragma UNROLL(1) asks that the loop not be unrolled. Automatic loop unrolling also is not
performed in this case.
If multiple UNROLL pragmas are specified for the same loop, it is undefined which pragma is used, if any.
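As a sketch of how the two pragmas can be combined, the following function tells the compiler that the trip count is a nonzero multiple of 4 and then requests unrolling by 4. The function name scale_by_two is hypothetical, and the pragmas are guarded with the TI-defined __TI_COMPILER_VERSION__ macro so the code also builds with other compilers.

```c
/* Doubles each element in place. The caller guarantees that n is a
 * nonzero multiple of 4, which is what MUST_ITERATE(4, , 4) states. */
void scale_by_two(short *data, int n)
{
    int i;

#ifdef __TI_COMPILER_VERSION__   /* TI-specific pragmas */
#pragma MUST_ITERATE(4, , 4)     /* min 4, no max given, multiple of 4 */
#pragma UNROLL(4)                /* request 4 copies of the loop body */
#endif
    for (i = 0; i < n; i++) {
        data[i] = (short)(data[i] * 2);
    }
}
```

Whether the loop is actually unrolled remains the compiler's decision; the pragmas only supply the information it needs to do so safely.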
The argument string_literal is interpreted in the same way the tokens following a #pragma directive are
processed. The string_literal must be enclosed in quotes. A quotation mark that is part of the string_literal
must be preceded by a backslash.
You can use the _Pragma operator to express #pragma directives in macros. For example, the
DATA_SECTION syntax:
#pragma DATA_SECTION( func ," section ")
Is represented by the _Pragma() operator syntax:
_Pragma ("DATA_SECTION( func ,\" section \")")
The following code illustrates using _Pragma to specify the DATA_SECTION pragma in a macro:
...
#define EMIT_PRAGMA(x) _Pragma(#x)
#define COLLECT_DATA(var) EMIT_PRAGMA(DATA_SECTION(var,"mysection"))
COLLECT_DATA(x)
int x;
...
The EMIT_PRAGMA macro is needed to properly expand the quotes that are required to surround the
section argument to the DATA_SECTION pragma.
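To see what the extra macro level produces, the stringizing step can be inspected on its own. The sketch below uses hypothetical macros (PRAGMA_TEXT and COLLECT_DATA_TEXT) with the same shape as EMIT_PRAGMA and COLLECT_DATA, but it returns the generated string instead of handing it to _Pragma:

```c
/* Same shape as EMIT_PRAGMA, but yields the string literal that
 * would be passed to _Pragma, so the expansion can be examined. */
#define PRAGMA_TEXT(x) #x
#define COLLECT_DATA_TEXT(var) PRAGMA_TEXT(DATA_SECTION(var,"mysection"))

/* Returns the text DATA_SECTION(x,"mysection"). In the string literal
 * itself, the preprocessor escapes the embedded quotes automatically. */
const char *collect_data_text(void)
{
    return COLLECT_DATA_TEXT(x);
}
```

Because the # operator escapes the inner quotation marks, the macro author does not have to write the backslashes by hand.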
The linkname of foo is _foo__Fi, indicating that foo is a function that takes a single argument of type int.
To aid inspection and debugging, a name demangling utility is provided that demangles names into those
found in the original C++ source. See Chapter 10 for more information.
The mangling algorithm follows that described in the Itanium C++ ABI
(http://www.codesourcery.com/cxx-abi/abi.html).
int foo(int i) { } would be mangled "_Z3fooi"
– snprintf() does not properly pad with spaces when writing to a wide character array
• stdlib.h
– strtof() / atof() / strtod() / strtold() do not support hexadecimal float strings
– vfscanf() / vscanf() / vsscanf() return value on floating point matching failure is incorrect
• tgmath.h
• time.h
– strftime()
• wchar.h
– getws() / fputws()
– mbrlen()
– mbsrtowcs()
– wcscat()
– wcschr()
– wcscmp() / wcsncmp()
– wcscpy() / wcsncpy()
– wcsftime()
– wcsrtombs()
– wcsstr()
– wcstok()
– wcsxfrm()
– Wide character print / scan functions
– Wide character conversion functions
7.13.2 Enabling Strict ANSI/ISO Mode and Relaxed ANSI/ISO Mode (--strict_ansi and --relaxed_ansi Options)
Under relaxed ANSI/ISO mode (the default), the compiler accepts language extensions that could
potentially conflict with a strictly conforming ANSI/ISO C/C++ program. Under strict ANSI mode, these
language extensions are suppressed so that the compiler will accept all strictly conforming programs.
Use the --strict_ansi option when you know your program is a conforming program and it will not compile
in relaxed mode. In this mode, language extensions that conflict with ANSI/ISO C/C++ are disabled and
the compiler will emit error messages where the standard requires it to do so. Violations that are
considered discretionary by the standard may be emitted as warnings instead.
Examples:
The following is strictly conforming C code, but will not be accepted by the compiler in the default relaxed
mode. To get the compiler to accept this code, use strict ANSI mode. The compiler will suppress the
interrupt keyword language extension, and interrupt may then be used as an identifier in the code.
int main()
{
    int interrupt = 0;
    return 0;
}
The following is not strictly conforming code. The compiler will not accept this code in strict ANSI mode.
To get the compiler to accept it, use relaxed ANSI mode. The compiler will provide the interrupt keyword
extension and will accept the code.
interrupt void isr(void);
int main()
{
    return 0;
}
The following code is accepted in all modes. The __interrupt keyword does not conflict with the ANSI/ISO
C standard, so it is always available as a language extension.
__interrupt void isr(void);
int main()
{
    return 0;
}
The default mode is relaxed ANSI. This mode can be selected with the --relaxed_ansi (or -pr) option.
Relaxed ANSI mode accepts the broadest range of programs. It accepts all TI language extensions, even
those which conflict with ANSI/ISO, and ignores some ANSI/ISO violations for which the compiler can do
something reasonable. The GCC language extensions described in Section 7.14 are available in relaxed
ANSI/ISO mode.
7.14.1 Extensions
Most of the GCC language extensions are available in the TI compiler when compiling in relaxed ANSI
mode (--relaxed_ansi).
The extensions that the TI compiler supports are listed in Table 7-6, which is based on the list of
extensions found at the GNU web site. The shaded rows describe extensions that are not supported.
The format attribute is applied to the declarations of printf, fprintf, sprintf, snprintf, vprintf, vfprintf, vsprintf,
vsnprintf, scanf, fscanf, vfscanf, vscanf, vsscanf, and sscanf in stdio.h. Thus when GCC extensions are
enabled, the data arguments of these functions are type checked against the format specifiers in the
format string argument and warnings are issued when there is a mismatch. These warnings can be
suppressed in the usual ways if they are not desired.
See Section 7.9.19 for more about using the interrupt function attribute.
The malloc attribute is applied to the declarations of malloc, calloc, realloc and memalign in stdlib.h.
The packed attribute is supported for struct and union types. It is available only for target architectures
that have hardware support for unaligned access if the --relaxed_ansi option is used.
Members of a packed structure are stored as closely to each other as possible, omitting additional bytes of
padding usually added to preserve word-alignment. For example, assuming a word size of 4 bytes, the following
structure ordinarily has 3 bytes of padding between members c1 and i, and another 3 bytes of trailing padding
after member c2, leading to a total size of 12 bytes:
struct unpacked_struct { char c1; int i; char c2;};
However, the members of a packed struct are byte-aligned. Thus the following does not have any bytes of
padding between or after members and totals 6 bytes:
struct __attribute__((__packed__)) packed_struct { char c1; int i; char c2; };
Consequently, packed structures in an array are packed together without trailing padding between array
elements.
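The sizes quoted above can be checked with sizeof. The following sketch reuses the two structures from the text; the helper function names are hypothetical, and the results assume a 4-byte int with 4-byte natural alignment and a compiler that accepts the GCC-style packed attribute.

```c
#include <stddef.h>

struct unpacked_struct { char c1; int i; char c2; };
struct __attribute__((__packed__)) packed_struct { char c1; int i; char c2; };

/* 12 bytes: 1 + 3 (padding) + 4 + 1 + 3 (trailing padding). */
size_t unpacked_size(void)
{
    return sizeof(struct unpacked_struct);
}

/* 6 bytes: 1 + 4 + 1, with no padding at all. */
size_t packed_size(void)
{
    return sizeof(struct packed_struct);
}

/* Distance in bytes between consecutive array elements; equal to
 * packed_size() because there is no trailing padding. */
size_t packed_array_stride(void)
{
    struct packed_struct a[2];
    return (size_t)((char *)&a[1] - (char *)&a[0]);
}
```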
Bit fields of a packed structure are bit-aligned. The byte alignment of adjacent struct members that are not
bit fields does not change. However, there are no bits of padding between adjacent bit fields.
The packed attribute can only be applied to the original definition of a structure or union type. It cannot be
applied with a typedef to a non-packed structure that has already been defined, nor can it be applied to
the declaration of a struct or union object. Therefore, any given structure or union type can only be packed
or non-packed, and all objects of that type will inherit its packed or non-packed attribute.
The packed attribute is not applied recursively to structure types that are contained within a packed
structure. Thus, in the following example the member s retains the same internal layout as in the first
example above. There is no padding between c and s, so s falls on an unaligned boundary:
struct __attribute__((__packed__)) outer_packed_struct { char c; struct unpacked_struct s; };
It is illegal to implicitly or explicitly cast the address of a packed struct member as a pointer to any non-
packed type except an unsigned char. In the following example, p1, p2, and the call to foo are all illegal.
void foo(int *param);
struct packed_struct ps;
However, it is legal to explicitly cast the address of a packed struct member as a pointer to an unsigned
char:
unsigned char *pc = (unsigned char *)&ps.i;
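One way to read a packed member without forming a pointer to a non-packed type is to go through an unsigned char pointer and copy the bytes. This is a sketch of that approach, not a documented TI idiom; the struct name packed_msg and the function read_packed_i are hypothetical.

```c
#include <string.h>

struct __attribute__((__packed__)) packed_msg { char c1; int i; char c2; };

/* Copies the (possibly unaligned) member i out byte by byte.
 * Only an unsigned char pointer is ever formed from its address. */
int read_packed_i(const struct packed_msg *ps)
{
    const unsigned char *pc = (const unsigned char *)&ps->i;
    int value;

    memcpy(&value, pc, sizeof value);
    return value;
}
```

Direct member access such as ps->i is also fine; the copy is only needed when the address of the member has to be passed around.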
The TI compiler also supports an unpacked attribute for an enumeration type to allow you to indicate that
the representation is to be an integer type that is no smaller than int; in other words, it is not packed.
The following statement initializes all the elements of the vector to the same value, which is 1 in this case:
ushort4 myushort4 = (ushort4)(1);
The value of myvec in the following function is not resolved until run-time:
void foo(int a, int b)
{
int2 myvec = (int2)(a, b); ...
}
Shorter vectors can be concatenated together to form longer vectors. In the following example, two int
variables are concatenated into an int2 variable:
void foo(int a, int b)
{
int2 myvec = (int2)(a, b);
...
}
The following example concatenates two int2 variables into an int4 variable, which is passed to an
external function:
extern void bar(int4 v4);
bar(myv4);
}
Operator   Description
-          negate
~          bitwise complement
!          logical not (integer vectors only)
The following example declares an int4 vector called pos_i4 and initializes it to the values 1, 2, 3, and 4. It
then uses the negate operator to initialize the values of another int4 vector, neg_i4, to the values -1, -2,
-3, and -4.
int4 pos_i4 = (int4)(1, 2, 3, 4);
int4 neg_i4 = -pos_i4;
Operator               Description
+, -, *, /             arithmetic operators (also supported for complex vectors)
=, +=, -=, *=, /=      assignment operators
%                      modulo operator (integer vectors only)
&, |, ^, <<, >>        bitwise operators
>, >=, ==, !=, <=, <   relational operators
++, --                 increment / decrement operators (prefix and postfix; integer vectors only; also
                       supported for the real portion of complex vectors)
&&, ||                 logical operators (integer vectors only)
The following example uses the =, ++, and + operators on vectors of type int4. Assume that the iv4
argument initially contains (1, 2, 3, 4). On exit from foo(), iv4 will contain (3, 4, 5, 6).
void foo(int4 iv4)
{
int4 local_iva = iv4++; /* local_iva = (1, 2, 3, 4) */
int4 local_ivb = iv4++; /* local_ivb = (2, 3, 4, 5) */
The arithmetic operators and increment / decrement operators can be used with complex vector types.
The increment / decrement operators add or subtract by 1+0i.
The following example multiplies and divides complex vectors of type cfloat2. For details about the rules
for complex multiplication and division, please see Annex G of the C99 C language specification.
void foo()
{
cfloat2 va = (cfloat2) (1.0, -2.0, 3.0, -4.0);
cfloat2 vb = (cfloat2) (4.0, -2.0, -4.0, 2.0);
On C64+ and C6740, the * and / operators in the previous example call a built-in function to perform the
complex multiply and divide operations. On C6600, the compiler generates a CMPYSP instruction to carry
out the complex multiply or divide operation.
194 TMS320C6000 C/C++ Language Implementation SPRUI04B – May 2017
Submit Documentation Feedback
Copyright © 2017, Texas Instruments Incorporated
www.ti.com Operations and Functions for Vector Data Types
.s0, .s1, ..., .s9, .sa, ..., .sf— Access one of up to 16 elements in a vector.
uchar16 ucvec16 = (uchar16)(1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16 );
uchar8 ucvec8 = (uchar8)(2, 4, 6, 8, 10, 12, 14, 16);
.even, .odd— Access the even or odd elements of a vector, where the zeroth element is even.
ushort4 usvec4 = (ushort4)(1, 2, 3, 4);
ushort2 usvecodd = usvec4.odd; /* usvecodd = (ushort2)(2, 4); */
ushort2 usveceven = usvec4.even; /* usveceven = (ushort2)(1, 3); */
.hi, .lo— Access the elements in the upper half of a vector with .hi or the elements in the lower half of a
vector with .lo.
ushort8 usvec8 = (ushort8)(1, 2, 3, 4, 5, 6, 7, 8);
ushort4 usvechi = usvec8.hi; /* usvechi = (ushort4)(5, 6, 7, 8); */
ushort4 usveclo = usvec8.lo; /* usveclo = (ushort4)(1, 2, 3, 4); */
.r— Access the real parts of each of the elements in a complex type vector.
cfloat2 cfa = (cfloat2)(1.0, -2.0, 3.0, -4.0);
float2 rfa = cfa.r; /* rfa = (float2)(1.0, 3.0); */
.i— Access the imaginary parts of each of the elements in a complex type vector.
cfloat2 cfa = (cfloat2)(1.0, -2.0, 3.0, -4.0);
float2 ifa = cfa.i; /* ifa = (float2)(-2.0, -4.0); */
Swizzle operators can be combined to access a subset of the subset of elements. The result of the
combination must be well-defined. For example, after the following code runs, usvec4 contains (1, 2, 5, 4).
ushort4 usvec4 = (ushort4)(1, 2, 3, 4);
usvec4.hi.even = 5;
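The element selection in this example can be modeled in plain C with an ordinary array, where index 0 corresponds to .s0. The function below is a hypothetical model for illustration only; it is not how the compiler implements swizzles.

```c
/* Plain-C model of "vec.hi.even = value" for an n-element vector
 * stored as an array (n is even; vec[0] corresponds to .s0). */
void set_hi_even(unsigned short *vec, int n, unsigned short value)
{
    int half = n / 2;                 /* .hi selects elements half .. n-1 */
    int i;

    for (i = 0; i < half; i += 2) {   /* .even selects elements 0, 2, ... of that half */
        vec[half + i] = value;
    }
}
```

For a 4-element vector, .hi is elements 2 and 3, and the only even element of that half is element 2, which is why (1, 2, 3, 4) becomes (1, 2, 5, 4).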
If the data stored by a vector element is outside the range of values that can be stored in the destination
type, by default the value is truncated. However, if you add the _sat modifier (for "saturated") to the
function name, values that are outside the range of the destination type are set to the maximum value of
the destination type (or minimum for negative values outside the range). The _sat modifier cannot be used
when converting an integer vector to a floating-point vector. In the following example, any values larger
than 32,767 stored in an element of myint4 are set to 32,767 in the corresponding element of
myshort4.
int4 myint4;
Likewise, when converting between floating-point and integer vectors, one of the following modifiers can
be added to the function name to specify how the floating-point values should be rounded:
• _rte — Round to nearest even integer
• _rtz — Round toward zero (default for converting floats to integers)
• _rtp — Round toward positive infinity
• _rtn — Round toward negative infinity
When converting an integer type to a float, rounding is not necessary. Converting from a larger float type
(double) to a smaller float type (float) does require rounding. The default rounding method for double to
float conversions is _rte.
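The rounding modifiers listed above, together with the _sat clamping, can be modeled for a single scalar element in portable C. This is a sketch of the arithmetic only: all function names are hypothetical, none of them are the compiler's built-in conversion functions, and the input is assumed to fit in the range of int.

```c
#include <limits.h>

/* _rtz: round toward zero. A C float-to-int cast already truncates. */
int round_rtz(float x)
{
    return (int)x;
}

/* _rtn: round toward negative infinity. */
int round_rtn(float x)
{
    int i = (int)x;
    return ((float)i > x) ? i - 1 : i;
}

/* _rtp: round toward positive infinity. */
int round_rtp(float x)
{
    int i = (int)x;
    return ((float)i < x) ? i + 1 : i;
}

/* _rte: round to nearest, with ties going to the even integer. */
int round_rte(float x)
{
    int lo = round_rtn(x);
    float frac = x - (float)lo;       /* fractional part, in [0, 1) */

    if (frac > 0.5f) return lo + 1;
    if (frac < 0.5f) return lo;
    return (lo % 2 == 0) ? lo : lo + 1;
}

/* _sat: clamp to the range of the destination type (short here). */
short saturate_to_short(int v)
{
    if (v > SHRT_MAX) return SHRT_MAX;
    if (v < SHRT_MIN) return SHRT_MIN;
    return (short)v;
}

/* Scalar model of a float-to-short conversion using _sat and _rte. */
short model_convert_short_sat_rte(float x)
{
    return saturate_to_short(round_rte(x));
}
```

For example, 2.5 rounds to 2 and 3.5 rounds to 4 under _rte, while _rtp maps 1.2 to 2 and _rtn maps -1.2 to -2.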
The following example converts the data stored in a float4 to an int4. It uses the _rtp modifier, so values
are rounded up toward positive infinity:
float4 myfloat4;
The _sat modifier can be combined with a rounding modifier. The following example rounds floating values
toward even and sets values greater than the maximum possible short value to the maximum value:
float4 myfloat4;
If vector data types are enabled, you can also use the convert_<type>() functions for scalar (non-vector)
types such as short and int. The result is the same as type casting the source type. Conversions between
scalar types and vector types are not allowed, because the source and destination types must contain the
same number of elements.
The convert_<destination type>() functions are not available for use with complex vector types.
If the sizes of the source and destination types are different, an error occurs.
If vector data types are enabled, you can also use the as_<type>() functions for scalar (non-vector) types.
The types must have the same number of bits. The following example re-interprets a float value as an int
value. Since the float value of 1.0 is represented in hex as 0x3f800000, the value in the resulting int is
1,065,353,216.
float myfloat = 1.0f;
int myint = as_int(myfloat);
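The same reinterpretation can be reproduced in portable C for comparison. The helper float_bits below is hypothetical and is not the as_int() built-in; it assumes a 32-bit IEEE-754 float.

```c
#include <stdint.h>
#include <string.h>

/* Returns the bit pattern of f as an unsigned 32-bit integer.
 * The bytes are copied, not converted, so 1.0f yields 0x3f800000
 * rather than the integer 1. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

This contrasts with a type cast: (int)1.0f converts the value and yields 1, whereas reinterpreting the bits yields 1,065,353,216.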
The as_<destination type>() functions are not available for use with complex vector types.
Prototypes for all the supported vector built-in functions are listed in the "c6x_vec.h" header file, which is
located in the "include" sub-directory of your C6000 Code Generation Tools installation. See the
"c6x_vec.h" file for a complete list of the vector built-in functions.
The following example vbif_ex.c file uses the __add_sat() and __sub_sat() built-in functions with vectors:
#include <stdio.h>
#include <c6x_vec.h>
int main()
{
    short4 va = (short4) (1, 2, 3, -32766);
    short4 vb = (short4) (5, 32767, -13, 17);
    short4 vc = va + vb;
    short4 vd = va - vb;
    short4 ve = __add_sat(va, vb);
    short4 vf = __sub_sat(va, vb);
    print_short4("va=", va);
    print_short4("vb=", vb);
    print_short4("vc=(va+vb)=", vc);
    print_short4("vd=(va-vb)=", vd);
    print_short4("ve=__add_sat(va,vb)=", ve);
    print_short4("vf=__sub_sat(va,vb)=", vf);
    return 0;
}
Note that the lnk.cmd file contains a reference to rts6400.lib. The rts6400.lib library contains
c6x_veclib.obj, which defines the built-in functions, __add_sat() and __sub_sat().
Running this example produces the following output:
va= <1, 2, 3, -32766>
vb= <5, 32767, -13, 17>
vc=(va+vb)= <6, -32767, -10, -32749>
vd=(va-vb)= <-4, -32765, 16, 32753>
ve=__add_sat(va,vb)= <6, 32767, -10, -32749>
vf=__sub_sat(va,vb)= <-4, -32765, 16, -32768>
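The difference between the wrapped results (vc, vd) and the saturated results (ve, vf) can be reproduced one element at a time in portable C. The functions add_sat16 and sub_sat16 below are hypothetical scalar models, not the __add_sat() and __sub_sat() built-ins; they assume a 16-bit short and a wider int.

```c
#include <limits.h>

/* Scalar model of a saturating 16-bit add: compute in int, then clamp. */
short add_sat16(short a, short b)
{
    int sum = (int)a + (int)b;

    if (sum > SHRT_MAX) return SHRT_MAX;
    if (sum < SHRT_MIN) return SHRT_MIN;
    return (short)sum;
}

/* Scalar model of a saturating 16-bit subtract. */
short sub_sat16(short a, short b)
{
    int diff = (int)a - (int)b;

    if (diff > SHRT_MAX) return SHRT_MAX;
    if (diff < SHRT_MIN) return SHRT_MIN;
    return (short)diff;
}
```

With these models, 2 + 32767 clamps to 32767 (the second element of ve) and -32766 - 17 clamps to -32768 (the fourth element of vf), where the plain operators wrap around to -32767 and 32753.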
Run-Time Environment
This chapter describes the TMS320C6000 C/C++ run-time environment. To ensure successful execution
of C/C++ programs, it is critical that all run-time code maintain this environment. It is also important to
follow the guidelines in this chapter if you write assembly language functions that interface with C/C++
code.
8.1.1 Sections
The compiler produces relocatable blocks of code and data called sections. The sections are allocated
into memory in a variety of ways to conform to a variety of system configurations. For more information
about sections and allocating them, see the introductory object file information in the TMS320C6000
Assembly Language Tools User's Guide.
There are two basic types of sections:
• Initialized sections contain data or executable code. Initialized sections are usually, but not always,
read-only. The C/C++ compiler creates the following initialized sections:
– The .args section contains the command argument for a host-based loader. This section is read-
only. See the --arg_size option for details.
– The .binit section contains boot time copy tables. This is a read-only section. For details on BINIT,
see the TMS320C6000 Assembly Language Tools User's Guide for linker command file
information.
– The .cinit section is created only if you are using the --rom_model option. It contains tables for
explicitly initialized global and static variables.
– The .init_array section contains the table for calling global constructors.
– The .ovly section contains copy tables other than boot time (.binit) copy tables. This is a read-only
section.
– The .c6xabi.exidx section contains the index table for exception handling. The .c6xabi.extab
section contains stack unwinding instructions for exception handling. These sections are read-only.
See the --exceptions option for details.
– The .name.load section contains the compressed image of section name. This section is read-
only. See the TMS320C6000 Assembly Language Tools User's Guide for information on copy
tables.
– The .ppinfo section contains correlation tables and the .ppdata section contains data tables for
compiler-based profiling. See the --gen_profile_info option for details.
– The .const section contains string literals, floating-point constants, and data defined with the
C/C++ qualifiers far and const (provided the constant is not also defined as volatile). This is a read-
only section. String literals are placed in the .const:.string subsection to enable greater link-time
placement control.
– The .fardata section reserves space for non-const, initialized far global and static variables.
– The .neardata section reserves space for non-const, initialized near global and static variables.
– The .rodata section reserves space for const near global and static variables.
– The .switch section contains jump tables for large switch statements.
– The .text section contains all the executable code and compiler-generated constants. This section
is usually read-only.
– The .TI.crctab section contains CRC checking tables. This is a read-only section.
• Uninitialized sections reserve space in memory (usually RAM). A program can use this space at run
time to create and store variables. The compiler creates the following uninitialized sections:
– The .bss section reserves space for uninitialized global and static variables. Uninitialized variables
that are also unused are usually created as common symbols (unless you specify --common=off)
instead of being placed in .bss so that they can be excluded from the resulting application.
– The .far section reserves space for global and static variables that are declared far.
– The .stack section reserves memory for the system stack.
– The .sysmem section reserves space for dynamic memory allocation. The reserved space is used
by dynamic memory allocation routines, such as malloc, calloc, realloc, or new. If a C/C++ program
does not use these functions, the compiler does not create the .sysmem section.
The assembler creates the default sections .text, .bss, and .data. You can instruct the compiler to create
additional sections by using the CODE_SECTION and DATA_SECTION pragmas (see Section 7.9.4 and
Section 7.9.7).
Stack Overflow
NOTE: The compiler provides no means to check for stack overflow during compilation or at run
time. A stack overflow disrupts the run-time environment, causing your program to fail. Be
sure to allow enough space for the stack to grow. You can use the --entry_hook option to
add code to the beginning of each function to check for stack overflow; see Section 3.16.
The --mem_model:data options do not affect the access to objects explicitly declared with the near or far
keyword.
By default, all run-time-support data is defined as far.
For more information on near and far accesses to data, see Section 7.5.5.
Consts that are declared far, either explicitly through the far keyword or implicitly using
--mem_model:const, are always placed in the .const section.
Register storage of char and short values (bit 31 is most significant, bit 0 is least significant):
  Signed 8-bit char:      bits 31-7 = S (sign extension), bits 6-0 = I
  Unsigned 8-bit char:    bits 31-8 = 0, bits 7-0 = U
  Signed 16-bit short:    bits 31-15 = S (sign extension), bits 14-0 = I
  Unsigned 16-bit short:  bits 31-16 = 0, bits 15-0 = U
LEGEND: S = sign, I = signed integer, U = unsigned integer, MS = most significant, LS = least significant
8.2.1.2 enum, int, and long Data Types (signed and unsigned)
The int and unsigned int data types are stored in memory as 32-bit objects (see Figure 8-2). Objects of
these types are loaded to and stored from bits 0-31 of a register. In big-endian mode, 4-byte objects are
loaded to registers by moving the first byte (that is, the lower address) of memory to bits 24-31 of the
register, moving the second byte of memory to bits 16-23, moving the third byte to bits 8-15, and moving
the fourth byte to bits 0-7. In little-endian mode, 4-byte objects are loaded to registers by moving the first
byte (that is, the lower address) of memory to bits 0-7 of the register, moving the second byte to bits 8-15,
moving the third byte to bits 16-23, and moving the fourth byte to bits 24-31.
For details about the size of an enum type, see Section 7.4.1.
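As an illustration of the byte ordering described above, the following host-side C sketch assembles a 32-bit register value from four consecutive bytes of memory in each mode. The function names load_be32 and load_le32 are hypothetical; they are not part of the compiler or the run-time-support library.

```c
#include <stdint.h>

/* Hypothetical helper: big-endian load. The byte at the lowest address
   ends up in bits 24-31 of the result. */
uint32_t load_be32(const uint8_t *mem)
{
    return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8)  |  (uint32_t)mem[3];
}

/* Hypothetical helper: little-endian load. The byte at the lowest address
   ends up in bits 0-7 of the result. */
uint32_t load_le32(const uint8_t *mem)
{
    return  (uint32_t)mem[0]        | ((uint32_t)mem[1] << 8) |
           ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}
```

With the bytes 0x12, 0x34, 0x56, 0x78 stored at increasing addresses, the big-endian helper yields 0x12345678 and the little-endian helper yields 0x78563412.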
[Figures: register storage formats. The recoverable panels show the even register holding the least
significant (LS) 32 bits of the object in bits 31-0, as a signed integer (I), an unsigned integer (U), or
mantissa bits (M).]
LEGEND: S = sign, I = signed integer, U = unsigned integer, M = mantissa, E = exponent, X = unused,
MS = most significant, LS = least significant
The parameter f is the pointer to the member function if it is nonvirtual. The 0 is the offset to the virtual
function pointer within the class object. The parameter d is the offset to be added to the beginning of the
class object for this pointer.
Members of structs have sizes and alignments equal to those they would have as independent objects,
unless the packed attribute is used. An array member of a struct is aligned to the alignment of its element
type; this may differ from the alignment the element would have if it were an independent top-level (static)
object.
Structs always have size equal to a multiple of the struct alignment. This sometimes requires padding after
the last member to round the size up to a multiple of the struct alignment. The size of a structure includes
any necessary padding between members. For example, if the largest member of a struct is of type float,
the size of the struct will be a multiple of 32 bits.
Static scope arrays (sometimes called top-level arrays) are aligned on an 8-byte (64-bit) boundary.
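The size and padding rules for structs can be checked with sizeof and offsetof. The following is a minimal sketch, assuming a C11 compiler (for _Alignof); the struct and function names are hypothetical, and the exact sizes depend on the target.

```c
#include <stddef.h>

/* Hypothetical struct: the float member typically forces 4-byte alignment,
   so padding is inserted after 'tag' and after 'flag'. */
struct sample {
    char  tag;
    float value;
    char  flag;
};

size_t sample_size(void)         { return sizeof(struct sample); }
size_t sample_align(void)        { return _Alignof(struct sample); }
size_t sample_value_offset(void) { return offsetof(struct sample, value); }
```

Whatever the target, sample_size() is a multiple of sample_align(), and the offset of value is a multiple of the alignment of float.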
A0 represents the least significant bit of the field A; A1 represents the next least significant bit, etc. Again,
storage of bit fields in memory is done with a byte-by-byte, rather than bit-by-bit, transfer.
Big-endian memory
Byte 0 Byte 1 Byte 2 Byte 3
A A A A A A A B B B B B B B B B B C C C D D E E E E E E E E E X
6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 2 1 0 1 0 8 7 6 5 4 3 2 1 0 X
Little-endian register
MS LS
X E E E E E E E E E D D C C C B B B B B B B B B B A A A A A A A
X 8 7 6 5 4 3 2 1 0 1 0 2 1 0 9 8 7 6 5 4 3 2 1 0 6 5 4 3 2 1 0
31 0
Little-endian memory
Byte 0 Byte 1 Byte 2 Byte 3
B A A A A A A A B B B B B B B B E E D D C C C B X E E E E E E E
0 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 1 0 1 0 2 1 0 9 X 8 7 6 5 4 3 2
LEGEND: X = not used, MS = most significant, LS = least significant
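The field widths shown in the figure (A: 7 bits, B: 10 bits, C: 3 bits, D: 2 bits, E: 9 bits, with one unused bit) correspond to a declaration such as the one in the following sketch. The struct tag and the function name are hypothetical, and unsigned fields are used here only to make the read-back value unambiguous.

```c
/* Hypothetical declaration matching the field widths in the figure. */
struct fields {
    unsigned int A : 7;
    unsigned int B : 10;
    unsigned int C : 3;
    unsigned int D : 2;
    unsigned int E : 9;   /* 31 bits used; 1 bit left unused (X) */
};

/* Set every field to its maximum value, then read back field B. */
unsigned int max_b(void)
{
    struct fields x;
    x.A = 0x7F;
    x.B = 0x3FF;
    x.C = 0x7;
    x.D = 0x3;
    x.E = 0x1FF;
    return x.B;
}
```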
All other control registers are not saved or restored by the compiler.
The compiler assumes that control registers not listed in Table 8-2 that can have an effect on compiled
code have default values. For example, the compiler assumes all circular addressing-enabled registers
are set for linear addressing (the AMR is used to enable circular addressing). Enabling circular addressing
and then calling a C/C++ function without restoring the AMR to a default setting violates the calling
convention. You must be certain that control registers which affect compiler-generated code have a default
value when calling a C/C++ function from assembly.
Assembly language programmers must be aware that the linker assumes B15 contains the stack pointer.
The linker needs to save and restore values on the stack in trampoline code that it generates. If you do
not use B15 as the stack pointer in assembly code, you should use the linker option that disables
trampolines, --trampolines=off. Otherwise, trampolines could corrupt memory and overwrite register
values.
A4 A4 B4 A6
int func2( int a, float b, int c, struct A d, float e, int f, int g);
A4         A4     B4       A6     B6          A8       B8     A10
int func3( int a, double b, float c, long double d);
A4         A4     B5:B4     A6       B7:B6
/*NOTE: The following function has a variable number of arguments. */
int vararg(int a, int b, int c, int d, ...);
A4         A4     B4     A6     stack
struct A func4( int y);
A3 A4
__x128_t func5( __x128_t a);
A7:A6:A5:A4 A7:A6:A5:A4
void func6(int a, int b, __x128_t c);
A4 B4 A11:A10:A9:A8
void func7(int a, int b, __x128_t c, int d, int e, int f, __x128_t g, int h);
You must be careful to declare functions properly that accept structure arguments, both at the point
where they are called (so that the structure argument is passed as an address) and at the point where
they are declared (so the function knows to copy the structure to a local copy).
5. The called function executes the code for the function.
6. If the called function returns any integer, pointer, or float type, the return value is placed in the A4
register. If the function returns a double, long double, long, or long long type, the value is placed in the
A5:A4 register pair. For C6600 if the function returns a __x128_t, the value is placed in A7:A6:A5:A4.
If the function returns a structure, the caller allocates space for the structure and passes the address of
the return space to the called function in A3. To return a structure, the called function copies the
structure to the memory block pointed to by the extra argument.
In this way, the caller can be smart about telling the called function where to return the structure. For
example, in the statement s = f(x), where s is a structure and f is a function that returns a structure, the
caller can actually make the call as f(&s, x). The function f then copies the return structure directly into
s, performing the assignment automatically.
If the caller does not use the return structure value, an address value of 0 can be passed as the first
argument. This directs the called function not to copy the return structure.
You must be careful to declare functions properly that return structures, both at the point where they
are called (so that the extra argument is passed) and at the point where they are declared (so the
function knows to copy the result).
7. Any register numbered A10 to A15 or B10 to B15 that was saved in Step 4 is restored.
8. If A15 was used as a frame pointer (FP), the old value of A15 is restored from the stack. The space
allocated for the function in Step 1 is reclaimed at the end of the function by adding a constant to
register B15 (SP).
9. The function returns by jumping to the value of the return register (B3) or the saved value of the return
register.
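The structure-return mechanism in Step 6 can be pictured as a source-level rewrite. The following sketch is conceptual only: the compiler performs this transformation internally (passing the return address in A3), and the names point, make_point, and make_point_lowered are hypothetical.

```c
struct point { int x; int y; int z; };

/* What the programmer writes: a function returning a structure. */
struct point make_point(int v)
{
    struct point p = { v, v + 1, v + 2 };
    return p;
}

/* Conceptual equivalent: the caller supplies the address of the return
   space as an extra first argument. An address of 0 tells the function
   not to copy the result. */
void make_point_lowered(struct point *ret, int v)
{
    struct point p = { v, v + 1, v + 2 };
    if (ret != 0)
        *ret = p;
}
```

In these terms, the statement s = make_point(5) behaves like make_point_lowered(&s, 5), and a call whose result is ignored behaves like make_point_lowered(0, 5).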
code have default values. For example, the compiler assumes all circular-addressing-enabled registers
are set for linear addressing (the AMR is used to enable circular addressing). Enabling circular
addressing and then calling a C/C++ function without restoring the AMR to a default setting violates the
calling convention. Also, enabling circular addressing and having interrupts enabled violates the calling
convention. You must be certain that control registers that affect compiler-generated code have a
default value when calling a C/C++ function from assembly.
• Assembly language programmers must be aware that the linker assumes B15 contains the stack
pointer. The linker needs to save and restore values on the stack in trampoline code that it generates.
If you do not use B15 as the stack pointer in your assembly code, you should use the linker option that
disables trampolines, --trampolines=off. Otherwise, trampolines could corrupt memory and overwrite
register values.
• Assembly code that utilizes B14 and/or B15 for localized purposes other than the data-page pointer
and stack pointer may violate the calling convention. The assembly programmer needs to protect these
areas of non-standard use of B14 and B15 by turning off interrupts around this code. Because interrupt
handling routines need the stack (and thus assume the stack pointer is in B15) interrupts need to be
turned off around this code. Furthermore, because interrupt service routines may access global data
and may call other functions which access global data, this special treatment also applies to B14. After
the data-page pointer and stack pointer have been restored, interrupts may be turned back on.
extern "C" {
extern int asmfunc(int a); /* declare external asm function */
int gvar = 0; /* define global variable */
}
void main()
{
int I = 5;
.global asmfunc
.global gvar
asmfunc:
LDW *+b14(gvar),A3
NOP 4
ADD a3,a4,a3
STW a3,*b14(gvar)
MV a3,a4
B b3
NOP 5
In the C++ program in Example 8-1, the extern declaration of asmfunc is optional because the return type
is int. Like C/C++ functions, you need to declare assembly functions only if they return noninteger values
or pass noninteger parameters.
NOTE: SP Semantics
The stack pointer must always be 8-byte aligned. This is automatically performed by the C
compiler and system initialization code in the run-time-support libraries. Any hand-written
assembly code that has interrupts enabled or calls a function defined in C or linear assembly
source should also reserve a multiple of 8 bytes on the stack.
Because you are referencing only the symbol's value as stored in the symbol table, the symbol's declared
type is unimportant. In Example 8-5, int is used. You can reference linker-defined symbols in a similar
manner.
Table 8-4 provides a summary of the C6000 intrinsics clarifying which devices support which intrinsics.
The intrinsics listed in Table 8-5 can be used on all C6000 devices. They correspond to the indicated
C6000 assembly language instruction(s). See the TMS320C6000 CPU and Instruction Set Reference
Guide for more information.
See Table 8-6 for a list of intrinsics that are specific to C6740 and C6600. See Table 8-7 for a list of
C6600-specific intrinsics.
Some items listed in the following tables are actually defined in the c6x.h header file as macros that point
to intrinsics. This header file is provided in the compiler's "include" directory. Your code must include this
header file in order to use the noted macros.
(1)
See the TMS320C6000 Programmer's Guide for more information.
(2)
See Section 8.6.10 for details on manipulating 8-byte data quantities.
The intrinsics listed in Table 8-6 can be used for C6740 and C6600 devices, but not C6400+ devices. The
intrinsics listed correspond to the indicated C6000 assembly language instruction(s). See the
TMS320C6000 CPU and Instruction Set Reference Guide for more information.
See Table 8-5 for a list of generic C6000 intrinsics. See Table 8-7 for a list of C6600-specific intrinsics.
(1)
See Section 8.6.10 for details on manipulating 8-byte data quantities.
The intrinsics listed in Table 8-7 are supported only for C6600 devices. These intrinsics are in addition to
those listed in Table 8-5 and Table 8-6. The intrinsics listed correspond to the indicated assembly
language instruction(s). See the TMS320C6000 CPU and Instruction Set Reference Guide for more
information.
#include <c6x.h>
#include <stdio.h>
__x128_t mpy_four_way_example(__x128_t s, int a, int b, int c, int d)
{
__x128_t t = _ito128(a, b, c, d); // Pack values into a __x128_t
__x128_t results = _qmpy32(s, t); // Perform a four-way SIMD multiply
return results;
}
The _disable_interrupts() and _enable_interrupts( ) intrinsics both return an unsigned int that can be
subsequently passed to _restore_interrupts( ) to restore the previous interrupt state. These intrinsics
provide a barrier to optimization and are therefore appropriate for implementing a critical (or atomic)
section. For example,
unsigned int restore_value;
restore_value = _disable_interrupts();
if (sem) sem--;
_restore_interrupts(restore_value);
The example code disables interrupts so that the value of sem read for the conditional clause does not
change before the modification of sem in the then clause. The intrinsics are barriers to optimization, so the
memory reads and writes of sem do not cross the _disable_interrupts or _restore_interrupts locations.
Overwrites CSR
NOTE: The _restore_interrupts( ) intrinsic overwrites the CSR control register with the value in the
argument. Any CSR bits changed since the _disable_interrupts( ) intrinsic or
_enable_interrupts( ) intrinsic will be lost.
8.6.11 Using MUST_ITERATE and _nassert to Enable SIMD and Expand Compiler
Knowledge of Loops
Through the use of MUST_ITERATE and _nassert, you can guarantee that a loop executes a certain
number of times.
This example tells the compiler that the loop is guaranteed to run exactly 10 times:
#pragma MUST_ITERATE(10,10);
for (I = 0; I < trip_count; I++) { ...
MUST_ITERATE can also be used to specify a range for the trip count as well as a factor of the trip count.
For example:
#pragma MUST_ITERATE(8,48,8);
for (I = 0; I < trip; I++) { ...
This example tells the compiler that the loop executes between 8 and 48 times and that the trip variable is
a multiple of 8 (8, 16, 24, 32, 40, 48). The compiler can use this information to generate a better loop,
for example by unrolling more effectively, even when the --interrupt_threshold=n option is used to specify
that interrupts occur every n cycles.
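A complete function using the pragma might look like the following sketch. The function name sum_samples is hypothetical. The pragma is guarded by the TI-predefined macro __TI_COMPILER_VERSION__ so that the same source also builds with other compilers, which do not recognize MUST_ITERATE.

```c
/* Sums n 16-bit samples. The caller promises that n is a multiple of 8
   in the range 8 to 48; the pragma passes that promise to the compiler. */
int sum_samples(const short *data, int n)
{
    int i;
    int sum = 0;
#ifdef __TI_COMPILER_VERSION__
    #pragma MUST_ITERATE(8, 48, 8)
#endif
    for (i = 0; i < n; i++)
        sum += data[i];
    return sum;
}
```

The pragma is a promise, not a check: calling this function with a trip count outside the stated range on a TI build can produce incorrect results.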
The TMS320C6000 Programmer's Guide states that one of the ways to refine C/C++ code is to use word
accesses to operate on 16-bit data stored in the high and low parts of a 32-bit register. That guide shows
examples that cast to int pointers and use intrinsics such as _mpyh to select specific instructions. This
can be automated by using the _nassert() intrinsic to specify that 16-bit short arrays are aligned on a
32-bit (word) boundary.
The following examples generate the same assembly code:
• Example 1
int dot_product(short *x, short *y, short z)
{
    int *w_x = (int *)x;
    int *w_y = (int *)y;
    int sum1 = 0, sum2 = 0, i;
    for (i = 0; i < z/2; i++)
    {
        sum1 += _mpy(w_x[i], w_y[i]);
        sum2 += _mpyh(w_x[i], w_y[i]);
    }
    return (sum1 + sum2);
}
• Example 2
int dot_product(short *x, short *y, short z)
{
int sum = 0, I;
The following subsections describe methods you can use to ensure the data referenced by ptr is aligned.
You have to employ one of these methods at every place in your code where f() is called.
Such an array is automatically aligned to an 8-byte boundary. This is true whether the array is global,
static, or local. This automatic alignment is all that is required to achieve SIMD optimization on those
respective devices. You still need to include the _nassert because, in the general case, the compiler
cannot guarantee that ptr holds the address of a properly aligned array.
If you always pass the base address of an array to pointers like ptr, then you can use the following macro
to reflect that fact.
#if defined(_TMS320C6600)
#define ALIGNED_ARRAY(ptr) _nassert((int) ptr % 16 == 0)
#elif defined(_TMS320C6400_PLUS) || defined(_TMS320C6740)
#define ALIGNED_ARRAY(ptr) _nassert((int) ptr % 8 == 0)
#else
#define ALIGNED_ARRAY(ptr) /* empty */
#endif
The macro works regardless of which C6000 device you build for, or if you port the code to another target.
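As a usage sketch, the macro is placed at the top of a function whose pointer argument always receives the base address of an array. The macro definition is repeated here so the fragment is self-contained; the function name scale_by_two is hypothetical.

```c
#if defined(_TMS320C6600)
#define ALIGNED_ARRAY(ptr) _nassert((int) ptr % 16 == 0)
#elif defined(_TMS320C6400_PLUS) || defined(_TMS320C6740)
#define ALIGNED_ARRAY(ptr) _nassert((int) ptr % 8 == 0)
#else
#define ALIGNED_ARRAY(ptr) /* empty */
#endif

/* Hypothetical kernel: doubles each element in place. Callers must pass
   the base address of an array so that the alignment assertion holds. */
void scale_by_two(short *ptr, int n)
{
    int i;
    ALIGNED_ARRAY(ptr);
    for (i = 0; i < n; i++)
        ptr[i] = (short)(ptr[i] * 2);
}
```

On targets other than those named in the macro, ALIGNED_ARRAY expands to nothing, so the function still compiles and behaves identically, just without the alignment hint.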
This code passes an unaligned address to ptr, thus violating the presumption coded in the _nassert().
There is no direct remedy for this case. Avoid this practice whenever possible.
To get a stricter alignment, use the function memalign with the desired alignment. To get an alignment of
256 bytes for example:
buffer = memalign(256, 100 * sizeof(short));
If you are using BIOS memory allocation routines, be sure to pass the alignment factor as the last
argument using the syntax that follows:
See the TMS320C6000 DSP/BIOS Help for more information about BIOS memory allocation routines and
the segid parameter in particular.
struct s
{
...
short buf1[50];
...
} g;
...
f(g.buf1);
class c
{
public :
short buf1[50];
void mfunc(void);
...
};
void c::mfunc()
{
f(buf1);
...
}
To align an array in a structure, place it inside a union with a dummy object that has the desired
alignment. If you want 8 byte alignment, use a "long long" dummy field. For example:
struct s
{
union u
{ long long dummy; /* 8-byte alignment */
short buffer[50]; /* also 8-byte alignment */
} u;
...
};
If you want to declare several arrays contiguously, and maintain a given alignment, you can do so by
keeping the array size, measured in bytes, an exact multiple of the desired alignment. For example:
struct s
{
    long long dummy; /* 8-byte alignment */
    short buf1[50];  /* also 8-byte alignment */
    short buf2[50];  /* 4-byte alignment */
    ...
};
Because the size of buf1 is 50 * 2 bytes per short = 100 bytes, and 100 is a multiple of 4 but not of 8,
buf2 is only aligned on a 4-byte boundary. Padding buf1 out to 52 elements makes buf2 8-byte aligned.
Within a structure or class, there is no way to enforce an array alignment greater than 8. For the purposes
of SIMD optimization, this is not necessary.
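The effect of padding the first array can be verified with offsetof. The following sketch assumes that long long is 8 bytes and short is 2 bytes, as on C6000 and most hosts; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* buf1 occupies 100 bytes starting at offset 8, so buf2 starts at 108. */
struct unpadded {
    long long dummy;
    short buf1[50];
    short buf2[50];
};

/* buf1 padded to 52 elements (104 bytes), so buf2 starts at 112. */
struct padded {
    long long dummy;
    short buf1[52];
    short buf2[50];
};

size_t unpadded_buf2_offset(void) { return offsetof(struct unpadded, buf2); }
size_t padded_buf2_offset(void)   { return offsetof(struct padded, buf2); }
```

Under the stated size assumptions, the first offset is a multiple of 4 but not of 8, and the second is a multiple of 8.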
If a C/C++ interrupt routine does not call any other functions, only those registers that the interrupt handler
attempts to define are saved and restored. However, if a C/C++ interrupt routine does call other functions,
these functions can modify unknown registers that the interrupt handler does not use. For this reason, the
routine saves all usable registers if any other functions are called. Interrupts branch to the interrupt return
pointer (IRP). Do not call interrupt handling functions directly.
Interrupts can be handled directly with C/C++ functions by using the INTERRUPT pragma or the
__interrupt keyword. For more information, see Section 7.9.19 and Section 7.5.4, respectively.
You are responsible for handling the AMR control register and the SAT bit in the CSR correctly inside an
interrupt. By default, the compiler does not do anything extra to save/restore the AMR and the SAT bit.
Macros for handling the SAT bit and the AMR register are included in the c6x.h header file.
For example, you are using circular addressing in some hand assembly code (that is, the AMR does not
equal 0). This hand assembly code can be interrupted into a C code interrupt service routine. The C code
interrupt service routine assumes that the AMR is set to 0. You need to define a local unsigned int
temporary variable and call the SAVE_AMR and RESTORE_AMR macros at the beginning and end of
your C interrupt service routine to correctly save/restore the AMR inside the C interrupt service routine.
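#include <c6x.h>
__interrupt void isr(void) /* hypothetical name; the original declaration is not shown */
{
    unsigned int temp_amr;
    SAVE_AMR(temp_amr); /* save the AMR on entry; the ISR body follows */
    /* restore the AMR for your hand assembly code before exiting */
    RESTORE_AMR(temp_amr);
}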
If you need to save/restore the SAT bit (i.e. you were performing saturated arithmetic when interrupted
into the C interrupt service routine which may also perform some saturated arithmetic) in your C interrupt
service routine, it can be done in a similar way as the above example using the SAVE_SAT and
RESTORE_SAT macros.
The compiler saves and restores the ILC and RILC control registers if needed.
For floating-point architectures, you are responsible for handling the floating-point control registers
FADCR, FAUCR, and FMCR. If you are reading bits out of the floating-point control registers, and if the
interrupt service routine (or any called function) performs floating-point operations, then the relevant
floating-point control registers should be saved and restored. No macros are provided for these registers,
as simple assignment to and from an unsigned int temporary will suffice.
The compiler allocates the variables 'i' and 'a[]' to the .data section, and the initial values are placed
directly in that section.
.global i
.data
.align 4
i:
.field 23,32 ; i @ 0
.global a
.data
.align 4
a:
.field 1,32 ; a[0] @ 0
.field 2,32 ; a[1] @ 32
.field 3,32 ; a[2] @ 64
.field 4,32 ; a[3] @ 96
.field 5,32 ; a[4] @ 128
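The listing above is consistent with C source definitions such as those in the following sketch. The values are read from the .field directives; the exact original source is not shown here, so treat this as an assumed reconstruction.

```c
/* Assumed C definitions that would produce the .data contents above. */
int i = 23;
int a[5] = { 1, 2, 3, 4, 5 };
```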
Each compiled module that defines static or global variables contains these .data sections. The linker
treats the .data section like any other initialized section and creates an output section. In the load-time
initialization model, the sections are loaded into memory and used by the program. See Section 8.9.2.5.
In the run-time initialization model, the linker uses the data in these sections to create initialization data
and an additional compressed initialization table. The boot routine processes the initialization table to copy
data from load addresses to run addresses. See Section 8.9.2.3.
[Figure: autoinitialization at run time. The loader places the C auto init table and data (.cinit section)
in ROM; the boot routine then uses them to initialize the .data section in RAM, which is uninitialized at
load time.]
The linker-defined symbols __TI_CINIT_Base and __TI_CINIT_Limit point to the start and end of the
table, respectively. Each entry in this table corresponds to one output section that needs to be initialized.
The initialization data for each output section can use a different encoding.
The load address in the C auto initialization record points to initialization data with the following format:
The first 8-bits of the initialization data is the handler index. It indexes into a handler table to get the
address of a handler function that knows how to decode the following data.
The handler table is a list of 32-bit function pointers.
__TI_Handler_Table_Base:
32-bit handler 1 address
The encoded data that follows the 8-bit index can be in one of the following format types. For clarity, the
8-bit index is also depicted for each format.
8-bit index 24-bit padding 32-bit length (N) N byte initialization data (not compressed)
The compiler uses 24-bit padding to align the length field to a 32-bit boundary. The 32-bit length field
encodes the length of the initialization data in bytes (N). N byte initialization data is not compressed and is
copied to the run address as is.
The run-time support library has a function __TI_decompress_none() to process this type of initialization
data. The first argument to this function is the address pointing to the byte after the 8-bit index. The
second argument is the run address from the C auto initialization record.
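The layout of the uncompressed format can be illustrated with a small host-side handler. This is a sketch only, not the run-time-support implementation: the function name copy_uncompressed is hypothetical, and the length field is assumed to be stored in the native byte order of the machine running the code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the uncompressed-format handler.
   'in' points at the byte after the 8-bit handler index;
   'out' is the run address of the section being initialized. */
void copy_uncompressed(const unsigned char *in, unsigned char *out)
{
    uint32_t length;

    in += 3;                              /* skip the 24-bit padding      */
    memcpy(&length, in, sizeof length);   /* 32-bit length N (native order) */
    in += sizeof length;
    memcpy(out, in, length);              /* copy N bytes as is           */
}
```

The two arguments mirror the handler calling convention described above: the load data after the index, then the run address.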
The compiler uses 24-bit padding to align the length field to a 32-bit boundary. The 32-bit length field
encodes the number of bytes to be zero initialized.
The run-time support library has a function __TI_zero_init() to process the zero initialization. The first
argument to this function is the address pointing to the byte after the 8-bit index. The second argument is
the run address from the C auto initialization record.
The data following the 8-bit index is compressed using Run Length Encoded (RLE) format. This format uses a
simple run-length encoding that can be decompressed using the following algorithm:
1. Read the first byte, Delimiter (D).
2. Read the next byte (B).
3. If B != D, copy B to the output buffer and go to step 2.
4. Read the next byte (L).
   (a) If L == 0, then the length is either a 16-bit value or a 24-bit value, or the end of the data has
       been reached. Read the next byte (L).
       (i) If L == 0, the length is a 24-bit value or the end of the data has been reached. Read the next
           byte (L).
           - If L == 0, the end of the data is reached; go to step 7.
           - Else L <<= 16; read the next two bytes into the lower 16 bits of L to complete the 24-bit
             value for L.
       (ii) Else L <<= 8; read the next byte into the lower 8 bits of L to complete the 16-bit value for L.
   (b) Else if L > 0 and L < 4, copy D to the output buffer L times. Go to step 2.
   (c) Else, the length is the 8-bit value (L).
5. Read the next byte (C); C is the repeat character.
6. Write C to the output buffer L times; go to step 2.
7. End of processing.
The run-time support library has a routine __TI_decompress_rle24() to decompress data compressed
using RLE. The first argument to this function is the address pointing to the byte after the 8-bit index. The
second argument is the run address from the C auto initialization record.
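The decoding steps listed above can be sketched in portable C as follows. This is an illustration of the algorithm, not the source of __TI_decompress_rle24(): the function name rle_decode and its return value are hypothetical, and the order of the two trailing bytes of a 24-bit length (more significant byte first) is an assumption, because the text does not specify it.

```c
#include <stddef.h>

/* Hypothetical decoder following steps 1-7. Returns the number of bytes
   written to 'out', which must be large enough for the decoded data. */
size_t rle_decode(const unsigned char *in, unsigned char *out)
{
    unsigned char *start = out;
    unsigned char delim = *in++;                 /* step 1 */

    for (;;) {
        unsigned char b = *in++;                 /* step 2 */
        unsigned long len;
        unsigned char c;

        if (b != delim) {                        /* step 3: literal byte */
            *out++ = b;
            continue;
        }
        len = *in++;                             /* step 4 */
        if (len == 0) {
            len = *in++;
            if (len == 0) {
                len = *in++;
                if (len == 0)
                    break;                       /* step 7: end of data */
                len <<= 16;                      /* 24-bit length */
                len |= (unsigned long)in[0] << 8;   /* assumed byte order */
                len |= in[1];
                in += 2;
            } else {
                len <<= 8;                       /* 16-bit length */
                len |= *in++;
            }
        } else if (len < 4) {                    /* step 4(b): literal delimiters */
            while (len--)
                *out++ = delim;
            continue;
        }
        c = *in++;                               /* step 5 */
        while (len--)                            /* step 6 */
            *out++ = c;
    }
    return (size_t)(out - start);
}
```

Short runs of the delimiter itself (1 to 3 copies) are handled by step 4(b) without a repeat character, which is why run lengths below 4 never reach steps 5 and 6.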
The data following the 8-bit index is compressed using LZSS compression. The run-time support library
has the routine __TI_decompress_lzss() to decompress the data compressed using LZSS. The first
argument to this function is the address pointing to the byte after the 8-bit index. The second argument is
the run address from the C auto initialization record.
void auto_initialize()
{
    unsigned char **table_ptr;
    unsigned char **table_limit;
    /*--------------------------------------------------------------*/
    /* Check if Handler table has entries.                          */
    /*--------------------------------------------------------------*/
    if (&__TI_Handler_Table_Base >= &__TI_Handler_Table_Limit)
        return;
    /*--------------------------------------------------------------*/
    /* Get the Start and End of the CINIT Table.                    */
    /*--------------------------------------------------------------*/
    table_ptr = (unsigned char **)&__TI_CINIT_Base;
    table_limit = (unsigned char **)&__TI_CINIT_Limit;
    while (table_ptr < table_limit)
    {
        /*----------------------------------------------------------*/
        /* 1. Get the Load and Run address.                         */
        /* 2. Read the 8-bit index from the load address.           */
        /* 3. Get the handler function pointer using the index from */
        /*    handler table.                                        */
        /*----------------------------------------------------------*/
        unsigned char *load_addr = *table_ptr++;
        unsigned char *run_addr = *table_ptr++;
        unsigned char handler_idx = *load_addr++;
        handler_fptr handler =
            (handler_fptr)(&HANDLER_TABLE)[handler_idx];
        /*----------------------------------------------------------*/
        /* 4. Call the handler and pass the pointer to the load     */
        /*    data after index and the run address.                 */
        /*----------------------------------------------------------*/
        (*handler)((const unsigned char *)load_addr, run_addr);
    }
}
Since the linker does not generate the C autoinitialization tables, no boot time initialization is performed.
Figure 8-12 illustrates the initialization of variables at load time.
[Figure 8-12: initialization at load time. The loader places the object file's .data section directly into
the initialized .data section in RAM.]
Address of constructor 1
Address of constructor 2
Address of constructor n
__TI_INITARRAY_Limit:
Some of the features of C/C++ (such as I/O, dynamic memory allocation, string operations, and
trigonometric functions) are provided as an ANSI/ISO C/C++ standard library, rather than as part of the
compiler itself. The TI implementation of this library is the run-time-support library (RTS). The C/C++
compiler implements the ISO standard library except for those facilities that handle signal and locale
issues (properties that depend on local language, nationality, or culture). Using the ANSI/ISO standard
library ensures a consistent set of functions that provide for greater portability.
In addition to the ANSI/ISO-specified functions, the run-time-support library includes routines that give you
processor-specific commands and direct C language I/O requests. These are detailed in Section 9.1 and
Section 9.2.
A library-build utility is provided with the code generation tools that lets you create customized
run-time-support libraries. This process is described in Section 9.4.
trg The device family of the C6000 architecture that the library was built for. This can be one
of the following: 64plus, 6740, or 6600.
endian Indicates endianness:
(blank) Little-endian library
e Big-endian library
abi Indicates the application binary interface (ABI) used. Although the COFF file format is no
longer supported, the library filename still contains "_elf" to distinguish the EABI libraries
from older COFF libraries.
_elf EABI
eh Indicates whether the library has exception handling support
(blank) exception handling not supported
_eh exception handling support
#include <stdio.h>
void main()
{
    FILE *fid;
    fid = fopen("myfile","w");
    fprintf(fid,"Hello, world\n");
    fclose(fid);
}
Issuing the following compiler command compiles, links, and creates the file main.out from the run-time-
support library:
cl6x main.c -z --heap_size=1000 --output_file=main.out
Description The open function opens the file specified by path and prepares it for I/O.
• The path is the filename of the file to be opened, including an optional directory path
and an optional device specifier (see Section 9.2.5).
• The flags are attributes that specify how the file is manipulated. The flags are
specified using the following symbols:
O_RDONLY (0x0000) /* open for reading */
O_WRONLY (0x0001) /* open for writing */
O_RDWR (0x0002) /* open for read & write */
O_APPEND (0x0008) /* append on each write */
O_CREAT (0x0200) /* open with file create */
O_TRUNC (0x0400) /* open with truncation */
O_BINARY (0x8000) /* open in binary mode */
Low-level I/O routines allow or disallow some operations depending on the flags used
when the file was opened. Some flags may not be meaningful for some devices,
depending on how the device implements files.
• The file_descriptor is assigned by open to an opened file.
The next available file descriptor is assigned to each new file opened.
Return Value The function returns one of the following values:
non-negative file descriptor if successful
-1 on failure
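The return convention can be tried out on a host machine. The sketch below is only an illustration and assumes a POSIX host, where open is declared in fcntl.h and takes a permission mode as its third argument; that differs from the open described in this section. The helper name can_open and the mode 0644 are hypothetical.

```c
#include <fcntl.h>   /* open and the O_* flags on a POSIX host (assumption) */
#include <unistd.h>  /* close */

/* Returns 1 if path could be opened with the given flags,
 * or 0 if open reported failure by returning -1. */
int can_open(const char *path, int flags)
{
    /* Assumption: POSIX open(), whose third argument is a permission mode
     * that is only used when O_CREAT creates the file. */
    int fd = open(path, flags, 0644);

    if (fd == -1)
        return 0;          /* failure: no file descriptor was assigned */

    close(fd);             /* success: fd was a non-negative descriptor */
    return 1;
}
```

Opening a missing file with O_RDONLY fails, while adding O_CREAT to a writable open creates it, which is why the flags matter as much as the path.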
www.ti.com close — Close File for I/O
Description The close function closes the file associated with file_descriptor.
The file_descriptor is the number assigned by open to an opened file.
Description The read function reads count characters into the buffer from the file associated with
file_descriptor.
• The file_descriptor is the number assigned by open to an opened file.
• The buffer is where the read characters are placed.
• The count is the number of characters to read from the file.
Description The write function writes the number of characters specified by count from the buffer to
the file associated with file_descriptor.
• The file_descriptor is the number assigned by open to an opened file.
• The buffer is where the characters to be written are located.
• The count is the number of characters to write to the file.
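As a rough illustration of how read and write use the buffer and count arguments, the following sketch writes a message to a file and reads it back. It assumes a POSIX host (fcntl.h and unistd.h) as a stand-in for the low-level routines described here; the helper name write_then_read and the 64-character buffer are hypothetical.

```c
#include <fcntl.h>   /* open and the O_* flags on a POSIX host (assumption) */
#include <string.h>  /* strlen, memcmp */
#include <unistd.h>  /* read, write, close */

/* Writes msg to path, reads it back, and returns the number of characters
 * read, or -1 if any call fails or the data read differs from msg. */
int write_then_read(const char *path, const char *msg)
{
    char buf[64];
    int len = (int)strlen(msg);
    int fd, n;

    if (len > (int)sizeof buf)
        return -1;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    n = (int)write(fd, msg, (size_t)len);  /* count = characters to write */
    close(fd);
    if (n != len)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    n = (int)read(fd, buf, sizeof buf);    /* may return fewer than requested */
    close(fd);

    if (n != len || memcmp(buf, msg, (size_t)len) != 0)
        return -1;
    return n;
}
```

The read call asks for the whole buffer but only receives the characters actually present in the file, so the caller must use the returned count and not the requested one.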
lseek — Set File Position Indicator www.ti.com
Description The lseek function sets the file position indicator for the given file to a location relative to
the specified origin. The file position indicator measures the position in characters from
the beginning of the file.
• The file_descriptor is the number assigned by open to an opened file.
• The offset indicates the relative offset from the origin in characters.
• The origin is used to indicate which of the base locations the offset is measured from.
The origin must be one of the following macros:
SEEK_SET (0x0000) Beginning of file
SEEK_CUR (0x0001) Current value of the file position indicator
SEEK_END (0x0002) End of file
Return Value The return value is one of the following:
# new value of the file position indicator if successful
(off_t)-1 on failure
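One common use of the origin argument is to find the size of a file by seeking to its end. The sketch below assumes a POSIX host as a stand-in for the low-level routines in this section; the helper name file_size is hypothetical.

```c
#include <fcntl.h>   /* open, O_RDONLY on a POSIX host (assumption) */
#include <unistd.h>  /* lseek, close, SEEK_SET, SEEK_END */

/* Returns the size of the file in characters, found by seeking to the end,
 * or -1 if the file cannot be opened or a seek fails. */
long file_size(const char *path)
{
    off_t end;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;

    end = lseek(fd, 0, SEEK_END);          /* offset 0 from the end of file */
    if (end != (off_t)-1 && lseek(fd, 0, SEEK_SET) != 0)
        end = (off_t)-1;                   /* could not rewind to the start */

    close(fd);
    return (long)end;
}
```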
Description The unlink function deletes the file specified by path. Depending on the device, a deleted
file may still remain until all file descriptors which have been opened for that file have
been closed. See Section 9.2.3.
The path is the filename of the file, including path information and optional device prefix.
(See Section 9.2.5.)
www.ti.com rename — Rename File
NOTE: The optional device specified in the new name must match the device of
the old name. If they do not match, a file copy would be required to
perform the rename, and rename is not capable of this action.
DEV_open — Open File for I/O www.ti.com
Syntax int DEV_open (const char * path , unsigned flags , int llv_fd );
Description This function finds a file matching path and opens it for I/O as requested by flags.
• The path is the filename of the file to be opened. If the name of a file passed to open
has a device prefix, the device prefix will be stripped by open, so DEV_open will not
see it. (See Section 9.2.5 for details on the device prefix.)
• The flags are attributes that specify how the file is manipulated. The flags are
specified using the following symbols:
O_RDONLY (0x0000) /* open for reading */
O_WRONLY (0x0001) /* open for writing */
O_RDWR (0x0002) /* open for read & write */
O_APPEND (0x0008) /* append on each write */
O_CREAT (0x0200) /* open with file create */
O_TRUNC (0x0400) /* open with truncation */
O_BINARY (0x8000) /* open in binary mode */
See POSIX for further explanation of the flags.
• The llv_fd is treated as a suggested low-level file descriptor. This is a historical
artifact; newly-defined device drivers should ignore this argument. This differs from
the low-level I/O open function.
This function must arrange for information to be saved for each file descriptor, typically
including a file position indicator and any significant flags. For the HOST version, all the
bookkeeping is handled by the debugger running on the host machine. If the device uses
an internal buffer, the buffer can be created when a file is opened, or the buffer can be
created during a read or write.
Return Value This function must return -1 to indicate an error if for some reason the file could not be
opened, for example because the file does not exist, could not be created, or too many files
are open. The value of errno may optionally be set to indicate the exact error (the HOST
device does not set errno). Some devices might have special failure conditions; for
instance, if a device is read-only, a file cannot be opened O_WRONLY.
On success, this function must return a non-negative file descriptor unique among all
open files handled by the specific device. The file descriptor need not be unique across
devices. The device file descriptor is used only by low-level functions when calling the
device-driver-level functions. The low-level function open allocates its own unique file
descriptor for the high-level functions to call the low-level functions. Code that uses only
high-level I/O functions need not be aware of these file descriptors.
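The bookkeeping and return-value rules above might be sketched as follows. This is not the HOST driver or any shipped implementation; the MEMDEV_ names, the four-slot table, and the memdev_read_only switch are all hypothetical, and the flag macros simply repeat the values listed for O_WRONLY and O_RDWR.

```c
#define MEMDEV_MAX_OPEN 4        /* hypothetical limit on open files */
#define MEMDEV_O_WRONLY 0x0001   /* same value as O_WRONLY in this section */
#define MEMDEV_O_RDWR   0x0002   /* same value as O_RDWR in this section */

/* Information saved for each device file descriptor. */
struct memdev_file {
    int      in_use;
    unsigned flags;              /* significant flags from the open call */
    long     pos;                /* file position indicator */
};

struct memdev_file memdev_files[MEMDEV_MAX_OPEN];
int memdev_read_only = 0;        /* set to 1 to model a read-only device */

int MEMDEV_open(const char *path, unsigned flags, int llv_fd)
{
    int fd;

    (void)path;                  /* this sketch does not look files up by name */
    (void)llv_fd;                /* historical artifact; ignored */

    /* Device-specific failure: a read-only device cannot open for writing. */
    if (memdev_read_only && (flags & (MEMDEV_O_WRONLY | MEMDEV_O_RDWR)))
        return -1;

    for (fd = 0; fd < MEMDEV_MAX_OPEN; fd++) {
        if (!memdev_files[fd].in_use) {
            memdev_files[fd].in_use = 1;
            memdev_files[fd].flags  = flags;
            memdev_files[fd].pos    = 0;
            return fd;           /* unique among this device's open files */
        }
    }
    return -1;                   /* too many files open */
}

int MEMDEV_close(int dev_fd)
{
    if (dev_fd < 0 || dev_fd >= MEMDEV_MAX_OPEN || !memdev_files[dev_fd].in_use)
        return -1;               /* out of range or already closed */
    memdev_files[dev_fd].in_use = 0;
    return 0;
}
```

The slot index doubles as the device file descriptor, which keeps it unique within the device without requiring uniqueness across devices.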
www.ti.com DEV_close — Close File for I/O
Return Value This function should return -1 to indicate an error if the file descriptor is invalid in some
way, such as being out of range or already closed, but this is not required. The user
should not call close() with an invalid file descriptor.
Description The read function reads count bytes from the input file associated with dev_fd.
• The dev_fd is the number assigned by open to an opened file.
• The buf is where the read characters are placed.
• The count is the number of characters to read from the file.
Return Value This function must return -1 to indicate an error if for some reason no bytes could be
read from the file. This could be because of an attempt to read from an O_WRONLY file,
or for device-specific reasons.
If count is 0, no bytes are read and this function returns 0.
This function returns the number of bytes read, from 0 to count. 0 indicates that EOF
was reached before any bytes were read. It is not an error to read fewer than count bytes;
this is common if there are not enough bytes left in the file or the request was larger than
an internal device buffer size.
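These return-value rules can be illustrated with a driver read function backed by a fixed in-memory file. The sketch is hypothetical throughout: the MEMDEV_ names, the single static file, and the 0x0003 access-mode mask are assumptions made for the example, not part of the run-time-support library.

```c
#include <string.h>  /* memcpy */

#define MEMDEV_O_WRONLY 0x0001   /* same value as O_WRONLY in this section */

/* Hypothetical single-file device state. */
const char memdev_data[] = "Hello, world\n";
unsigned   memdev_size   = sizeof memdev_data - 1;  /* 13 characters */
unsigned   memdev_pos    = 0;    /* file position indicator */
unsigned   memdev_flags  = 0;    /* flags saved when the file was opened */

int MEMDEV_read(int dev_fd, char *buf, unsigned count)
{
    unsigned left;

    (void)dev_fd;                /* only one file in this sketch */

    /* Reading a file that was opened write-only is an error. */
    if ((memdev_flags & 0x0003) == MEMDEV_O_WRONLY)
        return -1;

    /* Return fewer than count bytes when not enough remain; 0 means EOF
     * (or that count was 0). */
    left = memdev_size - memdev_pos;
    if (count > left)
        count = left;

    memcpy(buf, memdev_data + memdev_pos, count);
    memdev_pos += count;
    return (int)count;
}
```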
Syntax int DEV_write (int dev_fd , const char * buf , unsigned count );
Return Value This function must return -1 to indicate an error if for some reason no bytes could be
written to the file. This could be because of an attempt to write to an O_RDONLY file,
or for device-specific reasons.
DEV_lseek — Set File Position Indicator www.ti.com
Description This function sets the file's position indicator for this file descriptor in the same way as lseek.
If lseek is supported, it should not allow a seek to before the beginning of the file, but it
should support seeking past the end of the file. Such a seek does not change the size of
the file, but if it is followed by a write, the file size will increase.
Return Value If successful, this function returns the new value of the file position indicator.
This function must return -1 to indicate an error if for some reason the file position
indicator could not be set. For many devices, the lseek operation is nonsensical (e.g. a computer
monitor).
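A driver-level seek that follows these rules might look like the sketch below. The MEMDEV_ names and the fixed 13-character file size are hypothetical, and off_t is taken from sys/types.h on the assumption of a POSIX host.

```c
#include <stdio.h>      /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <sys/types.h>  /* off_t on a POSIX host (assumption) */

/* Hypothetical single-file device state. */
off_t memdev_pos  = 0;   /* file position indicator */
off_t memdev_size = 13;  /* current size of the file in characters */

off_t MEMDEV_lseek(int dev_fd, off_t offset, int origin)
{
    off_t base;

    (void)dev_fd;        /* only one file in this sketch */

    switch (origin) {
    case SEEK_SET: base = 0;           break;
    case SEEK_CUR: base = memdev_pos;  break;
    case SEEK_END: base = memdev_size; break;
    default:       return (off_t)-1;   /* unknown origin */
    }

    /* Refuse to seek before the beginning; the position is left unchanged. */
    if (base + offset < 0)
        return (off_t)-1;

    /* Seeking past the end is allowed and does not change the file size. */
    memdev_pos = base + offset;
    return memdev_pos;
}
```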
Description Remove the association of the pathname with the file. This means that the file may no
longer be opened using this name, but the file may not actually be immediately removed.
Depending on the device, the file may be immediately removed, but for a device which
allows open file descriptors to point to unlinked files, the file will not actually be deleted
until the last file descriptor is closed. See Section 9.2.3.
Return Value This function must return -1 to indicate an error if for some reason the file could not be
unlinked (delayed removal does not count as a failure to unlink).
If successful, this function returns 0.
Description This function changes the name associated with the file.
• The old_name is the current name of the file.
• The new_name is the new name for the file.
Return Value This function must return -1 to indicate an error if for some reason the file could not be
renamed, for example because the file does not exist or the new name already exists.
#include <stdio.h>
#include <stdlib.h>
#include <file.h>
#include "mydevice.h"
int main(void)
{
    add_device("mydevice", _MSA,
               MYDEVICE_open, MYDEVICE_close,
               MYDEVICE_read, MYDEVICE_write,
               MYDEVICE_lseek, MYDEVICE_unlink, MYDEVICE_rename);
    /*-----------------------------------------------------------------------*/
    /* Re-open stderr as a MYDEVICE file                                     */
    /*-----------------------------------------------------------------------*/
    if (!freopen("mydevice:stderrfile", "w", stderr))
    {
        puts("Failed to freopen stderr");
        exit(EXIT_FAILURE);
    }
    /*-----------------------------------------------------------------------*/
    /* stderr should not be fully buffered; we want errors to be seen as     */
    /* soon as possible. Normally stderr is line-buffered, but this example  */
    /* doesn't buffer stderr at all. This means that there will be one call  */
    /* to write() for each character in the message.                         */
    /*-----------------------------------------------------------------------*/
    if (setvbuf(stderr, NULL, _IONBF, 0))
    {
        puts("Failed to setvbuf stderr");
        exit(EXIT_FAILURE);
    }
    /*-----------------------------------------------------------------------*/
    /* Try it out!                                                           */
    /*-----------------------------------------------------------------------*/
    printf("This goes to stdout\n");
    fprintf(stderr, "This goes to stderr\n");
    return 0;
}
Use the low-level function add_device() to add your device to the device_table. The device table is a
statically defined array that supports n devices, where n is defined by the macro _NDEVICE found in
stdio.h/cstdio.
The first entry in the device table is predefined to be the host device on which the debugger is running.
The low-level routine add_device() finds the first empty position in the device table and initializes the
device fields with the passed-in arguments. For a complete description, see the add_device function.
If no device prefix is used, the HOST device will be used to open the file.
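To make the prefix rule concrete, the sketch below splits a name of the form devicename:filename into its two parts. It is not how the run-time-support library parses names (the guide does not show that); the helper split_device_prefix is hypothetical, and the 8-character limit is the device-name limit given for add_device.

```c
#include <string.h>  /* strchr, memcpy */

#define DEV_NAME_MAX 8   /* device names are limited to 8 characters */

/* Splits "devicename:filename". Copies the device name into dev, which must
 * hold DEV_NAME_MAX + 1 characters, and returns a pointer to the filename.
 * With no usable prefix, dev is set to "" and path is returned unchanged;
 * the caller would then fall back to the HOST device. */
const char *split_device_prefix(const char *path, char *dev)
{
    const char *colon = strchr(path, ':');
    size_t len = colon ? (size_t)(colon - path) : 0;

    if (colon == NULL || len == 0 || len > DEV_NAME_MAX) {
        dev[0] = '\0';
        return path;
    }

    memcpy(dev, path, len);
    dev[len] = '\0';
    return colon + 1;    /* filename with the device prefix stripped */
}
```

For "mydevice:test" this yields the device name "mydevice" and the filename "test"; for "myfile" the device name is empty.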
Description The add_device function adds a device record to the device table allowing that device to
be used for I/O from C. The first entry in the device table is predefined to be the HOST
device on which the debugger is running. The function add_device() finds the first empty
position in the device table and initializes the fields of the structure that represent a
device.
To open a stream on a newly added device use fopen( ) with a string of the format
devicename : filename as the first argument.
• The name is a character string denoting the device name. The name is limited to 8
characters.
• The flags are device characteristics. The flags are as follows:
_SSA Denotes that the device supports only one open stream at a time
_MSA Denotes that the device supports multiple open streams
More flags can be added by defining them in file.h.
• The dopen, dclose, dread, dwrite, dlseek, dunlink, and drename specifiers are
function pointers to the functions in the device driver that are called by the low-level
functions to perform I/O on the specified device. You must declare these functions
with the interface specified in Section 9.2.2. The device drivers for the HOST on which the
TMS320C6000 debugger runs are included in the C I/O library.
www.ti.com add_device — Add Device to Device Table
#include <file.h>
#include <stdio.h>
/****************************************************************************/
/* Declarations of the user-defined device drivers                          */
/****************************************************************************/
extern int MYDEVICE_open(const char *path, unsigned flags, int fno);
extern int MYDEVICE_close(int fno);
extern int MYDEVICE_read(int fno, char *buffer, unsigned count);
extern int MYDEVICE_write(int fno, const char *buffer, unsigned count);
extern off_t MYDEVICE_lseek(int fno, off_t offset, int origin);
extern int MYDEVICE_unlink(const char *path);
extern int MYDEVICE_rename(const char *old_name, const char *new_name);
int main(void)
{
    FILE *fid;
    add_device("mydevice", _MSA, MYDEVICE_open, MYDEVICE_close, MYDEVICE_read,
               MYDEVICE_write, MYDEVICE_lseek, MYDEVICE_unlink, MYDEVICE_rename);
    fid = fopen("mydevice:test","w");
    fprintf(fid,"Hello, world\n");
    fclose(fid);
    return 0;
}
Handling Reentrancy (_register_lock() and _register_unlock() Functions) www.ti.com
The arguments to _register_lock() and _register_unlock() should be functions which take no arguments
and return no values, and which implement some sort of global semaphore locking:
extern volatile sig_atomic_t *sema = SHARED_SEMAPHORE_LOCATION;
static int sema_depth = 0;
static void my_lock(void)
{
    while (ATOMIC_TEST_AND_SET(sema, MY_UNIQUE_ID) != MY_UNIQUE_ID);
    sema_depth++;
}
static void my_unlock(void)
{
    if (!--sema_depth) ATOMIC_CLEAR(sema);
}
The run-time-support library nests calls to _lock(), so the primitives must keep track of the nesting level.
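The nesting requirement can be tried out with a host-side sketch. It assumes a C11 compiler with stdatomic.h, whose compare-and-exchange and store operations stand in for the ATOMIC_TEST_AND_SET and ATOMIC_CLEAR placeholders; the value chosen for MY_UNIQUE_ID and the use of 0 to mean "free" are arbitrary choices for the example.

```c
#include <stdatomic.h>

#define MY_UNIQUE_ID 1   /* hypothetical id of this lock owner */

atomic_int sema = 0;     /* 0 = free, otherwise the id of the owner */
int sema_depth = 0;      /* nesting level held by this owner */

void my_lock(void)
{
    for (;;) {
        int expected = 0;

        /* Take the semaphore if it is free ... */
        if (atomic_compare_exchange_strong(&sema, &expected, MY_UNIQUE_ID))
            break;

        /* ... or carry on if this owner already holds it (nested call).
         * On failure, expected holds the current value of sema. */
        if (expected == MY_UNIQUE_ID)
            break;
    }
    sema_depth++;
}

void my_unlock(void)
{
    /* Release only when the outermost lock is undone. */
    if (--sema_depth == 0)
        atomic_store(&sema, 0);
}
```

On the target, functions of this shape would be passed to _register_lock() and _register_unlock(). Two nested calls to my_lock() leave the semaphore held until the second my_unlock().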
www.ti.com Library-Build Process
All three of these programs are provided as a non-optional feature of CCS 5.1. They are also available as
part of the optional XDC Tools feature if you are using an earlier version of CCS.
The mklib program looks for these executables in the following order:
1. in your PATH
2. in the directory getenv("CCS_UTILS_DIR")/cygwin
3. in the directory getenv("CCS_UTILS_DIR")/bin
4. in the directory getenv("XDCROOT")
5. in the directory getenv("XDCROOT")/bin
If you are invoking mklib from the command line, and these executables are not in your path, you must set
the environment variable CCS_UTILS_DIR such that getenv("CCS_UTILS_DIR")/bin contains the correct
programs.
Now that the linker has decided which library to use, it checks whether the run-time-support library is
present in C6X_C_DIR . The library must be in exactly the same directory as the index library libc.a. If the
library is not present, the linker invokes mklib to build it. This happens when the library is missing,
regardless of whether the user specified the name of the library directly or allowed the linker to pick the
best library from the index library.
The mklib program builds the requested library and places it in the 'lib' directory of C6X_C_DIR, in the
same directory as the index library, so it is available for subsequent compilations.
Things to watch out for:
• The linker invokes mklib and waits for it to finish before finishing the link, so you will experience a one-
time delay when an uncommonly-used library is built for the first time. Build times of 1-5 minutes have
been observed. This depends on the power of the host (number of CPUs, etc).
• In a shared installation, where an installation of the compiler is shared among more than one user, it is
possible that two users might cause the linker to rebuild the same library at the same time. The mklib
program tries to minimize the race condition, but it is possible one build will corrupt the other. In a
shared environment, all libraries which might be needed should be built at install time; see
Section 9.4.2.2 for instructions on invoking mklib directly to avoid this problem.
• The index library must exist, or the linker is unable to rebuild libraries automatically.
• The index library must be in a user-writable directory, or the library is not built. If the compiler
installation must be installed read-only (a good practice for shared installation), any missing libraries
must be built at installation time by invoking mklib directly.
• The mklib program is specific to a certain version of a certain library; you cannot use one compiler
version's run-time support's mklib to build a different compiler version's run-time support library.
Some targets have many libraries, so this step can take a long time. To build a subset of the libraries,
invoke mklib individually for each desired library.
Examples:
To build all standard libraries and place them in the compiler's library directory:
mklib --all --index=$C_DIR/lib
To build one standard library and place it in the compiler's library directory:
mklib --pattern=rts64plus.lib --index=$C_DIR/lib
To build a custom library that is just like rts64plus.lib, but has symbolic debugging support enabled:
mklib --pattern=rts64plus.lib --extra_options="-g" --index=$C_DIR/lib --install_to=$Project/Debug --name=rts64plus_debug.lib
Chapter 10
SPRUI04B – May 2017
The C++ compiler implements function overloading, operator overloading, and type-safe linking by
encoding a function's prototype and namespace in its link-level name. The process of encoding the
prototype into the linkname is often referred to as name mangling. When you inspect mangled names,
such as in assembly files, disassembler output, or compiler or linker diagnostic messages, it can be
difficult to associate a mangled name with its corresponding name in the C++ source code. The C++ name
demangler is a debugging aid that translates each mangled name it detects to its original name found in
the C++ source code.
These topics tell you how to invoke and use the C++ name demangler. The C++ name demangler reads
in input, looking for mangled names. All unmangled text is copied to output unaltered. All mangled names
are demangled before being copied to output.
By default, the C++ name demangler outputs to standard output. You can use the -o file option if you want
to output to a file.
class banana {
public:
    int calories(void);
    banana();
    ~banana();
};
int calories_in_a_banana(void)
{
    banana x;
    return x.calories();
}
_calories_in_a_banana__Fv:
;** ----------------------------------------------------------------------*
CALL .S1 ___ct__6bananaFv
STW .D2T2 B3,*SP--(16)
MVKL .S2 RL0,B3
MVKH .S2 RL0,B3
ADD .S1X 8,SP,A4
NOP 1
RL0: ; CALL OCCURS
CALL .S1 _calories__6bananaFv
MVKL .S2 RL1,B3
ADD .S1X 8,SP,A4
MVKH .S2 RL1,B3
NOP 2
RL1: ; CALL OCCURS
CALL .S1 ___dt__6bananaFv
STW .D2T1 A4,*+SP(4)
ADD .S1X 8,SP,A4
MVKL .S2 RL2,B3
MVK .S2 0x2,B4
MVKH .S2 RL2,B3
RL2: ; CALL OCCURS
LDW .D2T1 *+SP(4),A4
LDW .D2T2 *++SP(16),B3
NOP 4
RET .S2 B3
NOP 5
; BRANCH OCCURS
Executing the C++ name demangler demangles all names that it believes to be mangled. Enter:
dem6x calories_in_a_banana.asm
The result is shown in Example 10-3. The linknames in Example 10-2 ___ct__6bananaFv,
_calories__6bananaFv, and ___dt__6bananaFv are demangled.
calories_in_a_banana():
;** ----------------------------------------------------------------------*
CALL .S1 banana::banana()
STW .D2T2 B3,*SP--(16)
MVKL .S2 RL0,B3
MVKH .S2 RL0,B3
ADD .S1X 8,SP,A4
NOP 1
RL0: ; CALL OCCURS
CALL .S1 banana::calories()
MVKL .S2 RL1,B3
ADD .S1X 8,SP,A4
MVKH .S2 RL1,B3
NOP 2
RL1: ; CALL OCCURS
CALL .S1 banana::~banana()
STW .D2T1 A4,*+SP(4)
ADD .S1X 8,SP,A4
MVKL .S2 RL2,B3
MVK .S2 0x2,B4
MVKH .S2 RL2,B3
RL2: ; CALL OCCURS
LDW .D2T1 *+SP(4),A4
LDW .D2T2 *++SP(16),B3
NOP 4
RET .S2 B3
NOP 5
; BRANCH OCCURS
Glossary
A.1 Terminology
Application Binary Interface (ABI)— A standard that specifies the interface between two object
modules. An ABI specifies how functions are called and how information is passed from one
program component to another.
alias disambiguation— A technique that determines when two pointer expressions cannot point to the
same location, allowing the compiler to freely optimize such expressions.
aliasing— The ability for a single object to be accessed in more than one way, such as when two
pointers point to a single object. It can disrupt optimization, because any indirect reference could
refer to any other object.
allocation— A process in which the linker calculates the final memory addresses of output sections.
ANSI— American National Standards Institute; an organization that establishes standards voluntarily
followed by industries.
archive library— A collection of individual files grouped into a single file by the archiver.
archiver— A software program that collects several individual files into a single file called an archive
library. With the archiver, you can add, delete, extract, or replace members of the archive library.
assembler— A software program that creates a machine-language program from a source file that
contains assembly language instructions, directives, and macro definitions. The assembler
substitutes absolute operation codes for symbolic operation codes and absolute or relocatable
addresses for symbolic addresses.
assignment statement— A statement that initializes a variable with a value.
autoinitialization— The process of initializing global C variables (contained in the .cinit section) before
program execution begins.
autoinitialization at run time— An autoinitialization method used by the linker when linking C code. The
linker uses this method when you invoke it with the --rom_model link option. The linker loads the
.cinit section of data tables into memory, and variables are initialized at run time.
big endian— An addressing protocol in which bytes are numbered from left to right within a word. More
significant bytes in a word have lower numbered addresses. Endian ordering is hardware-specific
and is determined at reset. See also little endian.
block— A set of statements that are grouped together within braces and treated as an entity.
.bss section— One of the default object file sections. You use the assembler .bss directive to reserve a
specified amount of space in the memory map that you can use later for storing data. The .bss
section is uninitialized.
byte— Per ANSI/ISO C, the smallest addressable unit that can hold a character.
C/C++ compiler— A software program that translates C/C++ source statements into assembly language
source statements.
code generator— A compiler tool that takes the file produced by the parser or the optimizer and
produces an assembly language source file.
COFF— Common object file format; a system of object files configured according to a standard
developed by AT&T. This ABI is no longer supported.
command file— A file that contains options, filenames, directives, or commands for the linker or hex
conversion utility.
comment— A source statement (or portion of a source statement) that documents or improves
readability of a source file. Comments are not compiled, assembled, or linked; they have no effect
on the object file.
compiler program— A utility that lets you compile, assemble, and optionally link in one step. The
compiler runs one or more source modules through the compiler (including the parser, optimizer,
and code generator), the assembler, and the linker.
configured memory— Memory that the linker has specified for allocation.
constant— A type whose value cannot change.
cross-reference listing— An output file created by the assembler that lists the symbols that were
defined, what line they were defined on, which lines referenced them, and their final values.
.data section— One of the default object file sections. The .data section is an initialized section that
contains initialized data. You can use the .data directive to assemble code into the .data section.
direct call— A function call where one function calls another using the function's name.
directives— Special-purpose commands that control the actions and functions of a software tool (as
opposed to assembly language instructions, which control the actions of a device).
disambiguation— See alias disambiguation.
dynamic memory allocation— A technique used by several functions (such as malloc, calloc, and
realloc) to dynamically allocate memory for variables at run time. This is accomplished by defining a
large memory pool (heap) and using the functions to allocate memory from the heap.
ELF— Executable and Linkable Format; a system of object files configured according to the System V
Application Binary Interface specification.
emulator— A hardware development system that duplicates the TMS320C6000 operation.
entry point— A point in target memory where execution starts.
environment variable— A system symbol that you define and assign to a string. Environment variables
are often included in Windows batch files or UNIX shell scripts such as .cshrc or .profile.
epilog— The portion of code in a function that restores the stack and returns.
executable object file— A linked, executable object file that is downloaded and executed on a target
system.
expression— A constant, a symbol, or a series of constants and symbols separated by arithmetic
operators.
external symbol— A symbol that is used in the current program module but defined or declared in a
different program module.
file-level optimization— A level of optimization where the compiler uses the information that it has about
the entire file to optimize your code (as opposed to program-level optimization, where the compiler
uses information that it has about the entire program to optimize your code).
function inlining— The process of inserting code for a function at the point of call. This saves the
overhead of a function call and allows the optimizer to optimize the function in the context of the
surrounding code.
global symbol— A symbol that is either defined in the current module and accessed in another, or
accessed in the current module but defined in another.
high-level language debugging— The ability of a compiler to retain symbolic and high-level language
information (such as type and function definitions) so that a debugging tool can use this
information.
indirect call— A function call where one function calls another function by giving the address of the
called function.
initialization at load time— An autoinitialization method used by the linker when linking C/C++ code. The
linker uses this method when you invoke it with the --ram_model link option. This method initializes
variables at load time instead of run time.
initialized section— A section from an object file that has actual contents (code or data) and that will be linked into an executable object file. See also uninitialized section.
input section— A section from an object file that will be linked into an executable object file.
integrated preprocessor— A C/C++ preprocessor that is merged with the parser, allowing for faster
compilation. Stand-alone preprocessing or preprocessed listing is also available.
interlist feature— A feature that inserts as comments your original C/C++ source statements into the
assembly language output from the assembler. The C/C++ statements are inserted next to the
equivalent assembly instructions.
intrinsics— Operators that are used like functions and produce assembly language code that would
otherwise be inexpressible in C, or would take greater time and effort to code.
ISO— International Organization for Standardization; a worldwide federation of national standards
bodies, which establishes international standards voluntarily followed by industries.
kernel— The body of a software-pipelined loop between the pipelined-loop prolog and the pipelined-loop
epilog.
K&R C— Kernighan and Ritchie C, the de facto standard as defined in the first edition of The C
Programming Language (K&R). Most K&R C programs written for earlier, non-ISO C compilers
should correctly compile and run without modification.
label— A symbol that begins in column 1 of an assembler source statement and corresponds to the
address of that statement. A label is the only assembler statement that can begin in column 1.
linker— A software program that combines object files to form an executable object file that can be
allocated into system memory and executed by the device.
listing file— An output file, created by the assembler, which lists source statements, their line numbers,
and their effects on the section program counter (SPC).
little endian— An addressing protocol in which bytes are numbered from right to left within a word. More
significant bytes in a word have higher numbered addresses. Endian ordering is hardware-specific
and is determined at reset. See also big endian.
loader— A device that places an executable object file into system memory.
loop unrolling— An optimization that expands small loops so that each iteration of the loop appears in
your code. Although loop unrolling increases code size, it can improve the performance of your
code.
run-time environment— The run time parameters in which your program must function. These
parameters are defined by the memory and register conventions, stack organization, function call
conventions, and system initialization.
run-time-support functions— Standard ISO functions that perform tasks that are not part of the C
language (such as memory allocation, string conversion, and string searches).
run-time-support library— A library file, rts.src, which contains the source for the run-time-support
functions.
section— A relocatable block of code or data that ultimately will be contiguous with other sections in the
memory map.
sign extend— A process that fills the unused MSBs of a value with the value's sign bit.
software pipelining— A technique used by the C/C++ optimizer to schedule instructions from a loop so
that multiple iterations of the loop execute in parallel.
source file— A file that contains C/C++ code or assembly language code that is compiled or assembled
to form an object file.
stand-alone preprocessor— A software tool that expands macros, #include files, and conditional
compilation as an independent program. It also performs integrated preprocessing, which includes
parsing of instructions.
static variable— A variable whose scope is confined to a function or a program. The values of static
variables are not discarded when the function or program is exited; their previous value is resumed
when the function or program is reentered.
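The retention of a static variable's value across calls can be sketched as follows. The function name next_id is hypothetical and is used only for illustration.

```c
/* Returns 1 on the first call, 2 on the second, and so on. The variable
 * `count` is visible only inside this function, but because it is static
 * its value is kept between calls instead of being discarded on return. */
unsigned next_id(void)
{
    static unsigned count = 0;  /* initialized once, before first use */

    return ++count;
}
```

If count were declared without the static keyword, it would be reinitialized to 0 on every call and the function would always return 1.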
storage class— An entry in the symbol table that indicates how to access a symbol.
string table— A table that stores symbol names that are longer than eight characters (such names
cannot be stored in the symbol table; instead they are stored in the string
table). The name portion of the symbol's entry points to the location of the string in the string table.
subsection— A relocatable block of code or data that ultimately will occupy contiguous space in the
memory map. Subsections are smaller sections within larger sections. Subsections give you tighter
control of the memory map.
symbol— A string of alphanumeric characters that represents an address or a value.
symbolic debugging— The ability of a software tool to retain symbolic information that can be used by a
debugging tool such as an emulator.
target system— The system on which the object code you have developed is executed.
.text section— One of the default object file sections. The .text section is initialized and contains
executable code. You can use the .text directive to assemble code into the .text section.
trigraph sequence— A 3-character sequence that has a meaning (as defined by the ISO 646-1983
Invariant Code Set). These characters cannot be represented in the C character set and are
expanded to one character. For example, the trigraph ??' is expanded to ^.
trip count— The number of times that a loop executes before it terminates.
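For a simple counted loop, the trip count can be computed from the loop bounds before the loop runs. The sketch below assumes a loop of the form for (i = start; i < end; i += step) with a positive step; both function names are hypothetical, and the closed-form version assumes the bounds are small enough that the arithmetic does not overflow.

```c
/* Closed-form trip count of: for (i = start; i < end; i += step).
 * Returns 0 when the loop body would never execute. */
unsigned trip_count(int start, int end, int step)
{
    if (step <= 0 || start >= end)
        return 0;
    /* Ceiling of (end - start) / step. */
    return ((unsigned)end - (unsigned)start + (unsigned)step - 1u)
           / (unsigned)step;
}

/* The same quantity obtained by actually running the loop and counting
 * how many times the body executes. */
unsigned count_iterations(int start, int end, int step)
{
    unsigned n = 0;
    int i;

    if (step <= 0)
        return 0;
    for (i = start; i < end; i += step)
        n++;
    return n;
}
```

For the bounds start = 0, end = 10, step = 3, the loop variable takes the values 0, 3, 6, and 9, so both functions return 4.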
unconfigured memory— Memory that is not defined as part of the memory map and cannot be loaded
with code or data.
uninitialized section— An object file section that reserves space in the memory map but that has no
actual contents. These sections are built with the .bss and .usect directives.
unsigned value— A value that is treated as a nonnegative number, regardless of its actual sign.
variable— A symbol representing a quantity that can assume any of a set of values.
word— A 32-bit addressable location in target memory.
Revision History
Previous Revisions:
SPRUI04A, Using the Compiler, Section 3.3 and Section 6.2.3: The --gen_data_subsections option has
been added.
SPRUI04A, Run-Time Environment, Section 8.9.1: Additional boot hook functions are available. These can
be customized for use during system initialization.
SPRUI04, Introduction, Section 1.4: The COFF object file format and the associated STABS debugging
format are no longer supported. The C6000 compiler now supports only the Embedded Application Binary
Interface (EABI) ABI, which works only with object files that use the ELF object file format and the
DWARF debug format. Sections of this document that referred to the COFF format have been removed or
simplified. If you need COFF support, please use v7.4 of the Code Generation Tools and refer to
SPRU187 and SPRU186 for documentation.
SPRUI04, Getting Started, Chapter 2: Added a Getting Started chapter with introductory information for
new users.
Mailing Address: Texas Instruments, Post Office Box 655303, Dallas, Texas 75265
Copyright © 2017, Texas Instruments Incorporated