Endian
November 15, 2004
Abstract
This paper describes software considerations related to microprocessor Endian architecture and discusses guidelines for developing Endian-neutral code.
Disclaimers
THE INFORMATION IS FURNISHED FOR INFORMATIONAL USE ONLY, IS SUBJECT TO CHANGE WITHOUT NOTICE, AND SHOULD NOT BE CONSTRUED AS A COMMITMENT BY INTEL CORPORATION. INTEL CORPORATION ASSUMES NO RESPONSIBILITY OR LIABILITY FOR ANY ERRORS OR INACCURACIES THAT MAY APPEAR IN THIS DOCUMENT OR ANY SOFTWARE THAT MAY BE PROVIDED IN ASSOCIATION WITH THIS DOCUMENT. THIS INFORMATION IS PROVIDED "AS IS" AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THE USE OF THIS INFORMATION INCLUDING WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, COMPLIANCE WITH A SPECIFICATION OR STANDARD, MERCHANTABILITY OR NONINFRINGEMENT.
Legal Notices
Copyright 2004, Intel Corporation. All rights reserved. Intel, Itanium and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others.
Contents
Introduction
Analysis
    Code Portability
    Shared Data
    Best Known Methods
Byte Swapping
    Byte Swapping Methods
        Network I/O Macros
        Custom Byte Swap Macros
    Byte Swap Controls
        Compile Time Controls
        Run Time Controls
    Recovering Byte Swap Overhead
Converting Endian-specific to Endian-neutral Code
Reversing Endian-specific Architecture of Code
Conclusion
Appendix Definitions
Appendix Abbreviations and Acronyms
Appendix References
Introduction
Endianness describes how multi-byte data is represented by a computer system and is dictated by the CPU architecture of the system. Unfortunately, not all computer systems are designed with the same Endian-architecture. The difference in Endian-architecture is an issue when software or data is shared between computer systems. An analysis of the computer system and its interfaces will determine the requirements of the Endian implementation of the software.
Software is sometimes designed with one specific Endian-architecture in mind, limiting the portability of the code to other processor architectures. This type of implementation is considered to be Endian-specific. However, Endian-neutral software can be developed, allowing the code to be ported easily between processors of different Endian-architectures without rewriting any code. Endian-neutral software is developed by identifying system memory and external data interfaces, and using Endian-neutral coding practices to implement the interfaces.
Platform migration requires consideration of the Endian-architecture of the current and target platforms, as well as the Endian-architecture of the code. Best known methods describe the software interface information that should be considered and how to convert Endian-specific code to Endian-neutral code.
This white paper establishes a set of fundamental guidelines for software developers who wish to develop Endian-neutral code or convert Endian-specific code through the use of some or all of the coding techniques documented in this paper.
Note: The examples in this paper are based on 32-bit processor architecture.
Analysis
There are two main areas where Endianness must be considered. One area pertains to code portability. The second area pertains to sharing data between platforms.
Code Portability
It is not uncommon for software to be designed and implemented for the Endian-architecture of a specific processor platform, without allowing for ease of portability to other platforms. Endian-neutral code provides flexibility for software implementations to be compiled for and operate seamlessly on processors of different Endian-architectures.
Shared Data
Computer systems are made up of multiple components, including computers, interfaces, data storage and shared memory. Any time file data or memory is shared between computers, there is a potential for an Endian-architecture conflict. Data can be stored in ways that are not tied to Endian-architecture and also in ways that define the Endianness of the data.
Definition of Endianness
Endianness is the format in which multi-byte data is stored in computer memory. It describes the location of the most significant byte (MSB) and least significant byte (LSB) of multi-byte data at an address in memory. Endianness is dictated by the CPU architecture implementation of the system. The operating system does not dictate the Endian model implemented, but rather the Endian model of the CPU architecture dictates how the operating system is implemented.
There are two storage formats, represented by two types of Endian-architecture: Big-Endian (footnote 1) and Little-Endian. There are benefits to both of these Endian architectures. See section Merits of Endian Architectures. Big-Endian stores the MSB at the lowest memory address. Little-Endian stores the LSB at the lowest memory address. The lowest memory address of multi-byte data is considered the starting address of the data.
In Figure 1, the 32-bit hex value 0x12345678 is stored in memory as follows for each Endian-architecture. The lowest memory address is represented in the leftmost position, Byte 00.
    Endian Order     Byte 00     Byte 01   Byte 02   Byte 03
    Big Endian       12 (MSB)    34        56        78 (LSB)
    Little Endian    78 (LSB)    56        34        12 (MSB)
As you can see in Figure 1, the value of the stored multi-byte data field is the same for both types of Endianness as long as the data is referenced in its native data type, in this case a long value. If this data field is referenced as individual bytes, the Endianness of the data must be known. An unexpected difference in Endianness will cause a computer system to interpret the data in the opposite direction, resulting in the wrong value. The difference can be correctly handled by implementing code that is aware of the Endian-architecture of the computer system as well as the Endianness of the stored data. See section Byte Swapping.
Footnote 1: The terms Big-Endian and Little-Endian are derived from the Lilliputians of Gulliver's Travels, whose major political issue was whether soft-boiled eggs should be opened on the big side or the little side. Likewise, the Big-Endian versus Little-Endian computer debate has much more to do with political issues than technological merits.
    Platform                          Endian Architecture
    ARM*                              Bi-Endian
    DEC Alpha*                        Little-Endian
    HP PA-RISC 8000*                  Bi-Endian
    IBM PowerPC*                      Bi-Endian
    Intel 80x86                       Little-Endian
    Intel IXP network processors      Bi-Endian
    Intel Itanium processor family    Bi-Endian
    Java Virtual Machine*             Big-Endian
    MIPS*                             Bi-Endian
    Motorola 68k*                     Big-Endian
    Sun SPARC*                        Big-Endian
    Big-Endian Format          Little-Endian Format        Variable or Bi-Endian Format
    PSD                        BMP (Windows* & OS/2)       DXF (AutoCAD*)
    IMG                        GIF                         PS (Postscript*, 8 bit interpreted text, no Endian issue)
    JPEG, JPG                  FLI (Autodesk Animator*)    POV (Persistence of Vision raytracer*)
    MacPaint                   PCX (PC Paintbrush*)        RIFF (WAV & AVI*)
    SGI (Silicon Graphics*)    QTM (MAC Quicktime*)        TIFF
                               RTF (Rich Text Format)      XWD
Table 2 - Common file formats
How can the opposing Endian data be efficiently processed? A hardware solution doesn't allow for variability in data since it expects either Big-Endian or Little-Endian formats. Also, hard-wired Endian swapping typically won't suffice for a large range of networks and protocols, and many of these file formats are fixed Endian. Software byte swapping seems the only viable method. Several different methods are available:
Byte Swapping
Any time multi-byte data is imported or exported between computer systems, the format of the data must be standardized. If the data format is binary, the Endianness of the data must be known by both nodes. With this knowledge, the computer systems can decide, based on their own Endian-architecture, whether byte swapping must be performed on the data. Byte swap methods are developed to standardize the access to the data. The byte swap methods of Endian-neutral code use byte swap controls to determine whether a byte swap must be performed.
The network I/O macros are described in Table 3. The word host is used to refer to the processor's Endian-architecture and the word network is used to refer to the TCP/IP Endian-architecture, which is Big-Endian. Using these macros allows the same code to work on a Big-Endian or Little-Endian processor.
    Macro    Translation (can be read as)    Meaning
    htons    host to network short           Converts the unsigned short integer hostshort from host byte order to network byte order.
    htonl    host to network long            Converts the unsigned integer hostlong from host byte order to network byte order.
    ntohs    network to host short           Converts the unsigned short integer netshort from network byte order to host byte order.
    ntohl    network to host long            Converts the unsigned integer netlong from network byte order to host byte order.
Table 3 - Network I/O Macros
The byte swap performed for TCP/IP communication on Little-Endian processors adds a performance overhead. However, this overhead can be recovered as the processor speed increases. See Recovering Byte Swap Overhead.
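As an illustration of how the network I/O macros might be wrapped around a message buffer, the following is a minimal sketch that assumes a POSIX system providing <arpa/inet.h>. The helper names put_net32 and get_net32 are hypothetical and do not come from this paper.

```c
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 32-bit host value into a buffer
 * in network (Big-Endian) byte order. */
void put_net32(unsigned char *buf, uint32_t host_value)
{
    uint32_t net_value = htonl(host_value);     /* no-op on Big-Endian hosts */
    memcpy(buf, &net_value, sizeof net_value);
}

/* Hypothetical helper: read a 32-bit network-order value from a
 * buffer and return it in host byte order. */
uint32_t get_net32(const unsigned char *buf)
{
    uint32_t net_value;
    memcpy(&net_value, buf, sizeof net_value);
    return ntohl(net_value);                    /* no-op on Big-Endian hosts */
}
```

The same source compiles unchanged for either Endian-architecture, because the macros decide whether a swap is needed.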
Macro and Code

16 bits: SwapTwoBytes

    #define SwapTwoBytes(data) \
    ( (((data) >> 8) & 0x00FF) | (((data) << 8) & 0xFF00) )

32 bits: SwapFourBytes

    #include <stdio.h>
    #define SwapFourBytes(data) \
    ( (((data) >> 24) & 0x000000FF) | (((data) >>  8) & 0x0000FF00) | \
      (((data) <<  8) & 0x00FF0000) | (((data) << 24) & 0xFF000000) )

64 bits: SwapEightBytes

    #include <stdio.h>
    #define SwapEightBytes(data) \
    ( (((data) >> 56) & 0x00000000000000FF) | (((data) >> 40) & 0x000000000000FF00) | \
      (((data) >> 24) & 0x0000000000FF0000) | (((data) >>  8) & 0x00000000FF000000) | \
      (((data) <<  8) & 0x000000FF00000000) | (((data) << 24) & 0x0000FF0000000000) | \
      (((data) << 40) & 0x00FF000000000000) | (((data) << 56) & 0xFF00000000000000) )
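The swap macros evaluate their argument several times, so an argument with side effects (such as *p++) would misbehave. One possible alternative, sketched here under the assumption that C99 fixed-width types are available, is to express the same swaps as functions. The names swap16, swap32 and swap64 are illustrative only.

```c
#include <stdint.h>

/* Exchange the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Reverse the eight bytes of a 64-bit value by swapping each
 * 32-bit half and exchanging the halves. */
uint64_t swap64(uint64_t v)
{
    return ((uint64_t)swap32((uint32_t)v) << 32) |
           (uint64_t)swap32((uint32_t)(v >> 32));
}
```

Applying a swap twice returns the original value, which makes a convenient self-check when porting.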
Macro and Code

MY_RD_BE_SHORT, MY_WRT_BE_SHORT

    #if CPU_ARCHITECTURE == BIG_ENDIAN
    /* Do nothing */
    #else
    SwapTwoBytes (data)
    #endif

MY_RD_LE_SHORT, MY_WRT_LE_SHORT

    #if CPU_ARCHITECTURE == BIG_ENDIAN
    SwapTwoBytes (data)
    #else
    /* Do nothing */
    #endif

MY_RD_BE_LONG, MY_WRT_BE_LONG

    #if CPU_ARCHITECTURE == BIG_ENDIAN
    /* Do nothing */
    #else
    SwapFourBytes (data)
    #endif

MY_RD_LE_LONG, MY_WRT_LE_LONG

    #if CPU_ARCHITECTURE == BIG_ENDIAN
    SwapFourBytes (data)
    #else
    /* Do nothing */
    #endif

MY_RD_BE_DOUBLE, MY_WRT_BE_DOUBLE

    #if CPU_ARCHITECTURE == BIG_ENDIAN
    /* Do nothing */
    #else
    SwapEightBytes (data)
    #endif

MY_RD_LE_DOUBLE, MY_WRT_LE_DOUBLE

    #if CPU_ARCHITECTURE == BIG_ENDIAN
    SwapEightBytes (data)
    #else
    /* Do nothing */
    #endif
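A compile time control of this kind could be wired up roughly as follows. This is a sketch under stated assumptions: the names MY_CPU_ARCHITECTURE, MY_BIG_ENDIAN, MY_LITTLE_ENDIAN, SWAP32, load_be32 and load_le32 are hypothetical (the MY_ prefix avoids clashing with BIG_ENDIAN definitions in some system headers), and the automatic detection relies on the __BYTE_ORDER__ macro that GCC and Clang predefine. With other compilers the build would have to define MY_CPU_ARCHITECTURE itself.

```c
#include <stdint.h>
#include <string.h>

#define MY_BIG_ENDIAN    1
#define MY_LITTLE_ENDIAN 2

/* Assumption: GCC/Clang predefine __BYTE_ORDER__; otherwise pass
 * -DMY_CPU_ARCHITECTURE=1 or =2 on the compiler command line. */
#ifndef MY_CPU_ARCHITECTURE
#  if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#    define MY_CPU_ARCHITECTURE MY_BIG_ENDIAN
#  elif defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#    define MY_CPU_ARCHITECTURE MY_LITTLE_ENDIAN
#  else
#    error "Define MY_CPU_ARCHITECTURE as MY_BIG_ENDIAN or MY_LITTLE_ENDIAN"
#  endif
#endif

#define SWAP32(x) \
    ( (((x) >> 24) & 0x000000FFu) | (((x) >>  8) & 0x0000FF00u) | \
      (((x) <<  8) & 0x00FF0000u) | (((x) << 24) & 0xFF000000u) )

/* The swap is selected once, at compile time. */
#if MY_CPU_ARCHITECTURE == MY_BIG_ENDIAN
#  define MY_RD_BE_LONG(x) (x)
#  define MY_RD_LE_LONG(x) SWAP32(x)
#else
#  define MY_RD_BE_LONG(x) SWAP32(x)
#  define MY_RD_LE_LONG(x) (x)
#endif

/* Read a 32-bit Big-Endian field from external data. */
uint32_t load_be32(const unsigned char *p)
{
    uint32_t raw;
    memcpy(&raw, p, sizeof raw);
    return MY_RD_BE_LONG(raw);
}

/* Read a 32-bit Little-Endian field from external data. */
uint32_t load_le32(const unsigned char *p)
{
    uint32_t raw;
    memcpy(&raw, p, sizeof raw);
    return MY_RD_LE_LONG(raw);
}
```

Because the decision is made by the preprocessor, the unneeded branch costs nothing at run time.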
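    union {
        char Array[4];
        long Chars;     /* assumes a 32-bit long, as elsewhere in this paper */
    } TestUnion;
    char c = 'a';
    int x;

    /* Test platform Endianness */
    for (x = 0; x < 4; x++)
        TestUnion.Array[x] = c++;
    if (TestUnion.Chars == 0x61626364) {
        /* It's big endian */
    } else {
        /* It's little endian (value reads as 0x64636261) */
    }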
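A run time control can also be written without relying on the width of long. The following is one possible sketch using a fixed-width probe value. The function names host_is_little_endian and be32_to_host are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a Little-Endian host, 0 on a Big-Endian host. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* byte at the lowest address */
    return first == 1;           /* LSB first means Little-Endian */
}

/* Convert a raw 32-bit Big-Endian field to host order, deciding
 * at run time whether a swap is required. */
uint32_t be32_to_host(uint32_t raw)
{
    if (!host_is_little_endian())
        return raw;
    return (raw >> 24) | ((raw >> 8) & 0x0000FF00u) |
           ((raw << 8) & 0x00FF0000u) | (raw << 24);
}
```

Compared with a compile time control, this adds a test on every access but lets one binary handle data whose Endianness is only known when the program runs.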
[Figure: packet transmission time and processing time plotted against time, Little-Endian compared with Big-Endian]
From this example we can see that there is some overhead associated with swapping the bytes in the network headers. However, given a substantial increase in processor performance, the byte swap overhead on the Little-Endian processor is recovered.
Data Transfer
Data transfer is the movement of data from one system to another across a specified transmission medium.
Problem - When transferring multi-byte data between big and little endian systems, the data has to be manipulated to ensure the preservation of the "true meaning" of the data on both systems. When transferring multi-byte data from a big endian machine, the most significant byte will be in the leftmost position. When a little endian system receives the data, however, the most significant byte will be in the rightmost position unless the bytes are "swapped".
Example: The big endian system transmits the value 0x11223344. The little endian system receives the value as 0x44332211.
Solution - When multi-byte data is transferred between big and little endian systems, the bytes must be swapped in order to preserve the "true meaning" of the values. Use functions that swap the bytes, like the network I/O macros, to ensure the preservation of data in its true form on both big and little endian systems.
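As an alternative to the network I/O macros (and not a method given in this paper), the transferred bytes can be built and parsed with shifts, which operate on the value and not on its memory layout. The sketch below assumes the wire format is Big-Endian. The names encode_be32 and decode_be32 are hypothetical.

```c
#include <stdint.h>

/* Write a 32-bit value to the wire, most significant byte first. */
void encode_be32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Rebuild the 32-bit value from four wire bytes. The result is
 * the same on Big-Endian and Little-Endian hosts. */
uint32_t decode_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

With this approach the value 0x11223344 always travels as the bytes 11 22 33 44 and is always read back as 0x11223344.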
Data Types
Unions
A union is a variable that may hold objects of different types and sizes, with the compiler keeping track of the size and alignment requirements. Objects of dissimilar types and sizes can only be held at different times. A union provides a way to manipulate different kinds of data in a single area of storage.
Problem - Unions work fine for using the same memory to access different data. The key is to know what type of data exists in the memory before it is accessed. Accessing the same data with different types is not a valid use of unions and can cause Endian issues.
Problem - If data types longer than 8 bits are united with a byte array, the data becomes byte order dependent.
Solution - Don't access the same data in memory as different data types.
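One common way to know what type of data exists in the memory before it is accessed is to store a tag next to the union. This is a minimal sketch of that idea. The type and function names (payload, make_int, make_float, payload_get_int) are hypothetical.

```c
#include <stdint.h>

enum payload_kind { PAYLOAD_INT, PAYLOAD_FLOAT };

struct payload {
    enum payload_kind kind;   /* records which member is currently valid */
    union {
        int32_t as_int;
        float   as_float;
    } u;
};

struct payload make_int(int32_t v)
{
    struct payload p;
    p.kind = PAYLOAD_INT;
    p.u.as_int = v;
    return p;
}

struct payload make_float(float v)
{
    struct payload p;
    p.kind = PAYLOAD_FLOAT;
    p.u.as_float = v;
    return p;
}

/* Read the integer only if an integer was the last value stored.
 * Returns 1 on success, 0 if the union holds another type. */
int payload_get_int(const struct payload *p, int32_t *out)
{
    if (p->kind != PAYLOAD_INT)
        return 0;
    *out = p->u.as_int;
    return 1;
}
```

Every read goes through the member that was last written, so the code never depends on how the bytes of one type overlay another.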
Byte Arrays
A byte array is a character array that is used to hold a specified number of bytes. The size of the array is always equal to the number of bytes to hold.
Problem - If data in the byte array is accessed outside of its native data type, the data becomes byte order dependent.
Example: An array that is initialized with a list of characters will be read as different values on Little-Endian and Big-Endian platforms. Consider a byte array initialized to 'a', 'b', 'c', 'd'. Accessing this array as a long data type on a Little-Endian platform will result in the value 0x64636261. On a Big-Endian platform it results in the value 0x61626364.
Solution - Avoid accessing byte arrays outside of the byte data type.
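The byte array example can be reproduced with a short sketch. It assumes an ASCII character set and uses memcpy for the reinterpretation, which avoids alignment problems that a pointer cast could introduce. The names tag, tag_as_long and tag_matches are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static const unsigned char tag[4] = { 'a', 'b', 'c', 'd' };

/* Host dependent: reads the four bytes as one 32-bit value.
 * Yields 0x64636261 on Little-Endian, 0x61626364 on Big-Endian. */
uint32_t tag_as_long(void)
{
    uint32_t value;
    memcpy(&value, tag, sizeof value);
    return value;
}

/* Host independent: compares byte by byte, staying within the
 * byte data type. Returns 1 when the candidate equals the tag. */
int tag_matches(const unsigned char *candidate)
{
    return memcmp(candidate, tag, sizeof tag) == 0;
}
```

Only the second function gives the same answer on every platform.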
Also, if the data is set as a byte value, say 0x74, the result of reading the data as nibbles on the Little-Endian machine is a value of 4 for the iphdr.ver field and a value of 7 for the iphdr.ihl field. On the Big-Endian machine the results would be a value of 7 for the iphdr.ver field and a value of 4 for the iphdr.ihl field. Example:
    #include <stdio.h>

    /*
     * A packet header may utilize bit fields. Bit order within
     * a byte is determined by the byte order of the processor.
     * In this example we modify two nibbles of an IP header and
     * then access later as a byte.
     */
    struct {
        unsigned char ver:4, ihl:4;
    } iphdr;

    int main(void)
    {
        unsigned char ipbyte;

        iphdr.ver = 0x4;
        iphdr.ihl = 0x7;
        ipbyte = *(unsigned char *)&iphdr;
        if (ipbyte == 0x47) {
            printf("Big Endian\n");
        } else if (ipbyte == 0x74) {
            printf("Little Endian\n");
        }
        return 0;
    }
Figure 4 - IP Header Bit Fields
Solution - Instead of using the bit field structure, access the entire 8 bit value in its native data type (byte) and use a mask for the bits of each field. Masks for the 4 bits of the version field (V) and the 4 bits of the header length field (L) are represented below.

    Format                              VVVV LLLL
    Version field (V) bit mask          1111 0000
    Version bit mask hex value          0xF0
    Header Length field (L) bit mask    0000 1111
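The mask approach might look like the following sketch. The mask value 0xF0 comes from the table above and 0x0F is the hex form of 0000 1111. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define IP_VERSION_MASK 0xF0u   /* 1111 0000 */
#define IP_IHL_MASK     0x0Fu   /* 0000 1111 */

/* Extract the version field (V) from the first header byte. */
uint8_t ip_version(uint8_t first_byte)
{
    return (uint8_t)((first_byte & IP_VERSION_MASK) >> 4);
}

/* Extract the header length field (L) from the first header byte. */
uint8_t ip_header_len(uint8_t first_byte)
{
    return (uint8_t)(first_byte & IP_IHL_MASK);
}

/* Build the first header byte from its two fields. */
uint8_t ip_make_first_byte(uint8_t version, uint8_t ihl)
{
    return (uint8_t)(((version << 4) & IP_VERSION_MASK) | (ihl & IP_IHL_MASK));
}
```

Because the byte is handled in its native data type, the result does not depend on how a compiler orders bit fields.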
Pointer Casts
Casting pointers changes the native meaning of the original data. Doing so will affect which data is addressed.
Problem - If a pointer to 32-bit data is cast to a byte pointer, the byte pointer addresses either the most significant byte or the least significant byte of the value, depending on the Endian-architecture of the host.
Example: When a pointer to the 32-bit value 0x11223344 is cast to a byte pointer, the Big-Endian system points to 0x11 and the Little-Endian system points to 0x44.
Solution - Never change the native type of a pointer. Instead, get the data in its native data type format and use byte swapping macros to access the bytes individually.
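The difference between a pointer cast and a value-based access can be sketched as follows. The function names are hypothetical, and the shift-based accessors stand in for whatever byte access macros a project defines.

```c
#include <stdint.h>

/* Host dependent: returns the byte at the lowest address, which is
 * 0x11 on Big-Endian and 0x44 on Little-Endian for 0x11223344. */
uint8_t first_byte_via_cast(const uint32_t *p)
{
    return *(const uint8_t *)p;
}

/* Host independent: works on the value in its native data type. */
uint8_t most_significant_byte(uint32_t v)
{
    return (uint8_t)(v >> 24);
}

uint8_t least_significant_byte(uint32_t v)
{
    return (uint8_t)(v & 0xFFu);
}
```

The shift-based functions return 0x11 and 0x44 for 0x11223344 on every host, while the cast version changes its answer with the Endian-architecture.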
    Native Type    Size Accessed    Byte Swap Required
    long           short            Swap both shorts end for end
    long           char             Swap bytes 0 and 3; swap bytes 1 and 2
    double         long             Swap both longs end for end
    double         short            Swap bytes 0,1 with 7,6; swap bytes 2,3 with 5,4
    char           short            Never
    short          long             Never

Never: Although this may be efficient for copies, it is not a good programming practice.
Endian-Neutral Code
The goal of Endian-neutral code is to provide one software source-set of files that will work correctly no matter which processor Endian-architecture the code is executed on, eliminating the need to rewrite the code. The way to effectively achieve this goal is by identifying the memory and external data interfaces of the system and then implementing the use of processor independent macros to perform the interface operations. These macros automatically compile the appropriate code for the respective Endian-architecture. Endian-neutral code makes no assumptions of the underlying platform in its implementation. Instead, it funnels all data and memory accesses through wrappers that decide how the accesses should be made. The decision is based on information that is defined during code compilation and specifies which Endian-architecture the code is being compiled to support.
2. Byte Swap Macros - Use macros that serve as wrappers around all binary multi-byte data interfaces.
3. Data Transfer - Use network I/O macros to read/write data from the network. The macros will determine when byte swapping should occur based on whether the format of the transferred data is in the native Endian format of the processor.
4. Data Types - Access data in its native data type. For example: Always read/write an int as an int type as opposed to reading/writing four bytes. An alternative is to use custom Endian-neutral macros to access specific bytes within a multi-byte data type. Lack of conformance to this guideline will cause code compatibility problems between Endian-architectures. Examples:
    a. Unions - Never use unions to access the same data with dissimilar types. See Platform Porting considerations.
    b. Byte Arrays - Never access multi-byte data as a byte array. See Platform Porting considerations.
    c. Pointer Casts - Never cast pointers. See Platform Porting considerations.
5. Bit Fields - Never define bit fields across byte boundaries or smaller than 8 bits. If it is necessary to access bit data that is not a full byte or on byte boundaries, access the entire bit field in its native data type and use a bit mask for the bits of each field.
6. Bit Shifts - Use the C language << and >> constructs to move byte positions of binary multi-byte data.
7. Pointer Casts - Never cast pointers to change the size of the data pointed to.
8. Compiler Directives - Be careful when using compiler directives, such as those affecting storage (align, pack). Directives are not always portable between compilers. C defined directives such as #include and #define are okay. Use the #define directive to define the platform Endian-architecture of the compiled code.
Code Analysis
Analysis of code to determine its Endian portability can result in variations of portability. I'll refer to these variations as the Good, the Bad, and the Ugly. The guidelines to code analysis are as follows:
1. Review data definitions for use of unions. Unions should be considered suspect. If the unions include the use of accessing the same data with different data types, the code should be updated to remove the use of the union.
2. Review code for casting of data types. Remove all uses of accessing data outside of its native data type and replace with macros that access the data per compiler defined Endianness.
3. Review code for use of Network I/O macros. Big-Endian architectures do not require byte swapping on the TCP/IP header data, so it is possible that no special code is added for this interface. However, in order to make the code portable to Little-Endian architecture, byte swapping is required. The use of Network I/O macros for this interface will determine whether to byte swap or not.
4. Identify all import/export interfaces of shared data. Verify whether the interface follows the recommendations for handling shared data. If not, decide which solution should be used for each interface to make it Endian-neutral.
5. Identify all memory interfaces. Verify whether the interfaces follow the recommendations for accessing data in its native data type.
6. Always use the C bit shift operators (<< and >>) to swap positions of bytes within data.
The Good
The code is already Endian-neutral. The code analysis did not result in any required Endianness changes.
The Bad
The code is only partially Endian-neutral. The code analysis determined that the code uses some Endian-neutral coding practices, such as the use of network macros, but does not adhere to all guidelines for all imported and exported data. The code conforms to at least 50 percent of the Endian-neutral coding guidelines.
The Ugly
The code analysis determined that there is little to no Endian-neutrality designed in. The code explicitly assumes Endianness, uses unions or pointer type casts to change the size of the data access, does not use Endian-neutral macros to access binary multi-byte data, or violates more than 50 percent of the Endian-neutral coding guidelines.
Example: Big-Endian platforms do nothing extra to receive and transmit the TCP/IP header of network data, so it is very likely that the network I/O macros are absent. These macros will need to be added when porting the Big-Endian code to a Little-Endian host.
Reversing the Endian-specific architecture of code requires slightly less effort than re-implementing the code for Endian neutrality, since Endian-neutral code requires the addition of wrappers around external data. However, it might make more sense to convert the code to be Endian-neutral in order to guarantee flexibility in the future.
Conclusion
In all, it is essential to understand the format of all external data and the Endian-architecture of the source and target processors before porting. In order to make external data formats compatible with the host processor Endian-architecture, byte swapping is sometimes required to accommodate the differences in formats. The best way to neutralize this difference is with the use of byte swapping macros. This paper described Endianness and its effect on code portability. Following the guidelines in this paper will allow the same source code to work correctly on host processors of differing Endian-architectures, easing the effort of platform migration.
Appendix Definitions
.1 Bi-Endian Architecture
A CPU that is capable of being configured to operate as either Big Endian Architecture or Little Endian Architecture.
.3 Byte Array
A byte array is a character array that is used to hold a specified number of bytes. The size of the array is always equal to the number of bytes to hold.
.4 Data Transfer
Data transfer is the movement of data from one system to another across a specified transmission medium.
.5 Data type
Data types are used to access data in different formats. These formats specify the size of the data as well as the location. For example: char, short, int, and long all specify machine defined data sizes. A structure defines a custom data type that can contain members of various sizes residing at specific locations within the structure. A union defines a grouping of data types that can be used to access the same data in different formats.
.6 Endian-architecture
This term is used to refer to the endian architecture of a system, either Big Endian architecture or Little Endian architecture.
.7 Endian-neutral
The code does not assume endian-architecture. All endian sensitive data interfaces are encapsulated by wrappers, such as macros, that access data in a manner respective to the endian-architecture.
.8 Endian-specific
The code is written explicitly to be either Big Endian or Little Endian architecture. Endian-specific code will not run correctly on CPUs with the opposite Endian-architecture of the implemented code.
analogous in purpose to X.409, ISO Abstract Syntax Notation. The major difference between these two is that XDR uses implicit typing, while X.409 uses explicit typing (footnote 2).
[Figure: byte layouts for Big Endian and Little Endian, with the MSB (12) and LSB (78) marked]
Footnote 2: Description of XDR from Sun Microsystems, RFC 1832 - External Data Representation Standard, August 1995, p. 24, ftp://ftp.isi.edu/in-notes/rfc1832.txt
.15 Union
A union is a variable that may hold objects of different types and sizes, with the compiler keeping track of the size and alignment requirements. Objects of different types and sizes can only be held at different times. A union provides a way to manipulate different kinds of data in a single area of storage.
Appendix References
Srinivasan, R. Sun Microsystems, RFC 1832 - External Data Representation Standard, August 1995, p. 24 ftp://ftp.isi.edu/in-notes/rfc1832.txt