Endianness: From Wikipedia, The Free Encyclopedia
In computing, endianness is the byte (and sometimes bit) ordering used to represent
some kind of data. Typical cases are the order in which integer values are stored as
bytes in computer memory (relative to a given memory addressing scheme) and the
transmission order over a network or other medium. When specifically talking about
bytes, endianness is also referred to simply as byte order. [1]
Contents
1 Clarifying analogy
2 Endianness and hardware
  2.1 Bi-endian hardware
  2.2 Floating-point and endianness
3 Discussion, background, etymology
4 Method of mapping registers to memory locations
5 Examples of storing the value 0x0A0B0C0D in memory
  5.1 Big-endian
  5.2 Little-endian
  5.3 Middle-endian
6 Endianness in networking
7 "Bit endianness"
8 Other meanings
9 Notes
10 External links
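Integers are usually stored as sequences of bytes, so that the encoded value can be
obtained by simple concatenation. The two most common orderings are little-endian, in
which numeric significance increases with increasing memory addresses, and its
opposite, big-endian. Note that "big-endian" does not mean "ending big", but "big end
first".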
Well-known processor architectures that use the little-endian format include the 6502,
Z80, x86, and, largely, the PDP-11. Motorola processors such as the 6800 and 68000 have
generally used big-endian. PowerPC (used in Apple's Macintosh line prior to the Intel
switch) and System/370 also adopt big-endian. SPARC historically used big-endian,
though version 9 is bi-endian (see below).
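As a concrete illustration of the two byte orders these processors use, the sketch
below rebuilds an integer from the same four stored bytes under each convention by
plain concatenation. The function name decode is an assumption made for illustration;
Python's built-in int.from_bytes performs the same job.

```python
def decode(data: bytes, order: str) -> int:
    """Rebuild an unsigned integer by concatenating bytes in the given order."""
    if order == "little":
        data = data[::-1]  # put the most significant byte first
    value = 0
    for byte in data:
        value = (value << 8) | byte  # append 8 more bits on the right
    return value


# The same four bytes, listed in increasing address order.
stored = bytes([0x0A, 0x0B, 0x0C, 0x0D])

big = decode(stored, "big")        # first byte is the most significant
little = decode(stored, "little")  # first byte is the least significant

print(hex(big), hex(little))
```

The stored bytes never change; only the interpretation does, which is why both sides of
an exchange have to agree on the order.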
Some architectures (including ARM, PowerPC (but not the PPC970/G5), DEC Alpha,
SPARC V9, MIPS, PA-RISC and IA64) feature switchable endianness. This feature can
improve performance or simplify the logic of networking devices and software. The
word bi-endian, said of hardware, denotes the capability to compute or pass data in
either of two different endian formats.
Many of these architectures can be switched via software to default to a specific endian
format (usually done when the computer starts up); however, on some systems the
default endianness is selected by hardware on the motherboard and cannot be changed
via software (e.g., the DEC Alpha, which runs only in big-endian mode on the Cray
T3E).
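Because the active byte order can depend on how a machine was configured, portable
software often checks it at run time rather than assuming it. A minimal sketch,
assuming the helper name host_byte_order (hypothetical), inspects how the host lays out
a 16-bit value and compares the result with what the Python runtime reports:

```python
import struct
import sys


def host_byte_order() -> str:
    """Return 'big' or 'little' based on how the host packs a 16-bit integer."""
    # "=H" packs an unsigned 16-bit value in the host's native byte order.
    first_byte = struct.pack("=H", 0x0102)[0]
    return "big" if first_byte == 0x01 else "little"


print(host_byte_order(), sys.byteorder)
```

This reports only the order seen by the running program; it says nothing about whether
the underlying hardware could be switched to the other mode.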
Note that "bi-endian" refers primarily to how a processor treats data accesses.
Instruction accesses (fetches of instruction words) on a given processor may still
assume a fixed endianness, even if data accesses are fully bi-endian.
Note, too, that some nominally bi-endian CPUs may actually employ internal "magic"
(as opposed to really switching to a different endianness) in one of their operating
modes. For instance, some PowerPC processors in little-endian mode act as little-endian
from the point of view of the executing programs but they do not actually store data in
memory in little-endian format (multi-byte values are swapped during memory
load/store operations). This can cause problems when memory is transferred to an
external device if some part of the software, e.g. a device driver, does not account for
the situation.
An often cited argument in favor of big-endian is that it is consistent with the ordering
commonly used in natural languages.[5] That ordering, however, is far from universal in
either spoken or written form. Spoken languages organize numbers in a wide variety of
ways. Written numbers nowadays almost universally use the Hindu-Arabic numeral system,
with the most significant digits written to the left, but whether this is "big-endian"
or "little-endian" depends on the direction of text flow in the writing system being
used.
The little-endian system has the property that, in the absence of alignment restrictions,
values can be read from memory at different widths without using different addresses.
For example, a 32-bit memory location with content 4A 00 00 00 can be read at the
same address as either 8-bit (value = 4A), 16-bit (004A), or 32-bit (0000004A). (This
example works only if the value makes sense in all three sizes, which means the value
fits in just 8 bits.) This little-endian property is rarely used, and does not imply that
little-endian has any performance advantage in variable-width data access. (In real-
world analogy, though, if one had 1004 widgets, this could be said to be like knowing
that one has “something-ending-in-4” widgets before knowing that one has one
thousand of them.)
Conversely, in big-endian systems, the first byte is simply the coarsest part of the value,
and subsequent bytes increase in precision. (In real-world analogy, if one had 1004
widgets, this is like finding that one has one thousand widgets before finding that the
exact number is 1004.)
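The variable-width property can be checked with a short sketch. It reads the bytes
4A 00 00 00 at offset 0 as 8-bit, 16-bit and 32-bit little-endian values, then repeats
the narrow reads on the big-endian layout of the same value for contrast. The variable
names are illustrative only.

```python
import struct

# Little-endian storage of the 32-bit value 0x0000004A.
memory = bytes([0x4A, 0x00, 0x00, 0x00])

# Reads at the same address (offset 0) with three different widths.
as_u8 = struct.unpack_from("<B", memory, 0)[0]
as_u16 = struct.unpack_from("<H", memory, 0)[0]
as_u32 = struct.unpack_from("<I", memory, 0)[0]

# Big-endian storage of the same value: 00 00 00 4A.
big_memory = (0x4A).to_bytes(4, "big")

# Narrower reads at offset 0 now see only the most significant bytes.
big_u8 = struct.unpack_from(">B", big_memory, 0)[0]
big_u16 = struct.unpack_from(">H", big_memory, 0)[0]
big_u32 = struct.unpack_from(">I", big_memory, 0)[0]

print(as_u8, as_u16, as_u32, big_u8, big_u16, big_u32)
```

In the big-endian case the narrow reads return the coarsest part of the value, which
here is zero, so the address would have to be adjusted to reach the low-order byte.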
To further illustrate the above notions, this section provides example layouts of a
32-bit number in the most common variants of endianness. There is no general guarantee
that a platform will use one of these formats, but in practice there are few if any
exceptions. All the examples refer to the storage in memory of the value 0x0A0B0C0D.
Big-endian
With 8-bit atomic element size and 1-byte (octet) address increment:
increasing addresses →
... 0x0A 0x0B 0x0C 0x0D ...
The most significant byte (MSB) value, which is 0x0A in our example, is stored at the
memory location with the lowest address; the next byte value in significance, 0x0B, is
stored at the following memory location, and so on. This is akin to left-to-right
reading order in hexadecimal.
With 16-bit atomic element size:
increasing addresses →
... 0x0A0B 0x0C0D ...
The most significant atomic element now stores the value 0x0A0B, followed by 0x0C0D.
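The big-endian layouts for both element sizes can be generated mechanically. The sketch
below splits a 32-bit value into atomic elements of a chosen width and lists them in
increasing address order; the function name big_endian_elements is an assumption made
for this example.

```python
def big_endian_elements(value: int, element_bits: int, total_bits: int = 32) -> list[str]:
    """List the atomic elements of value in increasing address order (big-endian)."""
    count = total_bits // element_bits
    mask = (1 << element_bits) - 1
    # Extract elements starting from the least significant one...
    parts = [(value >> (i * element_bits)) & mask for i in range(count)]
    # ...then put the most significant element at the lowest address.
    parts.reverse()
    digits = element_bits // 4  # hexadecimal digits per element
    return [f"0x{part:0{digits}X}" for part in parts]


bytes_layout = big_endian_elements(0x0A0B0C0D, 8)
words_layout = big_endian_elements(0x0A0B0C0D, 16)

print(" ".join(bytes_layout))
print(" ".join(words_layout))
```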
Little-endian
With 8-bit atomic element size and 1-byte (octet) address increment:
increasing addresses →
... 0x0D 0x0C 0x0B 0x0A ...
The least significant byte (LSB) value, 0x0D, is at the lowest address. The other bytes
follow in increasing order of significance.
With 16-bit atomic element size:
increasing addresses →
... 0x0C0D 0x0A0B ...
The least significant 16-bit unit stores the value 0x0C0D, immediately followed by
0x0A0B.
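Since the little-endian layout is the big-endian one with the bytes reversed, data
written in one order and read in the other comes back byte-swapped. A minimal sketch of
that effect follows; swap32 is a hypothetical helper name, and the 16-bit view uses the
struct module to mirror the layout described above.

```python
import struct


def swap32(value: int) -> int:
    """Reverse the four bytes of a 32-bit unsigned integer."""
    return (
        ((value & 0x000000FF) << 24)
        | ((value & 0x0000FF00) << 8)
        | ((value & 0x00FF0000) >> 8)
        | ((value & 0xFF000000) >> 24)
    )


value = 0x0A0B0C0D
little_layout = value.to_bytes(4, "little")  # bytes in increasing address order

# The same memory viewed as two little-endian 16-bit units.
low_unit, high_unit = struct.unpack("<HH", little_layout)

# Misreading the little-endian bytes as big-endian yields the swapped value.
misread = int.from_bytes(little_layout, "big")

print(little_layout.hex(" "), hex(low_unit), hex(high_unit), hex(misread))
```

Applying swap32 a second time restores the original value, which is how conversion
routines between the two formats typically work.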
The 16-bit atomic element byte ordering may look backwards as written above, but this
is because little-endian is best written with addressing increasing towards the left. If we
write the bytes this way then the ordering makes slightly more sense:
With 8-bit atomic element size and 1-byte (octet) address increment:
← increasing addresses
... 0x0A 0x0B 0x0C 0x0D ...
The least significant byte (LSB) value, 0x0D, is at the lowest address. The other bytes
follow in increasing order of significance.
With 16-bit atomic element size:
← increasing addresses
... 0x0A0B 0x0C0D ...
The least significant 16-bit unit stores the value 0x0C0D, immediately followed by
0x0A0B.
However, if one displays memory with addresses increasing to the left like this, then the
display of Unicode (or ASCII) text is reversed from the normal display (for left-to-right
languages). For example, the word "XRAY" displayed in the "little-endian-friendly"
manner just described is:
← increasing addresses
... "Y" "A" "R" "X" ...
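The trade-off of this display convention can be reproduced with a small sketch that
renders memory with the lowest address on the right. The helper names dump_leftward and
text_leftward are assumptions for illustration, and ASCII text is assumed.

```python
def dump_leftward(data: bytes) -> str:
    """Render bytes in hexadecimal with the lowest address on the right."""
    return " ".join(f"{byte:02X}" for byte in reversed(data))


def text_leftward(data: bytes) -> str:
    """Render ASCII text with the lowest address on the right."""
    return "".join(chr(byte) for byte in reversed(data))


number = (0x0A0B0C0D).to_bytes(4, "little")  # 0D 0C 0B 0A in memory
text = "XRAY".encode("ascii")                # 'X' at the lowest address

number_view = dump_leftward(number)
text_view = text_leftward(text)

print(number_view)
print(text_view)
```

The number reads naturally in this view, while the text comes out backwards.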
Middle-endian
increasing addresses →
... 0x0B 0x0A 0x0D 0x0C ...
Note that this can be interpreted as storing the most significant "half" (16-bits) followed
by the less significant half (as if big-endian) but with each half stored in little-endian
format. This ordering is known as PDP-endianness.
The ARM architecture can also produce this format when writing a 32-bit word to an
address 2 bytes from a 32-bit word alignment.
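The PDP-endian ordering described above can be sketched as two little-endian 16-bit
halves written with the most significant half first. The function names below are
hypothetical and chosen only for this example.

```python
import struct


def to_pdp_endian(value: int) -> bytes:
    """Encode a 32-bit unsigned integer in PDP-endian (middle-endian) order."""
    high, low = value >> 16, value & 0xFFFF
    # Most significant half first, each half stored little-endian.
    return struct.pack("<HH", high, low)


def from_pdp_endian(data: bytes) -> int:
    """Decode four PDP-endian bytes back into a 32-bit unsigned integer."""
    high, low = struct.unpack("<HH", data)
    return (high << 16) | low


encoded = to_pdp_endian(0x0A0B0C0D)
decoded = from_pdp_endian(encoded)

print(encoded.hex(" "), hex(decoded))
```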
While the lowest network protocols may deal with sub-byte formatting, all the layers
above them usually consider the byte (mostly meant as octet) as their atomic unit.
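At these byte-oriented layers, multi-byte fields still need an agreed order on the
wire; Internet protocols conventionally send the most significant byte first, known as
network byte order. The sketch below packs and unpacks a made-up six-byte header. The
header layout, field names and function names are assumptions for illustration, not
part of any real protocol.

```python
import struct

# Hypothetical header: 32-bit length followed by a 16-bit port.
# "!" selects network (big-endian) byte order with no padding.
HEADER_FORMAT = "!IH"


def encode_header(length: int, port: int) -> bytes:
    """Serialize the hypothetical header in network byte order."""
    return struct.pack(HEADER_FORMAT, length, port)


def decode_header(data: bytes) -> tuple[int, int]:
    """Parse the hypothetical header regardless of host byte order."""
    length, port = struct.unpack(HEADER_FORMAT, data)
    return length, port


wire = encode_header(0x0A0B0C0D, 80)
fields = decode_header(wire)

print(wire.hex(" "), fields)
```

Stating the order explicitly in the format string keeps the result independent of the
byte order of the machine that runs the code.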
Notes
1. For hardware, the Jargon File also reports the less common expression byte sex [1]. It
is unclear whether this terminology is also used when more than two orderings are
possible. Similarly, the manual for the ORCA/M assembler refers to a field indicating
the order of the bytes in a number field as NUMSEX, and the Mac OS X operating system
refers to "byte sex" in its compiler tools [2].
2. Note that, in these expressions, the term "end" is meant as "extremity", not as "last
part"; and that big and little say which extremity is written first.
3. Gulliver's Travels (Part I, Chapter IV) on Wikisource.
4. Endian FAQ – includes the paper Internet Engineering Note (IEN) 137, "On Holy
Wars and a Plea for Peace", by Danny Cohen (1 April 1980), but adds much more
context.
5. Cf. entries 539 and 704 of the Linguistic Universals Database.
This article was originally based on material from the Free On-line Dictionary of
Computing, which is licensed under the GFDL.