Endian
INTRODUCTION
This is an attempt to stop a war. I hope it is not too late and that
somehow, magically perhaps, peace will prevail again.
The latecomers into the arena believe that the issue is: "What is the
proper byte order in messages?".
The root of the conflict lies much deeper than that. It is the question
of which bit should travel first, the bit from the little end of the
word, or the bit from the big end of the word? The followers of the
former approach are called the Little-Endians, and the followers of the
latter are called the Big-Endians. The details of the holy war between
the Little-Endians and the Big-Endians are documented in [6] and
described, in brief, in the Appendix. I recommend that you read it at
this point.
1 of 15 22/12/19, 8:55 am
https://www.ietf.org/rfc/ien/ien137.txt
There are two possible consistent orders. One starts with the narrow end
of each word (aka "LSB"), as the Little-Endians do; the other starts with
the wide end (aka "MSB"), as their rivals, the Big-Endians, do.
MEMORY ORDER
The Little-Endians assign B0 to the LSB of each word and B31 to the MSB.
The Big-Endians do just the opposite: B0 is the MSB and B31 is the LSB.
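The two numbering conventions can be sketched in a few lines of Python. This is only an illustration: the function names and the WORD_BITS constant are assumptions made for the example, chosen to match the B0..B31 labels used here.

```python
WORD_BITS = 32  # assumed word width, matching the B0..B31 labels


def bit_little_endian(word: int, i: int) -> int:
    """Return bit Bi when B0 names the LSB (Little-Endian numbering)."""
    return (word >> i) & 1


def bit_big_endian(word: int, i: int) -> int:
    """Return bit Bi when B0 names the MSB (Big-Endian numbering)."""
    return (word >> (WORD_BITS - 1 - i)) & 1


word = 0x00000001  # only the least significant bit is set
assert bit_little_endian(word, 0) == 1   # the Little-Endians call it B0
assert bit_big_endian(word, 31) == 1     # the Big-Endians call it B31
```

The same physical bit carries two different names, which is why a bit label means nothing until the numbering camp is known.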
The English language, like most modern languages, suggests that we lay
these computer words on paper from left to right, like this:
|---word0---|---word1---|---word2---|....
|---word0---|---word1---|---word2---|....
|C0,C1,C2,C3|C0,C1,C2,C3|C0,C1,C2,C3|.....
|B0......B31|B0......B31|B0......B31|......
Many computers share with the Big-Endians this view about order. In
many of their diagrams the registers are connected such that when the
word W(n) is shifted right, its LSB moves into the MSB of word W(n+1).
English text strings are stored in the same order, with the first
character in C0 of W0, the next in C1 of W0, and so on.
This order is very consistent with itself and with the English language.
The Little-Endians, for their part, believe that one should start with the narrow end of every word,
and that low addresses are of lower order than high addresses.
Therefore they put their words on paper as if they were written in
Hebrew, like this:
...|---word2---|---word1---|---word0---|
When they add the bit order and the byte order they get:
...|---word2---|---word1---|---word0---|
....|C3,C2,C1,C0|C3,C2,C1,C0|C3,C2,C1,C0|
.....|B31......B0|B31......B0|B31......B0|
In this regime, when word W(n) is shifted right, its LSB moves into the
MSB of word W(n-1).
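A minimal Python sketch of this shift rule follows. It assumes a multi-word number held as a list in which index n is W(n) and W0 is the least significant word; the helper name and the 32-bit width are illustrative choices, not anything prescribed by a particular machine.

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


def shift_right_one(words):
    """Shift a multi-word number right by one bit.

    words[n] is W(n); W0 is the least significant word.
    The LSB of W(n) becomes the MSB of W(n-1).
    """
    out = []
    for n, w in enumerate(words):
        # The bit entering W(n) from the left is the LSB of W(n+1), if any.
        carry_in = (words[n + 1] & 1) if n + 1 < len(words) else 0
        out.append(((w >> 1) | (carry_in << (WORD_BITS - 1))) & MASK)
    return out


# The single set bit in W1 drops into the MSB of W0.
assert shift_right_one([0x00000000, 0x00000001]) == [0x80000000, 0x00000000]
```

Because significance grows with the word index, the list behaves exactly like one long integer shifted right by one.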
English text strings are stored in the same order, with the first
character in C0 of W0, the next in C1 of W0, and so on.
This order is very consistent with itself, with the Hebrew language, and
C0: "J"
C1: "O"
C2: "H"
C3: "N"
..etc..
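To make the character placement concrete, here is a small Python sketch of how the same four bytes read as a 32-bit number under each camp's convention. It relies only on the standard int.from_bytes; the helper name word_value is an assumption made for the example.

```python
text = b"JOHN"  # C0="J", C1="O", C2="H", C3="N" at ascending addresses


def word_value(memory: bytes, byteorder: str) -> int:
    """Read the bytes of one word as an unsigned integer."""
    return int.from_bytes(memory, byteorder)


big = word_value(text, "big")        # C0 is taken as the wide end
little = word_value(text, "little")  # C0 is taken as the narrow end
print(hex(big), hex(little))  # 0x4a4f484e 0x4e484f4a
```

The bytes in memory are identical in both cases. Only the numeric reading of the word differs.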
The PDP10 and the 360, for example, were designed by Big-Endians: their
bit order, byte-order, word-order and page-order are the same. The same
order also applies to long (multi-word) character strings and to
multiple precision numbers.
Next, let's consider the new M68000 microprocessor. Its way of storing
a 32-bit number, xy, a 16-bit number, z, and the string "JOHN" in its
16-bit words is shown below (S = sign bit, M = MSB, L = LSB):
|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|......
The M68000 always has on the left (i.e., LOWER byte- or word-address)
the wide-end of numbers in any of the various sizes which it may use: 4
(BCD), 8, 16 or 32 bits.
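The wide-end-at-the-lower-address rule can be reproduced with Python's standard struct module. This is a sketch under stated assumptions: the sample values for xy and z are hypothetical, and the function name is invented for the example; only the ">" (big-endian, no padding) format prefix is standard.

```python
import struct


def m68000_layout(xy: int, z: int, text: bytes) -> bytes:
    """Lay out a 32-bit number, a 16-bit number and a 4-character string
    with the wide end of each number at the lower address."""
    return struct.pack(">IH4s", xy, z, text)


memory = m68000_layout(0x12345678, 0xABCD, b"JOHN")  # hypothetical values
print(memory.hex(" "))  # 12 34 56 78 ab cd 4a 4f 48 4e
```

Read from the lowest address upward, the numbers appear in the same order in which one would write them on paper.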
Let's look next at the PDP11 order, since this is the first computer to
claim to be a Little-Endian. Let's again look at the way data is stored
in memory:
The PDP11 does not have an instruction to move 32-bit numbers. Its
multiplication products are 32-bit quantities created only in the
registers, and may be stored in memory in any way. Therefore, the
32-bit quantity, xy, was not shown in the above diagram.
However, due to some infiltration from the other camp, the registers of
this Little-Endian's marvel are treated in the Big-Endians' way: a
double length operand (32-bit) is placed with its MSB in the lower
address register and the LSB in the higher address register. Hence,
when depicted on paper, the registers have to be put from left to right,
with the wide end of numbers in the LOWER-address register. This
affects the integer multiplication and division, the combined-shifts and
more. Admittedly, Blefuscu scores on this one.
Well, Blefuscu scores many points for this. The above reference in [3]
does not even try to camouflage it by any Chinese notation.
Let's look at the VAX order. Again, we look at the way the above data
(with xy being a 32-bit integer) is stored in memory:
So, what about the infiltrators? Did they completely fail in carrying
out their mission? Since the integer arithmetic was closely guarded
they attacked the floating point and the double-floating which were
already known to be easy prey.
Let's look, again, at the way the above data is stored, except that now
the 32-bit quantity xy is a floating point number: now this data is
organized in memory in the following Blefuscuian way:
Blefuscu scores again. The VAX is found guilty, though with the
explanation that it tries to be compatible with the PDP11.
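The mixed layout at issue can be sketched in Python. The sketch assumes the arrangement commonly called "PDP-endian" or "middle-endian": the more significant 16-bit word sits at the lower address, while the two bytes inside each word keep the Little-Endian order. It shows only where the bytes of a 32-bit quantity land, not the internal fields of the floating point format, and the function name is invented for the example.

```python
import struct


def pdp_word_order(value32: int) -> bytes:
    """Store a 32-bit quantity as two 16-bit words, high word first,
    each word written with its low byte at the lower address."""
    high = (value32 >> 16) & 0xFFFF
    low = value32 & 0xFFFF
    return struct.pack("<HH", high, low)


print(pdp_word_order(0x0A0B0C0D).hex(" "))  # 0b 0a 0d 0c
```

A consistent Little-Endian layout of the same value would be 0d 0c 0b 0a, and a consistent Big-Endian one 0a 0b 0c 0d. The result above matches neither.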
Having found themselves there, the VAXians found a way around this
unaesthetic appearance: the VAX literature (e.g., p. 10 of [4])
describes this order by using the Chinese top-to-bottom notation, rather
than an embarrassing left-to-right or right-to-left one. This page is a
marvel. One has to admire the skillful way in which some quantities are
shown in columns 8-bit wide, some in 16 and others in 32, all in order to
avoid the egg-on-the-face problem.....
(vertical) notation because usually the top (aka "up") of the diagrams
corresponds to "low"-memory (low addresses). However, anyone who was
brought up by computer scientists, rather than by botanists, knows that
trees grow downward, having their roots at the top of the page and their
leaves down below. Computer scientists seldom remember which way "up"
really is (see 2.3 of [5], pp. 305-309).
The Big-Endians struck again, and without any resistance got their way.
The decimal number 12345678 is stored in the VAX memory in this order:
7 8 5 6 3 4 1 2
...|-------long0-------|
....|--word1--|--word0--|
.....|-C1-|-C0-|-C1-|-C0-|
......|B15....B0|B15....B0|
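The digit order in the diagram can be reproduced with a short Python sketch. It assumes two decimal digits per byte with the most significant pair at the lowest address, and it ignores any sign digit that a real packed-decimal format may carry. Both function names are invented for the illustration.

```python
def pack_decimal(digits: str) -> bytes:
    """Pack decimal digits two per byte, most significant pair at the
    lowest address (an assumed, simplified packed-decimal layout)."""
    if len(digits) % 2:
        digits = "0" + digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


def right_to_left(memory: bytes) -> str:
    """Draw memory the Little-Endian way: highest address on the left."""
    return " ".join(f"{b >> 4} {b & 0xF}" for b in reversed(memory))


memory = pack_decimal("12345678")
print(right_to_left(memory))  # 7 8 5 6 3 4 1 2
```

Drawn from the lowest address upward, the same bytes would read 1 2 3 4 5 6 7 8, which is the Big-Endian order.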
TRANSMISSION ORDER
In either of the consistent orders the first bit (B0) of the first byte
(C0) of the first word (W0) is sent first, then the rest of the bits of
this byte, then (in the same order) the rest of the bytes of this word,
and so on.
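Both consistent transmission orders can be sketched in Python as below. The function names and the word_bits parameter are assumptions made for the example. Note that byte boundaries never enter the computation, which is exactly what consistency buys.

```python
def bits_little_endian(words, word_bits=32):
    """Consistent Little-Endian order: the LSB of W0 goes first and the
    narrow end leads at every level."""
    return [(w >> i) & 1 for w in words for i in range(word_bits)]


def bits_big_endian(words, word_bits=32):
    """Consistent Big-Endian order: the MSB of W0 goes first and the
    wide end leads at every level."""
    return [(w >> i) & 1 for w in words for i in range(word_bits - 1, -1, -1)]


# One 8-bit word holding the value 1.
assert bits_little_endian([0x01], word_bits=8) == [1, 0, 0, 0, 0, 0, 0, 0]
assert bits_big_endian([0x01], word_bits=8) == [0, 0, 0, 0, 0, 0, 0, 1]
```

Within a single word, each stream is simply the reverse of the other.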
There are many ways to devise inconsistent orders. The two most popular
ones are the following and its mirror image. Under this order the first
bit to be sent is the LEAST significant bit (B0) of the MOST significant
byte (C0) of the first word, followed by the rest of the bits of this
byte, then the same right-to-left bit order inside the left-to-right
byte order.
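A Python sketch of this inconsistent order and of its mirror image follows. The function names and the word_bytes parameter are illustrative assumptions. Here the byte boundaries matter, which is the root of the byte-order problem.

```python
def bits_mixed(words, word_bytes=4):
    """Most significant byte first, least significant bit first within
    each byte."""
    out = []
    for w in words:
        for byte in w.to_bytes(word_bytes, "big"):      # left-to-right bytes
            out.extend((byte >> i) & 1 for i in range(8))  # right-to-left bits
    return out


def bits_mixed_mirror(words, word_bytes=4):
    """Mirror image: least significant byte first, most significant bit
    first within each byte."""
    out = []
    for w in words:
        for byte in w.to_bytes(word_bytes, "little"):
            out.extend((byte >> i) & 1 for i in range(7, -1, -1))
    return out


# The 16-bit word 0x0100: the set bit is the LSB of the more significant byte.
assert bits_mixed([0x0100], word_bytes=2) == [1] + [0] * 15
assert bits_mixed_mirror([0x0100], word_bytes=2) == [0] * 15 + [1]
```

Neither stream for 0x0100 matches a consistent order, which would place the set bit eighth (wide end first) or ninth (narrow end first) of the sixteen bits.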
Figure 1 shows the transmission order for the 4 orders which were
discussed above, the 2 consistent and the 2 inconsistent ones.
Those who use such an inconsistent order (or any other), and only those,
have to be concerned with the famous byte-order problem. If they can
pretend that their communication medium is really a byte-oriented link
then this inconsistency can be safely hidden under the rug.
Since the 16-bit communication gear will be provided by the same folks
who brought us the 8-bit communication gear, it is safe to expect these
two modes to be compatible with each other.
chunk size.
The founders of the Little-Endians party are RS232 and TELEX, who stated
that the narrow-end is sent first. So do the HDLC and the SDLC
protocols, the Z80-SIO, Signetics-2652, Intel-8251, Motorola-6850 and
all the rest of the existing communication devices. In addition to
these protocols and chips the PDP11s and the VAXes have already pledged
their allegiance to this camp, and deserve to be on this roster.
The HDLC protocol is a full-fledged member of this camp because it sends
all of its fields with the narrow end first, as is specifically defined
in Table 1/X.25 (Frame formats) in section 2.2.1 of Recommendation X.25
(see [2]). A close examination of this table reveals that the bit order
of transmission is always 1-to-8. Always, except the FCS (checksum)
field, which is the only 16-bit quantity in the byte-oriented protocol.
The FCS is sent in the 16-to-1 order. How did the Blefuscuians manage
to pull off such a fiasco?! The answer is beyond me. Anyway, anyone
who designates bits as 1-to-8 (instead of 0-to-7) must be gullible to
such tricks.
The same document ([1], again, p. F-18), states that the data "must
consist of an even number of 8-bit bytes. Further, considering each pair
of bytes as a 16-bit word, the less significant (right) byte is sent
first".
In order to make this even more clear, p. F-23 states "All bytes (data
bytes too) are transmitted least significant (rightmost) bit first".
Note that the Lilliputians' camp includes all the who's-who of the
communication world, unlike the Blefuscuians' camp which is very much
oriented toward the computing world.
Both camps have already adopted the slogan "We'd rather fight than
switch!".
There are two camps each with its own language. These languages are as
compatible with each other as any Semitic and Latin languages are.
So can all the Little-Endians, even though there are some differences
among the dialects used by different tribes.
CONCLUSION
Each camp tries to convert the other. Like all the religious wars of
the past, logic is not the decisive tool. Power is. This holy war is
not the first one, and probably will not be the last one either.
The "Be reasonable, do it my way" approach does not work. Neither does
the Esperanto approach of "let's all switch to yet a new language".
We would like to see some Gulliver standing up between the two islands,
forcing a unified communication regime on all of us. I do hope that my
way will be chosen, but I believe that, after all, which way is chosen
does not make too much difference. It is more important to agree upon
an order than which order is agreed upon.
[Figure 1: transmission order plotted against time for the four orders,
labeled order (1) through order (4). In each panel the horizontal axis
runs from MSB to LSB and the vertical axis is time.]
A P P E N D I X
R E F E R E N C E S
[2] CCITT.
Orange Book. Volume VIII.2: Public Data Networks.
International Telecommunication Union, Geneva, 1977.
[3] DEC.
PDP11 04/05/10/35/40/45 processor handbook.
Digital Equipment Corp., 1975.
[4] DEC.
VAX11 - Architecture Handbook.
Digital Equipment Corp., 1979.
[5] Knuth, D. E.
The Art of Computer Programming. Volume I: Fundamental
Algorithms.
Addison-Wesley, 1968.
People start counting from the number ONE. The very word FIRST is
abbreviated into the symbol "1st" which indicates ONE, but this is a
very modern notation. The older notions do not necessarily support this
relationship.
In English and French - the word "first" is not derived from the word
"one" but from an old word for "prince" (which means "foremost").
Similarly, the English word "second" is not derived from the number
"two" but from an old word which means "to follow". Obviously there is
a close relation between "third" and "three", "fourth" and "four" and
so on.
Similarly, in Hebrew, for example, the word "first" is derived from the
word "head", meaning "the foremost", but not specifically No. 1. The
Hebrew word for "second" is specifically derived from the word "two".
The same for three, four and all the other numbers.
However, people have, for a very long time, counted from the number One,
not from Zero. As a matter of fact, the inclusion of Zero as a
full-fledged member of the set of all numbers is a relatively modern
concept.
A nice mathematical theorem states that for any base, b, the first b^N
(b to the Nth power) integers are represented by exactly N
digits (leading zeros included). This is true if and only if the count
starts with Zero (hence, 0 through b^N-1), not with One (for 1 through
b^N).
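The theorem is easy to check by brute force. The Python sketch below is only a demonstration for small cases, not a proof, and both helper names are invented for the example.

```python
def to_digits(n: int, base: int, width: int):
    """Return n as exactly `width` digits in `base`, most significant
    first, or raise OverflowError if it does not fit."""
    digits = []
    for _ in range(width):
        n, d = divmod(n, base)
        digits.append(d)
    if n:
        raise OverflowError("value needs more than `width` digits")
    return digits[::-1]


def all_fit(start: int, base: int, width: int) -> bool:
    """Do base**width consecutive integers beginning at `start` all fit
    in `width` digits?"""
    try:
        for n in range(start, start + base ** width):
            to_digits(n, base, width)
    except OverflowError:
        return False
    return True


assert all_fit(0, base=2, width=3)       # 0 through 7 fit in 3 bits
assert not all_fit(1, base=2, width=3)   # 1 through 8 do not: 8 needs 4 bits
```

Counting from One pushes the last value, b^N, one digit past the field, which is the cost the theorem describes.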
from One, rather than Zero, would cause either the loss of one memory
cell, or an additional address line. Since either price is too
expensive, computer engineers agree to use the mathematical notation of
starting with Zero. Good for them!
The designers of the 1401 were probably ashamed to have address-0 and
hid it from the users, pretending that the memory started at address-1.
This is probably the reason that all memories start at address-0, even
those of systems which count bits from B1 up.
ORDER OF NUMBERS.
Serial comparators and dividers prefer the former. Serial adders and
multipliers prefer the latter order.
When was the common Big-Endians order adopted by most modern languages?
The reason is that when two 16-bit numbers, for example, are multiplied,
the result is a 31-bit number in a 32-bit field. Integers are right
justified; fractions are left justified. The entire difference is only
SWIFT's POINT
Swift's point is that the difference between breaking the egg at the
little-end and breaking it at the big-end is trivial. Therefore, he
suggests, everyone should do it in his own preferred way.
We agree that the difference between sending eggs with the little- or
the big-end first is trivial, but we insist that everyone must do it in
the same way, to avoid anarchy. Since the difference is trivial we may
choose either way, but a decision must be made.