The Z1: Architecture and Algorithms of Konrad Zuse's First Computer
The Z1: Architecture and Algorithms of Konrad Zuse's First Computer
The Z1: Architecture and Algorithms of Konrad Zuse's First Computer
Raul Rojas
Freie Universitt Berlin
June 2014
Abstract
This paper provides the first comprehensive description of the Z1, the mechanical
computer built by the German inventor Konrad Zuse in Berlin from 1936 to 1938. The
paper describes the main structural elements of the machine, the high-level
architecture, and the dataflow between components. The computer could perform the
four basic arithmetic operations using floating-point numbers. Instructions were read
from punched tape. A program consisted of a sequence of arithmetical operations,
intermixed with memory store and load instructions, interrupted possibly by input and
output operations. Numbers were stored in a mechanical memory. The machine did not
include conditional branching in the instruction set.
While the architecture of the Z1 is similar to the relay computer Zuse finished in 1941
(the Z3) there are some significant differences. The Z1 implements operations as
sequences of microinstructions, as in the Z3, but does not use rotary switches as microsteppers. The Z1 uses a digital incrementer and a set of conditions which are translated
into microinstructions for the exponent and mantissa units, as well as for the memory
blocks. Microinstructions select one out of 12 layers in a machine with a 3D mechanical
structure of binary mechanical elements. The exception circuits for mantissa zero,
necessary for normalized floating-point, were lacking; they were first implemented in
the Z3.
The information for this article was extracted from careful study of the blueprints drawn
by Zuse for the reconstruction of the Z1 for the German Technology Museum in Berlin,
from some letters, and from sketches in notebooks. Although the machine has been in
exhibition since 1989 (non-operational), no detailed high-level description of the
machines architecture had been available. This paper fills that gap.
The German inventor Konrad Zuse (1910-1995) built his first computing machine from 1936 to 19381
(from 1934 to 1935 he experimented with small mechanical circuits). In Germany, Zuse has always
been considered the father of the computer although the machines he built during WWII became
known only after the conflagration. Zuse studied civil engineering at the Technische Hochschule
Charlottenburg (todays Technical University of Berlin). His first employer was the company Henschel,
who had just started building military airplanes in Berlin in 1933 [1]. The duty of the 25 years old was
to carry out the long chains of structural calculations needed for the manufacturing process of
aircraft components. As a student, Zuse had already started thinking about ways of mechanizing
computation [2]. Therefore, after working just several months for the Henschel Flugzeugwerke, he
decided to quit, build a mechanical computer, and start his own business, in fact, the first computer
company in the world.
During the period 1936-1945, Konrad Zuse was unstoppable, even after two short-lived calls to the
front. He could manage to be recalled to Berlin to work part-time for the Henschel Flugzeugwerke,
and part-time for his own company. In those nine years, he built the six computers known today as
the Z1, Z2, Z3 and Z4, as well as the specialized S1 and S2 machines. The last four were built after
WWII had already started. The Z4 was finished during the closing months of the war. Zuses original
abbreviations for the machines names were V1, V2, V3 and V4 (meaning Versuchsmodell, or
prototype). After the war, he changed the V for a Z for obvious reasons. The V1 (Z1 in what follows)
1
The precise chronology of his line of computing machines was provided by K. Zuse in a small handwritten note
from March 1946. There, the V1 is dated as having been built in the years 1936-1938.
was a fascinating piece of technical brinkmanship: it was a completely mechanical computer, but
instead of using gears and wheels to represent the ten decimal digits (as Babbage had done in the
previous century, or IBM was doing with its Hollerith machines), Zuse decided to build a fully binary
computer. His machine was based on components in which the forward linear movement of a small
rod or metallic plate represented a one, and no movement represented a zero (or vice versa,
according to the component). Zuse developed novel types of mechanical logical gates and finished
the first prototype of the machine in his parents living room. The sequence of events that led to the
Z1 and subsequent machines has been appraised by Zuse himself in his autobiography [2].
The Z1 was a mechanical but also a surprisingly modern computing machine: it was based on binary
numbers, it used a floating-point representation for the data and could perform the four basic
arithmetic operations. The program was read from a punched tape (no conditional branch was
available though), and the results could be stored to or read from memory (16 words). The machine
cycle was around 4 Hz.
The Z1 was very similar to the Z3, finished in 1941, whose architecture has been already described in
the Annals [3]. However, the detailed high-level architecture of the Z1 has never been explained
before. The original prototype was destroyed during a bombing raid in 1943. Only a few sketches and
photographs of the mechanical components survived. In the 1980s, Konrad Zuse, who had retired
many years earlier, obtained financing from Siemens and other German sponsors for building a full
replica of the Z1 which is now housed in Berlins Technology Museum (Fig. 1). Zuse built the machine
with the help of two engineering students: He prepared a full set of blueprints, painstakingly drawing
every single mechanical component (to be cut from sheets of steel), and supervising the construction
over the course of several years at his own house in Hnfeld, Germany. The first sketches of the Z1
reconstruction were drawn in 1984. In April of 1986 Zuse drew a timeline expecting to have the
machine finished by December of 1987. When the machine was delivered to the Berlin museum in
1989 it was shown running and computing some arithmetical operations on several occasions.
However, the reconstructed Z1 was, like the original, never reliable enough to run unattended for
long stretches of time. It even failed at the inauguration and Zuse spent months repairing the
machine. After Konrad Zuse passed away in 1995, the machine was never restarted again.
Fig. 1: A view of the reconstructed Z1 in Berlin (from the Konrad Zuse Internet Archive [5]). The user can rotate
the view around the machine, can zoom in and out. The virtual display is based on thousands of linked
photographs.
Although we have a reconstruction of the Z1 in Berlin, fate struck twice. Other than drawing the
blueprints, Zuse made no serious effort to write a complete top-down description of the
reconstructed Z1 (he expected a local university to do it). This would have been necessary, because it
is evident from comparing the reconstruction with old photographs of the Z1 built in 1938, that the
new machine has been streamlined. The higher precision of the machining equipment available to
Zuse in the 1980s allowed him to build the machine using layers of steel plates which could be placed
tighter together. The new Z1 fills a significantly smaller volume than the old Z1. It is also not
completely clear if the new Z1 is strictly a one-to-one logical and mechanical clone of the original
machine, or if Zuses experience with the Z3 and later machines allowed him to improve portions of
the reconstructed Z1. In the set of mechanical blueprints drawn from 1984 to 1989, there are at least
six different designs for the addition unit, having between five to eight, and finally up to 12
mechanical layers.2 Zuse left no detailed written record which could allow us to answer such
questions. Still worse, he rebuilt the Z1 and left no comprehensive logical description of it for the
second time! He acted like those celebrated clockmakers who only draw the parts of their watches,
leaving no further explanation: first-rate clockmakers would need no further clarifications. His two
student assistants documented only the memory and the tape reader, a heaven-sent piece of
information [4]. Visitors to the Berlin museum can only wonder at the thousands of components
visible in the machine. They can both wonder and despair, since it is almost impossible, even for
professional computer scientists, to visualize the inner workings of this mechanical Leviathan. The
machine is there -- but unfortunately dead.
Fig. 2: The mechanical layers of the Z1. The eight memory layers can be seen on the right, the 12 processor
layers on the left. The lower section with levers is used for transmitting the clock cycles to all parts of the
machine.
This paper is based on a careful study of the blueprints of the Z1, scattered notes in Zuses
notebooks, and numerous on-site inspections of the machine. The reconstructed Z1 has been nonoperational for so many years because the steel plates used by Zuse bend under stress. For this
paper, more than 1100 large format drawings of the machines components were reviewed, as well
as 15.000 pages in notebooks (only a small fraction thereof contained information about the Z1
though). I could only see a short video of parts of the machine operating (filmed almost 20 years
ago). Deutsches Museum in Munich houses 1079 blueprints from Zuses private papers, while the
Berlin Technical Museum has another 314 in its archives. Fortunately, some blueprints include also
the definition and timing of some microinstructions for the Z1, and also a few examples of bit by bit
handwritten calculations made by Zuse. Such examples were probably used by Zuse to check the
machines internal operation and find bugs. This information was like a Rosetta stone, which allowed
us to correlate the Z1 microinstructions with the diagrams and blueprints, and also with our relatively
All the blueprints for the reconstruction of the Z1 have been made available through our Konrad Zuse
Internet Archive at http://zuse-z1.zib.de.
deep knowledge of the relay-computer Z3 (for which we have complete circuits [5]). The Z3 is based
on the same high-level architecture as the Z1, but is different in an important number of ways.
This paper proceeds top-down: first we review the block architecture of the Z1, the layout of the
mechanical components, and I also provide some examples of the mechanical gates used by Zuse.
We then look in more detail at the Z1 core elements: the clocked addition units for exponent and
mantissa, the memory, and the microsequencer for arithmetical operations. We show the interplay
of the mechanical elements and how the sandwiched layout of steel plates helped Zuse organize
the computation. We look at the multiplication and division process, at input and output. The last
part of the paper briefly situates the Z1 in its historical context.
Block-architecture
The Z1 was a clocked machine. Being a mechanical device, the clock was subdivided into four
subcycles which consisted in the movement of mechanical components in four orthogonal directions,
as shown in Fig. 2 (left side, see cycling unit). Each movement direction was called an
engagement by Zuse. He aimed for a 4Hz clock cycle but the Berlin reconstruction never was
operated faster than at 1Hz (four engagements per second). At that speed, a multiplication takes
around 20 seconds.
Fig. 3: Block diagram of the Z1 (1936-38) according to the reconstruction of 1989. The original Z1 had only 16
words of memory instead of 64. The punched tape was made of 35mm film tape. Each instruction was encoded
using 8 bits.
The Z1 has a number of features later adopted in the Z3. From a modern perspective, the most
important innovations in the Z1 (see Fig. 3) were the following:
4
a) It was based on a fully binary architecture for the memory and the processor.
b) The memory was separated from the CPU. In the Berlin reconstruction, the memory and
punched tape reader constitute about one half of the machine. The processor, I/O consoles,
and the microcontrol unit constitute the other half. The original Z1 had 16 words of memory,
the reconstruction has 64.
c) The machine was programmable: the program was read from punched tape using 8 bits (two
bits for the opcode and six bits for memory addressing, or three bits for the opcode of the
four arithmetical and the two I/O operations). Therefore there were only eight instructions:
the four basic arithmetical operations, load-from and store-to memory, one instruction for
reading data from a decimal panel, and another for showing the contents of the result
register on a mechanical decimal display.
d) Floating-point was used for internal data representation, in the memory and in the
processor. Therefore, the processor was divided into two parts: one for handling the
exponents, another for handling the mantissas. The mantissa had 16 bits for the bits after
the binary point. The bit to the left of the point was always 1 (normalized floating-point) and
did not have to be stored. Exponents were represented with 7 bits in twos complement
format (running thus from -64 to +63). The sign of the floating-point numbers was stored in
one additional bit. Therefore the word-length in memory was 24 bits (16 bits for the
mantissa, 7 for the exponent, one bit for the sign).
e) The special case of zero in arguments or results (which cannot be expressed with a
normalized mantissa, where the leading bit is always 1) can be handled within the floatingpoint representation as special values of the exponent. This was done in the Z3 but not in the
Z1, also not in its reconstruction. Therefore, neither the original Z1, nor the reconstruction,
can work with zero as intermediate result. Zuse was aware of this shortcoming, but he left
the solution to the relay machine, which was easier to wire.
f) The CPU was microcoded: operations were broken into sequences of microinstructions, one
for each machine cycle. The microinstructions produced a specific dataflow within the
arithmetical-logical units (ALUs), which were running nonstop, adding whatever two
numbers were stored in its two input registers, in every cycle.
g) Curiously, memory and processor ran independently: the memory would put data at or
collect data from the communication interface, whenever the punched tape gave the order.
The processor would pick, or put data at the interface, when a load or store operation was
executed. It was possible to run only the processor and shut-down the memory, in which
case the data at the interface, supposedly coming from the memory, became zero. It was
also possible to run only the memory and shut-down the processor. This allowed Zuse to
debug each half of the machine independently. When running together, a shaft connecting
the cycling units in each half synchronized both parts of the machine.
Further innovations in the Z1 were similar to some of the ideas presented later in the Z3. The
instruction set was practically the same as in the Z3 but the Z1 could not extract square roots. The Z1
used discarded 35mm film tapes as punched tape.
Fig. 3 shows the abstract diagram of the reconstructed Z1. Notice the two main halves of the
machine: the memory is in the upper half, and the processor in the bottom half. Each half had its
own rotating cycling unit, which further divided each cycle into four mechanical movements in the
directions shown by the arrows. These four movements could be communicated to any part of the
machine using the levers distributed under the computational components. The punched tape was
read, one instruction at a time. The instructions had different durations. Load and store operations
took one cycle, all other operations needed several cycles. The memory address was contained in the
lower six bits of the 8 bit opcode, allowing the programmer to refer explicitly to 64 memory
addresses.
Memory and processor communicated through the buffer between both units shown in Fig. 1. In the
CPU, the internal representation of the mantissa was extended to 20 bits: two additional bits were
5
The Berlin reconstruction of the Z1 is based on a very clean layout. All mechanical components seem
to have been arranged in an optimal way. We mentioned that Zuse designed at least six different
versions of the processor. The relative position of the main components was fixed from the
beginning and might reflect the original distribution of the mechanical elements in the original Z1.
There are two main divisions: a gap separates the memory from the processor (as shown in Fig. 3). In
fact, both parts of the machine could be actually pulled apart for debugging purposes since they
were mounted on separate tables with rollers. A further horizontal plane subdivides the machine
6
into an upper part containing the computational components (those visible in photographs of the
Z1), and a lower part containing all the synchronization levers. This Z1 underworld is only visible
when the visitor bends over to look underneath the computational skyline. Fig. 4 is a drawing from
the blueprints showing the computation and the synchronization levels for part of the processor.
Notice the 12 layers of computational components and the lower section with three levels for levers.
This blueprint is a good example of how difficult it can be to interpret the drawings. While there is
much detail about the size of the parts, there are just a few annotations about their use.
Fig. 4: Blueprint for the computation and synchronization layers of the Z1 (exponents unit)
Fig. 5 shows the distribution of logical components in the reconstructed Z1, seen from above, and as
drawn by Zuse, further annotated with the logical functionality of each block (this sketch has been
available since the 1990s). On the upper part we see the three memory banks. Each can contain eight
8-bit words per layer. Every memory bank has 8 mechanical layers, so that a total of 64 words can be
stored. The first memory bank (10a) is used for the exponent and sign, the last two banks (10b, 10c)
are used for the lower 16 bits of the mantissa of the stored numbers. This distribution of bits allowed
Zuse to build three identical 8-bit memory banks and use them for exponent and mantissa,
simplifying thus the mechanical construction.
Between memory and processor there is a buffer for passing numbers to the processor (blocks
12abc), or for receiving numbers from it. There is no way of coding constants in the punched tape. All
numbers have to be entered by the user using the decimal input panel (block 18, right side), or must
be generated by the computer itself as intermediate results.
Each unit in this diagram shows just the uppermost layer. Remember that the Z1 is built like a
sandwich of mechanical parts. Each computational layer is strictly separated from the laver above
or below (each layer has a metallic floor and a metallic ceiling). Communication between layers is
done using vertical rods than can pass movement from one layer to those above or below it. The
vertical rods are the small circles drawn between the rectangles representing layers of computation.
The somewhat larger circles drawn inside the rectangles represent logical operations. Inside each
circle we can find a binary gate (and going down through the layers, up to 12 gates for each circle).
This drawing allows us to make an estimation of the number of logical gates present in the Z1. Not all
units have the same height, and not all layers are populated with mechanical components. A
conservative estimation of the number of binary elements would be 6.000 gates.
The mechanical structure of the Z1 can be best understood by looking at a few simple examples of
the type of binary logic gates that Zuse used in his machines. The classical digital representation for
decimal digits has always been the rotary dial. A gear is divided into ten sectors -- by rotating the
gear it is then possible to count from zero to nine. Zuse decided as early as 1934 to use the binary
system (which he called, following Leibniz, the dyadic system). In Zuses technique, a planar plate can
have one of two positions (0 or 1). It is possible to move from one state to the other using linear
motion. Logical gates pass movement from one plate to another, according to the value of the
represented bits. The structures are three-dimensional: they consist of arrangements of superposed
planar plates which transmit movement usually through cylindrical rods or pins positioned vertically,
at right angles to the plates.
We show examples of the three basic gates: conjunction, disjunction, and negation. There are many
possible mechanical realizations for the main idea, and Zuse showed great creativity drawing always
the variation of a gate that best fitted the 3D structure of the machine. Fig. 5 shows what Zuse called
the elementary gate. The actor plate can be regarded as the machine cycle. This plate moves
cyclically from right to left and back. The upper plate is the data bit we are using for control. It can be
in the position 1 or 0. The rod going though the openings moves horizontally following the plate
(keeping its verticality). If the upper plate is in the 0-position, the actor plates movement cannot be
transmitted to the actuated plate (see Fig. 6, left side). If the data bit plate moves to the 1-position,
the movement of the actor plate is transmitted to the actuated plate. This is what Konrad Zuse called
a mechanical relay, just a switch that closes a mechanical current. This elementary gate can thus
copy a bit from the upper to the acted plate, rotating the movement of the bit by 90 degrees.
Figure 6: An elementary gate is a switch. If the data bit is 1, the actor and actuated plates are connected. If the
data bit is zero, they are disconnected and the movement of the actor is not transmitted.
Fig. 7 shows now such plate arrangements as seen from the top. The actor plate is shown with its
opening. The control plate in green pulls the circle (rod) up or down. The actuated plate (red) can
move to the right or left, but only when the rod is in such a position that the actors opening moves
the rod. There is a drawing of the equivalent switch to the left of each mechanical top view. The
control bit can close or open the gate. The actor plate can be pulled or pushed (as shown by the
arrows). Zuses convention was to always draw the switch in the zero position of the control bit, as
done in Fig. 7. Zuse preferred plates to be pushed by the actor plate (right side of Fig. 7) rather than
be pulled (left side of Fig. 7). It is now very easy to build a negation gate, by using a closed switch
which is opened by the position 1 of the control bit (as shown in the bottom two diagrams in Fig. 7).
Having a mechanical relay, it is now straightforward to build the rest of the logical operations. Fig. 8
shows the necessary circuits, now only using abstract notation. The equivalent mechanical
contraptions are easy to conceive.
Fig. 7: Some variations of the elementary gate and Zuses abstract notation for mechanical relays. The relays
are drawn as switches. By convention, the drawing always shows the zero-position of the control bit. The
arrows show the possible movements. The actuator plate can be pulled to the left (left side diagrams), or
pushed to the right (right side diagrams). The initial position of the mechanical relay can be in the closed
position (lower two diagrams). In that case the relay acts as a negation, since the output is the negation of the
control bit.
Fig. 8: Some logical gates built from mechanical relays. The lowest diagram, an XOR, can be built by using
mechanical relays with two possible actuated plates, as shown in the diagrams. The mechanical equivalents are
easy to design.
10
Now everybody can start to build his/her Zuse mechanical computer. The basic element is the
mechanical relay. More complex connections (like the relays with two actuated plates) can be
designed, and the corresponding mechanics has to be built using plates and rods.
The main problem for building a complete computer is to interconnect all components. Notice that
the control bit always moves orthogonally to the result bit. Each completed logical operation rotates
the mechanical movement by 90 degrees. The next logical operation rotates the movement by 90
degrees, and so on. After four gates, we are back to the original direction of movement. This is why
Zuses cycling units used the four directions NESW. Within a machine cycle it is possible to execute
four layers of logical computations. The logical gates can be simple, such as a negation, or complex,
such as a gate with two actuated plates (half of an XOR). The clocking in the Z1 is such that the
machine completes an addition in four engagements: in engagement IV the arguments are loaded.
Engagements I and II compute partial sums and carries, and engagement III the final result.
Result bits can be transferred to different horizontal levels than the level at which the input bits
move. That is, rods can be also used to move bits up or down between the layers of the machine.
We will see this later in the addition circuits.
At this point Fig. 5 should make more sense: the circles inside the different units are exactly the
circles of Zuses abstract notation and pinpoint the position of logical gates. We can now abstract
from the mechanics and discuss the Z1 from a more logical point of view.
The memory of the Z1
The memory of the Z1 was until now the best understood part of the Z1. It was described by
Schweier and Saupe [4] in the 1990s. A very similar type of memory was used for the Z4, a relay
computer finished by Konrad Zuse in 1945. The Z4 had a processor built with telephone relays, but
the memory was mechanical, just like in the Z1. The mechanical memory of the Z4 is housed today at
Deutsches Museum. Its operation has been simulated in a computer by a student assistant.
The main concept used in the Z1 was that a bit can be stored using a vertical pin which can be set in
one of two possible positions. One position represents zero, the other position represents one. The
diagram below shows how the memory bits can be set by moving them from one position to the
other.
11
Figure 9: One mechanical bit in the memory. The pin can be stored in the zero or one position. Its position can
be read.
Diagram 9(a) shows two stored bits. In step 9(b), a control plate moves the pins up. In step 9(c) the
horizontal actuated plate is pusehd (lower bit) or not (upper bit) by the stored bit and the clocked
plate. In step 9(d) the bits are moved back to their original position, where the vertical control plate
can bring them to position 9(a). Reading a bit from this type of memory was a destructive process.
After reading a bit, the contents of the bit cell had to be restored by the movement back shown in
9(d).
The memory words were addressed by decoding the 6 bits used for an address. Three bits selected
one of 8 layers, the other three one of eight memory words. The decoding circuit for each layer was a
classical binary tree of relays with three levels, as used in the Z3 (with a different number of levels).
We do not delve further into the structure of the mechanical memory. The details can be consulted
in [4].
Number 1
Number 2
1
0
0
0
1
0
1
0
1
1
XOR
AND
Carries
Result
(XOR)
1
0
0
0
0
1
1
0
1
1
0
1
0
1
0
Konrad Zuse used anticipating carriage in all his machines. Instead of propagating a carry through
the different binary powers sequentially, the carry for all positions can be set in one step. The
example above serves to illustrate the procedure. The first XOR is the partial result of the sum of the
two registers without considering carriages. The AND computes the generation of carry bits: they are
transported to the next bit to the left, but are further transported to the next binary position as long
as there is a one in the result of the previous XOR computation. In the example, the first carry
computed with the AND is transformed into three carries, which are finally XORed with the result of
the first XOR. A sequence of consecutive 1s from the XOR operation operates like a kind of
bandwagon for transporting AND-generated carries until the chain of 1s breaks.
The circuit shown in Fig. 10 is the addition circuit used in the reconstructed Z1. The diagram shows
the addition of two bits stored in the a and b rods (a could be the i-th bit of register Aa, and b the
corresponding bit of register Ab). The XOR and the AND computations are performed in parallel using
12
the binary gates 1,2,3, and 4. The AND operates on gate 5, generating the carry bit ui+1, while the
XOR closes or leaves open the chain of XOR bits using gate 6. Gate 7 is an auxiliary gate for passing
the XOR result to the upper level. Gates 8 and 9 compute the final XOR to complete the addition.
The movements of the different components are indicated by the arrows. All four cycle directions are
used, that is, an addition takes one full cycle, from operand loading up to result generation. The
result is passed to rod e, the i-th bit of the register Ae.
This addition circuit is located in layers 1, 2, and 3 of the addition block (see Fig. 13 further down). It
is remarkable that Konrad Zuse, who had no formal training in binary logic, was working with
anticipating carry. The ENIAC, the first large-scale electronic computer, propagated the carry
sequentially from one decimal position in an accumulator to the next. The Harvard Mark I used
anticipating decimal carry.
Fig. 10: Addition unit of the Z3. Computation runs from left to right. Bitwise AND and XOR are computed first
(gates 1, 2, 3, 4). The carry bits are computed in engagement II (gates 5 and 6). In engagement III an XOR
finishes the computation of the addition (gates 8 and 9).
Every operation in the Z1 is broken into a sequence of microinstructions. This is done through a kind
of table of criteria consisting of 108 metallic plates placed in pairs, as shown in Fig. 11 (here we
only see the top pair of plates, as seen from layer 12. The rest of the plates is situated below these
two upper plates, across the 12 layers). The entries in the table (the metallic plates themselves) are
ordered according to the values of ten bits:
-
The Op0, Op1 and Op2 bits contain the binary opcode of the instruction
13
The bits S0 and S1 are conditions bits, set by other parts of the machine. When S0=1, for
example, an addition is transformed into a subtraction.
The bits Ph0, Ph1, Ph2, Ph3, Ph4 are used to count the number of microcycles (or phases)
in an instruction. Multiplication, for example, is executed in 20 phases and the five bits Ph0
to Ph4 advance from 0 to 19 during the operation.
The ten bits theoretically allow us to define up to 1024 different conditions or cases. An instruction
can contain up to 32 phases. The ten bits (opcode, condition bits, and phase) push metallic pins
(colored gray in Fig. 11) which hold the microcontrol plates from snapping to the left or right (each
plate is attached to a spring, shown in the figure). Each microcontrol plate has a different
configuration of teeth, which determines if the current position of the ten control pins can stop the
plate from snapping or not. Each microcontrol plate has therefore an address. When the ten
control bits contains that address, the plate snaps to the right (the upper set of plates), or to the left
(the lower set of plates in Fig. 11).
If a control plate moves to the right it presses on four condition bits (A, B, C, D). Each metallic plate is
cut so that it presses on a different combination of A, B. C, and D according to the selected criterion.
Since the plates are mounted on top of each other, across the 12 layers of the machine, activating a
control plate automatically selects the layer for the next operation. One microoperation in the
exponent unit can also be started parallel to a microoperation in the mantissa unit, since two plates
can snap simultaneously: one to the left, another to the right. It is also possible for two plates to snap
on two different layers of the right side (mantissa control), but mechanical constraints limit this kind
of parallelism.
Fig. 11: The control plates. The teeth of each plate makes it stop, according to the position of the metallic pins
(in grey), activated by the ten bits Op2 to Ph0. Springs make a plate snatch to the right (upper plate) or to the
left (lower plate), when its address has been selected. Selecting a plate out from the 12-level stack of plates
selects the layer for the next operation. The teeth A, B, C or D can be cut, so that only the necessary operation
is selected by pressing on pins in the microcontrol unit. In the diagram, the upper plate has moved to the right
and presses on the three pins A, C, and D.
Controlling the Z1 amounts thus to adjusting the teeth of the metallic plates so that each one of
them responds to a specific ten-bit combination, for acting on the left, or on the right units. The left
side controls the exponents half of the processor. The right side the mantissa half. The alternatives
14
A, B, C or D are exclusive, that is, only one of them is selected by a microcontrol plate (by not pushing
it).
Fig. 12 shows the floating-point processor of the Z1. The processor has one datapath for handling the
exponents (left side) and one for handling the mantissas (right side). The floating-point registers F
and G consist of 7 bits for the exponent, and 17 bits for the mantissa. The exponent-mantissa pair
(Af,Bf) is FP register F and the pair (Ag,Bg) is FP register G. The signs of the arguments are handled
externally, in a sign unit. The sign of a product or a division is computed in advance. The sign of an
addition or subtraction is adjusted after the operation takes place.
In Fig. 12 we can see the registers F and G and their connections to the rest of the processor. The
ALU (arithmetical logical unit) contains two FP registers: the pair (Aa,Ba) and the pair (Ab,Bb). These
registers are the direct inputs to the ALUs. They have to be loaded and can retain partial results
during several iterations, due to the feedback bus from the ALU-outputs Ae and Be.
In the Z1 the data buses are used in three-state mode, that is, many inputs can push on the same
data line (which is a mechanical component). There is no need to isolate electrically the data lines
from the inputs, since no electricity is in play. Since a zero input is represented by no movement of a
mechanical part (no pushing), while a one represents a movement (pushing), there is no conflict
between the parts. If two parts push on the same data line, the only important thing is that they act
in step with the machine cycle (pushing only works in one direction).
Fig. 12: The processor datapath in the Z1. The left part corresponds to the exponents ALU and registers, the
right side to the mantissas. The resulta Ae and Be can be fed back to the temporary registers, or they can be
negated, or shifted. The four bits representing a decimal digit are copied to register Ba directly, one digit after
another, using four bits. The decimal-binary conversion operates on this data.
15
The only registers visible to the programmer are (Af,Bf) and (Ag,Bg). They have no address: the first
register loaded by a Load operation is (Ag,Bg), the second register loaded afterwards is (Ag,Bg). Once
two registers have been loaded, the arithmetical operations can be started. (Af,Bf) is also the result
register for arithmetical operations. The second register can be implicitly loaded after an arithmetical
operation and be the second argument for a new arithmetical operation. This scheme of register
usage is the same as in the Z3. In the Z3, however, there is no register pair (Ag,Bg). The coordination
between main and auxiliary registers is more complicated than in the Z1.
As can be seen from the processor datapath, the individual registers Aa, Ab, Ba, and Bb can be loaded
with different kinds of data: values from other registers, constants (+1, +1, 3 and 13), negative values
of other registers, and the values coming back from the ALUs. The ALU outputs can be negated or
shifted. A shift to the left by n places is represented by a box containing a multiplication with ; a
shift to the right by n places by a division with . These boxes are mechanical circuits containing the
appropriate bit displacement or bit complementation. The result of the addition of the registers Ba
and Bb, for example, is stored in Be and can be transformed in several ways: the result Be can be
negated (-Be), can be shifted one or two places to the right (Be/2, Be/4), or it can be shifted one or
three places to the left (2Be, 8Be). Every such computation is performed in a different layer of the
mechanical stack of layers constituting the ALU. The appropriate result, according to the active
computation, is passed back to register Ba or Bb. Case selection is done by levers that activate the
appropriate layer as selected by the microcontroller. The result Be can also go straight to the
memory unit (the corresponding bus line is not shown in Fig. 12).
The ALU performs an addition in every cycle. All registers Aa, Ab, Ba, and Bb are erased after an ALU
computation, and can be reloaded with the feedback values.
Fig. 13: The layered spatial distribution of operations in the processor. The shifters for Be are on the left stack.
The addition unit is distributed between the three leftmost stacks. The shifters for Bf are in the right stack, and
the binary equivalent of 10^-16 too. The result goes to memory through the line labeled Res on the right. The
two registers Bf and Bg arrive from memory as First (Op1) or second operand (Op2).
Register Ba has a special use for the conversion of four decimal digits into binary. Each decimal digit
entered through the mechanical panel is transformed into 4 bits. Groups of 4 bits are fed directly into
register Ba (at position
), which can advance the four bits by performing a multiplication with
16
the factor 10, adding then the next digit to the partial result, multiplying again by ten, and so on. If
we want to transform the number 8743 from decimal to binary, for example, the digit 8 is entered
first and is multiplied by ten. Then 7 is added to the result, and the new sum (87) is multiplied by ten.
The result (870) is then added to 4, and so on. This yields a simple algorithm for the conversion of
decimal input to a binary number. During the process the exponent half of the processor is adjusting
the exponent of the final floating-point result (therefore the constant 13 in the exponent ALU, which
corresponds to
(see decimal-binary conversion algorithm further down).
Fig. 13 shows the spatial distribution of the different elements of the processor datapath for the
mantissa part. All the shifters have been allocated in different layers of the twelve constituting the
leftmost module of the machine. The registers Bf and Bg come from the right side, directly from
memory (layers 5 and 7). The result Be is fed back to memory crossing through level 8. The bits of the
registers Ba, Bb, and Be are stored in vertical rods (only one bit is shown in this cross-section of the
processor). The ALU is distributed in two mechanical stacks. Level 1 and 2 compute the AND and XOR
of the individual bits in Ba and Bb. The results are passed to the right, where the carry-bits and the
final XOR is computed and stored in Be. The result Be can go back to be stored in memory, or can be
shifted in all the different ways shown, being fed back to Ba or Bb, as desired. Some circuits seem
redundant (there are two ways of loading Be into Ba, for example), but they represent alternatives.
Level 12 loads Be into Ba unconditionally, level 9 only if the exponent Ae is zero. The boxes marked
green in the diagram are empty layers, where no computation takes place and mechanical
components can pass through. The box around the bars Bf and Bf contains the shifter for Bf needed
for multiplication (where each bit of Bf is read one after the other, starting from the lowest binary
power).
Fig. 14: Communication betwen the exponent and the mantissa ALUs.
Now you can picture the computational stream in this machine: data flows from the registers F and G
into the machine, filling the A and B register pairs. A single addition or sequences of
additions/subtractions (for multiplication or for division) are performed. Partial results are recycled
17
in the A and B registers until the result is complete. The final result is then loaded in register F and a
new computation can be started again.
As explained above, the Z1 could perform the four basic arithmetical operations. In the tables
discussed below, the convention has been used of representing a binary one with the letter L. The
tables display the sequence of microinstructions needed for each operation and how it affects the
dataflow between the registers in the processor. One table summarizes addition and subtraction
(using twos complement), one table summarizes multiplication, and one table is for division. There is
also one table for each of the two I/O operations: decimal-binary and binary-decimal conversion. The
tables are divided into Part A for the exponent, and part B for the mantissa. The registers Aa, Ab, Ba,
and BB are loaded as shown in each row of a table. The phase of the operation is given by the column
labeled Ph. Conditions can trigger or inhibit an operation from starting. If a row is executed,
condition bits can be set, or the next phase (Ph) can be computed by the incrementer.
Addition/Subtraction
The table of microinstructions below covers both the case of an addition of numbers, as well as a
subtraction. The main problem for both operations is to scale the two numbers to be
added/subtracted so that the binary exponent is the same. Assume that the numbers
and
are to be added. If
the two mantissas can be added immediately. If
then the
smaller number is rewritten as
. The first multiplication is equivalent to shifting the
mantissa
by (a-b) places to the right (making the mantissa smaller). Let us call
.
The two mantissas to be added are now
and
. The common binary exponent is . A similar
procedure is used in case that
.
Fig. 15: The microinstructions for addition and subtraction. An addition is finished in 5 cycles, a subtraction in 6.
Once the numbers have been aligned, condition bit S0 is tested (cycle 4). If S0 is one, an addition of the
mantissas is performed. If S0 is zero, a subtraction of the mantissas takes place in the same cycle.
18
In the table (Fig. 15), the maximum binary exponent of the two numbers is found first, and the
mantissa of the smaller number is shifted to the right, as many places as necessary until the two
binary exponents are equal. The actual addition starts in cycle 4 and is performed in just one cycle by
the ALU. In cycle 5, it is tested if the new result mantissa is normalized, and if not, it is shifted to
normalize it. It could happen (after a subtraction) that the result mantissa is negative in which case
the result is negated to make it positive. This change of sign is written down in the condition bit S3, in
order to make the necessary adjustment to the sign of the final result. At the end, the result is
normalized.
The sign unit near the tape reader (see Fig. 5, Block 16) computes in advance the sign of the result
and the type of operation. If we assume that the mantissas x and y are positive, then we have the
following four cases for addition and subtraction (after having distributed the signs). We call the
result z:
1) z = +x +y
2) z= +x -y
3) z= -x +y
4) z= -x y
The cases (1) and (4) can be handled with an addition in the ALU. In case (1) the result will be
positive. In case (4) it will be negative. Cases (2) and (3) require a subtraction. The sign of the
subtraction is computed in phase 5 (Fig. 14).
An addition runs in the following steps:
-
Determine difference
of exponents in the exponent unit,
Select largest exponent,
Shift mantissa of the smaller number D times to the left,
Add the mantissas,
Normalize the result,
The sign of the result is the sign of both arguments.
A subtraction runs in the following steps:
19
If the mantissa after the multiplication is larger or equal than 2, then the result is normalized in cycle
18 by shifting one position to the right. Cycle 19 puts the final result in the data bus.
Part<A<(Exponent)
Block<17<(left<side)
<Operation
Description
Multiplication<()
Sum<of<the<
Exponents<
21
21
actual<
Multiplication<()
Finish
26
26
27
0
0
0
0
Block<15b
Criterion
Condition
Aa
Ab
No. S0 S1 Ph
24
Align<<<<
Block<17<(right<side)
Criterion
to<set
Symb
Part<B<(Mantissa)
Block<15a
0
0
Ph
"R4.
Condition
Ba
Bb
Af
Ag
1<<<<
<< Ph,<"R4,<Ae
17
18
18
19
to<set
No. S0 S1 Ph
Ph,<Ae
+1
Be+1<=<L
Ae
Ae
24
1<<<<
to<
17
24
1<<<
to<
17
26
26
27
18
18
19
<Be
mm<=<L
Be+1<=<0
Be+1<=<L
Bg
Be
<Be
Be
Fig. 16: The microinstructions for multiplication. The multiplier-mantissa Bf is stored in a shift register (shift
right). The multiplicand-mantissa is stored in register Bg.
Division
The division algorithm takes 21 cycles and is based on so-called non-restoring floating point
division. The bits of the quotient are computed one by one, starting from the highest order bit and
moving to the lower-order bits successively.
First the difference of the exponents is computed in cycle 0, and then the division of the mantissas is
executed. The divisor mantissa has been stored in Register Bg and the dividend mantissa in register
Bf. The remainder is initialized to Bf in cycle 0. In each cycle thereafter, the divisor is subtracted from
the remainder. If the result is positive, the corresponding bit in the mantissa of the result is set to
one. If the result is negative, the bit in the result mantissa is set to zero. The result bits are computed
one after the other, from bit zero to bit -16. There is a mechanism in the Z1 for setting the bits of
register Bf one after the other, as needed.
If the remainder becomes negative, there are two possible strategies for continuing. In restoring
division the divisor D is added back to the remainder (R-D), in order to come back to the positive
remainder R. Then the remainder is shifted one position to the left (which is equivalent to shifting
the divisor to the right) and the algorithm continues. In non-restoring division, the remainder (R-D)
is shifted one position to the left and then the divisor D is added. Since in the previous step (R-D) was
negative, the shift to the left transforms this quantity in (2R-2D). If we now add the divisor, we obtain
(2R-D), which is the subtraction of D from the shifted positive R, as we would like to have in the next
step of the division algorithm. The algorithm can continue in this manner until the remainder
becomes positive, and then we continue by subtracting again the divisor D. In the table below
refers to the carry bit for the binary power at position 2. If this bit is set, the result of the addition
was negative (in twos-complement arithmetic).
Non-restoring division is a very elegant way of computing the quotient of two floating-point
mantissas, since the restoring step (one additional cycle) is avoided.
20
Fig. 17: The microinstructions for division. The multiplier-mantissa in Bf is pushed bit by bit in a shift register
(shift left). The multiplicand-mantissa is kept in register Bg.
Somewhat puzzling is the fact that the Z3 tested first if the subtraction of Ba and Bb could become
negative, during a division, in which case the subtraction was undone by using a shortcut bus from
Ba to Be (eliminating the result of the subtraction). This extra hardware was not used in the
reconstructed Z1 and the non-restoring algorithm seems more elegant than the solution used in the
Z3.
The input console consists of four columns of ten small plates each. In each column (called Za3, Za2,
Za1 and Za0, in that order, from left to right) the operator can pull out any of the digits 0 to 9. He or
she can thus enter any decimal number of four digits using the four columns. Pulling a digit plate just
generates its binary equivalent in the input console (using four bits). The input console is therefore
just a four by ten table of the ten binary equivalents of the digits 0 to 9.
The microcontroller of the Z1 takes then care of passing each decimal digit Za3, Za2, Za1 and Za0 to
the datapath through register Ba (at position
corresponding to the power
). Za3 is
entered first (in register Ba) and then a multiplication by ten takes place. Digit Za2 is next, and
another multiplication by ten ensues. This is repeated for all four digits. The binary equivalent of the
four decimal digits is contained in the Be after cycle 7. In cycle 8 the mantissa is normalized, if
needed. The constant 13 (LL0L in binary) is added once to the exponent (phase 7) in order to account
for the fact that the digits are entered at the mantissa bit -13.
The decimal exponent of the number is set using a lever. In cycle 9 as many multiplications with the
factor ten are performed, as indicated by the position of the lever for the decimal exponent.
21
Fig. 18: The microinstructions for decimal-binary conversion. Four decimal digits are entered through a
mechanical device.
The table in Fig. 19 shows how to transform a binary number contained in register Bf to a decimal
number to be shown in the decimal output panel.
To avoid dealing with negative decimal exponents, the number in register Bf is first multiplied by
(Zuse limited the operating range of the machine to handle only numbers larger than
as
results, although partial results in the ALU could be smaller than this). This happens after phase 1.
This multiplication is handled by the multiplication operation in the Z1 and the decimal-binary
conversion stays suspended during the cycles that the multiplication needs.
Fig. 19: The microinstructions for binary-decimal conversion. Four decimal digits are displayed in a mechanical
device.
Afterwards, the mantissa is shifted two places to the right (in order to set the binary point so that to
the left of it we can now have four bits). The mantissa is shifted until the exponent is positive and
then three multiplications by ten ensue. After every multiplication the integer part of the mantissa is
copied, erased from the mantissa, and the copy (four bits) is transformed into a decimal digit using a
table (that is the 2B-8B operation in phase 4 to 7). Every decimal digit is shown in the output panel
22
(starting with the highest order decimal digit). Every time a multiplication by ten is performed, the
exponent arrow in the decimal display displaces one place to the left.
9
Conclusions
The original Z1 was destroyed during an allied bombing raid flown in December 1943 over Berlin. It is
impossible to decide today if the original Z1 was identical to the reconstructed Z1. The few
photographs that survived show that the original was bulkier and had a less regular form. Here we
can only take Zuse at his word. However, I think that he had no real reason to consciously
embellish the original machine through the reconstruction. Memory can be a tricky fellow though.
The few notes Zuse scribbled between 1935 and 1938 seem to be consistent with the later
reconstruction. The Z3 was finished in 1941, and according to Zuse, it was very similar in design.
Siemens (the company that acquired Zuses computer company) financed the reconstruction of the
Z1 in the 1980s. Zuse did all the construction work at his home, having two students assisting him.
When the Z1 was finished, part of the wall in the upper floor of Zuses home had to be removed, so
that a large crane could lift the machine for its transportation to Berlin.
The reconstructed Z1 is a very elegant computer, consisting of thousands of components but not one
too many. It would have been possible to use only two shifters at the output of the mantissa ALU (a
shift by one bit to the left, and one bit to the right), but the selection of shifters made by Zuse speeds
up the basic arithmetical operations significantly, at a low cost in components. I find the processor of
the Z1 rather more elegant than the processor of the Z3, since it is more compact and
fundamental. It is as if when Zuse moved to telephone relays, the simpler and more reliable
components allowed him to be profligate with the size of the CPU. The same happened years after
the Z3, when the Z4 was finished. The Z4 was just a larger Z3 with a larger instruction set, but the
computer architecture was roughly the same, even though the Z4 had more instructions. The
mechanical Z1 never worked consistently and Zuse himself later called the mechanical realization a
dead end. He used to joke that the 1989 reconstruction of the Z1 was quite accurate, because the
original was not reliable, and neither the reconstruction. Curiously, the mechanical memory design
was sufficiently dependable to use it again for the Z4, as a way of saving telephone relays. The
mechanical memory of the Z4 was operational from 1950 to 1955 in Switzerland, where the machine
was installed at the ETH Zrich [7].
What I find most surprising is how the young Konrad Zuse could come to such an elegant design for a
computing engine. Whereas the ENIAC, or Mark I teams in the US consisted of seasoned scientists
and electronic experts, Zuse was working in isolation and without real previous experience. From the
architectural point of view, we compute today as Zuse did in 1938, not as the ENIAC did in 1945 [8].
More elegant architectures were only introduced later with the EDVAC report and the bit-serial
machines developed by von Neumann and Turing. John von Neumann lived from 1926 to 1929 in
Berlin and was the youngest Lecturer (Privatdozent) at the University of Berlin. Konrad Zuse and von
Neumann may have crossed paths inadvertently during those years. So much could have happened
in Berlin before madness took over and a dark night fell over Germany.
23
Fig. 20: Sketch of one of Zuses early designs for a Z1 reconstruction. Undated.
References
[1] Horst Materna, Die Geschichte der Henschel Flugzeug-Werke in Schnefeld bei Berlin 1933-1945,
Verlag Rockstuhl, Bad Langensalza, 2010.
[2] Zuse, K., Der Computer Mein Lebenswerk, Springer-Verlag, Berlin, 3rd Edition, 1993.
[3] Rojas, R., "Konrad Zuse's legacy: the architecture of the Z1 and Z3", Annals of the History of
Computing, Vol. 19, N. 2, 1997, pp. 516.
[4] Ursula Schweier, Dietmar Saupe, Funktions- und Konstruktionsprinzipien der
programmgesteuerten mechanischen Rechenmaschine Z1, Arbeitspapiere der GMD 321, GMD,
Sankt Augustin, August 1998.
[5] Rojas, R. (ed.), Die Rechenmaschinen von Konrad Zuse, Springer-Verlag, Berlin, 1998.
[5] Website: Architecture and Simulation of the Z1 Computer, http: http://zuse-z1.zib.de/, last
access: July 21st, 2013.
[6] Konrad Zuse, Rechenvorrichtung aus mechanischen Schaltglieder, Zuse Papers, GMD 019/003
(undated), http://zuse.zib.de/, last access July 21st, 2013.
[7] Bruderer, H.: Konrad Zuse und die Schweiz: Wer hat den Computer erfunden?,
Oldenbourg Wissenschaftsverlag, Munich, 2012.
[8] Goldstine, H.: The Electronic Numerical Integrator and Computer (ENIAC), Annals of
the History of Computing, Vol. 18 , N. 1, 1996, S. 1016.
24