Transputer Architecture: Reference Manual
INMOS
July 1987
72-TRN-048-03
You may not:
1. Modify the Materials or use them for any commercial purpose, or any public
display, performance, sale or rental;
2. Remove any copyright or other proprietary notices from the Materials;
This document is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.
Contents
Preface
1 Introduction
  1.1 Overview
        Transputers and occam
  1.2 Rationale
    1.2.1 System design
        Programming
        Hardware
        Programmable components
    1.2.2 Systems architecture
        Point to point communication links
        Local memory
    1.2.3 Communication
2 Occam model
        The programming model for transputers is defined by occam.
  2.1 Overview
  2.2 Occam overview
    2.2.1 Processes
        Assignment
        Input
        Output
    2.2.2 Constructs
        SEQ
        PAR
        Communication
        IF
        ALT
    2.2.3 Repetition
    2.2.4 Replication
    2.2.5 Types
        Primitive types
    2.2.6 Declarations, arrays and subscripts
    2.2.7 Procedures
    2.2.8 Expressions
    2.2.9 Timer
    2.2.10 Peripheral access
  2.3 Configuration
        PLACED PAR
        PRI PAR
        INMOS standard links
3 Error handling
4 Program development
        Logical behaviour
  4.1 Performance measurement
  4.2 Separate compilation of occam and other languages
  4.3 Memory map and placement
5 Physical architecture
  5.1 INMOS serial links
    5.1.1 Overview
    5.1.2 Link electrical specification
  5.2 System services
    5.2.1 Powering up and down, running and stopping
    5.2.2 Clock distribution
  5.3 Bootstrapping from ROM or from a link
  5.4 Peripheral interfacing
6 Notation conventions
  6.1 Signal naming conventions
Preface
1 Introduction
1.1 Overview
A transputer is a microcomputer with its own local memory and with links
for connecting one transputer to another transputer.
The transputer architecture defines a family of programmable VLSI
components. The definition of the architecture falls naturally into the
logical aspects which define how a system of interconnected transputers is
designed and programmed, and the physical aspects which define how
transputers, as VLSI components, are interconnected and controlled.
A typical member of the transputer product family is a single chip
containing processor, memory, and communication links which provide point
to point connection between transputers. In addition, each transputer
product contains special circuitry and interfaces adapting it to a
particular use. For example, a peripheral control transputer, such as a
graphics or disk controller, has interfaces tailored to the requirements of
a specific device.
A transputer can be used in a single processor system or in networks to
build high performance concurrent systems.
Transputers can be programmed in most high level languages, and are
designed to ensure that compiled programs will be efficient. Where it is
required to exploit concurrency, but still to use standard languages, occam
can be used as a harness to link modules written in the selected languages.
To gain most benefit from the transputer architecture, the whole system can
be programmed in occam. This provides all the advantages of a high level
language, the maximum program efficiency and the ability to use the special
features of the transputer.
Occam provides a framework for designing concurrent systems using
transputers in just the same way that boolean algebra provides a framework
for designing electronic systems from logic gates. The system designer’s
task is eased because of the architectural relationship between occam and
the transputer. A program running in a transputer is formally equivalent to
an occam process, so that a network of transputers can be described
directly as an occam program.
1.2 Rationale
Programming
specified by the messages it sends and receives. Communication between
processes is synchronized, removing the need for any separate
synchronisation mechanism.
Internally, each process can be designed as a set of communicating processes.
The system design is therefore hierarchically structured. At any level of
design, the designer is concerned only with a small and manageable set of
processes.
Occam is based on these concepts, and provides the definition of the
transputer architecture from the logical point of view (see section 2).
Hardware
Programmable components
components to have any desired topology, limited only by the number of
links on each transputer. The architecture minimizes the constraints on the
size of such a system, and the hierarchical structuring provided by occam
simplifies the task of system design and programming.
The result is to provide new orders of magnitude of performance for any
given application, which can now exploit the concurrency provided by a
large number of programmable components.
Local memory
Each transputer in a system uses its own local memory. Overall memory
bandwidth is proportional to the number of transputers in the system, in
contrast to a large global memory, where the additional processors must
share the memory bandwidth.
Because memory interfaces are not shared, and are separate from the
communications interfaces, they can be individually optimized on different
transputer products to provide high bandwidth with the minimum of external
components.
1.2.3 Communication
Figure 5: Link protocol
The links are designed to make the engineering of transputer systems
straightforward. Board layout of two-wire connections is easy to design and
area efficient. All transputers will support a standard communications
frequency of 10 Mbits/sec, regardless of processor performance. Thus
transputers of different performance can be directly connected and future
transputer systems will directly communicate with those of today.
2 Occam model
The purpose of this section is to describe how to access and control the
resources of transputers using occam. A more detailed description is
available in the occam programming manual and the transputer development
system manual (provided with the development system).
The transputer development system will enable transputers to be programmed
in other industry standard languages. Where it is required to exploit
concurrency, but still to use standard languages, occam can be used as a
harness to link modules written in the selected languages.
2.1 Overview
nication between occam processes on different transputers is implemented
directly by transputer links. Thus the same occam program can be
implemented on a variety of transputer configurations, with one
configuration optimized for cost, another for performance, or another for
an appropriate balance of cost and performance.
The transputer and occam were designed together. All transputers include
special instructions and hardware to provide maximum performance and
optimal implementations of the occam model of concurrency and
communications.
All transputer instruction sets are designed to enable simple, direct and
efficient compilation of occam. Programming of I/O, interrupts and timing
is standard on all transputers and conforms to the occam model.
Different transputer variants may have different instruction sets, depending
on the desired balance of cost, performance, internal concurrency and special
hardware. The occam level interface will, however, remain standard across
all products.
2.2.1 Processes
only output to the channel, and the other may only input from it.
Assignment
v := e
sets the value of the variable v to the value of the expression e and then
terminates. For example, x := 0 sets x to zero, and x := x + 1 increases the
value of x by 1.
Input
c ? x
inputs a value from the channel c, assigns it to the variable x and then
terminates.
Output
c ! e
2.2.2 Constructs
SEQ
SEQ
  P1
  P2
  P3
  ...
The component processes P1, P2, P3 ... are executed one after another.
Each component process starts after the previous one terminates and the
construct terminates after the last component process terminates. For
example
SEQ
  c1 ? x
  x := x + 1
  c2 ! x
inputs a value, adds one to it, and then outputs the result.
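As a rough illustration only, the sequence above can be modelled in Python,
with queue.Queue objects standing in for the channels c1 and c2. This is an
assumption made for the sketch: occam channels are synchronised and
unbuffered, which a Python queue is not, and the name seq_example is
hypothetical.

```python
import queue


def seq_example(c1, c2):
    """Model of the occam fragment: SEQ / c1 ? x / x := x + 1 / c2 ! x."""
    x = c1.get()   # c1 ? x      input a value from channel c1
    x = x + 1      # x := x + 1  add one to it
    c2.put(x)      # c2 ! x      output the result on channel c2


# Queues stand in for occam channels (an assumption of this sketch).
c1, c2 = queue.Queue(), queue.Queue()
c1.put(41)
seq_example(c1, c2)
result = c2.get()
```

Each statement runs only after the previous one has finished, which is the
property the SEQ construct guarantees.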
Sequential constructs in occam are similar to programs written in
conventional programming languages. Note, however, that they provide the
performance and efficiency equivalent to that of an assembler for a
conventional microprocessor.
PAR
PAR
  P1
  P2
  P3
  ...
The component processes P1, P2, P3 ... are executed together, and are called
concurrent processes. The construct terminates after all of the component
processes have terminated. For example,
PAR
  c1 ? x
  c2 ! y
allows the communications on channels c1 and c2 to take place together.
The parallel construct is unique to occam. It provides a straightforward way
of writing programs which directly reflects the concurrency inherent in real
systems. The implementation of parallelism on a single transputer is highly
optimized so as to incur minimal process scheduling overhead.
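A minimal sketch of the PAR example in Python follows, assuming that threads
stand in for concurrent processes and queues for channels. The helper par
and the dictionary state are hypothetical names used only for this
illustration; the transputer schedules processes in hardware rather than
with operating system threads.

```python
import queue
import threading


def par(*processes):
    """Start every component process, then wait until all have terminated."""
    threads = [threading.Thread(target=p) for p in processes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()   # the construct terminates only after every component has


c1, c2 = queue.Queue(), queue.Queue()
c1.put(7)                       # a value waiting to be input on c1
state = {"x": None, "y": 3}     # variables x and y of the occam fragment


def input_c1():                 # c1 ? x
    state["x"] = c1.get()


def output_c2():                # c2 ! y
    c2.put(state["y"])


par(input_c1, output_c2)
sent = c2.get()
```

The two communications may complete in either order; par returns only once
both component processes have finished.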
Communication
IF
A conditional construct
IF
  condition1
    P1
  condition2
    P2
  ...
IF
  x = 0
    y := y + 1
  x <> 0
    SKIP
ALT
An alternative construct
ALT
  input1
    P1
  input2
    P2
  input3
    P3
  ...
waits until one of input1, input2, input3 ... is ready. If input1 first
becomes ready, input1 is performed, and then process P1 is executed.
Similarly, if input2 first becomes ready, input2 is performed, and then
process P2 is executed. Only one of the inputs is performed, then its
corresponding process is executed and then the construct terminates. For
example:
ALT
  count ? signal
    counter := counter + 1
  total ? signal
    SEQ
      out ! counter
      counter := 0
either inputs a signal from the channel count, and increases the variable
counter by 1, or alternatively inputs from the channel total, outputs the
current value of the counter, then resets it to zero.
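The behaviour of this example can be sketched in Python as a function that
handles one ready input per call. This is a simplified model under stated
assumptions: the choice of which input became ready first is passed in as
the string event, and the list out stands in for the output channel. The
name alt_step is hypothetical.

```python
def alt_step(event, counter, out):
    """Perform one pass through the ALT and return the new counter value.

    event names the channel whose input became ready first.
    """
    if event == "count":       # count ? signal
        return counter + 1     #   counter := counter + 1
    if event == "total":       # total ? signal
        out.append(counter)    #   out ! counter
        return 0               #   counter := 0
    raise ValueError("unknown channel: " + event)


out = []
counter = 0
for event in ["count", "count", "count", "total", "count", "total"]:
    counter = alt_step(event, counter, out)
```

After three count signals and a total, the value 3 is output and the counter
is reset; one further count followed by a total outputs 1.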
The ALT construct provides a formal language method of handling external
and internal events that must be handled by assembly level interrupt
programming in conventional microprocessors.
2.2.3 Repetition
WHILE condition
  P
repeatedly executes the process P until the value of the condition is false.
For example
WHILE (x - 5) > 0
  x := x - 5
leaves x holding the value of (x remainder 5) if x was positive and not a
multiple of 5; a positive multiple of 5 is left holding 5.
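The loop can be traced with a direct Python transcription, given here as a
quick check rather than as part of the occam definition. The function name
reduce_by_fives is hypothetical. Because the test (x - 5) > 0 is strict, a
positive multiple of 5 stops at 5 rather than at 0.

```python
def reduce_by_fives(x):
    """Transcription of: WHILE (x - 5) > 0 / x := x - 5."""
    while (x - 5) > 0:
        x = x - 5
    return x


# Final value of x for a few positive starting values.
results = {x: reduce_by_fives(x) for x in (3, 7, 12, 10)}
```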
2.2.4 Replication
SEQ i = 0 FOR n
  P
PAR i = 0 FOR n
  Pi
constructs an array of n similar processes P0, P1, ..., Pn-1. The index i
takes the values 0, 1, ..., n-1, in P0, P1, ..., Pn-1 respectively.
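Replication can be pictured with the following Python sketch, which assumes
threads as a stand-in for concurrent processes. The names replicated_seq,
replicated_par and store_square are hypothetical and chosen only for this
illustration.

```python
import threading


def replicated_seq(n, body):
    """Model of SEQ i = 0 FOR n: run body(i) for i = 0 .. n-1 in order."""
    for i in range(n):
        body(i)


def replicated_par(n, body):
    """Model of PAR i = 0 FOR n: run body(0) .. body(n-1) concurrently."""
    threads = [threading.Thread(target=body, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


squares = [None] * 4


def store_square(i):
    squares[i] = i * i    # process Pi works only on its own component


replicated_par(4, store_square)

order = []
replicated_seq(3, order.append)
```

Each replicated process sees its own value of the index i, from 0 up to n-1.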
2.2.5 Types
Every variable, expression and value has a type, which may be a primitive
type, array type, record type or variant type. The type defines the length
and interpretation of data.
Primitive types
INT x:
P
CHAN of protocol  Each communication channel provides communication
                  between two concurrent processes. Each channel is of a
                  type which allows communication of data according to
                  the specified protocol.
TIMER             Each timer provides a clock which can be used by any
                  number of concurrent processes.
BOOL              The values of type BOOL are true and false.
BYTE              The values of type BYTE are unsigned numbers n in the
                  range 0 ≤ n < 256.
INT               Signed integers n in the range −2^31 ≤ n < 2^31.
INT16             Signed integers n in the range −2^15 ≤ n < 2^15.
INT32             Signed integers n in the range −2^31 ≤ n < 2^31.
INT64             Signed integers n in the range −2^63 ≤ n < 2^63.
REAL32            Floating point numbers stored using a sign bit, 8 bit
                  exponent and 23 bit fraction in ANSI/IEEE Standard
                  754-1985 representation.
REAL64            Floating point numbers stored using a sign bit, 11 bit
                  exponent and 52 bit fraction in ANSI/IEEE Standard
                  754-1985 representation.
Table 1: Types
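The integer ranges and floating point field widths in Table 1 can be checked
numerically. The sketch below uses Python's struct module to expose the IEEE
754 bit fields; the function names int_range, real32_fields and
real64_fields are hypothetical helpers, not part of occam.

```python
import struct


def int_range(bits):
    """Return (lowest value, exclusive upper bound) of a signed integer type."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1)


def real32_fields(value):
    """Split a REAL32 into (sign bit, 8 bit exponent, 23 bit fraction)."""
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF


def real64_fields(value):
    """Split a REAL64 into (sign bit, 11 bit exponent, 52 bit fraction)."""
    (word,) = struct.unpack(">Q", struct.pack(">d", value))
    return word >> 63, (word >> 52) & 0x7FF, word & ((1 << 52) - 1)


int16_range = int_range(16)
minus_one_and_a_half = real32_fields(-1.5)
one = real64_fields(1.0)
```

For -1.5 the sign bit is 1, the stored exponent is the bias value 127, and
only the top fraction bit is set.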
2.2.7 Procedures
defines the procedure square. The name may be used as an instance of the
process. For example
square (x)
is equivalent to
n IS x:
n := n * n
2.2.8 Expressions
Table 2: Operators
(5 + 7) / 2
2.2.9 Timer
A timer input sets a variable to a value of type INT representing the time.
The value is derived from a clock, which changes at regular intervals. For
example
tim ? v
sets the variable v to the current value of a free running clock, declared as
the timer tim.
A delayed input takes the following form
tim ? AFTER e
A delayed input is unable to proceed until the value of the timer satisfies
(timer AFTER e). The comparison performed is a modulo comparison.
This provides the effect that, starting at any point in the timer’s cycle, the
previous half cycle of the timer is considered as being before the current
time, and the next half cycle is considered as being after the current time.
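The modulo comparison can be sketched as follows, assuming a 32 bit timer
value and the occam 2 rule that a AFTER b holds when the modulo difference
between a and b is positive when read as a signed number. The constant
WORD_BITS and the function name after are choices made for this sketch.

```python
WORD_BITS = 32              # assumption: the timer value is a 32 bit word
MODULUS = 1 << WORD_BITS    # length of one full timer cycle
HALF = MODULUS >> 1         # length of half a cycle


def after(a, b):
    """Return True if time a is later than time b under modulo comparison."""
    diff = (a - b) % MODULUS    # distance from b forward to a, modulo cycle
    return 0 < diff < HALF      # a lies within the half cycle following b


wrapped = after(2, MODULUS - 3)   # the clock has wrapped round between b and a
```

A time just past the wrap point therefore still counts as after a time just
before it, which is what allows delays to span the end of the timer's cycle.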
2.3 Configuration
correctly distributing a program configured for many transputers.
Configuration does not affect the logical behaviour of a program (see
section 4, Program development). However, it does enable the program to be
arranged to ensure that performance requirements are met.
PLACED PAR
PRI PAR
Each link provides one channel in each direction between two transputers.
A channel (which must already have been declared) is associated with a link
by a channel association, for example:
PLACE Link0Input AT 4 :
3 Error handling
The occam process STOP starts but never terminates. In method 1, an errant
process stops and in particular cannot communicate erroneous data to other
processes. Other processes will continue to execute until they become
dependent on data from the stopped process. It is therefore possible, for
example, to write a process which uses a timeout to warn of a stopped
process, or to construct a redundant system in which several processes
performing the same task are used to enable the system to continue after
one of them has failed.
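The timeout idea can be illustrated with a short Python sketch. It assumes a
queue in place of an occam channel and a blocking read with a timeout in
place of an ALT between a channel input and a delayed timer input. The
function name watch and the 0.05 second timeout are arbitrary choices for
the example.

```python
import queue


def watch(channel, timeout_s):
    """Wait for a value; report a timeout if the sender appears stopped."""
    try:
        return ("value", channel.get(timeout=timeout_s))
    except queue.Empty:
        return ("timeout", None)    # nothing arrived: warn of a stopped process


live = queue.Queue()
live.put(99)                 # a process that is still communicating
stopped = queue.Queue()      # nothing is ever sent: models a stopped process

outcomes = (watch(live, 0.05), watch(stopped, 0.05))
```

The watching process is never handed erroneous data; it simply observes that
no communication took place within the allowed time.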
Method 1 is the preferred method of executing a program.
Method 2 is useful for program development and can be used to bring
transputers to an immediate halt, preventing execution of further
instructions. The transputer Error output can be used to inform the
transputer development system that such an error has occurred. No variable
local to the process can be overwritten with erroneous data, facilitating
analysis of the program and data which gave rise to the error.
Method 3 is useful only for optimising programs which are known to be
correct!
When a system has stopped or halted as a result of an error, the state of all
transputers in the system can be analysed using the transputer development
system.
For languages other than occam, the transputer provides facilities for
handling individual errors by software.
4 Program development
The development of programs for multiple processor systems can involve
experimentation. In some cases, the most effective configuration is not
clear until a substantial amount of work has been done. For this reason,
it is desirable that most of the design and programming can be completed
before hardware construction is started.
Logical behaviour
of processing and communication. Consequently a program ultimately
intended for a network of transputers can be compiled, executed and tested
on a single computer used for program development.
Even if the application uses only a single transputer, the program can be
designed as a set of concurrent processes which could run on a number of
transputers. This design style follows the best traditions of structured
programming; the processes operate completely independently on their own
variables except where they explicitly interact, via channels. The set of
concurrent processes can run on a single transputer or, for a higher
performance product, the processes can be partitioned amongst a number of
transputers.
It is necessary to ensure, on the development system, that the logical
behaviour satisfies the application requirements. The only ways in which
one execution of a program can differ from another in functional terms
result from dependencies upon input data and the selection of components of
an ALT. Thus a simple method of ensuring that the application can be
distributed to achieve any desired performance is to design the program to
behave 'correctly' regardless of input data and ALT selection.
4.2 Separate compilation of occam and other languages
The implementation of occam supports the allocation of the code and data
areas of an occam process to specific areas of memory. Such a process must
be a separately compiled PROC, and must not reference any variables and
timers other than those declared within it.
5 Physical architecture
5.1 INMOS serial links
5.1.1 Overview
All transputers have several links. The link protocol and electrical
characteristics form a standard for all INMOS transputer and peripheral
products. All transputers support a standard link communications frequency
of 10 megabits per second. Some devices also support other data rates.
Maintaining a standard communications frequency means that devices of mixed
performance and type can intercommunicate easily.
Each link consists of two unidirectional signal wires carrying both data and
control bits. The link signals are TTL compatible so that their range can
be easily extended by inserting buffers.
The INMOS communication links provide for communication between devices
on the same printed circuit board or between printed circuit boards via a
back plane. They are intended to be used in electrically quiet environments
in the same way as logic signals between TTL gates.
The number of links, and any communication speeds in addition to the
standard speed of 10 Mbits/sec, are given in the product data for each
product.
The quiescent state of the link signals is low, for a zero. The link input
signals and output signals are standard TTL compatible signals.
For correct functioning of the links the specifications for maximum
variation in clock frequency between two transputers joined by a link and
maximum capacitive load must be met. Each transputer product also specifies
the maximum permissible variation in buffering delay and the minimum
permissible edge gradients. Details of these specifications are provided in
the product data.
Provided that these specifications are met then any buffering employed may
introduce an arbitrary delay into a link signal without affecting its correct
operation.
At all times the specification of input voltages with respect to the GND and
VCC pins must be met. This includes the times when the VCC pins are
ramping to 5 V, and also while they are ramping from 5 V down to 0 V.
The system services comprise the clocks, power, and signals used for
initialization.
The specification includes minimum times that VCC must be within
specification, the input clock must be oscillating, and the Reset signal
must be high before Reset goes low. These specifications ensure that
internal clocks and logic have settled before the transputer starts.
When the transputer is reset the memory interface is initialised (if present
and configurable).
The processor and INMOS serial links start after reset. The transputer
obeys a bootstrap program which can either be in off-chip ROM or can be
received from one of the links. How to specify where the bootstrap program
is taken from depends upon the type of transputer being used. The program
will normally load up a larger program either from ROM or from a peripheral
such as a disk.
During power down, as during power up, the input and output pins must
remain within specification with respect to both GND and VCC.
A software error, such as arithmetic overflow, array bounds violation or
divide by zero, causes an error flag to be set in the transputer processor.
The flag is directly connected to the Error pin. Both the flag and the pin
can be ignored, or the transputer stopped. Stopping the transputer on an
error means that the error cannot cause further corruption.
As well as containing the error in this way it is possible to determine the
state of the transputer and its memory at the time the error occurred.
5.2.2 Clock distribution
All transputers operate from a standard 5 MHz input clock. High speed
clocks are derived internally from the low frequency input to avoid the
problems of distributing high frequency clocks. Within limits the
mark-to-space ratio, the voltage levels and the transition times are
immaterial. The limits on these are given in the product data for each
product. The asynchronous data reception of the links means that
differences in the clock phase between chips are unimportant.
The important characteristic of the transputer’s input clock is its stability,
such as is provided by a crystal oscillator. An R-C oscillator is inadequate.
The edges of the clock should be monotonic (without kinks), and should not
undershoot below -0.5 V.
5.3 Bootstrapping from ROM or from a link
The program which is executed after reset can either reside in ROM in the
transputer’s address space or it can be loaded via any one of the transputer’s
INMOS serial links.
The transputer bootstraps from ROM by transferring control to the top two
bytes in memory, which will invariably contain a backward jump into ROM.
If bootstrapping from a link, the transputer bootstraps from the first link to
receive a message. The first byte of the message is the count of the number of
bytes of program which follow. The program is loaded into memory starting
at a product dependent location MemStart, and then control is transferred
to this address.
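The link bootstrap sequence just described can be sketched in Python. The
sketch assumes memory modelled as a bytearray indexed from zero and a
MemStart offset of 0x48, both of which are hypothetical stand-ins: the real
MemStart is product dependent. The function name boot_from_link is likewise
an invention for this example.

```python
MEM_START = 0x48    # hypothetical offset; the real MemStart is product dependent


def boot_from_link(message, memory, mem_start=MEM_START):
    """Load a bootstrap message into memory and return the entry address."""
    count = message[0]                  # first byte: number of program bytes
    program = message[1:1 + count]      # the program bytes which follow
    if len(program) != count:
        raise ValueError("message is shorter than its count byte states")
    memory[mem_start:mem_start + count] = program   # load from MemStart upwards
    return mem_start                    # control is transferred to this address


memory = bytearray(256)
entry = boot_from_link(bytes([3, 0xAA, 0xBB, 0xCC]), memory)
```

Because the count occupies a single byte, a bootstrap program delivered this
way is at most 255 bytes long, which is why it normally goes on to load a
larger program.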
Messages subsequently arriving on other links are not acknowledged until the
transputer processor obeys a process which inputs from them. The loading
of a network of transputers is controlled by the transputer development
system, which ensures that the first message each transputer receives is the
bootstrap program.
5.4 Peripheral interfacing
All transputers contain one or more INMOS serial links. Certain transputer
products also have other application-specific interfaces. The peripheral
control transputers contain specialized interfaces to control a specific
peripheral or peripheral family.
In general, a transputer based application will comprise a number of
transputers which communicate using INMOS links. There are three methods of
communicating with peripherals.
The first is by employing peripheral control transputers (eg for graphics
or disks), in which the transputer chip connects directly to the peripheral
concerned (figure 8). The interface to the peripheral is implemented by
special purpose hardware within the transputer. The application software
in the transputer is implemented as an occam process, and controls the
interface via occam channels linking the processor to the special purpose
hardware.
The second method is by employing link adaptors (figure 9). These devices
convert between a link and a specialized interface. The link adaptor is
connected to the link of an appropriate transputer, which contains the
application designer’s peripheral device handler implemented as an occam
process.
The third method is by memory mapping the peripheral onto the memory bus
of a transputer (figure 10). The peripheral is controlled by memory
accesses issued as a result of PORT inputs and outputs. The application
designer’s peripheral device handler provides a standard occam channel
interface to the rest of the application.
The first transputers implement an event pin which provides a simple means
for an external peripheral to request attention from a transputer.
Figure 10: Memory mapped peripherals
In all three methods, the peripheral driver interfaces to the rest of the
application via occam channels. Consequently, a peripheral device can be
simulated by an occam process. This enables testing of all aspects of a
transputer system before the construction of hardware.
6 Notation conventions
The bits in a byte are numbered 0 to 7, with bit 0 the least significant.
The bytes in words are numbered from 0, with byte 0 least significant. In
general, wherever a value is treated as a number of component values, the
components are numbered in order of increasing numerical significance, with
the least significant component numbered 0. Where values are stored in
memory, the least significant component value is stored at the lowest (most
negative) address. Similarly, components of arrays are numbered starting
from 0, and stored in memory with component 0 at the lowest address.
Where a byte is transmitted serially, it is always transmitted least
significant bit (0) first. In general, wherever a value is transmitted as a
number of component values, the least significant component is transmitted
first. Where an array is transmitted serially, component 0 is transmitted
first. Consequently, block transfers to and from memory are performed
starting with the lowest (most negative) address and ending with the
highest (most positive) one.
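These ordering conventions can be made concrete with a small Python sketch.
It assumes a four byte word for the storage example; the function names
word_bytes, bits_lsb_first and serialise are hypothetical.

```python
def word_bytes(value, size=4):
    """Bytes of a word as stored in memory: byte 0 (least significant) first."""
    return list(value.to_bytes(size, "little"))


def bits_lsb_first(byte):
    """Bits of a byte in serial transmission order: bit 0 first."""
    return [(byte >> i) & 1 for i in range(8)]


def serialise(components):
    """Bit stream for an array of bytes: component 0 first, each LSB first."""
    stream = []
    for byte in components:
        stream.extend(bits_lsb_first(byte))
    return stream


stored = word_bytes(0x12345678)
stream = serialise([0x01, 0x80])
```

The least significant byte 0x78 lands at the lowest address, and the first
bit sent for the byte 0x01 is its bit 0.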
In diagrams, the least significant component of a value is to the right hand
side of the diagram, component 0 of an array is at the bottom of the diagram
and memory locations with more negative addresses are also to the bottom
of the diagram.
6.1 Signal naming conventions
The signal names, identifying the individual pins on a transputer chip,
have been chosen to avoid being cryptic, giving as much information as
possible. All transputer signals described in the text of this manual are
printed in bold.
The majority of transputer signals are active high. Those which are active
low have names commencing with not.
IMS xxxx-xx
The main field identifies the product, and the field after the hyphen is
used for speed variants, etc. Extra letters are sometimes introduced, eg
for military quality products.
The initial character of the main field is a digit for memory products, a
letter for transputer products. The particular letter indicates the type of
transputer product (table 3). Support products are numbered as shown in
table 4.