research

DOI:10.1145/3626513
COMMUNICATIONS OF THE ACM | MAY 2024 | VOL. 67 | NO. 5

Nand to Tetris: Building a Modern Computer System from First Principles

CS course walks students through a step-by-step construction of a complete, general-purpose computer system—hardware and software—in one semester.

BY SHIMON SCHOCKEN

IMAGERY BY DEAD SAKURA

key insights
• In the early days of computers, any curious person could gain a gestalt understanding of how the machine works. As digital technologies became increasingly more complex, this clarity is all but lost: The most fundamental ideas and techniques in applied computer science are now hidden under many layers of obscure interfaces and proprietary implementations.
• Starting from NAND gates only, students build a hardware platform comprising a CPU, RAM, datapath, and a software hierarchy consisting of an assembler, a virtual machine, a basic OS, and a compiler for a simple, Java-like object-based language.
• The result is a synthesis course that combines key topics from traditional systems courses in one hands-on framework. The course is self-contained, the only prerequisite being introduction to computer science.

Suppose you were asked to design an abridged computer science (CS) program consisting of just three courses. How would you go about it? The first course would probably be an introduction to computer science, exposing students to computational thinking and equipping them with basic programming skills. The second course would most likely be algorithms and data structures. But what should the third course be?

Several reasonable options come to mind. One is a hands-on overview of applied CS, building on the programming skills and theoretical knowledge acquired in the first two courses. Such a course could survey key topics in computer architecture, compilation, operating systems, and software engineering, presented in one cohesive framework. Ideally, the course would engage students in significant programming assignments, have them implement classical algorithms and widely used data structures, and expose them to a range of optimization and complexity issues. This hands-on synthesis could benefit students who seek an overarching understanding of computing systems, as well as self-learners and non-majors who cannot commit to more than a few core CS courses.

This article describes such a course, called Nand to Tetris, which walks students through a step-by-step construction of a complete, general-purpose computer system—hardware and software—in one semester. As it turns out, construction of the computer system built during the course requires exposure to, and application of, some of the most pertinent and beautiful ideas and techniques in applied CS. The computer’s hardware platform (CPU, RAM, datapath) is built using a simple hardware description language and a supplied hardware simulator. The computer’s software hierarchy (assembler, virtual machine, compiler) can be built in any programming language, following supplied specifications. The resulting computer is equipped with a simple Java-like, object-based language that lends itself well to interactive applications using graphics and animation. Thousands of computer games have already been developed on this computer, and many of them are illustrated on YouTube.

Following early versions of the course6 that underwent many improvements and extensions, the complete Nand to Tetris approach was described in the book The Elements of Computing Systems, by Noam Nisan and Shimon Schocken.3 By choosing a book title that nods to Strunk and White’s masterpiece,9 we sought to allude to the concise and principled nature of our approach. All course materials—lectures, projects, specifications, and software tools—are freely available in open source.5 Versions of the course are now offered in many educational settings, including academic departments, high schools, bootcamps, and online platforms. A typical course syllabus is available.2 About half of the course’s online learners are developers who wish to acquire a deep, hands-on understanding of the hardware and software infrastructures that enable their work. And the best way to understand something deeply is to build it from the ground up.

From Nand to Tetris
The explicit goal of Nand to Tetris courses is to build a general-purpose computer system from elementary logic gates. The implicit goals are to offer a hands-on exposition of key concepts and techniques in applied computer science, and a compelling synthesis of core topics from digital architectures, compilation, operating systems, and software engineering. We make this synthesis concrete by walking students through 12 hands-on projects. Each project presents and motivates an important hardware or software abstraction, and then it provides guidelines for implementing the abstraction using executable modules developed in previous, lower-level projects. The computer system that emerges from this effort is built gradually and from the bottom up (see Figure 1).

The first five projects in the course focus on constructing the chipset and architecture of a simple von Neumann computer. The remaining seven projects revolve around the design and implementation of a typical software hierarchy. In particular, we motivate and build an assembler; a virtual machine; a two-tier compiler for a high-level, object-based programming language; a basic operating system (OS); and an application—typically a simple computer game involving animation and interaction. The overall course consists of two parts, as we now turn to describe.

Part I: Hardware. The course starts with a focused overview of Boolean algebra. We use disjunctive normal forms and reductive reasoning to show that every Boolean function can be realized using no more than NAND operators. This provides a theoretical yet impractical demonstration that the course goal—building a general-purpose computer system from just NAND gates—is indeed feasible. We then discuss gate logic and chip specification, and present a simple hardware description language (HDL) that can be learned in a few hours.

In project 1, students use this HDL and a supplied hardware simulator to build and unit-test elementary logic gates such as AND, OR, NOT, multiplexors, and their 16-bit extensions. We then discuss Boolean arithmetic, two’s complement, and arithmetic-logic operations. This background sets the stage
for project 2, in which students use the elementary gates built in project 1 to implement a family of combinational chips, leading up to an ALU. We then discuss how sequential logic and finite-state automata can be used to implement chips that maintain state. In project 3, students apply this knowledge to gradually build and unit-test 1-bit and 16-bit registers, as well as a family of direct-access memory units (in which addressing and storage are realized using combinational and sequential logic, respectively), leading up to a RAM.

Figure 1. Overall course plan. Each project p1, p2, …, p12 lasts one to two course-weeks. Project numbers indicate the normal sequence, although they can be done in any desired order.
(Diagram: hardware platform, projects p1–p5; assembler, p6; software hierarchy, projects p7–p12.)
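
The bottom-up construction described for projects 1 and 2 can be illustrated outside the course's own HDL. The following is a minimal Python sketch, not course material: every function name is hypothetical, and the only primitive it allows itself is `nand`. It derives NOT, AND, OR, XOR, and a multiplexor from that primitive, then chains adders into a 16-bit ripple-carry adder of the kind that leads up to an ALU.

```python
"""Sketch: elementary gates and a 16-bit adder derived from a NAND primitive.

All names here are illustrative assumptions; the course uses its own HDL.
Bits are represented as the Python integers 0 and 1.
"""


def nand(a: int, b: int) -> int:
    """The only primitive: the output is 0 exactly when both inputs are 1."""
    return 0 if (a == 1 and b == 1) else 1


def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    # De Morgan: a OR b equals NAND(NOT a, NOT b).
    return nand(not_(a), not_(b))


def xor(a: int, b: int) -> int:
    # Readable rather than minimal: two NOT, two AND, and one OR.
    return or_(and_(a, not_(b)), and_(not_(a), b))


def mux(a: int, b: int, sel: int) -> int:
    """Select a when sel is 0, and b when sel is 1."""
    return or_(and_(a, not_(sel)), and_(b, sel))


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum bit, carry bit) of a + b."""
    return xor(a, b), and_(a, b)


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum bit, carry bit) of a + b + c, using two half adders."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, c)
    return s2, or_(c1, c2)


def add16(x: int, y: int) -> int:
    """Ripple-carry addition of two 16-bit words; the final carry is dropped."""
    result, carry = 0, 0
    for i in range(16):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result
```

Each function calls only functions defined above it, mirroring the way a chip is assembled from lower-level chip parts. For instance, `add16(3, 4)` evaluates to 7 through nothing but repeated calls to `nand`.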
At this stage, we have all the basic building blocks necessary for synthesizing a simple 16-bit von Neumann machine, which we call “Hack.” Before doing so, we present the instruction set of this target computer (viewed abstractly), in both its symbolic and binary versions. In project 4, students use the symbolic Hack machine language to write assembly


programs that perform basic algebraic, graphical, and user-interaction tasks (the Hack computer specification includes input and output drivers that use memory bitmaps for rendering pixels on a connected screen and for reading 16-bit character codes from a connected keyboard). The students test and execute their assembly programs on a supplied emulator that simulates the Hack computer along with its screen and keyboard devices.

Figure 2. The specification of each chip consists of a stub HDL file (containing the chip signature and an empty PARTS section), a test script, and a compare file. When evaluated by the hardware simulator, the output file produced by a correctly implemented HDL program should be identical to the given compare file.
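
To make the memory-bitmap idea in the paragraph above concrete, here is a small Python sketch of how a pixel coordinate can be mapped to a word and a bit in RAM. The article itself gives no addresses; the constants below (screen base 16384, keyboard register 24576, a 256-by-512 screen) follow the Hack platform as documented in The Elements of Computing Systems, and the function names are hypothetical.

```python
"""Sketch of memory-mapped I/O on a Hack-like machine.

Assumptions (not stated in this article): the screen bitmap starts at
RAM[16384], the keyboard register is RAM[24576], and the screen has
256 rows of 512 pixels, 16 pixels per 16-bit word.
"""

SCREEN_BASE = 16384
KBD_ADDR = 24576
ROWS, COLS, WORD_BITS = 256, 512, 16
WORDS_PER_ROW = COLS // WORD_BITS  # 32 words hold one row of pixels


def pixel_location(row: int, col: int) -> tuple[int, int]:
    """Return (RAM address, bit index) of the pixel at (row, col)."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("pixel coordinate is off screen")
    return SCREEN_BASE + row * WORDS_PER_ROW + col // WORD_BITS, col % WORD_BITS


def set_pixel(ram: list[int], row: int, col: int, on: bool = True) -> None:
    """Turn one pixel on or off by setting or clearing a single bit."""
    addr, bit = pixel_location(row, col)
    if on:
        ram[addr] |= 1 << bit
    else:
        ram[addr] &= ~(1 << bit) & 0xFFFF  # keep the word within 16 bits


def read_key(ram: list[int]) -> int:
    """Return the 16-bit code held in the keyboard register (0 if idle)."""
    return ram[KBD_ADDR]
```

With `ram = [0] * (KBD_ADDR + 1)`, calling `set_pixel(ram, 0, 0)` sets the least significant bit of `ram[16384]`. An assembly program on the real platform does the same work with explicit address arithmetic, which is what makes these exercises instructive.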
Next, we present possible skeletal architectures of the Hack CPU and datapath. This is done abstractly, by discussing how an architecture can be functionally planned to fetch, decode, and execute binary instructions written in the Hack instruction set. We then discuss how the ALU, registers, and RAM chips built in projects 1–3 can be integrated into a hardware platform that realizes the Hack computer specification and machine language. Construction of this topmost computer-on-a-chip is completed in project 5.

Altogether, in Part I of the course, students build 35 combinational and sequential chips, which are developed in an HDL and tested on a supplied hardware simulator. For each chip, we provide a skeletal HDL program (listing the chip name, I/O pin names, and functional documentation comprising the chip API); a test script, which is a sequence of set/eval/compare steps that walk the chip simulation through representative test cases; and a compare file, listing the outputs that a correctly implemented chip should generate when tested on the supplied test script (see Figure 2). For each chip developed in the course, the contract is identical: Complete the given HDL skeletal program and test it on the hardware simulator using the supplied test script. If the outputs generated by your chip implementation are not identical to the supplied compare file, keep working; otherwise, your chip behaves to specification, but perhaps you want to optimize it for efficiency. The chip logic is evaluated and tested on our hardware simulator (see Figure 3).

Figure 3. The Hardware Simulator, running/evaluating the HDL program shown in Figure 2 (the roles of the various panels are explained in the text in parentheses). This particular XOR implementation, which is readable but not necessarily efficient, is based on two NOT, two AND, and one OR chip parts, each implemented as a standalone HDL program. When evaluating a chip, the simulator evaluates recursively all its chip parts, all the way down to evaluating NAND gates, which have a primitive (built-in) implementation.
Panel notes: (HDL code can be written directly, or loaded from a file) (input values are set either interactively, or by a test script) (internal and output values are computed by the simulator) (A test script is a sequence of semicolon-separated steps, each being one or more comma-separated micro-steps. Users can run the script step-wise, or batch, inspecting internal and output values.)

Part II: Software. The barebones computer that emerges from Part I of the course can be viewed as an abstraction that has a well-defined interface: the Hack instruction set. Using this machine language as a point of departure, in Part II of the course we construct a software hierarchy that empowers the Hack computer to execute code written in high-level programming languages. This effort entails six projects that build a compiler and a basic operating system on top of the hardware platform built in Part I. Specifically, we implement a simple object-based, Java-like language called "Jack." We start this journey by introducing the Jack language and the OS (abstractly) and discussing the trade-offs of one-tier and


two-tier compilation models. We also emphasize the role that intermediate bytecode plays in modern programming frameworks.

Following this general overview, we introduce a stack-based virtual machine and a VM language that features push/pop, stack arithmetic, branching, and function call-and-return primitives. This abstraction is realized in projects 7 and 8, in which students write a program that translates each VM command into a sequence of Hack instructions. This translator serves two purposes. First, it implements our virtual machine abstraction. Second, it functions as the back-end module of two-tier compilers. For example, instead of writing a monolithic compiler that translates Jack programs into the target machine language, one can write a simple and elegant front-end translator that parses Jack programs and generates intermediate VM code, just like Java and C# compilers do. Before developing such a compiler, we offer a complete specification of the Jack language, and illustrate Jack programs involving arrays, objects, list processing, recursion, graphics, and animation (the basic Jack language is augmented by a standard class library that extends it with string operations, I/O support, graphics rendering, memory management, and more). These OS services are used by Jack programs abstractly and implemented in the last project in the course. In project 9, students use Jack to build a simple computer game of their choosing. The purpose of this project is not learning Jack, but rather setting the stage for writing a Jack compiler and a Jack-based OS.

Development of the compiler spans two projects. We start with a general discussion of lexicons, grammars, parse trees, and recursive-descent parsing algorithms. We then present an XML mark-up representation designed to capture the syntax of Jack programs. In project 10, students build a program that parses Jack programs as input and generates their XML mark-up representations as output. An inspection of the resulting XML code allows verifying that the parser’s logic can correctly tokenize and decode programs. Next, we discuss algorithms for translating parsed statements, expressions, objects, arrays, methods, and constructors into VM commands that realize the program’s semantics on the virtual machine built in projects 7–8. In project 11, students apply these algorithms to morph the parser built in project 10 into a full-scale compiler. Specifically, we replace the logic that generated passive XML code with logic that generates executable VM code. The resulting code can be executed on the supplied VM emulator (see Figure 4) or translated further into machine language and executed on the hardware simulator.

Figure 4. A typical computer game, developed in Jack. The compiled VM code is loaded into, and executed by, the VM Simulator shown here. The simulator displays the VM code, the simulated computer screen, the VM stack and the virtual memory segments, and the host RAM in which they are realized. For example, RAM[0] stores the stack pointer, RAM[1] stores the base address of the local variables segment, and so on.
Panel notes: (Pong game action. The code shown on the left is part of a VM code base translated by the Jack compiler from a 4-class Jack program.) (The host RAM, which can be scrolled in two separate panes, shows how the stack and the virtual memory segments are stored on the host RAM.)

The software hierarchy is summarized in Figure 5. The final task in the course is developing a basic operating system. The OS is minimal, lacking many typical services, such as process and file management. Rather, our OS serves two purposes. First, it extends the basic Jack language with added functionality, like mathematical and string operations. Second, the OS is


designed to close gaps between the software hierarchy built in Part II and the hardware platform built in Part I. Examples include a heap-management system for storing and disposing of arrays and objects, an input driver for reading characters and strings from the keyboard, and output drivers for rendering text and graphics on the screen. For each such OS service, we discuss its abstraction and API, as well as relevant algorithms and data structures for realizing them. For example, we use bitwise algorithms for efficient implementation of algebraic operations, first-fit/best-fit and linked list algorithms for memory management, and Bresenham’s algorithm for drawing lines and circles. In project 12, students use these CS gems to develop the OS, using Jack and supplied APIs. And with that, the Nand to Tetris journey comes to an end.

Figure 5. The software hierarchy built in Part II of the course (projects 7–12). A Jack program, consisting of one or more class files, and the OS (implemented as a library of Jack classes) are compiled into a set of VM files. The VM files are compiled further into assembly code, which is translated by the assembler into binary code. The target code can be executed by the computer built in Part I of the course (projects 1–5).
(Diagram: Jack code [p9] → compiler [p10, p11] → VM code → VM translator [p7, p8] → assembly → assembler [p6] → binary; OS [p12]; hardware platform [p1–p5].)

Discussion: Engineering
Abstraction-implementation. A hallmark of sound system engineering is separating the abstract specification of what a system does from the implementation details of how it does it. In Patterson and Hennessy’s “Seven Great Ideas in Computer Architecture,” abstraction is at the top of the list.4 Likewise, Dijkstra describes abstraction as an essential mental tool in programming.1 In Nand to Tetris, the discussion of every hardware or software module begins with an abstract specification of its intended functionality. This is followed by a proposed implementation plan that hints, in outline form, how the abstraction can be realized using abstract building blocks from the level below (see Figure 1). Here we mean “abstract” in a very concrete way: Before tasking students to develop a hardware or software module—any module—we guide them to experiment with a supplied executable solution that entails precisely what the module seeks to do.

These experiments are facilitated by the Nand to Tetris online IDE,8 developed by David Souther and Neta London. This set of tools includes a hardware simulator, a CPU emulator, a Hack assembler, a Jack compiler, and a VM emulator/runtime system that implements our virtual machine and OS. Before implementing a chip, or when teaching or learning its intended behavior, one can load a built-in chip implementation into the hardware simulator and experiment with it (we elaborate on this “behavioral simulation” practice later in this article). Before implementing the assembler, one can load assembly programs into the supplied assembler and visually inspect how symbolic instructions are translated into binary codes. Prior to implementing the Jack compiler, one can use the supplied compiler to translate representative Jack programs, inspect the compiled VM code, and observe its execution on the supplied VM emulator. And before implementing any OS function, one can call the function from a compiled Jack test program and investigate its input-output behavior.

The central role of abstraction is also inherent in all the project materials: One cannot start implementing a module before carefully studying its intended functionality. Every chip is specified abstractly by a stub HDL file containing the chip signature and documentation, a test script, and a compare file. Every software module—for example, the assembler’s symbol table or the compiler’s parser—is specified by an API that documents the module along with staged test programs and compare files. These specifications leave no room for design uncertainty: Before setting out to implement a module, students have an exact, hands-on understanding of its intended functionality.

The ability to experiment with executable solutions has subtle educational virtues. In addition to actively understanding the abstraction—a rich world in itself—students are encouraged to discuss and question the merits and limitations of the abstraction’s design. We describe these explorations in the last section of this article, where we discuss the course's pedagogy.

Modularity. A system architecture is said to be modular when it consists of (recursively) a set of relatively small and standalone modules, so that each module can be independently developed and unit tested. Like abstraction, modularity is a key element of sound system engineering: The ability to work on each module in isolation, and often in parallel, allows developers to compartmentalize and manage complexity.

The computer system built in Nand to Tetris courses comprises many hardware and software modules. Each module is accompanied by an abstract specification and a proposed architecture that outlines how it can be built from lower-level modules. Individual modules are relatively small, so developing each one is a manageable and self-contained activity. Specifically, the HDL construction of a typical chip in Part I of the course includes an average of seven lower-level chip-parts, and the proposed API of a typical software module in Part II of the course consists of an average of ten methods.

This modularity impacts the project work as well as the learning experience. For example, in project 2, students build several chips that carry out Boolean arithmetic, including a “Half Adder.” Given two input bits x and y, the half adder computes a two-bit output consisting of the “sum bit” and the “carry bit” of x + y. As it turns out, these bits can be computed, respectively, by XOR-ing and AND-ing x and y. But what if, for some reason, the student did not implement the requisite AND or XOR chip-parts in the previous project? Or, for that matter, the instructor has chosen to skip this part of the course? Blissfully, it does not matter, as we now turn to explain.

Behavioral simulation. When our hardware simulator evaluates a program like HalfAdder.hdl that uses lower-level chip-parts, the simulator proceeds as follows: If the chipPart.hdl file (like And.hdl and Xor.hdl) exists in the project directory, the simulator recurses to parse and evaluate these lower-level HDL programs, all the way down to the terminal Nand.hdl leaves, which have a primitive/built-in implementation. If, however, a chipPart.hdl file is missing in the project directory, the simulator invokes and evaluates a built-in chip implementation instead. This contract implies that all the chips in the course can be implemented in any desired order, and failure to implement a chip does not prevent the implementation of other chips that depend on it.

Using another example, a 1-bit register can be realized using a data flip-flop and a multiplexor. Implementing a flip-flop gate is an intricate art, and instructors may wish to use it abstractly. With that in mind, HDL programs that use DFF chip-parts can be implemented as is, without requiring students to implement a DFF.hdl program first. The built-in chip library, which is part of our open source hardware simulator, includes Java implementations of all the chips built in the course. Instructors who wish to modify or extend the Hack computer or build other hardware platforms can edit the existing built-in chips library or create new libraries.

Behavioral simulation plays a prominent role in the software projects as well. For example, when developing the Jack compiler, there is no need to worry about how the resulting VM code is executed: The supplied VM emulator can be used to test the code’s correctness. And when writing the native VM implementation, there is no need to worry about the execution of the resulting assembly code, since the latter can be loaded into, and executed on, the supplied CPU emulator. In general, although we recommend building the projects from the bottom up in their natural order (see Figure 1), any project in the course represents a standalone building block that can be developed independently of all the other projects, in any desired order. The only requisite is the API of the level below—that is, its abstract interface.

Discussion: Pedagogy
A modular architecture and a system specification are static artifacts, not plans of action. To turn them into a working system, we provide staged implementation plans. The general staging strategy is based on sequential decomposition: Instead of realizing a complex abstraction in one sweep, the system architect can specify a basic version, which is implemented first. Once the basic version is developed and tested, one proceeds to extend it to a complete solution. Ideally, the API of the basic version should be a subset of the complete API, and the basic version should be morphed into, rather than replaced by, the complete version. Such staged implementations must be carefully articulated and supported by staged scaffolding.

Staging is informed by, but is not identical to, modularity. In some cases, the architect simply recommends the order in which modules should be developed and tested. In other cases, the development of the module itself is staged. For example, the hardware platform developed in Part I of the course consists of 35 modules (standalone chips) that are developed and unit-tested separately, according to staged plans given in each project. In complex chips such as the ALU, CPU, and the RAM, the implementation of the module itself is explicitly staged. For example, the Hack ALU is designed to compute a family of arithmetic/logic functions f(x,y) on two 16-bit inputs x and y. In addition, the ALU computes two 1-bit outputs, indicating whether its output is zero or negative. The computations of these flag bits are orthogonal to the ALU’s main logic and can be realized separately, by independent blocks of HDL statements. With that in mind, our project 2 guidelines recommend building and testing a basic ALU that computes the f(x,y) output only, and then extending the basic implementation to handle the two flag bits as well. The staged implementation is supported by two separate sets of ALU stub files, test scripts, and compare files.

Staged implementations are also inherent in the software projects in Part II of the course. For example, consider the assembler’s development: In stage I, students are guided to develop a basic assembler that handles assembly programs containing no symbolic addresses. This is a fairly straightforward task: One writes a program that translates symbolic mnemonics into their binary codes, following the Hack machine-language specification. In stage II, students are guided to implement and unit-test a symbol table, following a proposed API. Finally, and using this added functionality, in stage III students morph the basic assembler into a final translator capable of handling assembly code with or without symbolic addresses. Here, too, the separation into stages is supported by customized and separate test files: assembly programs in which all variables and jump destinations are physical memory addresses for stage I, and assembly programs with symbolic labels for stages II and III.

Modularity and staging play a key role in the compiler’s implementation, beginning with the separation into a back-end module (the bytecode-to-assembly translator developed in


projects 7–8) and a front-end module (the Jack-to-bytecode compiler developed in projects 10–11). The implementation of each module is staged further into two separate projects. In project 7, students implement and test a basic virtual machine that features push/pop and arithmetic commands only. In project 8, they extend the machine to also handle branching and function calling. In project 10, students implement a basic compilation engine that uses a tokenizer and a parser to analyze the source code’s syntax. In project 11, the compilation engine is extended to generate code. In each of these projects, students are guided to first handle source code that contains constants only, then variables, then expressions, and finally arrays and objects, each accompanied with customized test programs and compare files. For example, when writing the tokenizer and the parser, students use test programs that process the entire source code and print token lists and parse trees. These test programs are unsuitable for later stages, since the fully developed compiler gets the next token on the fly and builds the parse tree dynamically. However, the staged scaffolding is essential for turning the compiler’s development from a daunting assignment into a sequence of relatively small tasks that can be localized, tested, and graded separately. In general, staged development is one of the key enablers of the accelerated pace of Nand to Tetris courses.

Design. In Nand to Tetris courses, instructors and students play the respective roles of system architects and junior developers. It is unsettling to see how, in many non-trivial programming assignments, computer science students are often left to their own devices, expected to figure out three very different things: how to design a system, how to implement it, and how to test it. As system architects, we eliminate two-thirds of this uncertainty: For each hardware and software module, we supply detailed design specifications, staged implementation plans, and test programs. Students are allowed to deviate from our proposed implementation and develop their own tests, but they are not permitted to modify the given specifications.

Clearly, students must learn how to architect and specify systems. We believe, though, that a crucial element of mastering the art of design is seeing many good examples, as done consciously in architecture, law, medicine, and many other professional disciplines. In writing workshops, for example, significant learning time is spent reading the works of great masters and critically evaluating those of other workshop participants. Why not do the same when teaching systems building? In Nand to Tetris courses, students engage in dozens of meticulously planned architectures, specifications, and staged implementation plans. For many students, this may well be the most well-designed and well-managed development experience in their careers. Another reason for factoring out design and specification requirements to other courses is pragmatic: It allows completing the Nand to Tetris journey in one course, giving students a unique sense of closure and accomplishment.

Focus. Even when detailed designs and specifications are given, developing a general-purpose computer system in one academic course is a tall order. To render it feasible, we make two major concessions. First, we require that the constructed computer system will be fast enough, but no faster. By “fast enough” we mean that the computer must deliver a satisfying user experience. For example, if the computer’s graphics are sufficiently smooth to support the animation required by simple computer games, then there is no need to optimize relevant hardware or software modules. In general, the performance of each built module is viewed pragmatically: As long as the module passes a set of operational tests supplied by us, there is no need to optimize it further. One exception is the OS, which is based on highly efficient and elegant algorithms.

In any hardware or software implementation project, much work is spent on handling exceptions such as edge cases and erroneous inputs. Our second concession is downplaying the former and ignoring the latter. For example, when students

implement a chip that computes an n-bit arithmetic operation, they are allowed to ignore overflow and settle for computed values that are correct up to n bits. And, when they develop the assembler and the compiler, they are allowed to assume the source programs contain no syntax errors. Although learning to handle exceptions is an important educational objective, we believe it is equally important to assume, at least provisionally, an error-free world. This allows focusing on fundamental ideas and core concepts, rather than spending much time on handling exceptions, as required by industrial-strength applications.

The rationale for these concessions is pragmatic. First, without them, there would be no way to complete the computer’s construction in one semester. Second, Nand to Tetris is a synthesis course that leaves many details to other, more specific CS courses. Third, any one of the limitations inherent in our computer system (and there are many, to be sure) provides a rich and well-motivated opportunity for extension projects, as described in this article’s final section.

Exploration. Before implementing a hardware or software abstraction, we encourage playing with executable solutions. As students engage in these experiments, questions abound. We use these questions to motivate and explain our design decisions. For example: How can we rely on the ALU’s calculations if it takes a while before they produce correct answers? Answer: When we introduce sequential logic in the next project, we will set the clock cycle sufficiently long to allow time for the ALU circuits to stabilize on correct results. How can we use goto label instructions in assembly programs before the labels are declared? Answer: When we write the assembler later in the course, we will present a two-pass translation algorithm that addresses this very issue. When a class Foo method creates a new object of class Bar, and given that each class is a separate compilation unit, how does Foo’s code know how much memory to allocate for the Bar object without having access to its field declarations? Answer: It does not know, but as you will see when we write the compiler, the compiled code of the Bar class constructor includes a call to an OS routine that allocates the required memory. Why are assignment statements in the Jack language preceded by a let prefix, as in let x = 1? Answer: This is one of the grammatical features that turn Jack into an LL(1) language, which is easier to compile using recursive descent algorithms. And why does Jack not have a switch statement? Answer: Indeed, this could be a nice touch; why not extend the language specification and implement switch in your compiler? And so it goes: Students are invited to question every design aspect of the architectures and languages presented in the course, and instructors are invited to discuss them critically and propose possible extensions.

Extensions

Optimization. With the exception of the OS, the computer system built in the course is largely unoptimized, and improving its efficiency is a fertile playground for aspiring hardware and software engineers. We provide two examples, focusing on hardware and software optimizations. The n-bit ripple array adder chip built in Part I of the course (n = 16) is based on n lower-level full-adder chip-parts, each adding up two input bits and a carry bit. In the worst case, carry bits propagate from the least- to the most-significant full-adders, resulting in a computation delay that is proportional to n. To boost performance, we can augment the basic adder logic with Carry Look Ahead (CLA) logic. The CLA logic uses AND/OR operations to compute carry bits up the carry chain, enabling various degrees of parallel addition, depending on how far we are willing to look ahead. Alas, for large n values, the CLA logic becomes complex, and the efficiency gain of parallel addition may not justify the cost of the supplementary look-ahead logic. Cost-benefit analyses of various CLA schemes can help yield an optimized adder which is demonstrably faster than the basic one. This optimization is “nice to have,” since the basic design of the adder is sufficiently fast for the course purposes. That said, every hardware module built in the course
offers improvement opportunities that can be turned into follow-up, bonus assignments that go beyond the basic project requirements. Other examples include instructions requiring different clock times (IMUL/IDIV), pipelining, cache hierarchy, and more. Built-in versions of these extensions can be implemented in our open-ended hardware simulator, and then realized by students in HDL.

One of the software modules built in Part II of the course is a virtual machine. In projects 7–8, we guide students to realize this abstraction by writing a program that translates each VM command into several machine-language instructions. For example, consider the VM code sequence push a, push b, add. The semantics of the latter add primitive is “pop the two topmost values from the stack, add them up, and push the result onto the stack.” In the standard VM implementation, the translation of each such VM command yields a separate chunk of binary instructions. Yet, an optimized translator could infer from the VM code that the first two push operations are superfluous, replacing the whole sequence with binary code that implements the single semantic operation push (a + b). Similar optimizations were made by Robert Woodhead, at the Hack assembly language level [10]. Such optimizations yield dramatic efficiency gains as well as valuable hands-on system-building lessons.

These are just two examples of the numerous opportunities to improve the efficiency of the hardware and software platforms built in Nand to Tetris courses. The simplicity of the platforms and the ubiquity of the software tools that support the coursework make such analyses and improvements a natural sequel of every lecture and project. Quite simply, once an improvement has been articulated algorithmically or technically, learners have what it takes to realize the extension and appreciate the resulting gains by experimenting with the optimized design in the relevant simulator.

FPGA. In typical Nand to Tetris courses, students build chips by writing HDL programs and executing them on the supplied hardware simulator. Committing the Hack computer to silicon requires two additional steps. First, one must rewrite the HDL programs of the main Hack chips using an industrial-strength language, such as Verilog or VHDL. This is not a difficult task, but one must learn the language’s basics, which may well be one of the goals of this extension project. Next, using a low-cost FPGA board and open source FPGA synthesis tools, one can translate the HDL programs into an optimized configuration file that can then be loaded into the board, which becomes a physical implementation of the Hack computer. Examples of such extension projects, including step-by-step guidelines, are publicly available [8].

Input/output. The Hack computer built in the course uses two memory bitmaps to connect to a black-and-white screen and to a standard keyboard. It would be nice to extend the basic Hack platform to accommodate a flexible and open-ended set of sensors, motors, relays, and displays, like those found on Arduino and Raspberry Pi platforms. This extension can be done as follows. First, allocate additional maps in the Hack memory for representing the various peripheral devices. Second, specify and implement an interrupt controller chip that stores the states of the individual interrupts triggered by the various I/O devices. Third, extend the Hack CPU to probe and handle the output of the interrupt controller. Finally, extend the operating system to mask, clean, and handle interrupts. We have started working on such extensions, but readers may well come up with better implementations.

Conclusion

This article described Nand to Tetris, an infrastructure for courses that teach applied computer science by building a general-purpose computer system, hardware and software, from the ground up. Nand to Tetris demystifies how computers work and how they are built, engaging students in 12 hands-on projects. Different courses can use different subsets of these projects and implement them in any desired order. Nand to Tetris courses are offered in academic settings that seek to combine key lessons from computer architecture and compilation in one course, and as popular MOOCs taken by many self-learners and developers. Part I of Nand to Tetris (hardware) is also suitable for high school CS programs. All Nand to Tetris course materials (lectures, projects, software tools) are available freely in open source [5, 8], and instructors are welcome to use and extend them.

Acknowledgments

The chief contributors to the software suite that preceded the online Nand to Tetris IDE were Yaron Ukrainitz, Nir Rozen, and Yannai Gonczarowski. Mark Armbrust, William Bahn, Ran Navok, Yong Bakos, Tali Gutman, Rudolf Adamkovič, and Eytan Lifshitz made other significant contributions. Most of the key ideas and techniques underlying Nand to Tetris came from the brilliant mind of my friend and colleague, Noam Nisan.

References

1. Dijkstra, E.W. The humble programmer. Commun. ACM 15, 3 (Oct. 1972), 859–866.
2. Nand to Tetris Course Syllabus. Computer Science Dept., Princeton University; https://bit.ly/3raALBk.
3. Nisan, N. and Schocken, S. The Elements of Computing Systems. 2nd ed., MIT Press (2021).
4. Patterson, D.A. and Hennessy, J.L. Computer Organization and Design RISC-V Edition. 2nd ed., Morgan Kaufmann, Cambridge, MA (2021), 11–13.
5. Schocken, S. and Nisan, N. Nand to Tetris website; https://bit.ly/3XD0Rt4.
6. Schocken, S., Nisan, N., and Armoni, M. A synthesis course in hardware architecture, compilers, and software engineering. In Proceedings of the ACM SIGCSE. ACM (Mar. 2009), 443–447.
7. Schröder, M. FPGA implementations of the Hack Computer; https://bit.ly/3puLCpp, https://bit.ly/3NZIiMu.
8. Souther, D. and London, N. Nand to Tetris IDE Online; https://bit.ly/3wNjeSu.
9. Strunk, Jr., W. and White, E.B. The Elements of Style, Macmillan (1959).
10. Woodhead, R.J. Optimizing Nand2Tetris assembly code. Medium (Dec. 2023); https://bit.ly/4acMJfc.

Shimon Schocken ([email protected]) is a professor at the Efi Arazi School of Computer Science, Reichman University, Israel.

© 2024 Copyright held by owner(s)/author(s).

Watch the author discuss this work in the exclusive Communications video. https://cacm.acm.org/videos/nand-to-tetris
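
As an illustration of the two-pass translation algorithm mentioned in the Exploration discussion (how a goto label can be used before the label is declared), here is a minimal Python sketch. It is not the course's assembler: the function name `resolve_labels` is hypothetical, and the sketch assumes Hack-style syntax in which `(LABEL)` declares a label and `@LABEL` refers to it, ignoring variables and predefined symbols.

```python
def resolve_labels(lines):
    """Two-pass label resolution for a Hack-style listing (illustrative sketch)."""
    # Pass 1: walk the program once, recording the address of the
    # instruction that follows each (LABEL) declaration.
    symbols = {}
    address = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("(") and line.endswith(")"):
            symbols[line[1:-1]] = address  # a label occupies no address itself
        else:
            address += 1

    # Pass 2: walk the program again; every label is now known, so
    # forward references such as "@END" can be replaced by numbers.
    resolved = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("("):
            continue
        if line.startswith("@") and line[1:] in symbols:
            line = "@" + str(symbols[line[1:]])
        resolved.append(line)
    return resolved


# "@END" appears before "(END)" is declared, yet both uses resolve.
program = ["@END", "0;JMP", "D=A", "(END)", "@END", "0;JMP"]
print(resolve_labels(program))
```

The first pass only counts instructions and fills the symbol table; the second pass does the substitution, which is why a reference may precede its declaration.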

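
The adder discussion under Optimization can be made concrete with a small functional model in Python. This is a sketch under stated assumptions, not the course's HDL: the names `ripple_add`, `lookahead_carries`, and `cla_add` and the 4-bit block size are hypothetical choices. The model shows the logic only (generate/propagate terms combined with AND/OR); it does not measure gate delay. Both adders discard the final carry, matching the concession that overflow may be ignored.

```python
N = 16  # word width of the course's adder, per the text


def bits(value, n):
    """Little-endian list of the n low-order bits of value."""
    return [(value >> i) & 1 for i in range(n)]


def ripple_add(a, b, n=N):
    """Chain of full adders: each stage waits for the previous carry."""
    carry, total = 0, 0
    for i, (x, y) in enumerate(zip(bits(a, n), bits(b, n))):
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total  # the carry out of the top bit is dropped (overflow ignored)


def lookahead_carries(g, p, c_in):
    """Carry into each position of a block, each computed from g, p, c_in only."""
    carries = [c_in]
    for j in range(1, len(g) + 1):
        c = 0
        for k in range(j):  # a carry generated at k and propagated up to j
            term = g[k]
            for m in range(k + 1, j):
                term &= p[m]
            c |= term
        term = c_in  # the block's carry-in propagated through all lower bits
        for m in range(j):
            term &= p[m]
        carries.append(c | term)
    return carries


def cla_add(a, b, n=N, block=4):
    """Adder built from look-ahead blocks; carries ripple only between blocks."""
    x, y = bits(a, n), bits(b, n)
    total, carry = 0, 0
    for base in range(0, n, block):
        g = [xi & yi for xi, yi in zip(x[base:base + block], y[base:base + block])]
        p = [xi ^ yi for xi, yi in zip(x[base:base + block], y[base:base + block])]
        c = lookahead_carries(g, p, carry)
        for i, pi in enumerate(p):
            total |= (pi ^ c[i]) << (base + i)
        carry = c[len(g)]
    return total


print(ripple_add(0xFFFF, 1), cla_add(0xFFFF, 1))
```

The nested loops in `lookahead_carries` grow quickly with the block size, which is one way to see why look-ahead logic becomes costly for large n.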

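
The VM optimization described above (replacing push a, push b, add with a single push (a + b)) can be sketched as a peephole pass. Everything here is an assumption made for illustration: `a` and `b` are treated as plain RAM variables at made-up addresses, the emitted text uses Hack assembly mnemonics rather than binary code, and the tiny interpreter `run` handles only the instructions this sketch emits. It is not the course's VM translator, nor the optimizer cited in the article.

```python
SYMBOLS = {"SP": 0, "a": 16, "b": 17}  # hypothetical addresses for this sketch

ADD = ["@SP", "AM=M-1", "D=M", "A=A-1", "M=D+M"]


def push_var(name):
    """Straightforward translation of 'push <name>': 7 instructions."""
    return [f"@{name}", "D=M", "@SP", "A=M", "M=D", "@SP", "M=M+1"]


def translate_naive(vm):
    """Translate each VM command into its own separate chunk."""
    out = []
    for cmd in vm:
        out += ADD if cmd == "add" else push_var(cmd.split()[1])
    return out


def translate_fused(vm):
    """Peephole pass: 'push x, push y, add' becomes one push of x + y."""
    out, i = [], 0
    while i < len(vm):
        w = vm[i:i + 3]
        if (len(w) == 3 and w[0].startswith("push ")
                and w[1].startswith("push ") and w[2] == "add"):
            x, y = w[0].split()[1], w[1].split()[1]
            out += [f"@{x}", "D=M", f"@{y}", "D=D+M",
                    "@SP", "A=M", "M=D", "@SP", "M=M+1"]
            i += 3
        else:
            out += translate_naive([vm[i]])
            i += 1
    return out


def run(asm, ram):
    """Minimal interpreter for the few instruction forms used above."""
    ram = dict(ram)
    a = d = 0
    for ins in asm:
        if ins.startswith("@"):
            sym = ins[1:]
            a = int(sym) if sym.isdigit() else SYMBOLS[sym]
        else:
            dest, comp = ins.split("=")
            value = eval(comp, {}, {"A": a, "D": d, "M": ram.get(a, 0)})
            if "M" in dest:  # M refers to RAM[A] before A is updated
                ram[a] = value
            if "A" in dest:
                a = value
            if "D" in dest:
                d = value
    return ram


vm = ["push a", "push b", "add"]
naive, fused = translate_naive(vm), translate_fused(vm)
print(len(naive), len(fused))  # instruction counts of the two translations
```

Running both translations on the same initial memory leaves the same value on top of the stack and the same stack pointer, while the fused version emits fewer than half as many instructions for this sequence.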