Nand to Tetris: Building a Modern Computer System from First Principles
DOI:10.1145/3626513
CS course walks students through a step-by-step construction of a complete, general-purpose computer system—hardware and software—in one semester.

BY SHIMON SCHOCKEN

Key insights
■ In the early days of computers, any curious person could gain a gestalt understanding of how the machine works. As digital technologies became increasingly complex, this clarity has been all but lost: The most fundamental ideas and techniques in applied computer science are now hidden under many layers of obscure interfaces and proprietary implementations.
■ Starting from NAND gates only, students build a hardware platform comprising a CPU, RAM, and datapath, and a software hierarchy consisting of an assembler, a virtual machine, a basic OS, and a compiler for a simple, Java-like, object-based language.

IMAGERY BY DEAD SAKURA

SUPPOSE YOU WERE asked to design an abridged computer science (CS) program consisting of just three courses. How would you go about it? The first course would probably be an introduction to computer science, exposing students to computational thinking and equipping them with …

Several reasonable options come to mind. One is a hands-on overview of applied CS, building on the programming skills and theoretical knowledge acquired in the first two courses. Such a course could survey key topics in computer architecture, compilation, operating systems, and software engineering, presented in one cohesive framework. Ideally, the course would engage students in significant programming assignments, have them implement classical algorithms and widely used data structures, and expose them to a range of optimization and complexity issues. This hands-on synthesis could benefit students who seek an overarching understanding of computing systems, as well as self-learners and non-majors who cannot commit to more than a few core CS courses.

This article describes such a course, called Nand to Tetris, which walks students through a step-by-step construction of a complete, general-purpose computer system—hardware and software—in one semester. As it turns out, construction of the computer system built during the course requires exposure to, and application of, some of the most pertinent and beautiful ideas and techniques in applied CS. The computer’s hardware platform (CPU, RAM, datapath) is built using a simple hardware description language and a supplied hardware simulator. The computer’s software hierarchy (assembler, virtual machine, compiler) can be built in any programming language, following supplied specifications. The resulting computer is equipped with a simple Java-like, object-based language that lends itself well to interactive applications using graphics and animation. Thousands of computer games have already been developed on this computer, and many of them are showcased on YouTube.

Following early versions of the course6 that underwent many improvements and extensions, the complete Nand to Tetris approach was described in the book The Elements of Computing Systems, by Noam Nisan and Shimon Schocken.3 By choosing a book title that nods to Strunk and White’s masterpiece,9 we sought to allude to the concise and principled nature of our approach. All course materials—lectures, projects, specifications, and software tools—are freely available in open source.5 Versions of the course are now offered in many educational settings, including academic departments, high schools, bootcamps, and online platforms. A typical course syllabus is available.2 About half of the course’s online learners are developers who wish to acquire a deep, hands-on understanding of the hardware and software infrastructures that enable their work. And the best way to understand something deeply is to build it from the ground up.

From Nand to Tetris
The explicit goal of Nand to Tetris courses is to build a general-purpose computer system from elementary logic gates. The implicit goals are to offer a hands-on exposition of key concepts and techniques in applied computer science, and a compelling synthesis of core topics from digital architectures, compilation, operating systems, and software engineering. We make this synthesis concrete by walking students through 12 hands-on projects. Each project presents and motivates an important hardware or software abstraction, and then provides guidelines for implementing the abstraction using executable modules developed in previous, lower-level projects. The computer system that emerges from this effort is built gradually and from the bottom up (see Figure 1).

The first five projects in the course focus on constructing the chipset and architecture of a simple von Neumann computer. The remaining seven projects revolve around the design and implementation of a typical software hierarchy. In particular, we motivate and build an assembler; a virtual machine; a two-tier compiler for a high-level, object-based programming language; a basic operating system (OS); and an application—typically a simple computer game involving animation and interaction. The overall course consists of two parts, as we now turn to describe.

Figure 1. Overall course plan. Each project p1, p2, …, p12 lasts one to two course-weeks. Project numbers indicate the normal sequence, although the projects can be done in any desired order.

Part I: Hardware. The course starts with a focused overview of Boolean algebra. We use disjunctive normal forms and reductive reasoning to show that every Boolean function can be realized using no more than NAND operators. This provides a theoretical yet impractical demonstration that the course goal—building a general-purpose computer system from just NAND gates—is indeed feasible. We then discuss gate logic and chip specification, and present a simple hardware description language (HDL) that can be learned in a few hours.

In project 1, students use this HDL and a supplied hardware simulator to build and unit-test elementary logic gates such as AND, OR, NOT, multiplexors, and their 16-bit extensions. We then discuss Boolean arithmetic, two’s complement, and arithmetic-logic operations. This background sets the stage for project 2, in which students use the elementary gates built in project 1 to implement a family of combinational chips, leading up to an ALU. We then discuss how sequential logic and finite-state automata can be used to implement chips that maintain state. In project 3, students apply this knowledge to gradually build and unit-test 1-bit and 16-bit registers, as well as a family of direct-access memory units (in which addressing and storage are realized using combinational and sequential logic, respectively), leading up to a RAM.

At this stage, we have all the basic building blocks necessary for synthesizing a simple 16-bit von Neumann machine, which we call “Hack.” Before doing so, we present the instruction set of this target computer (viewed abstractly), in both its symbolic and binary versions. In project 4, students use the symbolic Hack machine language to write assembly programs that perform basic algebraic, graphical, and user-interaction tasks (the Hack computer specification includes input and output drivers that use memory bitmaps for rendering pixels on a connected screen and for reading 16-bit character codes from a connected keyboard). The students test and execute their assembly programs on a supplied emulator that simulates the Hack computer along with its screen and keyboard devices.

Next, we present possible skeletal architectures of the Hack CPU and datapath. This is done abstractly, by discussing how an architecture can be functionally planned to fetch, decode, and execute binary instructions written in the Hack instruction set. We then discuss how the ALU, registers, and RAM chips built in projects 1–3 can be integrated into a hardware platform that realizes the Hack computer specification and machine language. Construction of this topmost computer-on-a-chip is completed in project 5.

Altogether, in Part I of the course, students build 35 combinational and sequential chips, which are developed in an HDL and tested on a supplied hardware simulator. For each chip, we provide a skeletal HDL program (listing the chip name, I/O pin …

Figure 2. The specification of each chip consists of a stub HDL file (containing the chip signature and an empty PARTS section), a test script, and a compare file. When evaluated by the hardware simulator, the output file produced by a correctly implemented HDL program should be identical to the given compare file.

Figure 3. The Hardware Simulator, running/evaluating the HDL program shown in Figure 2 (the roles of the various panels are explained in the text in parentheses). This particular XOR implementation, which is readable but not necessarily efficient, is based on two NOT, two AND, and one OR chip parts, each implemented as a standalone HDL program. When evaluating a chip, the simulator recursively evaluates all its chip parts, all the way down to evaluating NAND gates, which have a primitive (built-in) implementation.

The compiled VM code is loaded into, and executed by, the VM Simulator shown here. The simulator displays the VM code, the simulated computer screen, the VM stack and the virtual memory segments, and the host RAM in which they are realized. For example, RAM[0] stores the stack pointer, RAM[1] stores the base address of the local variables segment, and so on.
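The Part I claim that every Boolean function can be realized with NAND operators alone can be checked in a few lines of software. What follows is a minimal Python sketch, not course material: the function names (`nand`, `not_`, `and_`, `or_`, `xor`, `half_adder`, `from_truth_table`) are hypothetical, bits are modeled as the integers 0 and 1, and the XOR uses the standard four-NAND construction rather than the NOT/AND/OR parts described in the Figure 3 caption. The `from_truth_table` helper mirrors the disjunctive-normal-form argument: each row that outputs 1 becomes an AND of literals, and those minterms are OR-ed together.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """The single primitive gate: outputs 0 only when both inputs are 1."""
    return 1 - (a & b)


# Elementary gates expressed with NAND alone (inputs and outputs are 0/1 ints).
def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    # De Morgan: a OR b == NAND(NOT a, NOT b)
    return nand(not_(a), not_(b))


def xor(a: int, b: int) -> int:
    # Classic four-NAND construction (an assumption, not the Figure 3 variant).
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(x: int, y: int) -> tuple[int, int]:
    """Returns (sum, carry) for x + y: the sum bit is XOR, the carry bit is AND."""
    return xor(x, y), and_(x, y)


def from_truth_table(table):
    """Builds a NAND-only function from a truth table via disjunctive normal form.

    `table` maps each input tuple, e.g. (0, 1, 1), to an output bit. Every row
    whose output is 1 becomes a minterm (an AND of literals); the resulting
    function is the OR of all minterms.
    """
    minterms = [row for row, out in table.items() if out == 1]

    def f(*inputs: int) -> int:
        result = 0  # an empty OR is false
        for row in minterms:
            term = 1  # an empty AND is true
            for bit, wanted in zip(inputs, row):
                # Use the input itself or its negation, as this minterm requires.
                term = and_(term, bit if wanted else not_(bit))
            result = or_(result, term)
        return result

    return f


# Usage: synthesize a multiplexor (out = a if sel == 0 else b) from its truth table.
mux_table = {(a, b, sel): (b if sel else a) for a, b, sel in product((0, 1), repeat=3)}
mux = from_truth_table(mux_table)
```

Because `not_`, `and_`, and `or_` are themselves defined through `nand`, the synthesized multiplexor ultimately evaluates nothing but NAND gates, much as the hardware simulator bottoms out at the primitive Nand chip. The DNF construction is deliberately naive: like the "theoretical yet impractical" demonstration in the text, it proves feasibility rather than producing an efficient circuit.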
… two-tier compilation models. We also emphasize the role that intermediate bytecode plays in modern programming frameworks.

Following this general overview, we introduce a stack-based virtual machine and a VM language that features push/pop, stack arithmetic, branching, and function call-and-return primitives. This abstraction is realized in projects 7 and 8, in which students write a program that translates each VM command into a sequence of Hack instructions. This translator serves two purposes. First, it implements our virtual machine abstraction. Second, it functions as the back-end module of two-tier compilers. For example, instead of writing a monolithic compiler that translates Jack programs into the target machine language, one can write a simple and elegant front-end translator that parses Jack programs and generates intermediate VM code, just like Java and C# compilers do. Before developing such a compiler, we offer a complete specification of the Jack language, and illustrate it with Jack programs involving arrays, objects, list processing, recursion, graphics, and animation (the basic Jack language is augmented by a standard class library that extends it with string operations, I/O support, graphics rendering, memory management, and more). These OS services are used by Jack programs abstractly and implemented in the last project in the course. In project 9, students use Jack to build a simple computer game of their choosing. The purpose of this project is not learning Jack, but rather setting the stage for writing a Jack compiler and a Jack-based OS.

Development of the compiler spans two projects. We start with a general discussion of lexicons, grammars, parse trees, and recursive-descent parsing algorithms. We then present an XML mark-up representation designed to capture the syntax of Jack programs. In project 10, students build a program that takes Jack programs as input, parses them, and generates their XML mark-up representations as output. Inspecting the resulting XML code lets students verify that the parser’s logic correctly tokenizes and decodes programs. Next, we discuss algorithms for translating parsed statements, expressions, objects, arrays, methods, and constructors into VM commands that realize the program’s semantics on the virtual machine built in projects 7–8. In project 11, students apply these algorithms to morph the parser built in project 10 into a full-scale compiler. Specifically, we replace the logic that generated passive XML code with logic that generates executable VM code. The resulting code can be executed on the supplied VM emulator (see Figure 4) or translated further into machine language and executed on the hardware simulator.

The software hierarchy is summarized in Figure 5. The final task in the course is developing a basic operating system. The OS is minimal, lacking many typical services, such as process and file management. Rather, our OS serves two purposes. First, it extends the basic Jack language with added functionality, like mathematical and string operations. Second, the OS is
designed to close gaps between the software hierarchy built in Part II and the hardware platform built in Part I. Examples include a heap-management system for storing and disposing of arrays and objects, an input driver for reading characters and strings from the keyboard, and output drivers for rendering text and graphics on the screen. For each such OS service, we discuss its abstraction and API, as well as relevant algorithms and data structures for realizing it. For example, we use bitwise algorithms for efficient implementation of algebraic operations, first-fit/best-fit and linked-list algorithms for memory management, and Bresenham’s algorithm for drawing lines and circles. In project 12, students use these CS gems to develop the OS, using Jack and supplied APIs. And with that, the Nand to Tetris journey comes to an end.

Figure 5. The software hierarchy built in Part II of the course (projects 7–12).

Discussion: Engineering
Abstraction-implementation. A hallmark of sound system engineering is separating the abstract specification of what a system does from the implementation details of how it does it. In Patterson and Hennessy’s “Seven Great Ideas in Computer Architecture,” abstraction is at the top of the list.4 Likewise, Dijkstra describes abstraction as an essential mental tool in programming.1 In Nand to Tetris, the discussion of every hardware or software module begins with an abstract specification of its intended functionality, and …

… and OS. Before implementing a chip, or when teaching or learning its intended behavior, one can load a built-in chip implementation into the hardware simulator and experiment with it (we elaborate on this “behavioral simulation” practice later in this article). Before implementing the assembler, one can load assembly programs into the supplied assembler and visually inspect how symbolic instructions are translated into binary codes. Prior to implementing the Jack compiler, one can use the supplied compiler to translate representative Jack programs, inspect the compiled VM code, and observe its execution on the supplied VM emulator. And before implementing any OS function, one can call the function from a compiled Jack test program and investigate its input-output behavior.

The central role of abstraction is also inherent in all the project materials: One cannot start implementing a module before carefully studying its intended functionality. Every chip is specified abstractly by a stub HDL file containing the chip signature and documentation, a test script, and a compare file. Every software module—for example, the assembler’s symbol table or the compiler’s parser—is specified by an API that documents the module along with staged test programs and compare files. These specifications leave no room for design uncertainty: Before setting out to implement a module, students have an exact, hands-on understanding of its intended functionality.

The ability to experiment with executable solutions has subtle educational virtues. In addition to actively understanding the abstraction—a rich world in itself—students are encouraged to discuss and question the merits and limitations of the abstraction’s design. We describe these explorations in the last section of this article, where we discuss the course’s pedagogy.

Modularity. A system architecture is said to be modular when it consists of (recursively) a set of relatively small and standalone modules, so that each module can be independently developed and unit tested. Like abstraction, modularity is a key element of sound system engineering: The ability to work on each module in isolation, and often in parallel, allows developers to compartmentalize and manage complexity.

The computer system built in Nand to Tetris courses comprises many hardware and software modules. Each module is accompanied by an abstract specification and a proposed architecture that outlines how it can be built from lower-level modules. Individual modules are relatively small, so developing each one is a manageable and self-contained activity. Specifically, the HDL construction of a typical chip in Part I of the course includes an average of seven lower-level chip-parts, and the proposed API of a typical software module in Part II of the course consists of an average of ten methods.

This modularity impacts the project work as well as the learning experience. For example, in project 2, students build several chips that carry out Boolean arithmetic, including a “Half Adder.” Given two input bits x and y, the half adder computes a two-bit output consisting of the “sum bit” and the “carry bit” of x + y. As it turns out, these bits can be computed, respectively, by XOR-ing and AND-ing x and y. But what if, for some reason, the student did not implement the requisite AND or XOR chip-parts in the previous project? Or, for that matter, what if the instructor has chosen to skip this part of the course? Blissfully, it does not matter, as we now turn to explain.

Behavioral simulation. When our hardware simulator evaluates a program like HalfAdder.hdl that uses lower-level chip-parts, the simulator proceeds as follows: If the chipPart.hdl file (like And.hdl and Xor.hdl) exists in the project directory, the simulator recurses to parse and evaluate these lower-level HDL programs, all the way down to the terminal Nand.hdl leaves, which have a primitive/built-in implementation. If, however, a chipPart.hdl file is missing from the project directory, the simulator invokes and evaluates a built-in chip implementation instead. This contract implies that all the chips in the course can be implemented in any desired order, and failure to implement a chip does not prevent the implementation of other chips that depend on it.

To take another example, a 1-bit register can be realized using a data flip-flop and a multiplexor. Implementing a flip-flop gate is an intricate art, and instructors may wish to use it abstractly. With that in mind, HDL programs that use DFF chip-parts can be implemented as is, without requiring students to implement a DFF.hdl program first. The built-in chip library, which is part of our open source hardware simulator, includes Java implementations of all the chips built in the course. Instructors who wish to modify or extend the Hack computer or build other hardware platforms can edit the existing built-in chips library or create new libraries.

Behavioral simulation plays a prominent role in the software projects as well. For example, when developing the Jack compiler, there is no need to worry about how the resulting VM code is executed: The supplied VM emulator can be used to test the code’s correctness. And when writing the native VM implementation, there is no need to worry about the execution of the resulting assembly code, since the latter can be loaded into, and executed on, the supplied CPU emulator. In general, although we recommend building the projects from the bottom up in their natural order (see Figure 1), any project in the course represents a standalone building block that can be developed independently of all the other projects, in any desired order. The only requisite is the API of the level below—that is, its abstract interface.

Discussion: Pedagogy
A modular architecture and a system specification are static artifacts, not plans of action. To turn them into a working system, we provide staged implementation plans. The general staging strategy is based on sequential decomposition: Instead of realizing a complex abstraction in one sweep, the system architect can specify a basic version, which is implemented first. Once the basic version is developed and tested, one proceeds to extend it to a complete solution. Ideally, the API of the basic version should be a subset of the complete API, and the basic version should be morphed into, rather than replaced by, the complete version. Such staged implementations must be carefully articulated and supported by staged scaffolding.

Staging is informed by, but is not identical to, modularity. In some cases, the architect simply recommends the order in which modules should be developed and tested. In other cases, the development of the module itself is staged. For example, the hardware platform developed in Part I of the course consists of 35 modules (standalone chips) that are developed and unit-tested separately, according to staged plans given in each project. In complex chips such as the ALU, CPU, and RAM, the implementation of the module itself is explicitly staged. For example, the Hack ALU is designed to compute a family of arithmetic/logic functions f(x,y) on two 16-bit inputs x and y. In addition, the ALU computes two 1-bit outputs, indicating whether its output is zero or negative. The computations of these flag bits are orthogonal to the ALU’s main logic and can be realized separately, by independent blocks of HDL statements. With that in mind, our project 2 guidelines recommend building and testing a basic ALU that computes the f(x,y) output only, and then extending the basic implementation to handle the two flag bits as well. The staged implementation is supported by two separate sets of ALU stub files, test scripts, and compare files.

Staged implementations are also inherent in the software projects in Part II of the course. For example, consider the assembler’s development: In stage I, students are guided to develop a basic assembler that handles assembly programs containing no symbolic addresses. This is a fairly straightforward task: One writes a program that translates symbolic mnemonics into their binary codes, following the Hack machine-language specification. In stage II, students are guided to implement and unit-test a symbol table, following a proposed API. Finally, using this added functionality, students in stage III morph the basic assembler into a final translator capable of handling assembly code with or without symbolic addresses. Here, too, the separation into stages is supported by customized and separate test files: assembly programs in which all variables and jump destinations are physical memory addresses for stage I, and assembly programs with symbolic labels for stages II and III.

Modularity and staging play a key role in the compiler’s implementation, beginning with the separation into a back-end module (the bytecode-to-assembly translator developed in
projects 7–8) and a front-end module (the Jack-to-bytecode compiler developed in projects 10–11). The implementation of each module is staged further into two separate projects. In project 7, students implement and test a basic virtual machine that features push/pop and arithmetic commands only. In project 8, they extend the machine to also handle branching and function calling. In project 10, students implement a basic compilation engine that uses a tokenizer and a parser to analyze the source code’s syntax. In project 11, the compilation engine is extended to generate code. In each of these projects, students are guided to first handle …

… not permitted to modify the given specifications.

Clearly, students must learn how to architect and specify systems. We believe, though, that a crucial element of mastering the art of design is seeing many good examples, as done consciously in architecture, law, medicine, and many other professional disciplines. In writing workshops, for example, significant learning time is spent reading the works of great masters and critically evaluating those of other workshop participants. Why not do the same when teaching systems building? In Nand to Tetris courses, students engage in dozens of meticulously planned architectures, specifi…
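Project 7’s basic virtual machine, which handles push/pop and arithmetic commands only, can be sketched as a small translator. The Python below is an illustrative assumption rather than the course’s reference implementation: it supports only `push constant n`, `add`, `sub`, and `neg`, and the names `translate` and `translate_command` are hypothetical. It assumes the Hack conventions that RAM[0] (`SP`) holds the stack pointer, as noted in the Figure 4 caption, and that `SP` always points at the next free stack slot.

```python
# Hypothetical stage-I VM translator (project 7 scope only): it handles
# "push constant n" and the arithmetic commands add, sub, and neg.
# Other memory segments, branching, and function calls are left out.

# Shared tail of every push: *SP = D; SP = SP + 1
PUSH_D = ["@SP", "A=M", "M=D", "@SP", "M=M+1"]

# Binary ops pop y (the top) and overwrite x (just below it) with x op y.
BINARY_OPS = {"add": "M=D+M", "sub": "M=M-D"}
# Unary ops rewrite the topmost stack value in place.
UNARY_OPS = {"neg": "M=-M"}


def translate_command(command: str) -> list[str]:
    """Translates one non-empty VM command into Hack assembly lines."""
    parts = command.split()
    op = parts[0]
    if op == "push" and len(parts) == 3 and parts[1] == "constant":
        value = int(parts[2])
        if not 0 <= value <= 32767:  # an A-instruction holds a 15-bit constant
            raise ValueError(f"constant out of range: {value}")
        return [f"@{value}", "D=A"] + PUSH_D
    if op in BINARY_OPS:
        # SP--, D = y, point A at x, then x = x op y (SP now points past x).
        return ["@SP", "AM=M-1", "D=M", "A=A-1", BINARY_OPS[op]]
    if op in UNARY_OPS:
        return ["@SP", "A=M-1", UNARY_OPS[op]]
    raise ValueError(f"not supported in this stage: {command!r}")


def translate(vm_source: str) -> list[str]:
    """Translates a whole VM program, echoing each command as a comment."""
    asm: list[str] = []
    for raw in vm_source.splitlines():
        command = raw.split("//", 1)[0].strip()  # drop comments and whitespace
        if command:
            asm.append(f"// {command}")
            asm.extend(translate_command(command))
    return asm


if __name__ == "__main__":
    print("\n".join(translate("push constant 7\npush constant 8\nadd")))
```

Translating `push constant 7`, `push constant 8`, `add` yields 22 lines, and the final five leave 15 on top of the stack. Raising an error on anything outside the stage-I subset keeps the basic version honest; in the spirit of staging, project 8 would extend `translate_command` rather than replace it.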
… offers improvement opportunities that can be turned into follow-up, bonus assignments that go beyond the basic project requirements. Other examples include instructions requiring different clock times (IMUL/IDIV), pipelining, cache hierarchy, and more. Built-in versions of these extensions can be implemented in our open-ended hardware simulator, and then realized by students in HDL.

One of the software modules built in Part II of the course is a virtual machine. In projects 7–8, we guide students to realize this abstraction by writing a program that translates each VM command into several machine-language instructions. For example, consider the VM code sequence push a, push b, add. The semantics of the add primitive are “pop the two topmost values from the stack, add them up, and push the result onto the stack.” In the standard VM implementation, the translation of each such VM command yields a separate chunk of binary instructions. Yet, an optimized translator could infer from the VM code that the first two push operations are superfluous, replacing the whole sequence with binary code that implements the single semantic operation push (a + b). Robert Woodhead has made similar optimizations at the Hack assembly-language level.10 Such optimizations yield dramatic efficiency gains as well as valuable hands-on system-building lessons.

These are just two examples of the numerous opportunities to im…

…lator. Committing the Hack computer to silicon requires two additional steps. First, one must rewrite the HDL programs of the main Hack chips using an industrial-strength language, such as Verilog or VHDL. This is not a difficult task, but one must learn the language’s basics, which may well be one of the goals of this extension project. Next, using a low-cost FPGA board and open source FPGA synthesis tools, one can translate the HDL programs into an optimized configuration file that can then be loaded into the board, which becomes a physical implementation of the Hack computer. Examples of such extension projects, including step-by-step guidelines, are publicly available.7

Input/output. The Hack computer built in the course uses two memory bitmaps to connect to a black-and-white screen and to a standard keyboard. It would be nice to extend the basic Hack platform to accommodate a flexible and open-ended set of sensors, motors, relays, and displays, like those found on Arduino and Raspberry Pi platforms. This extension can be done as follows. First, allocate additional maps in the Hack memory for representing the various peripheral devices. Second, specify and implement an interrupt controller chip that stores the states of the individual interrupts triggered by the various I/O devices. Third, extend the Hack CPU to probe and handle the output of the interrupt controller. Finally, extend the operating system to mask, clean, and handle interrupts. We have start…

… from computer architecture and compilation in one course, and as popular MOOCs taken by many self-learners and developers. Part I of Nand to Tetris (hardware) is also suitable for high school CS programs. All Nand to Tetris course materials (lectures, projects, software tools) are freely available in open source5,8 and instructors are welcome to use and extend them.

Acknowledgments
The chief contributors to the software suite that preceded the online Nand to Tetris IDE were Yaron Ukrainitz, Nir Rozen, and Yannai Gonczarowski. Mark Armbrust, William Bahn, Ran Navok, Yong Bakos, Tali Gutman, Rudolf Adamkovič, and Eytan Lifshitz made other significant contributions. Most of the key ideas and techniques underlying Nand to Tetris came from the brilliant mind of my friend and colleague, Noam Nisan.

References
1. Dijkstra, E.W. The humble programmer. Commun. ACM 15, 10 (Oct. 1972), 859–866.
2. Nand to Tetris Course Syllabus. Computer Science Dept., Princeton University; https://bit.ly/3raALBk.
3. Nisan, N. and Schocken, S. The Elements of Computing Systems. 2nd ed., MIT Press (2021).
4. Patterson, D.A. and Hennessy, J.L. Computer Organization and Design RISC-V Edition. 2nd ed., Morgan Kaufmann, Cambridge, MA (2021), 11–13.
5. Schocken, S. and Nisan, N. Nand to Tetris website; https://bit.ly/3XD0Rt4.
6. Schocken, S., Nisan, N., and Armoni, M. A synthesis course in hardware architecture, compilers, and software engineering. In Proceedings of the ACM SIGCSE. ACM (Mar. 2009), 443–447.
7. Schröder, M. FPGA implementations of the Hack Computer; https://bit.ly/3puLCpp, https://bit.ly/3NZIiMu.
8. Souther, D. and London, N. Nand to Tetris IDE Online; https://bit.ly/3wNjeSu.
9. Strunk, Jr., W. and White, E.B. The Elements of Style. Macmillan (1959).
10. Woodhead, R.J. Optimizing Nand2Tetris assembly code. Medium (Dec. 2023); https://bit.ly/4acMJfc.