
© 2004-2015 Steven S. Lumetta. All rights reserved.

ECE120: Introduction to Computer Engineering


Notes Set 1.1
The Halting Problem
For some of the topics in this course, we plan to cover the material more deeply than does the textbook. We
will provide notes in this format to supplement the textbook for this purpose. In order to make these notes
more useful as a reference, definitions are highlighted with boldface, and italicization emphasizes pitfalls
or other important points. Sections marked with an asterisk are provided solely for your interest, but you
will probably need to learn this material in later classes.
These notes are broken up into four parts, corresponding to the three midterm exams and the final exam.
Each part is covered by one examination in our class. The last section of each of the four parts gives you
a summary of material that you are expected to know for the corresponding exam. Feel free to read it in
advance.
As discussed in the textbook and in class, a universal computational device (or computing machine)
is a device that is capable of computing the solution to any problem that can be computed, provided that
the device is given enough storage and time for the computation to finish.
One might ask whether we can describe problems that we cannot answer (other than philosophical ones,
such as the meaning of life). The answer is yes: there are problems that are provably undecidable, for
which no amount of computation can solve the problem in general. This set of notes describes the first
problem known to be undecidable, the halting problem. For our class, you need only recognize the name
and realize that one can, in fact, give examples of problems that cannot be solved by computation. In the future, you should be able to recognize this type of problem so as to avoid spending your time trying to solve it.

Universal Computing Machines*


The things that we call computers today, whether we are talking about a programmable microcontroller in
a microwave oven or the Blue Waters supercomputer sitting on the south end of our campus (the United
States' main resource to support computational science research), are all equivalent in the sense of what
problems they can solve. These machines do, of course, have access to different amounts of memory, and
compute at different speeds.
The idea that a single model of computation could be described and proven to be equivalent to all other
models came out of a 1936 paper by Alan Turing, and today we generally refer to these devices as Turing
machines. All computers mentioned earlier, as well as all computers with which you are familiar in your
daily life, are provably equivalent to Turing machines.
Turing also conjectured that his definition of computable was identical to the natural definition (today,
this claim is known as the Church-Turing conjecture). In other words, a problem that cannot be solved
by a Turing machine cannot be solved in any systematic manner, with any machine, or by any person. This
conjecture remains unproven! However, neither has anyone been able to disprove the conjecture, and it is
widely believed to be true. Disproving the conjecture requires that one demonstrate a systematic technique
(or a machine) capable of solving a problem that cannot be solved by a Turing machine. No one has been
able to do so to date.

The Halting Problem*


You might reasonably ask whether any problems can be shown to be incomputable. More common terms
for such problems (those known to be insolvable by any computer) are intractable or undecidable. In the
same 1936 paper in which he introduced the universal computing machine, Alan Turing also provided an
answer to this question by introducing (and proving) that there are in fact problems that cannot be computed
by a universal computing machine. The problem that he proved undecidable, using proof techniques almost
identical to those developed for similar problems in the 1880s, is now known as the halting problem.



The halting problem is easy to state and easy to prove undecidable. The problem is this: given a Turing
machine and an input to the Turing machine, does the Turing machine finish computing in a finite number
of steps (a finite amount of time)? In order to solve the problem, an answer, either yes or no, must be given
in a finite amount of time regardless of the machine or input in question. Clearly some machines never finish.
For example, we can write a Turing machine that counts upwards starting from one.
You may find the proof structure for undecidability of the halting problem easier to understand if you first
think about a related problem with which you may already be familiar, the Liar's paradox (which is at least
2,300 years old). In its strengthened form, it is the following sentence: "This sentence is not true."
To see that no Turing machine can solve the halting problem, we begin by assuming that such a machine
exists, and then show that its existence is self-contradictory. We call the machine the Halting Machine,
or HM for short. HM is a machine that operates on another Turing machine and its inputs to produce a yes
or no answer in finite time: either the machine in question finishes in finite time (HM returns "yes"),
or it does not (HM returns "no"). The figure illustrates HM's operation.

[Figure: a Turing machine + inputs enter the Halting Machine (HM), which answers "yes" or "no."]
From HM, we construct a second machine that we call
the HM Inverter, or HMI. This machine inverts the sense
of the answer given by HM. In particular, the inputs are
fed directly into a copy of HM, and if HM answers yes,
HMI enters an infinite loop. If HM answers no, HMI
halts. A diagram appears to the right.

Turing
machine +
inputs

Halting
Machine (HM)

HM
said yes?

no

done

yes

Halting Machine

count forever
Inverter (HMI)
The inconsistency can now be seen by asking HM whether
HMI halts when given itself as an input (repeatedly), as
shown below. Two copies of HM are thus being asked the same question. One copy is the rightmost in the
figure below and the second is embedded in the HMI machine that we are using as the input to the rightmost
HM. As the two copies of HM operate on the same input (HMI operating on HMI), they should return the
same answer: a Turing machine either halts on an input, or it does not; they are deterministic.

[Figure: HMI, given a copy of itself as its Turing machine + inputs, is fed to a rightmost copy of the
Halting Machine (HM), which answers "yes" or "no." Inside HMI, a second copy of HM examines the same
input (HMI operating on HMI); HMI then either counts forever (if that copy said "yes") or finishes (if
it said "no").]
Let's assume that the rightmost HM tells us that HMI operating on itself halts. Then the copy of HM in
HMI (when HMI executes on itself, with itself as an input) must also say "yes." But this answer implies
that HMI doesn't halt (see the figure above), so the answer should have been "no"!
Alternatively, we can assume that the rightmost HM says that HMI operating on itself does not halt. Again,
the copy of HM in HMI must give the same answer. But in this case HMI halts, again contradicting our
assumption.
Since neither answer is consistent, no consistent answer can be given, and the original assumption that HM
exists is incorrect. Thus, no Turing machine can solve the halting problem.
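For readers who already know a little C (the language introduced in Notes Set 1.5), the same contradiction
can be sketched in code. The function halts below is hypothetical, which is exactly the point: no such
function can be written. The names are ours, not part of the proof.

/* Hypothetical: returns 1 if running "program" on "input" halts, 0 if not. */
int halts (const char *program, const char *input);

/* The HM Inverter: loops forever exactly when halts() says "yes." */
void inverter (const char *program)
{
    if (halts (program, program)) {
        while (1) { }       /* count forever */
    }
    /* otherwise, done */
}

/* Asking halts (inverter, inverter) has no consistent answer: whichever
   answer halts() gives, inverter then does the opposite.                */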



ECE120: Introduction to Computer Engineering


Notes Set 1.2
The 2's Complement Representation
This set of notes explains the rationale for using the 2's complement representation for signed integers and
derives the representation based on equivalence of the addition function to that of addition using the unsigned representation with the same number of bits.

Review of Bits and the Unsigned Representation


In modern digital systems, we represent all types of information using binary digits, or bits. Logically, a
bit is either 0 or 1. Physically, a bit may be a voltage, a magnetic field, or even the electrical resistance of
a tiny sliver of glass. Any type of information can be represented with an ordered set of bits, provided that
any given pattern of bits corresponds to only one value and that we agree in advance on which pattern of bits
represents which value.
For unsigned integers (that is, whole numbers greater than or equal to zero) we chose to use the base 2 representation already familiar to us from mathematics. We call this representation the unsigned representation.
For example, in a 4-bit unsigned representation, we write the number 0 as 0000, the number 5 as 0101,
and the number 12 as 1100. Note that we always write the same number of bits for any pattern in the
representation: in a digital system, there is no blank bit value.

Picking a Good Representation


In class, we discussed the question of what makes one representation better than another. The value of
the unsigned representation, for example, is in part our existing familiarity with the base 2 analogues of
arithmetic. For base 2 arithmetic, we can use nearly identical techniques to those that we learned in
elementary school for adding, subtracting, multiplying, and dividing base 10 numbers.
Reasoning about the relative merits of representations from a practical engineering perspective is (probably) currently beyond your ability. Saving energy, making the implementation simple, and allowing the
implementation to execute quickly probably all sound attractive, but a quantitative comparison between two
representations on any of these bases requires knowledge that you will acquire in the next few years.
We can sidestep such questions, however, by realizing that if a digital system has hardware to perform
operations such as addition on unsigned values, using the same piece of hardware to operate on other
representations incurs little or no additional cost. In this set of notes, we discuss the 2's complement representation, which allows reuse of the unsigned add unit (as well as a basis for performing subtraction of either
representation using an add unit!). In discussion section and in your homework, you will use the same idea
to perform operations on other representations, such as changing an upper case letter in ASCII to a lower
case one, or converting from an ASCII digit to an unsigned representation of the same number.
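As a preview, both conversions mentioned above amount to simple arithmetic on the bit patterns, as the
short C sketch below suggests (the variable names and the specific characters are our own illustration).

#include <stdio.h>

int main ()
{
    char upper = 'G';
    char lower = upper + ('a' - 'A');  /* adds 0x20: 'G' (0x47) becomes 'g' (0x67) */

    char digit = '7';
    int  value = digit - '0';          /* 0x37 - 0x30 = 7, the unsigned value      */

    printf ("%c %d\n", lower, value);  /* prints: g 7 */
    return 0;
}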

The Unsigned Add Unit


In order to define a representation for signed integers that allows us to reuse a piece of hardware designed
for unsigned integers, we must first understand what such a piece of hardware actually does (we do not need
to know how it works yet; we'll explore that question later in our class).
The unsigned representation using N bits is not closed under addition. In other words, for any value of N, we
can easily find two N-bit unsigned numbers that, when added together, cannot be represented as an N-bit
unsigned number. With N = 4, for example, we can add 12 (1100) and 6 (0110) to obtain 18. Since 18
is outside of the range [0, 2^4 - 1] representable using the 4-bit unsigned representation, our representation
breaks if we try to represent the sum using this representation. We call this failure an overflow condition:
the representation cannot represent the result of the operation, in this case addition.



Using more bits to represent the answer is not an attractive solution, since we might then
want to use more bits for the inputs, which in turn requires more bits for the outputs,
and so on. We cannot build something supporting an infinite number of bits. Instead, we
choose a value for N and build an add unit that adds two N -bit numbers and produces
an N -bit sum (and some overflow indicators, which we discuss in the next set of notes).
The diagram below shows how we might draw such a device, with two N-bit numbers entering from the top,
and the N-bit sum coming out from the bottom.

[Figure: an N-bit add unit drawn as a box, with two N-bit numbers entering at the top and the N-bit sum
leaving at the bottom.]
The function used for N-bit unsigned addition is addition modulo 2^N. In a practical sense,
you can think of this function as simply keeping the last N bits of the answer; other bits
are simply discarded. In the example below, we add 12 and 6 to obtain 18, but then
discard the extra bit on the left, so the add unit produces 2 (an overflow).
      1100  (12)
    + 0110   (6)
    ------------
     10010   (2)

Modular arithmetic defines a way of performing arithmetic for a finite number of possible values, usually
integers. As a concrete example, let's use modulo 16, which corresponds to the addition unit for our 4-bit
examples.
Starting with the full range of integers, we can define equivalence classes for groups of 16 integers by simply
breaking up the number line into contiguous groups, starting with the numbers 0 to 15, as shown below. The
numbers -16 to -1 form a group, as do the numbers from 16 to 31. An infinite number of groups are defined
in this manner.

[Figure: the number line broken into contiguous groups of sixteen integers: ..., a second group of numbers
from -16 to -1, one group from 0 to 15, a third group from 16 to 31, ...]
You can think of these groups as defining equivalence classes modulo 16. All of the first numbers in the
groups are equivalent modulo 16. All of the second numbers in the groups are equivalent modulo 16. And
so forth. Mathematically, we say that two numbers A and B are equivalent modulo 16, which we write as
(A = B) mod 16
if and only if A = B + 16k for some integer k.
It is worth noting that equivalence as defined by a particular modulus distributes over addition and multiplication. If, for example, we want to find the equivalence class for (A + B) mod 16, we can find the equivalence
classes for A (call it C) and B (call it D) and then calculate the equivalence class of (C + D) mod 16.
As a concrete example of distribution over multiplication, given (A = 1,083,102,112 × 7,323,127) mod 10,
find A. For this problem, we note that the first number is equivalent to 2 mod 10, while the second number
is equivalent to 7 mod 10. We then write (A = 2 × 7) mod 10, and, since 2 × 7 = 14, we have (A = 4) mod 10.
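A 4-bit add unit is easy to mimic in C by keeping only the low four bits of an ordinary sum. The sketch
below (the masking with 0xF is our own shorthand for a 4-bit result) reproduces the 12 + 6 example and
shows that the answer is equivalent to 18 modulo 16.

#include <stdio.h>

/* Addition modulo 16: keep only the low 4 bits, as a 4-bit add unit would. */
unsigned add4 (unsigned a, unsigned b)
{
    return (a + b) & 0xF;
}

int main ()
{
    printf ("%u\n", add4 (12, 6));   /* prints 2; note that (2 = 18) mod 16 */
    printf ("%u\n", 18 % 16);        /* also 2                              */
    return 0;
}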

Deriving 2's Complement
Given these equivalence classes, we might
instead choose to draw a circle to identify the equivalence classes and to associate
each class with one of the sixteen possible
4-bit patterns, as shown to the right. Using this circle representation, we can add by
counting clockwise around the circle, and
we can subtract by counting in a counterclockwise direction around the circle. With
an unsigned representation, we choose to
use the group from [0, 15] (the middle group
in the diagram markings to the right) as
the number represented by each of the patterns. Overflow occurs with unsigned addition (or subtraction) because we can only
choose one value for each binary pattern.

[Figure: the equivalence classes modulo 16 drawn around a circle, with the sixteen 4-bit binary patterns
inside the circle and each pattern's equivalence class listed outside. For example, 0000 corresponds to
..., -16, 0, 16, ...; 0001 to ..., -15, 1, 17, ...; 0111 to ..., -9, 7, 23, ...; 1000 to ..., -8, 8, 24, ...;
and 1111 to ..., -1, 15, 31, ....]



In fact, we can choose any single value for each pattern to create a representation, and our add unit will
always produce results that are correct modulo 16. Look back at our overflow example, where we added 12
and 6 to obtain 2, and notice that (2 = 18) mod 16. Normally, only a contiguous sequence of integers makes
a useful representation, but we do not have to restrict ourselves to non-negative numbers.
The 2's complement representation can then be defined by choosing a set of integers balanced around zero
from the groups. In the circle diagram, for example, we might choose to represent numbers in the range
[-7, 7] when using 4 bits. What about the last pattern, 1000? We could choose to represent either -8 or 8.
The number of arithmetic operations that overflow is the same with both choices (the choices are symmetric
around 0, as are the combinations of input operands that overflow), so we gain nothing in that sense from either choice. If we choose to represent -8, however, notice that all patterns starting with a 1 bit then represent
negative numbers. No such simple check arises with the opposite choice, and thus an N-bit 2's complement
representation is defined to represent the range [-2^(N-1), 2^(N-1) - 1], with patterns chosen as shown in the circle.

An Algebraic Approach
Some people prefer an algebraic approach to understanding the definition of 2's complement, so we present
such an approach next. Let's start by writing f(A, B) for the result of our add unit:

    f(A, B) = (A + B) mod 2^N

We assume that we want to represent a set of integers balanced around 0 using our signed representation, and
that we will use the same binary patterns as we do with an unsigned representation to represent non-negative
numbers. Thus, with an N-bit representation, the patterns in the range [0, 2^(N-1) - 1] are the same as those
used with an unsigned representation. In this case, we are left with all patterns beginning with a 1 bit.
The question then is this: given an integer k, 2^(N-1) > k > 0, for which we want to find a pattern to
represent -k, and any integer m ≥ 0 that we might want to add to -k, can we find another integer p > 0
such that

    (-k + m = p + m) mod 2^N                                          (1)

If we can, we can use p's representation to represent -k, and our unsigned addition unit f(A, B) will work
correctly.
To find the value p, start by subtracting m from both sides of Equation (1) to obtain:

    (-k = p) mod 2^N                                                  (2)

Note that (2^N = 0) mod 2^N, and add this equation to Equation (2) to obtain

    (2^N - k = p) mod 2^N

Let p = 2^N - k. For example, if N = 4, k = 3 gives p = 16 - 3 = 13, which is the pattern 1101. With N = 4
and k = 5, we obtain p = 16 - 5 = 11, which is the pattern 1011. In general, since 2^(N-1) > k > 0, we
have 2^(N-1) < p < 2^N. But these patterns are all unused (they all start with a 1 bit!), so the patterns that
we have defined for negative numbers are disjoint from those that we used for positive numbers, and the
meaning of each pattern is unambiguous. The algebraic definition of bit patterns for negative numbers also
matches our circle diagram from the last section exactly, of course.
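The derivation is easy to check numerically. The short C sketch below computes p = 2^N - k for N = 4 and
a few values of k and prints the corresponding 4-bit patterns; the loop and names are ours.

#include <stdio.h>

int main ()
{
    unsigned N = 4;
    for (unsigned k = 1; k < 8; k++) {
        unsigned p = (1u << N) - k;    /* p = 2^N - k, the pattern chosen for -k */
        printf ("-%u -> p = %2u = %u%u%u%u\n", k, p,
                (p >> 3) & 1, (p >> 2) & 1, (p >> 1) & 1, p & 1);
    }
    return 0;   /* for example, k = 3 prints 1101 and k = 5 prints 1011 */
}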



Negating 2's Complement Numbers


The algebraic approach makes understanding negation of an integer represented using 2's complement fairly
straightforward, and gives us an easy procedure for doing so. Recall that given an integer k in an N-bit 2's
complement representation, the N-bit pattern for -k is given by 2^N - k (also true for k = 0 if we keep
only the low N bits of the result). But 2^N = (2^N - 1) + 1. Note that 2^N - 1 is the pattern of all 1 bits.
Subtracting any value k from this value is equivalent to simply flipping the bits, changing 0s to 1s and 1s
to 0s. (This operation is called a 1's complement, by the way.) We then add 1 to the result to find the
pattern for -k.
Negation can overflow, of course. Try finding the pattern for the negative of -8 in 4-bit 2's complement.
Finally, be aware that people often overload the term "2's complement" and use it to refer to the operation
of negation in a 2's complement representation. In our class, we try to avoid this confusion: 2's complement
is a representation for signed integers, and negation is an operation that one can apply to a signed integer
(whether the representation used for the integer is 2's complement or some other representation for signed
integers).
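In C, the flip-the-bits-and-add-1 procedure uses the ~ operator (the 1's complement). The sketch below
mimics a 4-bit representation by masking with 0xF, which is our own device; note what happens to the
pattern 1000.

#include <stdio.h>

/* Negate a 4-bit 2's complement pattern: flip the bits, add 1, keep 4 bits. */
unsigned negate4 (unsigned pattern)
{
    return (~pattern + 1) & 0xF;
}

int main ()
{
    printf ("%X\n", negate4 (0x3));   /* prints D (1101), the pattern for -3    */
    printf ("%X\n", negate4 (0x8));   /* prints 8: negating -8 overflows, since
                                         +8 is not representable in 4 bits      */
    return 0;
}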



ECE120: Introduction to Computer Engineering


Notes Set 1.3

Overflow Conditions

This set of notes discusses the overflow conditions for unsigned and 2's complement addition. For both
types, we formally prove that the conditions that we state are correct. Many of our faculty want our students to learn to construct formal proofs, so we plan to begin exposing you to this process in our classes.
Prof. Lumetta is a fan of Prof. George Pólya's educational theories with regard to proof techniques, and
in particular the idea that one builds up a repertoire of approaches by seeing the approaches used in practice.

Implication and Mathematical Notation


Some of you may not have been exposed to basics of mathematical logic, so let's start with a brief introduction
to implication. We'll use variables p and q to represent statements that can be either true or false. For
example, p might represent the statement, "Jan is an ECE student," while q might represent the statement,
"Jan works hard." The logical complement or negation of a statement p, written for example as "not p,"
has the opposite truth value: if p is true, not p is false, and if p is false, not p is true.
An implication is a logical relationship between two statements. The implication itself is also a logical
statement, and may be true or false. In English, for example, we might say, "If p, q." In mathematics, the
same implication is usually written as either "q if p" or p → q, and the latter is read as "p implies q."
Using our example values for p and q, we can see that p → q is true: "Jan is an ECE student" does in fact
imply that "Jan works hard"!
The implication p → q is only considered false if p is true and q is false. In all other cases, the implication
is true. This definition can be a little confusing at first, so let's use another example to see why. Let p
represent the statement, "Entity X is a flying pig," and let q represent the statement, "Entity X obeys air
traffic control regulations." Here the implication p → q is again true: flying pigs do not exist, so p is false,
and thus p → q is true, for any value of statement q!
Given an implication p → q, we say that the converse of the implication is the statement q → p, which
is also an implication. In mathematics, the converse of p → q is sometimes written as "q only if p." The
converse of an implication may or may not have the same truth value as the implication itself. Finally, we
frequently use the shorthand notation, "p if and only if q" (or, even shorter, "p iff q"), to mean p → q and
q → p. This last statement is true only when both implications are true.

Overflow for Unsigned Addition


Let's say that we add two N-bit unsigned numbers, A and B. The N-bit unsigned representation can
represent integers in the range [0, 2^N - 1]. Recall that we say that the addition operation has overflowed
if the number represented by the N-bit pattern produced for the sum does not actually represent the
number A + B.
For clarity, let's name the bits of A by writing the number as a_{N-1} a_{N-2} ... a_1 a_0. Similarly, let's write B as
b_{N-1} b_{N-2} ... b_1 b_0. Name the sum C = A + B. The sum that comes out of the add unit has only N bits, but
recall that we claimed in class that the overflow condition for unsigned addition is given by the carry out
of the most significant bit. So let's write the sum as c_N c_{N-1} c_{N-2} ... c_1 c_0, realizing that c_N is the carry out
and not actually part of the sum produced by the add unit.
Theorem: Addition of two N-bit unsigned numbers A = a_{N-1} a_{N-2} ... a_1 a_0 and B = b_{N-1} b_{N-2} ... b_1 b_0 to
produce sum C = A + B = c_N c_{N-1} c_{N-2} ... c_1 c_0 overflows if and only if the carry out c_N of the addition is
a 1 bit.



Proof: Let's start with the "if" direction. In other words, c_N = 1 implies overflow. Recall that unsigned
addition is the same as base 2 addition, except that we discard bits beyond c_{N-1} from the sum C. The
bit c_N has place value 2^N, so, when c_N = 1 we can write that the correct sum C ≥ 2^N. But no value that
large can be represented using the N-bit unsigned representation, so we have an overflow.
The other direction ("only if") is slightly more complex: we need to show that overflow implies that c_N = 1.
We use a range-based argument for this purpose. Overflow means that the sum C is outside the representable
range [0, 2^N - 1]. Adding two non-negative numbers cannot produce a negative number, so the sum can't
be smaller than 0. Overflow thus implies that C ≥ 2^N.
Does that argument complete the proof? No, because some numbers, such as 2^(N+1), are larger than 2^N, but
do not have a 1 bit in the Nth position when written in binary. We need to make use of the constraints
on A and B implied by the possible range of the representation.
In particular, given that A and B are represented as N-bit unsigned values, we can write

    0 ≤ A ≤ 2^N - 1
    0 ≤ B ≤ 2^N - 1

We add these two inequalities and replace A + B with C to obtain

    0 ≤ C ≤ 2^(N+1) - 2

Combining the new inequality with the one implied by the overflow condition, we obtain

    2^N ≤ C ≤ 2^(N+1) - 2

All of the numbers in the range allowed by this inequality have c_N = 1, completing our proof.
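In C, the carry out of the most significant bit can be observed by computing the sum with one extra bit,
as in the sketch below for N = 8 (the type choices and the helper name are ours).

#include <stdint.h>
#include <stdio.h>

/* Returns 1 iff adding two 8-bit unsigned values overflows (carry out c8 = 1). */
int unsigned_overflow8 (uint8_t a, uint8_t b)
{
    uint16_t wide = (uint16_t) a + (uint16_t) b;   /* compute with an extra bit */
    return (wide >> 8) & 1;                        /* the carry out of bit 7    */
}

int main ()
{
    printf ("%d\n", unsigned_overflow8 (200, 100));   /* 1: 300 needs 9 bits   */
    printf ("%d\n", unsigned_overflow8 (100, 100));   /* 0: 200 fits in 8 bits */
    return 0;
}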

Overflow for 2's Complement Addition


Understanding overflow for 2's complement addition is somewhat trickier, which is why the problem is a good
one for you to think about on your own first. Our operands, A and B, are now two N-bit 2's complement
numbers. The N-bit 2's complement representation can represent integers in the range [-2^(N-1), 2^(N-1) - 1].
Let's start by ruling out a case that we can show never leads to overflow.
Lemma: Addition of two N-bit 2's complement numbers A and B does not overflow if one of the numbers
is negative and the other is not.
Proof: We again make use of the constraints implied by the fact that A and B are represented as N-bit 2's
complement values. We can assume without loss of generality(1), or w.l.o.g., that A < 0 and B ≥ 0.
Combining these constraints with the range representable by N-bit 2's complement, we obtain

    -2^(N-1) ≤ A < 0
           0 ≤ B < 2^(N-1)

We add these two inequalities and replace A + B with C to obtain

    -2^(N-1) ≤ C < 2^(N-1)

But anything in the range specified by this inequality can be represented with N-bit 2's complement, and
thus the addition does not overflow.
(1) This common mathematical phrasing means that we are using a problem symmetry to cut down the length of the proof
discussion. In this case, the names A and B aren't particularly important, since addition is commutative (A + B = B + A).
Thus the proof for the case in which A is negative (and B is not) is identical to the case in which B is negative (and A is not),
except that all of the names are swapped. The term "without loss of generality" means that we consider the proof complete
even with additional assumptions, in our case that A < 0 and B ≥ 0.



We are now ready to state our main theorem. For convenience, let's use different names for the actual
sum C = A + B and the sum S returned from the add unit. We define S as the number represented by the
bit pattern produced by the add unit. When overflow occurs, S ≠ C, but we always have (S = C) mod 2^N.
Theorem: Addition of two N-bit 2's complement numbers A and B overflows if and only if one of the
following conditions holds:
1. A < 0 and B < 0 and S ≥ 0
2. A ≥ 0 and B ≥ 0 and S < 0
Proof: We once again start with the "if" direction. That is, if condition 1 or condition 2 holds, we have
an overflow. The proofs are straightforward. Given condition 1, we can add the two inequalities A < 0 and
B < 0 to obtain C = A + B < 0. But S ≥ 0, so clearly S ≠ C, thus overflow has occurred.
Similarly, if condition 2 holds, we can add the inequalities A ≥ 0 and B ≥ 0 to obtain C = A + B ≥ 0. Here
we have S < 0, so again S ≠ C, and we have an overflow.
We must now prove the "only if" direction, showing that any overflow implies either condition 1 or condition 2.
By the contrapositive(2) of our Lemma, we know that if an overflow occurs, either both operands are negative,
or both are non-negative.
Let's start with the case in which both operands are negative, so A < 0 and B < 0, and thus the real
sum C < 0 as well. Given that A and B are represented as N-bit 2's complement, they must fall in the
representable range, so we can write

    -2^(N-1) ≤ A < 0
    -2^(N-1) ≤ B < 0

We add these two inequalities and replace A + B with C to obtain

    -2^N ≤ C < 0

Given that an overflow has occurred, C must fall outside of the representable range. Given that C < 0, it
must be smaller than -2^(N-1), the smallest number representable using N-bit 2's complement, so we can write

    -2^N ≤ C < -2^(N-1)

We now add 2^N to each part to obtain

    0 ≤ C + 2^N < 2^(N-1)

This range of integers falls within the representable range for N-bit 2's complement, so we can replace the
middle expression with S (equal to C modulo 2^N) to find that

    0 ≤ S < 2^(N-1)

Thus, if we have an overflow and both A < 0 and B < 0, the resulting sum S ≥ 0, and condition 1 holds.
The proof for the case in which we observe an overflow when both operands are non-negative (A ≥ 0 and
B ≥ 0) is similar, and leads to condition 2. We again begin with inequalities for A and B:

    0 ≤ A < 2^(N-1)
    0 ≤ B < 2^(N-1)

We add these two inequalities and replace A + B with C to obtain

    0 ≤ C < 2^N

(2) If we have a statement of the form (p implies q), its contrapositive is the statement (not q implies not p). Both statements
have the same truth value. In this case, we can turn our Lemma around as stated.


Given that an overflow has occurred, C must fall outside of the representable range. Given that C ≥ 0, it
must be larger than 2^(N-1) - 1, the largest number representable using N-bit 2's complement, so we can
write

    2^(N-1) ≤ C < 2^N

We now subtract 2^N from each part to obtain

    -2^(N-1) ≤ C - 2^N < 0

This range of integers falls within the representable range for N-bit 2's complement, so we can replace the
middle expression with S (equal to C modulo 2^N) to find that

    -2^(N-1) ≤ S < 0

Thus, if we have an overflow and both A ≥ 0 and B ≥ 0, the resulting sum S < 0, and condition 2 holds.
Thus overflow implies either condition 1 or condition 2, completing our proof.
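The theorem translates directly into a check on the sign bits of A, B, and S. The sketch below works on
8-bit patterns; the unsigned arithmetic keeps the sum well defined in C, and the helper name is ours.

#include <stdint.h>
#include <stdio.h>

/* Returns 1 iff A + B overflows 8-bit 2's complement: both sign bits 1 with a
   0 sign bit in the sum, or both sign bits 0 with a 1 sign bit in the sum.    */
int signed_overflow8 (uint8_t a, uint8_t b)
{
    uint8_t s  = a + b;            /* the 8-bit pattern the add unit produces */
    uint8_t sa = a >> 7, sb = b >> 7, ss = s >> 7;
    return (sa && sb && !ss) || (!sa && !sb && ss);
}

int main ()
{
    printf ("%d\n", signed_overflow8 (100, 100));   /* 1: 100 + 100 = 200 > 127     */
    printf ("%d\n", signed_overflow8 (0x9C, 50));   /* 0: -100 + 50 never overflows */
    return 0;
}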


ECE120: Introduction to Computer Engineering


Notes Set 1.4
Logic Operations
This set of notes briefly describes a generalization to truth tables, then introduces Boolean logic operations
as well as notational conventions and tools that we use to express general functions on bits. We illustrate
how logic operations enable us to express functions such as overflow conditions concisely, then show by construction that a small number of logic operations suffices to describe any operation on any number of bits.
We close by discussing a few implications and examples.

Truth Tables
You have seen the basic form of truth tables in the textbook and in class. Over
the semester, we will introduce several extensions to the basic concept, mostly with
the goal of reducing the amount of writing necessary when using truth tables. For
example, the truth table to the right uses two generalizations to show the carry
out C (also the unsigned overflow indicator) and the sum S produced by adding
two 2-bit unsigned numbers. First, rather than writing each input bit separately,
we have grouped pairs of input bits into the numbers A and B. Second, we have
defined multiple output columns so as to include both bits of S as well as C in the
same table. Finally, we have grouped the two bits of S into one column.

    inputs      outputs
    A   B       C    S
    00  00      0   00
    00  01      0   01
    00  10      0   10
    00  11      0   11
    01  00      0   01
    01  01      0   10
    01  10      0   11
    01  11      1   00
    10  00      0   10
    10  01      0   11
    10  10      1   00
    10  11      1   01
    11  00      0   11
    11  01      1   00
    11  10      1   01
    11  11      1   10

Keep in mind as you write truth tables that only rarely does an operation correspond
to a simple and familiar process such as addition of base 2 numbers. We had to
choose the unsigned and 2's complement representations carefully to allow ourselves
to take advantage of a familiar process. In general, for each line of a truth table for
an operation, you may need to make use of the input representation to identify the
input values, calculate the operation's result as a value, and then translate the value
back into the correct bit pattern using the output representation. Signed magnitude addition, for example, does not always correspond to base 2 addition: when the
signs of the two input operands differ, one should instead use base 2 subtraction. For other operations or
representations, base 2 arithmetic may have no relevance at all.

Boolean Logic Operations


In the middle of the 19th century, George Boole introduced a set of logic operations that are today known as
Boolean logic (also as Boolean algebra). These operations today form one of the lowest abstraction levels
in digital systems, and an understanding of their meaning and use is critical to the effective development of
both hardware and software.
You have probably seen these functions many times already in your education, perhaps first in set-theoretic
form as Venn diagrams. However, given the use of common English words with different meanings to name
some of the functions, and the sometimes confusing associations made even by engineering educators, we
want to provide you with a concise set of definitions that generalizes correctly to more than two operands.
You may have learned these functions based on truth values (true and false), but we define them based on
bits, with 1 representing true and 0 representing false.
Table 1 on the next page lists logic operations. The first column in the table lists the name of each function.
The second provides a fairly complete set of the notations that you are likely to encounter for each function,
including both the forms used in engineering and those used in mathematics. The third column defines the
functions value for two or more input operands (except for NOT, which operates on a single value). The last
column shows the form generally used in logic schematics/diagrams and mentions the important features
used in distinguishing each function (in pictorial form usually called a gate, in reference to common physical
implementations) from the others.

Function         Notation                      Explanation                                      Schematic
AND              A AND B, AB, A·B, A×B, A∧B    the "all" function: result is 1 iff              flat input, round output
                                               all input operands are equal to 1
OR               A OR B, A+B, A∨B              the "any" function: result is 1 iff              round input, pointed output
                                               any input operand is equal to 1
NOT              NOT A, Ā, A', ¬A              logical complement/negation:                     triangle and circle
                                               NOT 0 is 1, and NOT 1 is 0
XOR              A XOR B, A⊕B                  the "odd" function: result is 1 iff an odd       OR with two lines on input side
(exclusive OR)                                 number of input operands are equal to 1
English "or"     "A, B, or C"                  the "one of" function: result is 1 iff exactly   (not used)
                                               one of the input operands is equal to 1

Table 1: Boolean logic operations, notation, definitions, and symbols.


The first function of importance is AND. Think of AND as the "all" function: given a set of input values
as operands, AND evaluates to 1 if and only if all of the input values are 1. The first notation line simply
uses the name of the function. In Boolean algebra, AND is typically represented as multiplication, and the
middle three forms reflect various ways in which we write multiplication. The last notational variant is from
mathematics, where the AND function is formally called conjunction.
The next function of importance is OR. Think of OR as the "any" function: given a set of input values
as operands, OR evaluates to 1 if and only if any of the input values is 1. The actual number of input
values equal to 1 only matters in the sense of whether it is at least one. The notation for OR is organized
in the same way as for AND, with the function name at the top, the algebraic variant that we will use in
class (in this case addition) in the middle, and the mathematics variant, in this case called disjunction,
at the bottom.
The definition of Boolean OR is not the same as our use of the word "or" in English. For example, if you
are fortunate enough to enjoy a meal on a plane, you might be offered several choices: "Would you like the
chicken, the beef, or the vegetarian lasagna today?" Unacceptable answers to this English question include:
"Yes," "Chicken and lasagna," and any other combination that involves more than a single choice!
You may have noticed that we might have instead mentioned that AND evaluates to 0 if any input value
is 0, and that OR evaluates to 0 if all input values are 0. These relationships reflect a mathematical duality
underlying Boolean logic that has important practical value in terms of making it easier for humans to digest
complex logic expressions. We will talk more about duality later in the course, but you should learn some
of the practical value now: if you are trying to evaluate an AND function, look for an input with value 0;
if you are trying to evaluate an OR function, look for an input with value 1. If you find such an input, you
know the function's value without calculating any other input values.
We next consider the logical complement function, NOT. The NOT function is also called negation.
Unlike our first two functions, NOT accepts only a single operand, and reverses its value, turning 0 into 1
and 1 into 0. The notation follows the same pattern: a version using the function name at the top, followed
by two variants used in Boolean algebra, and finally the version frequently used in mathematics. For the
NOT gate, or inverter, the circle is actually the important part: the triangle by itself merely copies the
input. You will see the small circle added to other gates on both inputs and outputs; in both cases the circle
implies a NOT function.


Last among the Boolean logic functions, we have the XOR, or exclusive OR function. Think of XOR as
the "odd" function: given a set of input values as operands, XOR evaluates to 1 if and only if an odd number
of the input values are 1. Only two variants of XOR notation are given: the first using the function name,
and the second used with Boolean algebra. Mathematics rarely uses this function.
Finally, we have included the meaning of the word "or" in English as a separate function entry to enable you
to compare that meaning with the Boolean logic functions easily. Note that many people refer to English
use of the word "or" as "exclusive" because one true value excludes all others from being true. Do
not let this human language ambiguity confuse you about XOR! For all logic design purposes, XOR is
the "odd" function.
The truth table below provides values illustrating these functions operating on three inputs. The AND, OR,
and XOR functions are all associative, (A op B) op C = A op (B op C), and commutative, A op B = B op A,
as you may have already realized from their definitions.

    inputs        outputs
    A  B  C    ABC  A+B+C  A'  A⊕B⊕C
    0  0  0     0     0    1     0
    0  0  1     0     1    1     1
    0  1  0     0     1    1     1
    0  1  1     0     1    1     0
    1  0  0     0     1    0     1
    1  0  1     0     1    0     0
    1  1  0     0     1    0     0
    1  1  1     1     1    0     1
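C's bitwise operators compute these functions directly on bits: & is AND, | is OR, ^ is XOR, and ! (or ~)
produces a complement. The loop below is our own sketch that regenerates the table above, one row per
input combination.

#include <stdio.h>

int main ()
{
    printf (" A B C   AND  OR  NOT A  XOR\n");
    for (unsigned i = 0; i < 8; i++) {
        unsigned a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        printf (" %u %u %u    %u    %u     %u     %u\n", a, b, c,
                a & b & c,     /* the "all" function  */
                a | b | c,     /* the "any" function  */
                !a,            /* logical complement  */
                a ^ b ^ c);    /* the "odd" function  */
    }
    return 0;
}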

Overflow as Logic Expressions


In the last set of notes, we discussed overflow conditions for unsigned and 2's complement representations.
Let's use Boolean logic to express these conditions.
We begin with addition of two 1-bit unsigned numbers. Call the two input bits A0 and B0. If you write a
truth table for this operation, you'll notice that overflow occurs only when all (two) bits are 1. If either bit
is 0, the sum can't exceed 1, so overflow cannot occur. In other words, overflow in this case can be written
using an AND operation:
A0 B0
The truth table for adding two 2-bit unsigned numbers is four times as large, and seeing the structure may
be difficult. One way of writing the expression for overflow of 2-bit unsigned addition is as follows:
A1 B1 + (A1 + B1 )A0 B0
This expression is slightly trickier to understand. Think about the place value of the bits. If both of the most
significant bits (those with place value 2) are 1, we have an overflow, just as in the case of 1-bit addition.
The A1 B1 term represents this case. We also have an overflow if one or both (the OR) of the most significant
bits are 1 and the sum of the two next significant bits (in this case those with place value 1) generates a
carry.
The truth table for adding two 3-bit unsigned numbers is probably not something that you want to write
out. Fortunately, a pattern should start to become clear with the following expression:
A2 B2 + (A2 + B2 )A1 B1 + (A2 + B2 )(A1 + B1 )A0 B0
In the 2-bit case, we mentioned the most significant bit and the next most significant bit to help you see
the pattern. The same reasoning describes the first two product terms in our overflow expression for 3-bit
unsigned addition (but the place values are 4 for the most significant bit and 2 for the next most significant
bit). The last term represents the overflow case in which the two least significant bits generate a carry which
then propagates up through all of the other bits because at least one of the two bits in every position is a 1.
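Written with C's bitwise operators, the 3-bit expression looks like the sketch below, where each argument
holds a single bit of A or B (the function and variable names are ours).

#include <stdio.h>

/* Carry out of 3-bit unsigned addition, written exactly as the expression above. */
unsigned overflow3 (unsigned a2, unsigned a1, unsigned a0,
                    unsigned b2, unsigned b1, unsigned b0)
{
    return (a2 & b2)
         | ((a2 | b2) & a1 & b1)
         | ((a2 | b2) & (a1 | b1) & a0 & b0);
}

int main ()
{
    printf ("%u\n", overflow3 (1, 0, 1, 0, 1, 1));   /* 5 + 3 = 8: prints 1 */
    printf ("%u\n", overflow3 (0, 1, 1, 1, 0, 0));   /* 3 + 4 = 7: prints 0 */
    return 0;
}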


The overflow condition for addition of two N-bit 2's complement numbers can be written fairly concisely
in terms of the first bits of the two numbers and the first bit of the sum. Recall that overflow in this case
depends only on whether the three numbers are negative or non-negative, which is given by the most
significant bit. Given the bit names as shown below, we can write the overflow condition as follows:

      A_{N-1} A_{N-2} ... A_2 A_1 A_0
    + B_{N-1} B_{N-2} ... B_2 B_1 B_0
    ---------------------------------
      S_{N-1} S_{N-2} ... S_2 S_1 S_0

    A_{N-1} B_{N-1} S_{N-1}' + A_{N-1}' B_{N-1}' S_{N-1}

The overflow condition does of course depend on all of the bits in the two numbers being added. In the
expression above, we have simplified the form by using S_{N-1}. But S_{N-1} depends on the bits A_{N-1} and B_{N-1}
as well as the carry out of bit (N-2).
Later in this set of notes, we present a technique with which you can derive an expression for an arbitrary
Boolean logic function. As an exercise, after you have finished reading these notes, try using that technique
to derive an overflow expression for addition of two N-bit 2's complement numbers based on A_{N-1}, B_{N-1},
and the carry out of bit (N-2) (and into bit (N-1)), which we might call C_{N-1}. You might then calculate C_{N-1} in terms of the rest of the bits of A and B using the expressions for unsigned overflow just
discussed. In the next month or so, you will learn how to derive more compact expressions yourself from
truth tables or other representations of Boolean logic functions.

Logical Completeness
Why do we feel that such a short list of functions is enough? If you think about the number of possible
functions on N bits, you might think that we need many more functions to be able to manipulate bits.
With 10 bits, for example, there are 2^1024 such functions. Obviously, some of them have never been used in
any computer system, but maybe we should define at least a few more logic operations? In fact, we do not
even need XOR. The functions AND, OR, and NOT are sufficient, even if we only allow two input operands
for AND and OR!
The theorem below captures this idea, called logical completeness. In this case, we claim that the set of
functions {AND, OR, NOT} is sufficient to express any operation on any finite number of variables, where
each variable is a bit.
Theorem: Given enough 2-input AND, 2-input OR, and 1-input NOT functions, one can express any
Boolean logic function on any finite number of variables.
The proof of our theorem is by construction. In other words, we show a systematic approach for transforming an arbitrary Boolean logic function on an arbitrary number of variables into a form that uses only
AND, OR, and NOT functions on one or two operands. As a first step, we remove the restriction on the
number of inputs for the AND and OR functions. For this purpose, we state and prove two lemmas, which
are simpler theorems used to support the proof of a main theorem.
Lemma 1: Given enough 2-input AND functions, one can express an AND function on any finite number
of variables.
Proof: We prove the Lemma by induction.(1) Denote the number of inputs to a particular AND function
by N.
The base case is N = 2. Such an AND function is given.
To complete the proof, we need only show that, given any number of AND functions with up to N inputs,
we can express an AND function with N + 1 inputs. To do so, we need merely use one 2-input AND function
to join together the result of an N-input AND function with an additional input, as illustrated below.

[Figure: inputs 1 through N feed an N-input AND function; its output and input N+1 then feed a 2-input
AND function, forming an AND of N+1 inputs.]

(1) We assume that you have seen proof by induction previously.


Lemma 2: Given enough 2-input OR functions, one can express an OR function on any finite number of
variables.
Proof: The proof of Lemma 2 is identical in structure to that of Lemma 1, but uses OR functions instead
of AND functions.
Let's now consider a small subset of functions on N variables. For any such function, you can write out the
truth table for the function. The output of a logic function is just a bit, either a 0 or a 1. Lets consider the
set of functions on N variables that produce a 1 for exactly one combination of the N variables. In other
words, if you were to write out the truth table for such a function, exactly one row in the truth table would
have output value 1, while all other rows had output value 0.
Lemma 3: Given enough AND functions and 1-input NOT functions, one can express any Boolean logic
function that produces a 1 for exactly one combination of any finite number of variables.
Proof: The proof of Lemma 3 is by construction. Let N be the number of variables on which the function
operates. We construct a minterm on these N variables, which is an AND operation on each variable or its
complement. The minterm is specified by looking at the unique combination of variable values that produces
a 1 result for the function. Each variable that must be a 1 is included as itself, while each variable that must
be a 0 is included as the variables complement (using a NOT function). The resulting minterm produces the
desired function exactly. When the variables all match the values for which the function should produce 1,
the inputs to the AND function are all 1, and the function produces 1. When any variable does not match
the value for which the function should produce 1, that variable (or its complement) acts as a 0 input to the
AND function, and the function produces a 0, as desired.
The table below shows all eight minterms for three variables.
    inputs        outputs (the eight minterms)
    A  B  C    A'B'C'  A'B'C  A'BC'  A'BC  AB'C'  AB'C  ABC'  ABC
    0  0  0      1       0      0     0      0     0     0     0
    0  0  1      0       1      0     0      0     0     0     0
    0  1  0      0       0      1     0      0     0     0     0
    0  1  1      0       0      0     1      0     0     0     0
    1  0  0      0       0      0     0      1     0     0     0
    1  0  1      0       0      0     0      0     1     0     0
    1  1  0      0       0      0     0      0     0     1     0
    1  1  1      0       0      0     0      0     0     0     1

We are now ready to prove our theorem.


Proof (of Theorem): Any given function on N variables produces the value 1 for some set of combinations
of inputs. Let's say that M such combinations produce 1. Note that M ≤ 2^N. For each combination that
produces 1, we can use Lemma 1 to construct an N-input AND function. Then, using Lemma 3, we can
use as many as N NOT functions and the N-input AND function to construct a minterm for that input
combination. Finally, using Lemma 2, we can construct an M-input OR function and OR together all of
the minterms. The result of the OR is the desired function. If the function should produce a 1 for some
combination of inputs, that combination's minterm provides a 1 input to the OR, which in turn produces a 1.
If a combination should produce a 0, its minterm does not appear in the OR; all other minterms produce 0
for that combination, and thus all inputs to the OR are 0 in such cases, and the OR produces 0, as desired.
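As a small worked instance of the construction (our own choice of function, not one from the notes), the
sketch below implements the three-variable "odd" function from its four minterms, using only AND, OR,
and NOT.

#include <stdio.h>

/* A XOR B XOR C built as an OR of minterms, one per input combination (001,
   010, 100, 111) for which the function produces 1.                          */
unsigned odd3 (unsigned a, unsigned b, unsigned c)
{
    return (!a & !b &  c)     /* minterm for 001 */
         | (!a &  b & !c)     /* minterm for 010 */
         | ( a & !b & !c)     /* minterm for 100 */
         | ( a &  b &  c);    /* minterm for 111 */
}

int main ()
{
    for (unsigned i = 0; i < 8; i++)
        printf ("%u", odd3 ((i >> 2) & 1, (i >> 1) & 1, i & 1));
    printf ("\n");            /* prints 01101001, matching A XOR B XOR C */
    return 0;
}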
The construction that we used to prove logical completeness does not necessarily help with efficient design
of logic functions. Think about some of the expressions that we discussed earlier in these notes for overflow
conditions. How many minterms do you need for N -bit unsigned overflow? A single Boolean logic function
can be expressed in many different ways, and learning how to develop an efficient implementation of a
function as well as how to determine whether two logic expressions are identical without actually writing
out truth tables are important engineering skills that you will start to learn in the coming months.


Implications of Logical Completeness


If logical completeness doesn't really help us to engineer logic functions, why is the idea important? Think
back to the layers of abstraction and the implementation of bits from the first couple of lectures. Voltages
are real numbers, not bits. The device layer implementations of Boolean logic functions must abstract away
the analog properties of the physical system. Without such abstraction, we must think carefully about analog
issues such as noise every time we make use of a bit! Logical completeness assures us that no matter what
we want to do with bits, implementing a handful of operations correctly is enough to guarantee that we
never have to worry.
A second important value of logical completeness is as a tool in screening potential new technologies for
computers. If a new technology does not allow implementation of a logically complete set of functions, the
new technology is extremely unlikely to be successful in replacing the current one.
That said, {AND, OR, and NOT} is not the only logically complete set of functions. In fact, our current
complementary metal-oxide semiconductor (CMOS) technology, on which most of the computer industry is
now built, does not directly implement these functions, as you will see later in our class.
The functions that are implemented directly in CMOS are NAND
and NOR, which are abbreviations for AND followed by NOT and
OR followed by NOT, respectively. Truth tables for the two are
shown below.

    inputs       outputs
    A  B    A NAND B   A NOR B
    0  0        1         1
    0  1        1         0
    1  0        1         0
    1  1        0         0

Either of these functions by itself forms a logically complete set.


That is, both the set {NAND} and the set {NOR} are logically
complete. For now, we leave the proof of this claim to you. Remember that all you need to show is that you can implement any set known to be logically complete, so in
order to prove that {NAND} is logically complete (for example), you need only show that you can implement
AND, OR, and NOT using only NAND.
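To give the flavor of such a proof, the C sketch below expresses NOT, AND, and OR using nothing but a
2-input NAND (the helper names are ours); checking the four input combinations confirms each identity.

#include <stdio.h>

unsigned nand (unsigned a, unsigned b) { return !(a & b); }

/* NOT, AND, and OR built from NAND alone. */
unsigned not_ (unsigned a)             { return nand (a, a); }
unsigned and_ (unsigned a, unsigned b) { return nand (nand (a, b), nand (a, b)); }
unsigned or_  (unsigned a, unsigned b) { return nand (nand (a, a), nand (b, b)); }

int main ()
{
    for (unsigned i = 0; i < 4; i++) {
        unsigned a = (i >> 1) & 1, b = i & 1;
        printf ("%u %u: NOT A=%u  A AND B=%u  A OR B=%u\n",
                a, b, not_ (a), and_ (a, b), or_ (a, b));
    }
    return 0;
}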

Examples and a Generalization


Let's use our construction to solve a few examples. We begin with the functions that we illustrated with the
first truth table from this set of notes, the carry out C and sum S of two 2-bit unsigned numbers. Since each
output bit requires a separate expression, we now write S1 S0 for the two bits of the sum. We also need to be
able to make use of the individual bits of the input values, so we write these as A1 A0 and B1 B0, as shown
in the truth table below. Using our construction from the logical completeness theorem, we obtain the
equations that follow the table. You should verify these expressions yourself.
    inputs                outputs
    A1  A0  B1  B0     C    S1  S0
    0   0   0   0      0    0   0
    0   0   0   1      0    0   1
    0   0   1   0      0    1   0
    0   0   1   1      0    1   1
    0   1   0   0      0    0   1
    0   1   0   1      0    1   0
    0   1   1   0      0    1   1
    0   1   1   1      1    0   0
    1   0   0   0      0    1   0
    1   0   0   1      0    1   1
    1   0   1   0      1    0   0
    1   0   1   1      1    0   1
    1   1   0   0      0    1   1
    1   1   0   1      1    0   0
    1   1   1   0      1    0   1
    1   1   1   1      1    1   0

    C  = A1'A0 B1 B0  + A1 A0'B1 B0' + A1 A0'B1 B0 +
         A1 A0 B1'B0  + A1 A0 B1 B0' + A1 A0 B1 B0

    S1 = A1'A0'B1 B0' + A1'A0'B1 B0  + A1'A0 B1'B0  +
         A1'A0 B1 B0' + A1 A0'B1'B0' + A1 A0'B1'B0  +
         A1 A0 B1'B0' + A1 A0 B1 B0

    S0 = A1'A0'B1'B0  + A1'A0'B1 B0  + A1'A0 B1'B0' +
         A1'A0 B1 B0' + A1 A0'B1'B0  + A1 A0'B1 B0  +
         A1 A0 B1'B0' + A1 A0 B1 B0'


Now let's consider a new function. Given an 8-bit 2's complement number, A = A7 A6 A5 A4 A3 A2 A1 A0, we
want to compare it with the value -1. We know that we can construct this function using AND, OR, and
NOT, but how? We start by writing the representation for -1, which is 11111111. If the number A matches
that representation, we want to produce a 1. If the number A differs in any bit, we want to produce a 0.
The desired function has exactly one combination of inputs that produces a 1, so in fact we need only one
minterm! In this case, we can compare with -1 by calculating the expression:

    A7 · A6 · A5 · A4 · A3 · A2 · A1 · A0

Here we have explicitly included multiplication symbols to avoid confusion with our notation for groups of
bits, as we used when naming the individual bits of A.
In closing, we briefly introduce a generalization of logic operations to groups of bits.
Our representations for integers, real numbers, and characters from human languages
all use more than one bit to represent a given value. When we use computers, we
often make use of multiple bits in groups in this way. A byte, for example, today
means an ordered group of eight bits. We can extend our logic functions to operate
on such groups by pairing bits from each of two groups and performing the logic
operation on each pair. For example, given A = A7 A6 A5 A4 A3 A2 A1 A0 = 01010101
and B = B7 B6 B5 B4 B3 B2 B1 B0 = 11110000, we calculate A AND B by computing
the AND of each pair of bits, A7 AND B7 , A6 AND B6 , and so forth, to produce
the result 01010000, as shown to the right. In the same way, we can extend other
logic operations, such as OR, NOT, and XOR, to operate on groups of bits.

A 01010101
AND B 11110000
01010000
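C provides exactly this bit-by-bit extension through its bitwise operators (& for AND, | for OR, ^ for XOR,
~ for NOT). The sketch below reproduces the example using the same two bit patterns, written in
hexadecimal.

#include <stdio.h>

int main ()
{
    unsigned char a = 0x55;                 /* 01010101 */
    unsigned char b = 0xF0;                 /* 11110000 */

    printf ("%02X\n", a & b);               /* 50: 01010000, AND of each bit pair */
    printf ("%02X\n", a | b);               /* F5: 11110101                       */
    printf ("%02X\n", a ^ b);               /* A5: 10100101                       */
    printf ("%02X\n", (unsigned char) ~a);  /* AA: 10101010                       */
    return 0;
}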


ECE120: Introduction to Computer Engineering


Notes Set 1.5
Programming Concepts and the C Language
This set of notes introduces the C programming language and explains some basic concepts in computer
programming. Our purpose in showing you a high-level language at this early stage of the course is to give
you time to become familiar with the syntax and meaning of the language, not to teach you how to program.
Throughout this semester, we will use software written in C to demonstrate and validate the digital system
design material in our course. Towards the end of the semester, you will learn to program computers using
instructions and assembly language. In ECE 220, you will make use of the C language to write programs, at
which point already being familiar with the language will make the material easier to master. These notes
are meant to complement the introduction provided by Patt and Patel.
After a brief introduction to the history of C and the structure of a program written in C, we connect the
idea of representations developed in class to the data types used in high-level languages. We next discuss the
use of variables in C, then describe some of the operators available to the programmer, including arithmetic
and logic operators. The notes next introduce C functions that support the ability to read user input from
the keyboard and to print results to the monitor. A description of the structure of statements in C follows,
explaining how programs are executed and how a programmer can create statements for conditional execution
as well as loops to perform repetitive tasks. The main portion of the notes concludes with an example
program, which is used to illustrate both the execution of C statements as well as the difference between
variables in programs and variables in algebra.
The remainder of the notes covers more advanced topics. First, we describe how the compilation process
works, illustrating how a program written in a high-level language is transformed into instructions. You will
learn this process in much more detail in ECE 220. Second, we briefly introduce the C preprocessor. Finally,
we discuss implicit and explicit data type conversion in C. Sections marked with an asterisk are provided
solely for your interest, but you probably need to learn this material in later classes.

The C Programming Language


Programming languages attempt to bridge the semantic gap between human descriptions of problems and the
relatively simple instructions that can be provided by an instruction set architecture (ISA). Since 1954, when
the Fortran language first enabled scientists to enter FORmulae symbolically and to have them TRANslated
automatically into instructions, people have invented thousands of computer languages.
The C programming language was developed by Dennis Ritchie at Bell Labs in order to simplify the task
of writing the Unix operating system. The C language
provides a fairly transparent mapping to typical ISAs,
which makes it a good choice both for system software
such as operating systems and for our class. The syntax used in C (that is, the rules that one must follow
to write valid C programs) has also heavily influenced
many other more recent languages, such as C++, Java,
and Perl.

int
main ()
{
    int answer = 42;    /* the Answer! */

    printf ("The answer is %d.\n", answer);

    /* Our work here is done.
       Let's get out of here! */
    return 0;
}
For our purposes, a C program consists of a set of variable declarations and a sequence of statements.
Both of these parts are written into a single C function called main, which executes when the program starts.
A simple example appears above. The program uses one variable called answer, which it initializes
to the value 42. The program prints a line of output to the monitor for the user, then terminates using
the return statement. Comments for human readers begin with the characters /* (a slash followed by an
asterisk) and end with the characters */ (an asterisk followed by a slash). The C language ignores white


space in programs, so we encourage you to use blank lines and extra spacing to make your programs easier
to read.
The variables defined in the main function allow a programmer to associate arbitrary symbolic names
(sequences of English characters, such as sum or product or highScore) with specific types of data,
such as a 16-bit unsigned integer or a double-precision floating-point number. In the example program above,
the variable answer is declared to be a 32-bit 2's complement number.
Those with no programming experience may at first find the difference between variables in algebra and
variables in programs slightly confusing. As a program executes, the values of variables can change from step
to step of execution.
The statements in the main function are executed one by one until the program terminates. Programs are
not limited to simple sequences of statements, however. Some types of statements allow a programmer to
specify conditional behavior. For example, a program might only print out secret information if the user's
name is "lUmeTTa". Other types of statements allow a programmer to repeat the execution of a group of
statements until a condition is met. For example, a program might print the numbers from 1 to 10, or ask
for input until the user types a number between 1 and 10. The order of statement execution is well-defined
in C, but the statements in main do not necessarily make up an algorithm: we can easily write a C program
that never terminates.
If a program terminates, the main function returns an integer to the operating system, usually by executing
a return statement, as in the example program. By convention, returning the value 0 indicates successful
completion of the program, while any non-zero value indicates a program-specific error. However, main is
not necessarily a function in the mathematical sense because the value returned from main is not necessarily
unique for a given set of input values to the program. For example, we can write a program that selects a
number from 1 to 10 at random and returns the number to the operating system.

Data Types
As you know, modern digital computers represent all information with binary digits (0s and 1s), or bits.
Whether you are representing something as simple as an integer or as complex as an undergraduate thesis,
the data are simply a bunch of 0s and 1s inside a computer. For any given type of information, a human
selects a data type for the information. A data type (often called just a type) consists of both a size
in bits and a representation, such as the 2's complement representation for signed integers, or the ASCII
representation for English text. A representation is a way of encoding the things being represented as a
set of bits, with each bit pattern corresponding to a unique object or thing.
A typical ISA supports a handful of data types in hardware in the sense that it provides hardware support for
operations on those data types. The arithmetic logic units (ALUs) in most modern processors, for example,
support addition and subtraction of both unsigned and 2's complement representations, with the specific
data type (such as 16- or 64-bit 2's complement) depending on the ISA. Data types and operations not
supported by the ISA must be handled in software using a small set of primitive operations, which form the
instructions available in the ISA. Instructions usually include data movement instructions such as loads
and stores and control instructions such as branches and subroutine calls in addition to arithmetic and logic
operations. The last quarter of our class covers these concepts in more detail and explores their meaning
using an example ISA from the textbook.
In class, we emphasized the idea that digital systems such as computers do not interpret the meaning of
bits. Rather, they do exactly what they have been designed to do, even if that design is meaningless. If,
for example, you store a sequence of ASCII characters in a computer's memory and then write computer
instructions to add consecutive groups of four characters as 2's complement integers and to print the result
to the screen, the computer will not complain about the fact that your code produces meaningless garbage.
In contrast, high-level languages typically require that a programmer associate a data type with each datum
in order to reduce the chance that the bits making up an individual datum are misused or misinterpreted
accidentally. Attempts to interpret a set of bits differently usually generate at least a warning message, since



such re-interpretations of the bits are rarely intentional and thus rarely correct. A compiler (a program
that transforms code written in a high-level language into instructions) can also generate the proper type
conversion instructions automatically when the transformations are intentional, as is often the case with
arithmetic.
Some high-level languages, such as Java, prevent programmers from changing the type of a given datum. If
you define a type that represents one of your favorite twenty colors, for example, you are not allowed to turn
a color into an integer, despite the fact that the color is represented as a handful of bits. Such languages are
said to be strongly typed.
The C language is not strongly typed, and programmers are free to interpret any bits in any manner they
see fit. Taking advantage of this ability in any but a few exceptional cases, however, results in arcane and
non-portable code, and is thus considered to be bad programming practice. We discuss conversion between
types in more detail later in these notes.
Each high-level language defines a number of primitive data types, which are always available. Most
languages, including C, also provide ways of defining new types in terms of primitive types, but we leave
that part of C for ECE 220. The primitive data types in C include signed and unsigned integers of various
sizes as well as single- and double-precision IEEE floating-point numbers.
The primitive integer types in C include both unsigned and 2's complement representations. These types
were originally defined so as to give reasonable performance when code was ported. In particular, the int
type is intended to be the native integer type for the target ISA. Using data types supported directly in
hardware is faster than using larger or smaller integer types. When C was standardized in 1989, these types
were defined so as to include a range of existing C compilers rather than requiring all compilers to produce
uniform results. At the time, most workstations and mainframes were 32-bit machines, while most personal
computers were 16-bit machines, thus flexibility was somewhat desirable. For the GCC compiler on Linux,
the C integer data types are defined in the following table. Although the int and long types are usually
the same, there is a semantic difference in common usage. In particular, on most architectures and most
compilers, a long has enough bits to identify a location in the computer's memory, while an int may not.
When in doubt, the size in bytes of any type or variable can be found using the built-in C function sizeof.

                    2's complement              unsigned
    8 bits          char                        unsigned char
   16 bits          short, short int            unsigned short, unsigned short int
   32 bits          int                         unsigned, unsigned int
   32 or 64 bits    long, long int              unsigned long, unsigned long int
   64 bits          long long, long long int    unsigned long long, unsigned long long int
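For example, the following line (an illustration of our own, using the printf function introduced later in
these notes) prints the sizes of two of the types in the table:

    printf ("int: %d bytes, long: %d bytes.\n", (int)sizeof (int), (int)sizeof (long));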
Over time, the flexibility of size in C types has become less important (except for the embedded markets,
where one often wants even more accurate bit-width control), and the fact that the size of an int can vary
from machine to machine and compiler to compiler has become more a source of headaches than a helpful
feature. In the late 1990s, a new set of fixed-size types were recommended for inclusion in the C library,
reflecting the fact that many companies had already developed and were using such definitions to make
their programs platform-independent. We encourage you to make use of these types, which are shown in
the following table. In Linux, they can be made available by including the stdint.h header file.

              2's complement    unsigned
    8 bits    int8_t            uint8_t
   16 bits    int16_t           uint16_t
   32 bits    int32_t           uint32_t
   64 bits    int64_t           uint64_t
Floating-point types in C include float and double, which correspond respectively to single- and double-
precision IEEE floating-point values. Although the 32-bit float type can save memory compared with use
of 64-bit double values, C's math library works with double-precision values, and single-precision data are
uncommon in scientific and engineering codes. In contrast, single-precision floating-point operations
dominated the graphics industry until recently, and are still well-supported even on today's graphics
processing units.


Variable Declarations
The function main executed by a program begins with a list of variable declarations. Each declaration
consists of two parts: a data type specification and a comma-separated list of variable names. Each variable
declared can also be initialized by assigning an initial value. A few examples appear below. Notice that
one can initialize a variable to have the same value as a second variable.
int    x = 42;       /* a 2's complement variable, initially equal to 42           */
int    y = x;        /* a second 2's complement variable, initially equal to x     */
int    z;            /* a third 2's complement variable with unknown initial value */
double a, b, c, pi = 3.1416;
                     /*
                      * four double-precision IEEE floating-point variables
                      * a, b, and c are initially of unknown value, while pi is
                      * initially 3.1416
                      */
What happens if a programmer declares a variable but does not initialize it? Remember that bits can only
be 0 or 1. An uninitialized variable does have a value, but its value is unpredictable. The compiler tries
to detect uses of uninitialized variables, but sometimes it fails to do so, so until you are more familiar with
programming, you should always initialize every variable.
Variable names, also called identifiers, can include both letters and digits in C. Good programming style
requires that programmers select variable names that are meaningful and are easy to distinguish from one
another. Single letters are acceptable in some situations, but longer names with meaning are likely to help
people (including you!) understand your program. Variable names are also case-sensitive in C, which allows programmers to use capitalization to differentiate behavior and meaning, if desired. Some programs,
for example, use identifiers with all capital letters to indicate variables with values that remain constant
for the program's entire execution. However, the fact that identifiers are case-sensitive also means that a
programmer can declare distinct variables named variable, Variable, vaRIable, vaRIabLe, and VARIABLE.
We strongly discourage you from doing so.

Expressions and Operators


The main function also contains a sequence of statements. A statement is a complete specification of a
single step in the program's execution. We explain the structure of statements in the next section.
Many statements in C include one or more expressions, which represent calculations such as arithmetic,
comparisons, and logic operations. Each expression is in turn composed of operators and operands. Here
we give only a brief introduction to some of the operators available in the C language. We deliberately omit
operators with more complicated meanings, as well as operators for which the original purpose was to make
writing common operations a little shorter. For the interested reader, both the textbook and ECE 220 give
more detailed introductions. The table below gives examples for the operators described here.

int i = 42, j = 1000;        /* i = 0x0000002A, j = 0x000003E8 */

   i + j               1042
   i - 4 * j          -3958
   -j                 -1000
   j / i                 23
   j % i                 34
   i & j                 40    /* 0x00000028 */
   i | j               1002    /* 0x000003EA */
   i ^ j                962    /* 0x000003C2 */
   ~i                   -43    /* 0xFFFFFFD5 */
   (~i) >> 2            -11    /* 0xFFFFFFF5 */
   ~((~i) >> 4)           2    /* 0x00000002 */
   j >> 4                62    /* 0x0000003E */
   j << 3              8000    /* 0x00001F40 */
   i > j                  0
   i <= j                 1
   i == j                 0
   j = i                 42    /* ...and j is changed! */


Arithmetic operators in C include addition (+), subtraction (-), negation (a minus sign not preceded by
another expression), multiplication (*), division (/), and modulus (%). No exponentiation operator exists;
instead, library routines are defined for this purpose as well as for a range of more complex mathematical
functions.
C also supports bitwise operations on integer types, including AND (&), OR (|), XOR (^), NOT (~), and
left (<<) and right (>>) bit shifts. Right shifting a signed integer results in an arithmetic right shift
(the sign bit is copied), while right shifting an unsigned integer results in a logical right shift (0 bits are
inserted).
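As a brief illustration of our own (the values here are not from the notes, and follow the same expression-and-result
style as the table above), the same bit pattern shifts differently depending on the declared type:

   int      s = -8;            /* bit pattern 0xFFFFFFF8                 */
   unsigned u = 0xFFFFFFF8;    /* the same bit pattern, but unsigned     */

   s >> 1      -4              /* arithmetic shift: 0xFFFFFFFC           */
   u >> 1      2147483644      /* logical shift:    0x7FFFFFFC           */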
A range of relational or comparison operators are available, including equality (==), inequality (!=), and
relative order (<, <=, >=, and >). All such operations evaluate to 1 to indicate a true relation and 0 to
indicate a false relation. Any non-zero value is considered to be true for the purposes of tests (for example,
in an if statement or a while loop) in C; these statements are explained later in these notes.
Assignment of a new value to a variable uses a single equal sign (=) in C. For example, the expression
A = B copies the value of variable B into variable A, overwriting the bits representing the previous value of A.
The use of two equal signs for an equality check and a single equal sign for assignment is a common source
of errors, although modern compilers generally detect and warn about this type of mistake. Assignment
in C does not solve equations, even simple equations. Writing A-4=B, for example, generates a compiler
error. You must solve such equations yourself to calculate the desired new value of a single variable, such
as A=B+4. For the purposes of our class, you must always write a single variable on the left side of an
assignment, and can write an arbitrary expression on the right side.
Many operators can be combined into a single expression. When an expression has more than one operator,
which operator is executed first? The answer depends on the operators precedence, a well-defined order on
operators that specifies how to resolve the ambiguity. In the case of arithmetic, the C languages precedence
specification matches the one that you learned in elementary school. For example, 1+2*3 evaluates to 7, not
to 9, because multiplication has precedence over addition. For non-arithmetic operators, or for any case in
which you do not know the precedence specification for a language, do not look it up; other programmers
will not remember the precedence ordering, either! Instead, add parentheses to make your expressions clear
and easy to understand.
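As a short example of our own (not from the notes), parentheses make the intended order obvious to any reader:

   int a = 1 + 2 * 3;          /* multiplication first: a is 7              */
   int b = (1 + 2) * 3;        /* parentheses force the addition: b is 9    */
   int c = (1000 >> 2) & 7;    /* shift first, then bitwise AND: c is 2     */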

Basic I/O
The main function returns an integer to the operating system. Although we do not discuss how additional
functions can be written in our class, we may sometimes make use of functions that have been written in
advance by making calls to those functions. A function call is a type of expression in C, but we leave further
description for ECE 220. In our class, we make use of only two additional functions to enable our programs
to receive input from a user via the keyboard and to write output to the monitor for a user to read.
Let's start with output. The printf function allows a program to print output to the monitor using a
programmer-specified format. The "f" in printf stands for "formatted."1 When we want to use printf, we
write an expression with the word printf followed by a parenthesized, comma-separated list of expressions.
The expressions in this list are called the arguments to the printf function.
The first argument to the printf function is a format string (a sequence of ASCII characters between
quotation marks) which tells the function what kind of information we want printed to the monitor as well
as how to format that information. The remaining arguments are C expressions that give printf a copy of
any values that we want printed.
How does the format string specify the format? Most of the characters in the format string are simply
printed to the monitor. In the first example shown below, we use printf to print a hello
message followed by an ASCII newline character to move to the next line on the monitor.

1 The original, unformatted variant of printing was never available in the C language. Go learn Fortran.




The percent sign, %, is used
as an escape character in the
printf function. When % appears in the format string, the
function examines the next character in the format string to determine which format to use, then
takes the next expression from the
sequence of arguments and prints
the value of that expression to the
monitor. Evaluating an expression
generates a bunch of bits, so it is
up to the programmer to ensure
that those bits are not misinterpreted. In other words, the programmer must make sure that the
number and types of formatted values match the number and types of
arguments passed to printf (not
counting the format string itself).
The printf function returns the
number of characters printed to the
monitor.


printf ("Hello, world!\n");

output: Hello, world! [and a newline]


printf ("To %x or not to %d...\n", 190, 380 / 2);

output: To be or not to 190... [and a newline]


printf ("My favorite number is %c%c.\n", 0x34, 0+2);

output: My favorite number is 42. [and a newline]


printf ("What is pi?

%f or %e?\n", 3.1416, 3.1416);

output: What is pi? 3.141600 or 3.141600e+00? [and a newline]


   escape sequence    printf function's interpretation of expression bits
   %c                 2's complement integer printed as an ASCII character
   %d                 2's complement integer printed as decimal
   %e                 double printed in decimal scientific notation
   %f                 double printed in decimal
   %u                 unsigned integer printed as decimal
   %x                 integer printed as hexadecimal (lower case)
   %X                 integer printed as hexadecimal (upper case)
   %%                 a single percent sign
A program can read input from the user with


the scanf function. The user enters characters in ASCII using the keyboard, and the
scanf function converts the user's input into
C primitive types, storing the results into variables. As with printf, the scanf function
takes a format string followed by a comma-separated list of arguments. Each argument
after the format string provides scanf with
the memory address of a variable into which
the function can store a result.
How does scanf use the format string? For
scanf, the format string is usually just a sequence of conversions, one for each variable to
be typed in by the user. As with printf, the
conversions start with % and are followed
by characters specifying the type of conversion to be performed. The first example shown
below reads two integers. The conversions in the format string can be separated by
spaces for readability, as shown in the example. The spaces are ignored by scanf. However, any non-space characters in the format
string must be typed exactly by the user!
The remaining arguments to scanf specify
memory addresses where the function can
store the converted values. The ampersand
(&) in front of each variable name in the
examples is an operator that returns the address of a variable in memory.
int      a, b;   /* example variables */
char     c;
unsigned u;
double   d;
float    f;

scanf ("%d%d", &a, &b);


scanf ("%d %d", &a, &b);

/* These have the */


/* same effect.
*/

effect: try to convert two integers typed in decimal to


2s complement and store the results in a and b
scanf ("%c%x %lf", &c, &u, &d);

effect: try to read an ASCII character into c, a value


typed in hexadecimal into u, and a doubleprecision floating-point number into d
scanf ("%lf %f", &d, &f);

effect: try to read two real numbers typed as decimal,


convert the first to double-precision and store it
in d, and convert the second to single-precision
and store it in f
   escape sequence    scanf function's conversion to bits
   %c                 store one ASCII character (as char)
   %d                 convert decimal integer to 2's complement
   %f                 convert decimal real number to float
   %lf                convert decimal real number to double
   %u                 convert decimal integer to unsigned int
   %x                 convert hexadecimal integer to unsigned int
   %X                 (as above)


For each conversion in the format string, the scanf function tries to convert input from the user into the appropriate
result, then stores the result in memory at the address given by the next argument. The programmer
is responsible for ensuring that the number of conversions in the format string matches the number of
arguments provided (not counting the format string itself). The programmer must also ensure that the type
of information produced by each conversion can be stored at the address passed for that conversion; in other
words, the address of a variable with the correct type must be provided. Modern compilers often detect
missing & operators and incorrect variable types, but many only give warnings to the programmer. The
scanf function itself cannot tell whether the arguments given to it are valid or not.
If a conversion fails (for example, if a user types "hello" when scanf expects an integer), scanf does not
overwrite the corresponding variable and immediately stops trying to convert input. The scanf function
returns the number of successful conversions, allowing a programmer to check for bad input from the user.
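For example, the following sketch of our own (the variable name value is hypothetical) uses the return value
to keep asking until the user types a valid integer:

    int value;

    printf ("Please type a number: ");
    while (1 != scanf ("%d", &value)) {
        scanf ("%*s");    /* the * suppresses storage, so the bad token is simply discarded */
        printf ("That was not a number.  Please try again: ");
    }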

Types of Statements in C
Each statement in a C program specifies a complete operation. There are three types of statements, but
two of these types can be constructed from additional statements, which can in turn be constructed from
additional statements. The C language specifies no bound on this type of recursive construction, but code
readability does impose a practical limit.
The three types are shown below. They are the null statement, simple statements, and compound
statements. A null statement is just a semicolon, and a compound statement is just a sequence of
statements surrounded by braces.

;                              /* a null statement (does nothing) */

A = B;                         /* examples of simple statements   */
printf ("Hello, world!\n");

{                              /* a compound statement            */
    C = D;                     /* (a sequence of statements       */
    N = 4;                     /* between braces)                 */
    L = D - N;
}

Simple statements can take several forms. All of the examples shown above, including the call to printf,
are simple statements consisting of a C expression followed by a semicolon. Simple statements can also
consist of conditionals or iterations, which we introduce next.
consist of conditionals or iterations, which we introduce next.

Remember that after variable declarations, the main function contains a sequence of statements. These
statements are executed one at a time in the order given in the program, as shown in the flow chart below
for two statements. We say that the statements are executed in sequential order.

[flow chart (sequential construction): first subtask, then second subtask]
A program must also be able to execute statements only when some condition holds. In the C language,
such a condition can be an arbitrary expression. The expression is first evaluated. If the result is 0, the
condition is considered to be false. Any result other than 0 is considered to be true. The C statement for
conditional execution is called an if statement. Syntactically, we put the expression for the condition in
parentheses after the keyword if and follow the parenthesized expression with a compound statement
containing the statements that should be executed when the condition is true. Optionally, we can append
the keyword else and a second compound statement containing statements to be executed when the
condition evaluates to false. The corresponding flow chart appears below.

[flow chart (conditional construction, if/else): ask "does some condition hold?"; if yes (Y), perform the
subtask for when the condition holds (then); otherwise perform the subtask for when the condition does
not hold (else)]

/* Set the variable y to the absolute value of variable x. */
if (0 <= x) {               /* Is x greater or equal to 0?             */
    y = x;                  /* Then block: assign x to y.              */
} else {
    y = -x;                 /* Else block: assign negative x to y.     */
}



If instead we chose to assign the absolute value of variable x to itself, we can do so without an else block:
/* Set the variable x to its absolute value. */
if (0 > x) {                /* Is x less than 0?                       */
    x = -x;                 /* Then block: assign negative x to x.     */
}
/* No else block is given--no work is needed. */

Finally, we sometimes need to repeatedly execute a set of statements, either a fixed


number of times or so long as some condition holds. We refer to such repetition
as an iteration or a loop. In our class, we make use of C's for loop when we
need to perform such a task. A for loop is structured as follows:
for ([initialization] ; [condition] ; [update]) {
[subtask to be repeated]
}

A flow chart corresponding to execution of a for loop appears below.


First, any initialization is performed. Then the condition (again an arbitrary C
expression) is checked. If the condition evaluates to false (exactly 0), the loop
is done. Otherwise, if the condition evaluates to true (any non-zero value), the
statements in the compound statement, the subtask or loop body, are executed.
The loop body can contain anything: a sequence of simple statements, a conditional, another loop, or even just an empty list. Once the loop body has finished
executing, the for loop update rule is executed. Execution then checks the condition again, and this process repeats until the condition evaluates to 0. The for
loop below, for example, prints the numbers from 1 to 42.
/* Print the numbers from 1 to 42. */
for (i = 1; 42 >= i; i = i + 1) {
    printf ("%d\n", i);
}

[flow chart (iterative construction, for): initialize for the first iteration; while the condition holds for
iterating (Y), perform the subtask for one iteration, then update for the next iteration; when the condition
no longer holds, the loop is done]

Program Execution
We are now ready to consider
the execution of a simple program, illustrating how variables change value from step
to step and determine program behavior.

Let's say that two numbers are friends if they have at least one 1 bit in common when written in base 2.
So, for example, 100 and 111 are friends because both numbers have a 1 in the bit with place value 2^2 = 4.
Similarly, 101 and 010 are not friends, since no bit position is 1 in both numbers. The program below prints
all friendships between numbers in the interval [0, 7].

int
main ()
{
    int check;     /* number to check for friends                   */
    int friend;    /* a second number to consider as check's friend */

    /* Consider values of check from 0 to 7. */
    for (check = 0; 8 > check; check = check + 1) {
        /* Consider values of friend from 0 to 7. */
        for (friend = 0; 8 > friend; friend = friend + 1) {
            /* Use bitwise AND to see if the two share a 1 bit. */
            if (0 != (check & friend)) {
                /* We have friendship! */
                printf ("%d and %d are friends.\n", check, friend);
            }
        }
    }
}



The program uses two integer variables, one for each of the numbers that we consider. We use a for loop
to iterate over all values of our first number, which we call check. The loop initializes check to 0, continues
until check reaches 8, and adds 1 to check after each loop iteration. We use a similar for loop to iterate
over all possible values of our second number, which we call friend. For each pair of numbers, we determine
whether they are friends using a bitwise AND operation. If the result is non-zero, they are friends, and we
print a message. If the two numbers are not friends, we do nothing, and the program moves on to consider
the next pair of numbers.
Now let's think about what happens when this program executes. When the program starts, both variables
are filled with random bits, so their values are unpredictable. The first step is the initialization of the first
for loop, which sets check to 0. The condition for that loop is 8 > check, which is true, so execution enters
the loop body and starts to execute the first statement, which is our second for loop. The next step is then
the initialization code for the second for loop, which sets friend to 0. The condition for the second loop is
8 > friend, which is true, so execution enters the loop body and starts to execute the first statement, which
is the if statement. Since both variables are 0, the if condition is false, and nothing is printed. Having
finished the loop body for the inner loop (on friend), execution continues with the update rule for that
loop (friend = friend + 1), then returns to check the loop's condition again. This process repeats, always
finding that the number 0 (in check) is not friends (0 has no friends!) until friend reaches 8, at which
point the inner loop condition becomes false. Execution then moves to the update rule for the first for loop,
which increments check. Check is then compared with 8 to see if the loop is done. Since it is not, we once
again enter the loop body and start the second for loop over. The initialization code again sets friend to 0,
and we move forward as before. As you see in the table below, the first time that we find our if condition
to be true is when both check and friend are equal to 1.

   after executing...                 check is...          and friend is...
   (variable declarations)            unpredictable bits   unpredictable bits
   check = 0                          0                    unpredictable bits
   8 > check                          0                    unpredictable bits
   friend = 0                         0                    0
   8 > friend                         0                    0
   if (0 != (check & friend))         0                    0
   friend = friend + 1                0                    1
   8 > friend                         0                    1
   if (0 != (check & friend))         0                    1
   friend = friend + 1                0                    2
       (repeat last three lines six more times; number 0 has no friends!)
   8 > friend                         0                    8
   check = check + 1                  1                    8
   8 > check                          1                    8
   friend = 0                         1                    0
   8 > friend                         1                    0
   if (0 != (check & friend))         1                    0
   friend = friend + 1                1                    1
   8 > friend                         1                    1
   if (0 != (check & friend))         1                    1
   printf ...                         1                    1
       (our first friend!?)
Is that result what you expected? To learn that the number 1 is friends with itself? If so, the program
works. If you assumed that numbers could not be friends with themselves, perhaps we should fix the bug?
We could, for example, add another if statement to avoid printing anything when check == friend.
Our program, you might also realize, prints each pair of friends twice. The numbers 1 and 3, for example,
are printed in both possible orders. To eliminate this redundancy, we can change the initialization in the
second for loop, either to friend = check or to friend = check + 1, depending on how we want to define
friendship (the same question as before: can a number be friends with itself?).
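For example, one possible revision of the inner loop (shown here as a sketch; only the initialization changes
from the program above) removes both the self-pairing and the duplicated output:

    /* Consider only values of friend greater than check. */
    for (friend = check + 1; 8 > friend; friend = friend + 1) {
        if (0 != (check & friend)) {
            printf ("%d and %d are friends.\n", check, friend);
        }
    }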


Compilation and Interpretation*


Many programming languages, including C, can be
compiled, which means that the program is converted
into instructions for a particular ISA before the program is run on a processor that supports that ISA. The
figure described below illustrates the compilation process
for the C language. In this type of figure, files and
other data are represented as cylinders, while rectangles represent processes, which are usually implemented in software. In that figure, the
outer dotted box represents the full compilation process that typically occurs when one compiles a C program. The inner dotted box represents the work performed by the compiler software itself. The cylinders
for data passed between the processes that compose
the full compilation process have been left out of the
figure; instead, we have written the type of data being passed next to the arrows that indicate the flow of
information from one process to the next.

[compilation-process figure: C source code and C header files feed the C preprocessor, which produces
preprocessed source code; the compiler (strict sense) performs source code analysis in a language-dependent
front end, hands an intermediate representation (IR) to an ISA-dependent back end for target code synthesis,
and emits assembly code; an assembler turns the assembly code into object code (instructions); a linker
combines that object code with other object files and libraries to produce the executable image; the outer
dotted box around all of these steps is the full compilation process]

The C preprocessor (described later in these notes) forms the first step in the compilation process. The
preprocessor operates on the program's source code along with header files that describe data types and
operations. The preprocessor merges these together into a single file of preprocessed source code. The
preprocessed source code is then analyzed by the front end of the compiler based on the specific programming
language being used (in our case, the C language), then converted by the back end of the compiler into
instructions for the desired ISA. The output of a compiler is not binary instructions, however, but is instead
a human-readable form of instructions called assembly code, which we cover in the last quarter of our
class. A tool called an assembler then converts these
human-readable instructions into bits that a processor can understand. If a program consists of multiple source files, or needs to make use of additional pre-programmed operations (such as math functions,
graphics, or sound), a tool called a linker merges the object code of the program with those additional
elements to form the final executable image for the program. The executable image is typically then
stored on a disk, from which it can later be read into memory in order to allow a processor to execute the
program.
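As a concrete illustration (one common toolchain; the file names are our own examples, and a single gcc
command normally performs all of the steps at once), the GCC compiler driver can expose each stage of
this process separately:

    gcc -E program.c -o program.i     # run only the C preprocessor
    gcc -S program.i -o program.s     # compile to assembly code
    gcc -c program.s -o program.o     # assemble into object code
    gcc program.o -o program          # link with libraries to form the executable image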
Some languages are difficult or even impossible to compile. Typically, the behavior of these languages depends on input data that are only available when the program runs. Such languages can be interpreted:
each step of the algorithm described by a program is executed by a software interpreter for the language.
Languages such as Java, Perl, and Python are usually interpreted. Similarly, when we use software to simulate one ISA using another ISA, as we do at the end of our class with the LC-3 ISA described by the textbook,
the simulator is a form of interpreter. In the lab, you will use a simulator compiled into and executing as
x86 instructions in order to interpret LC-3 instructions. While a program is executing in an interpreter,
enough information is sometimes available to compile part or all of the program to the processor's ISA as
the program runs, a technique known as just in time (JIT) compilation.


The C Preprocessor*
The C language uses a preprocessor to support inclusion of common information (stored in header files) into
multiple source files. The most frequent use of the preprocessor is to enable the unique definition of new
data types and operations within header files that can then be included by reference within source files that
make use of them. This capability is based on the include directive, #include, as shown here:
#include <stdio.h>
#include "my header.h"

/* search in standard directories


*/
/* search in current followed by standard directories */

The preprocessor also supports integration of compile-time constants into source files before compilation.
For example, many software systems allow the definition of a symbol such as NDEBUG (no debug) to compile
without additional debugging code included in the sources. Two directives are necessary for this purpose:
the define directive, #define, which provides a text-replacement facility, and conditional inclusion (or
exclusion) of parts of a file within #if/#else/#endif directives. These directives are also useful in allowing
a single header file to be included multiple times without causing problems, as C does not allow
redefinition of types, variables, and so forth, even if the redundant definitions are identical. Most
header files are thus wrapped as shown below.

#if !defined(MY_HEADER_H)
#define MY_HEADER_H
/* actual header file material goes here */
#endif /* MY_HEADER_H */
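As a further illustration (a small example of our own, not from the original notes), the same directives
support compile-time constants and optional debugging output:

#include <stdio.h>

#define ANSWER 42        /* text replacement: every later ANSWER becomes 42 */

int
main ()
{
#if !defined(NDEBUG)
    /* included only when NDEBUG has not been defined by the build */
    printf ("debugging output is enabled.\n");
#endif
    printf ("The answer is %d.\n", ANSWER);
    return 0;
}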
The preprocessor performs a simple linear pass on the source and does not parse or interpret any C syntax.
Definitions for text replacement are valid as soon as they are defined and are performed until they are
undefined or until the end of the original source file. The preprocessor does recognize spacing and will not
replace part of a word, thus "#define i 5" will not wreak havoc on your "if" statements, but will cause
problems if you name any variable "i".
Using the text replacement capabilities of the preprocessor does have drawbacks, most importantly in that
almost none of the information is passed on for debugging purposes.

Changing Types in C*
Changing the type of a datum is necessary from time to time, but sometimes a compiler can do the work
for you. The most common form of implicit type conversion occurs with binary arithmetic operations.
Integer arithmetic in C always uses types of at least the size of int, and all floating-point arithmetic uses
double. If either or both operands have smaller integer types, or differ from one another, the compiler
implicitly converts them before performing the operation, and the type of the result may be different from
those of both operands. In general, the compiler selects the final type according to some preferred ordering
in which floating-point is preferred over integers, unsigned values are preferred over signed values, and more
bits are preferred over fewer bits. The type of the result must be at least as large as either argument, but is
also at least as large as an int for integer operations and a double for floating-point operations.
Modern C compilers always extend an integer type's bit width before converting from signed to unsigned.
The original C specification interleaved bit width extensions to int with sign changes, thus older compilers
may not be consistent, and expressions that implicitly require both types of conversion in a single operation
may lead to portability bugs.
The implicit extension to int can also be confusing in the sense that arithmetic that seems to work on
smaller integers fails with larger ones. For example, multiplying two 16-bit integers set to 1000 and printing
the result works with most compilers because the 32-bit int result is wide enough to hold the right answer.
In contrast, multiplying two 32-bit integers set to 100,000 produces the wrong result because the high bits
of the result are discarded before it can be converted to a larger type. For this operation to produce the
correct result, one of the integers must be converted explicitly (as discussed later) before the multiplication.
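For example (a sketch of our own using the fixed-size types mentioned earlier, which require the stdint.h
header file):

    int32_t a = 100000;
    int32_t b = 100000;
    int64_t wrong = a * b;             /* the multiply happens in 32 bits, losing high bits       */
    int64_t right = (int64_t)a * b;    /* convert one operand first, so the multiply uses 64 bits */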


Implicit type conversions also occur due to assignments. Unlike arithmetic conversions, the final type must
match the left-hand side of the assignment (for example, a variable to which a result is assigned), and the
compiler simply performs any necessary conversion. Since the desired type may be smaller than the type of
the value assigned, information can be lost. Floating-point values are truncated when assigned to integers,
and high bits of wider integer types are discarded when assigned to narrower integer types. Note that a
positive number may become a negative number when bits are discarded in this manner.
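For example (values of our own choosing; the exact behavior of char depends on whether the compiler
treats it as signed):

    int  i = 3.75;    /* the fractional part is truncated: i holds 3            */
    char c = 200;     /* the high bits are discarded: with a signed 8-bit char, */
                      /* the pattern 11001000 represents -56                    */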
Passing arguments to functions can be viewed as a special case of assignment. Given a function prototype,
the compiler knows the type of each argument and can perform conversions as part of the code generated
to pass the arguments to the function. Without such a prototype, or for functions with variable numbers of
arguments, the compiler lacks type information and thus cannot perform necessary conversions, leading to
unpredictable behavior. By default, however, the compiler extends any integer smaller than an int to the
width of an int and converts float to double.
Occasionally it is convenient to use an explicit type cast to force conversion from one type to another.
Such casts must be used with caution, as they silence many of the warnings that a compiler might otherwise
generate when it detects potential problems. One common use is to promote integers to floating-point before
an arithmetic operation, as shown below.

int
main ()
{
    int numerator = 10;
    int denominator = 20;

    printf ("%f\n", numerator / (double)denominator);
    return 0;
}
The type to which a value is to be converted is placed in parentheses in front of the value. In most cases,
additional parentheses should be used to avoid confusion about the precedence of type conversion over other
operations.



ECE120: Introduction to Computer Engineering


Notes Set 1.6
Summary of Part 1 of the Course
This short summary provides a list of both terms that we expect you to know and skills that we expect
you to have after our first few weeks together. The first part of the course is shorter than the other three
parts, so the amount of material is necessarily less. These notes supplement the Patt and Patel textbook,
so you will also need to read and understand the relevant chapters (see the syllabus) in order to master this
material completely.
According to educational theory, the difficulty of learning depends on the type of task involved. Remembering
new terminology is relatively easy, while applying the ideas underlying design decisions shown by example
to new problems posed as human tasks is relatively hard. In this short summary, we give you lists at several
levels of difficulty of what we expect you to be able to do as a result of the last few weeks of studying
(reading, listening, doing homework, discussing your understanding with your classmates, and so forth).
This time, we'll list the skills first and leave the easy stuff for the next page. We expect you to be able to
exercise the following skills:
- Represent decimal numbers with unsigned, 2's complement, and IEEE floating-point representations,
  and be able to calculate the decimal value represented by a bit pattern in any of these representations.
- Be able to negate a number represented in the 2's complement representation.
- Perform simple arithmetic by hand on unsigned and 2's complement representations, and identify when
  overflow occurs.
- Be able to write a truth table for a Boolean expression.
- Be able to write a Boolean expression as a sum of minterms.
- Know how to declare and initialize C variables with one of the primitive data types.
At a more abstract level, we expect you to be able to:
- Understand the value of using a common mathematical basis, such as modular arithmetic, in defining
  multiple representations (such as unsigned and 2's complement).
- Write Boolean expressions for the overflow conditions on both unsigned and 2's complement addition.
- Be able to write single if statements and for loops in C in order to perform computation.
- Be able to use scanf and printf for basic input and output in C.
And, at the highest level, we expect that you will be able to reason about and analyze problems in the
following ways:
- Understand the tradeoffs between integer and floating-point representations for numbers.
- Understand logical completeness and be able to prove or disprove logical completeness for sets of logic
  functions.
- Understand the properties necessary in a representation.
- Analyze a simple, single-function C program and be able to explain its purpose.


You should recognize all of these terms and be able to explain what they mean. Note that we are not
saying that you should, for example, be able to write down the ASCII representation from memory. In that
example, knowing that it is a 7-bit representation used for English text is sufficient. You can always look up
the detailed definition in practice.
universal computational devices /
computing machines
undecidable
the halting problem
information storage in computers
bits
representation
data type
unsigned representation
2s complement representation
IEEE floating-point representation
ASCII representation
operations on bits
1s complement operation
carry (from addition)
overflow (on any operation)
Boolean logic and algebra
logic functions/gates
truth table
AND/conjunction
OR/disjunction
NOT/logical complement/
(logical) negation/inverter
XOR
logical completeness
minterm
mathematical terms
modular arithmetic
implication
contrapositive
proof approaches: by construction,
by contradiction, by induction
without loss of generality (w.l.o.g.)

high-level language concepts


syntax
variables
declaration
primitive data types
symbolic name/identifier
initialization
expression
statement
C operators
operands
arithmetic
bitwise
comparison/relational
assignment
address
arithmetic shift
logical shift
precedence
functions in C
main
function call
arguments
printf and scanf
format string
escape character
sizeof
transforming tasks into programs
flow chart
sequential construction
conditional construction
iterative construction/iteration/loop
loop body
C statements
statement:
null, simple, compound
if statement
for loop
return statement



ECE120: Introduction to Computer Engineering


Notes Set 2.1
Optimizing Logic Expressions
The second part of the course covers digital design more deeply than does the textbook. The lecture notes
will explain the additional material, and we will provide further examples in lectures and in discussion
sections. Please let us know if you need further material for study.
In the last notes, we introduced Boolean logic operations and showed that with AND, OR, and NOT, we
can express any Boolean function on any number of variables. Before you begin these notes, please read the
first two sections in Chapter 3 of the textbook, which discuss the operation of complementary metal-oxide
semiconductor (CMOS) transistors, illustrate how gates implementing the AND, OR, and NOT
operations can be built using transistors, and introduce DeMorgan's laws.
This set of notes exposes you to a mix of techniques, terminology, tools, and philosophy. Some of the material
is not critical to our class (and will not be tested), but is useful for your broader education, and may help
you in later classes. The value of this material has changed substantially in the last couple of decades, and
particularly in the last few years, as algorithms for tools that help with hardware design have undergone
rapid advances. We talk about these issues as we introduce the ideas.
The notes begin with a discussion of the best way to express a Boolean function and some techniques
used historically to evaluate such decisions. We next introduce the terminology necessary to understand
manipulation of expressions, and use these terms to explain the Karnaugh map, or K-map, a tool that we
will use for many purposes this semester. We illustrate the use of K-maps with a couple of examples, then
touch on a few important questions and useful ways of thinking about Boolean logic. We conclude with a
discussion of the general problem of multi-metric optimization, introducing some ideas and approaches of
general use to engineers.

Defining Optimality
In the notes on logic operations, you learned how to express an arbitrary function on bits as an OR of
minterms (ANDs with one input per variable on which the function operates). Although this approach
demonstrates logical completeness, the results often seem inefficient, as you can see by comparing the
following expressions for the carry out C from the addition of two 2-bit unsigned numbers, A = A1 A0
and B = B1 B0 .
C  =  A1 B1 + (A1 + B1) A0 B0                                          (1)

   =  A1 B1 + A1 A0 B0 + A0 B1 B0                                      (2)

   =  A1' A0 B1 B0 + A1 A0' B1 B0' + A1 A0' B1 B0 +
      A1 A0 B1' B0 + A1 A0 B1 B0' + A1 A0 B1 B0                        (3)

(Here a prime, as in A1', denotes the complement of the variable A1.)

These three expressions are identical in the sense that they have the same truth tables; they are the same
mathematical function. Equation (1) is the form that we gave when we introduced the idea of using logic
to calculate overflow. In this form, we were able to explain the terms intuitively. Equation (2) results from
distributing the parenthesized OR in Equation (1). Equation (3) is the result of our logical completeness
construction.
Since the functions are identical, does the form actually matter at all? Certainly either of the first two
forms is easier for us to write than is the third. If we think of the form of an expression as a mapping from
the function that we are trying to calculate into the AND, OR, and NOT functions that we use as logical
building blocks, we might also say that the first two versions use fewer building blocks. That observation
does have some truth, but let's try to be more precise by framing a question. For any given function, there
are an infinite number of ways that we can express the function (for example, given one variable A on which
the function depends, you can OR together any number of copies of A A' without changing the function).
What exactly makes one expression better than another?



In 1952, Edward Veitch wrote an article on simplifying truth functions. In the introduction, he said, "This
general problem can be very complicated and difficult. Not only does the complexity increase greatly with
the number of inputs and outputs, but the criteria of the best circuit will vary with the equipment involved."
Sixty years later, the answer is largely the same: the criteria depend strongly on the underlying technology
(the gates and the devices used to construct the gates), and no single metric, or way of measuring, is
sufficient to capture the important differences between expressions in all cases.
Three high-level metrics commonly used to evaluate chip designs are cost, power, and performance. Cost
usually represents the manufacturing cost, which is closely related to the physical silicon area required for the
design: the larger the chip, the more expensive the chip is to produce. Power measures energy consumption
over time. A chip that consumes more power means that a user's energy bill is higher and, in a portable
device, either that the device is heavier or has a shorter battery life. Performance measures the speed at
which the design operates. A faster design can offer more functionality, such as supporting the latest games,
or can just finish the same work in less time than a slower design. These metrics are sometimes related: if
a chip finishes its work, the chip can turn itself off, saving energy.
How do such high-level metrics relate to the problem at hand? Only indirectly in practice. There are
too many factors involved to make direct calculations of cost, power, or performance at the level of logic
expressions. Finding an optimal solutionthe best formulation of a specific logic function for a given
metricis often impossible using the computational resources and algorithms available to us. Instead, tools
typically use heuristic approaches to find solutions that strike a balance between these metrics. A heuristic
approach is one that is believed to yield fairly good solutions to a problem, but does not necessarily find an
optimal solution. A human engineer can typically impose constraints, such as limits on the chip area or
limits on the minimum performance, in order to guide the process. Human engineers may also restructure
the implementation of a larger design, such as a design to perform floating-point arithmetic, so as to change
the logic functions used in the design.
Today, manipulation of logic expressions for the purposes of optimization is performed almost entirely by
computers. Humans must supply the logic functions of interest, and must program the acceptable transformations between equivalent forms, but computers do the grunt work of comparing alternative formulations
and deciding which one is best to use in context.
Although we believe that hand optimization of Boolean expressions is no longer an important skill for
our graduates, we do think that you should be exposed to the ideas and metrics historically used for such
optimization. The rationale for retaining this exposure is threefold. First, we believe that you still need to be
able to perform basic logic reformulations (slowly is acceptable) and logical equivalence checking (answering
the question, "Do two expressions represent the same function?"). Second, the complexity of the problem is
a good way to introduce you to real engineering. Finally, the contextual information will help you to develop
a better understanding of finite state machines and higher-level abstractions that form the core of digital
systems and are still defined directly by humans today.
Towards that end, we conclude this introduction by discussing two metrics that engineers traditionally used
to optimize logic expressions. These metrics are now embedded in computer-aided design (CAD) tools
and tuned to specific underlying technologies, but the reasons for their use are still interesting.
The first metric of interest is a heuristic for the area needed for a design. The measurement is simple: count
the number of variable occurrences in an expression. Simply go through and add up how many variables
you see. Using our example function C, Equation (1) gives a count of 6, Equation (2) gives a count of 8,
and Equation (3) gives a count of 24. Smaller numbers represent better expressions, so Equation (1) is
the best choice by this metric. Why is this metric interesting? Recall how gates are built from transistors.
An N -input gate requires roughly 2N transistors, so if you count up the number of variables in the expression,
you get an estimate of the number of transistors needed, which is in turn an estimate for the area required
for the design.
A variation on variable counting is to add the number of operations, since each gate also takes space for
wiring (within as well as between gates). Note that we ignore the number of inputs to the operations, so
a 2-input AND counts as 1, but a 10-input AND also counts as 1. We do not usually count complementing



variables as an operation for this metric because the complements of variables are sometimes available at
no extra cost in gates or wires. If we add the number of operations in our example, we get a count of 10
for Equation (1) (two ANDs, two ORs, and 6 variables), a count of 12 for Equation (2) (three ANDs, one
OR, and 8 variables), and a count of 31 for Equation (3) (six ANDs, one OR, and 24 variables). The relative
differences between these equations are reduced when one counts operations.
A second metric of interest is a heuristic for the performance of a design. Performance is inversely related
to the delay necessary for a design to produce an output once its inputs are available. For example, if you
know how many seconds it takes to produce a result, you can easily calculate the number of results that
can be produced per second, which measures performance. The measurement needed is the longest chain of
operations performed on any instance of a variable. The complement of a variable is included if the variable's
complement is not available without using an inverter. The rationale for this metric is that gate outputs do
not change instantaneously when their inputs change. Once an input to a gate has reached an appropriate
voltage to represent a 0 or a 1, the transistors in the gate switch (on or off) and electrons start to move.
Only when the output of the gate reaches the appropriate new voltage can the gates driven by the output
start to change. If we count each function/gate as one delay (we call this time a gate delay), we get an
estimate of the time needed to compute the function. Referring again to our example equations, we find
that Equation (1) requires 3 gate delays, Equation (2) requires 2 gate delays, Equation (3) requires 2 or 3
gate delays, depending on whether we have variable complements available. Now Equation (2) looks more
attractive: better performance than Equation (1) in return for a small extra cost in area.
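Although these notes do not otherwise use software, the two heuristics are easy to express in a few lines of code. The Python sketch below is only an illustration of the counting rules; the tuple-based expression encoding and the function names are our own choices for this example, not part of any real CAD tool.

    # Estimate area by counting variable occurrences (optionally adding one per
    # operation), and estimate delay by counting the longest chain of operations.
    # An expression is either a string naming a literal (such as 'A' or "B'") or a
    # tuple ('AND', ...) / ('OR', ...) whose remaining entries are subexpressions.

    def count_literals(expr):
        if isinstance(expr, str):                      # a literal
            return 1
        return sum(count_literals(e) for e in expr[1:])

    def count_operations(expr):
        if isinstance(expr, str):
            return 0
        # each AND or OR counts once, no matter how many inputs it has
        return 1 + sum(count_operations(e) for e in expr[1:])

    def gate_delays(expr):
        if isinstance(expr, str):
            return 0
        return 1 + max(gate_delays(e) for e in expr[1:])

    # Example: (A AND B) OR (C AND D), built from two ANDs feeding one OR.
    example = ('OR', ('AND', 'A', 'B'), ('AND', 'C', 'D'))
    print(count_literals(example))                                # 4 variable occurrences
    print(count_literals(example) + count_operations(example))   # 7 counting operations too
    print(gate_delays(example))                                   # 2 gate delays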
Heuristics for estimating energy use are too complex to introduce at this point, but you should be aware that
every time electrons move, they generate heat, so we might favor an expression that minimizes the number
of bit transitions inside the computation. Such a measurement is not easy to calculate by hand, since you
need to know the likelihood of input combinations.

Terminology
We use many technical terms when we talk about simplification of logic expressions, so we now introduce
those terms so as to make the description of the tools and processes easier to understand.
Let's assume that we have a logic function F(A, B, C, D) that we want to express concisely. A literal in an
expression of F refers to either one of the variables or its complement. In other words, for our function F,
the following is a complete set of literals: A, A', B, B', C, C', D, and D'.
When we introduced the AND and OR functions, we also introduced notation borrowed from arithmetic,
using multiplication to represent AND and addition to represent OR. We also borrow the related terminology,
so a sum in Boolean algebra refers to a number of terms ORed together (for example, A + B, or AB + CD),
and a product in Boolean algebra refers to a number of terms ANDed together (for example, AD, or
AB(C + D)). Note that the terms in a sum or product may themselves be sums, products, or other types of
expressions (for example, A ⊕ B).
The construction method that we used to demonstrate logical completeness made use of minterms for each
input combination for which the function F produces a 1. We can now use the idea of a literal to give
a simpler definition of minterm: a minterm for a function on N variables is a product (AND function)
of N literals in which each variable or its complement appears exactly once. For our function F , examples
of minterms include ABCD, A'BC'D, and AB'CD'. As you know, a minterm produces a 1 for exactly one
combination of inputs.
When we sum minterms for each output value of 1 in a truth table to express a function, as we did to obtain
Equation (3), we produce an example of the sum-of-products form. In particular, a sum-of-products (SOP)
is a sum composed of products of literals. Terms in a sum-of-products need not be minterms, however.
Equation (2) is also in sum-of-products form. Equation (1), however, is not, since the last term in the sum
is not a product of literals.
Analogously to the idea of a minterm, we define a maxterm for a function on N variables as a sum (OR
function) of N literals in which each variable or its complement appears exactly once. Examples for F

include (A + B + C + D), (A' + B + C' + D), and (A + B' + C + D'). A maxterm produces a 0 for exactly one
combination of inputs. Just as we did with minterms, we can multiply a maxterm corresponding to each
input combination for which a function produces 0 (each row in a truth table that produces a 0 output)
to create an expression for the function. The resulting expression is in a product-of-sums (POS) form:
a product of sums of literals. The carry out function that we used to produce Equation (3) has 10 input
combinations that produce 0, so the expression formed in this way is unpleasantly long:
C = (A1 + A0 + B1 + B0)(A1 + A0 + B1 + B0')(A1 + A0 + B1' + B0)(A1 + A0 + B1' + B0')
    (A1 + A0' + B1 + B0)(A1 + A0' + B1 + B0')(A1 + A0' + B1' + B0)(A1' + A0 + B1 + B0)
    (A1' + A0 + B1 + B0')(A1' + A0' + B1 + B0)

However, the approach can be helpful with functions that produce mostly 1s. The literals in maxterms are
complemented with respect to the literals used in minterms. For example, the maxterm (A1' + A0' + B1 + B0)
in the equation above produces a zero for the input combination A1 = 1, A0 = 1, B1 = 0, B0 = 0.
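To see how minterms and maxterms relate to a truth table in a mechanical way, here is a short Python sketch. It is purely illustrative: the truth table encoding (a list of output values in row order) and the prime (') notation for complements are assumptions made for this example.

    # Given a function on N variables as a list of outputs (row 0 corresponds to
    # all variables equal to 0, row 2**N - 1 to all variables equal to 1), list
    # the minterm for every 1 and the maxterm for every 0.

    def minterms_and_maxterms(outputs, names):
        n = len(names)
        minterms, maxterms = [], []
        for row, value in enumerate(outputs):
            bits = [(row >> (n - 1 - i)) & 1 for i in range(n)]
            if value == 1:
                # minterm: complement each variable whose bit is 0
                minterms.append(' '.join(v if b else v + "'"
                                         for v, b in zip(names, bits)))
            else:
                # maxterm: complement each variable whose bit is 1
                maxterms.append('(' + ' + '.join(v + "'" if b else v
                                                 for v, b in zip(names, bits)) + ')')
        return minterms, maxterms

    # Example: the two-input AND function, which produces 1 only in its last row.
    mins, maxs = minterms_and_maxterms([0, 0, 0, 1], ['A', 'B'])
    print(mins)   # ['A B']
    print(maxs)   # ['(A + B)', "(A + B')", "(A' + B)"]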
An implicant G of a function F is defined to be a second function operating on the same variables for which
the implication G ⇒ F is true. In terms of logic functions that produce 0s and 1s, if G is an implicant of F,
the input combinations for which G produces 1s are a subset of the input combinations for which F produces
1s. Any minterm for which F produces a 1, for example, is an implicant of F .
In the context of logic design, the term implicant is used to refer to a single product of literals. In other
words, if we have a function F (A, B, C, D), examples of possible implicants of F include AB, BC, ABCD,
and A. In contrast, although they may technically imply F , we typically do not call expressions such as
(A + B), C(A + D), or AB + C implicants.
Lets say that we have expressed function F in sum-of-products form. All of the individual product terms
in the expression are implicants of F . As a first step in simplification, we can ask: for each implicant, is it
possible to remove any of the literals that make up the product? If we have an implicant G for which the
answer is no, we call G a prime implicant of F . In other words, if one removes any of the literals from a
prime implicant G of F , the resulting product is not an implicant of F .
Prime implicants are the main idea that we use to simplify logic expressions, both algebraically and with
graphical tools (computer tools use algebra internally; by graphical here we mean drawings on paper).
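Because the definitions above are stated in terms of the 1s of F, they can be checked by brute force for small functions. The following Python sketch is an illustration only; the representation of F as a set of 1-producing rows and of a product of literals as a dictionary are assumptions of the example, not standard tool interfaces.

    # Brute-force checks of the implicant definitions for small functions. F is
    # represented by the set of input rows (tuples of bits, one per variable in
    # 'names') for which F produces 1. A product of literals is a dictionary
    # mapping a variable name to the value it must take (1 for the variable
    # itself, 0 for its complement).

    from itertools import product as all_rows

    def product_ones(term, names):
        # rows on which the product of literals evaluates to 1
        return {r for r in all_rows((0, 1), repeat=len(names))
                if all(r[names.index(v)] == val for v, val in term.items())}

    def is_implicant(term, ones, names):
        return product_ones(term, names) <= ones       # subset of F's 1s

    def is_prime_implicant(term, ones, names):
        if not is_implicant(term, ones, names):
            return False
        # removing any single literal must destroy the implication
        return all(not is_implicant({v: x for v, x in term.items() if v != drop},
                                    ones, names)
                   for drop in term)

    # Example: F(A, B) = 1 exactly when A = 1 (so F = A).
    names = ['A', 'B']
    ones = {(1, 0), (1, 1)}
    print(is_implicant({'A': 1, 'B': 1}, ones, names))         # True
    print(is_prime_implicant({'A': 1, 'B': 1}, ones, names))   # False (AB is not prime)
    print(is_prime_implicant({'A': 1}, ones, names))           # True  (A is prime)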

Veitch Charts and Karnaugh Maps


Veitch's 1952 paper was the first to introduce the idea of using a graphical representation to simplify logic
expressions. Earlier approaches were algebraic. A year later, Maurice Karnaugh published a paper showing
a similar idea with a twist. The twist makes the use of Karnaugh maps to simplify expressions much easier
than the use of Veitch charts. As a result, few engineers have heard of Veitch, but everyone who has ever
taken a class on digital logic knows how to make use of a K-map.
Before we introduce the Karnaugh map, let's think about the structure of the domain of a logic function.
Recall that a function's domain is the space on which the function is defined, that is, for which the function
produces values. For a Boolean logic function on N variables, you can think of the domain as sequences
of N bits, but you can also visualize the domain as an N-dimensional hypercube. An N-dimensional
hypercube is the generalization of a cube to N dimensions. Some people only use the term hypercube
when N ≥ 4, since we have other names for the smaller values: a point for N = 0, a line segment for N = 1,
a square for N = 2, and a cube for N = 3. The diagrams above and to the right illustrate the cases that are
easily drawn on paper. The black dots represent specific input combinations, and the blue edges connect
input combinations that differ in exactly one input value (one bit).

[Figure: hypercubes for N = 1 (corners A=0 and A=1), N = 2 (corners AB=00, 01, 11, 10), and N = 3 (corners ABC=000 through ABC=111), with edges joining combinations that differ in one bit.]
By viewing a function's domain in this way, we can make a connection between a product of literals and
the structure of the domain. Let's use the 3-dimensional version as an example. We call the variables A, B,
and C, and note that the cube has 2^3 = 8 corners corresponding to the 2^3 possible combinations of A, B,
and C. The simplest product of literals in this case is 1, which is the product of 0 literals. Obviously, the
product 1 evaluates to 1 for any variable values. We can thus think of it as covering the entire domain of
the function. In the case of our example, the product 1 covers the whole cube. In order for the product 1 to
be an implicant of a function, the function itself must be the function 1.
What about a product consisting of a single literal, such as A or C? The dividing lines in the diagram
illustrate the answer: any such product term evaluates to 1 on a face of the cube, which includes 2^2 = 4 of
the corners. If a function evaluates to 1 on any of the six faces of the cube, the corresponding product term
(consisting of a single literal) is an implicant of the function.
Continuing with products of two literals, we see that any product of two literals, such as AB or BC,
corresponds to an edge of our 3-dimensional cube. The edge includes 2^1 = 2 corners. And, if a function
evaluates to 1 on any of the 12 edges of the cube, the corresponding product term (consisting of two literals)
is an implicant of the function.
Finally, any product of three literals, such as ABC, corresponds to a corner of the cube. But for a function
on three variables, these are just the minterms. As you know, if a function evaluates to 1 on any of the 8
corners of the cube, that minterm is an implicant of the function (we used this idea to construct the function
to prove logical completeness).
How do these connections help us to simplify functions? If we're careful, we can map cubes onto paper in
such a way that product terms (the possible implicants of the function) usually form contiguous groups of 1s,
allowing us to spot them easily. Let's work upwards starting from one variable to see how this idea works.
The end result is called a Karnaugh map.
The first drawing shown to the right replicates our view of the 1-dimensional hypercube, corresponding to
the domain of a function on one variable, in this case the variable A. To the right of the hypercube (line
segment) are two variants of a Karnaugh map on one variable. The middle variant clearly indicates the
column corresponding to the product A (the other column corresponds to A'). The right variant simply
labels the column with values for A.

[Figure: the 1-dimensional hypercube (corners A=0 and A=1) and two one-variable K-map variants.]

The three drawings shown to the right illustrate the three possible product terms on one variable. The
functions shown in these Karnaugh maps are arbitrary, except that we have chosen them such that each
implicant shown is a prime implicant for the illustrated function.

[Figure: one-variable K-maps showing the implicants 1, A, and A'.]

Let's now look at two-variable functions. We have replicated our drawing of the 2-dimensional hypercube
(square) to the right along with two variants of Karnaugh maps on two variables. With only two
variables (A and B), the extension is fairly straightforward, since we can use the second dimension of the
paper (vertical) to express the second variable (B).

[Figure: the 2-dimensional hypercube (corners AB=00, 01, 11, 10) and two two-variable K-map variants.]
The number of possible products of literals grows rapidly with the number of variables. For two variables,
nine are possible, as shown to the right. Notice that all implicants have two properties. First, they occupy
contiguous regions of the grid. And, second, their height and width are always powers of two. These
properties seem somewhat trivial at this stage, but they are the key to the utility of K-maps on more variables.

[Figure: nine two-variable K-maps, one for each possible product of literals: 1, A, A', B, B', AB, AB', A'B, and A'B'.]
Three-variable functions are next. The cube diagram is again replicated to the right. But now we have a
problem: how can we map four points (say, from the top half of the cube) into a line in such a way that any
points connected by a blue line are adjacent in the K-map? The answer is that we cannot, but we can preserve
most of the connections by choosing an order such as the one illustrated by the arrow. The result
is called a Gray code. Two K-map variants again appear to the right of the cube. Look closely at the order
of the two-variable combinations along the top, which allows us to have as many contiguous products of
literals as possible. Any product of literals that contains C' but neither A nor A' wraps around the edges of the
K-map, so you should think of it as rolling up into a cylinder rather than a grid. Or you can think that
we're unfolding the cube to fit the corners onto a sheet of paper, but the place that we split the cube should
still be considered to be adjacent when looking for implicants. The use of a Gray code is the one difference
between a K-map and a Veitch chart; Veitch used the base 2 order, which makes some implicants hard to
spot.

[Figure: the 3-dimensional hypercube (corners ABC=000 through ABC=111) and two three-variable K-map variants, with the two-variable combinations along the top in Gray-code order 00, 01, 11, 10 and the remaining variable labeling the rows.]

With three variables, we have 27 possible products of literals. You may have noticed that the count scales
as 3^N for N variables; can you explain why? We illustrate several product terms below. Note that we
sometimes need to wrap around the end of the K-map, but that if we account for wrapping, the squares
covered by all product terms are contiguous. Also notice that both the width and the height of all product
terms are powers of two. Any square or rectangle that meets these two constraints corresponds to a product term! And any such square or rectangle that is filled with 1s is an implicant of the function in the K-map.
[Figure: five three-variable K-maps illustrating example product terms (implicants) of zero, one, two, and three literals; some of the terms wrap around the edges of the map.]

Let's keep going. With a function on four variables (A, B, C, and D) we can use a Gray code order on two
of the variables in each dimension. Which variables go with which dimension in the grid really doesn't matter,
so we'll assign AB to the horizontal dimension and CD to the vertical dimension. A few of the 81 possible
product terms are illustrated at the top of the next page. Notice that while wrapping can now occur in both
dimensions, we have exactly the same rule for finding implicants of the function: any square or rectangle (allowing for wrapping) that is filled with 1s and has both height and width equal to (possibly different) powers
of two is an implicant of the function. Furthermore, unless such a square or rectangle is part of a larger
square or rectangle that meets these criteria, the corresponding implicant is a prime implicant of the function.

[Figure: six four-variable K-maps (AB across the top, CD down the side, both in Gray-code order) illustrating example implicants of zero, one, two, three, and four literals; some wrap around the edges in one or both dimensions.]

Finding a simple expression for a function using a K-map then consists of solving the following problem:
pick a minimal set of prime implicants such that every 1 produced by the function is covered by at least one
prime implicant. The metric that you choose to minimize the set may vary in practice, but for simplicity,
let's say that we minimize the number of prime implicants chosen.
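For small functions, this covering problem can be solved directly by exhaustive search. The Python sketch below is an illustration, not how real CAD tools work: it enumerates all products of literals, keeps those that are prime implicants, and then tries increasingly larger sets of them until every 1 of the function is covered.

    # Exhaustive-search version of the covering problem for small functions.
    # The function is given by the set of input rows (tuples of bits) producing 1.

    from itertools import product as all_choices, combinations

    def rows(n):
        return list(all_choices((0, 1), repeat=n))

    def term_ones(term, n):
        # 'term' maps a variable position to the bit value it requires
        return frozenset(r for r in rows(n)
                         if all(r[i] == v for i, v in term.items()))

    def prime_implicants(ones, n):
        # every product of literals chooses 0, 1, or "absent" for each variable
        terms = []
        for choice in all_choices((0, 1, None), repeat=n):
            term = {i: v for i, v in enumerate(choice) if v is not None}
            if term_ones(term, n) <= ones:
                terms.append(term)
        def still_implicant_after_dropping(term):
            return any(term_ones({i: v for i, v in term.items() if i != d}, n) <= ones
                       for d in term)
        return [t for t in terms if not still_implicant_after_dropping(t)]

    def minimal_cover(ones, n):
        if not ones:
            return ()
        pis = prime_implicants(ones, n)
        for k in range(1, len(pis) + 1):
            for subset in combinations(pis, k):
                if set().union(*(term_ones(t, n) for t in subset)) >= ones:
                    return subset
        return ()

    # Example: the majority function of three inputs (1 when at least two inputs are 1).
    ones = {r for r in rows(3) if sum(r) >= 2}
    print(minimal_cover(ones, 3))   # three prime implicants, each requiring two inputs to be 1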
Let's try a few! The table on the left below reproduces (from Notes Set 1.4) the truth table for addition of
two 2-bit unsigned numbers, A1 A0 and B1 B0 , to produce a sum S1 S0 and a carry out C. K-maps for each
output bit appear to the right. The colors are used only to make the different prime implicants easier to
distinguish. The equations produced by summing these prime implicants appear below the K-maps.
       inputs              outputs
  A1  A0  B1  B0        C   S1  S0
   0   0   0   0        0    0   0
   0   0   0   1        0    0   1
   0   0   1   0        0    1   0
   0   0   1   1        0    1   1
   0   1   0   0        0    0   1
   0   1   0   1        0    1   0
   0   1   1   0        0    1   1
   0   1   1   1        1    0   0
   1   0   0   0        0    1   0
   1   0   0   1        0    1   1
   1   0   1   0        1    0   0
   1   0   1   1        1    0   1
   1   1   0   0        0    1   1
   1   1   0   1        1    0   0
   1   1   1   0        1    0   1
   1   1   1   1        1    1   0

[K-maps for C, S1, and S0, each drawn with A1A0 across the top and B1B0 down the side in Gray-code order; the colors distinguish the prime implicants chosen for each output.]

C  = A1 B1 + A1 A0 B0 + A0 B1 B0
S1 = A1 B1' B0' + A1 A0' B1' + A1' A0' B1 + A1' B1 B0' + A1' A0 B1' B0 + A1 A0 B1 B0
S0 = A0' B0 + A0 B0'

In theory, K-maps extend to an arbitrary number of variables. Certainly Gray codes can be extended. An
N -bit Gray code is a sequence of N -bit patterns that includes all possible patterns such that any two
adjacent patterns differ in only one bit. The code is actually a cycle: the first and last patterns also differ
in only one bit. You can construct a Gray code recursively as follows: for an (N + 1)-bit Gray code, write
the sequence for an N -bit Gray code, then add a 0 in front of all patterns. After this sequence, append a
second copy of the N -bit Gray code in reverse order, then put a 1 in front of all patterns in the second copy.
The result is an (N + 1)-bit Gray code. For example, the following are Gray codes:
1-bit: 0, 1
2-bit: 00, 01, 11, 10
3-bit: 000, 001, 011, 010, 110, 111, 101, 100
4-bit: 0000, 0001, 0011, 0010, 0110, 0111, 0101, 0100, 1100, 1101, 1111, 1110, 1010, 1011, 1001, 1000
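The recursive construction just described translates directly into a few lines of code. The Python sketch below is only an illustration of the reflect-and-prefix idea.

    # Build an N-bit Gray code by the construction described above: prefix the
    # (N-1)-bit code with 0, then append the reversed (N-1)-bit code prefixed with 1.

    def gray_code(n):
        if n == 0:
            return ['']                       # the empty pattern
        shorter = gray_code(n - 1)
        return ['0' + p for p in shorter] + ['1' + p for p in reversed(shorter)]

    print(gray_code(2))   # ['00', '01', '11', '10']
    print(gray_code(3))   # ['000', '001', '011', '010', '110', '111', '101', '100']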

Unfortunately, some of the beneficial properties of K-maps do not extend beyond two variables in a dimension. Once you have three variables in one dimension, as is necessary if a function operates on five or
more variables, not all product terms are contiguous in the grid. The terms still require a total number of
rows and columns equal to a power of two, but they dont all need to be a contiguous group. Furthermore,
some contiguous groups of appropriate size do not correspond to product terms. So you can still make use of
K-maps if you have more variables, but their use is a little trickier.

Canonical Forms
What if we want to compare two expressions to determine whether they represent the same logic function?
Such a comparison is a test of logical equivalence, and is an important part of hardware design. Tools
today provide help with this problem, but you should understand the problem.
You know that any given function can be expressed in many ways, and that two expressions that look quite
different may in fact represent the same function (look back at Equations (1) to (3) for an example). But
what if we rewrite the function using only prime implicants? Is the result unique? Unfortunately, no.
In general, a sum of products is not unique (nor is a product of sums), even if the sum
contains only prime implicants.
For example, consensus terms may or may not be included in our expressions. (They
are necessary for reliable design of certain types of systems, as you will learn in a
later ECE class.) The green ellipse in the K-map to the right represents the consensus
term BC.
[K-map for Z(A, B, C); the green ellipse marks the consensus term.]

Z = A'C + AB + BC
Z = A'C + AB

Some functions allow several equivalent formulations as sums of prime implicants, even without consensus
terms. The K-maps shown to the right, for example, illustrate how one function might be written in either
of the following ways:

Z = A B D + A C D + A B C + B C D
Z = A B C + B C D + A B D + A C D

[Figure: two four-variable K-maps (AB across the top, CD down the side) showing the two different selections of four prime implicants for Z.]

When we need to compare two things (such as functions), we need to transform them into what in mathematics is known as a canonical form, which simply means a form that is defined so as to be unique for
each thing of the given type. What can we use for logic functions? You already know two answers! The
canonical sum of a function (sometimes called the canonical SOP form) is the sum of minterms. The
canonical product of a function (sometimes called the canonical POS form) is the product of maxterms.
These forms technically only meet the mathematical definition of canonical if we agree on an order for the
min/maxterms, but that problem is solvable. However, as you already know, the forms are not particularly
convenient to use. In practice, people and tools in the industry use more compact approaches when comparing functions, but those solutions are a subject for a later class (such as ECE 462).
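The canonical-form idea suggests the simplest possible equivalence check in software: evaluate both expressions on every input combination, which amounts to comparing their sets of minterms. The Python sketch below is an illustration only; as noted above, practical tools use far more compact approaches.

    # Two functions of the same variables are logically equivalent exactly when
    # they agree on every input combination (that is, they have the same minterms).

    from itertools import product

    def equivalent(f, g, num_vars):
        return all(bool(f(*bits)) == bool(g(*bits))
                   for bits in product((0, 1), repeat=num_vars))

    # Two different-looking expressions for the same function (our own example):
    f = lambda a, b, c: (a and not c) or (a and b)    # A C' + A B
    g = lambda a, b, c: a and (b or not c)            # A (B + C')
    print(equivalent(f, g, 3))                        # True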

Two-Level Logic
Two-level logic is a popular way of expressing logic functions. The two levels refer
simply to the number of functions through which an input passes to reach an output, and
both the SOP and POS forms are examples of two-level logic. In this section, we illustrate
one of the reasons for this popularity and show you how to graphically manipulate
expressions, which can sometimes help when trying to understand gate diagrams.
We begin with one of DeMorgan's laws, which we can illustrate both algebraically and graphically:
C = B' + A' = (B A)'. In gate terms, an OR gate with inverted inputs computes the same function as a
NAND gate.

[Figure: an OR gate with inverted inputs drawn as equivalent to a NAND gate.]

Let's say that we have a function expressed in SOP form, such as Z = ABC + DE + FGHJ. The diagram
on the left below shows the function constructed from three AND gates and an OR gate. Using DeMorgan's
law, we can replace the OR gate with a NAND with inverted inputs. But the bubbles that correspond to
inversion do not need to sit at the input to the gate. We can invert at any point along the wire, so we slide
each bubble down the wire to the output of the first column of AND gates. Be careful: if the wire splits,
which does not happen in our example, you have to replicate the inverter onto the other output paths as you
slide past the split point! The end result is shown on the right: we have not changed the function, but now
we use only NAND gates. Since CMOS technology only supports NAND and NOR directly, using two-level
logic makes it simple to map our expression into CMOS gates.
[Figure: three versions of the gate diagram for Z = ABC + DE + FGHJ. First, we replace the OR gate using DeMorgan's law; next, we slide the inversion bubbles down the wires to the left; we now have the same function (SOP form) implemented with NAND gates.]

You may want to make use of DeMorgan's other law, illustrated graphically to the right, to perform the
same transformation on a POS expression. What do you get?

[Figure: an AND gate with inverted inputs drawn as equivalent to a NOR gate.]

Multi-Metric Optimization
As engineers, almost every real problem that you encounter will admit multiple metrics for evaluating possible
designs. Becoming a good engineer thus requires not only that you be able to solve problems creatively so
as to improve the quality of your solutions, but also that you are aware of how people might evaluate those
solutions and are able both to identify the most important metrics and to balance your design effectively
according to them. In this section, we introduce some general ideas and methods that may be of use to you
in this regard. We will not test you on the concepts in this section.
When you start thinking about a new problem, your first step should be to think carefully about metrics
of possible interest. Some important metrics may not be easy to quantify. For example, compatibility of
a design with other products already owned by a customer has frequently defined the success or failure of
computer hardware and software solutions. But how can you compute the compatibility of your approach as
a number?
Humans, including engineers, are not good at comparing multiple metrics simultaneously. Thus, once you
have a set of metrics that you feel is complete, your next step is to get rid of as many as you can. Towards
this end, you may identify metrics that have no practical impact in current technology, set threshold values
for other metrics to simplify reasoning about them, eliminate redundant metrics, calculate linear sums to
reduce the count of metrics, and, finally, make use of the notion of Pareto optimality. All of these ideas are
described in the rest of this section.
Lets start by considering metrics that we can quantify as real numbers. For a given metric, we can divide
possible measurement values into three ranges. In the first range, all measurement values are equivalently
useful. In the second range, possible values are ordered and interesting with respect to one another. Values
in the third range are all impossible to use in practice. Using power consumption as our example, the first
range corresponds to systems in which the processor's power consumption is extremely low relative to the
power consumption of the system as a whole. For example, the processor in a computer might use
less than 1% of the total used by the system, including the disk drive, the monitor, the power supply, and so
forth. One power consumption value in this range is just as good as any other, and no one cares about the
power consumption of the processor in such cases. In the second range, power consumption of the processor

makes a difference. Cell phones use most of their energy in radio operation, for example, but if you own
a phone with a powerful processor, you may have noticed that you can turn off the phone and drain the
battery fairly quickly by playing a game. Designing a processor that uses half as much power lengthens the
battery life in such cases. Finally, the third region of power consumption measurements is impossible: if you
use so much power, your chip will overheat or even burst into flames. Consumers get unhappy when such
things happen.
As a first step, you can remove any metrics for which all solutions are effectively equivalent. Until a little
less than a decade ago, for example, the power consumption of a desktop processor actually was in the first
range that we discussed. Power was simply not a concern to engineers: all designs of interest consumed so
little power that no one cared. Unfortunately, at that point, power consumption jumped into the third range
rather quickly. Processors hit a wall, and products had to be cancelled. Given that the time spent designing
a processor has historically been about five years, a lot of engineering effort was wasted because people had
not thought carefully enough about power (since it had never mattered in the past). Today, power is an
important metric that engineers must take into account in their designs.
However, in some areas, such as desktop and high-end server processors, other metrics (such as performance)
may be so important that we always want to operate at the edge of the interesting range. In such cases,
we might choose to treat a metric such as power consumption as a threshold: stay below 150 Watts for a
desktop processor, for example. One still has to make a coordinated effort to ensure that the system as a
whole does not exceed the threshold, but reasoning about threshold values, a form of constraint, is easier
than trying to think about multiple metrics at once.
Some metrics may only allow discrete quantification. For example, one could choose to define compatibility
with previous processor generations as binary: either an existing piece of software (or operating system) runs
out of the box on your new processor, or it does not. If you want people who own that software to make use
of your new processor, you must ensure that the value of this binary metric is 1, which can also be viewed
as a threshold.
In some cases, two metrics may be strongly correlated, meaning that a design that is good for one of the
metrics is frequently good for the other metric as well. Chip area and cost, for example, are technically
distinct ways to measure a digital design, but we rarely consider them separately. A design that requires a
larger chip is probably more complex, and thus takes more engineering time to get right (engineering time
costs money). Each silicon wafer costs money to fabricate, and fewer copies of a large design fit on one
wafer, so large chips mean more fabrication cost. Physical defects in silicon can cause some chips not to
work. A large chip uses more silicon than a small one, and is thus more likely to suffer from defects (and
not work). Cost thus goes up again for large chips relative to small ones. Finally, large chips usually require
more careful testing to ensure that they work properly (even ignoring the cost of getting the design right,
we have to test for the presence of defects), which adds still more cost for a larger chip. All of these factors
tend to correlate chip area and chip cost, to the point that most engineers do not consider both metrics.
After you have tried to reduce your set of metrics as much as possible, or simplified them by turning them into
thresholds, you should consider turning the last few metrics into a weighted linear sum. All remaining metrics
must be quantifiable in this case. For example, if you are left with three metrics for which a given design has
values A, B, and C, you might reduce these to one metric by calculating D = wA A + wB B + wC C. What
are the w values? They are weights for the three metrics. Their values represent the relative importance of
the three metrics to the overall evaluation. Here we've assumed that larger values of A, B, and C are either
all good or all bad. If you have metrics with different senses, use the reciprocal values. For example, if a
large value of A is good, a small value of 1/A is also good.
The difficulty with linearizing metrics is that not everyone agrees on the weights. Is using less power more
important than having a cheaper chip? The answer may depend on many factors.
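As a small illustration of the weighted linear sum, the following Python sketch combines three metrics into one score; the metric names, weights, and values are made up for the example, and metrics for which smaller values are better are replaced by their reciprocals as described above.

    # Combine several metrics into one score using a weighted linear sum. Metrics
    # for which smaller values are better are replaced by their reciprocals first,
    # so that a larger combined score always means a better design.

    def combined_score(values, weights, smaller_is_better):
        score = 0.0
        for name, value in values.items():
            v = 1.0 / value if smaller_is_better[name] else value
            score += weights[name] * v
        return score

    design  = {'performance': 2.0, 'power': 100.0, 'cost': 25.0}   # made-up values
    weights = {'performance': 10.0, 'power': 200.0, 'cost': 50.0}  # made-up weights
    smaller = {'performance': False, 'power': True, 'cost': True}
    print(combined_score(design, weights, smaller))   # 10*2.0 + 200/100 + 50/25 = 24.0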
When you are left with several metrics of interest, you can use the idea of Pareto optimality to identify
interesting designs. Let's say that you have two metrics. If a design D1 is better than a second design D2
for both metrics, we say that D1 dominates D2 . A design D is then said to be Pareto optimal if no other
design dominates D. Consider the figure on the left below, which illustrates seven possible designs measured

with two metrics. The design corresponding to point B dominates the designs corresponding to points A
and C, so neither of the latter designs is Pareto optimal. No other point in the figure dominates B, however,
so that design is Pareto optimal. If we remove all points that do not represent Pareto optimal designs, and
instead include only those designs that are Pareto optimal, we obtain the version shown on the right. These
are points in a two-dimensional space, not a line, but we can imagine a line going through the points, as
illustrated in the figure: the points that make up the line are called a Pareto curve, or, if you have more
than two metrics, a Pareto surface.
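The dominance test translates directly into code. The Python sketch below keeps only the Pareto-optimal points from a list of designs; the point values are invented for the example, and we treat a design as dominated if another is at least as good in both metrics and strictly better in at least one (a common refinement of the definition above that also discards ties).

    # Keep only the Pareto-optimal designs from a list of (metric 1, metric 2)
    # measurements, where larger values are better for both metrics.

    def dominates(p, q):
        # p is at least as good as q everywhere and strictly better somewhere
        return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

    def pareto_optimal(designs):
        return [d for d in designs
                if not any(dominates(other, d) for other in designs if other != d)]

    # Made-up measurements for seven candidate designs:
    designs = [(1, 5), (2, 7), (3, 3), (4, 6), (5, 4), (6, 2), (2, 6)]
    print(pareto_optimal(designs))   # [(2, 7), (4, 6), (5, 4), (6, 2)]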
[Figure: on the left, seven designs plotted against two metrics (larger is better for both); point B dominates points A and C. On the right, only the Pareto-optimal designs remain, and a line through them suggests the Pareto curve.]

As an example of the use of Pareto optimality, consider


the figure to the right, which is copied with permission
from Neal Crago's Ph.D. dissertation (UIUC ECE,
2012). The figure compares hundreds of thousands of
possible designs based on a handful of different core
approaches for implementing a processor. The axes in
the graph are two metrics of interest. The horizontal
axis measures the average performance of a design
when executing a set of benchmark applications, normalized to a baseline processor design. The vertical
axis measures the energy consumed by a design when
executing the same benchmarks, normalized again to the energy consumed by a baseline design. The six
sets of points in the graph represent alternative design techniques for the processor, most of which are in
commercial use today. The points shown for each set are the subset of many thousands of possible variants
that are Pareto optimal. In this case, more performance and less energy consumption are the good directions,
so any point in a set for which another point is both further to the right and further down is not shown in the
graph. The black line represents an absolute power consumption of 150 Watts, which is a nominal threshold
for a desktop environment. Designs above and to the right of that line are not as interesting for desktop use.
The design-space exploration that Neal reported in this figure was of course done by many computers
using many hours of computation, but he had to design the process by which the computers calculated each
of the points.


ECE120: Introduction to Computer Engineering


Notes Set 2.2
Boolean Properties and Don't Care Simplification
This set of notes begins with a brief illustration of a few properties of Boolean logic, which may be of use
to you in manipulating algebraic expressions and in identifying equivalent logic functions without resorting
to truth tables. We then discuss the value of underspecifying a logic function so as to allow for selection of
the simplest possible implementation. This technique must be used carefully to avoid incorrect behavior, so
we illustrate the possibility of misuse with an example, then talk about several ways of solving the example
correctly. We conclude by generalizing the ideas in the example to several important application areas and
talking about related problems.

Logic Properties
Table 1 (on the next page) lists a number of properties of Boolean logic. Most of these are easy to derive
from our earlier definitions, but a few may be surprising to you. In particular, in the algebra of real numbers,
multiplication distributes over addition, but addition does not distribute over multiplication. For example,
3 × (4 + 7) = (3 × 4) + (3 × 7), but 3 + (4 × 7) ≠ (3 + 4) × (3 + 7). In Boolean algebra, both operators
distribute over one another, as indicated in Table 1. The consensus properties may also be nonintuitive.
Drawing a K-map may help you understand the consensus property on the right side of the table. For the
consensus variant on the left side of the table, consider that since either A or A' must be 0, either B or C
or both must be 1 for the first two factors on the left to be 1 when ANDed together. But in that case, the
third factor is also 1, and is thus redundant.
As mentioned previously, Boolean algebra has an elegant symmetry known as a duality, in which any logic
statement (an expression or an equation) is related to a second logic statement. To calculate the dual
form of a Boolean expression or equation, replace 0 with 1, replace 1 with 0, replace AND with OR, and
replace OR with AND. Variables are not changed when finding the dual form. The dual form of a dual form is
the original logic statement. Be careful when calculating a dual form: our convention for ordering arithmetic
operations is broken by the exchange, so you may want to add explicit parentheses before calculating the
dual. For example, the dual of AB + C is not A + BC. Rather, the dual of AB + C is (A + B)C. Add
parentheses as necessary when calculating a dual form to ensure that the order of operations does not change.
Duality has several useful practical applications. First, the principle of duality states that any theorem
or identity has the same truth value in dual form (we do not prove the principle here). The rows of Table 1
are organized according to this principle: each row contains two equations that are the duals of one another.
Second, the dual form is useful when designing certain types of logic, such as the networks of transistors
connecting the output of a CMOS gate to high voltage and ground. If you look at the gate designs in the
textbook (and particularly those in the exercises), you will notice that these networks are duals. A function or expression is neither a theorem nor an identity, so the principle of duality does not apply to the dual
of an expression. However, if you treat the value 0 as true, the dual form of an expression has the same
truth values as the original (operating with value 1 as true). Finally, you can calculate the complement
of a Boolean function (any expression) by calculating the dual form and then complementing each variable.

Choosing the Best Function


When we specify how something works using a human language, we leave out details. Sometimes we do so
deliberately, assuming that a reader or listener can provide the details themselves: "Take me to the airport!"
rather than "Please bend your right arm at the elbow and shift your right upper arm forward so as to place
your hand near the ignition key. Next, ..."
You know the basic technique for implementing a Boolean function using combinational logic: use a K-map
to identify a reasonable SOP or POS form, draw the resulting design, and perhaps convert to NAND/NOR
gates.

                     1 + A = 1                                       0 · A = 0
                     1 · A = A                                       0 + A = A
                     A + A = A                                       A · A = A
                     A · A' = 0                                      A + A' = 1
DeMorgan's laws      (A + B)' = A' B'                                (A B)' = A' + B'
distribution         (A + B) C = A C + B C                           A B + C = (A + C)(B + C)
consensus            (A + B)(A' + C)(B + C) = (A + B)(A' + C)        A B + A' C + B C = A B + A' C

Table 1: Boolean logic properties. The two columns are dual forms of one another.

When we develop combinational logic designs, we may also choose to leave some aspects unspecified. In
particular, the value of a Boolean logic function to be implemented may not matter for some input combinations. If we express the function as a truth table, we may choose to mark the function's value for some
input combinations as don't care, which is written as x (no quotes).

What is the benefit of using don't care values? Using don't care values allows you to choose from
among several possible logic functions, all of which produce the desired results (as well as some combination
of 0s and 1s in place of the don't care values). Each input combination marked as don't care doubles
the number of functions that can be chosen to implement the design, often enabling the logic needed for
implementation to be simpler.

For example, the K-map to the right specifies a function F(A, B, C) with two don't
care entries. If you are asked to design combinational logic for this function, you can
choose any values for the two don't care entries. When identifying prime implicants,
each x can either be a 0 or a 1.

Depending on the choices made for the x's, we obtain one of the following four functions:

[K-map for F(A, B, C) with two of the entries marked x.]

F = A'B + BC
F = A'B + BC + AB'C'
F = B
F = B + AC'

[K-map for the choice F = B, with the former don't-care entries resolved to 1 for ABC = 110 and 0 for ABC = 100.]

Given this set of choices, a designer typically chooses the third: F = B, which corresponds to the K-map
shown to the right of the equations. The design then produces F = 1 when A = 1, B = 1, and C = 0
(ABC = 110), and produces F = 0 when A = 1, B = 0, and C = 0 (ABC = 100). These differences are
marked with shading and green italics in the new K-map. No implementation ever produces an x.
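One way to think about this selection process in software: a specification with don't-care entries admits many functions, and an implementation is acceptable exactly when it matches the specification on every input combination that is not marked x. The Python sketch below illustrates the check, using the three-variable example above (with the don't-care entries at ABC = 100 and ABC = 110); the encoding of the specification as a dictionary is an assumption of the example.

    # A specification maps each input combination to 0, 1, or None (don't care).
    # An implementation is acceptable when it matches every specified entry; the
    # don't-care entries may come out either way.

    from itertools import product

    def acceptable(spec, impl, num_vars):
        return all(spec[bits] is None or bool(impl(*bits)) == spec[bits]
                   for bits in product((0, 1), repeat=num_vars))

    # The example function F(A, B, C) above: the specified entries are 1 wherever
    # B = 1 and 0 wherever B = 0, except that ABC = 100 and ABC = 110 are don't cares.
    spec = {(0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 1,
            (1, 0, 0): None, (1, 0, 1): 0, (1, 1, 0): None, (1, 1, 1): 1}

    print(acceptable(spec, lambda a, b, c: b, 3))         # True:  F = B works
    print(acceptable(spec, lambda a, b, c: a and b, 3))   # False: wrong at ABC = 010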

Caring about Don't Cares


What can go wrong? In the context of a digital system, unspecified details may or may not be important.
However, any implementation of a specification implies decisions about these details, so decisions should
only be left unspecified if any of the possible answers is indeed acceptable.
As a concrete example, let's design logic to control an ice cream dispenser. The dispenser has two flavors,
lychee and mango, but also allows us to create a blend of the two flavors. For each of the two flavors,
our logic must output two bits to control the amount of ice cream that comes out of the dispenser. The
two-bit CL [1 : 0] output of our logic must specify the number of half-servings of lychee ice cream as a binary
number, and the two-bit CM [1 : 0] output must specify the number of half-servings of mango ice cream.
Thus, for either flavor, 00 indicates none of that flavor, 01 indicates one-half of a serving, and 10 indicates
a full serving.
Inputs to our logic will consist of three buttons: an L button to request a serving of lychee ice cream,
a B button to request a blend (half a serving of each flavor), and an M button to request a serving of mango
ice cream. Each button produces a 1 when pressed and a 0 when not pressed.

Let's start with the assumption that the user only presses one button at a time. In this case, we can treat
input combinations in which more than one button is pressed as don't care values in the truth tables for
the outputs. K-maps for all four output bits appear below. The x's indicate don't care values.
[K-maps for CL[1], CL[0], CM[1], and CM[0], each with LB across the top and M down the side; every input combination in which more than one button is pressed is marked x.]

When we calculate the logic function for an output, each don't care value can be treated as either 0 or 1,
whichever is more convenient in terms of creating the logic. In the case of CM[1], for example, we can treat
the three x's in the ellipse as 1s, treat the x outside of the ellipse as a 0, and simply use M (the implicant
represented by the ellipse) for CM [1]. The other three output bits are left as an exercise, although the result
appears momentarily.
The implementation at right takes full advantage of the don't care
parts of our specification. In this case, we require no logic at all; we
need merely connect the inputs to the correct outputs. Let's verify the
operation. We have four cases to consider. First, if none of the buttons
are pushed (LBM = 000), we get no ice cream, as desired (CM = 00
and CL = 00). Second, if we request lychee ice cream (LBM = 100),
the outputs are CL = 10 and CM = 00, so we get a full serving of
lychee and no mango. Third, if we request a blend (LBM = 010), the
outputs are CL = 01 and CM = 01, giving us half a serving of each
flavor. Finally, if we request mango ice cream (LBM = 001), we get
no lychee but a full serving of mango.

[Figure: the gate-free implementation. L (lychee flavor) connects directly to CL[1], B (blend of two flavors) connects to both CL[0] and CM[0], and M (mango flavor) connects to CM[1].]

The K-maps for this implementation appear below. Each of the don't care x's from the original design
has been replaced with either a 0 or a 1 and highlighted with shading and green italics. Any implementation
produces either 0 or 1 for every output bit for every possible input combination.
[K-maps for CL[1], CL[0], CM[1], and CM[0] for this implementation, with each former don't-care entry replaced by a 0 or a 1.]

As you can see, leveraging don't care output bits can sometimes significantly simplify our logic. In the
case of this example, we were able to completely eliminate any need for gates! Unfortunately, the resulting
implementation may sometimes produce unexpected results. Based on the implementation, what happens if
a user presses more than one button? The ice cream cup overflows!
Let's see why. Consider the case LBM = 101, in which we've pressed both the lychee and mango buttons.
Here CL = 10 and CM = 10, so our dispenser releases a full serving of each flavor, or two servings total.
Pressing other combinations may have other repercussions as well. Consider pressing lychee and blend
(LBM = 110). The outputs are then CL = 11 and CM = 01. Hopefully the dispenser simply gives us one
and a half servings of lychee and a half serving of mango. However, if the person who designed the dispenser
assumed that no one would ever ask for more than one serving, something worse might happen. In other
words, giving an input of CL = 11 to the ice cream dispenser may lead to other unexpected behavior if its
designer decided that that input pattern was a don't care.
The root of the problem is that while we don't care about the value of any particular output marked x for
any particular input combination, we do actually care about the relationship between the outputs.
What can we do? When in doubt, it is safest to make choices and to add the new decisions to the specification
rather than leaving output values specified as don't care. For our ice cream dispenser logic, rather than
leaving the outputs unspecified whenever a user presses more than one button, we could choose an acceptable
outcome for each input combination and replace the x's with 0s and 1s. We might, for example, decide to
produce lychee ice cream whenever the lychee button is pressed, regardless of other buttons (LBM = 1xx,

which means that we don't care about the inputs B and M, so LBM = 100, LBM = 101, LBM = 110,
or LBM = 111). That decision alone covers three of the four unspecified input patterns. We might also decide that when the blend and mango buttons are pushed together (but without the lychee button, LBM=011),
our logic produces a blend. The resulting K-maps are shown below, again with shading and green italics
identifying the combinations in which our original design specified don't care.
[K-maps for CL[1], CL[0], CM[1], and CM[0] with the don't-care entries filled in according to the priority scheme described below.]

The logic in the dashed box to the right implements the set of choices just discussed, and matches the
K-maps above. Based on our additional choices, this implementation enforces a strict priority scheme on
the user's button presses. If a user requests lychee, they can also press either or both of the other buttons
with no effect. The lychee button has priority. Similarly, if the user does not press lychee, but presses the
blend button, pressing the mango button at the same time has no effect. Choosing mango requires that no
other buttons be pressed. We have thus chosen a prioritization order for the buttons and imposed this order
on the design.

[Figure: the dashed box prioritizes the buttons and passes only one at any time to the original connections driving the lychee output control CL and the mango output control CM.]
We can view this same implementation in another way. Note the one-to-one correspondence between inputs
(on the left) and outputs (on the right) for the dashed box. This logic takes the users button presses and
chooses at most one of the buttons to pass along to our original controller implementation (to the right of the
dashed box). In other words, rather than thinking of the logic in the dashed box as implementing a specific
set of decisions, we can think of the logic as cleaning up the inputs to ensure that only valid combinations
are passed to our original implementation. Once the inputs are cleaned up, the original implementation is
acceptable, because input combinations containing more than a single 1 are in fact impossible.
Strict prioritization is one useful way to clean up our inputs. In general, we can design logic to map each
of the four undesirable input patterns into one of the permissible combinations (the four that we specified
explicitly in our original design, with LBM in the set {000, 001, 010, 100}). Selecting a prioritization scheme
is just one approach for making these choices in a way that is easy for a user to understand and is fairly easy
to implement.
A second simple approach is to ignore illegal combinations by mapping them into the "no buttons pressed"
input pattern. Such an implementation appears to the right, laid out to show that one can again view the
logic in the dashed box either as cleaning up the inputs (by mentally grouping the logic with the inputs) or
as a specific set of choices for our don't-care output values (by grouping the logic with the outputs). In
either case, the logic shown enforces our assumptions in a fairly conservative way: if a user presses more
than one button, the logic squashes all button presses. Only a single 1 value at a time can pass through to
the wires on the right of the figure.

[Figure: the dashed box allows only a single button to be pressed at any time; with more than one button pressed, no button press reaches the outputs CL and CM.]


For completeness, the K-maps corresponding to this implementation are given here.
[K-maps for CL[1], CL[0], CM[1], and CM[0] for the implementation that maps illegal button combinations to the no-buttons-pressed pattern.]

Generalizations and Applications


The approaches that we illustrated to clean up the input signals to our design have application in many
areas. The ideas in this section are drawn from the field and are sometimes the subjects of later classes, but
are not exam material for our class.
Prioritization of distinct inputs is used to arbitrate between devices attached to a processor. Processors
typically execute much more quickly than do devices. When a device needs attention, the device signals
the processor by changing the voltage on an interrupt line (the name comes from the idea that the device
interrupts the processors current activity, such as running a user program). However, more than one device
may need the attention of the processor simultaneously, so a priority encoder is used to impose a strict order
on the devices and to tell the processor about their needs one at a time. If you want to learn more about
this application, take ECE391.
When components are designed together, assuming that some input patterns do not occur is common practice,
since such assumptions can dramatically reduce the number of gates required, improve performance, reduce
power consumption, and so forth. As a side effect, when we want to test a chip to make sure that no defects
or other problems prevent the chip from operating correctly, we have to be careful so as not to test bit
patterns that should never occur in practice. Making up random bit patterns is easy, but can produce bad
results or even destroy the chip if some parts of the design have assumed that a combination produced
randomly can never occur. To avoid these problems, designers add extra logic that changes the disallowed
patterns into allowed patterns, just as we did with our design. The use of random bit patterns is common
in Built-In Self Test (BIST), and so the process of inserting extra logic to avoid problems is called BIST
hardening. BIST hardening can add 10-20% additional logic to a design. Our graduate class on digital
system testing, ECE543, covers this material, but has not been offered recently.


ECE120: Introduction to Computer Engineering


Notes Set 2.3
Example: Bit-Sliced Addition
In this set of notes, we illustrate basic logic design using integer addition as an example. By recognizing
and mimicking the structured approach used by humans to perform addition, we introduce an important
abstraction for logic design. We follow this approach to design an adder known as a ripple-carry adder,
then discuss some of the implications of the approach and highlight how the same approach can be used in
software. In the next set of notes, we use the same technique to design a comparator for two integers.

One Bit at a Time


Many of the operations that we want to perform on groups of bits can be broken down into repeated
operations on individual bits. When we add two binary numbers, for example, we first add the least
significant bits, then move to the second least significant, and so on. As we go, we may need to carry from
lower bits into higher bits. When we compare two (unsigned) binary numbers with the same number of bits,
we usually start with the most significant bits and move downward in significance until we find a difference
or reach the end of the two numbers. In the latter case, the two numbers are equal.
When we build combinational logic to implement this kind of calculation, our approach as humans can be
leveraged as an abstraction technique. Rather than building and optimizing a different Boolean function for
an 8-bit adder, a 9-bit adder, a 12-bit adder, and any other size that we might want, we can instead design
a circuit that adds a single bit and passes any necessary information into another copy of itself. By using
copies of this bit-sliced adder circuit, we can mimic our approach as humans and build adders of any size,
just as we expect that a human could add two binary numbers of any size. The resulting designs are, of
course, slightly less efficient than designs that are optimized for their specific purpose (such as adding two
17-bit numbers), but the simplicity of the approach makes the tradeoff an interesting one.

Abstracting the Human Process


Think about how we as humans add two N -bit numbers, A and B. An
illustration appears to the right, using N = 8. For now, let's assume
that our numbers are stored in an unsigned representation. As you
know, addition for 2s complement is identical except for the calculation
of overflow. We start adding from the least significant bit and move
to the left. Since adding two 1s can overflow a single bit, we carry a 1
when necessary into the next column. Thus, in general, we are actually
adding three input bits. The carry from the previous column is usually
not written explicitly by humans, but in a digital system we need to
write a 0 instead of leaving the value blank.

[Figure: adding two 8-bit numbers A and B by hand. Each column adds a carry bit, one bit of A, and one bit of B, producing a sum bit and a carry into the next column; information flows from the least significant column on the right toward the most significant column on the left.]

Focus now on the addition of a single column. Except for the first
and last bits, which we might choose to handle slightly differently, the
addition process is identical for any column. We add a carry in bit
(possibly 0) with one bit from each of our numbers to produce a sum
bit and a carry out bit for the next column. Column addition is the
task that our bit slice logic must perform.
The diagram to the right shows an abstract model of our adder bit
slice. The inputs from the next least significant bit come in from the
right. We include arrowheads because figures are usually drawn with
inputs coming from the top or left and outputs going to the bottom or
right. Outside of the bit slice logic, we index the carry bits using the

[Figure: an abstract adder bit slice M with inputs A_M and B_M, carry input C^M arriving from the right, and outputs S_M and carry C^(M+1) leaving to the left.]


bit number. The bit slice has C^M provided as an input and produces C^(M+1) as an output. Internally, we
use Cin to denote the carry input, and Cout to denote the carry output. Similarly, the bits A_M and B_M
from the numbers A and B are represented internally as A and B, and the bit S_M produced for the sum S is
represented internally as S. The overloading of meaning should not confuse you, since the context (designing
the logic block or thinking about the problem as a whole) should always be clear.
The abstract device for adding three input bits and producing two output bits is called a full adder. You
may also encounter the term half adder, which adds only two input bits. To form an N-bit adder, we
integrate N copies of the full adder (the bit slice that we design next) as shown below. The result is called
a ripple carry adder because the carry information moves from the low bits to the high bits slowly, like a
ripple on the surface of a pond.

[Figure: an N-bit adder composed of bit slices. Adder bit slices 0 through N-1 are chained together: each slice M takes A_M, B_M, and a carry in, produces S_M, and passes its carry out to slice M+1; slice 0 receives the adder's Cin and slice N-1 produces the adder's Cout.]

Designing the Logic


Now we are ready to design our adder bit slice. Let's start by writing a truth table for Cout and S, as shown
on the left below. To the right of the truth tables are K-maps for each output, and equations for each output
are then shown to the right of the K-maps. We suggest that you work through identification of the prime
implicants in the K-maps and check your work with the equations.
A  B  Cin     Cout  S
0  0   0        0   0
0  0   1        0   1
0  1   0        0   1
0  1   1        1   0
1  0   0        0   1
1  0   1        1   0
1  1   0        1   0
1  1   1        1   1

[K-maps for Cout and S, each with AB across the top in Gray-code order and Cin down the side.]

Cout = A B + A Cin + B Cin
S    = A' B' Cin + A' B Cin' + A B' Cin' + A B Cin  =  A ⊕ B ⊕ Cin

The equation for Cout implements a majority function on three bits. In particular, a carry is produced
whenever at least two out of the three input bits (a majority) are 1s. Why do we mention this name?
Although we know that we can build any logic function from NAND gates, common functions such as those
used to add numbers may benefit from optimization. Imagine that in some technology, creating a majority
function directly may produce a better result than implementing such a function from logic gates. In such
a case, we want the person designing the circuit to know that can make use of such an improvement. We
rewrote the equation for S to make use of the XOR operation for a similar reason: the implementation of
XOR gates from transistors may be slightly better than the implementation of XOR based on NAND gates.
If a circuit designer provides an optimized variant of XOR, we want our design to make use of the optimized
version.


[Figure: two gate-level diagrams of an adder bit slice (known as a "full adder"): one built from AND and OR gates plus an XOR for the sum, and one built from NAND gates with the XOR left as an XOR.]

The gate diagrams above implement a single bit slice for an adder. The version on the left uses AND and
OR gates (and an XOR for the sum), while the version on the right uses NAND gates, leaving the XOR as
an XOR.
Let's discuss the design in terms of area and speed. As an estimate of area, we can count gates, remembering
that we need two transistors per input on a gate. For each bit, we need three 2-input NAND gates, one
3-input NAND gate, and a 3-input XOR gate (a big gate; around 30 transistors). For speed, we make rough
estimates in terms of the amount of time it takes for a CMOS gate to change its output once its input has
changed. This amount of time is called a gate delay. We can thus estimate our design's speed by simply
counting the maximum number of gates on any path from input to output. For this measurement, using a
NAND/NOR representation of the design is important to getting the right answer. Here we have two gate
delays from any of the inputs to the Cout output. The XOR gate may be a little slower, but none of its
inputs come from other gates anyway. When we connect multiple copies of our bit slice logic together to
form an adder, the delay from the A and B inputs to the outputs is not as important as the delay from Cin
to the outputs. The latter delay adds to the total delay of our adder on a per-bit-slice basis; this propagation
delay gives rise to the name ripple carry. Looking again at the diagram, notice that we have two gate delays
from Cin to Cout. The total delay for an N-bit adder based on this implementation is thus two gate
delays per bit, for a total of 2N gate delays.

Adders and Word Size


Now that we know how to build an N -bit adder, we can add some detail to the
diagram that we drew when we introduced 2's complement back in Notes Set 1.2, as
shown to the right. The adder is important enough to computer systems to merit
its own symbol in logic diagrams, which is shown to the right with the inputs and
outputs from our design added as labels. The text in the middle marking the
symbol as an adder is only included for clarity: any time you see a symbol of the
shape shown to the right, it is an adder (or sometimes a device that can add and
do other operations). The width of the operand input and output lines then tells
you the size of the adder.

(Figure: the adder symbol: an N-bit adder with N-bit operand inputs, an N-bit sum output S, a carry input Cin, and a carry output Cout.)

You may already know that most computers have a word size specified as part of the Instruction Set
Architecture. The word size specifies the number of bits in each operand when the computer adds two
numbers, and is often used widely within the microarchitecture as well (for example, to decide the number of
wires to use when moving bits around). Most desktop and laptop machines now have a word size of 64 bits,
but many phone processors (and desktops/laptops a few years ago) use a 32-bit word size. Embedded
microcontrollers may use a 16-bit or even an 8-bit word size.



Having seen how we can build an N-bit adder from simple chunks of logic operating on each pair of bits,
you should not have much difficulty in understanding the diagram to the right. If we start with a design
for an N-bit adder (even if that design is not built from bit slices, but is instead optimized for that
particular size), we can create a 2N-bit adder by simply connecting two copies of the N-bit adder. We give
the adder for the less significant bits (the one on the right in the figure) an initial carry of 0, and pass
the carry produced by the adder for the less significant bits into the carry input of the adder for the more
significant bits. We calculate overflow based on the results of the adder for the more significant bits (the
one on the left in the figure), using the method appropriate to the type of operands we are adding (either
unsigned or 2's complement).

(Figure: a 2N-bit adder built from two N-bit adders. The adder for the less significant bits has its carry in set to 0, and its carry out feeds the carry in of the adder for the more significant bits.)

You should also realize that this connection need not be physical. In other words, if a computer has an N -bit
adder, it can handle operands with 2N bits (or 3N , or 10N , or 42N ) by using the N -bit adder repeatedly,
starting with the least significant bits and working upward until all of the bits have been added. The
computer must of course arrange to have the operands routed to the adder a few bits at a time, and must
ensure that the carry produced by each addition is then delivered to the carry input (of the same adder!) for
the next addition. In the coming months, you will learn how to design hardware that allows you to manage
bits in this way, so that by the end of our class, you will be able to design a simple computer on your own.
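As a rough illustration of this reuse, the sketch below (plain Python; the 8-bit chunk size and the function names are our own choices, not anything defined in these notes) adds 32-bit operands with a model of an 8-bit adder applied repeatedly, passing the carry from chunk to chunk.

# Sketch: adding 32-bit operands with a model of an 8-bit adder used repeatedly.
# The adder model and chunk size (N = 8) are illustrative choices.

N = 8
MASK = (1 << N) - 1

def n_bit_adder(a, b, cin):
    """Model of an N-bit adder: returns (cout, s) with s limited to N bits."""
    total = (a & MASK) + (b & MASK) + cin
    return total >> N, total & MASK

def add_wide(a, b, width=32):
    """Add two unsigned width-bit numbers N bits at a time, low chunk first."""
    result, carry = 0, 0
    for shift in range(0, width, N):
        carry, chunk = n_bit_adder((a >> shift) & MASK, (b >> shift) & MASK, carry)
        result |= chunk << shift
    return result, carry   # final carry out of the most significant chunk

total, cout = add_wide(0x1234ABCD, 0x00005678)
assert total == (0x1234ABCD + 0x00005678) & 0xFFFFFFFF
print(hex(total), cout)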


ECE120: Introduction to Computer Engineering


Notes Set 2.4
Example: Bit-Sliced Comparison
This set of notes develops comparators for unsigned and 2's complement numbers using the bit-sliced approach that we introduced in Notes Set 2.3. We then use algebraic manipulation and variation of the internal representation to illustrate design tradeoffs.

Comparing Two Numbers


Let's begin by thinking about how we as humans compare two N-bit numbers, A and B. An illustration
appears to the right, using N = 8: A = 00001001 and B = 00010001, with A7 and B7 the most significant
bits. (In the figure, humans compare from the most significant bit downward, while the logic that we design
compares from the least significant bit upward.) For now, let's assume that our numbers are stored in an
unsigned representation, so we can just think of them as binary numbers with leading 0s. We handle
2's complement values later in these notes.

As humans, we typically start comparing at the most significant bit. After all, if we find a difference in
that bit, we are done, saving ourselves some time. In the example to the right, we know that A < B as soon
as we reach bit 4 and observe that A4 < B4. If we instead start from the least significant bit, we must
always look at all of the bits.

When building hardware to compare all of the bits at once, however, hardware for comparing each bit must
exist, and the final result must be able to consider all of the bits. Our choice of direction should thus
instead depend on how effectively we can build the corresponding functions. For a single bit slice, the two
directions are almost identical. Let's develop a bit slice for comparing from least to most significant.

An Abstract Model
Comparison of two numbers, A and B, can produce three possible answers: A < B, A = B, or A > B (one
can also build an equality comparator that combines the A < B and A > B cases into a single answer).
As we move from bit to bit in our design, how much information needs to pass from one bit to the next? Here
you may want to think about how you perform the task yourself. And perhaps to focus on the calculation
for the most significant bit. You need to know the values of the two bits that you are comparing. If those
two are not equal, you are done. But if the two bits are equal, what do you do? The answer is fairly simple:
pass along the result from the less significant bits. Thus our bit slice logic for bit M needs to be able to
accept three possible answers from the bit slice logic for bit M-1 and must be able to pass one of three
possible answers to the logic for bit M+1. Since ⌈log2(3)⌉ = 2, we need two bits of input and two bits of
output in addition to our input bits from numbers A and B.
The diagram to the right shows an abstract model of our comparator bit slice. The inputs from the next
least significant bit come in from the right. We include arrowheads because figures are usually drawn with
inputs coming from the top or left and outputs going to the bottom or right. Outside of the bit slice logic,
we index these comparison bits using the bit number. The bit slice has C1^(M-1) and C0^(M-1) provided as
inputs and produces C1^M and C0^M as outputs. Internally, we use C1 and C0 to denote these inputs, and Z1
and Z0 to denote the outputs. Similarly, the bits AM and BM from the numbers A and B are represented
internally simply as A and B. The overloading of meaning should not confuse you, since the context
(designing the logic block or thinking about the problem as a whole) should always be clear.

(Figure: comparator bit slice M, with operand inputs A and B, comparison inputs C1 and C0 arriving from the right, and comparison outputs Z1 and Z0 leaving to the left.)


A Representation and the First Bit


We need to select a representation for our three possible answers before we can design any logic. The
representation chosen affects the implementation, as we discuss later in these notes. For now, we simply
choose the representation below, which seems reasonable.

C1  C0  meaning
0   0   A = B
0   1   A < B
1   0   A > B
1   1   not used

Now we can design the logic for the first bit (bit 0). In keeping with the bit slice philosophy, in practice
we simply use another copy of the full bit slice design for bit 0 and attach the C1 C0 inputs to ground (to
denote A = B). Here we tackle the simpler problem as a warm-up exercise.

The truth table for bit 0 appears below (recall that we use Z1 and Z0 for the output names). Note that the
bit 0 function has only two meaningful inputs; there is no bit to the right of bit 0. If the two inputs A
and B are the same, we output equality. Otherwise, we do a 1-bit comparison and use our representation
mapping to select the outputs.

A  B | Z1  Z0
0  0 |  0   0
0  1 |  0   1
1  0 |  1   0
1  1 |  0   0

These functions are fairly straightforward to derive by inspection. They are:

Z1 = A B'
Z0 = A' B

These forms should also be intuitive, given the representation that we chose: A > B if and only if A = 1
and B = 0; A < B if and only if A = 0 and B = 1.
Implementation diagrams for our one-bit functions appear to the right: the first shows the implementation
as we might initially draw it, and the second shows the implementation converted to NAND/NOR gates for a
more accurate estimate of complexity when implemented in CMOS. The exercise of designing the logic for
bit 0 is also useful in the sense that the logic structure illustrated forms the core of the full design in
that it identifies the two cases that matter: A < B and A > B.

Now we are ready to design the full function. Let's start by writing a full truth table, as shown below.
Full truth table:

A  B  C1  C0 | Z1  Z0
0  0  0   0  |  0   0
0  0  0   1  |  0   1
0  0  1   0  |  1   0
0  0  1   1  |  x   x
0  1  0   0  |  0   1
0  1  0   1  |  0   1
0  1  1   0  |  0   1
0  1  1   1  |  x   x
1  0  0   0  |  1   0
1  0  0   1  |  1   0
1  0  1   0  |  1   0
1  0  1   1  |  x   x
1  1  0   0  |  0   0
1  1  0   1  |  0   1
1  1  1   0  |  1   0
1  1  1   1  |  x   x

First short form:

A  B  C1  C0 | Z1  Z0
0  0  0   0  |  0   0
0  0  0   1  |  0   1
0  0  1   0  |  1   0
0  1  0   0  |  0   1
0  1  0   1  |  0   1
0  1  1   0  |  0   1
1  0  0   0  |  1   0
1  0  0   1  |  1   0
1  0  1   0  |  1   0
1  1  0   0  |  0   0
1  1  0   1  |  0   1
1  1  1   0  |  1   0
x  x  1   1  |  x   x

Second short form (with an "other" row):

A  B  C1  C0 | Z1  Z0
0  0  0   0  |  0   0
0  0  0   1  |  0   1
0  0  1   0  |  1   0
0  1  0   0  |  0   1
0  1  0   1  |  0   1
0  1  1   0  |  0   1
1  0  0   0  |  1   0
1  0  0   1  |  1   0
1  0  1   0  |  1   0
1  1  0   0  |  0   0
1  1  0   1  |  0   1
1  1  1   0  |  1   0
   (other)   |  x   x

In the truth table, we marked the outputs as don't care (x's) whenever C1 C0 = 11. You might recall that
we ran into problems with our ice cream dispenser control in Notes Set 2.2. However, in that case we could
not safely assume that a user did not push multiple buttons. Here, our bit slice logic only accepts inputs
from other copies of itself (or a fixed value for bit 0), and, assuming that we design the logic correctly, our
bit slice never generates the 11 combination. In other words, that input combination is impossible (rather
than undesirable or unlikely), so the result produced on the outputs is irrelevant.
It is tempting to shorten the full truth table by replacing groups of rows. For example, if AB = 01, we
know that A < B, so the less significant bits (for which the result is represented by the C1 C0 inputs) don't
matter. We could write one row with input pattern ABC1C0 = 01xx and output pattern Z1Z0 = 01. We
might also collapse our don't care output patterns: whenever the input matches ABC1C0 = xx11, we don't
care about the output, so Z1Z0 = xx. But these two rows overlap in the input space! In other words, some
input patterns, such as ABC1C0 = 0111, match both of our suggested new rows. Which output should take
precedence? The answer is that a reader should not have to guess. Do not use overlapping rows to shorten
a truth table. In fact, the first of the suggested new rows is not valid: we don't need to produce output 01 if
we see C1C0 = 11. Two valid short forms of this truth table appear after the full table. If you have
an "other" entry, as shown in the last table, this entry should always appear as the last row. Normal
rows, including rows representing multiple input patterns, are not required to be in any particular order.
Use whatever order makes the table easiest to read for its purpose (usually by treating the input pattern as
a binary number and ordering rows in increasing numeric order).
In order to translate our design into algebra, we transcribe the truth table into a K-map for each output
variable. You may want to perform this exercise yourself and check that you obtain the same solution.
Implicants for each output are marked in the K-maps, giving the following equations:

Z1 = A B' + A C1 + B' C1
Z0 = A' B + A' C0 + B C0

An implementation based on our equations appears to the right. The figure makes it easy to see the symmetry
between the inputs, which arises from the representation that we've chosen. Since the design only uses two-level
logic (not counting the inverters on the A and B inputs, since inverters can be viewed as 1-input NAND or NOR
gates), converting to NAND/NOR simply requires replacing all of the AND and OR gates with NAND gates.

(K-maps for Z1 and Z0, each drawn with AB on one axis and C1 C0 on the other, are omitted here.)

(Figure: a comparator bit slice, first attempt, with inputs A, B, C1, and C0 and outputs Z1 and Z0.)

Let's discuss the design's efficiency roughly in terms of area and speed. As an estimate of area, we can
count gates, remembering that we need two transistors per input on a gate. Our initial design uses two
inverters, six 2-input gates, and two 3-input gates.
For speed, we make rough estimates in terms of the amount of time it takes for a CMOS gate to change its
output once its input has changed. This amount of time is called a gate delay. We can thus estimate our
design's speed by simply counting the maximum number of gates on any path from input to output. For this
measurement, using a NAND/NOR representation of the design is important to getting the right answer, but, as
we have discussed, the diagram above is equivalent on a gate-for-gate basis. Here we have three gate delays
from the A and B inputs to the outputs (through the inverters). But when we connect multiple copies of
our bit slice logic together to form a comparator, as shown below, the delay from the A and B
inputs to the outputs is not as important as the delay from the C1 and C0 inputs to the outputs. The latter
delay adds to the total delay of our comparator on a per-bit-slice basis. Looking again at the diagram, notice
that we have only two gate delays from the C1 and C0 inputs to the outputs. The total delay for an N-bit
comparator based on this implementation is thus three gate delays for bit 0 and two more gate delays per
additional bit, for a total of 2N + 1 gate delays.
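If you want to experiment with the bit-slice composition, the following sketch (plain Python, our own illustration, not part of the original notes) implements the Z1 and Z0 equations for one slice and chains copies from the least significant bit upward, then checks the result exhaustively for 4-bit operands.

# Sketch: the comparator bit slice equations and an N-bit unsigned comparator
# built by chaining slices from least to most significant bit.

def comparator_slice(a, b, c1, c0):
    """One bit slice: returns (z1, z0) using the representation
       00 = equal, 01 = A < B, 10 = A > B."""
    z1 = (a & (1 - b)) | (a & c1) | ((1 - b) & c1)
    z0 = ((1 - a) & b) | ((1 - a) & c0) | (b & c0)
    return z1, z0

def compare_unsigned(a_bits, b_bits):
    """a_bits and b_bits are lists of bits, index 0 = least significant."""
    c1, c0 = 0, 0                      # bit 0 sees "equal so far"
    for a, b in zip(a_bits, b_bits):   # least significant slice first
        c1, c0 = comparator_slice(a, b, c1, c0)
    return c1, c0                      # 00 equal, 01 A < B, 10 A > B

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

for a in range(16):
    for b in range(16):
        z1, z0 = compare_unsigned(to_bits(a, 4), to_bits(b, 4))
        assert (z1, z0) == (int(a > b), int(a < b))
print("bit-sliced comparator agrees with Python's <, ==, >")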


(Figure: an N-bit unsigned comparator composed of bit slices. Bit slice 0 receives A0, B0, and C1 C0 = 00; the Z1 Z0 outputs of each slice drive the C1 C0 inputs of the next more significant slice, and slice N-1 produces the final answer.)

Optimizing Our Design


We have a fairly good design at this point (good enough for a homework or exam problem in this class,
certainly), but let's consider how we might further optimize it. Today, optimization of logic at this level
is done mostly by computer-aided design (CAD) tools, but we want you to be aware of the sources of
optimization potential and the tradeoffs involved. And, if the topic interests you, someone has to continue
to improve CAD software!
The first step is to manipulate our algebra to expose common terms that occur due to the design's symmetry.
Starting with our original equation for Z1, we have

Z1 = A B' + A C1 + B' C1
   = A B' + (A + B') C1
   = A B' + (A' B)' C1

Similarly,

Z0 = A' B + (A B')' C0

Notice that the second term in each equation now includes the complement of the first term from the other
equation. For example, the Z1 equation includes the complement of the A'B product that we need to
compute Z0. We may be able to improve our design by combining these computations.
An implementation based on our
new algebraic formulation appears
to the right. In this form, we
seem to have kept the same number of gates, although we have replaced the 3-input gates with inverters. However, the middle inverters disappear when we convert
to NAND/NOR form, as shown below to the right. Our new design requires only two inverters and
six 2-input gates, a substantial reduction relative to the original implementation.
Is there a disadvantage? Yes, but
only a slight one. Notice that the
path from the A and B inputs to
the outputs is now four gates (maximum) instead of three. Yet the path
from C1 and C0 to the outputs is
still only two gates. Thus, overall,
we have merely increased our N-bit comparator's delay from 2N + 1 gate delays to 2N + 2 gate delays.

(Figures: the optimized comparator bit slice, and the same design converted to NAND/NOR form.)


Extending to 2's Complement
What about comparing 2's complement numbers? Can we make use of the unsigned comparator that we
just designed?
Let's start by thinking about the sign of the numbers A and B. Recall that 2's complement records a
number's sign in the most significant bit. For example, in the 8-bit numbers shown in the first diagram in
this set of notes, the sign bits are A7 and B7. Let's denote these sign bits in the general case by As and Bs.
Negative numbers have a sign bit equal to 1, and non-negative numbers have a sign bit equal to 0. The table
below outlines an initial evaluation of the four possible combinations of sign bits.

As  Bs  interpretation         solution
0   0   A ≥ 0 AND B ≥ 0        use unsigned comparator on remaining bits
0   1   A ≥ 0 AND B < 0        A > B
1   0   A < 0 AND B ≥ 0        A < B
1   1   A < 0 AND B < 0        unknown

What should we do when both numbers are negative? Need we design a completely separate logic circuit?
Can we somehow convert a negative value to a positive one?
The answer is in fact much simpler. Recall that 2's complement is defined based on modular arithmetic.
Given an N-bit negative number A, the representation for the bits A[N-2:0] is the same as the binary
(unsigned) representation of A + 2^(N-1). An example appears to the right for N = 4: A = 1100 (-4) and
B = 1110 (-2); the remaining bits represent 4 = -4 + 8 and 6 = -2 + 8.
Let's define Ar = A + 2^(N-1) as the value of the remaining bits for A and Br similarly for B. What happens
if we just go ahead and compare Ar and Br using an (N-1)-bit unsigned comparator? If we find that
Ar < Br, we know that Ar - 2^(N-1) < Br - 2^(N-1) as well, but that means A < B! We can do the same with
either of the other possible results. In other words, simply comparing Ar with Br gives the correct answer
for two negative numbers as well.
All we need to design is a logic block for the sign bits. At this point, we might write out a K-map, but
instead let's rewrite our high-level table with the new information, as shown below.

As  Bs  solution
0   0   pass result from less significant bits
0   1   A > B
1   0   A < B
1   1   pass result from less significant bits

Looking at the table, notice the similarity to the high-level design for a single bit of an unsigned value. The
only difference is that the two A ≠ B cases are reversed. If we swap As and Bs, the function is identical.
We can simply use another bit slice but swap these two inputs. Implementation of an N-bit 2's complement
comparator based on our bit slice comparator is shown below. The blue circle highlights the only change
from the N-bit unsigned comparator, which is to swap the two inputs on the sign bit.
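The swap is easy to try in code. This sketch (again plain Python, and it reuses the comparator_slice function from the sketch above) applies the unsigned bit slice to all bits except the sign bit, then swaps the A and B inputs for the sign bit.

# Sketch: a 2's complement comparator that reuses the unsigned bit slice,
# swapping the A and B inputs on the sign bit only (bit N-1).

def compare_twos_complement(a_bits, b_bits):
    """Bit lists, index 0 = least significant; the last bit is the sign bit."""
    c1, c0 = 0, 0
    for a, b in zip(a_bits[:-1], b_bits[:-1]):     # all but the sign bit
        c1, c0 = comparator_slice(a, b, c1, c0)
    a_s, b_s = a_bits[-1], b_bits[-1]
    return comparator_slice(b_s, a_s, c1, c0)      # note the swapped inputs

def to_signed_bits(x, n):
    return [((x + (1 << n)) >> i) & 1 for i in range(n)]

for a in range(-8, 8):
    for b in range(-8, 8):
        z1, z0 = compare_twos_complement(to_signed_bits(a, 4), to_signed_bits(b, 4))
        assert (z1, z0) == (int(a > b), int(a < b))
print("sign-bit swap handles 2's complement correctly")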
(Figure: an N-bit 2's complement comparator composed of bit slices. It is identical to the N-bit unsigned comparator except that the A and B inputs of bit slice N-1, which handles the sign bits, are swapped.)


Further Optimization
Let's return to the topic of optimization. To what extent did the representation of the three outcomes
affect our ability to develop a good bit slice design? Although selecting a good representation can be quite
important, for this particular problem most representations lead to similar implementations.

C1  C0  original    alternate
0   0   A = B       A = B
0   1   A < B       A > B
1   0   A > B       not used
1   1   not used    A < B

Some representations, however, have interesting properties. Consider the alternate representation in the
table above, for example (a copy of the original representation is included for comparison). Notice that in
the alternate representation, C0 = 1 whenever A ≠ B. Once we have found the numbers to be different in
some bit, the end result can never be equality, so perhaps with the right representation (the new one, for
example) we might be able to cut delay in half?
An implementation based on the alternate representation appears in the diagram to the right (a comparator
bit slice for the alternate representation). As you can see, in terms of gate count, this design replaces one
2-input gate with an inverter and a second 2-input gate with a 3-input gate. The path lengths are the same,
requiring 2N + 2 gate delays for an N-bit comparator. Overall, it is about the same as our original design.

Why didn't it work? Should we consider still other representations? In fact, none of the possible
representations that we might choose for a bit slice can cut the delay down to one gate delay per bit. The problem
is fundamental, and is related to the nature of CMOS. For a single bit slice, we define the incoming and
outgoing representations to be the same. We also need to have at least one gate in the path to combine
the C1 and C0 inputs with information from the bit slice's A and B inputs. But all CMOS gates invert the
sense of their inputs. Our choices are limited to NAND and NOR. Thus we need at least two gates in the
path to maintain the same representation.
One simple answer is to use different representations for odd and even bits. Instead, we optimize a logic
circuit for comparing two bits. We base our design on the alternate representation. The implementation is
shown below. The left shows an implementation based on the algebra, and the right shows a NAND/NOR
implementation. Estimating by gate count and number of inputs, the two-bit design doesn't save much over
two single bit slices in terms of area. In terms of delay, however, we have only two gate delays from C1
and C0 to either output. The longest path from the A and B inputs to the outputs is five gate delays. Thus,
for an N-bit comparator built with this design, the total delay is only N + 3 gate delays. But N has to be
even.
(Figures: a comparator 2-bit slice using the alternate representation, and the same design in NAND/NOR form, with inputs A1, B1, A0, B0, C1, C0 and outputs Z1, Z0.)

As you can imagine, continuing to scale up the size of our logic block gives us better performance at
the expense of a more complex design. Using the alternate representation may help you to see how one
can generalize the approach to larger groups of bits; for example, you may have noticed the two bitwise
comparator blocks on the left of the implementations above.


ECE120: Introduction to Computer Engineering


Notes Set 2.5
Example: Using Abstraction to Simplify Problems
In this set of notes, we illustrate the use of abstraction to simplify problems. In particular, we show how two
specific examples (integer subtraction and identification of upper-case letters in ASCII) can be implemented
using logic functions that we have already developed. We also introduce a conceptual technique for breaking
functions into smaller pieces, which allows us to solve several simpler problems and then to compose a full
solution from these partial solutions.
Together with the idea of bit-sliced designs that we introduced earlier, these techniques help to simplify
the process of designing logic that operates correctly. The techniques can, of course, lead to less efficient
designs, but correctness is always more important than performance. The potential loss of efficiency is often
acceptable for three reasons. First, as we mentioned earlier, computer-aided design tools for optimizing logic
functions are fairly effective, and in many cases produce better results than human engineers (except in the
rare cases in which the human effort required to beat the tools is worthwhile). Second, as you know from the
design of the 2s complement representation, we may be able to reuse specific pieces of hardware if we think
carefully about how we define our problems and representations. Finally, many tasks today are executed in
software, which is designed to leverage the fairly general logic available via an instruction set architecture.
A programmer cannot easily add new logic to a users processor. As a result, the hardware used to execute
a function typically is not optimized for that function. The approaches shown in this set of notes illustrate
how abstraction can be used to design logic.

Subtraction
Our discussion of arithmetic implementation has focused so far on addition. What about other operations,
such as subtraction, multiplication, and division? The latter two require more work, and we will not discuss
them in detail until later in our class (if at all).
Subtraction, however, can be performed almost trivially using logic that we have already designed. Let's
say that we want to calculate the difference D between two N-bit numbers A and B. In particular, we
want to find D = A - B. For now, think of A, B, and D as 2's complement values. Recall how we defined
the 2's complement representation: the N-bit pattern that we use to represent -B is the same as the base 2
bit pattern for (2^N - B), so we can use an adder if we first calculate the bit pattern for -B, then add the
resulting pattern to A. As you know, our N-bit adder always produces a result that is correct modulo 2^N,
so the result of such an operation, D = 2^N + A - B, is correct so long as the subtraction does not overflow.
How can we calculate 2^N - B? The same way that we do by hand! Calculate
the 1's complement, (2^N - 1) - B, then add 1. The diagram to the right
shows how we can use the N-bit adder that we designed in Notes Set 2.3 to
build an N-bit subtracter. New elements appear in blue in the figure; the
rest of the logic is just an adder. The box labeled "1's comp." calculates
the 1's complement of the value B, which together with the carry in value of 1
corresponds to calculating -B. What's in the "1's comp." box? One inverter
per bit in B. That's all we need to calculate the 1's complement. You might
now ask: does this approach also work for unsigned numbers? The answer is
yes, absolutely. However, the overflow conditions for both 2's complement and
unsigned subtraction are different than the overflow condition for either type
of addition. What does the carry out of our adder signify, for example? The
answer may not be immediately obvious.

(Figure: an N-bit subtracter. The N-bit value B passes through a 1's complement box, one inverter per bit, into an N-bit adder along with A; the carry in is set to 1, and the sum output S gives D = A - B. What does the carry out mean?)

Let's start with the overflow condition for unsigned subtraction. Overflow means that we cannot represent
the result. With an N-bit unsigned number, we have A - B ∉ [0, 2^N - 1]. Obviously, the difference cannot
be larger than the upper limit, since A is representable and we are subtracting a non-negative (unsigned)
value. We can thus assume that overflow occurs only when A - B < 0. In other words, when A < B.


To calculate the unsigned subtraction overflow condition in terms of the bits, recall that our adder is
calculating 2^N + A - B. The carry out represents the 2^N term. When A ≥ B, the result of the adder is at
least 2^N, and we see a carry out, Cout = 1. However, when A < B, the result of the adder is less than 2^N,
and we see no carry out, Cout = 0. Overflow for unsigned subtraction is thus inverted from overflow for
unsigned addition: a carry out of 0 indicates an overflow for subtraction.
What about overflow for 2s complement subtraction? We can use arguments similar to those that we used
to reason about overflow of 2s complement addition to prove that subtraction of one negative number from
a second negative number can never overflow. Nor can subtraction of a non-negative number from a second
non-negative number overflow.
If A ≥ 0 and B < 0, the subtraction overflows iff A - B ≥ 2^(N-1). Again using similar arguments as
before, we can prove that the difference D appears to be negative in the case of overflow, so the product
AN-1' BN-1 DN-1 evaluates to 1 when this type of overflow occurs (these variables represent the most
significant bits of the two operands and the difference; in the case of 2's complement, they are also the sign
bits). Similarly, if A < 0 and B ≥ 0, we have overflow when A - B < -2^(N-1). Here we can prove that D ≥ 0
on overflow, so AN-1 BN-1' DN-1' evaluates to 1.
Our overflow condition for N-bit 2's complement subtraction is thus given by the following:

AN-1' BN-1 DN-1 + AN-1 BN-1' DN-1'
If we calculate all four overflow conditions (unsigned and 2's complement, addition and subtraction) and
provide some way to choose whether or not to complement B and to control the Cin input, we can use the
same hardware for addition and subtraction of either type.
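The sketch below (plain Python; the 8-bit width and the names are illustrative choices of ours) models the subtracter as A plus the 1's complement of B plus a carry in of 1, and evaluates the overflow conditions just described.

# Sketch: subtraction with an 8-bit adder (1's complement of B plus a carry in
# of 1), and the overflow checks discussed above.

N = 8
MASK = (1 << N) - 1

def subtract(a, b):
    """Return (cout, d) where d = A - B mod 2^N, computed as A + ~B + 1."""
    total = (a & MASK) + ((~b) & MASK) + 1
    return total >> N, total & MASK

def unsigned_sub_overflow(cout):
    return cout == 0                    # inverted sense relative to addition

def signed_sub_overflow(a, b, d):
    a_s, b_s, d_s = a >> (N - 1), b >> (N - 1), d >> (N - 1)
    # A >= 0, B < 0, result looks negative; or A < 0, B >= 0, result looks non-negative
    return ((1 - a_s) & b_s & d_s) | (a_s & (1 - b_s) & (1 - d_s))

cout, d = subtract(0x30, 0x50)          # 48 - 80
print(hex(d), "unsigned overflow:", unsigned_sub_overflow(cout),
      "signed overflow:", bool(signed_sub_overflow(0x30, 0x50, d)))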

Checking ASCII for Uppercase Letters


Let's now consider how we can check whether or not an ASCII character is an upper-case letter. Let's call
the 7-bit letter C = C6 C5 C4 C3 C2 C1 C0 and the function that we want to calculate L(C). The function L
should equal 1 whenever C represents an upper-case letter, and 0 whenever C does not.
In ASCII, the 7-bit patterns from 0x41 through 0x5A correspond to the letters A through Z in order. Perhaps
you want to draw a 7-input K-map? Get a few large sheets of paper! Instead, imagine that we've written the
full 128-row truth table. Let's break the truth table into pieces. Each piece will correspond to one specific
pattern of the three high bits C6 C5 C4, and each piece will have 16 entries for the four low bits C3 C2 C1 C0.
The truth tables for high bits 000, 001, 010, 011, 110, and 111 are easy: the function is exactly 0. The other
two truth tables appear below. We've called the two functions T4 and T5, where the subscripts
correspond to the binary value of the three high bits of C.
C3  C2  C1  C0 | T4  T5
0   0   0   0  |  0   1
0   0   0   1  |  1   1
0   0   1   0  |  1   1
0   0   1   1  |  1   1
0   1   0   0  |  1   1
0   1   0   1  |  1   1
0   1   1   0  |  1   1
0   1   1   1  |  1   1
1   0   0   0  |  1   1
1   0   0   1  |  1   1
1   0   1   0  |  1   1
1   0   1   1  |  1   0
1   1   0   0  |  1   0
1   1   0   1  |  1   0
1   1   1   0  |  1   0
1   1   1   1  |  1   0

T4 = C3 + C2 + C1 + C0
T5 = C3' + C2' C1' + C2' C0'


We can then draw simpler K-maps for T4 and T5 (over C3 C2 and C1 C0), and can solve the K-maps to find
the equations for each, as shown above (check that you get the same answers).
How do we merge these results to form our final expression for L? We AND each of the term functions (T4
and T5 ) with the appropriate minterm for the high bits of C, then OR the results together, as shown here:

L = C6 C5' C4' T4 + C6 C5' C4 T5
  = C6 C5' C4' (C3 + C2 + C1 + C0) + C6 C5' C4 (C3' + C2' C1' + C2' C0')

Rather than trying to optimize by hand, we can at this point let the CAD tools take over, confident that we
have the right function to identify an upper-case ASCII letter.
Breaking the truth table into pieces and using simple logic to reconnect the pieces is one way to make use of
abstraction when solving complex logic problems. In fact, recruiters for some companies often ask questions
that involve using specific logic elements as building blocks to implement other functions. Knowing that you
can implement a truth table one piece at a time will help you to solve this type of problem.
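As a quick sanity check on these equations, the sketch below (plain Python; the helper names are our own) evaluates L for every 7-bit code and compares the result against the range 0x41 through 0x5A.

# Sketch: evaluate L(C) = C6 C5' C4' T4 + C6 C5' C4 T5 for all 128 ASCII codes.

def bit(c, i):
    return (c >> i) & 1

def upper_case(c):
    c6, c5, c4 = bit(c, 6), bit(c, 5), bit(c, 4)
    c3, c2, c1, c0 = bit(c, 3), bit(c, 2), bit(c, 1), bit(c, 0)
    t4 = c3 | c2 | c1 | c0                                        # piece for high bits 100
    t5 = (1 - c3) | ((1 - c2) & (1 - c1)) | ((1 - c2) & (1 - c0)) # piece for high bits 101
    return (c6 & (1 - c5) & (1 - c4) & t4) | (c6 & (1 - c5) & c4 & t5)

for c in range(128):
    assert upper_case(c) == (1 if 0x41 <= c <= 0x5A else 0)
print("L matches the range 0x41..0x5A for all 128 codes")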
Let's think about other ways to tackle the problem of calculating L. In Notes Sets 2.3 and 2.4, we developed
adders and comparators. Can we make use of these as building blocks to check whether C represents an
upper-case letter? Yes, of course we can: by comparing C with the ends of the range of upper-case letters,
we can check whether or not C falls in that range.
The idea is illustrated below using two 7-bit comparators constructed as discussed in Notes Set 2.4.
The comparators are the black parts of the drawing, while the blue parts represent our extensions to
calculate L. Each comparator is given the value C as one input. The second value to the comparators is either
the letter A (0x41) or the letter Z (0x5A). The meaning of the 2-bit input and result to each comparator is
given in the table below. The inputs on the right of each comparator are set to 0 to ensure that
equality is produced if C matches the second input (B). One output from each comparator is then routed
to a NOR gate to calculate L. Let's consider how this combination works. The left comparator compares C
with the letter A (0x41). If C ≥ 0x41, the comparator produces Z0 = 0. In this case, we may have a letter.
On the other hand, if C < 0x41, the comparator produces Z0 = 1, and the NOR gate outputs L = 0, since we
do not have a letter in this case. The right comparator compares C with the letter Z (0x5A). If C ≤ 0x5A,
the comparator produces Z1 = 0. In this case, we may have a letter. On the other hand, if C > 0x5A, the
comparator produces Z1 = 1, and the NOR gate outputs L = 0, since we do not have a letter in this case.
Only when 0x41 ≤ C ≤ 0x5A does L = 1, as desired.

(Figure: two 7-bit comparators. Each receives C on its A inputs; one receives 0x41 and the other 0x5A on its B inputs, with the comparison inputs tied to 0 so that equality is reported when C matches. The Z0 output of the 0x41 comparator and the Z1 output of the 0x5A comparator feed a NOR gate that produces L; the other outputs are discarded.)

Z1  Z0  meaning
0   0   A = B
0   1   A < B
1   0   A > B
1   1   not used

What if we have only 8-bit adders available for our use, such as those developed in Notes Set 2.3? Can we
still calculate L? Yes. The diagram shown to the right illustrates the approach, again with black for the
adders and blue for our extensions. Here we are actually using the adders as subtracters, but calculating
the 1's complements of the constant values by hand. The "zero extend" box simply adds a leading 0 to our
7-bit ASCII letter. The left adder subtracts the letter A from C: if no carry is produced, we know that
C < 0x41 and thus C does not represent an upper-case letter, and L = 0. Similarly, the right adder
subtracts 0x5B (the letter Z plus one) from C. If a carry is produced, we know that C ≥ 0x5B, and thus C does
not represent an upper-case letter, and L = 0. With the right combination of carries (1 from the left and
0 from the right), we obtain L = 1.



(Figure: the 7-bit character C is zero extended to 8 bits and fed to two 8-bit adders. One adder adds the constant 0xBE, the 1's complement of 0x41, and the other adds 0xA4, the 1's complement of 0x5B, each with a carry in of 1; the sums are discarded, and the two carry outputs are combined to produce L.)

Looking carefully at this solution, however, you might be struck by the fact that we are calculating two sums
and then discarding them. Surely such an approach is inefficient?
We offer two answers. First, given the design shown above, a good CAD tool recognizes that the sum outputs
of the adders are not being used, and does not generate logic to calculate them. The logic for the two carry
bits used to calculate L can then be optimized. Second, the design shown, including the calculation of the
sums, is similar in efficiency to what happens at the rate of about 10^15 times per second, 24 hours a day, seven
days a week, inside processors in data centers processing HTML, XML, and other types of human-readable
Internet traffic. Abstraction is a powerful tool.
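For completeness, here is the adder-based version of the check as a sketch (plain Python; as in the figure, we assume both adders receive a carry in of 1 so that adding the constants 0xBE and 0xA4 amounts to subtracting 0x41 and 0x5B).

# Sketch: the range check done with two 8-bit additions, keeping only the carries.
# 0xBE is the 1's complement of 0x41, and 0xA4 is the 1's complement of 0x5B.

def add8(a, b, cin):
    total = (a & 0xFF) + (b & 0xFF) + cin
    return total >> 8, total & 0xFF        # (carry out, discarded sum)

def upper_case_by_adders(c):
    carry_low, _ = add8(c, 0xBE, 1)        # carry = 1 iff C >= 0x41
    carry_high, _ = add8(c, 0xA4, 1)       # carry = 1 iff C >= 0x5B
    return carry_low & (1 - carry_high)

assert all(upper_case_by_adders(c) == (1 if 0x41 <= c <= 0x5A else 0)
           for c in range(128))
print("carry-based range check matches")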
Later in our class, you will learn how to control logical connections between hardware blocks so that you
can make use of the same hardware for adding, subtracting, checking for upper-case letters, and so forth.


ECE120: Introduction to Computer Engineering


Notes Set 2.6
Sequential Logic
These notes introduce logic components for storing bits, building up from the idea of a pair of cross-coupled
inverters through an implementation of a flip-flop, the storage abstractions used in most modern logic design
processes. We then introduce simple forms of timing diagrams, which we use to illustrate the behavior of a
logic circuit. After commenting on the benefits of using a clocked synchronous abstraction when designing
systems that store and manipulate bits, we illustrate timing issues and explain how these are abstracted
away in clocked synchronous designs. Sections marked with an asterisk are provided solely for your interest,
but you probably need to learn this material in later classes.

Storing One Bit


So far we have discussed only implementation of Boolean functions: given some bits as input, how can we
design a logic circuit to calculate the result of applying some function to those bits? The answer to such
questions is called combinational logic (sometimes combinatorial logic), a name stemming from the
fact that we are combining existing bits with logic, not storing new bits.
You probably already know, however, that combinational logic alone is not sufficient to build a computer.
We need the ability to store bits, and to change those bits. Logic designs that make use of stored bitsbits
that can be changed, not just wires to high voltage and groundare called sequential logic. The name
comes from the idea that such a system moves through a sequence of stored bit patterns (the current stored
bit pattern is called the state of the system).
Consider the diagram to the right. What is it? A 1-input NAND gate, or
an inverter drawn badly? If you think carefully about how these two gates
are built, you will realize that they are the same thing. Conceptually, we
use two inverters to store a bit, but in most cases we make use of NAND
gates to simplify the mechanism for changing the stored bit.
Take a look at the design to the right. Here we have taken two inverters (drawn as NAND gates) and coupled
each gate's output to the other's input. What does the circuit do? Let's make some guesses and see where
they take us. Imagine that the value at Q is 0. In that case, the lower gate drives P to 1. But P drives the
upper gate, which forces Q to 0. In other words, this combination forms a stable state of the system: once
the gates reach this state, they continue to hold these values. The first row of the truth table below
(outputs only) shows this state.

Q  P
0  1
1  0

What if Q = 1, though? In this case, the lower gate forces P to 0, and the upper gate in turn forces Q to 1.
Another stable state! The Q = 1 state appears as the second row of the truth table.
We have identified all of the stable states.1 Notice that our cross-coupled inverters can store a bit.
Unfortunately, we have no way to specify which value should be stored, nor to change the bit's value once the gates
have settled into a stable state. What can we do?

1. Most logic families also allow unstable states in which the values alternate rapidly between 0 and 1. These metastable states are beyond the scope of our class, but ensuring that they do not occur in practice is important for real designs.

Let's add an input to the upper gate, as shown to the right. We call the input S'. The S stands for "set":
as you will see, our new input allows us to set our stored bit Q to 1. The use of a complemented name for
the input indicates that the input is active low. In other words, the input performs its intended task
(setting Q to 1) when its value is 0 (not 1).

S'  Q  P
1   0  1
1   1  0
0   1  0

Think about what happens when the new input is not active, S' = 1. As you know, ANDing any value with 1
produces the same value, so our new input has no effect when S' = 1. The first two rows of the truth table
are simply a copy of our previous table: the circuit can store either bit value when S' = 1. What happens
when S' = 0? In that case, the upper gate's output is forced to 1, and thus the lower gate's output is forced
to 0. This third possibility is reflected in the last row of the truth table.
Now we have the ability to force bit Q to have value 1, but if we want Q = 0, we just have to hope that the
circuit happens to settle into that state when we turn on the power. What can we do?
As you probably guessed, we add an input to the other gate, as shown to the right. We call the new input R':
the input's purpose is to reset bit Q to 0, and the input is active low. We extend the truth table to include
a row with R' = 0 and S' = 1, which forces Q = 0 and P = 1.

(Figure: an R'-S' latch, which stores a single bit; the complement markings indicate that the S' and R' inputs are active low.)

R'  S'  Q  P
1   1   0  1
1   1   1  0
1   0   1  0
0   1   0  1
0   0   1  1

The circuit that we have drawn has a name: an R'-S' latch. One can also build R-S latches (with active
high set and reset inputs). The textbook also shows an R'-S' latch (labeled incorrectly). Can you figure out
how to build an R-S latch yourself?
Let's think a little more about the R'-S' latch. What happens if we set S' = 0 and R' = 0 at the same time?
Nothing bad happens immediately. Looking at the design, both gates produce 1, so Q = 1 and P = 1. The
bad part happens later: if we raise both S' and R' back to 1 at around the same time, the stored bit may end
up in either state.2
from ever being 1
We can avoid the problem by adding gates to prevent the two control inputs (S and R)
at the same time. A single inverter might technically suffice, but lets build up the structure shown below,
have no practical effect at the moment. A
noting that the two inverters in sequence connecting D to R
is forced to 0, and the bit is reset.
truth table is shown to the right of the logic diagram. When D = 0, R

Similarly, when D = 1, S is forced to 0, and the bit is set.


D

S
Q

D
0
1

R
0
1

S
1
0

Q
0
1

P
1
0

P
R

Unfortunately, except for some interesting timing characteristics, the new design has the same functionality
as a piece of wire. And, if you ask a circuit designer, thin wires also have some interesting timing
characteristics. What can we do? Rather than having Q always reflect the current value of D, let's add some extra
inputs to the new NAND gates that allow us to control when the value of D is copied to Q, as shown below.
2. Or, worse, in a metastable state, as mentioned earlier.


(Figure: a gated D latch, which stores a single bit. The D input and the write enable input WE feed two NAND gates that drive the S' and R' inputs of the latch from the previous design.)

The WE (write enable) input controls whether or not Q mirrors the value of D. The first two rows in the
truth table below are replicated from our wire design: a value of WE = 1 has no effect on the first two NAND
gates, and Q = D. A value of WE = 0 forces the first two NAND gates to output 1, thus R' = 1, S' = 1, and
the bit Q can occupy either of the two possible states, regardless of the value of D, as reflected in the lower
four lines of the truth table.

WE  D | R'  S'  Q  P
1   0 | 0   1   0  1
1   1 | 1   0   1  0
0   0 | 1   1   0  1
0   1 | 1   1   0  1
0   0 | 1   1   1  0
0   1 | 1   1   1  0

The circuit just shown is called a gated D latch, and is an important mechanism for storing state in
sequential logic. (Random-access memory uses a slightly different technique to connect the cross-coupled
inverters, but latches are used for nearly every other application of stored state.) The D stands for "data",
meaning that the bit stored matches the value of the input. Other types of latches (including S-R latches)
have been used historically, but D latches are used predominantly today, so we omit discussion of other
types. The "gated" qualifier refers to the presence of an enable input (we called it WE) to control when the
latch copies its input into the stored bit. A symbol for a gated D latch appears to the right, with inputs D
and WE and outputs Q and Q'. Note that we have dropped the name P in favor of Q', since P = Q' in a gated
D latch (the Q' output is often omitted entirely in drawings).
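A small behavioral model can help connect the truth table to the idea of stored state. The sketch below (plain Python, our own modeling choice rather than a hardware description) updates the stored bit only while WE = 1.

# Sketch: a behavioral model of a gated D latch. While WE = 1 the latch is
# transparent (Q follows D); while WE = 0 it holds the stored bit.

class GatedDLatch:
    def __init__(self, q=0):
        self.q = q

    def step(self, we, d):
        if we:
            self.q = d          # transparent: copy D into the stored bit
        return self.q           # WE = 0: hold the previous value

latch = GatedDLatch()
for we, d in [(1, 1), (0, 0), (0, 1), (1, 0), (0, 1)]:
    print("WE =", we, "D =", d, "->", "Q =", latch.step(we, d))
# Output: Q = 1, 1, 1, 0, 0 (Q changes only while WE = 1)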

The Clock Abstraction


High-speed logic designs often use latches directly. Engineers specify the number of latches as well as
combinational logic functions needed to connect one latch to the next, and the CAD tools optimize the
combinational logic. The enable inputs of successive groups of latches are then driven by what we call a
clock signal, a single bit line distributed across most of the chip that alternates between 0 and 1 with a
regular period. While the clock is 0, one set of latches holds its bit values fixed, and combinational logic uses
those latches as inputs to produce bits that are copied into a second set of latches. When the clock switches
to 1, the second set of latches stops storing their data inputs and retains their bit values in order to drive
other combinational logic, the results of which are copied into a third set of latches. Of course, some of the
latches in the first and third sets may be the same.
The timing of signals in such designs plays a critical role in their correct operation. Fortunately, we have
developed powerful abstractions that allow engineers to ignore much of the complexity while thinking about
the Boolean logic needed for a given design.
Towards that end, we make a simplifying assumption for the rest of our class, and for most of your career
as an undergraduate: the clock signal is a square wave delivered uniformly across a chip. For example, if
the period of a clock is 0.5 nanoseconds (2 GHz), the clock signal is a 1 for 0.25 nanoseconds, then a 0 for
0.25 nanoseconds. We assume that the clock signal changes instantaneously and at the same time across
the chip. Such a signal can never exist in the real world: voltages do not change instantaneously, and the
phrase at the same time may not even make sense at these scales. However, circuit designers can usually
provide a clock signal that is close enough, allowing us to forget for now that no physical signal can meet
our abstract definition.



(Figure: a positive edge-triggered D flip-flop, master-slave implementation, built from two gated D latches; the midpoint between the two latches is labeled X, and the CLOCK signal drives the two write enable inputs with opposite senses. A D flip-flop symbol, with inputs D and CLOCK and outputs Q and Q', is also shown.)

The device shown to the right is a master-slave implementation of a positive edge-triggered D flip-flop.
As you can see, we have constructed it from two gated D latches with opposite senses of write enable. The D
part of the name has the same meaning as with a gated D latch: the bit stored is the same as the one delivered
to the input. Other variants of flip-flops have also been built, but this type dominates designs today. Most
are actually generated automatically from hardware design languages (that is, computer programming
languages for hardware design).
languages for hardware design).
When the clock is low (0), the first latch copies its value from the flip-flops D input to the midpoint
(marked X in our figure, but not usually given a name). When the clock is high (1), the second latch copies
its value from X to the flip-flops output Q. Since X can not change when the clock is high, the result is that
the output changes each time the clock changes from 0 to 1, which is called the rising edge or positive
edge (the derivative) of the clock signal. Hence the qualifier positive edge-triggered, which describes the
flip-flops behavior. The master-slave implementation refers to the use of two latches. In practice, flip-flops
are almost never built this way. To see a commercial design, look up 74LS74, which uses six 3-input NAND
gates and allows set/reset of the flip-flop (using two extra inputs).
(Timing diagram: D is copied to X while CLK is low, and X is copied to Q while CLK is high; with a master-slave implementation, D may change before the rising edge.)

The timing diagram to the right illustrates the operation of our flip-flop. In a timing diagram, the
horizontal axis represents (continuous) increasing time, and the individual lines represent voltages for logic
signals. The relatively simple version shown here uses only binary values for each signal. One can also draw
transitions more realistically (as taking finite time). The dashed vertical lines here represent the times at
which the clock rises. To make the example interesting, we have varied D over two clock cycles. Notice that
even though D rises and falls during the second clock cycle, its value is not copied to the output of our
flip-flop. One can build flip-flops that
catch this kind of behavior (and change to output 1), but we leave such designs for later in your career.
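The timing behavior can be mimicked by combining two of the latch models from the sketch above: the first latch is enabled while the clock is low and the second while the clock is high, so Q picks up a new value only at a rising edge. The input sequence below is an arbitrary illustration.

# Sketch: a master-slave positive edge-triggered D flip-flop modeled with two
# gated D latches of opposite enable sense (reuses GatedDLatch from above).

class DFlipFlop:
    def __init__(self):
        self.master = GatedDLatch()   # enabled while the clock is low
        self.slave = GatedDLatch()    # enabled while the clock is high

    def step(self, clk, d):
        x = self.master.step(1 - clk, d)   # midpoint X follows D while clk = 0
        return self.slave.step(clk, x)     # Q follows X while clk = 1

ff = DFlipFlop()
signal_d = [1, 1, 0, 0, 1, 0, 0, 0]        # D wiggles between rising edges
clock    = [0, 1, 0, 1, 0, 1, 0, 1]
for clk, d in zip(clock, signal_d):
    q = ff.step(clk, d)
print("final Q =", q)
# Q changes only when the clock goes from 0 to 1; changes in D while the
# clock is high are not captured by this model.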
Circuits such as latches and flip-flops are called sequential feedback circuits, and the process by which
they are designed is beyond the scope of our course. The feedback part of the name refers to the fact
that the outputs of some gates are fed back into the inputs of others. Each cycle in a sequential feedback
circuit can store one bit. Circuits that merely use latches and flip-flops as building blocks are called clocked
synchronous sequential circuits. Such designs are still sequential: their behavior depends on the bits
currently stored in the latches and flip-flops. However, their behavior is substantially simplified by the use
of a clock signal (the clocked part of the name) in a way that all elements change at the same time
(synchronously).
The value of using flip-flops and assuming a square-wave clock signal with uniform timing may not be clear
to you yet, but it bears emphasis. With such assumptions, we can treat time as having discrete values. In
other words, time ticks along discretely, like integers instead of real numbers. We can look at the state of
the system, calculate the inputs to our flip-flops through the combinational logic that drives their D inputs,
and be confident that, when time moves to the next discrete value, we will know the new bit values stored
in our flip-flops, allowing us to repeat the process for the next clock cycle without worrying about exactly
when things change. Values change only on the rising edge of the clock!
Real systems, of course, are not so simple, and we do not have one clock to drive the universe, so engineers
must also design systems that interact even though each has its own private clock signal (usually with different periods).


Static Hazards: Causes and Cures*


Before we forget about the fact that real designs do not provide perfect clocks, lets explore some of the
issues that engineers must sometimes face. We discuss these primarily to ensure that you appreciate the
power of the abstraction that we use in the rest of our course. In later classes (probably our 298, which will
absorb material from 385), you may be required to master this material. For now, we provide it simply for
your interest.
C.

Consider the circuit shown below, for which the output is given by the equation S = AB + B'C'.

(Figure: the two-level circuit for S = AB + B'C', with inputs A, B, and C, and a timing diagram showing that when B goes low and B' goes high, a glitch appears in S.)

The timing diagram on the right shows a glitch in the output when the input shifts from ABC = 110 to 100,
that is, when B falls. The problem lies in the possibility that the upper AND gate, driven by B, might go
low before the lower AND gate, driven by B', goes high. In such a case, the OR gate output S falls until the
second AND gate rises, and the output exhibits a glitch.
A circuit that might exhibit a glitch in an output that functionally remains stable at 1 is said to have a
static-1 hazard. The qualifier static here refers to the fact that we expect the output to remain static,
while the 1 refers to the expected value of the output.
The presence of hazards in circuits can be problematic in certain cases. In domino logic, for example, an
output is precharged and kept at 1 until the output of a driving circuit pulls it to 0, at which point it stays
low (like a domino that has been knocked over). If the driving circuit contains static-1 hazards, the output
may fall in response to a glitch.
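The glitch itself is easy to reproduce with a unit-delay simulation. The sketch below (plain Python, our own model, with every gate, including the inverters, taking one gate delay) steps time in gate delays and prints S as B falls.

# Sketch: unit-gate-delay simulation of S = AB + B'C' showing a static-1 hazard
# when ABC goes from 110 to 100 (B falls). Each gate output updates one time
# step after its inputs.

def step(state, a, b, c):
    """Compute the next value of every gate from the current state."""
    return {
        "not_b": 1 - b,
        "not_c": 1 - c,
        "and1": a & b,                             # upper AND gate
        "and2": state["not_b"] & state["not_c"],   # lower AND gate
        "s": state["and1"] | state["and2"],        # OR gate output
    }

# Settle the circuit with ABC = 110, then drop B and watch S.
state = {"not_b": 0, "not_c": 1, "and1": 1, "and2": 0, "s": 1}
a, b, c = 1, 1, 0
for t in range(5):
    if t == 1:
        b = 0                      # B falls at this time step
    state = step(state, a, b, c)
    print("T =", t, "S =", state["s"])
# S dips to 0 for one gate delay before returning to 1: a static-1 hazard.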
Similarly, hazards can lead to unreliable behavior in sequential feedback circuits. Consider the addition of
a feedback loop to the circuit just discussed, as shown in the figure below. The output of the circuit is now
given by the equation S = AB + B'C'S, where the S in the product term denotes the state after S feeds back
through the lower AND gate. In the case discussed previously, the transition from ABC = 110 to 100, the
glitch in S can break the feedback, leaving S low or unstable. The resulting sequential feedback circuit is
thus unreliable.

(Figure: the circuit with S fed back to the lower AND gate, and a timing diagram in which the glitch leaves S unknown or unstable.)

Eliminating static hazards from two-level circuits is fairly straightforward. The Karnaugh map below
corresponds to our original circuit; the solid lines indicate the implicants selected by the AND gates. A static-1
hazard is present when two adjacent 1s in the K-map are not covered by a common implicant. Static-0
hazards do not occur in two-level SOP circuits.

        AB
        00  01  11  10
C = 0:   1   0   1   1
C = 1:   0   0   1   0

Eliminating static hazards requires merely extending the circuit with consensus terms in order to ensure that
some AND gate remains high through every transition between input states with output 1.3 In the K-map
shown, the dashed line indicates the necessary consensus term, AC'.
3. Hazard elimination is not in general simple; we have considered only two-level circuits.


Dynamic Hazards*
Consider an input transition for which we expect to see a change in an output. Under certain timing
conditions, the output may not transition smoothly, but instead bounce between its original value and its
new value before coming to rest at the new value. A circuit that might exhibit such behavior is said to
contain a dynamic hazard. The qualifier dynamic refers to the expected change in the output.
Dynamic hazards appear only in more complex circuits, such as the one shown below. The output of this
circuit is defined by the equation Q = A'B' + A'C + B'C' + BD.

(Figure: a multi-level implementation of Q, with intermediate signals labeled f, g, h, i, and j.)

Consider the transition from the input state ABCD = 1111 to 1011, in which B falls from 1 to 0. For
simplicity, assume that each gate has a delay of 1 time unit. If B goes low at time T = 0, the table below shows
the progression over time of logic levels at several intermediate points in the circuit and at the output Q.
Each gate merely produces the appropriate output based on its inputs in the previous time step. After one
delay, the three gates with B as a direct input change their outputs (to stable, final values). After another
delay, at T = 2, the other three gates respond to the initial changes and flip their outputs. The resulting
changes induce another set of changes at T = 3, which in turn causes the output Q to change a final time
at T = 4.

T  f  g  h  i  j  Q
0  0  0  0  1  1  1
1  1  1  1  1  1  1
2  1  1  1  0  0  0
3  1  1  1  0  1  1
4  1  1  1  0  1  0
T = 3, which in turn causes the output Q to change a final time at T = 4.
The output column in the table illustrates the possible impact of a dynamic hazard: rather than a smooth
transition from 1 to 0, the output drops to 0, rises back to 1, and finally falls to 0 again. The dynamic hazard
in this case can be attributed to the presence of a static hazard in the logic that produces intermediate value j.


Essential Hazards*
Essential hazards are inherent to the function of a circuit and may appear in any implementation. In
sequential feedback circuit design, they must be addressed at a low level to ensure that variations in logic
path lengths (timing skew) through a circuit do not expose them. With clocked synchronous circuits,
essential hazards are abstracted into a single form: clock skew, or disparate clock edge arrival times at a
circuit's flip-flops.
An example demonstrates the possible effects: consider the construction of a clocked synchronous circuit to
recognize 0-1 sequences on an input IN . Output Q should be held high for one cycle after recognition, that
is, until the next rising clock edge. A description of states and a state diagram for such a circuit appear below.
S1 S0
00
01
10
11

state
A
B
C
unused

1/0

meaning
nothing, 1, or 11 seen last
0 seen last
01 recognized (output high)

0/0
0/0

1/0

0/1

1/1

For three states, we need two (= ⌈log2 3⌉) flip-flops. Denote the internal state S1 S0. The specific internal
state values for each logical state (A, B, and C) simplify the implementation and the example. A state table
and K-maps for the next-state logic appear below. The state table uses one line per state with separate
columns for each input combination, making the table more compact than one with one line per state/input
combination. Each column contains the full next-state information, including output. Using this form of the
state table, the K-maps can be read directly from the table.
                        IN = 0    IN = 1
    S1 S0 = 00           01/0      00/0
    S1 S0 = 01           01/0      10/0
    S1 S0 = 11             x         x
    S1 S0 = 10           01/1      00/1

K-maps for S1+, S0+, and the output Q (columns S1 S0 = 00, 01, 11, 10; top row IN = 0, bottom row IN = 1):

    S1+:  0 0 x 0        S0+:  1 1 x 1        Q:  0 0 x 1
          0 1 x 0              0 0 x 0            0 0 x 1

Examining the K-maps, we see that the excitation and output equations are S1+ = IN S0, S0+ = IN', and Q = S1. An implementation of the circuit using two D flip-flops appears below. Imagine that mistakes in routing or process variations have made the clock signal's path to flip-flop 1 much longer than its path into flip-flop 0, as illustrated.
[Figure: the implementation - flip-flop 0 (input D0, output S0) and flip-flop 1 (input D1, output S1), with input IN feeding the next-state logic and CLK reaching flip-flop 1 through a long, slow wire.]

Due to the long delays, we cannot assume that rising clock edges arrive at the flip-flops at the same time.
The result is called clock skew, and can make the circuit behave improperly by exposing essential hazards.
In the logical B to C transition, for example, we begin in state S1 S0 = 01 with IN = 1 and the clock edge
rising. Assume that the edge reaches flip-flop 0 at time T = 0. After a flip-flop delay (T = 1), S0 goes low.
After another AND gate delay (T = 2), input D1 goes low, but the second flip-flop has yet to change state!
Finally, at some later time, the clock edge reaches flip-flop 1. However, the output S1 remains at 0, leaving
the system in state A rather than state C.
Fortunately, in clocked synchronous sequential circuits, all essential hazards are related to clock skew. This
fact implies that we can eliminate a significant amount of complexity from circuit design by doing a good
job of distributing the clock signal. It also implies that, as a designer, you should avoid specious addition of
logic in a clock path, as you may regret such a decision later when you try to debug the circuit timing.


Proof Outline for Clocked Synchronous Design*


This section outlines a proof of the claim made regarding clock skew being the only source of essential
hazards for clocked synchronous sequential circuits. A proof outline suggests the form that a proof might
take and provides some of the logical arguments, but is not rigorous enough to be considered a proof. Here
we use a D flip-flop to illustrate a method for identifying essential hazards (the D flip-flop has no essential
hazards, however), then argue that the method can be applied generally to collections of flip-flops in a clocked
synchronous design to show that essential hazards occur only in the form of clock skew.

    state             meaning
    low (L)           clock low, last input low
    high (H)          clock high, last input low
    pulse low (PL)    clock low, last input high (output high, too)
    pulse high (PH)   clock high, last input high (output high, too)

[Table: the sequential feedback state table for the positive edge-triggered D flip-flop - one row per state (L, H, PL, PH) and one column per input combination CLK D = 00, 01, 11, 10; stable entries are circled, and solid and dashed arcs trace the example transitions described below.]

Consider the sequential feedback state table for a positive edge-triggered D flip-flop, shown above. In
designing and analyzing such circuits, we assume that only one input bit changes at a time. The state
table consists of one row for each state and one column for each input combination. Within a row, input
combinations that have no effect on the internal state of the circuit (that is, those that do not cause any
change in the state) are said to be stable; these states are circled. Other states are unstable, and the circuit
changes state in response to changes in the inputs.
For example, given an initial state L with low output, low clock, and high input D, the solid arcs trace the
reaction of the circuit to a rising clock edge. From the 01 input combination, we move along the row to the 11 column, which indicates the new state, PH. Moving down the column to that state's row, we see that
the new state is stable for the input combination 11, and we stop. If PH were not stable, we would continue
to move within the column until coming to rest on a stable state.
An essential hazard appears in such a table as a difference between the final state when flipping a bit once
and the final state when flipping a bit thrice in succession. The dashed arcs in the figure illustrate the
concept: after coming to rest in the PH state, we reset the input to 01 and move along the PH row to find
a new state of PL. Moving up the column, we see that the state is stable. We then flip the clock a third
time and move back along the row to 11, which indicates that PH is again the next state. Moving down
the column, we come again to rest in PH, the same state as was reached after one flip. Flipping a bit three
times rather than once evaluates the impact of timing skew in the circuit; if a different state is reached after
two more flips, timing skew could cause unreliable behavior. As you can verify from the table, a D flip-flop
has no essential hazards.
A group of flip-flops, as might appear in a clocked synchronous circuit, can and usually does have essential
hazards, but only dealing with the clock. As you know, the inputs to a clocked synchronous sequential
circuit consist of a clock signal and other inputs (either external or fed back from the flip-flops). Changing
an input other than the clock can change the internal state of a flip-flop (of the master-slave variety), but
flip-flop designs do not capture the number of input changes in a clock cycle beyond one, and changing an
input three times is the same as changing it once. Changing the clock, of course, results in a synchronous
state machine transition.
The detection of essential hazards in a clocked synchronous design based on flip-flops thus reduces to examination of the state machine. If the next state of the machine has any dependence on the current state, an
essential hazard exists, as a second rising clock edge moves the system into a second new state. For a single
D flip-flop, the next state is independent of the current state, and no essential hazards are present.
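
The one-flip versus three-flip check just described is mechanical enough to automate. The Python sketch below is our own construction: it settles a fundamental-mode flow table after each input change and compares the state reached after one flip of an input bit with the state reached after three flips. The small demonstration table models a simple gated D latch rather than the D flip-flop above, so the state names Q0 and Q1 and the table itself are assumptions made purely for illustration.

    # A flow table maps (state, inputs) -> next state; 'inputs' is a tuple of bits.
    # A (state, inputs) pair is stable when the table maps it back to the same state.

    def settle(table, state, inputs):
        """Follow unstable entries until the circuit comes to rest (None if it oscillates)."""
        seen = set()
        while table[(state, inputs)] != state:
            state = table[(state, inputs)]
            if state in seen:
                return None
            seen.add(state)
        return state

    def essential_hazards(table, states, num_inputs):
        """List (state, inputs, bit) triples where one flip and three flips disagree."""
        found = []
        for s in states:
            for (st, inp), nxt in table.items():
                if st != s or nxt != s:
                    continue                      # only start from stable entries
                for bit in range(num_inputs):
                    flipped = tuple(v ^ (i == bit) for i, v in enumerate(inp))
                    once = settle(table, s, flipped)
                    if once is None:
                        continue
                    back = settle(table, once, inp)
                    if back is None:
                        continue
                    if settle(table, back, flipped) != once:
                        found.append((s, inp, bit))
        return found

    # Demonstration table: a gated D latch with states Q0 (output 0) and Q1 (output 1);
    # inputs are (C, D), and the latch copies D only while C is high.
    latch = {}
    for q in ('Q0', 'Q1'):
        for c in (0, 1):
            for d in (0, 1):
                latch[(q, (c, d))] = ('Q1' if d else 'Q0') if c else q

    print(essential_hazards(latch, ['Q0', 'Q1'], 2))      # prints []: no essential hazards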


ECE120: Introduction to Computer Engineering


Notes Set 2.7

Registers
This set of notes introduces registers, an abstraction used for storage of groups of bits in digital systems.
We introduce some terminology used to describe aspects of register design and illustrate the idea of a shift
register. The registers shown here are important abstractions for digital system design. In the Fall 2012
offering of our course, we will cover this material on the third midterm.

Registers
A register is a storage element composed from one or more flip-flops operating on a common clock. In addition to the flip-flops, most registers include logic to control the bits stored by the register. For example, the D flip-flops described previously copy their inputs at the rising edge of each clock cycle, discarding whatever bits they have stored during that cycle. To enable a flip-flop to retain its value, we might try to hide the rising edge of the clock from the flip-flop, as shown below.
[Figure: a flip-flop whose clock is gated by LOAD (inputs IN, LOAD, CLK; output Q), together with a timing diagram showing an incorrect output value.]

The LOAD input controls the clock signal through a method known as clock gating. When LOAD is high, the circuit reduces to a regular D flip-flop. When LOAD is low, the flip-flop clock input, c, is held high, and the flip-flop stores its current value. The problems with clock gating are twofold. First, adding logic to the clock path introduces clock skew, which may cause timing problems later in the development process (or, worse, in future projects that use your circuits as components). Second, in the design shown above, the LOAD signal can only be lowered while the clock is high to prevent spurious rising edges from causing incorrect behavior, as shown in the timing diagram: a specious falling edge has no effect, but a specious rising edge causes an incorrect load.
A better approach is to add a feedback loop from the flip-flop's output, as shown in the figure below. When LOAD is low,
the upper AND gate selects the feedback line, and the register
reloads its current value. When LOAD is high, the lower AND
gate selects the IN input, and the register loads a new value.
We will generalize this type of selection structure, known as a
multiplexer, later in our course. The result is similar to a gated
D latch with distinct write enable and clock lines.
We can use this extended flip-flop as a bit slice for a multi-bit register. A four-bit register of this type is shown below. Four data lines, one for each bit, enter the register from the top of the figure. When LOAD is low, the logic copies each flip-flop's value back to its input, and the IN input lines are ignored. When LOAD is high, the logic forwards each IN line to the corresponding flip-flop's D input, allowing the register to load the new 4-bit value. The use of one input line per bit to load a multi-bit register in a single cycle is termed a parallel load.

[Figure: the flip-flop extended with load selection logic (inputs IN, LOAD, CLK; output Q), and a four-bit register built from four such bit slices (inputs IN3-IN0, LOAD, CLK; outputs Q3-Q0).]
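
To make the load behavior concrete, here is a small behavioral sketch in Python of the rule just described: at each rising clock edge, every flip-flop captures either its own output or the corresponding IN bit, depending on LOAD. The class name Register4 and the list representation are our own choices; this models behavior only, not gates.

    class Register4:
        """Behavioral model of a 4-bit register with parallel load."""
        def __init__(self):
            self.q = [0, 0, 0, 0]              # Q3..Q0, one entry per flip-flop

        def rising_edge(self, load, inputs):
            # When LOAD = 1, each flip-flop captures its IN bit (parallel load);
            # when LOAD = 0, the feedback path reselects each flip-flop's own value.
            if load:
                self.q = list(inputs)
            return self.q

    r = Register4()
    print(r.rising_edge(1, [1, 0, 1, 1]))      # loads 1011
    print(r.rising_edge(0, [0, 0, 0, 0]))      # LOAD low: IN ignored, still 1011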


Shift Registers
Certain types of registers include logic to manipulate data held within the register. A shift register is an important example of this type. The simplest shift register is a series of D flip-flops, with the output of each attached to the input of the next, as shown below. In the circuit shown, a serial input SI accepts a single bit of data per cycle and delivers the bit four cycles later to a serial output SO. Shift registers serve many purposes in modern systems, from the obvious uses of providing a fixed delay and performing bit shifts for processor arithmetic to rate matching between components and reducing the pin count on programmable logic devices such as field programmable gate arrays (FPGAs), the modern form of the programmable logic array mentioned in the textbook.

[Figure: a 4-bit serial shift register - four D flip-flops in series sharing CLK, with serial input SI, outputs Q3, Q2, Q1, Q0, and serial output SO.]
An example helps to illustrate the rate matching problem: historical I/O buses used fairly slow clocks, as they
had to drive signals and be arbitrated over relatively long distances. The Peripheral Component Interconnect (PCI) standard, for example, provided for 33 and 66 MHz bus speeds. To provide adequate data rates, such
buses use many wires in parallel, either 32 or 64 in the case of PCI. In contrast, a Gigabit Ethernet (local
area network) signal travelling over a fiber is clocked at 1.25 GHz, but sends only one bit per cycle. Several
layers of shift registers sit between the fiber and the I/O bus to mediate between the slow, highly parallel
signals that travel over the I/O bus and the fast, serial signals that travel over the fiber. The latest variant
of PCI, PCIe (e for express), uses serial lines at much higher clock rates.
Returning to the figure above, imagine that the outputs Qi feed into logic clocked at 1/4th the rate of the shift register (and suitably synchronized). Every four cycles, the flip-flops fill up with another four bits, at which point the outputs are read in parallel. The shift register shown can thus serve to transform serial data to 4-bit-parallel data at one-quarter the clock speed. Unlike the registers discussed earlier, the shift register above does not support parallel load, which prevents it from transforming a slow, parallel stream of data into a high-speed serial stream. The use of serial load requires N cycles for an N-bit register, but can reduce the number of wires needed to support the operation of the shift register. How would you add support for parallel load? How many additional inputs would be necessary?
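
As a behavioral illustration of the serial-to-parallel idea, the Python sketch below models only the bit movement, not the gates, and assumes (as a convention of our own) that bits enter at the Q3 end and leave from Q0, matching the right-shift direction used later in these notes.

    class ShiftRegister4:
        """Behavioral model of a 4-bit serial shift register: SI -> Q3 -> Q2 -> Q1 -> Q0 -> SO."""
        def __init__(self):
            self.q = [0, 0, 0, 0]                  # [Q3, Q2, Q1, Q0]

        def cycle(self, si):
            so = self.q[-1]                        # SO during this cycle is the bit sitting in Q0
            self.q = [si] + self.q[:-1]            # the rising edge ending the cycle shifts everything
            return so

    sr = ShiftRegister4()
    stream = [1, 0, 1, 1, 0, 0, 1, 0]
    print([sr.cycle(b) for b in stream])           # [0, 0, 0, 0, 1, 0, 1, 1]: the input, four cycles late
    # Sampling sr.q once every four cycles instead yields the serial stream as 4-bit parallel words.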
The shift register shown above is also incapable of storing a value rather than continuously shifting. The addition of the same structure that we used to control register loading can be applied to control shifting, as shown below.

[Figure: a 4-bit shift register with a SHIFT control input, serial input SI, serial output SO, clock CLK, and outputs Q3, Q2, Q1, Q0.]

Through the use of more complex input logic, we can construct a shift register with additional functionality. The bit slice shown below allows us to build a bidirectional shift register with parallel load capability and the ability to retain its value indefinitely. The two control inputs, C1 and C0, make use of a representation that we have chosen for the four operations supported by our shift register, as shown in the table below.

    C1 C0   meaning
    0  0    retain current value
    0  1    shift left (low to high)
    1  0    load new value (from IN)
    1  1    shift right (high to low)

[Figure: the bidirectional shift register bit slice, with inputs Qi+1, INi, and Qi-1, controls C1 and C0, a D flip-flop clocked by CLK, and output Qi.]




The bit slice allows us to build N-bit shift registers by replicating the slice and adding a fixed amount of glue logic (for example, the SO output logic). The figure below represents a 4-bit bidirectional shift register constructed in this way.

[Figure: a 4-bit bidirectional shift register built from four copies of the bit slice, with parallel inputs IN3-IN0, serial input SI, controls C1 and C0, clock CLK, serial output SO, and outputs Q3-Q0.]

At each rising clock edge, the action specified by C1 C0 is taken. When C1 C0 = 00, the register holds its
current value, with the register value appearing on Q[3 : 0] and each flip-flop feeding its output back into
its input. For C1 C0 = 01, the shift register shifts left: the serial input, SI, is fed into flip-flop 0, and Q3 is
passed to the serial output, SO. Similarly, when C1 C0 = 11, the shift register shifts right: SI is fed into
flip-flop 3, and Q0 is passed to SO. Finally, the case C1 C0 = 10 causes all flip-flops to accept new values
from IN [3 : 0], effecting a parallel load.
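
The control encoding in the table above translates directly into a behavioral model. The Python sketch below (class and method names are our own; it models bit movement only) applies the same C1 C0 interpretation at each rising clock edge.

    class BidirShiftRegister4:
        """4-bit bidirectional shift register with parallel load, controlled by C1 C0."""
        def __init__(self):
            self.q = [0, 0, 0, 0]                      # [Q3, Q2, Q1, Q0]

        def rising_edge(self, c1, c0, si, data):       # data = [IN3, IN2, IN1, IN0]
            q3, q2, q1, q0 = self.q
            so = None
            if (c1, c0) == (0, 1):                     # shift left: SI enters flip-flop 0, Q3 goes to SO
                so, self.q = q3, [q2, q1, q0, si]
            elif (c1, c0) == (1, 0):                   # parallel load from IN[3:0]
                self.q = list(data)
            elif (c1, c0) == (1, 1):                   # shift right: SI enters flip-flop 3, Q0 goes to SO
                so, self.q = q0, [si, q3, q2, q1]
            # (0, 0): retain current value -- nothing changes
            return so, self.q

    r = BidirShiftRegister4()
    print(r.rising_edge(1, 0, 0, [1, 0, 1, 1]))        # parallel load of 1011
    print(r.rising_edge(0, 1, 0, [0, 0, 0, 0]))        # shift left: SO = 1, register becomes 0110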
Several specialized shift operations are used to support data manipulation in modern processors (CPUs).
Essentially, these specializations dictate the form of the glue logic for a shift register as well as the serial
input value. The simplest is a logical shift, for which SI and SO are hardwired to 0; incoming bits are
always 0. A cyclic shift takes SO and feeds it back into SI, forming a circle of register bits through which
the data bits cycle.
Finally, an arithmetic shift treats the shift register contents as a number in 2's complement form. For non-negative numbers and left shifts, an arithmetic shift is the same as a logical shift. When a negative number is arithmetically shifted to the right, however, the sign bit is retained, resulting in a function similar to division by two. The difference lies in the rounding direction. Division by two rounds towards zero in most processors: -5/2 gives -2. Arithmetic shift right rounds away from zero for negative numbers (and towards zero for positive numbers): -5 >> 1 gives -3. We transform our previous shift register into one capable of arithmetic shifts by eliminating the serial input and feeding the most significant bit, which represents the sign in 2's complement form, back into itself for right shifts, as shown below.
[Figure: a 4-bit shift register modified for arithmetic shifts - the serial input is removed, and for right shifts the most significant bit Q3 is fed back into itself; parallel inputs IN3-IN0, controls C1 and C0, clock CLK, serial output SO, and outputs Q3-Q0.]
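
The rounding difference is easy to check with ordinary integers. In Python, the >> operator performs an arithmetic right shift on negative integers, while math.trunc models the round-toward-zero division found in most processors; the snippet below is only a numerical illustration, not a model of the register.

    import math

    a = -5
    print(a >> 1)                        # -3: arithmetic shift right rounds away from zero for negatives
    print(math.trunc(a / 2))             # -2: division in most processors rounds toward zero
    print(5 >> 1, math.trunc(5 / 2))     # 2 2: for non-negative values the two operations agree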



ECE120: Introduction to Computer Engineering


Notes Set 2.8
Summary of Part 2 of the Course
These notes supplement the Patt and Patel textbook, so you will also need to read and understand the
relevant chapters (see the syllabus) in order to master this material completely.
The difficulty of learning depends on the type of task involved. Remembering new terminology is relatively
easy, while applying the ideas underlying design decisions shown by example to new problems posed as
human tasks is relatively hard. In this short summary, we give you lists at several levels of difficulty of
what we expect you to be able to do as a result of the last few weeks of studying (reading, listening, doing
homework, discussing your understanding with your classmates, and so forth).
We'll start with the skills, and leave the easy stuff for the next page. We expect you to be able to exercise
the following skills:
Design a CMOS gate from n-type and p-type transistors.
Apply DeMorgan's laws repeatedly to simplify the form of the complement of a Boolean expression.
Use a K-map to find a reasonable expression for a Boolean function (for example, in POS or SOP form
with the minimal number of terms).
More generally, translate Boolean logic functions among concise algebraic, truth table, K-map, and
canonical (minterm/maxterm) forms.
When designing combinational logic, we expect you to be able to apply the following design strategies:
Make use of human algorithms (for example, multiplication from addition).
Determine whether a bit-sliced approach is applicable, and, if so, make use of one.
Break truth tables into parts so as to solve each part of a function separately.
Make use of known abstractions (adders, comparators, or other abstractions available to you) to simplify
the problem.
And, at the highest level, we expect that you will be able to do the following:
Understand and be able to reason at a high-level about circuit design tradeoffs between area/cost and
performance (and that power is also important, but we haven't given you any quantification methods).
Understand the tradeoffs typically made to develop bit-sliced designs (typically, bit-sliced designs are simpler but bigger and slower) and how one can develop variants between the extremes of the bit-sliced approach and optimization of functions specific to an N-bit design.
Understand the pitfalls of marking a function's value as don't care for some input combinations, and recognize that implementations do not produce don't cares.
Understand the tradeoffs involved in selecting a representation for communicating information between
elements in a design, such as the bit slices in a bit-sliced design.
Explain the operation of a latch or a flip-flop, particularly in terms of the bistable states used to hold
a bit.
Understand and be able to articulate the value of the clocked synchronous design abstraction.


You should recognize all of these terms and be able to explain what they mean. For the specific circuits,
you should be able to draw them and explain how they work. Actually, we don't care whether you can draw something from memory (a full adder, for example) provided that you know what a full adder does and
can derive a gate diagram correctly for one in a few minutes. Higher-level skills are much more valuable.
Boolean functions and logic gates
- NOT/inverter
- AND
- OR
- XOR
- NAND
- NOR
- XNOR
- majority function
specific logic circuits
- full adder
- ripple carry adder
- N-to-M multiplexer (mux)
- N-to-2^N decoder
- R-S latch
- gated D latch
- master-slave implementation of a
positive edge-triggered D flip-flop
- (bidirectional) shift register
- register supporting parallel load
design metrics
- metric
- optimal
- heuristic
- constraints
- power, area/cost, performance
- computer-aided design (CAD) tools
- gate delay
general math concepts
- canonical form
- N-dimensional hypercube
tools for solving logic problems
- truth table
- Karnaugh map (K-map)
- implicant
- prime implicant
- bit-slicing
- timing diagram

device technology
- complementary metal-oxide
semiconductor (CMOS)
- field effect transistor (FET)
- transistor gate, source, drain
Boolean logic terms
- literal
- algebraic properties
- dual form, principle of duality
- sum, product
- minterm, maxterm
- sum-of-products (SOP)
- product-of-sums (POS)
- canonical sum/SOP form
- canonical product/POS form
- logical equivalence
digital systems terms
- word size
- N-bit Gray code
- combinational/combinatorial logic
- two-level logic
- don't care outputs (x's)
- sequential logic
- state
- active low inputs
- set a bit (to 1)
- reset a bit (to 0)
- master-slave implementation
- positive edge-triggered
- clock signal
- square wave
- rising/positive clock edge
- falling/negative clock edge
- clock gating
- clocked synchronous sequential circuits
- parallel/serial load of registers
- logical/arithmetic/cyclic shift



ECE120: Introduction to Computer Engineering


Notes Set 3.1
Serialization and Finite State Machines
The third part of our class builds upon the basic combinational and sequential logic elements that we
developed in the second part. After discussing a simple application of stored state to trade between area and
performance, we introduce a powerful abstraction for formalizing and reasoning about digital systems, the
Finite State Machine (FSM). General FSM models are broadly applicable in a range of engineering contexts,
including not only hardware and software design but also the design of control systems and distributed
systems. We limit our model so as to avoid circuit timing issues in your first exposure, but provide some
amount of discussion as to how, when, and why you should eventually learn the more sophisticated models.
Through development of a range of FSM examples, we illustrate important design issues for these systems and
motivate a couple of more advanced combinational logic devices that can be used as building blocks. Together
with the idea of memory, another form of stored state, these elements form the basis for development of
our first computer. At this point we return to the textbook, in which Chapters 4 and 5 provide a solid
introduction to the von Neumann model of computing systems and the LC-3 (Little Computer, version 3)
instruction set architecture. By the end of this part of the course, you will have seen an example of the
boundary between hardware and software, and will be ready to write some instructions yourself.
In this set of notes, we cover the first few parts of this material. We begin by describing the conversion of
bit-sliced designs into serial designs, which store a single bit slice's outputs in flip-flops and then feed the
outputs back into the bit slice in the next cycle. As a specific example, we use our bit-sliced comparator
to discuss tradeoffs in area and performance. We introduce Finite State Machines and some of the tools
used to design them, then develop a handful of simple counter designs. Before delving too deeply into FSM
design issues, we spend a little time discussing other strategies for counter design and placing the material
covered in our course in the broader context of digital system design. Remember that sections marked with
an asterisk are provided solely for your interest, but you may need to learn this material in later classes.

Serialization: General Strategy


In previous notes, we discussed and illustrated the development of bit-sliced logic, in which one designs a
logic block to handle one bit of a multi-bit operation, then replicates the bit slice logic to construct a design
for the entire operation. We developed ripple carry adders in this way in Notes Set 2.3 and both unsigned
and 2's complement comparators in Notes Set 2.4.

Another interesting design strategy is serialization: rather than replicating the bit slice, we can use flip-flops to store the bits passed from one bit slice to the next, then present the stored bits to the same bit slice in the next cycle. Thus, in a serial design, we only need one copy of the bit slice logic! The area needed for a serial design is usually much less than for a bit-sliced design, but such a design is also usually slower. After illustrating the general design strategy, we'll consider these tradeoffs more carefully in the context of a detailed example.
Recall the general bit-sliced design approach, as illustrated below. Some number of copies of the logic for a single bit slice are connected in sequence. Each bit slice accepts P bits of operand input and produces Q bits of external output. Adjacent bit slices receive an additional M bits of information from the previous bit slice and pass along M bits to the next bit slice, generally using some representation chosen by the designer. The first bit slice is initialized by passing in constant values, and some calculation may be performed on the final bit slice's results to produce R bits more external output.

[Figure: a general bit-sliced design - first, second, ..., last bit slices in sequence, each with P per-slice inputs and Q per-slice outputs; initial values feed the first slice, M bits pass between adjacent slices, and output logic produces R bits of results from the last slice.]



We can transform this bit-sliced design to a serial design with a single copy of the bit slice logic, M + Q flip-flops, and M gates (and sometimes an inverter). The strategy is illustrated below. A single copy of the bit slice operates on one set of P external input bits and produces one set of Q external output bits each clock cycle. In the design shown, these output bits are available during the next cycle, after they have been stored in the flip-flops. The M bits to be passed to the next bit slice are also stored in flip-flops, and in the next cycle are provided back to the same physical bit slice as inputs. The first cycle of a multi-cycle operation must be handled slightly differently, so we add selection logic and a control signal, F. For the first cycle, we apply F = 1, and the initial values are passed into the bit slice. For all other bits, we apply F = 0, and the values stored in the flip-flops are returned to the bit slice's inputs. After all bits have passed through the bit slice (after N cycles for an N-bit design), the final M bits are stored in the flip-flops, and the results are calculated by the output logic.
[Figure: a serialized bit-sliced design - a single bit slice with P per-slice inputs and Q per-slice outputs; M flip-flops and Q flip-flops store the slice's outputs each cycle, selection logic controlled by F chooses between initial values and the stored M bits, and output logic produces the results. Two small drawings show how a flip-flop's complemented output Bi' combines with F to initialize an input to 0 or to 1 when F = 1.]

The selection logic merits explanation. Given that the original design initialized the bits to constant values
(0s or 1s), we need only simple logic for selection. The two small drawings in the figure illustrate how Bi',
the complemented flip-flop output for a bit i, can be combined with the first-cycle signal F to produce an
appropriate input for the bit slice. Selection thus requires one extra gate for each of the M inputs, and we
need an inverter for F if any of the initial values is 1.

Serialization: Comparator Example


We now apply the general strategy to a specific example, our bit-sliced unsigned comparator from Notes Set 2.4. The result is shown below. In terms of the general model, the single comparator bit slice accepts P = 2 bits of input each cycle, in this case a single bit from each of the two numbers being compared, presented to the bit slice in increasing order of significance. The bit slice produces no external output other than the final result (Q = 0). Two bits (M = 2) are produced each cycle by the bit slice and stored into flip-flops B1 and B0. These bits represent the relationship between the two numbers compared so far (including only the bits already seen by the comparator bit slice). On the first cycle, when the least significant bits of A and B are being fed into the bit slice, we set F = 1, which forces the C1 and C0 inputs of the bit slice to 0 independent of the values stored in the flip-flops. In all other cycles, F = 0, and the NOR gates set C1 = B1 and C0 = B0. Finally, after N cycles for an N-bit comparison, the output logic (in this case simply wires, as shown in the dashed box) places the result of the comparison on the Z1 and Z0 outputs (R = 2 in the general model). The result is encoded in the representation defined for constructing the bit slice (see Notes Set 2.4, but the encoding does not matter here).

[Figure: a serial unsigned comparator - inputs A, B, and F; NOR-based selection logic feeding C1 and C0 of a single comparator bit slice; flip-flops B1 and B0 storing the slice's Z1 and Z0 outputs; and output logic (wires in a dashed box) producing Z1 and Z0.]
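
To see the serialization strategy in action without committing to the gate-level encoding from Notes Set 2.4, the Python sketch below keeps the two stored bits in a pair of variables and uses an assumed encoding of our own (00 for "equal so far," 01 for "A larger so far," 10 for "B larger so far"); the real bit slice uses whatever representation was chosen in Notes Set 2.4, so treat the encoding here as illustrative only.

    def serial_compare(a_bits, b_bits):
        """Compare two unsigned numbers presented LSB first, one bit pair per cycle.
        Returns the final (Z1, Z0) under the assumed encoding described above."""
        b1 = b0 = 0                        # flip-flop contents; F = 1 forces C1 = C0 = 0 on cycle 0
        for a, b in zip(a_bits, b_bits):   # increasing order of significance
            # A more significant bit pair overrides whatever the lower bits said.
            if a > b:
                b1, b0 = 0, 1              # A larger so far
            elif a < b:
                b1, b0 = 1, 0              # B larger so far
            # if a == b, keep (b1, b0): the lower-order result stands
        return b1, b0

    # 13 = 1101 and 11 = 1011, presented LSB first:
    print(serial_compare([1, 0, 1, 1], [1, 1, 0, 1]))    # (0, 1): A is larger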



How does the serial design compare with the bit-sliced design? As an estimate of area, let's count gates. Our optimized design is replicated below for convenience. Each bit slice requires six 2-input gates and two inverters. Assume that each flip-flop requires eight 2-input gates and two inverters, so the serial design overall requires 24 gates and six inverters to handle any number of bits. Thus, for any number of bits N ≥ 4, the serial design is smaller than the bit-sliced design, and the benefit grows with N.

[Figure: the optimized comparator bit slice (NAND/NOR form) with inputs A, B, C1, and C0 and outputs Z1 and Z0.]
What about performance? In Notes Set 2.4, we counted gate delays for our bit-sliced design. The path
from A or B to the outputs is four gate delays, but the C to Z paths are all two gate delays. Overall, then,
the bit-sliced design requires 2N + 2 gate delays for N bits. What about the serial design?
The performance of the serial design is likely to be much worse for three reasons. First, all paths in the
design matter, not just the paths from bit slice to bit slice. None of the inputs can be assumed to be available
before the start of the clock cycle, so we must consider all paths from input to output. Second, we must also
count gate delays for the selection logic as well as the gates embedded in the flip-flops. Finally, the result
of these calculations may not matter, since the clock speed may well be limited by other logic elsewhere in
the system. If we want a common clock for all of our logic, the clock must not go faster than the slowest
element in the entire system, or some of our logic will not work properly.
What is the longest path through our serial comparator? Let's assume that the path through a flip-flop is
eight gate delays, with four on each side of the clocks rising edge. The inputs A and B are likely to be
driven by flip-flops elsewhere in our system, so we conservatively count four gate delays to A and B and five
gate delays to C1 and C0 (the extra one comes from the selection logic). The A and B paths thus dominate
inside the bit slice, adding four more gate delays to the outputs Z1 and Z0 . Finally, we add the last four
gate delays to flip the first latch in the flip-flops for a total of 12 gate delays. If we assume that our serial
comparator limits the clock frequency (that is, if everything else in the system can use a faster clock), we
take 12 gate delays per cycle, or 12N gate delays to compare two N -bit numbers.
You might also notice that adding support for 2's complement is no longer free. We need extra logic to swap
the A and B inputs in the cycle corresponding to the sign bits of A and B. In other cycles, they must remain
in the usual order. This extra logic is not complex, but adds further delay to the paths.
The bit-sliced and serial designs represent two extreme points in a broad space of design possibilities. Optimization of the entire N-bit logic function (for any metric) represents a third extreme. As an engineer,
you should realize that you can design systems anywhere in between these points as well. At the end of
Notes Set 2.4, for example, we showed a design for a logic slice that compares two bits at a time. In general,
we can optimize logic for any number of bits and then apply multiple copies of the resulting logic in space
(a generalization of the bit-sliced approach), or in time (a generalization of the serialization approach), or
in a combination of the two. Sometimes these tradeoffs may happen at a higher level. As mentioned in
Notes Set 2.3, computer software uses the carry out of an adder to perform addition of larger groups of
bits (over multiple clock cycles) than is supported by the processor's adder hardware. In computer system
design, engineers often design hardware elements that are general enough to support this kind of extension
in software.
As a concrete example of the possible tradeoffs, consider a serial comparator design based on the 2-bit slice
variant. This approach leads to a serial design with 24 gates and 10 inverters, which is not much larger than
our earlier serial design. In terms of gate delays, however, the new design is identical, meaning that we finish
a comparison in half the time. More realistic area and timing metrics show slightly more difference between
the two designs. These differences can dominate the results if we blindly scale the idea to handle more bits
without thinking carefully about the design. Neither many-input gates nor gates driving many outputs work
well in practice.



Finite State Machines


A finite state machine (or FSM) is a model for understanding the behavior of a system by describing
the system as occupying one of a finite set of states, moving between these states in response to external
inputs, and producing external outputs. In any given state, a particular input may cause the FSM to move
to another state; this combination is called a transition rule. An FSM comprises five parts: a finite set of
states, a set of possible inputs, a set of possible outputs, a set of transition rules, and methods for calculating
outputs.
When an FSM is implemented as a digital system, all states must be represented as patterns using a fixed
number of bits, all inputs must be translated into bits, and all outputs must be translated into bits. For
a digital FSM, transition rules must be complete; in other words, given any state of the FSM, and any
pattern of input bits, a transition must be defined from that state to another state (transitions from a state
to itself, called self-loops, are acceptable). And, of course, calculation of outputs for a digital FSM reduces
to Boolean logic expressions. In this class, we focus on clocked synchronous FSM implementations, in which
the FSM's internal state bits are stored in flip-flops.
In this section, we introduce the tools used to describe, develop, and analyze implementations of FSMs with
digital logic. In the next few weeks, we will show you how an FSM can serve as the central control logic in a
computer. At the same time, we will illustrate connections between FSMs and software and will make some
connections with other areas of interest in ECE, such as the design and analysis of digital control systems.
The table below gives a list of abstract states for a typical keyless entry system for a car. In this case,
we have merely named the states rather than specifying the bit patterns to be used for each state; for this
reason, we refer to them as abstract states. The description of the states in the first column is an optional
element often included in the early design stages for an FSM, when identifying the states needed for the
design. A list may also include the outputs for each state. Again, in the list below, we have specified these
outputs abstractly. By including outputs for each state, we implicitly assume that outputs depend only on
the state of the FSM. We discuss this assumption in more detail later in these notes (see Machine Models),
but will make the assumption throughout our class.
    meaning                 state      driver's door   other doors   alarm on
    vehicle locked          LOCKED     locked          locked        no
    driver door unlocked    DRIVER     unlocked        locked        no
    all doors unlocked      UNLOCKED   unlocked        unlocked      no
    alarm sounding          ALARM      locked          locked        yes

Another tool used with FSMs is the next-state table (sometimes called a state transition table, or just
a state table), which maps the current state and input combination into the next state of the FSM. The
abstract variant shown below outlines desired behavior at a high level, and is often ambiguous, incomplete,
and even inconsistent. For example, what happens if a user pushes two buttons? What happens if they
push "unlock" while the alarm is sounding? These questions should eventually be considered. However, we can already start to see the intended use of the design: starting from a locked car, a user can push "unlock" once to gain entry to the driver's seat, or push "unlock" twice to open the car fully for passengers. To lock the car, a user can push the "lock" button at any time. And, if a user needs help, pressing the "panic" button sets off an alarm.
    state     action/input     next state
    LOCKED    push "unlock"    DRIVER
    DRIVER    push "unlock"    UNLOCKED
    (any)     push "lock"      LOCKED
    (any)     push "panic"     ALARM




A state transition diagram (or transition
diagram, or state diagram), as shown to the
right, illustrates the contents of the next-state
table graphically, with each state drawn in a
circle, and arcs between states labeled with the
input combinations that cause these transitions
from one state to another.

push "unlock"

push
"lock"

LOCKED

DRIVER
push "lock"

push
"panic"

Putting the FSM design into this graphical


form does not solve the problems with the abstract model. The questions that we asked in
regard to the next-state table remain unanswered.

push
"panic"

push
"unlock"

push
"lock"

Implementing an FSM using digital logic requires that we translate the design into bits,
eliminate any ambiguity, and complete the
specification. How many internal bits should
we use? What are the possible input values, and how are their meanings represented
in bits? What are the possible output values, and how are their meanings represented
in bits? We will consider these questions for
several examples in the coming weeks.

push
"lock"

ALARM

push "panic"

UNLOCKED

push
"panic"

For now, we simply define answers for our example design, the keyless entry system. Given four states, we need at least log2(4) = 2 bits of internal state, which we store in two flip-flops and call S1 S0. The table below lists input and output signals and defines their meaning.

    outputs   D   driver door; 1 means unlocked
              R   other doors (Remaining doors); 1 means unlocked
              A   alarm; 1 means alarm is sounding
    inputs    U   unlock button; 1 means it has been pressed
              L   lock button; 1 means it has been pressed
              P   panic button; 1 means it has been pressed
We can now choose a representation for our states and rewrite the list of states, using bits both for the states and for the outputs. We also include the meaning of each state for clarity in our example. Note that we can choose the internal representation in any way. Here we have matched the D and R outputs when possible to simplify the output logic needed for the implementation. The order of states in the list is not particularly important, but should be chosen for convenience and clarity (including transcribing bits into K-maps, for example).

    meaning                 state      S1 S0   D   R   A
    vehicle locked          LOCKED     00      0   0   0
    driver door unlocked    DRIVER     10      1   0   0
    all doors unlocked      UNLOCKED   11      1   1   0
    alarm sounding          ALARM      01      0   0   1



We can also rewrite the next-state table in terms of bits. We use Gray code order on both axes, as these orders make it more convenient to use K-maps. The values represented in this table are the next FSM state given the current state S1 S0 and the inputs U, L, and P. Our symbols for the next-state bits are S1+ and S0+. The + superscript is a common way of expressing the next value in a discrete series, here induced by the use of clocked synchronous logic in implementing the FSM. In other words, S1+ is the value of S1 in the next clock cycle, and S1+ in an FSM implemented as a digital system is a Boolean expression based on the current state and the inputs. For our example problem, we want to be able to write down expressions for S1+(S1, S0, U, L, P) and S0+(S1, S0, U, L, P), as well as expressions for the output logic D(S1, S0), R(S1, S0), and A(S1, S0).

    current state                          U L P
    S1 S0          000  001  011  010  110  111  101  100
      00            00   01   01   00   00   01   01   10
      01            01   01   01   00   00   01   01   01
      11            11   01   01   00   00   01   01   11
      10            10   01   01   00   00   01   01   11
In the process of writing out the
next-state table, we have made decisions for all of the questions that
we asked earlier regarding the abstract state table. These decisions
are also reflected in the complete
state transition diagram shown to
the right. The states have been
extended with state bits and output bits, as S1 S0 /DRA. You should recognize that we can also leave some questions unanswered by placing x's (don't cares) into our table. However, you should also understand at this point that any implementation will produce bits, not x's, so we must be careful not to allow arbitrary choices unless any of the choices allowed is indeed acceptable for our FSM's purpose. We will discuss this process and the considerations necessary as we cover more FSM design examples.

[Figure: complete state transition diagram for the keyless entry system - states labeled S1S0/DRA (LOCKED 00/000, DRIVER 10/100, UNLOCKED 11/110, ALARM 01/001), with each arc labeled by the ULP input combinations that cause the transition.]

We have deliberately omitted calculation of expressions for the next-state variables S1+ and S0+ , and for the
outputs D, R, and A. We expect that you are able to do so from the detailed state table above, and may
assign such an exercise as part of your homework.
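
Before deriving logic expressions, it can also help to check the completed table behaviorally. The Python sketch below encodes the next-state behavior from the table above and the outputs as a function of state alone; the simulation is ours, not part of the implementation, but any disagreement between it and your derived expressions indicates a transcription error.

    # States encoded as S1S0 strings: LOCKED=00, ALARM=01, UNLOCKED=11, DRIVER=10.
    # Inputs are (U, L, P); outputs (D, R, A) depend only on the state.

    def next_state(s, u, l, p):
        if l and not p:                  # "lock" (without "panic") always locks the vehicle
            return '00'
        if p:                            # "panic" always sounds the alarm
            return '01'
        if u:                            # "unlock" alone
            return {'00': '10', '10': '11', '11': '11', '01': '01'}[s]
        return s                         # no buttons pushed: stay put

    outputs = {'00': (0, 0, 0), '10': (1, 0, 0), '11': (1, 1, 0), '01': (0, 0, 1)}

    s = '00'
    for u, l, p in [(1, 0, 0), (1, 0, 0), (0, 0, 1), (0, 1, 0)]:
        s = next_state(s, u, l, p)
        print(s, outputs[s])
    # 10 (1,0,0) DRIVER, then 11 (1,1,0) UNLOCKED, then 01 (0,0,1) ALARM, then 00 (0,0,0) LOCKED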

Synchronous Counters
A counter is a clocked sequential circuit with a state diagram consisting of a single logical cycle. Not all
counters are synchronous. In other words, not all flip-flops in a counter are required to use the same clock
signal. A counter in which all flip-flops do utilize the same clock signal is called a synchronous counter.
Except for a brief introduction to other types of counters in the next section, our class focuses entirely on
clocked synchronous designs, including counters.



The design of synchronous counter circuits is a fairly straightforward exercise


given the desired cycle of output patterns. The task can be more complex if
the internal state bits are allowed to differ from the output bits, so for now
we assume that output Zi is equal to internal bit Si. Note that a distinction between internal states and outputs is necessary if any output pattern appears
more than once in the desired cycle.

[Figure: state diagram for a 3-bit binary counter - a single cycle through the states 000, 001, 010, 011, 100, 101, 110, and 111.]
The cycle of states shown above corresponds to the states of a 3-bit binary counter. The numbers in the states represent both internal state bits S2 S1 S0 and output bits Z2 Z1 Z0. We transcribe this diagram into the next-state table shown below, write out K-maps for the next-state bits S2+, S1+, and S0+, and use the K-maps to find expressions for these variables in terms of the current state.
    S2 S1 S0    S2+ S1+ S0+
    0  0  0      0   0   1
    0  0  1      0   1   0
    0  1  0      0   1   1
    0  1  1      1   0   0
    1  0  0      1   0   1
    1  0  1      1   1   0
    1  1  0      1   1   1
    1  1  1      0   0   0

    S2+ = S2'S1S0 + S2S1' + S2S0'   =  S2 ⊕ (S1S0)
    S1+ = S1'S0 + S1S0'             =  S1 ⊕ S0
    S0+ = S0'                       =  S0 ⊕ 1

The first form of the expression for each next-state variable is taken directly from the corresponding K-map. We have rewritten each expression to make the emerging pattern more obvious. We can also derive the pattern intuitively by asking the following: given a binary counter in state SN-1 SN-2 . . . Sj+1 Sj Sj-1 . . . S1 S0, when does Sj change in the subsequent state? The answer, of course, is that Sj changes when all of the bits below Sj are 1. Otherwise, Sj remains the same in the next state. We thus write Sj+ = Sj ⊕ (Sj-1 · · · S1 S0) and implement the counter as shown below for a 4-bit design. Note that the usual order of output bits along the bottom is reversed in the figure, with the most significant bit at the right rather than the left.
[Figure: a 4-bit synchronous binary counter with serial gating - flip-flops S0 through S3 share CLK and drive outputs Z0 through Z3; a chain of two-input AND gates computes the product of the lower-order bits feeding the XOR gate at each successive bit.]

The calculation of the left inputs to the XOR gates in the counter shown above is performed with a series of two-input AND gates. Each of these gates ANDs another flip-flop value into the product. This approach, called serial gating, implies that an N-bit counter requires more than N − 2 gate delays to settle into the next state. An alternative approach, called parallel gating, calculates each input independently with a


single logic gate, as shown below. The blue inputs to the AND gate for S3 highlight the difference from the
previous figure (note that the two approaches differ only for bits S3 and above). With parallel gating, the
fan-in of the gates (the number of inputs) and the fan-out of the flip-flop outputs (number of other gates
into which an output feeds) grow with the size of the counter. In practice, large counters use a combination
of these two approaches.
[Figure: a 4-bit synchronous binary counter with parallel gating - flip-flops S0 through S3 share CLK and drive outputs Z0 through Z3; each XOR input is computed by a single AND gate fed directly by the lower-order flip-flop outputs.]
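
Before moving on, note that the pattern Sj+ = Sj ⊕ (Sj-1 · · · S1 S0) derived above is easy to check numerically for any width. The Python sketch below is a check of the formula, not a gate model; the function name and list convention are our own.

    def counter_next(state_bits):
        """Apply Sj+ = Sj xor (AND of all lower bits) to a list [S(N-1), ..., S1, S0]."""
        n = len(state_bits)
        nxt = []
        for j in range(n):                       # index 0 holds the most significant bit here
            lower = state_bits[j + 1:]           # the bits below Sj
            nxt.append(state_bits[j] ^ int(all(lower)))
        return nxt

    s = [0, 0, 0]
    for _ in range(8):
        print(s)
        s = counter_next(s)                      # counts 000, 001, 010, ..., 111, then wraps to 000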

Ripple Counters
A second class of counter drives some of its flip-flops with a clock signal and feeds flip-flop outputs into
the clock inputs of its remaining flip-flops, possibly through additional logic. Such a counter is called a
ripple counter, because the effect of a clock edge ripples through the flip-flops. The delay inherent to the
ripple effect, along with the complexity of ensuring that timing issues do not render the design unreliable,
are the major drawbacks of ripple counters. Compared with synchronous counters, however, ripple counters
consume less energy, and are sometimes used for devices with restricted energy supplies.
General ripple counters can be tricky because of timing issues, but certain types are easy. Consider the design of a binary ripple counter. The state diagram for a 3-bit binary counter is replicated below. Looking at the states, notice that the least-significant bit alternates with each state, while higher bits flip whenever the next smaller bit (to the right) transitions from one to zero. To take advantage of these properties, we use positive edge-triggered D flip-flops with their complemented (Q') outputs wired back to their inputs. The clock input is fed only into the first flip-flop, and the complemented output of each flip-flop is also connected to the clock of the next.

[Figure: the 3-bit binary counter cycle - 000, 001, 010, 011, 100, 101, 110, 111, and back to 000.]

An implementation of a 4-bit binary ripple counter appears below. The order of bits in the figure matches the order used for our synchronous binary counters: least significant on the left, most significant on the right. As you can see from the figure, the technique generalizes to arbitrarily large binary ripple counters, but the time required for the outputs to settle after a clock edge scales with the number of flip-flops in the counter. On the other hand, an average of only two flip-flops see each clock edge (1 + 1/2 + 1/4 + . . .), which reduces the power requirements.¹

[Figure: a 4-bit binary ripple counter - four positive edge-triggered D flip-flops S0 through S3, each with Q' fed back to its own D input; CLK drives only S0, each Q' also drives the clock of the next flip-flop, and the Q outputs provide Z0 through Z3.]

¹Recall that flip-flops record the clock state internally. The logical activity required to record such state consumes energy.



Beginning with the state 0000, at the rising clock edge, the left (S0 ) flip-flop toggles to 1. The second (S1 )
flip-flop sees this change as a falling clock edge and does nothing, leaving the counter in state 0001. When
the next rising clock edge arrives, the left flip-flop toggles back to 0, which the second flip-flop sees as a rising
clock edge, causing it to toggle to 1. The third (S2 ) flip-flop sees the second flip-flops change as a falling
edge and does nothing, and the state settles as 0010. We leave verification of the remainder of the cycle as
an exercise.
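
The rippling action is also easy to model behaviorally. In the Python sketch below (a behavioral model of our own, ignoring delays inside a flip-flop), each flip-flop toggles when its clock input rises, and for every flip-flop after the first that clock input is the complemented output of its predecessor.

    def ripple_edge(state):
        """Advance a ripple counter [S0, S1, ..., S(N-1)] by one external rising clock edge."""
        state = list(state)
        i = 0
        while i < len(state):
            state[i] ^= 1                 # this flip-flop toggles (D is wired to its own Q')
            if state[i] == 1:             # Q rose, so Q' fell: the next clock sees no rising edge
                break
            i += 1                        # Q fell, so Q' rose: the next flip-flop sees a rising edge
        return state

    s = [0, 0, 0, 0]                      # S0 (least significant) first
    for _ in range(6):
        s = ripple_edge(s)
        print(s)                          # the counter steps through the values 1, 2, 3, 4, 5, 6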

Timing Issues*
Ripple counters are a form of a more general strategy known as clock gating.² Clock gating uses logic to
control the visibility of a clock signal to flip-flops (or latches). Historically, digital system designers rarely
used clock gating techniques because of the complexity introduced for the circuit designers, who must ensure
that clock edges are delivered with little skew along a dynamically changing set of paths to flip-flops. Today,
however, the power benefits of hiding the clock signal from flip-flops have made clock gating an attractive
strategy. Nevertheless, digital logic designers and computer architects still almost never use clock gating
strategies directly. In most of the industry, CAD tools insert logic for clock gating automatically. A handful
of companies (such as Intel and Apple/Samsung) design custom circuits rather than relying on CAD tools
to synthesize hardware designs from standard libraries of elements. In these companies, clock gating is used
widely by the circuit design teams, and some input is occasionally necessary from the higher-level designers.
More aggressive gating strategies are also used in modern designs, but these usually require more time to
transition between the on and off states and can be more difficult to get right automatically (with the tools),
hence hardware designers may need to provide high-level information about their designs. A flip-flop that
does not see any change in its clock input still has connections to high voltage and ground, and thus allows
a small amount of leakage current. In contrast, with power gating, the voltage difference is removed,
and the circuit uses no power at all. Power gating can be tricky: as you know, for example, when you turn the power on, you need to make sure that each latch settles into a stable state. Latches may need to be
initialized to guarantee that they settle, which requires time after the power is restored.
If you want a deeper understanding of gating issues, take ECE482, Digital Integrated Circuit Design, or
ECE527, System-on-a-Chip Design.

²Fall 2012 students: This part may seem a little redundant, but we're going to remove the earlier mention of clock gating in future semesters.

Machine Models
Before we dive fully into FSM design, we must point out that we have placed a somewhat artificial restriction
on the types of FSMs that we use in our course. Historically, this restriction was given a name, and machines
of the type that we have discussed are called Moore machines. However, outside of introductory classes,
almost no one cares about this name, nor about the name for the more general model used almost universally
in hardware design, Mealy machines.
What is the difference? In a Moore machine, outputs depend only on the internal state bits of the FSM
(the values stored in the flip-flops). In a Mealy machine, outputs may be expressed as functions both
of internal state and FSM inputs. As we illustrate shortly, the benefit of using input signals to calculate
outputs (the Mealy machine model) is that input bits effectively serve as additional system state, which
means that the number of internal state bits can be reduced. The disadvantage of including input signals in
the expressions for output signals is that timing characteristics of input signals may not be known, whereas
an FSM designer may want to guarantee certain timing characteristics for output signals.
In practice, when such timing guarantees are needed, the designer simply adds state to the FSM to accommodate the need, and the problem is solved. The coin-counting FSM that we designed for our class lab
assignments, for example, required that we use a Moore machine model to avoid sending the servo controlling
the coin's path an output pulse that was too short to enforce the FSM's decision about which way to send
the coin. By adding more states to the FSM, we were able to hold the servo in place, as desired.


Why are we protecting you from the model used in practice? First, timing issues add complexity to a topic
that is complex enough for an introductory course. And, second, most software FSMs are Moore machines,
so the abstraction is a useful one in that context, too.
In many design contexts, the timing issues implied by a Mealy model can be relatively simple to manage.
When working in a single clock domain, all of the input signals come from flip-flops in the same domain, and
are thus stable for most of the clock cycle. Only rarely does one need to keep additional state to improve
timing characteristics in these contexts. In contrast, when interacting across clock domains, more care is
sometimes needed to ensure correct behavior.
We now illustrate the state reduction benefit of the Mealy machine model with a simple example, an FSM
that recognizes the pattern of a 0 followed by a 1 on a single input and outputs a 1 when it observes the
pattern. As already mentioned, Mealy machines often require fewer flip-flops. Intuitively, the number of
combinations of states and inputs is greater than the number of combinations of states alone, and allowing
a function to depend on inputs reduces the number of internal states needed.
A Mealy implementation of the FSM appears below, along with an example timing diagram illustrating the FSM's behavior. The machine occupies state A when the last bit
seen was a 0, and state B when the last bit seen was a 1. Notice that the transition arcs in the state
diagram are labeled with two values instead of one. Since outputs can depend on input values as well as
state, transitions in a Mealy machine are labeled with input/output combinations, while states are labeled
only with their internal bits (or just their names, as shown below). Labeling states with outputs does not
make sense for a Mealy machine, since outputs may vary with inputs. Notice that the outputs indicated on
any given transition hold only until that transition is taken (at the rising clock edge), as is apparent in the
timing diagram. When inputs are asynchronous, that is, not driven by the same clock signal, output pulses
from a Mealy machine can be arbitrarily short, which can lead to problems.

[Figure: Mealy state diagram with states A and B, arcs labeled IN/OUT - A loops on 0/0 and moves to B on 1/1; B loops on 1/0 and returns to A on 0/0 - together with a timing diagram showing that OUT rises with IN and falls at the rising clock edge.]

For a Moore machine, we must create a special state in which the output is high. Doing so requires that we
split state B into two states, a state C in which the last two bits seen were 01, and a state D in which the
last two bits seen were 11. Only state C generates output 1. State D also becomes the starting state for the
new state machine. The state diagram on the left below illustrates the changes, using the transition diagram
style that we introduced earlier to represent Moore machines. Notice in the associated timing diagram that
the output pulse lasts a full clock cycle.

[Figure: Moore state diagram with states A/0, C/1, and D/0 - A moves to C on input 1 and loops on 0; C moves to D on 1 and back to A on 0; D loops on 1 and moves to A on 0 - together with a timing diagram showing that OUT rises with CLK and falls at the next rising clock edge.]
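
The difference between the two models also shows up naturally in software. The Python sketch below implements the Moore version exactly as described above (states A, C, and D, with output 1 only in C and D as the starting state); the Mealy variant is included for contrast, and its starting state of B is our own assumption since the notes do not specify one.

    # Moore version: output depends only on the current state.
    moore_next = {('A', 0): 'A', ('A', 1): 'C',
                  ('C', 0): 'A', ('C', 1): 'D',
                  ('D', 0): 'A', ('D', 1): 'D'}
    moore_out = {'A': 0, 'C': 1, 'D': 0}

    # Mealy version: two states suffice; output 1 only on the A --1--> B transition.
    mealy_next = {('A', 0): 'A', ('A', 1): 'B',
                  ('B', 0): 'A', ('B', 1): 'B'}

    def run_moore(bits, state='D'):
        outs = []
        for b in bits:
            outs.append(moore_out[state])      # the output during a cycle reflects the current state
            state = moore_next[(state, b)]     # the state changes at the edge ending the cycle
        return outs

    def run_mealy(bits, state='B'):
        outs = []
        for b in bits:
            outs.append(1 if (state, b) == ('A', 1) else 0)   # output seen during the cycle
            state = mealy_next[(state, b)]
        return outs

    print(run_mealy([0, 1, 1, 0, 1]))   # [0, 1, 0, 0, 1]: output coincides with the 1 that completes 0-1
    print(run_moore([0, 1, 1, 0, 1]))   # [0, 0, 1, 0, 0]: each pulse appears one cycle later and lasts
                                        # a full cycle; the pulse for the final 0-1 would appear next cycle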


ECE120: Introduction to Computer Engineering


Notes Set 3.2
Finite State Machine Design Examples, Part I
This set of notes uses a series of examples to illustrate design principles for the implementation of finite
state machines (FSMs) using digital logic. We begin with an overview of the design process for a digital
FSM, from the development of an abstract model through the implementation of functions for the next-state
variables and output signals. Our first few examples cover only the concrete aspects: we implement several
counters, which illustrate the basic process of translating a concrete and complete state transition diagram
into an implementation based on flip-flops and logic gates. We next consider a counter with a number of
states that is not a power of two, with which we illustrate the need for FSM initialization. As part of solving
the initialization problem, we also introduce a general form of selection logic called a multiplexer.
We next consider the design process as a whole through a more general example of a counter with multiple
inputs to control its behavior. We work from an abstract model down to an implementation, illustrating
how semantic knowledge from the abstract model can be used to simplify the implementation. Finally, we
illustrate how the choice of representation for the FSM's internal state affects the complexity of the implementation. Fortunately, designs that are more intuitive and easier for humans to understand also typically
make the best designs in terms of other metrics, such as logic complexity.

Steps in the Design Process


Before we begin exploring designs, let's talk briefly about the general approach that we take when designing an FSM. We follow a six-step process:

    1. develop an abstract model
    2. specify I/O behavior
    3. complete the specification
    4. choose a state representation
    5. calculate logic expressions
    6. implement with flip-flops and gates

In Step 1, we translate our description in human language into a model with states and desired behavior.
At this stage, we simply try to capture the intent of the description and are not particularly thorough nor
exact.
Step 2 begins to formalize the model, starting with its input and output behavior. If we eventually plan
to develop an implementation of our FSM as a digital system (which is not the only choice, of course!), all
input and output must consist of bits. Often, input and/or output specifications may need to match other
digital systems to which we plan to connect our FSM. In fact, most problems in developing large digital
systems today arise because of incompatibilities when composing two or more separately designed pieces (or
modules) into an integrated system.
Once we know the I/O behavior for our FSM, in Step 3 we start to make any implicit assumptions clear
and to make any other decisions necessary to the design. Occasionally, we may choose to leave something
undecided in the hope of simplifying the design with don't-care entries in the logic formulation.
In Step 4, we select an internal representation for the bits necessary to encode the state of our FSM. In
practice, for small designs, this representation can be selected by a computer in such a way as to optimize the
implementation. However, for large designs, such as the LC-3 instruction set architecture that we study later
in this class, humans do most of the work by hand. In the later examples in this set of notes, we show how
even a small design can leverage meaningful information from the design when selecting the representation,
leading to an implementation that is simpler and is easier to build correctly. We also show how one can use
abstraction to simplify an implementation.


By Step 5, our design is a complete specification in terms of bits, and we need merely derive logic expressions
for the next-state variables and the output signals. This process is no different than for combinational logic,
and should already be fairly familiar to you.
Finally, in Step 6, we translate our logic expressions into gates, inserting flip-flops (or registers) to hold the
internal state bits of the FSM. In later notes, we will use more complex building blocks when implementing
an FSM, building up abstractions in order to simplify the design process in much the same way that we have
shown for combinational logic.

Example: A Two-Bit Gray Code Counter


Let's begin with a two-bit Gray code counter with no inputs. As we mentioned in Notes Set 2.1, a Gray code
is a cycle over all bit patterns of a certain length in which adjacent patterns differ in exactly one bit. For
simplicity, our first few examples are based on counters and use the internal state of the FSM as the output
values. You should already know how to design combinational logic for the outputs if it were necessary.
The inputs to a counter, if any, are typically limited to functions such as starting and stopping the counter,
controlling the counting direction, and resetting the counter to a particular state.
A fully-specified transition diagram for a two-bit Gray code counter appears below. With no inputs, the
states simply form a loop, with the counter moving from one state to the next each cycle. Each state in
the diagram is marked with the internal state value S1 S0 (before the /) and the output Z1 Z0 (after the
/), which are always equal for this counter. Based on the transition diagram, we can fill in the K-maps
for the next-state values S1+ and S0+ as shown to the right of the transition diagram, then derive algebraic
expressions in the usual way to obtain S1+ = S0 and S0+ = S1', where S1' denotes the complement of S1. We then use the next-state logic to develop
the implementation shown on the far right, completing our first counter design.
(Figure: transition diagram for the two-bit Gray code counter, COUNT A 00/00, COUNT B 01/01, COUNT C 11/11, COUNT D 10/10, together with K-maps for S1+ and S0+ and the flip-flop implementation, labeled "a two-bit Gray code counter".)
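Although these notes do not otherwise use a programming language, a few lines of Python can serve as a quick sanity check. The sketch below is purely illustrative (the function and variable names are our own); it steps the equations S1+ = S0 and S0+ = S1' through two full trips around the loop:

    # Simulate the two-bit Gray code counter: S1+ = S0, S0+ = complement of S1.
    def next_state(s1, s0):
        return s0, 1 - s1          # returns (S1+, S0+)

    s1, s0 = 0, 0                  # start in COUNT A (00)
    sequence = []
    for _ in range(8):             # two trips around the loop
        sequence.append(f"{s1}{s0}")
        s1, s0 = next_state(s1, s0)
    print(" -> ".join(sequence))   # 00 -> 01 -> 11 -> 10 -> 00 -> 01 -> 11 -> 10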

Example: A Three-Bit Gray Code Counter


Now we'll add a third bit to our counter, but again
use a Gray code as the basis for the state sequence.
A fully-specified transition diagram for such a counter
appears to the right. As before, with no inputs, the
states simply form a loop, with the counter moving
from one state to the next each cycle. Each state in the
diagram is marked with the internal state value S2 S1 S0
(before the /) and the output Z2 Z1 Z0 (after the
/).
Based on the transition diagram, we can fill
in the K-maps for the next-state values S2+ ,
S1+ , and S0+ as shown to the right, then derive
algebraic expressions. The results are more
complex this time.

(Figure: transition diagram for the three-bit Gray code counter, COUNT A 000/000 through COUNT H 100/100, following the sequence 000, 001, 011, 010, 110, 111, 101, 100, together with K-maps for S2+, S1+, and S0+.)


For our next-state logic, we obtain:

    S2+ = S2 S0 + S1 S0'
    S1+ = S2' S0 + S1 S0'
    S0+ = S2' S1' + S2 S1

Notice that the equations for S2+ and S1+ share a common term, S1 S0'. This design does not allow much choice in developing good equations for the next-state logic, but some designs may enable you to reduce the design complexity by explicitly identifying and making use of common algebraic terms and sub-expressions for different outputs. In modern design processes, identifying such opportunities is generally performed by a computer program, but it's important to understand how they arise. Note that the common term becomes a single AND gate in the implementation of our counter, as shown to the right.

Z2

S2
Q

Z1

S1
Q

Z0

S0
Q

Looking at the counters implementation diagram,


notice that the vertical lines carrying the current
a threebit Gray code counter
state values and their inverses back to the next state
logic inputs have been carefully ordered to simplify understanding the diagram. In particular, they are ordered from right to left (on the right side of the figure) as S0 S0 S1 S1 S2 S2 . When designing any logic diagram,
be sure to make use of a reasonable order so as to make it easy for someone (including yourself!) to read
and check the correctness of the logic.
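As with the two-bit counter, a short Python sketch (illustrative only, and assuming the next-state equations as reconstructed above) can confirm that the logic walks through the eight-state Gray code sequence and returns to COUNT A:

    # Three-bit Gray code counter next-state logic (X' written as 1 - X).
    def next_state(s2, s1, s0):
        s2n = (s2 & s0) | (s1 & (1 - s0))           # S2+ = S2 S0 + S1 S0'
        s1n = ((1 - s2) & s0) | (s1 & (1 - s0))     # S1+ = S2' S0 + S1 S0'
        s0n = ((1 - s2) & (1 - s1)) | (s2 & s1)     # S0+ = S2' S1' + S2 S1
        return s2n, s1n, s0n

    expected = ["000", "001", "011", "010", "110", "111", "101", "100"]
    state = (0, 0, 0)
    for pattern in expected:
        assert "".join(map(str, state)) == pattern
        state = next_state(*state)
    assert state == (0, 0, 0)       # the loop closes back at COUNT A
    print("three-bit Gray code sequence verified")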

Example: A Color Sequencer


Early graphics systems used a three-bit red-green-blue (RGB) encoding for
colors. The color mapping for such a system is shown to the right.
Imagine that you are charged with creating a counter to drive a light through
a sequence of colors. The light takes an RGB input as just described, and the
desired pattern is
    off (black) -> yellow -> violet -> green -> blue -> (repeat)

The color mapping referenced above is:

    color     RGB
    black     000
    blue      001
    green     010
    cyan      011
    red       100
    violet    101
    yellow    110
    white     111
You immediately recognize that you merely need a counter with five states. How many flip-flops will we need? At least three, since ⌈log2(5)⌉ = 3. Given that we'll need three flip-flops, and that the colors we'll need to produce as outputs are all unique bit patterns, we can again choose to use the counter's internal state directly as our output values.
A fully-specified transition diagram for our color sequencer appears to the right. The states again form a loop, and are marked with the internal state value S2 S1 S0 and the output RGB.

(Figure: transition diagram, BLACK 000/000, YELLOW 110/110, VIOLET 101/101, GREEN 010/010, BLUE 001/001, and back to BLACK, together with K-maps for S2+, S1+, and S0+.)

As before, we can use the transition diagram to fill in K-maps for the next-state values S2+, S1+, and S0+ as shown to the right. For each of the three states not included in our transition diagram, we have inserted x's into the K-maps to indicate "don't care." As you know, we can treat each x as either a 0 or a 1, whichever produces better results (where "better" usually means simpler equations). The terms that we have chosen for our algebraic equations are illustrated in the K-maps. The x's within the ellipses become 1s in the implementation, and the x's outside of the ellipses become 0s.

For our next-state logic, we obtain:

    S2+ = S2 S1 + S1' S0'
    S1+ = S2 S0 + S1' S0'
    S0+ = S1

Again our equations for S2+ and S1+ share a common term, which becomes a single AND gate in the implementation shown to the right.

(Figure: implementation of the color sequencer, labeled "an RGB color sequencer".)

Identifying an Initial State


Let's say that you go to the lab and build the implementation above, hook it up to the light, and turn it on. Does it work? Sometimes. Sometimes it works perfectly, but sometimes the light glows cyan or red briefly first. At other times, the light is an unchanging white.
What could be going wrong?
Let's try to understand. We'll begin by deriving K-maps for the implementation diagram, as shown to the right. In these K-maps, each of the x's in our design has been replaced by either a 0 or a 1. These entries are highlighted with green italics to call your attention to them.

(Figure: K-maps for S2+, S1+, and S0+ as actually implemented, with each don't-care entry replaced by the 0 or 1 produced by the chosen logic.)
Now let's imagine what might happen if somehow our FSM got into the S2 S1 S0 = 111 state. In such a state, the light would appear white, since RGB = S2 S1 S0 = 111. What happens in the next cycle? Plugging into the equations or looking into the K-maps gives (of course) the same answer: the next state is S2+ S1+ S0+ = 111. In other words, the light will stay white indefinitely! As an exercise, you should check what happens if the light is red or cyan.
We can extend the transition diagram that we developed for our design with the extra states possible in the
implementation, as shown below. As with the five states in the design, the extra states are named with the
color of light that they produce.
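One way to do this exercise is by brute force. The Python sketch below (illustrative; it assumes the product terms chosen for the implementation above) applies the implemented next-state logic to every possible starting state once, which makes the behavior of RED, CYAN, and WHITE easy to see:

    # Color sequencer next-state logic as implemented (don't cares already resolved).
    def next_state(s2, s1, s0):
        s2n = (s2 & s1) | ((1 - s1) & (1 - s0))     # S2+ = S2 S1 + S1' S0'
        s1n = (s2 & s0) | ((1 - s1) & (1 - s0))     # S1+ = S2 S0 + S1' S0'
        s0n = s1                                    # S0+ = S1
        return s2n, s1n, s0n

    names = {"000": "BLACK", "001": "BLUE", "010": "GREEN", "011": "CYAN",
             "100": "RED", "101": "VIOLET", "110": "YELLOW", "111": "WHITE"}

    for value in range(8):
        state = ((value >> 2) & 1, (value >> 1) & 1, value & 1)
        nxt = next_state(*state)
        cur, new = "".join(map(str, state)), "".join(map(str, nxt))
        print(f"{names[cur]:6s} {cur} -> {names[new]:6s} {new}")
    # RED and CYAN fall into the intended loop after one cycle; WHITE maps to itself.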

(Figure: extended transition diagram including the unused states CYAN 011/011, RED 100/100, and WHITE 111/111 in addition to the five states of the original design.)

Notice that the FSM does not move out of the WHITE state (ever). You may at this point wonder whether
more careful decisions in selecting our next-state expressions might address this issue. To some extent, yes.
For example, if we replace the S2 S1 term in the equation for S2+ with S2 S0', a decision allowed by the don't-care boxes in the K-map for our design, the resulting transition diagram does not suffer from the problem


that we've found. However, even if we do change our implementation slightly, we need to address another
aspect of the problem: how can the FSM ever get into the unexpected states?
What is the initial state of the three flip-flops in our implementation? The initial state may not even be 0s
and 1s unless we have an explicit mechanism for initialization. Initialization can work in two ways. The
first approach makes use of the flip-flop design. As you know, a flip-flop is built from a pair of latches, and
we can make use of the internal reset lines on these latches to force each flip-flop into the 0 state (or the
1 state) using an additional input.
Alternatively, we can add some extra logic to our design. Consider adding a few AND gates and a RESET
input (active low), as shown in the dashed box in the figure below. In this case, when we assert RESET
by setting it to 0, the FSM moves to state 000 in the next cycle, putting it into the BLACK state. The
approach taken here is for clarity; one can optimize the design, if desired. For example, we could simply
connect RESET as an extra input into the three AND gates on the left rather than adding new ones, with
the same effect.
(Figure: the color sequencer with an active-low RESET input ANDed into the next-state logic, labeled "an RGB color sequencer with reset".)
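The effect of the added gates is simple to state: asserting RESET (driving it to 0) forces every next-state bit to 0. A minimal Python sketch of that behavior, with hypothetical helper names of our own, follows:

    # Active-low RESET ANDed into each next-state bit: when reset_n = 0,
    # the next state is forced to 000 (BLACK) regardless of the computed values.
    def apply_reset(s2n, s1n, s0n, reset_n):
        return s2n & reset_n, s1n & reset_n, s0n & reset_n

    print(apply_reset(1, 1, 1, reset_n=0))   # (0, 0, 0): escapes the WHITE trap
    print(apply_reset(1, 1, 0, reset_n=1))   # (1, 1, 0): unchanged when RESET is inactive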

The Multiplexer
We may sometimes want a more powerful initialization mechanism, one that allows us to force the FSM into any specific state in the next cycle. In such a case, we can add the logic block shown in the dashed boxes in the figure below to each of our flip-flop inputs. The block has two inputs from the left and one from the top. The top input allows us to choose which of the left inputs is forwarded to the output. In our design, the top input comes from INIT. When INIT = 0, the top AND gate in each of the three blocks outputs a 0, and the bottom AND gate forwards the corresponding next-state input from our design. The OR gate thus also forwards the next-state input, and the system moves into the next state for our FSM whenever INIT = 0.

What happens when INIT = 1? In this case, the bottom AND gate in each of the blocks in the dashed boxes produces a 0, and the top AND gate as well as the OR gate forwards one of the Ix signals. The state of our FSM in the next cycle is then given by I2 I1 I0. In other words, we can put the FSM into any desired state by applying that state to the I2 I1 I0 inputs, setting INIT = 1, and waiting for the next cycle.

(Figure: the color sequencer with arbitrary initialization, labeled "an RGB color sequencer with arbitrary initialization"; the inputs I2 I1 I0 and INIT select the next state through the dashed logic blocks on each flip-flop input.)

The logic block that we have used is called a multiplexer. Multiplexers are an important abstraction for digital logic. In general, a multiplexer allows us to use one digital signal to select which of several others is forwarded to an output. The one that our design used is the simplest form, a 2-to-1 multiplexer. The block is replicated to the right along with its symbolic form, a trapezoid with data inputs on the larger side, an output on the smaller side, and a select input on the angled part of the trapezoid. The labels inside the trapezoid indicate the value of the select input S for which the adjacent data signal, D1 or D0, is copied to the output Q.

(Figure: a 2-to-1 multiplexer with data inputs D1 and D0, select input S, and output Q, shown both as a logic diagram and in symbolic form.)
We can generalize multiplexers in two ways. First, we can extend the single select input to a group of select inputs. An N-bit select input allows selection from amongst 2^N inputs. A 4-to-1 multiplexer is shown to the right, for example. If you look back at Notes Set 2.7, you will also find this type of multiplexer used in our bidirectional shift register.

(Figure: a 4-to-1 multiplexer with data inputs D3, D2, D1, and D0, select inputs S1 S0, and output Q, shown both as a logic diagram and in symbolic form.)

The second way in which we can generalize multiplexers is by simply copying them and using the same inputs for selection. For example, we might use a single select bit to choose between any number of paired inputs. Denote input pair i by D1^i and D0^i. For each pair, we have an output Q^i. When S = 0, Q^i = D0^i for each value of i. And, when S = 1, Q^i = D1^i for each value of i.
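In Boolean terms, the 2-to-1 multiplexer computes Q = S D1 + S' D0. The Python sketch below (illustrative; the function names are ours) captures that behavior and shows one common way, though not necessarily the gate structure drawn in the figure, to assemble a 4-to-1 multiplexer from three 2-to-1 multiplexers:

    # 2-to-1 multiplexer: Q = S D1 + S' D0.
    def mux2(d1, d0, s):
        return (s & d1) | ((1 - s) & d0)

    # 4-to-1 multiplexer built as a tree of 2-to-1 multiplexers:
    # S0 picks within each pair, and S1 picks between the two pair outputs.
    def mux4(d3, d2, d1, d0, s1, s0):
        return mux2(mux2(d3, d2, s0), mux2(d1, d0, s0), s1)

    assert mux2(d1=1, d0=0, s=1) == 1 and mux2(d1=1, d0=0, s=0) == 0
    assert mux4(1, 0, 0, 0, s1=1, s0=1) == 1   # S1 S0 = 11 selects D3
    assert mux4(0, 0, 1, 0, s1=0, s0=1) == 1   # S1 S0 = 01 selects D1
    print("multiplexer checks passed")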


Specific configurations of multiplexers are often referred to as N-to-M multiplexers. Here the value N refers to the number of inputs, and M refers to the number of outputs. The number of select bits can then be calculated as log2(N/M) (N/M is generally a power of two), and one way to build such a multiplexer is to use M copies of an (N/M)-to-1 multiplexer.

Developing an Abstract Model


We are now ready to discuss the design process for an FSM from start to finish. For this first abstract FSM example, we build upon something that we have already seen: a two-bit Gray code counter. We now want a counter that allows us to start and stop the count.

    state       no input    "halt" button    "go" button
    counting    counting    halted
    halted      halted                       counting
What is the mechanism for stopping and starting? To begin our design, we could sketch out an abstract
next-state table such as the one shown above. In this form of the table, the first column lists the states,
while each of the other columns lists states to which the FSM transitions after a clock cycle for a particular
input combination. The table contains two states, counting and halted, and specifies that the design uses
two distinct buttons to move between the states.
A counter with a single counting state, of course, does not provide much value. We extend the table with four counting states and four halted states, as shown to the right. This version of the table also introduces more formal state names, for which these notes use all capital letters.

    state      no input    "halt" button    "go" button
    COUNT A    COUNT B     HALT A
    COUNT B    COUNT C     HALT B
    COUNT C    COUNT D     HALT C
    COUNT D    COUNT A     HALT D
    HALT A     HALT A                       COUNT B
    HALT B     HALT B                       COUNT C
    HALT C     HALT C                       COUNT D
    HALT D     HALT D                       COUNT A

The upper four states represent uninterrupted counting, in which the counter cycles through these states indefinitely. A user can stop the counter in any state by pressing the "halt" button, causing the counter to retain its current value until the user presses the "go" button.

Below the state table is an abstract transition diagram, which provides exactly the same information in graphical form. Here circles represent states (as labeled) and arcs represent transitions from one state to another based on an input combination (which is used to label the arc).

(Figure: abstract transition diagram; COUNT A through COUNT D form a counting loop, each counting state moves to the corresponding HALT state on "press halt", and each HALT state resumes counting on "press go".)

We have already implicitly made a few choices about our counter design. First, the counter shown records the current state of the system when "halt" is pressed. We could instead reset the counter state whenever it is restarted, in which case we need only five states: four for counting and one more for a halted counter. Second, we've designed the counter to stop when the user presses "halt" and to resume counting when the user presses "go." We could instead choose to delay these effects by a cycle. For example, pressing "halt" in state COUNT B could take the counter to state HALT C, and pressing "go" in state HALT C could take the system to state COUNT C. In these notes, we implement only the diagrams shown.


Specifying I/O Behavior


We next start to formalize our design by specifying its input and output behavior digitally. Each of the two control buttons provides a single bit of input. The "halt" button we call H, and the "go" button we call G. For the output, we use a two-bit Gray code. With these choices, we can redraw the transition diagram as shown to the right.

(Figure: transition diagram with states COUNT A /00, COUNT B /01, COUNT C /11, COUNT D /10 and HALT A /00, HALT B /01, HALT C /11, HALT D /10.)

In this figure, the states are marked with output values Z1 Z0 and transition arcs are labeled in terms of our two input buttons, G and H. The uninterrupted counting cycle is labeled with H' to indicate that it continues until we press H.

Completing the Specification


Now we need to think about how the system should behave if something outside of our initial expectations
occurs. Having drawn out a partial transition diagram can help with this process, since we can use the
diagram to systematically consider all possible input conditions from all possible states. The state table
form can make the missing parts of the specification even more obvious.
For our counter, the symmetry between counting states makes the problem substantially simpler. Let's write out part of a list of states and part of a state table with one counting state and one halt state, as shown to the right. Four values of the inputs HG are possible (recall that N bits allow 2^N possible patterns). We list the columns in Gray code order, since we may want to transcribe this table into K-maps later.

    state      description
    COUNT A    first counting state: counting, output Z1 Z0 = 00
    HALT A     first halted state: halted, output Z1 Z0 = 00

                            HG
    state      00           01            11            10
    COUNT A    COUNT B      unspecified   unspecified   HALT A
    HALT A     HALT A       COUNT B       unspecified   unspecified

Lets start with the COUNT A state. We know that if neither button is pressed (HG = 00), we want the
counter to move to the COUNT B state. And, if we press the halt button (HG = 10), we want the counter
to move to the HALT A state. What should happen if a user presses the go button (HG = 01)? Or if
the user presses both buttons (HG = 11)? Answering these questions is part of fully specifying our design.
We can choose to leave some parts unspecified, but any implementation of our system will imply answers,
and thus we must be careful. We choose to ignore the go button while counting, and to have the halt
button override the go button. Thus, if HG = 01 when the counter is in state COUNT A, the counter
moves to state COUNT B. And, if HG = 11, the counter moves to state HALT A.
Use of explicit bit patterns for the inputs HG may help you to check that all four possible input values are
covered from each state. If you choose to use a transition diagram instead of a state table, you might even
want to add four arcs from each state, each labeled with a specific value of HG. When two arcs connect the
same two states, we can either use multiple labels or can indicate bits that do not matter using a dont-care
symbol, x. For example, the arc from state COUNT A to state COUNT B could be labeled HG = 00, 01 or
HG = 0x. The arc from state COUNT A to state HALT A could be labeled HG = 10, 11 or HG = 1x. We
can also use logical expressions as labels, but such notation can obscure unspecified transitions.
Now consider the state HALT A. The transitions specified so far are that when we press go (HG = 01), the
counter moves to the COUNT B state, and that the counter remains halted in state HALT A if no buttons
are pressed (HG = 00). What if the halt button is pressed (HG = 10), or both buttons are pressed


(HG = 11)? For consistency, we decide that halt overrides go, but does nothing special if it alone is
pressed while the counter is halted. Thus, input patterns HG = 10 and HG = 11 also take state HALT A
back to itself. Here the arc could be labeled HG = 00, 10, 11 or, equivalently, HG = 00, 1x or HG = x0, 11.

To complete our design, we apply the same decisions that we made for the COUNT A state to all of the other counting states, and the decisions that we made for the HALT A state to all of the other halted states. If we had chosen not to specify an answer, an implementation could actually produce different behavior from the different counting and/or halted states, which could confuse a user.

The resulting design appears to the right.

(Figure: fully-specified transition diagram; the counting states COUNT A /00 through COUNT D /10 loop on HG=0x and move to the corresponding HALT state on HG=1x, while each HALT state holds on HG=x0,11 and moves to the next counting state on HG=01.)

Choosing a State Representation


Now we need to select a representation for the states. Since our counter has eight states, we need at least
three (log2 (8) = 3) state bits S2 S1 S0 to keep track of the current state. As we show later, the choice
of representation for an FSM's states can dramatically affect the design complexity. For a design as simple as our counter, you could just let a computer implement all possible representations (there aren't more
than 840, if we consider simple symmetries) and select one according to whatever metrics are interesting.
For bigger designs, however, the number of possibilities quickly becomes impossible to explore completely.
Fortunately, use of abstraction in selecting a representation also tends to produce better designs for a wide
variety of metrics (such as design complexity, power consumption, and performance). The right strategy
is thus often to start by selecting a representation that makes sense to a human, even if it requires more
bits than are strictly necessary. The resulting implementation will be easier to design and to debug than an
implementation in which only the global behavior has any meaning.
Let's return to our specific example, the counter. We
can use one bit, S2 , to record whether or not our
counter is counting (S2 = 0) or halted (S2 = 1).
The other two bits can then record the counter state
in terms of the desired output. Choosing this representation implies that only wires will be necessary to
compute outputs Z1 and Z0 from the internal state:
Z1 = S1 and Z0 = S0 . The resulting design, in which
states are now labeled with both internal state and
outputs (S2 S1 S0 /Z1 Z0 ) appears to the right. In this
version, we have changed the arc labeling to use logical expressions, which can sometimes help us to think
about the implementation.

(Figure: transition diagram with states labeled S2 S1 S0 /Z1 Z0: COUNT A 000/00, COUNT B 001/01, COUNT C 011/11, COUNT D 010/10, HALT A 100/00, HALT B 101/01, HALT C 111/11, HALT D 110/10. Counting states advance on H' and move to the corresponding HALT state on H; HALT states hold on H + G' and resume counting on H'G.)

The equivalent state listing and state table appear below. We have ordered the rows of the state table in
Gray code order to simplify transcription of K-maps.
    state      S2 S1 S0   description
    COUNT A    000        counting, output Z1 Z0 = 00
    COUNT B    001        counting, output Z1 Z0 = 01
    COUNT C    011        counting, output Z1 Z0 = 11
    COUNT D    010        counting, output Z1 Z0 = 10
    HALT A     100        halted, output Z1 Z0 = 00
    HALT B     101        halted, output Z1 Z0 = 01
    HALT C     111        halted, output Z1 Z0 = 11
    HALT D     110        halted, output Z1 Z0 = 10

                          next state for HG =
    state      S2 S1 S0   00      01      11      10
    COUNT A    000        001     001     100     100
    COUNT B    001        011     011     101     101
    COUNT C    011        010     010     111     111
    COUNT D    010        000     000     110     110
    HALT D     110        110     000     110     110
    HALT C     111        111     010     111     111
    HALT B     101        101     011     101     101
    HALT A     100        100     001     100     100

Having chosen a representation, we
can go ahead and implement our
design in the usual way. As shown
to the right, K-maps for the next-state
we have five variables and must
consider implicants that are not
contiguous in the K-maps. The S2+
logic is easy enough: we only need
two terms, as shown.

(Figure: K-maps for the next-state variables, with rows S2 S1 S0 listed in Gray code order 000, 001, 011, 010, 110, 111, 101, 100 and columns HG = 00, 01, 11, 10.)
Notice that we have used color and line style to distinguish different implicants in the K-maps. Furthermore, the symmetry of the design produces symmetry in the S1+ and S0+ formulas, so we have used the same color and line style for analogous terms in these two K-maps. For S1+, we need four terms. Notice that the green ellipses in the HG = 01 column are part of the same term, as are the two halves of the dashed blue circle. In S0+, we still need four terms, but three of them are split into two pieces in the K-map. As you can see, the utility of the K-map is starting to break down with five variables.

Abstracting Design Symmetries


Rather than implementing the design as two-level logic, let's try to take advantage of our design's symmetry to further simplify the logic (we reduce gate count at the expense of longer, slower paths).

Looking back to the last transition diagram, in which the arcs were labeled with logical expressions, let's calculate an expression for when the counter should retain its current value in the next cycle. We call this variable HOLD. In the counting states, when S2 = 0, the counter stops (moves into a halted state) when H is true. In the halted states, when S2 = 1, the counter stops (stays in a halted state) when H + G' is true.

    HOLD = S2' H + S2 (H + G')
         = S2' H + S2 H + S2 G'
         = H + S2 G'

In other words, the counter should hold its current value (stop counting) if we press the halt button or if
the counter was already halted and we didn't press the "go" button. As desired, the current value of
counter (S1 S0 ) has no impact on this decision. You may have noticed that the expression we derived for
HOLD also matches S2+ , the next-state value of S2 in the K-map above.
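The simplification is small enough to verify exhaustively. The short Python check below (illustrative only) confirms that the factored form equals the original expression for every combination of S2, H, and G:

    # Verify S2' H + S2 (H + G') == H + S2 G' for all inputs.
    from itertools import product

    for s2, h, g in product((0, 1), repeat=3):
        original = ((1 - s2) & h) | (s2 & (h | (1 - g)))
        simplified = h | (s2 & (1 - g))
        assert original == simplified
    print("HOLD = H + S2 G' holds for all eight input combinations")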
Now let's re-write our state transition table in terms of HOLD. The left version uses state names for clarity;
the right uses state values to help us transcribe K-maps.
    state      S2 S1 S0   HOLD = 0   HOLD = 1
    COUNT A    000        COUNT B    HALT A
    COUNT B    001        COUNT C    HALT B
    COUNT C    011        COUNT D    HALT C
    COUNT D    010        COUNT A    HALT D
    HALT A     100        COUNT B    HALT A
    HALT B     101        COUNT C    HALT B
    HALT C     111        COUNT D    HALT C
    HALT D     110        COUNT A    HALT D

    state      S2 S1 S0   HOLD = 0   HOLD = 1
    COUNT A    000        001        100
    COUNT B    001        011        101
    COUNT C    011        010        111
    COUNT D    010        000        110
    HALT A     100        001        100
    HALT B     101        011        101
    HALT C     111        010        111
    HALT D     110        000        110




The K-maps based on the HOLD abstraction are shown to the right. As you can see,
the necessary logic has been simplified substantially, requiring only two terms for each
of S1+ and S0+ . Writing the next-state logic
algebraically, we obtain
    S2+ = HOLD
    S1+ = HOLD' S0 + HOLD S1
    S0+ = HOLD' S1' + HOLD S0

(Figure: K-maps for S2+, S1+, and S0+ with columns HOLD S2 and rows S1 S0.)

An implementation appears below. By using semantic meaning in our choice of representation, in particular the use of S2 to record whether the counter is currently halted (S2 = 1) or counting (S2 = 0), we have enabled ourselves to separate out the logic for deciding whether to advance the counter fairly cleanly from the logic for advancing the counter itself. Only the HOLD bit in the diagram is used to determine whether or not the counter should advance in the current cycle.

Let's check that the implementation matches our original design. Start by verifying that the HOLD variable is calculated correctly, HOLD = H + S2 G', then look back at the K-map for S2+ in the low-level design to verify that the expression we used does indeed match. Next, verify that S1+ and S0+ are correctly implemented.
(Figure: implementation labeled "a controllable two-bit counter"; the halt button H and go button G drive the HOLD logic, flip-flop S2 records whether or not the counter is currently halted, and flip-flops S1 and S0 drive outputs Z1 and Z0.)

Finally, we check our abstraction. When HOLD = 1, the next-state logic for S1+ and S0+ reduces to
S1+ = S1 and S0+ = S0 ; in other words, the counter stops counting and simply stays in its current state.
When HOLD = 0, these equations become S1+ = S0 and S0+ = S1', which produces the repeating sequence
for S1 S0 of 00, 01, 11, 10, as desired. You may want to look back at our two-bit Gray code counter design
to compare the next-state equations.
We can now verify that the implementation produces the correct transition behavior. In the counting states, S2 = 0, and the HOLD value simplifies to HOLD = H. Until we push the "halt" button, S2 remains 0, and the counter continues to count in the correct sequence. When H = 1, HOLD = 1, and the counter stops at its current value (S2+ S1+ S0+ = 1S1 S0, which is shorthand for S2+ = 1, S1+ = S1, and S0+ = S0). In any of the halted states, S2 = 1, and we can reduce HOLD to HOLD = H + G'. Here, so long as we press the "halt" button or do not press the "go" button, the counter stays in its current state, because HOLD = 1. If we release "halt" and press "go", we have HOLD = 0, and the counter resumes counting (S2+ S1+ S0+ = 0S0 S1', which is shorthand for S2+ = 0, S1+ = S0, and S0+ = S1'). We have now verified the implementation.
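The same checks can be run as a simulation. The Python sketch below (illustrative; the input sequence is just an example) drives the next-state equations with a short sequence of button presses and prints the resulting states:

    # Controllable two-bit Gray code counter (S2 = halted bit, S1 S0 = count).
    def next_state(s2, s1, s0, h, g):
        hold = h | (s2 & (1 - g))                     # HOLD = H + S2 G'
        s2n = hold
        s1n = ((1 - hold) & s0) | (hold & s1)         # S1+ = HOLD' S0 + HOLD S1
        s0n = ((1 - hold) & (1 - s1)) | (hold & s0)   # S0+ = HOLD' S1' + HOLD S0
        return s2n, s1n, s0n

    state = (0, 0, 0)                                 # COUNT A
    presses = [(0, 0), (0, 0), (1, 0),                # count twice, then press "halt"
               (0, 0), (0, 1), (0, 0)]                # hold, press "go", keep counting
    for h, g in presses:
        state = next_state(*state, h, g)
        print(f"HG={h}{g} -> S2 S1 S0 = {''.join(map(str, state))}")
    # Expected states: 001, 011, 111 (halted), 111, 010, 000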


Impact of the State Representation


What happens if we choose a bad representation? For the same FSM, the table below shows a poorly chosen
mapping from states to internal state representation. Below the table is a diagram of an implementation
using that representation. Verifying that the implementation's behavior is correct is left as an exercise for
the determined reader.
    state      S2 S1 S0        state     S2 S1 S0
    COUNT A    000             HALT A    111
    COUNT B    101             HALT B    110
    COUNT C    011             HALT C    100
    COUNT D    010             HALT D    001

(Figure: implementation labeled "a controllable two-bit counter (with poorly chosen state values)".)

ECE120: Introduction to Computer Engineering


Notes Set 3.3

Design of the Finite State Machine for the Lab

This set of notes explains the process that Prof. Jones used to develop the FSM for the lab. The lab simulates a vending machine mechanism for automatically identifying coins (dimes and quarters only), tracking
the amount of money entered by the user, accepting or rejecting coins, and emitting a signal when a total
of 35 cents has been accepted. In the lab, we will only drive a light with the paid in full signal. Sorry, no
candy nor Dew will be distributed!

Physical Design, Sensors, and Timing


The physical elements of the lab were designed by Prof. Chris Schmitz and constructed with some help from
the ECE shop. A user inserts a coin into a slot at one end of the device. The coin then rolls down a slope
towards a gate controlled by a servo. The gate can be raised or lowered, and determines whether the coin
exits from the other side or the bottom of the device. As the coin rolls, it passes two optical sensors.1 One
of these sensors is positioned high enough above the slope that a dime passes beneath the sensor, allowing
the signal T produced by the sensor to tell us whether the coin is a dime or a quarter. The second sensor is
positioned so that all coins pass in front of it. The sensor positions are chosen carefully to ensure that, in
the case of a quarter, the coin is still blocking the first sensor when it reaches the second sensor. Blocked
sensors give a signal of 1 in this design, so the rising edge of the signal from the second sensor can be used as a
clock for our FSM. When the rising edge occurs, the signal T from the first sensor indicates whether the
coin is a quarter (T = 1) or a dime (T = 0).
A sample timing diagram for the lab appears to the right. The clock signal generated by the lab is not only not a square wave (in other words, the high and low portions are not equal), but is also unlikely to be periodic. Instead, the cycle is defined by the time between coin insertions. The T signal serves as the single input to our FSM. In the timing diagram, T is shown as rising and falling before the clock edge. We use positive edge-triggered flip-flops to implement our FSM, thus the aspect of the relative timing that matters to our design is that, when the clock rises, the value of T is stable and indicates the type of coin entered. The signal T may fall before or after the clock does; the two are equivalent for our FSM's needs.

(Figure: sample timing diagram for CLK, T, and A; CLK is high as a coin rolls by and low between coins (time varies), T rises before CLK, and the next state implements the coin acceptance decision A.)

The signal A in the timing diagram is an output from the FSM, and indicates whether or not the coin should
be accepted. This signal controls the servo that drives the gate, and thus determines whether the coin is
accepted (A = 1) as payment or rejected (A = 0) and returned to the user.
Looking at the timing diagram, you should note that our FSM makes a decision based on its current state
and the input T and enters a new state at the rising clock edge. The value of A in the next cycle thus
determines the position of the gate when the coin eventually rolls to the end of the slope. As we said earlier,
our FSM is thus a Moore machine: the output A does not depend on the input T , but only on the current
internal state bits of the FSM. However, you should also now realize that making A depend on T is not adequate for this lab. If A were to rise with T and fall with the rising clock edge (on entry to the next state), or even fall with the falling edge of T, the gate would return to the reject position by the time the coin reached the gate, regardless of our FSM's decision!

1 The full system actually allows four sensors to differentiate four types of coins, but our lab uses only two of these sensors.


An Abstract Model
We start by writing down states for a user's expected behavior. Given the fairly tight constraints that we have placed on our lab, few combinations are possible. For a total of 35 cents, a user should either insert a dime followed by a quarter, or a quarter followed by a dime. We begin in a START state, which transitions to states DIME or QUARTER when the user inserts the first coin. With no previous coin, we need not specify a value for A. No money has been deposited, so we set output P = 0 in the START state. We next create DIME and QUARTER states corresponding to the user having entered one coin. The first coin should be accepted, but more money is needed, so both of these states output A = 1 and P = 0. When a coin of the opposite type is entered, each state moves to a state called PAID, which we use for the case in which a total of 35 cents has been received. For now, we ignore the possibility that the same type of coin is deposited more than once. Finally, the PAID state accepts the second coin (A = 1) and indicates that the user has paid the full price of 35 cents (P = 1).

    state      dime (T = 0)   quarter (T = 1)   accept? (A)   paid? (P)
    START      DIME           QUARTER                         no
    DIME                      PAID              yes           no
    QUARTER    PAID                             yes           no
    PAID                                        yes           yes
We next extend our design to handle user mistakes. If a user enters a second dime in the DIME state, our FSM should reject the coin. We create a REJECTD state and add it as the next state from DIME when a dime is entered. The REJECTD state rejects the dime (A = 0) and continues to wait for a quarter (P = 0). What should we use as next states from REJECTD? If the user enters a third dime (or a fourth, or a fifth, and so on), we want to reject the new dime as well. If the user enters a quarter, we want to accept the coin, at which point we have received 35 cents (counting the first dime). We use this reasoning to complete the description of REJECTD. We also create an analogous state, REJECTQ, to handle a user who inserts more than one quarter.

    state      dime (T = 0)   quarter (T = 1)   accept? (A)   paid? (P)
    START      DIME           QUARTER                         no
    DIME       REJECTD        PAID              yes           no
    REJECTD    REJECTD        PAID              no            no
    QUARTER    PAID           REJECTQ           yes           no
    REJECTQ    PAID           REJECTQ           no            no
    PAID                                        yes           yes
What should happen after a user has paid 35 cents and bought one item? The FSM at that point is in the
PAID state, which delivers the item by setting P = 1. Given that we want the FSM to allow the user to
purchase another item, how should we choose the next states from PAID? The behavior that we want from
PAID is identical to the behavior that we defined from START. The 35 cents already deposited was used
to pay for the item delivered, so the machine is no longer holding any of the user's money. We can thus
simply set the next states from PAID to be DIME when a dime is inserted and QUARTER when a quarter
is inserted.
At this point, we make a decision intended primarily to simplify the logic needed to build the lab. Without a physical item delivery mechanism with a specification for how its input must be driven, the behavior of the output signal P can be fairly flexible. For example, we could build a delivery mechanism that used the rising edge of P to open a chute. In this case, the output P = 0 in the start state is not relevant, and we can merge the state START with the state PAID. The way that we handle P in the lab, we might find it strange to have a "paid" light turn on before inserting any money, but keeping the design simple enough for a first lab exercise is more important. Our final abstract state table appears below.

    state      dime (T = 0)   quarter (T = 1)   accept? (A)   paid? (P)
    PAID       DIME           QUARTER           yes           yes
    DIME       REJECTD        PAID              yes           no
    REJECTD    REJECTD        PAID              no            no
    QUARTER    PAID           REJECTQ           yes           no
    REJECTQ    PAID           REJECTQ           no            no


Picking the Representation


We are now ready to choose the state representation for the lab FSM. With five states, we need three bits of internal state. Prof. Jones decided to leverage human meaning in assigning the bit patterns, as follows:

    S2   type of last coin inserted (0 for dime, 1 for quarter)
    S1   more than one quarter inserted? (1 for yes, 0 for no)
    S0   more than one dime inserted? (1 for yes, 0 for no)

These meanings are not easy to apply to all of our states. For example, in the PAID state, the last coin inserted may have been of either type, or of no type at all, since we decided to start our FSM in that state as well. However, for the other four states, the meanings provide a clear and unique set of bit pattern assignments, as shown to the right. We can choose any of the remaining four bit patterns (010, 011, 101, or 111) for the PAID state. In fact, we can choose all of the remaining patterns for the PAID state. We can always represent any state with more than one pattern if we have spare patterns available. Prof. Jones used this freedom to simplify the logic design.

    state      S2 S1 S0
    PAID       ???
    DIME       000
    REJECTD    001
    QUARTER    100
    REJECTQ    110

This particular example is slightly tricky. The four free patterns do not share any single bit in common,
so we cannot simply insert x's into all K-map entries for which the next state is PAID. For example, if we insert an x into the K-map for S2+, and then choose a function for S2+ that produces a value of 1 in place of the don't care, we must also produce a 1 in the corresponding entry of the K-map for S0+. Our options for
PAID include 101 and 111, but not 100 nor 110. These latter two states have other meanings.
Let's begin by writing a next-state table consisting mostly of bits, as shown to the right. We use this table to write out a K-map for S2+ as follows: any of the patterns that may be used for the PAID state obey the next-state rules for PAID. Any next-state entry marked as PAID is marked as don't care in the K-map, since we can choose patterns starting with either or both values to represent our PAID state. The resulting K-map appears to the far right. As shown, we simply set S2+ = T, which matches our original meaning for S2. That is, S2 is the type of the last coin inserted.

                           next state S2+ S1+ S0+
    state      S2 S1 S0    T = 0      T = 1
    PAID       ???         000        100
    DIME       000         001        PAID
    REJECTD    001         001        PAID
    QUARTER    100         PAID       110
    REJECTQ    110         PAID       110

(Figure: K-map for S2+, drawn with rows S2 S1 and columns S0 T; the PAID next states are don't cares, and the chosen cover is simply S2+ = T.)
Based on our choice for S2+, we can rewrite the K-map as shown to the right, with green italics and shading marking the values produced for the x's in the specification. Each of these boxes corresponds to one transition into the PAID state. By specifying the S2 value, we cut the number of possible choices from four to two in each case. For those combinations in which the implementation produces S2+ = 0, we must choose S1+ = 1, but are still free to leave S0+ marked as a don't care. Similarly, for those combinations in which the implementation produces S2+ = 1, we must choose S0+ = 1, but are still free to leave S1+ marked as a don't care.

(Figure: the S2+ K-map with the don't-care entries filled in with the values produced by S2+ = T.)

(Figure: K-maps for S1+ and S0+, with the chosen values for the don't-care entries highlighted.)

The K-maps for S1+ and S0+ are shown to the right. We have not given algebraic expressions for either, but have indicated our choices by highlighting the resulting replacements of don't-care entries with the values produced by our expressions. At this point, we can review the state patterns actually produced by each of the four next-state transitions into the PAID state. From the DIME state, we move into the 101 state when the user inserts a quarter. The result is the same from the REJECTD state. From the QUARTER state, however, we move into the 010 state when the user inserts a dime. The result is the same from the REJECTQ state. We must thus classify both patterns, 101 and 010, as PAID states. The remaining two patterns, 011 and 111, cannot
be reached from any of the states in our design. We


might then try to leverage the fact that the nextstate patterns from these two states are not relevant
(recall that we fixed the next-state patterns for all
four of the possible PAID states) to further simplify
our logic, but doing so does not provide any advantage (you may want to check our claim).

The final state table is shown below. We have included the extra states at the bottom of the table. We have specified the next-state logic for these states, but left the output bits as don't cares. A state transition diagram appears below the table.

                           S2+ S1+ S0+
    state      S2 S1 S0    T = 0      T = 1     A   P
    PAID1      010         000        100       1   1
    PAID2      101         000        100       1   1
    DIME       000         001        101       1   0
    REJECTD    001         001        101       0   0
    QUARTER    100         010        110       1   0
    REJECTQ    110         010        110       0   0
    EXTRA1     011         000        100       x   x
    EXTRA2     111         000        100       x   x

Testing the Design


Having a complete design on paper is a good step forward, but humans make mistakes at all stages. How
can we know that a circuit that we build in the lab correctly implements the FSM that we have outlined in
these notes?
For the lab design, we have two problems to solve. First, we have not specified an initialization scheme for
the FSM. We may want the FSM to start in one of the PAID states, but adding initialization logic to the
design may mean requiring you to wire together significantly more chips. Second, we need a sequence of
inputs that manages to test that all of the next-state and output logic implementations are correct.
Testing sequential logic, including FSMs, is in general extremely difficult. In fact, large sequential systems
today are generally converted into combinational logic by using shift registers to fill the flip-flops with a
particular pattern, executing the logic for one clock cycle, and checking that the resulting pattern of bits in
the flip-flops is correct. This approach is called scan-based testing, and is discussed in ECE 543. You will
make use of a similar approach when you test your combinational logic in the second week of the lab, before
wiring up the flip-flops.
We have designed our FSM to be easy to test (even small FSMs may be challenging) with a brute force
approach. In particular, we identify two input sequences that together serve both to initialize and to test a
correctly implemented variant of our FSM. Our initialization sequence forces the FSM into a specific state
regardless of its initial state. And our test sequence crosses every transition arc leaving the six valid states.
In terms of T , the coin type, we initialize the FSM with the input sequence 001. Notice that such a sequence
takes any initial state into PAID2.
For testing, we use the input sequence 111010010001. You should trace this sequence, starting from PAID2,
on the diagram below to see how the test sequence covers all of the possible arcs. As we test, we need also
to observe the A and P outputs in each state to check the output logic.
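A behavioral model makes it easy to see what these sequences do before building anything. The Python sketch below (illustrative; it models only the six valid states from the final state table) runs the initialization sequence and then the test sequence, printing the A and P outputs along the way:

    # Behavioral model of the lab FSM: nxt[state] = (on T=0 (dime), on T=1 (quarter)).
    nxt = {
        "PAID1":   ("DIME", "QUARTER"),   "PAID2":   ("DIME", "QUARTER"),
        "DIME":    ("REJECTD", "PAID2"),  "REJECTD": ("REJECTD", "PAID2"),
        "QUARTER": ("PAID1", "REJECTQ"),  "REJECTQ": ("PAID1", "REJECTQ"),
    }
    out = {"PAID1": (1, 1), "PAID2": (1, 1), "DIME": (1, 0),
           "REJECTD": (0, 0), "QUARTER": (1, 0), "REJECTQ": (0, 0)}  # (A, P)

    def run(state, coins):
        for t in coins:
            state = nxt[state][t]
            a, p = out[state]
            print(f"T={t} -> {state:8s} A={a} P={p}")
        return state

    state = run("DIME", [0, 0, 1])      # initialization: any valid start ends in PAID2
    state = run(state, [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1])   # test sequence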
(Figure: state transition diagram with states labeled S2 S1 S0 /AP: DIME 000/10, REJECTD 001/00, PAID1 010/11, EXTRA1 011/xx, QTR 100/10, PAID2 101/11, REJECTQ 110/00, EXTRA2 111/xx; arcs are labeled T=0 or T=1.)


ECE120: Introduction to Computer Engineering


Notes Set 3.4

Extending Keyless Entry with a Timeout


This set of notes builds on the keyless entry control FSM that we designed earlier. In particular, we use a
counter to make the alarm time out, turning itself off after a fixed amount of time. The goal of this extension
is to illustrate how we can make use of components such as registers and counters as building blocks for our
FSMs without fully expanding the design to explicitly illustrate all possible states.
To begin, let's review the FSM that we designed earlier for keyless entry. The state transition diagram for our design is replicated to the right. The four states are labeled with state bits and output bits, S1 S0 /DRA, where D indicates that the driver's door should be unlocked, R indicates that the rest of the doors should be unlocked, and A indicates that the alarm should be on. Transition arcs in the diagram are labeled with concise versions of the inputs ULP (using don't cares), where U represents an unlock button, L represents a lock button, and P represents a panic button.

(Figure: keyless entry transition diagram with states LOCKED 00/000, DRIVER 10/100, UNLOCKED 11/110, and ALARM 01/001; arcs are labeled with ULP input patterns, and an annotation marks the place where we will add a timeout.)
In this design, once a user presses


the panic button P , the alarm
ULP=xx1,x00
sounds until the user presses the
lock button L to turn it off. Instead of sounding the alarm indefinitely, we might want to turn the alarm off
after a fixed amount of time. In other words, after the system has been in the ALARM state for, say, thirty
or sixty seconds, we might want to move back to the LOCKED state even if the user has not pushed the
lock button. The blue annotation in the diagram indicates the arc that we must adjust. But thirty or sixty
seconds is a large number of clock cycles, and our FSM must keep track of the time. Do we need to draw all
of the states?
Instead of following the design process that we outlined earlier, let's think about how we can modify our existing design to incorporate the new functionality. In order to keep track of time, we use a binary counter. Let's say that we want our timeout to be T cycles. When we enter the alarm state, we want to set the counter's value to T-1, then let the counter count down until it reaches 0, at which point a timeout occurs.
input LD = 1. When LD = 0, the counter counts down. The counter also has an output Z that indicates
that the counters value is currently zero, which we can use to indicate a timeout on the alarm. You should
be able to build such a counter based on what you have learned earlier in the class. Here, we will assume
that we can just make use of it.
How many bits do we need in our counter? The answer depends on T . If we add the counter to our design,
the state of the counter is technically part of the state of our FSM, but we can treat it somewhat abstractly.
For example, we only plan to make use of the counter value in the ALARM state, so we ignore the counter
bits in the three other states. In other words, S1 S0 = 00 means that the system is in the LOCKED state regardless of the counter's value.


We expand the ALARM state into T separate states based on the value of the counter. As shown to the right, we name the states ALARM(1) through ALARM(T). All of these alarm states use S1 S0 = 01, but they can be differentiated using a timer (the counter value).

We need to make design decisions about how the arcs entering and leaving the ALARM state in our original design should be used once we have incorporated the timeout. As a first step, we decide that all arcs entering ALARM from other states now enter ALARM(1). Similarly, if the user presses the panic button P in any of the ALARM(t) states, the system returns to ALARM(1). Effectively, pressing the panic button resets the timer.

The only arc leaving the ALARM state goes to the LOCKED state on ULP=x10. We replicate this arc for all ALARM(t) states: the user can push the lock button at any time to silence the alarm.

Finally, the self-loop back to the ALARM state on ULP=x00 becomes the countdown arcs in our expanded states, taking ALARM(t) to ALARM(t+1), and ALARM(T) to LOCKED.

(Figure: the expanded ALARM states, ALARM(1) with timer=T-1, ALARM(2) with timer=T-2, down to ALARM(T) with timer=0, linked by ULP=x00 countdown arcs, with ALARM(T) going to LOCKED; all input arcs to ALARM, including ULP=xx1 from any ALARM state, enter ALARM(1), and the outgoing arcs to LOCKED on ULP=x10 are replicated for every ALARM(t).)

Now that we have a complete specification for the extended design, we can implement it. We want to reuse our original design as much as possible, but we have three new features that must be considered. First, when we enter the ALARM(1) state, we need to set the counter value to T-1. Second, we need the counter value to count downward while in the ALARM state. Finally, we need to move back to the LOCKED state when a timeout occurs, that is, when the counter reaches zero.

The first problem is fairly easy. Our counter supports parallel load, and the only value that we need to load is T-1, so we apply the constant bit pattern for T-1 to the load inputs and raise the LD input whenever we enter the ALARM(1) state. In our original design, we chose to enter the ALARM state whenever the user pressed P, regardless of the other buttons. Hence we can connect P directly to our counter's LD input.

The second problem is handled by the counter's countdown functionality. In the ALARM(t) states, the counter will count down each cycle, moving the system from ALARM(t) to ALARM(t+1).

The last problem is slightly trickier, since we need to change S1 S0. Notice that S1 S0 = 01 for the ALARM state and S1 S0 = 00 for the LOCKED state. Thus, we need only force S0 to 0 when a timeout occurs. We can use a single 2-to-1 multiplexer for this purpose. The 0 input of the mux comes from the original S0+ logic, and the 1 input is a constant 0. All other state logic remains unchanged. When does a timeout occur? First, we must be in the ALARM(T) state, so S1 S0 = 01 and the counter's Z output is raised. Second, the input combination must be ULP = xx0 (notice that both ULP = x00 and ULP = x10 return to LOCKED from ALARM(T)). A single, four-input AND gate thus suffices to obtain the timeout signal, S1' S0 Z P', which we connect to the select input of the mux between the S0+ logic and the S0 flip-flop.
The extension thus requires only a counter, a mux, and a gate, as shown below.
(Figure: the extended implementation, showing the existing S1+ and S0+ logic, a down counter with parallel load of T-1 driven by LD = P, a 2-to-1 mux that forces S0 to 0 on a timeout, and the S1 and S0 flip-flops.)

ECE120: Introduction to Computer Engineering


Notes Set 3.5
Random Access Memories
This set of notes describes random access memories.

Memory
A computer memory is a group of storage elements and the logic necessary to move data in and out of the
elements. The size of the elements in a memory, called the addressability of the memory, varies from a
single binary digit, or bit, to a byte (8 bits) or more. Typically, we refer to data elements larger than a
byte as words, but the size of a word depends on context.
Each element in a memory is assigned a unique name, called an address, that allows an external circuit
to identify the particular element of interest. These addresses are not unlike the street addresses that you
use when you send a letter. Unlike street addresses, however, memory addresses usually have little or no
redundancy; each possible combination of bits in an address identifies a distinct set of bits in the memory. The
figure on the right below illustrates the concept. Each house represents a storage element and is associated
with a unique address.
(Figure: on the left, a generic 2^k x N memory with an N-bit DATA_IN, a k-bit ADDR input, R/W and CS controls, and an N-bit DATA_OUT; on the right, a row of houses with addresses 000 through 111 illustrating the idea of unique addresses.)

The memories that we consider in this class have several properties in common. These memories support
two operations: write places a word of data into an element, and read retrieves a copy of a word of data
from an element. The memories are also volatile, which means that the data held by a memory are erased
when electrical power is turned off or fails. Non-volatile forms of memory include magnetic and optical
storage media such as DVDs, CD-ROMs, disks, and tapes, as well as some programmable logic devices,
such as ROMs. Finally, the memories considered in this class are random access memories (RAMs),
which means that the time required to access an element in the memory is independent of the element being
accessed. In contrast, serial memories such as magnetic tape require much less time to access data near
the current location in the tape than data far away from the current location.
The figure on the left above shows a generic RAM structure. The memory contains 2^k elements of N bits
each. A k-bit address input, ADDR, identifies the memory element of interest for any particular operation.
The read/write input, R/W , selects the operation to be performed: if R/W is high, the operation is a read;
if it is low, the operation is a write. Data to be written into an element are provided through N inputs at
the top, and data read from an element appear on N outputs at the bottom. Finally, a chip select input,
CS, functions as an enable control for the memory; when CS is low, the memory neither reads nor writes
any location.
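The interface just described can be summarized with a small behavioral model (a Python sketch, not a gate-level design; the class and method names are ours):

    # Behavioral model of a 2^k x N random access memory.
    class RAM:
        def __init__(self, k, n):
            self.n = n
            self.data = [0] * (2 ** k)              # 2^k elements of N bits each

        def access(self, addr, data_in, rw, cs):
            if not cs:                              # CS = 0: neither read nor write
                return None
            if rw:                                  # R/W = 1: read
                return self.data[addr]
            self.data[addr] = data_in & ((1 << self.n) - 1)   # R/W = 0: write N bits
            return None

    mem = RAM(k=3, n=8)                             # eight 8-bit elements, as in the figure
    mem.access(addr=0b101, data_in=0x3C, rw=0, cs=1)            # write 0x3C to address 101
    print(hex(mem.access(addr=0b101, data_in=0, rw=1, cs=1)))   # read it back: 0x3c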
Random access memory further divides into two important types: static RAM, or SRAM, and dynamic
RAM, or DRAM. SRAM employs active logic in the form of a two-inverter loop to maintain stored values.
DRAM uses a charged capacitor to store a bit; the charge drains over time and must be replaced, giving rise
to the qualifier "dynamic." "Static" thus serves only to differentiate memories with active logic elements
from those with capacitive elements. Both types are volatile, that is, both lose all data when the power
supply is removed. We study both SRAM and DRAM in some detail in this course.


Static Random Access Memory


Static random access memory is used for high-speed applications such as processor caches and some embedded
designs. As SRAM bit density (the number of bits in a given chip area) is significantly lower than DRAM
bit density, most applications with less demanding speed requirements use DRAM. The main memory in
most computers, for example, is DRAM, whereas the memory on the same chip as a processor is SRAM.1
DRAM is also unavailable when recharging its capacitors, which can be a problem for applications with
stringent real-time needs.

[Figure: two SRAM cell diagrams. On the left, the physical cell: a two-inverter loop connected to the BIT and
BIT' lines through transistors controlled by the SELECT line. On the right, the logical cell: an S-R latch with
a SELECT line and separate bit read and write lines.]

Two diagrams of an SRAM cell (a single bit) appear above. On the left is the physical implementation: a
dual-inverter loop hooked to opposing BIT lines through transistors controlled by a SELECT line. On the
right is a logical implementation2 modeled after that given by Mano & Kime.
The physical cell works as follows. When SELECT is high, the transistors connect the inverter loop to
the bit lines. When writing a cell, the lines are held at opposite logic values, forcing the inverters to match
the values on the lines and storing the value from the BIT input. When reading a cell, the bit lines are
disconnected from other logic, allowing the inverter loop to drive the lines to their current values. The value
stored previously is thus copied onto the BIT line as an output, and the opposite value is placed on the
complemented line, BIT'. When SELECT is low, the transistors effectively disconnect the inverters from the bit lines, and
the cell holds its current value until SELECT goes high again.
The logical cell retains the SELECT line, replaces the inverter loop with an S-R latch, and splits the bit
lines into bit write lines (B and B') and bit read lines (C and C'). When SELECT is low, all AND gates
output 0, and the isolated cell holds its value. When SELECT is high, the cell can be written by raising
the B or B' input signals, which set or reset the latch, as appropriate. Similarly, the latched value appears
on C and C'. Recall that the small triangle markers represent connections to many-input gates, which appear
in the read logic described below.

[Figure: a 16-cell bit slice. Cells 0 through 15 share the bit write lines B and B' and the bit read lines C
and C'. A 4-to-16 decoder driven by the 4-bit ADDR input raises one cell's SELECT line, and shared read logic
and write logic connect the bit lines to DATA_IN, CS, R/W, and DATA_OUT.]

A number of cells are combined into a bit slice, as shown above. The cells share bit lines and read/write
logic, which appears to the right in the figure. Based on the ADDR input, a decoder sets one cells SELECT
line high to enable a read or write operation to the cell. Details of the read and write logic are shown below,
to the left and right, respectively.
1 Chips combining both DRAM and processor logic are available, and are used by some processor manufacturers (such as IBM). Research is underway to couple such logic types more efficiently by building 3D stacks of chips.
2 "Logical implementation" here implies that the functional behavior of the circuit is equivalent to that of the real circuit. The real circuit is that shown on the left of the figure.
[Figure: details of the read logic (left) and the write logic (right). The read logic feeds the bit read lines
into an S-R latch and gates the latched value onto DATA_OUT when CS is high. The write logic gates DATA_IN onto
the bit write lines when CS is high and R/W is low.]

The read logic requires only the bit read lines and the chip select signal as inputs. The read lines function
logically as many-input OR gates, and the results of these lines are used to set or reset the S-R latch in the
read logic. When CS is high, the value held in the latch is then placed on the DATA_OUT line.
The write logic requires two enable inputs, CS and R/W , as well as a data input. When CS is high and a
write operation is requested, that is, when R/W is low, the output of the AND gate in the lower left of the
diagram goes high, enabling the AND gates to the right of the diagram to place the data input on the bit
write lines.

[Figure: a 64-cell memory built from four 16-cell bit slices (cells 0-15, 16-31, 32-47, and 48-63). A 4-to-16
decoder driven by ADDR(3:0) selects one cell within every slice, while a 2-to-4 decoder driven by ADDR(5:4)
enables the tri-state buffers that connect one slice's bit read and write lines to the shared read and write
logic (DATA_IN, CS, R/W, and DATA_OUT).]

The outputs of the cell selection decoder can be used to control multiple bit slices, as shown above. Selection
between bit slices is then based on other bits from the ADDRESS input. In the figure above, a 2-to-4
decoder enables one of four sets of tri-state buffers that connect the bit read and write lines to the read
and write logic. The many-input OR gates (a fictional construct of the logical representation) have been
replicated for each bit slice. In a real implementation, the transistor-gated connections to the bit lines
eliminate the need for the OR gates, and the extra logic amounts to only a pair of transistors per bit slice.
The approach shown above, in which one or more cells are selected through a two-dimensional indexing
scheme, is known as coincident selection. The qualifier "coincident" arises from the notion that the desired
cell coincides with the intersection of the active row and column select lines.
The benefit of coincident selection is easily calculated in terms of the number of gates required for the
decoders. Decoder complexity is roughly equal to the number of outputs, as each output is a minterm and
requires a unique gate to calculate it. Fanout trees for input terms and inverted terms add relatively few
gates. Consider a 1M x 8b RAM chip. The number of addresses is 2^20. One option is to use a single bit slice
and a 20-to-1048576 decoder, or about 2^20 gates. Alternatively, we can use 8,192 bit slices of 1,024 cells
(remember that we must output eight bits). For this implementation, we need two 10-to-1024 decoders, or
about 2^11 gates. As chip area is roughly proportional to the number of gates, the savings are substantial.
Other schemes are possible as well: if we want a more square chip area, we might choose to use 4,096 bit
slices of 2,048 cells along with one 11-to-2048 decoder and one 9-to-512 decoder. This approach requires
roughly 25% more decoder gates than our previous example, but is still far superior to the single bit slice
implementation.
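As a quick check of this arithmetic, the short C program below recomputes the approximate decoder gate counts
for the three organizations just described, counting one gate per decoder output and ignoring fanout trees
(the variable names are ours, not anything from a datasheet).

#include <stdio.h>

int main(void)
{
    long single_slice  = 1L << 20;                /* one 20-to-1048576 decoder */
    long coincident    = (1L << 10) * 2;          /* two 10-to-1024 decoders   */
    long square_layout = (1L << 11) + (1L << 9);  /* 11-to-2048 plus 9-to-512  */

    printf("single bit slice:   %ld gates\n", single_slice);   /* 1048576 */
    printf("8192 x 1024 layout: %ld gates\n", coincident);     /*    2048 */
    printf("4096 x 2048 layout: %ld gates\n", square_layout);  /*    2560 */
    return 0;
}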
Memories are typically unclocked devices. However, as you have seen, the circuits are highly structured, which
enables engineers to cope with the complexity of sequential feedback design. Devices used to control memories
are typically clocked, and the interaction between the two can be fairly complex. Timing diagrams for reads
and writes to SRAM are shown at the top of the next page. A write operation appears on the left. In the first
cycle, the controller raises the chip select signal and places the memory address to be written on the address
inputs.

[Figure: SRAM timing diagrams. On the left, a write cycle: CLK, ADDR (held valid), CS, R/W, and DATA_IN (held
valid). On the right, a read cycle: CLK, ADDR (held valid), CS, R/W, and DATA_OUT (valid after the read cycle
completes).]

Once the memory has had time to set up the appropriate select lines internally, the R/W input is lowered and
data are placed on the data inputs. The delay, which is specified by the memory manufacturer, is necessary
to avoid writing data to the incorrect element within the memory. In the diagram, the delay is one cycle, but
delay logic can be used to tune the timing to match the memory's specification, if desired. At some point
after new data have been delivered to the memory, the write operation completes within the memory. The
time from the application of the address until the (worst-case) completion of the write operation is called
the write cycle of the memory, and is also specified by the manufacturer. Once the write cycle has passed,
the controlling logic raises R/W , waits for the change to settle within the memory, then removes the address
and lowers the chip select signal. The reason for the delay is the same: to avoid mistakenly overwriting
another memory location.
A read operation is quite similar. As shown on the right, the controlling logic places the address on the
input lines and raises the chip select signal. No races need be considered, as read operations on SRAM do
not affect the stored data. After a delay called the read cycle, the data can be read from the data outputs.
The address can then be removed and the chip select signal lowered.
For both reads and writes, the number of cycles required for an operation depends on a combination of the
clock cycle of the controller and the cycle time of the memory. For example, with a 25 nanosecond write
cycle and a 10 nanosecond clock cycle, a write requires three cycles. In general, the number of cycles required
is given by the formula ⌈memory cycle time / clock cycle time⌉, that is, the ratio rounded up to an integer.
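A controller designer might compute this count with a one-line C helper such as the following sketch (the
function name is ours):

/* Cycles needed for one memory operation: the memory cycle time divided
   by the controller clock cycle time, rounded up.                        */
unsigned cycles_needed(unsigned memory_cycle_ns, unsigned clock_cycle_ns)
{
    return (memory_cycle_ns + clock_cycle_ns - 1) / clock_cycle_ns;
}
/* cycles_needed(25, 10) returns 3, matching the example above. */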

Bidirectional Signals
We have on several previous occasions discussed the utility of tri-state buffers in gating outputs and constructing multiplexers. With shift registers, we also considered the use of tri-state buffers to use the same
lines for reading the register and parallel load of the register. In this section, we consider in general the
application of tri-state buffers to reduce pin count and examine the symbols used to denote their presence.
[Figure: three groups of equivalent circuits. First, a generic circuit with N inputs (IN) and N outputs (OUT).
Second, the same circuit with its outputs gated by tri-state buffers controlled by EN, drawn either with
explicit external buffers or with the inverted triangle symbol near the OUT pins. Third, the gated outputs
connected back to the inputs, either externally or internally, forming bidirectional IN/OUT signals.]

The figure above shows three groups of equivalent circuits. We begin with the generic circuit on the left,
with N inputs and N outputs. The circuit may have additional inputs and outputs beyond those shown, but
it will be convenient to restrict this discussion to an equal number of inputs and outputs. The second group
in the figure extends the first by using an enable input, EN , to gate the circuit outputs with N tri-state
buffers. The left member of the group adds the buffers externally, while the right member (third from the
left in the overall figure) adds them implicitly, as indicated by the inverted triangle symbol near the OUT
pins. This symbol is not meant to point towards the pins, but is rather always drawn in the orientation
shown, regardless of output direction in a figure. The third group further extends the circuit by connecting
its gated outputs to its inputs, either externally or internally (fourth and fifth from the left, respectively).
The resulting connections are called bidirectional signals, as information can flow either into or out of
the circuit. Bidirectional signals are important for memory devices, for which the number of logical inputs
and outputs can be quite large. Data inputs and outputs, for example, are typically combined into a single
set of bidirectional signals. The arrowheads in the figure are not a standard part of the representation, but
are sometimes provided to clarify the flow of information. The labels provide complete I/O information and
allow you to identify bidirectional signals.
With bidirectional signals, and with all outputs gated by tri-state buffers, it is important to ensure that multiple circuits are not simultaneously allowed to drive a set of wires, as attempting to drive wires to different
logic values creates a short circuit from high voltage to ground, which can easily destroy the system.

Dynamic Random Access Memory


Dynamic random access memory, or DRAM, is used for main memory in computers and for other applications in which size is more important than speed. While slower than SRAM, DRAM is also denser (has
more bits per chip area).

[Figure: two DRAM cell diagrams. On the left, the physical cell: a capacitor connected to the BIT line through
a transistor controlled by the SELECT line. On the right, the logical cell: a D latch gated by SELECT, with a
tri-state buffer driving the bit read line.]

Two diagrams of a DRAM cell appear above. On the left is the physical implementation: a capacitor attached
to a BIT line through a transistor controlled by a SELECT line. On the right is a logical implementation
modeled after that given by Mano & Kime.
The logical implementation employs a D latch to record the value on the bit write line, B, whenever the
SELECT line is high. The output of the latch is also placed on the bit read line, C, when SELECT is
high. Rather than many-input gates, a tri-state buffer controls the output gating to remind you that DRAM
cells are read only when selected.
As illustrated by the physical cell structure, DRAM storage is capacitive: a bit is stored by charging or not
charging a capacitor. When SELECT is low, the capacitor is isolated, and it holds its charge. However, the
resistance across the transistor is finite, and some charge leaks out onto the bit line. Charge also leaks into
the substrate on which the device is constructed. After some amount of time, all of the charge dissipates,
and the bit is lost. To avoid such loss, the cell must be refreshed periodically by reading the contents and
writing them back with active logic.
When the SELECT line is high during a write operation, logic driving the bit line forces charge onto the
capacitor or removes all charge from it. For a read operation, the bit line is first brought to an intermediate
voltage level (a voltage level between 0 and 1), then SELECT is raised, allowing the capacitor to either
pull a small amount of charge from the bit line or to push a small amount of charge onto the bit line. The
resulting change in voltage is then detected by a sense amplifier3 at the end of the bit line. A sense amp
is analogous to a marble on a mountaintop: a small push causes the marble to roll rapidly downhill in the
direction of the push. Similarly, a small change in voltage causes a sense amp's output to move rapidly to
a logical 0 or 1, depending on the direction of the small change. Sense amplifiers also appear in SRAM
implementations. While not technically necessary, as they are with DRAM, the use of a sense amp to react
to small changes in voltage makes reads faster.
Each read operation on a DRAM cell brings the voltage on its capacitor closer to the intermediate voltage
level, in effect destroying the data in the cell. DRAM is thus said to have destructive reads. To preserve
data during a read, the data read must be written back into the cells. For example, the output of the sense
amplifiers can be used to drive the bit lines, rewriting the cells with the appropriate data.
At the chip level, typical DRAM inputs and outputs differ from those of SRAM. Due to the large size and
high density of many DRAMs, addresses are split into row and column components and provided through a
common set of pins. The DRAM stores the components in registers to support this approach. Additional inputs, known as the row and column address strobes (RAS and CAS, respectively), are used to indicate
when address components are available. These control signals are also used to manage the DRAM refresh
process (see Mano & Kime for details). As you might guess from the structure of coincident selection, DRAM
refresh occurs on a row-by-row basis; raising the SELECT line for a row destructively reads the contents of
all cells on that row, forcing the cells to be rewritten and effecting a refresh. The row is thus a natural basis
for the refresh cycle. The DRAM data pins provide bidirectional signals for reading and writing elements of
the DRAM. An output enable input, OE, controls tri-state buffers within the DRAM to determine whether
or not the DRAM drives the data pins. The R/W input, which controls the type of operation, is also present.

[Figure: DRAM timing diagrams. On the left, a write cycle: the ROW and then COL address components appear on
ADDR as RAS and then CAS are raised, with R/W lowered and DATA held valid during the column cycle. On the
right, a read cycle: OE is lowered after CAS, the DATA pins leave the hi-Z state, and valid data appear after
the read cycle.]

Timing diagrams for DRAM writes and reads appear above. In both cases, the row component of the address
is first applied to the address pins, then RAS is raised.4 In the next cycle of the controlling logic, the column
component is applied to the address pins, and CAS is raised.
For a write, as shown on the left, the R/W signal and the data can also be applied in the second cycle.
The DRAM has internal timing and control logic that prevent races from overwriting an incorrect element
(remember that the row and column addresses have to be stored in registers). The DRAM again specifies
a write cycle, after which the operation is guaranteed to be complete. In order, the R/W signal is then
raised, the CAS signal lowered, and the RAS signal lowered. Other orders of signal removal have different
meanings, such as initiation of a refresh.
3 The implementation of a sense amplifier lies outside the scope of this class, but you should understand their role in memory.
4 In practice, RAS, CAS, and OE are active low signals, and are thus usually written and appear with overbars (RAS', CAS', and OE' in the notation used here).

For a read operation, the output enable signal, OE, is lowered after CAS is raised. The DATA pins, which
should be floating (in other words, not driven by any logic), are then driven by the DRAM. After the read
cycle, valid data appear on the DATA pins, and OE, CAS, and RAS are lowered in order after the data
are read.
A typical DRAM implementation provides several approaches to managing refresh, but does not initiate any
refreshes internally. Refresh requirements are specified, but managing the refresh itself is left to a DRAM
controller. The duties of this controller also include mapping addresses into row and column components,
managing timing for signals to and from the DRAM, and providing status indicators on the state of the
DRAM.
As an example of refresh rates and requirements for modern DRAMs, I obtained a few specifications for a
16Mx4b EDO DRAM chip manufactured by Micron Semiconductor. The cells are structured into 4,096 rows,
each of which must be refreshed every 64 milliseconds. Using a certain style of refresh (CAS-before-RAS, or
CBR), the process of refreshing a single row takes roughly 100 nanoseconds. The most common approach
to managing refresh, termed distributed refresh, cycles through rows one at a time over a period of the
required refresh time, in this case 64 milliseconds. Row refreshes occur regularly within this period, or about
every 16 microseconds. The refreshes keep the DRAM busy 0.64% of the time; at other times, it can be
used for reads and writes. Alternatively, we might choose a burst refresh approach, in which we refresh all
rows in a burst. A burst refresh requires roughly 410 microseconds for the DRAM under discussion, as all
4,096 rows must be refreshed, and each row requires about 100 nanoseconds. A delay of 410 microseconds is
a long delay by processor standards, thus burst refresh is rarely used.
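These overhead figures follow directly from the chip's numbers; the short C calculation below reproduces them
(the constants come from the example above, while the variable names are ours).

#include <stdio.h>

int main(void)
{
    double rows       = 4096.0;     /* rows to refresh                    */
    double refresh_ns = 100.0;      /* time to refresh one row (CBR)      */
    double period_ns  = 64e6;       /* 64 ms refresh requirement          */

    /* distributed refresh: one row every period/rows                     */
    double interval_ns   = period_ns / rows;          /* about 15625 ns   */
    double busy_fraction = refresh_ns / interval_ns;  /* about 0.0064     */

    /* burst refresh: all rows back to back                               */
    double burst_ns = rows * refresh_ns;               /* about 409600 ns */

    printf("row refresh interval:  %.1f us\n", interval_ns / 1000.0);
    printf("fraction of time busy: %.2f%%\n", busy_fraction * 100.0);
    printf("burst refresh length:  %.1f us\n", burst_ns / 1000.0);
    return 0;
}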

ECE120: Introduction to Computer Engineering


Notes Set 3.6
From FSM to Computer
The FSM designs we have explored so far have started with a human-based design process in which someone
writes down the desired behavior in terms of states, inputs, outputs, and transitions. Such an approach makes
it easier to build a digital FSM, since the abstraction used corresponds almost directly to the implementation.
As an alternative, one can start by mapping the desired task into a high-level programming language, then
using components such as registers, counters, and memories to implement the variables needed. In this
approach, the control structure of the code maps into a high-level FSM design. Of course, in order to
implement our FSM with digital logic, we eventually still need to map down to bits and gates.
In this set of notes, we show how one can transform a piece of code written in a high-level language into an
FSM. This process is meant to help you understand how we can design an FSM that executes simple pieces
of a flow chart such as assignments, if statements, and loops. Later, we generalize this concept and build
an FSM that allows the pieces to be executed to be specified after the FSM is built; in other words, the
FSM executes a program specified by bits stored in memory. This more general model, as you might have
already guessed, is a computer.

Specifying the Problem


Let's begin by specifying the problem that we want to solve. Say that we want to find the minimum value
in a set of 10 integers. Using the C programming language, we can write the following fragment of code:
int values[10];   /* 10 integers--filled in by other code */
int idx;
int min;

min = values[0];
for (idx = 1; 10 > idx; idx = idx + 1) {
    if (min > values[idx]) {
        min = values[idx];
    }
}
/* The minimum value from the array is now in min. */

The code uses array notation, which we have not used previously in our class, so let's first discuss the meaning
of the code.
The code uses three variables. The variable values represents the 10 values in our set. The suffix [10]
after the variable name tells the compiler that we want an array of 10 integers (int) indexed from 0 to 9.
These integers can be treated as 10 separate variables, but can be accessed using the single name values
along with an index (again, from 0 to 9 in this case). The variable idx holds a loop index that we use to
examine each of the values one by one in order to find the minimum value in the set. Finally, the variable
min holds the smallest known value as the program examines each of the values in the set.
The program body consists of two statements. We assume that some other piece of code (one not shown
here) has initialized the 10 values in our set before the code above executes. The first statement initializes
the minimum known value (min) to the value stored at index 0 in the array (values[0]). The second
statement is a loop in which the variable idx takes on values from 1 to 9. For each value, an if statement
compares the current known minimum with the value stored in the array at index given by the idx variable.
If the stored value is smaller, the current known value (again, min) is updated to reflect the program's having
found a smaller value. When the loop finishes all nine iterations, the variable min holds the smallest value
among the set of 10 integers stored in the values array.



As a first step towards designing an FSM
to implement the code, we transform the
code into a flow chart, as shown to the
right. The program again begins with initialization, which appears in the second
column of the flow chart. The loop in the
program translates to the third column
of the flow chart, and the if statement
to the middle comparison and update of
min.

[Flow chart: START; min = values[0]; idx = 1; loop test 10 > idx? (F: END); if min > values[idx]?
(T: min = values[idx]); idx = idx + 1; back to the loop test.]
Our goal is now to design an FSM to implement the flow chart. In order to do so, we want to leverage the
same kind of abstraction that we used earlier, when extending our keyless entry system with a timer. Although
the timer's value was technically also part of the FSM's state, we treated it as data and integrated it into
our next-state decisions in only a couple of cases.
For our minimum value problem, we have two sources of data. First, an external program supplies data in
the form of a set of 10 integers. If we assume 32-bit integers, these data technically form 320 input bits!
Second, as with the keyless entry system timer, we have data used internally by our FSM, such as the loop
index and the current minimum value. These are technically state bits. For both types of data, we treat
them abstractly as values rather than thinking of them individually as bits, allowing us to develop our FSM
at a high-level and then to implement it using the components that we have developed earlier in our course.

Choosing Components and Identifying States


Now we are ready to design an FSM that implements the flow chart. What components do we need, other
than our state logic? We use registers and counters to implement the variables idx and min in the program.
For the array values, we use a 16x32-bit memory.1 We need a comparator to implement the test for the
if statement. We choose to use a serial comparator, which allows us to illustrate again how one logical
high-level state can be subdivided into many actual states. To operate the serial comparator, we make use
of two shift registers that present the comparator with one bit per cycle on each input, and a counter to
keep track of the comparator's progress.
How do we identify high-level states from our flow chart? Although the flow chart attempts to break down
the program into simple steps, one step of a flow chart may sometimes require more than one state in an
FSM. Similarly, one FSM state may be able to implement several steps in a flow chart, if those steps can be
performed simultaneously. Our design illustrates both possibilities.
How we map flow chart elements into FSM states also depends to some degree on what components we use,
which is why we began with some discussion of components. In practice, one can go back and forth between
the two, adjusting components to better match the high-level states, and adjusting states to better match
the desired components.
Finally, note that we are only concerned with high-level states, so we do not need to provide details (yet) down
to the level of individual clock cycles, but we do want to define high-level states that can be implemented
in a fixed number of cycles, or at least a controllable number of cycles. If we cannot specify clearly when
transitions occur from an FSM state, we may not be able to implement the state.

1 We technically only need a 10x32-bit memory, but we round up the size of the address space to reflect more realistic memory designs; one can always optimize later.

Now let's go through the flow chart and identify states. Initialization of min and idx need not occur serially,
and the result of the first comparison between idx and the constant 10 is known in advance, so we can merge
all three operations into a single state, which we call INIT.
We can also merge the updates of min and idx into a second FSM state, which we call COPY. However, the
update to min occurs only when the comparison (min > values[idx]) is true. We can use logic to predicate
execution of the update. In other words, we can use the output of the comparator, which is available after
the comparator has finished comparing the two values (in a high-level FSM state that we have yet to define),
to determine whether or not the register holding min loads a new value in the COPY state.
Our model of use for this FSM involves external logic filling the memory (the array of integer values),
executing the FSM code, and then checking the answer. To support this use model, we create an FSM state
called WAIT for cycles in which the FSM has no work to do. Later, we also make use of an external input
signal START to start the FSM execution. The WAIT state logically corresponds to the START bubble in
the flow chart.
Only the test for the if statement remains. Using a serial comparator to compare two 32-bit values requires
32 cycles. However, we need an additional cycle to move values into our shift registers so that the comparator
can see the first bit. Thus our single comparison operation breaks into two high-level states. In the first
state, which we call PREP, we copy min to one of the shift registers, copy values[idx] to the other shift
register, and reset the counter that measures the cycles needed for our serial comparator. We then move to a
second high-level state, which we call COMPARE, in which we feed one bit per cycle from each shift register
to the serial comparator. The COMPARE state executes for 32 cycles, after which the comparator produces the
one-bit answer that we need, and we can move to the COPY state. The association between the flow chart and
the high-level FSM states is illustrated in the figure described below.

[Figure: the flow chart annotated with the high-level FSM states: WAIT corresponds to the START bubble, INIT
to the initialization steps, PREP and COMPARE to the if test, and COPY to the updates of min and idx.]
We can now also draw an abstract state diagram for our FSM, as described below. The FSM begins in the WAIT
state. After external logic fills the values array, it signals the FSM to begin by raising the START signal.
The FSM transitions into the INIT state, and in the next cycle into the PREP state. From PREP, the FSM always
moves to COMPARE, where it remains for 32 cycles while the serial comparator executes a comparison. After
COMPARE, the FSM moves to the COPY state, where it remains for one cycle. The transition from COPY depends on
how many loop iterations have executed. If more loop iterations remain, the FSM moves to PREP to execute the
next iteration. If the loop is done, the FSM returns to WAIT to allow external logic to read the result of
the computation.

[State diagram: WAIT --(START signal)--> INIT --(always)--> PREP --(always)--> COMPARE --(after 32 cycles)-->
COPY; from COPY, not end of loop --> PREP, end of loop --> WAIT; WAIT loops on itself until START is raised.]

Laying Out Components


Our high-level FSM design tells us what
our components need to be able to do in
any given cycle. For example, when we
load new values into the shift registers
that provide bits to the serial comparator, we always copy min into one shift
register and values[idx] into the second. Using this information, we can put
together our components and simplify
our design by fixing the way in which
bits flow between them.
The figure at the right shows how we
can organize our components. Again, in
practice, one goes back and forth thinking about states, components, and flow
from state to state. In these notes, we
present only a completed design.
Let's take a detailed look at each of the
components. At the upper left of the figure is a 4-bit binary counter called IDX
to hold the idx variable. The counter
can be reset to 0 using the RST input. Otherwise, the CNT input controls
whether or not the counter increments
its value. With this counter design, we
can force idx to 0 in the WAIT state
and then count upwards in the INIT and
COPY states.
A memory labeled VALUES to hold the array values appears in the upper right of the figure. The read/write
control for the memory is hardwired to 1 (read) in the figure, and the data input lines are unattached. To
integrate with other logic that can operate our FSM, we need to add more control logic to allow writing into
the memory and to attach the data inputs to something that provides the data bits. The address input of
the memory comes always from the IDX counter value; in other words, whenever we access this memory by
making use of the data output lines, we read values[idx].
In the middle left of the figure is a 32-bit register for the min variable. It has a control input LD that
determines whether or not it loads a new value at the end of the clock cycle. If a new value is loaded, the
new value always corresponds to the output of the VALUES memory, values[idx]. Recall that min always
changes in the INIT state, and may change in the COPY state. But the new value stored in min is always
values[idx]. Note also that when the FSM completes its task, the result of the computation is left in the
MIN register for external logic to read (connections for this purpose are not shown in the figure).
Continuing downward in the figure, we see two right shift registers labeled A and B. Each has a control input
LD that enables a parallel load. Register A loads from register MIN, and register B loads from the memory
data output (values[idx]). These loads are needed in the PREP state of our FSM. When LD is low, the shift
registers simply shift to the right. The serial output SO makes the least significant bit of each shift register
available. Shifting is necessary to feed the serial comparator in the COMPARE state.
Below register A is a 5-bit binary counter called CNT. The counter is used to control the serial comparator in
the COMPARE state. A reset input RST allows it to be forced to 0 in the PREP state. When the counter value
is exactly zero, the output Z is high.

The last major component is the serial comparator, which is based on the design developed in Notes Set 3.1.
The two bits to be compared in a cycle come from shift registers A and B. The first bit indicator comes
from the zero indicator of counter CNT. The comparator actually produces two outputs (Z1 and Z0), but the
meaning of the Z1 output by itself is A > B. In the diagram, this signal has been labeled THEN.
There are two additional elements in the figure that we have yet to discuss. Each simply compares the value
in a register with a fixed constant and produces a 1-bit signal. When the FSM finishes an iteration of the
loop in the COPY state, it must check the loop condition (10 > idx) and move either to the PREP state or,
when the loop finishes, to the WAIT state to let the external logic read the answer from the MIN register. The
loop is done when the current iteration count is nine, so we compare IDX with nine to produce the DONE
signal. The other constant comparison is between the counter CNT and the value 31 to produce the LAST
signal, which indicates that the serial comparator is on its last cycle of comparison. In the cycle after LAST
is high, the THEN output of the comparator indicates whether or not A > B.

Control and Data


One can think of the components and the interconnections between them as enabling the movement of data
between registers, while the high-level FSM controls which data move from register to register in each cycle.
With this model in mind, we call the components and interconnections for our design the datapath, a term
that we will see again when we examine the parts of a computer in the coming weeks. The datapath requires
several inputs to control the operation of the components; these we can treat as outputs of the FSM. These
signals allow the FSM to control the motion of data in the datapath, so we call them control signals.
Similarly, the datapath produces several outputs that we can treat as inputs to the FSM. The tables below
summarize the control signals and the outputs of the datapath for our FSM.

  datapath input   meaning
  IDX.RST          reset IDX counter to 0
  IDX.CNT          increment IDX counter
  MIN.LD           load new value into MIN register
  A.LD             load new value into shift register A
  B.LD             load new value into shift register B
  CNT.RST          reset CNT counter

  datapath output   meaning                                  based on
  DONE              last loop iteration finished             IDX = 9
  LAST              serial comparator executing last cycle   CNT = 31
  THEN              if statement condition true              A > B

Using the datapath control signals and outputs, we can now write a more formal state transition table
for the FSM, as shown below. The actions column of the table lists the changes to register and counter
values that are made in each of the FSM states. The notation used to represent the actions is called
register transfer language (RTL). The meaning of an individual action is similar to the meaning of the
corresponding statement from our C code or from the flow chart. For example, in the WAIT state, IDX ← 0
means the same thing as idx = 0;. In particular, both mean that the value currently stored in the IDX
counter is overwritten with the number 0 (all 0 bits).
  state     actions (simultaneous)                          condition   next state
  WAIT      IDX ← 0  (to read VALUES[0] in INIT)            START       INIT
                                                            START'      WAIT
  INIT      MIN ← VALUES[IDX]  (IDX is 0 in this state)     (always)    PREP
            IDX ← 1
  PREP      A ← MIN                                         (always)    COMPARE
            B ← VALUES[IDX]
            CNT ← 0
  COMPARE   run serial comparator                           LAST        COPY
                                                            LAST'       COMPARE
  COPY      THEN: MIN ← VALUES[IDX]                         DONE        WAIT
            IDX ← IDX + 1                                   DONE'       PREP

The meaning of RTL is slightly different from the usual interpretation of high-level programming languages,
however, in terms of when the actions happen. A list of C statements is generally executed one at a time.
In contrast, the entire list of RTL actions

for an FSM state is executed simultaneously, at the end of the clock cycle. As you know, an FSM moves
from its current state into a new state at the end of every clock cycle, so actions during different cycles
usually are associated with different states. We can, however, change the value in more than one register
at the end of the same clock cycle, so we can execute more than one RTL action in the same state, so long
as the actions do not exceed the capabilities of our datapath (the components must be able to support the
simultaneous execution of the actions). Some care must be taken with states that execute for more than one
cycle to ensure that repeating the RTL actions is appropriate. In our design, only the WAIT and COMPARE
states execute for more than one cycle. The WAIT state resets the IDX counter repeatedly, which causes
no problems. The COMPARE state has no RTL actions; all of the shifting, comparison, and counting
activity needed to do its work occurs within the datapath itself.
One additional piece of RTL syntax needs explanation. In the COPY state, the first action begins with
THEN:, which means that the prefixed RTL action occurs only when the THEN signal is high. Recall that
the THEN signal indicates that the comparator has found A > B, so the equivalent C code is if (A > B)
{ min = values[idx]; }.
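To see the state table in action, the following C sketch simulates the high-level FSM one state per loop
iteration. It is only a behavioral model: the 32-cycle serial comparison is collapsed into a single C
comparison, and the array contents are made-up test data.

#include <stdio.h>

enum state { WAIT, INIT, PREP, COMPARE, COPY };

int main(void)
{
    int values[16] = {42, 7, 19, 3, 88, 5, 23, 61, 9, 14};  /* 10 used */
    int idx = 0, min = 0, a = 0, b = 0, then = 0;
    enum state s = WAIT;
    int start = 1;                   /* external START signal           */

    for (int cycle = 0; ; cycle++) {
        switch (s) {
        case WAIT:    idx = 0;
                      s = start ? INIT : WAIT;               break;
        case INIT:    min = values[idx]; idx = 1;
                      s = PREP;                              break;
        case PREP:    a = min; b = values[idx];
                      s = COMPARE;                           break;
        case COMPARE: then = (a > b);    /* 32 cycles in hardware */
                      s = COPY;                              break;
        case COPY:    if (then) min = values[idx];
                      idx = idx + 1;
                      s = (idx > 9) ? WAIT : PREP;           break;
        }
        if (s == WAIT && cycle > 0)      /* back in WAIT: computation done */
            break;
    }
    printf("min = %d\n", min);           /* prints 3 for this test data    */
    return 0;
}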

State Representation and Logic Expressions


Let's think about the representation for the FSM states. The FSM has five states, so we could use as few as
three flip-flops. Instead, we choose to use a one-hot encoding, in which any valid bit pattern has exactly
one 1 bit. In other words, we use five flip-flops instead of three, and our states are represented with the bit
patterns 10000, 01000, 00100, 00010, and 00001.
The table below shows the mapping from each high-level state to both the five-bit encoding for the state as
well as the six control signals needed for the datapath. For each state, the values of the control signals can
be found by examining the actions necessary in that state.
  state     S4 S3 S2 S1 S0   IDX.RST   IDX.CNT   MIN.LD   A.LD   B.LD   CNT.RST
  WAIT       1  0  0  0  0       1         0        0        0      0       0
  INIT       0  1  0  0  0       0         1        1        0      0       0
  PREP       0  0  1  0  0       0         0        0        1      1       1
  COMPARE    0  0  0  1  0       0         0        0        0      0       0
  COPY       0  0  0  0  1       0         1      THEN       0      0       0

The WAIT state needs to set IDX to 0 but need not affect other register or counter values, so WAIT produces
a 1 only for IDX.RST. The INIT state needs to load values[0] into the MIN register while simultaneously
incrementing the IDX counter (from 0 to 1), so INIT produces 1s for IDX.CNT and MIN.LD. The PREP state
loads both shift registers and resets the counter CNT by producing 1s for A.LD, B.LD, and CNT.RST. The
COMPARE state does not change any register values, so it produces all 0s. Finally, the COPY state increments
the IDX counter while simultaneously loading a new value into the MIN register. The COPY state produces 1
for IDX.CNT, but must use the signal THEN coming from the datapath to decide whether or not MIN is loaded.
The advantage of a one-hot encoding becomes obvious when we write equations for the six control signals and
the next-state logic, as shown below. Implementing the logic to complete our design now requires only a
handful of small logic gates.

  IDX.RST = S4
  IDX.CNT = S3 + S0
  MIN.LD  = S3 + S0·THEN
  A.LD    = S2
  B.LD    = S2
  CNT.RST = S2

  S4+ = S4·START' + S0·DONE
  S3+ = S4·START
  S2+ = S3 + S0·DONE'
  S1+ = S2 + S1·LAST'
  S0+ = S1·LAST

Notice that the terms in each control signal can be read directly from the rows of the state table and OR'd
together. The terms in each of the next-state equations represent the incoming arcs for the corresponding
state. For example, the WAIT state has one self-loop (the first term) and a transition arc coming from the
COPY state when the loop is done. These expressions complete our design.
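As a sanity check of the algebra, the next-state equations can be transcribed almost directly into C bit
operations on the five one-hot state bits. This is only a check of the Boolean expressions, not a hardware
description; the function name and bit packing are ours.

/* One-hot state bits packed as S4 S3 S2 S1 S0 in bit positions 4..0. */
unsigned next_state(unsigned s, int START, int LAST, int DONE)
{
    int S4 = (s >> 4) & 1, S3 = (s >> 3) & 1, S2 = (s >> 2) & 1;
    int S1 = (s >> 1) & 1, S0 = s & 1;

    int n4 = (S4 & !START) | (S0 & DONE);    /* WAIT    */
    int n3 = S4 & START;                     /* INIT    */
    int n2 = S3 | (S0 & !DONE);              /* PREP    */
    int n1 = S2 | (S1 & !LAST);              /* COMPARE */
    int n0 = S1 & LAST;                      /* COPY    */

    return (n4 << 4) | (n3 << 3) | (n2 << 2) | (n1 << 1) | n0;
}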

ECE120: Introduction to Computer Engineering


Notes Set 3.7
Summary of Part 3 of the Course
These notes supplement the Patt and Patel textbook, so you will also need to read and understand the
relevant chapters (see the syllabus) in order to master this material completely.
In this short summary, we give you lists at several levels of difficulty of what we expect you to be able to do as
a result of the last few weeks of studying (reading, listening, doing homework, discussing your understanding
with your classmates, and so forth).
Students typically find that the homeworks in this part of the course require more time than did those in
earlier parts of the course. Problems on the exam will be similar in nature but designed to require less actual
time to solve (assuming that you have been doing the homeworks).
We'll start with the easy stuff. You should recognize all of these terms and be able to explain what they
mean. For the specific circuits, you should be able to draw them and explain how they work. Actually, we
don't care whether you can draw something from memory (a mux, for example) provided that you know
what a mux does and can derive a gate diagram correctly for one in a few minutes. Higher-level skills are
much more valuable.
digital systems terms
  modules
  fan-in
  fan-out
  machine models: Moore and Mealy
simple state machines
  synchronous counter
  ripple counter
  serialization (of bit-sliced design)
finite state machines (FSMs)
  states and state representation
  transition rule
  self-loop
  next state (+) notation
  meaning of don't care in input combination
  meaning of don't care in output
  unused states and initialization
  completeness (with regard to FSM specification)
  list of (abstract) states
  next-state table / state transition table / state table
  state transition diagram / transition diagram / state diagram
memory
  design as a collection of latches
  number of addresses
  addressability
  read/write logic
  serial / random access memory (RAM)
  volatile / non-volatile (N-V)
  static / dynamic RAM (SRAM/DRAM)
  SRAM cell
  DRAM cell
  bit lines and sense amplifiers
von Neumann model
  processing unit
    register file
    arithmetic logic unit (ALU)
    word size
  control unit
    program counter (PC)
    instruction register (IR)
    implementation as FSM
  input and output units
  memory
    memory address register (MAR)
    memory data register (MDR)
tri-state buffer
  meaning of Z/hi-Z output
  use in distributed mux

We expect you to be able to exercise the following skills:
- Transform a bit-sliced design into a serial design, and explain the tradeoffs involved in terms of area
  and time required to compute a result.
- Based on a transition diagram, implement a synchronous counter from flip-flops and logic gates.
- Implement a binary ripple counter (but not necessarily a more general type of ripple counter) from
  flip-flops and logic gates.
- Given an FSM implemented as digital logic, analyze the FSM to produce a state transition diagram.
- Design an FSM to meet an abstract specification for a task, including production of specified output
  signals, and possibly including selection of appropriate inputs.
- Complete the specification of an FSM by ensuring that each state includes a transition rule for every
  possible input combination.
- Compose memory chips into larger memory systems, using additional decoders when necessary.

When designing a finite state machine, we expect you to be able to apply the following design strategies:
- Abstract design symmetries from an FSM specification in order to simplify the implementation.
- Make use of a high-level state design, possibly with many sub-states in each high-level state, to simplify
  the implementation.
- Use counters to insert time-based transitions between states (such as timeouts).
- Implement an FSM using logic components such as registers, counters, comparators, and adders as
  building blocks.

And, at the highest level, we expect that you will be able to do the following:
- Explain the difference between the Moore and Mealy machine models, as well as why you might find
  each of them useful when designing an FSM.
- Understand the need for initialization of an FSM, be able to analyze and identify potential problems
  arising from lack of initialization, and be able to extend an implementation to include initialization to
  an appropriate state when necessary.
- Understand how the choice of internal state bits for an FSM can affect the complexity of the
  implementation of next-state and output logic, and be able to select a reasonable state assignment.
- Identify and fix design flaws in FSMs by analyzing an existing implementation, comparing it with the
  specification, and removing any differences by making any necessary changes to the implementation.


ECE120: Introduction to Computer Engineering
Notes Set 4.1: Control Unit Design

ECE120: Introduction to Computer Engineering


Notes Set 4.2
Redundancy and Coding
This set of notes introduces the idea of using sparsely populated representations to protect against accidental
changes to bits. Today, such representations are used in almost every type of storage system, from bits on
a chip to main memory to disk to archival tapes. We begin our discussion with examples of representations
in which some bit patterns have no meaning, then consider what happens when a bit changes accidentally.
We next outline a general scheme that allows a digital system to detect a single bit error. Building on the
mechanism underlying this scheme, we describe a distance metric that enables us to think more broadly
about both detecting and correcting such errors, and then show a general approach that allows correction of
a single bit error. We leave discussion of more sophisticated schemes to classes on coding and information
theory.

Sparse Representations
Representations used by computers must avoid ambiguity: a single bit pattern in a representation cannot be
used to represent more than one value. However, the converse need not be true. A representation can have
several bit patterns representing the same value, and not all bit patterns in a representation need be used
to represent values.
Let's consider a few examples of representations with unused patterns. Historically, one common class of
representations of this type was those used to represent individual decimal digits. We examine three examples
from this class.
The first is Binary-coded Decimal (BCD), in which decimal digits are encoded individually using their representations in the unsigned (binary) representation. Since we have 10 decimal digits, we need 10 patterns, and
four bits for each digit. But four bits allow 2^4 = 16 bit patterns. In BCD, the patterns 1010, 1011, ..., 1111
are unused. It is important to note that BCD is not the same as the unsigned representation. The decimal
number 732, for example, requires 12 bits when encoded as BCD: 0111 0011 0010. When written using a
12-bit unsigned representation, 732 is written 001011011100. Operations on BCD values were implemented
in early processors, including the 8086, and are thus still available in the x86 instruction set architecture
today!
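As an illustration, a few lines of C can build the BCD pattern for a small decimal number by packing one
4-bit group per digit. Because hexadecimal digits also occupy 4 bits each, printing the result in hexadecimal
displays the BCD groups directly (the function name is ours).

#include <stdio.h>

/* Pack the decimal digits of n, least significant digit first,
   into 4-bit groups (BCD).                                      */
unsigned to_bcd(unsigned n)
{
    unsigned bcd = 0;
    int shift = 0;
    do {
        bcd |= (n % 10) << shift;
        n /= 10;
        shift += 4;
    } while (n > 0);
    return bcd;
}

int main(void)
{
    printf("%x\n", to_bcd(732));   /* prints 732: the groups 0111 0011 0010 */
    return 0;
}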
The second example is an Excess-3 code, in which each decimal digit d is represented by the pattern corresponding to the 4-bit unsigned pattern for d + 3.
For example, the digit 4 is represented as 0111, and the digit 7 is represented
as 1010. The Excess-3 code has some attractive aspects when using simple
hardware. For example, we can use a 4-bit binary adder to add two digits c
and d represented in the Excess-3 code, and the carry out signal produced by
the adder is the same as the carry out for the decimal addition, since c + d ≥ 10
is equivalent to (c + 3) + (d + 3) ≥ 16.
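This equivalence is easy to verify exhaustively; the small C check below walks over all 100 pairs of decimal
digits and confirms that the carry out of a 4-bit add of the Excess-3 codes matches the decimal carry (a
sketch written for this example).

#include <assert.h>

int main(void)
{
    for (int c = 0; c <= 9; c++) {
        for (int d = 0; d <= 9; d++) {
            int sum4      = (c + 3) + (d + 3);   /* what a 4-bit adder sees */
            int carry4    = (sum4 >= 16);        /* carry out of bit 3      */
            int carry_dec = (c + d >= 10);       /* decimal carry           */
            assert(carry4 == carry_dec);
        }
    }
    return 0;   /* all 100 digit pairs agree */
}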
The third example of decimal digit representations is a 2-out-of-5 code. In
such a code, five bits are used to encode each digit. Only patterns with
exactly two 1s are used. There are exactly ten such patterns, and an example
representation is shown to the right (more than one assignment of values to
patterns has been used in real systems).

  digit   a 2-out-of-5 representation
    1               00011
    2               00101
    3               00110
    4               01001
    5               01010
    6               01100
    7               10001
    8               10010
    9               10100
    0               11000

Error Detection
Errors in digital systems can occur for many reasons, ranging from cosmic ray strikes to defects in chip
fabrication to errors in the design of the digital system. As a simple model, we assume that an error takes
the form of changes to some number of bits. In other words, a bit that should have the value 0 instead has
the value 1, or a bit that should have the value 1 instead has the value 0. Such an error is called a bit error.


Digital systems can be designed with or without tolerance to errors. When an error occurs, no notification
nor identification of the error is provided. Rather, if error tolerance is needed, the system must be designed
to be able to recognize and identify errors automatically. Often, we assume that each of the bits may be in
error independently of all of the others, each with some low probability. With such an assumption, multiple
bit errors are much less likely than single bit errors, and we can focus on designs that tolerate a single bit
error. When a bit error occurs, however, we must assume that it can happen to any of the bits.
The use of many patterns to represent a smaller number of values, as is the case in a 2-out-of-5 code,
enables a system to perform error detection. Let's consider what happens when a value represented using
a 2-out-of-5 code is subjected to a single bit error. Imagine that we have the digit 7. In the table on the
previous page, notice that the digit 7 is represented with the pattern 10001.
As we mentioned, we must assume that the bit error can occur in any of the five bits, thus we have five
possible bit patterns after the error occurs. If the error occurs in the first bit, we have the pattern 00001. If
the error occurs in the second bit, we have the pattern 11001. The complete set of possible error patterns
is 00001, 11001, 10101, 10011, and 10000.
Notice that none of the possible error patterns has exactly two 1s, and thus none of them is a meaningful
pattern in our 2-out-of-5 code. In other words, whenever a digital system represents the digit 7 and a single
bit error occurs, the system will be able to detect that an error has occurred.
What if the system needs to represent a different digit? Regardless of which digit is represented, the pattern
with no errors has exactly two 1s, by the definition of our representation. If we then flip one of the five bits
by subjecting it to a bit error, the resulting error pattern has either one 1 (if the bit error changes a 1 to
a 0) or three 1s (if the bit error changes a 0 to a 1). In other words, regardless of which digit is represented,
and regardless of which bit has an error, the resulting error pattern never has a meaning in the 2-out-of-5
code. So this representation enables a digital system to detect any single bit error!
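In other words, counting 1s is a complete error check for this representation. A C sketch of such a checker
(the function name is ours):

/* Returns 1 if the 5-bit pattern is a valid 2-out-of-5 code word
   (exactly two 1s), and 0 if an error has been detected.          */
int valid_2_of_5(unsigned pattern)
{
    int ones = 0;
    for (int i = 0; i < 5; i++)
        ones += (pattern >> i) & 1;
    return ones == 2;
}
/* valid_2_of_5(0x11) is 1 (10001, the digit 7); flipping any single
   bit, for example to 00001, makes the function return 0.           */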

Parity
The ability to detect any single bit error is certainly useful. However, so far we have only shown how to
protect ourselves when we want to represent decimal digits. Do we need to develop a separate error-tolerant
representation for every type of information that we might want to represent? Or can we instead come up
with a more general approach? The answer to the second question is yes: we can, in fact, systematically
transform any representation into a representation that allows detection of a single bit error. The key to
this transformation is the idea of parity.
Consider an arbitrary representation for some type of information. For each pattern used in the
representation, we can count the number of 1s. The resulting count is either odd or even. By adding an extra
bit, called a parity bit, to the representation, and selecting the parity bit's value appropriately for each
bit pattern, we can ensure that the count of 1s is odd (called odd parity) or even (called even parity) for
all values represented. The idea is illustrated in the table below for the 3-bit unsigned representation.
The parity bit is the rightmost bit of each pattern.

  value represented   3-bit unsigned   number of 1s   with odd parity   with even parity
          0                000               0              0001              0000
          1                001               1              0010              0011
          2                010               1              0100              0101
          3                011               2              0111              0110
          4                100               1              1000              1001
          5                101               2              1011              1010
          6                110               2              1101              1100
          7                111               3              1110              1111

Either approach to selecting the parity bits ensures that any single bit error can be detected. For example,
if we choose to use odd parity, a single bit error changes either a 0 into a 1 or a 1 into a 0. The number
of 1s in the resulting error pattern thus differs by exactly one from the original pattern, and the parity of
the error pattern is even. But all valid patterns have odd parity, so any single bit error can be detected by
simply counting the number of 1s.
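Both choosing a parity bit and checking a stored pattern amount to counting 1s; the following C sketch shows
one way to do each for odd parity (the function names are ours).

/* Number of 1 bits in the low 'width' bits of v. */
int count_ones(unsigned v, int width)
{
    int ones = 0;
    for (int i = 0; i < width; i++)
        ones += (v >> i) & 1;
    return ones;
}

/* Parity bit to append so that the total count of 1s is odd. */
int odd_parity_bit(unsigned v, int width)
{
    return (count_ones(v, width) % 2 == 0) ? 1 : 0;
}

/* A stored (width+1)-bit pattern is valid under odd parity if its
   count of 1s is odd; a single bit error makes this check fail.    */
int odd_parity_ok(unsigned stored, int width)
{
    return count_ones(stored, width + 1) % 2 == 1;
}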

Hamming Distance
Next, let's think about how we might use representations (we might also think of them as codes) to protect
a system against multiple bit errors. As we have seen with parity, one strategy that we can use to provide
such error tolerance is the use of representations in which only some of the patterns actually represent values.
Let's call such patterns code words. In other words, the code words in a representation are those patterns
that correspond to real values of information. Other patterns in the representation have no meaning.
As a tool to help us understand error tolerance, let's define a measure of the distance between code words
in a representation. Given two code words X and Y, we can calculate the number N(X,Y) of bits that must
change to transform X into Y. Such a calculation merely requires that we compare the patterns bit by bit
and count the number of places in which they differ. Notice that this relationship is symmetric: the same
number of changes are required to transform Y into X, so N(Y,X) = N(X,Y). We refer to this number N(X,Y)
as the Hamming distance between code word X and code word Y. The metric is named after Richard
Hamming, a computing pioneer and an alumnus of the UIUC Math department.
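Computing the Hamming distance between two patterns is straightforward in C: XOR the patterns and count the
1s in the result (a sketch; the function name is ours).

/* Hamming distance between two code words of 'width' bits. */
int hamming_distance(unsigned x, unsigned y, int width)
{
    unsigned diff = x ^ y;     /* 1 wherever the patterns disagree */
    int count = 0;
    for (int i = 0; i < width; i++)
        count += (diff >> i) & 1;
    return count;
}
/* Example: hamming_distance(0x2A, 0x0A, 6) returns 1, since 101010
   and 001010 differ only in the leftmost bit.                       */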
The Hamming distance between two code words tells us how many bit errors are necessary in order for
a digital system to mistake one code word for the other. Given a representation, we can calculate the
minimum Hamming distance between any pair of code words used by the representation. The result is called
the Hamming distance of the representation, and represents the minimum number of bit errors that must occur
before a system might fail to detect errors in a stored value.
The Hamming distance for nearly all of the representations that we introduced in earlier sections is 1. Since
more than half of the patterns (and often all of the patterns!) correspond to meaningful values, some pairs
of code words must differ in only one bit, and these representations cannot tolerate any errors. For example,
the decimal value 42 is stored as 101010 using a 6-bit unsigned representation, but any bit error in that
pattern produces another valid pattern corresponding to one of the following decimal numbers: 10, 58, 34,
46, 40, 43. Note that the Hamming distance between any two patterns is not necessarily 1. Rather, the
Hamming distance of the unsigned representation, which corresponds to the minimum between any pair of
valid patterns, is 1.
In contrast, the Hamming distance of the 2-out-of-5 code that we discussed earlier is 2. Similarly, the
Hamming distance of any representation extended with a parity bit is at least 2.
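The Hamming distance of a whole representation can be checked by brute force over all pairs of code words.
The sketch below, which reuses the hamming_distance function above, returns 2 when given the ten 2-out-of-5
patterns from the earlier table.

/* Minimum pairwise Hamming distance over a set of code words. */
int code_distance(const unsigned *words, int n, int width)
{
    int best = width;   /* the distance can never exceed the width */
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            int d = hamming_distance(words[i], words[j], width);
            if (d < best)
                best = d;
        }
    return best;
}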
Now let's think about the problem slightly differently. Given a particular representation, how many bit
errors can we detect in values using that representation? A representation with Hamming distance d can
detect up to d - 1 bit errors. To understand this claim, start by selecting a code word from the representation
and changing up to d - 1 of the bits. No matter how one chooses to change the bits, these changes cannot
result in another code word, since we know that any other code word has to require at least d changes from
our original code word, by the definition of the representation's Hamming distance. A digital system using
the representation can thus detect up to d - 1 errors. However, if d or more errors occur, the system might
sometimes fail to detect any error in the stored value.

Error Correction
Detection of errors is important, but may sometimes not be enough. What can a digital system do when it
detects an error? In some cases, the system may be able to find the original value elsewhere, or may be able
to re-compute the value from other values. In other cases, the value is simply lost, and the digital system
may need to reboot or even shut down until a human can attend to it. Many real systems cannot afford such
a luxury. Life-critical systems such as medical equipment and airplanes should not turn themselves off and
wait for a humans attention. Space vehicles face a similar dilemma, since no human may be able to reach
them.
Can we use a strategy similar to the one that we have developed for error detection in order to try to
perform error correction, recovering the original value? Yes, but the overhead (the extra bits that we
need to provide such functionality) is higher.


Let's start by thinking about a code with Hamming distance 2, such as 4-bit 2's complement with odd parity.
We know that such a code can detect one bit error. Can it correct such a bit error, too?
Imagine that a system has stored the decimal value 6 using the pattern 01101, where the last bit is the odd
parity bit. A bit error occurs, changing the stored pattern to 01111, which is not a valid pattern, since it has
an even number of 1s. But can the system know that the original value stored was 6? No, it cannot. The
original value may also have been 7, in which case the original pattern was 01110, and the bit error occurred
in the final bit. The original value may also have been -1, 3, or 5. The system has no way of resolving this
ambiguity. The same problem arises if a digital system uses a code with Hamming distance d to detect up
to d − 1 errors.
Error correction is possible, however, if we assume that fewer bit errors occur (or if we instead use a
representation with a larger Hamming distance). As a simple example, let's create a representation for the
numbers 0 through 3 by making three copies of the 2-bit unsigned representation, as shown in the table
below. The Hamming distance of the resulting code is 3, so any two bit errors can be detected. However,
this code also enables us to correct a single bit error. Intuitively, think of the three copies as voting on the
right answer. Since a single bit error can only corrupt one copy, a majority vote always gives the right answer!
Tripling the number of bits needed in a representation is not a good general strategy, however. Notice also that
correcting a pattern with two bit errors can produce the wrong result.

    value          three-copy
    represented    code
    0              000000
    1              010101
    2              101010
    3              111111
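To make the voting intuition concrete, the following C sketch (an illustration; the layout of the three copies within the 6-bit pattern is an assumption of the example) decodes a possibly corrupted pattern from the three-copy code by taking a majority vote on each bit position across the three copies.

    /* Decode a 6-bit three-copy pattern (bits 5-4, 3-2, and 1-0 hold the three
       copies of the original 2-bit value) by majority vote on each bit.
       A minimal sketch; assumes at most one bit error in the pattern. */
    static unsigned decode_three_copy (unsigned pattern)
    {
        unsigned copy0 = (pattern >> 4) & 0x3;
        unsigned copy1 = (pattern >> 2) & 0x3;
        unsigned copy2 = pattern & 0x3;
        unsigned value = 0;
        for (int bit = 0; bit < 2; bit++) {
            int ones = ((copy0 >> bit) & 1) + ((copy1 >> bit) & 1) +
                       ((copy2 >> bit) & 1);
            if (ones >= 2) {                 /* majority of the three copies */
                value |= (1u << bit);
            }
        }
        return value;
    }

    /* Example: the code word for 2 is 101010; flipping one bit gives 100010,
       and decode_three_copy (0x22) still returns 2. */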
Let's think about the problem in terms of Hamming distance. Assume that we use a code with Hamming
distance d and imagine that up to k bit errors affect a stored value. The resulting pattern then falls within a
neighborhood of distance k from the original code word. This neighborhood contains all bit patterns within
Hamming distance k of the original pattern. We can define such a neighborhood around each code word.
Now, since d bit errors are needed to transform a code word into any other code word, these neighborhoods
are disjoint so long as 2k ≤ d − 1. In other words, if the inequality holds, any bit pattern in the representation
can be in at most one code word's neighborhood. The digital system can then correct the errors by selecting
the unique value identified by the associated neighborhood. Note that patterns encountered as a result of
up to k bit errors always fall within the original code word's neighborhood; the inequality ensures that the
neighborhood identified in this way is unique. We can manipulate the inequality to express the number
of errors k that can be corrected in terms of the Hamming distance d of the code. A code with Hamming
distance d allows up to ⌊(d − 1)/2⌋ errors to be corrected, where ⌊x⌋ represents the integer floor function on x, or
rounding x down to the nearest integer.

Hamming Codes
Hamming also developed a general and efficient approach for extending an arbitrary representation to allow
correction of a single bit error. The approach yields codes with Hamming distance 3. To understand how
a Hamming code works, think of the bits in the representation as being numbered starting from 1. For
example, if we have seven bits in the code, we might write a bit pattern X as x7 x6 x5 x4 x3 x2 x1 .
The bits with indices that are powers of two are parity check bits. These include x1 , x2 , x4 , x8 , and so forth.
The remaining bits can be used to hold data. For example, we could use a 7-bit Hamming code and map the
bits from a 4-bit unsigned representation into bits x7 , x6 , x5 , and x3 . Notice that Hamming codes are not
so useful for small numbers of bits, but require only logarithmic overhead for large numbers of bits. That is,
in an N-bit Hamming code, only log2(N + 1) bits are used for parity checks.
How are the parity checks defined? Each parity bit provides even parity for those bits whose indices, when
written in binary, include a 1 in the single position in which the parity bit's index contains a 1. The x1 bit,
for example, provides even parity on all bits with odd indices. The x2 bit provides even parity on x2, x3,
x6, x7, x10, and so forth.
In a 7-bit Hamming code, for example, x1 is chosen so that it has even parity together with x3 , x5 , and x7 .
Similarly, x2 is chosen so that it has even parity together with x3 , x6 , and x7 . Finally, x4 is chosen so that
it has even parity together with x5 , x6 , and x7 .
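The parity computation is easy to express in code. The C fragment below (a sketch; the function name and bit packing are choices made for this example) encodes a 4-bit value into the 7-bit Hamming code just described, placing the data in x7, x6, x5, and x3 and computing x4, x2, and x1 so that each parity group has even parity.

    /* Encode a 4-bit value (0..15) into a 7-bit Hamming code word x7 x6 x5 x4 x3 x2 x1,
       returned with x7 as the most significant bit.  Data goes in x7, x6, x5, x3;
       parity bits x4, x2, and x1 give each parity group even parity. */
    static unsigned hamming_encode (unsigned data)
    {
        unsigned x7 = (data >> 3) & 1;
        unsigned x6 = (data >> 2) & 1;
        unsigned x5 = (data >> 1) & 1;
        unsigned x3 = data & 1;

        unsigned x4 = x5 ^ x6 ^ x7;   /* covers indices with a 1 in the 4s place */
        unsigned x2 = x3 ^ x6 ^ x7;   /* covers indices with a 1 in the 2s place */
        unsigned x1 = x3 ^ x5 ^ x7;   /* covers indices with a 1 in the 1s place */

        return (x7 << 6) | (x6 << 5) | (x5 << 4) | (x4 << 3) |
               (x3 << 2) | (x2 << 1) | x1;
    }

    /* Example: hamming_encode (9) returns 0x4C, the pattern 1001100 in the table below. */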




The table below shows the result of embedding a 4-bit unsigned representation into a 7-bit Hamming code.
A Hamming code provides a convenient way to identify which bit should be corrected when a single bit
error occurs. Notice that each bit is protected by a unique subset of the parity bits corresponding to the
binary form of the bit's index. Bit x6, for example, is protected by bits x4 and x2, because the number 6 is
written 110 in binary. If a bit is affected by an error, the parity bits that register the error are those
corresponding to 1s in the binary number of the index. So if we calculate check bits as 1 to represent an
error (odd parity) and 0 to represent no error (even parity), then concatenate those bits into a binary
number, we obtain the binary value of the index of the single bit affected by an error (or the number 0 if
no error has occurred).


    value          4-bit unsigned                          7-bit Hamming
    represented    (x7 x6 x5 x3)     x4    x2    x1        code
         0             0000           0     0     0        0000000
         1             0001           0     1     1        0000111
         2             0010           1     0     1        0011001
         3             0011           1     1     0        0011110
         4             0100           1     1     0        0101010
         5             0101           1     0     1        0101101
         6             0110           0     1     1        0110011
         7             0111           0     0     0        0110100
         8             1000           1     1     1        1001011
         9             1001           1     0     0        1001100
        10             1010           0     1     0        1010010
        11             1011           0     0     1        1010101
        12             1100           0     0     1        1100001
        13             1101           0     1     0        1100110
        14             1110           1     0     0        1111000
        15             1111           1     1     1        1111111

Let's do a couple of examples based on the pattern for the decimal number 9, 1001100. First, assume that
no error occurs. We calculate check bit c4 by checking whether x4 , x5 , x6 , and x7 together have even parity.
Since no error occurred, they do, so c4 = 0. Similarly, for c2 we consider x2 , x3 , x6 , and x7 . These also have
even parity, so c2 = 0. Finally, for c1 , we consider x1 , x3 , x5 , and x7 . As with the others, these together
have even parity, so c1 = 0. Writing c4 c2 c1 , we obtain 000, and conclude that no error has occurred.
Next assume that bit 3 has an error, giving us the pattern 1001000. In this case, we have again that c4 = 0,
but the bits corresponding to both c2 and c1 have odd parity, so c2 = 1 and c1 = 1. Now when we write the
check bits c4 c2 c1 , we obtain 011, and we are able to recognize that bit 3 has been changed.
A Hamming code can only correct one bit error, however. If two bit errors occur, correction will produce
the wrong answer. Lets imagine that both bits 3 and 5 have been flipped in our example pattern for the
decimal number 9, producing the pattern 1011000. Calculating the check bits as before and writing them
as c4 c2 c1 , we obtain 110, which leads us to incorrectly conclude that bit 6 has been flipped. As a result, we
correct the pattern to 1111000, which represents the decimal number 14.
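The check-bit calculation and correction step translate directly into code. The C sketch below (illustrative; the names and bit packing match the encoder sketch above rather than any standard interface) computes c4, c2, and c1 for a 7-bit pattern, concatenates them into an index, and flips the indicated bit when the index is nonzero.

    /* Correct a single bit error in a 7-bit Hamming code word (x7 as the MSB).
       Returns the corrected word; if two bit errors occurred, the result is wrong,
       as in the example with bits 3 and 5 above. */
    static unsigned hamming_correct (unsigned word)
    {
        unsigned x[8];
        for (int i = 1; i <= 7; i++) {
            x[i] = (word >> (i - 1)) & 1;
        }
        unsigned c4 = x[4] ^ x[5] ^ x[6] ^ x[7];
        unsigned c2 = x[2] ^ x[3] ^ x[6] ^ x[7];
        unsigned c1 = x[1] ^ x[3] ^ x[5] ^ x[7];
        unsigned index = (c4 << 2) | (c2 << 1) | c1;   /* 0 means no error detected */
        if (index != 0) {
            word ^= (1u << (index - 1));               /* flip the bit at that index */
        }
        return word;
    }

    /* Example: hamming_correct (0x48) returns 0x4C, repairing the pattern 1001000
       (bit 3 flipped) back to 1001100, the code word for 9. */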

SEC-DED Codes
We now consider one final extension of Hamming codes to enable a system to perform single error correction
while also detecting any two bit errors. Such codes are known as Single Error Correction, Double Error
Detection (SEC-DED) codes. Creating such a code from a Hamming code is trivial: add a parity bit
covering the entire Hamming code. The extra parity bit increases the Hamming distance to 4. A Hamming
distance of 4 still allows only single bit error correction, but avoids the problem of Hamming distance 3 codes
when two bit errors occur, since patterns at Hamming distance 2 from a valid code word cannot be within
distance 1 of another code word, and thus cannot be corrected to the wrong result.
In fact, one can add a parity bit to any representation with an odd Hamming distance to create a new
representation with Hamming distance one greater than the original representation. To prove this convenient
fact, begin with a representation with Hamming distance d, where d is odd. If we choose two code words
from the representation, and their Hamming distance is already greater than d, their distance in the new
representation will also be greater than d. Adding a parity bit cannot decrease the distance. On the other
hand, if the two code words are exactly distance d apart, they must have opposite parity, since they differ
by an odd number of bits. Thus the new parity bit will be a 0 for one of the code words and a 1 for the
other, increasing the Hamming distance to d + 1 in the new representation. Since all pairs of code words
have Hamming distance of at least d + 1, the new representation also has Hamming distance d + 1.
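As a rough sketch of how a system might use the extra bit (the layout, with the overall parity bit stored as the eighth bit, is an assumption of this example), the C fragment below computes the Hamming syndrome and the overall parity of an 8-bit SEC-DED word: a nonzero syndrome with odd overall parity indicates a correctable single error, while a nonzero syndrome with even overall parity indicates a detected but uncorrectable double error.

    /* Classify and, when possible, correct errors in an 8-bit SEC-DED word built
       from the 7-bit Hamming code plus an overall even-parity bit in bit position 7.
       Returns 0 for no error, 1 for a corrected single error, and 2 for a detected
       but uncorrectable double error.  A sketch under the layout assumed here. */
    static int secded_check (unsigned* word)
    {
        unsigned w = *word;
        unsigned x[8];
        for (int i = 1; i <= 7; i++) {
            x[i] = (w >> (i - 1)) & 1;
        }
        unsigned syndrome = ((x[4] ^ x[5] ^ x[6] ^ x[7]) << 2) |
                            ((x[2] ^ x[3] ^ x[6] ^ x[7]) << 1) |
                             (x[1] ^ x[3] ^ x[5] ^ x[7]);
        unsigned overall = 0;                     /* parity of all eight bits */
        for (int i = 0; i < 8; i++) {
            overall ^= (w >> i) & 1;
        }
        if (syndrome == 0 && overall == 0) { return 0; }     /* no error        */
        if (overall == 1) {                                   /* one bit flipped */
            if (syndrome != 0) { *word = w ^ (1u << (syndrome - 1)); }
            else               { *word = w ^ (1u << 7); }     /* parity bit only */
            return 1;
        }
        return 2;                                             /* double error    */
    }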



ECE120: Introduction to Computer Engineering


Notes Set 4.3
Instruction Set Architecture*
This set of notes discusses tradeoffs and design elements of instruction set architectures (ISAs). The material
is beyond the scope of our class, and is provided purely for your interest.
As you know, the ISA defines the interface between software and hardware, abstracting the capabilities
of a computers datapath and standardizing the format of instructions to utilize those capabilities. Successful ISAs are rarely discarded, as success implies the existence of large amounts of software built to
use the ISA. Rather, they are extended, and their original forms must be supported for decades (consider,
for example, the IBM 360 and the Intel x86). Employing sound design principles is thus imperative in an ISA.

Formats and Fields


The LC-3 ISA employs fixed-length instructions and a load-store architecture, two aspects that help to
reduce the design space to a manageable set of choices. In a general ISA design, many other options exist
for instruction formats.
Recall the idea of separating the bits of an instruction into (possibly non-contiguous) fields. One of the fields
must contain an opcode, which specifies the type of operation to be performed by the instruction. In our
example architecture, the opcode specified both the type of operation and the types of arguments to the
operation. In a more general architecture, many addressing modes are possible for each operand (as you
learned in section), and we can think of the bits that specify the addressing mode as a separate field, known
as the mode field. With regard to our example architecture, the first bit of the ALU/shift operations can
be called a mode bit: when the bit has value 0, the instruction uses register-to-register mode; when the bit
has value 1, the instruction uses immediate mode.
Several questions must be answered in order to define the possible instruction formats for an ISA. First, are
instructions fixed-length or variable-length? Second, how many addresses are needed for each instruction,
and how many of the addresses can be memory addresses? Finally, what forms of addresses are possible for
each operand? For example, can one use full memory addresses or only limited offsets relative to a register?
The answer to the first question depends on many factors, but several clear advantages exist for both answers.
Fixed-length instructions are easy to fetch and decode. A processor knows in advance how many bits
must be fetched to obtain a full instruction; fetching the opcode and mode fields in order to decide how
many more bits are necessary to complete the instruction often takes more than one cycle, as did the two-word instructions in our example architecture. Fixing the time necessary for instruction fetch also simplifies
pipelining. Finally, fixed-length instructions simplify the datapath by restricting instructions to the size of
the bus and always fetching properly aligned instructions. As an example of this simplification, note that
our 16-bit example architecture did not need to support addressing for individual bytes, only 16-bit words.
Variable-length instructions also have benefits, however. Variable-length encodings allow more efficient
encodings, saving both memory and disk space. A register transfer operation, for example, clearly requires
fewer bits than addition of values at two direct memory addresses for storage at a third. Fixed-length instructions must be fixed at the length of the longest possible instruction, whereas variable-length instructions
can use lengths appropriate to each mode. The same tradeoff has another form in the sense that fixed-length
ISAs typically eliminate many addressing modes in order to limit the size of the instructions. Variable-length instructions thus allow more flexibility; indeed, extensions to a variable-length ISA can incorporate
new addressing modes that require longer instructions without affecting the original ISA.


Moving to the last of the three questions posed for instruction format definition, we explore a range of answers developed over the last few decades. Answers are usually chosen based on the number of bits necessary,
and we use this metric to organize the possibilities. The figure below separates approaches into two dimensions: the vertical dimension divides addressing into registers and memory, and the horizontal dimension
into varieties within each type.
                need fewer bits  <--------------------------------------------->  need more bits

    register:   implicit,  special-purpose registers,  general-purpose registers
    memory:     implicit,  "zero page" memory,  relative addresses,  segmented memory,  full addresses

As a register file contains fewer registers than a memory does words, the use of register operands rather than
memory addresses reduces the number of bits required to specify an operand. Our example architecture
used only register operands to stay within the limit imposed by the decision to use only 16-bit instructions.
Both register and memory addresses, however, admit a wide range of implementations.
Implicit operands of either type require no additional bits for the implicit address. A typical procedure
call instruction, for example, pushes a return address onto the stack, but the stack pointer can be named
implicitly, without the use of bits in the instruction beyond the opcode bits necessary to specify a procedure
call. Similarly, memory addresses can be implicitly equated to other memory addresses; an increment
instruction operating on a memory address, for example, implicitly writes the result back to the same
address. The opposite extreme provides full addressing capabilities, either to any register in the register file
or to any address in the memory. As addressing decisions are usually made for classes of instructions rather
than individual operations, I have used the term general-purpose registers to indicate that the registers
are used in any operation.
Special-purpose registers, in contrast, split the register file and allow only certain registers to be used in
each operation. For example, the Motorola 680x0 series, used until recently in Apple Macintosh computers,
provides distinct sets of address and data registers. Loads and stores use the address registers; ALSU
operations use the data registers. As a result, each instruction selects from a smaller set of registers and
thus requires fewer bits in the instruction to name the register for use.
As full memory addresses require many more bits than full register addresses, a wider range of techniques
has been employed to reduce the length. Zero page addresses, as defined in the 6510 (6502) ISA used by
Commodore PETs,1 C64s,2 and VIC 20s, prefixed a one-byte address with a zero byte, allowing shorter
instructions when memory addresses fell within the first 256 memory locations. Assembly and machine
language programmers made heavy use of these locations to produce shorter programs.
Relative addressing appeared in the context of control flow instructions of our example architecture, but
appears in many modern architectures as well. The Alpha, for example, has a relative form of procedure
call with a 21-bit offset (plus or minus a megabyte). The x86 architecture has a short form of branch
instructions that uses an 8-bit offset.
Segmented memory is a form of relative addressing that uses a register (usually implicit) to provide the high
bits of an address and an explicit memory address (or another register) to provide the low bits. In the x86
architecture, for example, 20-bit addresses are found by adding a 16-bit segment register extended with four
zero bits to a 16-bit offset.
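The segment arithmetic is simple enough to express directly; the C fragment below (an illustration of the calculation, not of any particular implementation) forms a real-mode physical address by shifting the 16-bit segment left by four bits and adding the 16-bit offset.

    #include <stdint.h>

    /* Form a 20-bit physical address from a 16-bit segment and a 16-bit offset,
       as in x86 real mode: the segment supplies the high bits, extended with
       four zero bits, and the offset supplies the low bits. */
    static uint32_t segmented_address (uint16_t segment, uint16_t offset)
    {
        return ((uint32_t) segment << 4) + offset;
    }

    /* Example: segment 0x1234 and offset 0x0010 yield address 0x12350. */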

1 My computer in junior high school.
2 My computer in high school.


Addressing Architectures
One question remains for the definition of instruction formats: how many addresses are needed for each
instruction, and how many of the addresses can be memory addresses? The first part of this question usually
ranges from zero to three, and is very rarely allowed to go beyond three. The answer to the second part
determines the addressing architecture implemented by an ISA. We now illustrate the tradeoffs between
five distinct addressing architectures through the use of a running example, the assignment X = AB + C/D.
A binary operator requires two source operands and one destination operand, for a total of three addresses.
The ADD instruction, for example, has a 3-address format:
        ADD  A,B,C        ; M[A] ← M[B] + M[C]
or      ADD  R1,R2,R3     ; R1 ← R2 + R3

If all three addresses can be memory addresses, the ISA is dubbed a memory-to-memory architecture.
Such architectures may have small register sets or even lack a register file completely. To implement the
assignment, we assume the availability of two memory locations, T1 and T2, for temporary storage:
    MUL  T1,A,B       ; T1 ← M[A] * M[B]
    DIV  T2,C,D       ; T2 ← M[C] / M[D]
    ADD  X,T1,T2      ; X ← M[T1] + M[T2]

The assignment requires only three instructions to implement, but each instruction contains three full memory
addresses, and is thus very long.
At the other extreme is the load-store architecture used by the ISA that we developed earlier. In a load-store
architecture, only loads and stores can use memory addresses; all other operations use only registers.
As most instructions use only registers, this type of addressing architecture is also called a register-to-register
architecture. The example assignment translates to the code below, which assumes that R1, R2, and R3 are free for use.
    LD   R1,A         ; R1 ← M[A]
    LD   R2,B         ; R2 ← M[B]
    MUL  R1,R1,R2     ; R1 ← R1 * R2
    LD   R2,C         ; R2 ← M[C]
    LD   R3,D         ; R3 ← M[D]
    DIV  R2,R2,R3     ; R2 ← R2 / R3
    ADD  R1,R1,R2     ; R1 ← R1 + R2
    ST   R1,X         ; M[X] ← R1

Eight instructions are necessary, but no instruction requires more than one full memory address, and several
use only register addresses, allowing the use of shorter instructions. The need to move data in and out of
memory explicitly, however, also requires a reasonably large register set, as is available in the Sparc, Alpha,
and IA-64 architectures.
Architectures that use other combinations of memory and register addresses with 3-address formats are not
named. Unary operators and transfer operators require only one source operand, thus can use a 2-address
format (for example, NOT A,B). Binary operations can also use 2-address format if one operand is implicit,
as in the following instructions:
        ADD  A,B      ; M[A] ← M[A] + M[B]
or      ADD  R1,B     ; R1 ← R1 + M[B]

The second instruction, in which one address is a register and the second is a memory address, defines a
register-memory architecture. As shown below, such architectures strike a balance between
the two architectures just discussed.




    LD   R1,A      ; R1 ← M[A]
    MUL  R1,B      ; R1 ← R1 * M[B]
    LD   R2,C      ; R2 ← M[C]
    DIV  R2,D      ; R2 ← R2 / M[D]
    ADD  R1,R2     ; R1 ← R1 + R2
    ST   R1,X      ; M[X] ← R1

The assignment requires six instructions using at most one memory address each; like memory-to-memory
architectures, register-memory architectures use relatively few registers. Note that two-register operations
are also allowed. Intel's x86 ISA is a register-memory architecture.
Several ISAs of the past3 used a special-purpose register called the accumulator for ALU operations, and
are called accumulator architectures. The accumulator in such architectures is implicitly both a source
and the destination for any such operation, allowing a 1-address format for instructions, as shown below.
        ADD  B      ; ACC ← ACC + M[B]
or      ST   E      ; M[E] ← ACC

Accumulator architectures strike the same balance as register-memory architectures, but use fewer registers.
Note that memory location X is used as a temporary storage location as well as the final storage location in
the following code:
    LD   A      ; ACC ← M[A]
    MUL  B      ; ACC ← ACC * M[B]
    ST   X      ; M[X] ← ACC
    LD   C      ; ACC ← M[C]
    DIV  D      ; ACC ← ACC / M[D]
    ADD  X      ; ACC ← ACC + M[X]
    ST   X      ; M[X] ← ACC

The last addressing architecture that we discuss is rarely used for modern general-purpose processors, but
is perhaps the most familiar to you because of its use in scientific and engineering calculators for the last
fifteen to twenty years. A stack architecture maintains a stack of values and draws all ALU operands
from this stack, allowing these instructions to use a 0-address format. A special-purpose stack pointer (SP)
register points to the top of the stack in memory, and operations analogous to load (push) and store (pop)
are provided to move values on and off the stack. To implement our example assignment, we first transform
it into postfix notation (also called reverse Polish notation):
A B * C D / +
The resulting sequence of symbols transforms on a one-to-one basis into instructions for a stack architecture:
    PUSH A    ; SP ← SP - 1, M[SP] ← M[A]                        A
    PUSH B    ; SP ← SP - 1, M[SP] ← M[B]                        B  A
    MUL       ; M[SP+1] ← M[SP+1] * M[SP], SP ← SP + 1           A*B
    PUSH C    ; SP ← SP - 1, M[SP] ← M[C]                        C  A*B
    PUSH D    ; SP ← SP - 1, M[SP] ← M[D]                        D  C  A*B
    DIV       ; M[SP+1] ← M[SP+1] / M[SP], SP ← SP + 1           C/D  A*B
    ADD       ; M[SP+1] ← M[SP+1] + M[SP], SP ← SP + 1           A*B+C/D
    POP  X    ; M[X] ← M[SP], SP ← SP + 1

The values to the right are the values on the stack, starting with the top value on the left and progressing
downwards, after the completion of each instruction.

3 The 6510/6502 as well, if memory serves, as the 8080, Z80, and Z8000, which used to drive parlor video games.


Common Special-Purpose Registers


This section illustrates the uses of special-purpose registers through a few examples.
The stack pointer (SP) points to the top of the stack in memory. Most older architectures support push
and pop operations that implicitly use the stack pointer. Modern architectures assign a general-purpose
register to be the stack pointer and reference it explicitly, although an assembler may support instructions
that appear to use implicit operands but in fact translate to machine instructions with explicit reference to
the register defined to be the SP.
The program counter (PC) points to the next instruction to be executed. Some modern architectures
expose it as a general-purpose register, although its distinct role in the implementation keeps such a model
from becoming as common as the use of a general-purpose register for the SP.
The processor status register (PSR), also known as the processor status word (PSW), contains all
status bits as well as a mode bit indicating whether the processor is operating in user mode or privileged
(operating system) mode. Having a register with this information allows more general access than is possible
solely through the use of control flow instructions.
The zero register appears in modern architectures of the RISC variety (defined in the next section of these
notes). The register is read-only and serves both as a useful constant and as a destination for operations
performed only for their side-effects (for example, setting status bits). The availability of a zero register also
allows certain opcodes to serve double duty. A register-to-register add instruction becomes a register move
instruction when one source operand is zero. Similarly, an immediate add instruction becomes an immediate
load instruction when one source operand is zero.

Reduced Instruction Set Computers


By the mid-1980s, the VAX architecture dominated the workstation and minicomputer markets, which
included most universities. Digital Equipment Corporation, the creator of the VAX, was second only to IBM
in terms of computer sales. VAXen, as the machines were called, used microprogrammed control units and
supported numerous addressing modes as well as very complex instructions ranging from square root to
finding the roots of polynomial equations.
The impact of increasingly dense integrated circuit technology had begun to have its effect, however, and
in view of increasing processor clock speeds, more and more programmers were using high-level languages
rather than writing assembly code. Although assembly programmers often made use of the complex VAX
instructions, compilers were usually unable to recognize the corresponding high-level language constructs
and thus were unable to make use of the instructions.
Increasing density also led to rapid growth in memory sizes, to the point that researchers began to question
the need for variable-length instructions. Recall that variable-length instructions allow shorter codes by
providing more efficient instruction encodings. With the trend toward larger memories, code length was
less important. The performance advantage of fixed-length instructions, which simplifies the datapath and
enables pipelining, on the other hand, was very attractive.
Researchers leveraged these ideas, which had been floating around the research community (and had appeared
in some commercial architectures) to create reduced instruction set computers, or RISC machines. The
competing VAXen were labeled CISC machines, which stands for complex instruction set computers.
RISC machines employ fixed-length instructions and a load-store architecture, allowing only a few addressing
modes and small offsets. This combination of design decisions enables deep pipelines and the issue of multiple
instructions in a single cycle (termed superscalar implementations), and for years, RISC machines were viewed by
many researchers as the proper design for future ISAs. However, companies such as Intel soon learned to
pipeline microoperations after decoding instructions, and CISC architectures now offer competitive if not
superior performance in comparison with RISC machines. The VAXen are dead, of course,4 having been
replaced by the Alpha.
4 Unless you talk with customer support employees, for whom no machine ever dies.


Procedure and System Calls


A procedure is a sequence of instructions that executes a particular task. Procedures are used as building
blocks for multiple, larger tasks. The concept of a procedure is fundamental to programming, and appears
in some form in every high-level language as well as in most low-level designs.5 For our purposes, the terms
procedure, subroutine, function, and method are synonymous, although they usually have slightly different
meanings from the linguistic point of view. Procedure calls are supported through call and return control
flow instructions. The first instruction in the code below, for example, transfers control to the procedure
DoSomeWork, which presumably does some work, then returns control to the instruction following the
call.

    loop:         CALL  DoSomeWork
                  CMP   R6,#1          ; compare return value in R6 to 1
                  BEQ   loop           ; keep doing work until R6 is not 1

    DoSomeWork:                        ; set R6 to 0 when all work is done, 1 otherwise
                  RETN

The procedure also places a return value in R6, which the instruction following the call compares with
immediate value 1. Until the two are not equal (when all work is done), the branch returns control to the
call and executes the procedure again.
As you may recall, the call and return use the stack pointer to keep track of nested calls. Sample RTL for
these operations appears below.
    call RTL:      SP ← SP - 1
                   M[SP] ← PC
                   PC ← procedure start

    return RTL:    PC ← M[SP]
                   SP ← SP + 1
While an ISA provides the call and return instructions necessary to support procedures, it does not specify
how information is passed to or returned from a procedure. A standard for such decisions is usually developed
and included in descriptions of the architecture, however. This calling convention specifies how information
is passed between a caller and a callee. In particular, it specifies the following: where arguments must be
placed, either in registers or in specific stack memory locations; which registers can be used or changed by
the procedure; and where any return value must be placed.
The term calling convention is also used in the programming language community to describe the convention for deciding what information is passed for a given call operation. For example, are variables passed
by value, by pointers to values, or in some other way? However, once the things to be sent are decided, the
architectural calling convention that we discuss in this class is used to determine where to put the data in
order for the callee to be able to find it.

5 The architecture that you used in the labs allowed limited use of procedures in its microprogram.


Calling conventions for architectures with large register sets typically pass arguments
in registers, and nearly all conventions place the return value in a register. A calling
convention also divides the register set into caller saved and callee saved registers.
Caller saved registers can be modified arbitrarily by the called procedure, whereas any
value in a callee saved register must be preserved. Similarly, before calling a procedure,
a caller must preserve the values of any caller saved registers that are needed after the
call. Registers of both types are usually saved on the stack by the appropriate code (caller or callee).
A typical stack structure appears in the figure below. In preparation for a call, a
caller first stores any caller saved registers on the stack. Arguments to the procedure to
be called are pushed next. The procedure is called next, implicitly pushing the return
address (the address of the instruction following the call instruction). Finally, the called
procedure may allocate space on the stack for storage of callee saved registers as well as
local variables.

                              call stack

                         +-----------------+
                         |   storage for   |
                         |   more calls    |
                SP --->  +-----------------+
                         |   storage for   |
                         |     current     |
                         |    procedure    |
                         +-----------------+
                         |  return address |
                         +-----------------+
                         | extra arguments |
    last proc's SP --->  +-----------------+
                         |  saved values   |
                         +-----------------+

As an example, the following calling convention can be applied to our example architecture: the first three
arguments must be placed in R0 through R2 (in order), with any remaining arguments on the stack; the
return value must be placed in R6; R0 through R2 are caller saved, as is R6, while R3 through R5 are callee
saved; R7 is used as the stack pointer. The code fragments below use this calling convention to implement
a procedure and a call of that procedure.
    int add3 (int n1, int n2, int n3) {
        return (n1 + n2 + n3);
    }
    ...
    printf ("%d", add3 (10, 20, 30));

    by convention:  n1 is in R0
                    n2 is in R1
                    n3 is in R2
                    return value is in R6

    add3:   ADD   R0,R0,R1
            ADD   R6,R0,R2
            RETN
            ...
            PUSH  R4            ; save the value in R4
            LDI   R0,#10        ; marshal arguments
            LDI   R1,#20
            LDI   R2,#30
            CALL  add3
            MOV   R1,R6         ; return value becomes second argument
            LDI   R0,%d         ; load a pointer to the string
            CALL  printf
            POP   R4            ; restore R4

The add3 procedure takes three integers as arguments, adds them together, and returns the sum. The
procedure is called with the constants 10, 20, and 30, and the result is printed. By the calling convention,
when the call is made, R0 must contain the value 10, R1 the value 20, and R2 the value 30. We assume that
the caller wants to preserve the value of R4, but does not care about R3 or R5. In the assembly language
version on the right, R4 is first saved to the stack, then the arguments are marshaled into position, and
finally the call is made. The procedure itself needs no local storage and does not change any callee saved
registers, thus must simply add the numbers together and place the result in R6. After add3 returns, its
return value is moved from R6 to R1 in preparation for the call to printf. After loading a pointer to the
format string into R0, the second call is made, and R4 is restored, completing the translation.
System calls are almost identical to procedure calls. As with procedure calls, a calling convention is used:
before invoking a system call, arguments are marshaled into the appropriate registers or locations in the
stack; after a system call returns, any result appears in a pre-specified register. The calling convention used
for system calls need not be the same as that used for procedure calls. Rather than a call instruction, system
calls are usually initiated with a trap instruction, and system calls are also known as traps. With many
architectures, a system call places the processor in privileged or kernel mode, and the instructions that implement the call are considered to be part of the operating system. The term system call arises from this fact.


Interrupts and Exceptions


Unexpected processor interruptions arise both from interactions between a processor and external devices
and from errors or unexpected behavior in the program being executed. The term interrupt is reserved
for asynchronous interruptions generated by other devices, including disk drives, printers, network cards,
video cards, keyboards, mice, and any number of other possibilities. Exceptions occur when a processor
encounters an unexpected opcode or operand. An undefined instruction, for example, gives rise to an
exception, as does an attempt to divide by zero. Exceptions usually cause the current program to terminate,
although many operating systems will allow the program to catch the exception and to handle it more
intelligently. The table below summarizes the characteristics of the two types and compares them to system
calls.
    type                generated by                        example                           asynchronous   unexpected
    interrupt           external device                     packet arrived at network card    yes            yes
    exception           invalid opcode or operand           divide by zero                    no             yes
    trap/system call    deliberate, via trap instruction    print character to console        no             no

Interrupts occur asynchronously with respect to the program. Most designs only recognize interrupts between
instructions. In other words, the presence of interrupts is checked only after completing an instruction rather
than in every cycle. In pipelined designs, however, instructions execute simultaneously, and the decision as
to which instructions occur before an interrupt and which occur after must be made by the processor.
Exceptions are not asynchronous in the sense that they occur for a particular instruction, thus no decision
need be made as to instruction ordering. After determining which instructions were before an interrupt, a
pipelined processor discards the state of any partially executed instructions that occur after the interrupt
and completes all instructions that occur before. The terminated instructions are simply restarted after
the interrupt completes. Handling the decision, the termination, and the completion, however, significantly
increases the design complexity of the system.
The code associated with an interrupt, an exception, or a system call is a form of procedure called a
handler, and is found by looking up the interrupt number, exception number, or trap number in a table
of functions called a vector table. Separate vector tables exist for each type (interrupts, exceptions, and
system calls). Interrupts and exceptions share a need to save all registers and status bits before execution
of the corresponding handler code (and to restore those values afterward). Generally, the values, including
the status word register, are placed on the stack. With system calls, saving and restoring any necessary
state is part of the calling convention. A special return from interrupt instruction is used to return control
from the interrupt handler to the interrupted code; a similar instruction forces the processor back into user
mode when returning from a system call.
Interrupts are also interesting in the sense that typical computers often have many interrupt-generating
devices but only a few interrupts. Interrupts are prioritized by number, and only an interrupt with higher
priority can interrupt another interrupt. Interrupts with equal or lower priority are blocked while an interrupt
executes. Some interrupts can also be blocked in some architectures by setting bits in a special-purpose
register called an interrupt mask. While an interrupt number is masked, interrupts of that type are blocked,
and can not occur.
As several devices may generate interrupts with the same interrupt number, interrupt handlers can be
chained together. Each handler corresponds to a particular device. When an interrupt occurs, control is
passed to the handler for the first device, which accesses device registers to determine whether or not that
device generated an interrupt. If it did, the appropriate service is provided. If not, or after the service is
complete, control is passed to the next handler in the chain, which handles interrupts from the second device,
and so forth until the last handler in the chain completes. At this point, registers and processor state are
restored and control is returned to the point at which the interrupt occurred.
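The chaining idea can be sketched in C as a linked list of handlers (the structure and function names here are invented for the illustration; real systems differ in detail): each handler checks whether its own device raised the interrupt, services the device if so, and passes control to the next handler in the chain either way.

    #include <stddef.h>

    /* One entry in a chain of handlers sharing a single interrupt number.
       The device_pending and service functions would read and write the
       device's registers; they are placeholders here. */
    struct handler {
        int (*device_pending) (void);   /* did this device raise the interrupt? */
        void (*service) (void);         /* service the device if so             */
        struct handler* next;           /* next handler sharing this interrupt  */
    };

    /* Walk the chain for one interrupt number: each handler checks its own
       device and services it when needed, then passes control along. */
    static void dispatch_interrupt (struct handler* chain)
    {
        for (struct handler* h = chain; h != NULL; h = h->next) {
            if (h->device_pending ()) {
                h->service ();
            }
        }
    }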


Control Flow Conditions


Control flow instructions may change the PC, loading it with an address specified by the instruction. Although any addressing mode can be supported, the most common specify an address directly in the instruction, use a register as an address, or use an address relative to a register.
Unconditional control flow instructions typically provided by an ISA include procedure calls and returns,
traps, and jumps. Conditional control flow instructions are branches, and are logically based on status bits
set by two types of instructions: comparisons and bit tests. Comparisons subtract one value from another
to set the status bits, whereas bit tests use an AND operation to check whether certain bits are set or not
in a value.
Many older architectures, such as that used for your labs, implement status bits as special-purpose registers
and implicitly set them for certain instructions. A branch based on R2 being less than or equal to R3 can then
be written as shown below. The status bits are set by subtracting R3 from R2 with the function unit.
    CMP   R2,R3     ; R2 < R3 : CNZ ← 110, R2 = R3 : CNZ ← 001, R2 > R3 : CNZ ← 000
    BLE   R1        ; Z XOR C = 1 : PC ← R1
The status bits are not always implemented as special-purpose registers; instead, they may be kept in
general-purpose registers or not kept at all. For example, the Alpha ISA stores the results of comparisons
in general-purpose registers, and the same branch is instead implemented as follows:
    CMPLE  R4,R2,R3     ; R2 ≤ R3 : R4 ← 1, R2 > R3 : R4 ← 0
    BNE    R4,R1        ; R4 ≠ 0 : PC ← R1

Finally, status bits can be calculated, used, and discarded within a single instruction, in which case the
branch is written as follows:
    BLE   R1,R2,R3      ; R2 ≤ R3 : PC ← R1

The three approaches have advantages and disadvantages similar to those discussed in the section on addressing architectures: the first has the shortest instructions, the second is the most general and simplest to
implement, and the third requires the fewest instructions.

Stack Operations
Two types of stack operations are commonly supported. Push and pop are the basic operations in many
older architectures, and values can be placed upon or removed from the stack using these instructions. In
more modern architectures, in which the SP becomes a general-purpose register, push and pop are replaced
with indexed loads and stores, that is, loads and stores using the stack pointer and an offset as the address
for the memory operation. Stack updates are performed using the ALU, subtracting immediate values from
and adding them to the SP as necessary to allocate and deallocate local storage.
Stack operations serve three purposes in a typical architecture. The first is to support procedure calls, as
illustrated in a previous section. The second is to provide temporary storage during interrupts, as mentioned
earlier.
The third use of stack operations is to support spill code generated by compilers. Compilers first translate
high-level languages into an intermediate representation much like assembly code but with an extremely large
(theoretically infinite) register set. The final translation step translates this intermediate representation into
assembly code for the target architecture, assigning architectural registers as necessary. However, as real
ISAs support only a finite number of registers, the compiler must occasionally spill values into memory. For
example, if ten values are in use at some point in the code, but the architecture has only eight registers, spill
code must be generated to store the remaining two values on the stack and to restore them when they are
needed.


I/O
As a final topic for the course, we now consider how a processor connects to other devices to allow input
and output. We have already discussed interrupts, which are a special form of I/O in which only the signal
requesting attention is conveyed to the processor. Communication of data occurs through instructions similar
to loads and stores. A processor is designed with a number of I/O ports, usually read-only or write-only
registers to which devices can be attached with opposite semantics. That is, a port is usually written by the
processor and read by a device or written by a device and read by the processor.
The question of exactly how I/O ports are accessed is an interesting one. One option is to create special
instructions, such as the in and out instructions of the x86 architecture. Port addresses can then be specified
in the same way that memory addresses are specified, but use a distinct address space. Just as two sets
of special-purpose registers can be separated by the ISA, such an independent I/O system separates I/O
ports from memory addresses by using distinct instructions for each class of operation.
Alternatively, device registers can be accessed using the same load and store instructions as are used to
access memory. This approach, known as memory-mapped I/O, requires no new instructions for I/O,
but demands that a region of the memory address space be set aside for I/O. The memory words with those
addresses, if they exist, can not be accessed during normal processor operations.
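As a rough illustration of memory-mapped I/O from the software side (the addresses and register layout below are invented for the example), a device register can be declared in C as a volatile pointer to a fixed address set aside in the memory address space, and is then accessed with ordinary loads and stores rather than special I/O instructions.

    #include <stdint.h>

    /* Hypothetical memory-mapped device registers; the addresses are invented
       for this example and set aside from normal memory use. */
    #define DEVICE_STATUS  ((volatile uint32_t*) 0xFFFF0000u)
    #define DEVICE_DATA    ((volatile uint32_t*) 0xFFFF0004u)

    /* Write one value to the device with ordinary loads and stores:
       spin until the device reports ready, then store the data. */
    static void device_write (uint32_t value)
    {
        while ((*DEVICE_STATUS & 0x1u) == 0) {
            /* wait for the ready bit */
        }
        *DEVICE_DATA = value;
    }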


ECE120: Introduction to Computer Engineering


Notes Set 4.4
Summary of Part 4 of the Course
With the exception of redundancy and coding, most of the material in this part of the course is drawn from
Patt and Patel Chapters 5 through 7.
In this short summary, we give you lists at several levels of difficulty of what we expect you to be able to do as
a result of the last few weeks of studying (reading, listening, doing homework, discussing your understanding
with your classmates, and so forth).
We'll start with the easy stuff. You should recognize all of these terms and be able to explain what they
mean. For the specific circuits, you should be able to draw them and explain how they work.

von Neumann elements
    - program counter (PC)
    - instruction register (IR)
    - memory address register (MAR)
    - memory data register (MDR)
    - processor data path
    - bus
    - control signals
instruction processing
    - fetch
    - decode
    - execute
    - register transfer language (RTL)
Instruction Set Architecture (ISA)
    - encoding
    - fields
    - operation code (opcode)
    - types of instructions
        - operations
        - data movement
        - control flow
    - addressing modes
        - immediate
        - register
        - PC-relative
        - indirect
        - base + offset
systematic decomposition
    - sequential
    - conditional
    - iterative
assemblers and assembly code
    - opcode mnemonics (such as ADD, JMP)
    - two-pass process
    - symbol table
    - pseudo-ops / directives
logic design optimization
    - bit-sliced (including multiple bits per slice)
    - serialized
    - pipelined logic
    - tree-based
control unit design strategies
    - hardwired control
        - single-cycle
        - multi-cycle
    - microprogrammed control
        - microinstruction
    - pipelining (of instruction processing)
error detection and correction
    - code/sparse representation
    - code word
    - bit error
    - odd/even parity bit
    - Hamming distance between code words
    - Hamming distance of a code
    - Hamming code
    - SEC-DED


We expect you to be able to exercise the following skills:
    - Map RTL (register transfer language) operations into control signals for a given processor datapath.
    - Systematically decompose a (simple enough) problem to the level of LC-3 instructions.
    - Encode LC-3 instructions into machine code.
    - Read and understand programs written in LC-3 assembly/machine code.
    - Test and debug a small program in LC-3 assembly/machine code.
    - Be able to calculate the Hamming distance of a code/representation.
    - Know the relationships between Hamming distance and the abilities to detect and to correct bit errors.
We expect that you will understand the concepts and ideas to the extent that you can do the following:
    - Explain the basic organization of a computer's microarchitecture as well as the role played by elements
      of a von Neumann design in the processing of instructions.
    - Identify the stages of processing an instruction (such as fetch, decode, getting operands, execution,
      and writing back results) in a processor control unit state machine diagram.
    - Explain the role of different types of instructions in allowing a programmer to express a computation.
    - Explain the importance of the three types of subdivisions in systematic decomposition (sequential,
      conditional, and iterative).
    - Explain the process of transforming assembly code into machine code (that is, explain how an assembler
      works, including describing the use of the symbol table).
    - Be able to use parity for error detection, and Hamming codes for error correction.
At the highest level, we hope that, while you do not have direct substantial experience in this regard from
our class, you will nonetheless be able to begin to do the following when designing combinational logic:
    - Design and compare implementations using gates, decoders, muxes, and/or memories as appropriate,
      including reasoning about the relevant design tradeoffs in terms of area and delay.
    - Design and compare implementations as a bit-sliced, serial, pipelined, or tree-based design, again
      including reasoning about the relevant design tradeoffs in terms of area and delay.
    - Design and compare implementations of processor control units using both hardwired and microprogrammed
      strategies, again including reasoning about the relevant design tradeoffs in terms of area and delay.
    - Understand basic tradeoffs in the sparsity of code words with error detection and correction capabilities.
