The halting problem is easy to state and easy to prove undecidable. The problem is this: given a Turing
machine and an input to the Turing machine, does the Turing machine finish computing in a finite number
of steps (a finite amount of time)? In order to solve the problem, an answer, either yes or no, must be given
in a finite amount of time regardless of the machine or input in question. Clearly some machines never finish.
For example, we can write a Turing machine that counts upwards starting from one.
You may find the proof structure for undecidability of the halting problem easier to understand if you first
think about a related problem with which you may already be familiar, the Liar's paradox (which is at least
2,300 years old). In its strengthened form, it is the following sentence: "This sentence is not true."
To see that no Turing machine can solve the halting problem, we begin by assuming that such a machine
exists, and then show that its existence is self-contradictory. We call the machine the Halting Machine,
or HM for short. HM is a machine that operates on another Turing machine and its inputs to produce a
yes or no answer in finite time: either the machine in question finishes in finite time (HM returns yes), or
it does not (HM returns no). The figure below illustrates HM's operation.

[Figure: a Turing machine together with its inputs feeds into the Halting Machine (HM), which outputs
either yes or no.]
From HM, we construct a second machine that we call the HM Inverter, or HMI. This machine inverts the
sense of the answer given by HM. In particular, the inputs are fed directly into a copy of HM, and if HM
answers yes, HMI enters an infinite loop. If HM answers no, HMI halts. A diagram appears below.

[Figure: the Halting Machine Inverter (HMI). A Turing machine and its inputs feed into a copy of the
Halting Machine (HM); if HM said yes, HMI counts forever; if HM said no, HMI is done.]
The inconsistency can now be seen by asking HM whether
HMI halts when given itself as an input (repeatedly), as
shown below. Two copies of HM are thus being asked the same question. One copy is the rightmost in the
figure below and the second is embedded in the HMI machine that we are using as the input to the rightmost
HM. As the two copies of HM operate on the same input (HMI operating on HMI), they should return the
same answer: a Turing machine either halts on an input, or it does not; they are deterministic.
[Figure: two copies of HM answering the same question. HMI, given HMI itself as its input, is fed both
into a rightmost Halting Machine (HM), which outputs yes or no, and into the copy of HM embedded inside
the Halting Machine Inverter (HMI), which counts forever if its HM said yes and is done if it said no.]
Let's assume that the rightmost HM tells us that HMI operating on itself halts. Then the copy of HM in
HMI (when HMI executes on itself, with itself as an input) must also say yes. But this answer implies
that HMI doesn't halt (see the figure above), so the answer should have been no!
Alternatively, we can assume that the rightmost HM says that HMI operating on itself does not halt. Again,
the copy of HM in HMI must give the same answer. But in this case HMI halts, again contradicting our
assumption.
Since neither answer is consistent, no consistent answer can be given, and the original assumption that HM
exists is incorrect. Thus, no Turing machine can solve the halting problem.
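The self-reference at the heart of the proof can be sketched in C. The function names hm and hmi below are purely illustrative, and hm is only a stand-in: the whole point of the proof is that no correct implementation of it can exist.

/* Stand-in for the assumed Halting Machine: hm (p, x) is supposed to return 1
   if program p halts on input x and 0 otherwise.  No correct implementation
   can exist; a dummy body is given only so that this sketch compiles.       */
static int hm (const char *p, const char *x)
{
    (void)p;
    (void)x;
    return 0;
}

/* The HM Inverter (HMI): loops forever exactly when hm says that p halts on
   itself, and returns (halts) when hm says that it does not.               */
static void hmi (const char *p)
{
    if (hm (p, p)) {     /* did HM answer yes?  */
        for (;;) { }     /* then count forever  */
    }
    /* otherwise done */
}

int
main ()
{
    hmi ("hmi");         /* HMI operating on (a description of) itself */
    return 0;
}

Asking what hm should answer about hmi running on itself leads to exactly the contradiction described above: whatever it answers must be wrong.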
Using more bits to represent the answer is not an attractive solution, since we might then
want to use more bits for the inputs, which in turn requires more bits for the outputs,
and so on. We cannot build something supporting an infinite number of bits. Instead, we
choose a value for N and build an add unit that adds two N -bit numbers and produces
an N -bit sum (and some overflow indicators, which we discuss in the next set of notes).
The diagram below shows how we might draw such a device, with two N-bit numbers
entering at the top and the N-bit sum coming out at the bottom.

[Figure: an N-bit add unit, drawn as a box with two N-bit inputs at the top and one N-bit output at the
bottom.]

The function used for N-bit unsigned addition is addition modulo 2^N. In a practical sense,
you can think of this function as simply keeping the last N bits of the answer; other bits
are simply discarded. In the example below, we add 12 and 6 to obtain 18, but then
discard the extra bit on the left, so the add unit produces 2 (an overflow).

      1100  (12)
    + 0110  ( 6)
     10010  ( 2)
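In C, addition modulo 2^N for small N can be mimicked by masking off all but the low N bits of a wider result. The short program below is an illustrative sketch, not part of the original notes.

#include <stdio.h>

int
main ()
{
    unsigned int a = 12;                     /* 1100                       */
    unsigned int b = 6;                      /* 0110                       */
    unsigned int sum4 = (a + b) & 0xF;       /* keep only the last 4 bits  */

    printf ("(12 + 6) mod 16 = %u\n", sum4); /* prints 2: an overflow      */
    return 0;
}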
Modular arithmetic defines a way of performing arithmetic
for a finite number of possible values, usually integers. As a
concrete example, let's use modulo 16, which corresponds to
the addition unit for our 4-bit examples.

[Figure: the integer number line divided into contiguous groups of 16 numbers, with 0 to 15 marked as one
group, 16 to 31 as a second group, and further groups continuing in both directions.]
Starting with the full range of integers, we can define equivalence classes for groups of 16 integers by simply breaking up
the number line into contiguous groups, starting with the numbers 0 to 15, as shown in the figure above. The
numbers -16 to -1 form a group, as do the numbers from 16 to 31. An infinite number of groups are defined
in this manner.
You can think of these groups as defining equivalence classes modulo 16. All of the first numbers in the
groups are equivalent modulo 16. All of the second numbers in the groups are equivalent modulo 16. And
so forth. Mathematically, we say that two numbers A and B are equivalent modulo 16, which we write as
(A = B) mod 16
if and only if A = B + 16k for some integer k.
It is worth noting that equivalence as defined by a particular modulus distributes over addition and multiplication.
If, for example, we want to find the equivalence class for (A + B) mod 16, we can find the equivalence
classes for A (call it C) and B (call it D) and then calculate the equivalence class of (C + D) mod 16.
As a concrete example of distribution over multiplication, given (A = 1,083,102,112 × 7,323,127) mod 10,
find A. For this problem, we note that the first number is equivalent to 2 mod 10, while the second number
is equivalent to 7 mod 10. We then write (A = 2 × 7) mod 10, and, since 2 × 7 = 14, we have (A = 4) mod 10.
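The distribution property is easy to check directly; the small program below (a sketch, not from the original notes) verifies the multiplication example above.

#include <stdio.h>

int
main ()
{
    /* (1,083,102,112 x 7,323,127) mod 10 should equal (2 x 7) mod 10 = 4. */
    unsigned long long product = 1083102112ULL * 7323127ULL;

    printf ("full product mod 10 = %llu\n", product % 10);
    printf ("(2 * 7) mod 10      = %d\n", (2 * 7) % 10);
    return 0;
}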
Deriving 2's Complement
Given these equivalence classes, we might
instead choose to draw a circle to identify the equivalence classes and to associate
each class with one of the sixteen possible
4-bit patterns, as shown to the right. Using this circle representation, we can add by
counting clockwise around the circle, and
we can subtract by counting in a counterclockwise direction around the circle. With
an unsigned representation, we choose to
use the group from [0, 15] (the middle group
in the diagram markings below) as
the number represented by each of the patterns. Overflow occurs with unsigned addition (or subtraction) because we can only
choose one value for each binary pattern.
[Figure: the sixteen 4-bit patterns, 0000 through 1111, arranged around a circle, with each pattern labeled
by its equivalence class modulo 16; for example, the pattern 1101 is labeled "..., -3, 13, 29, ..." and the
pattern 1000 is labeled "..., -8, 8, 24, ...".]
In fact, we can choose any single value for each pattern to create a representation, and our add unit will
always produce results that are correct modulo 16. Look back at our overflow example, where we added 12
and 6 to obtain 2, and notice that (2 = 18) mod 16. Normally, only a contiguous sequence of integers makes
a useful representation, but we do not have to restrict ourselves to non-negative numbers.
The 2's complement representation can then be defined by choosing a set of integers balanced around zero
from the groups. In the circle diagram, for example, we might choose to represent numbers in the range
[-7, 7] when using 4 bits. What about the last pattern, 1000? We could choose to represent either -8 or 8.
The number of arithmetic operations that overflow is the same with both choices (the choices are symmetric
around 0, as are the combinations of input operands that overflow), so we gain nothing in that sense from
either choice. If we choose to represent -8, however, notice that all patterns starting with a 1 bit then represent
negative numbers. No such simple check arises with the opposite choice, and thus an N-bit 2's complement
representation is defined to represent the range [-2^(N-1), 2^(N-1) - 1], with patterns chosen as shown in the circle.
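The circle's assignment of values to 4-bit patterns can be written directly in C. The helper below is an illustrative sketch (not part of the original notes) that decodes a 4-bit pattern by choosing the representative from [-8, 7].

#include <stdio.h>

/* Interpret the low 4 bits of 'pattern' as a 4-bit 2's complement number:
   patterns 0000-0111 represent 0 through 7, and patterns 1000-1111 (those
   starting with a 1 bit) represent -8 through -1.                          */
static int decode_4bit (unsigned int pattern)
{
    pattern = pattern & 0xF;
    return (pattern < 8) ? (int)pattern : (int)pattern - 16;
}

int
main ()
{
    printf ("1101 represents %d\n", decode_4bit (0xD));   /* -3 */
    printf ("0101 represents %d\n", decode_4bit (0x5));   /*  5 */
    printf ("1000 represents %d\n", decode_4bit (0x8));   /* -8 */
    return 0;
}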
An Algebraic Approach
Some people prefer an algebraic approach to understanding the definition of 2's complement, so we present
such an approach next. Let's start by writing f(A, B) for the result of our add unit:

    f(A, B) = (A + B) mod 2^N

We assume that we want to represent a set of integers balanced around 0 using our signed representation, and
that we will use the same binary patterns as we do with an unsigned representation to represent non-negative
numbers. Thus, with an N-bit representation, the patterns in the range [0, 2^(N-1) - 1] are the same as those
used with an unsigned representation. In this case, we are left with all patterns beginning with a 1 bit.

The question then is this: given an integer k, 2^(N-1) > k > 0, for which we want to find a pattern to
represent -k, and any integer m ≥ 0 that we might want to add to -k, can we find another integer p > 0
such that

    (-k + m = p + m) mod 2^N                                        (1)

If we can, we can use p's representation to represent -k, and our unsigned addition unit f(A, B) will work
correctly.

To find the value p, start by subtracting m from both sides of Equation (1) to obtain:

    (-k = p) mod 2^N                                                (2)

Note that (2^N = 0) mod 2^N, and add this equation to Equation (2) to obtain

    (2^N - k = p) mod 2^N

Let p = 2^N - k. For example, if N = 4, k = 3 gives p = 16 - 3 = 13, which is the pattern 1101. With N = 4
and k = 5, we obtain p = 16 - 5 = 11, which is the pattern 1011. In general, since 2^(N-1) > k > 0, we
have 2^(N-1) < p < 2^N. But these patterns are all unused (they all start with a 1 bit!), so the patterns that
we have defined for negative numbers are disjoint from those that we used for positive numbers, and the
meaning of each pattern is unambiguous. The algebraic definition of bit patterns for negative numbers also
matches our circle diagram from the last section exactly, of course.
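The same calculation is easy to check in C. The sketch below (not from the original notes) uses N = 4: the pattern chosen for -k is the unsigned number 2^N - k, and the unsigned add unit then produces correct results.

#include <stdio.h>

int
main ()
{
    unsigned int n = 4;
    unsigned int k = 3;
    unsigned int p = (1u << n) - k;          /* p = 2^N - k = 13 = 1101     */

    /* Adding m to p and keeping 4 bits gives the same answer as adding m
       to -k, modulo 16.  For example, -3 + 5 = 2:                          */
    unsigned int m = 5;
    printf ("p = %u, (p + m) mod 16 = %u\n", p, (p + m) & 0xF);
    return 0;
}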
Overflow Conditions
This set of notes discusses the overflow conditions for unsigned and 2's complement addition. For both
types, we formally prove that the conditions that we state are correct. Many of our faculty want our students to learn to construct formal proofs, so we plan to begin exposing you to this process in our classes.
Prof. Lumetta is a fan of Prof. George Polya's educational theories with regard to proof techniques, and
in particular the idea that one builds up a repertoire of approaches by seeing the approaches used in practice.
Theorem: Addition of two N-bit unsigned numbers A and B, with true sum C = A + B, overflows if and
only if the carry out of the most significant bit, c_N, equals 1.

Proof: Let's start with the "if" direction. In other words, c_N = 1 implies overflow. Recall that unsigned
addition is the same as base 2 addition, except that we discard bits beyond c_(N-1) from the sum C. The
bit c_N has place value 2^N, so, when c_N = 1 we can write that the correct sum C ≥ 2^N. But no value that
large can be represented using the N-bit unsigned representation, so we have an overflow.
The other direction ("only if") is slightly more complex: we need to show that overflow implies that c_N = 1.
We use a range-based argument for this purpose. Overflow means that the sum C is outside the representable
range [0, 2^N - 1]. Adding two non-negative numbers cannot produce a negative number, so the sum cannot
be smaller than 0. Overflow thus implies that C ≥ 2^N.

Does that argument complete the proof? No, because some numbers, such as 2^(N+1), are larger than 2^N, but
do not have a 1 bit in the Nth position when written in binary. We need to make use of the constraints
on A and B implied by the possible range of the representation.

In particular, given that A and B are represented as N-bit unsigned values, we can write

    0 ≤ A ≤ 2^N - 1
    0 ≤ B ≤ 2^N - 1

and adding these two inequalities gives

    0 ≤ A + B ≤ 2^(N+1) - 2

Combining the new inequality with the one implied by the overflow condition, we obtain

    2^N ≤ C ≤ 2^(N+1) - 2

All of the numbers in the range allowed by this inequality have c_N = 1, completing our proof.
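The carry-out test is easy to express in C by computing the sum in a wider type and examining bit N. The sketch below (not from the original notes) uses N = 8 and the fixed-size types discussed later in these notes.

#include <stdint.h>
#include <stdio.h>

/* Returns 1 if adding two 8-bit unsigned numbers overflows, i.e., if the
   carry out of the most significant bit (c_8) is 1.                        */
static int unsigned_add_overflows (uint8_t a, uint8_t b)
{
    uint16_t wide = (uint16_t)a + (uint16_t)b;   /* 9-bit result fits here  */
    return (wide >> 8) & 1;                      /* bit 8 is the carry out  */
}

int
main ()
{
    printf ("%d\n", unsigned_add_overflows (200, 100));   /* 1: 300 >= 256  */
    printf ("%d\n", unsigned_add_overflows (100, 100));   /* 0: 200 <  256  */
    return 0;
}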
Lemma: 2's complement addition of an N-bit negative number and an N-bit non-negative number never
overflows.

Proof: Without loss of generality,1 assume that A < 0 and B ≥ 0. Since A and B are represented as N-bit
2's complement values, they must fall in the representable range, so we can write

    -2^(N-1) ≤ A < 0
           0 ≤ B < 2^(N-1)

and adding these two inequalities gives

    -2^(N-1) ≤ A + B < 2^(N-1)

But anything in the range specified by this inequality can be represented with N-bit 2's complement, and
thus the addition does not overflow.
1 This common mathematical phrasing means that we are using a problem symmetry to cut down the length of the proof
discussion. In this case, the names A and B aren't particularly important, since addition is commutative (A + B = B + A).
Thus the proof for the case in which A is negative (and B is not) is identical to the case in which B is negative (and A is not),
except that all of the names are swapped. The term "without loss of generality" means that we consider the proof complete
even with additional assumptions, in our case that A < 0 and B ≥ 0.
We are now ready to state our main theorem. For convenience, let's use different names for the actual
sum C = A + B and the sum S returned from the add unit. We define S as the number represented by the
bit pattern produced by the add unit. When overflow occurs, S ≠ C, but we always have (S = C) mod 2^N.

Theorem: Addition of two N-bit 2's complement numbers A and B overflows if and only if one of the
following conditions holds:

1. A < 0 and B < 0 and S ≥ 0
2. A ≥ 0 and B ≥ 0 and S < 0

Proof: We once again start with the "if" direction. That is, if condition 1 or condition 2 holds, we have
an overflow. The proofs are straightforward. Given condition 1, we can add the two inequalities A < 0 and
B < 0 to obtain C = A + B < 0. But S ≥ 0, so clearly S ≠ C, thus overflow has occurred.

Similarly, if condition 2 holds, we can add the inequalities A ≥ 0 and B ≥ 0 to obtain C = A + B ≥ 0. Here
we have S < 0, so again S ≠ C, and we have an overflow.
We must now prove the only if direction, showing that any overflow implies either condition 1 or condition 2.
By the contrapositive2 of our Lemma, we know that if an overflow occurs, either both operands are negative,
or they are both positive.
Let's start with the case in which both operands are negative, so A < 0 and B < 0, and thus the real
sum C < 0 as well. Given that A and B are represented as N-bit 2's complement, they must fall in the
representable range, so we can write

    -2^(N-1) ≤ A < 0
    -2^(N-1) ≤ B < 0

and adding these two inequalities gives

    -2^N ≤ C < 0
Given that an overflow has occurred, C must fall outside of the representable range. Given that C < 0, it
cannot be larger than the largest possible number representable using N-bit 2's complement, so we can write

    -2^N ≤ C < -2^(N-1)

Adding 2^N to each part of this inequality gives

    0 ≤ C + 2^N < 2^(N-1)

This range of integers falls within the representable range for N-bit 2's complement, so we can replace the
middle expression with S (equal to C modulo 2^N) to find that

    0 ≤ S < 2^(N-1)

Thus, if we have an overflow and both A < 0 and B < 0, the resulting sum S ≥ 0, and condition 1 holds.
The proof for the case in which we observe an overflow when both operands are non-negative (A ≥ 0 and
B ≥ 0) is similar, and leads to condition 2. We again begin with inequalities for A and B:

    0 ≤ A < 2^(N-1)
    0 ≤ B < 2^(N-1)

Adding these two inequalities gives

    0 ≤ C < 2^N
2 If we have a statement of the form (p implies q), its contrapositive is the statement (not q implies not p). Both statements
have the same truth value. In this case, we can turn our Lemma around as stated.
Given that an overflow has occurred, C must fall outside of the representable range. Given that C ≥ 0, it
cannot be smaller than the smallest possible number representable using N-bit 2's complement, so we can
write

    2^(N-1) ≤ C < 2^N

Subtracting 2^N from each part of this inequality gives

    -2^(N-1) ≤ C - 2^N < 0

This range of integers falls within the representable range for N-bit 2's complement, so we can replace the
middle expression with S (equal to C modulo 2^N) to find that

    -2^(N-1) ≤ S < 0

Thus, if we have an overflow and both A ≥ 0 and B ≥ 0, the resulting sum S < 0, and condition 2 holds.
Thus overflow implies either condition 1 or condition 2, completing our proof.
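The two conditions of the theorem translate directly into C. The sketch below (not from the original notes) uses N = 8: it computes the sum S the way the add unit does, by keeping only the low 8 bits, and then applies conditions 1 and 2.

#include <stdint.h>
#include <stdio.h>

/* Returns 1 if adding the 8-bit 2's complement numbers a and b overflows. */
static int signed_add_overflows (int8_t a, int8_t b)
{
    /* Compute the 8-bit result pattern, then reinterpret it as 2's complement. */
    uint8_t pattern = (uint8_t)a + (uint8_t)b;
    int8_t  s = (pattern < 128) ? (int8_t)pattern : (int8_t)(pattern - 256);

    return (a < 0 && b < 0 && s >= 0) ||    /* condition 1 */
           (a >= 0 && b >= 0 && s < 0);     /* condition 2 */
}

int
main ()
{
    printf ("%d\n", signed_add_overflows (100, 100));     /* 1:  200 >  127 */
    printf ("%d\n", signed_add_overflows (-100, -100));   /* 1: -200 < -128 */
    printf ("%d\n", signed_add_overflows (-100, 100));    /* 0              */
    return 0;
}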
Truth Tables
You have seen the basic form of truth tables in the textbook and in class. Over
the semester, we will introduce several extensions to the basic concept, mostly with
the goal of reducing the amount of writing necessary when using truth tables. For
example, the truth table below uses two generalizations to show the carry
out C (also the unsigned overflow indicator) and the sum S produced by adding
two 2-bit unsigned numbers. First, rather than writing each input bit separately,
we have grouped pairs of input bits into the numbers A and B. Second, we have
defined multiple output columns so as to include both bits of S as well as C in the
same table. Finally, we have grouped the two bits of S into one column.
    inputs     outputs
    A  B       C   S
    00 00      0   00
    00 01      0   01
    00 10      0   10
    00 11      0   11
    01 00      0   01
    01 01      0   10
    01 10      0   11
    01 11      1   00
    10 00      0   10
    10 01      0   11
    10 10      1   00
    10 11      1   01
    11 00      0   11
    11 01      1   00
    11 10      1   01
    11 11      1   10
Keep in mind as you write truth tables that only rarely does an operation correspond
to a simple and familiar process such as addition of base 2 numbers. We had to
choose the unsigned and 2's complement representations carefully to allow ourselves
to take advantage of a familiar process. In general, for each line of a truth table for
an operation, you may need to make use of the input representation to identify the
input values, calculate the operation's result as a value, and then translate the value
back into the correct bit pattern using the output representation. Signed magnitude addition, for example, does not always correspond to base 2 addition: when the
signs of the two input operands differ, one should instead use base 2 subtraction. For other operations or
representations, base 2 arithmetic may have no relevance at all.
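Truth tables like the one above are also easy to generate mechanically. The program below is a sketch (not part of the original notes) that prints the carry and sum for the addition of two 2-bit unsigned numbers.

#include <stdio.h>

int
main ()
{
    unsigned int a;
    unsigned int b;

    printf (" A  B    C  S\n");
    for (a = 0; 4 > a; a = a + 1) {
        for (b = 0; 4 > b; b = b + 1) {
            unsigned int sum = a + b;
            /* Print A, B, and S as 2-bit binary numbers, and C as one bit. */
            printf ("%u%u %u%u    %u  %u%u\n",
                    (a >> 1) & 1, a & 1, (b >> 1) & 1, b & 1,
                    (sum >> 2) & 1, (sum >> 1) & 1, sum & 1);
        }
    }
    return 0;
}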
[Table: the Boolean logic functions, with their common notations, brief explanations, and gate schematics.
AND (written A AND B, or AB in Boolean algebra) is the "all" function: the result is 1 iff all input operands
are equal to 1. OR is written A OR B or A + B. NOT (written NOT A) is the logical complement/negation:
NOT 0 is 1, and NOT 1 is 0. XOR, the exclusive OR, is written A XOR B. A final row gives the English
word "or," as in "A, B, or C."]
Last among the Boolean logic functions, we have the XOR, or exclusive OR function. Think of XOR as
the odd function: given a set of input values as operands, XOR evaluates to 1 if and only if an odd number
of the input values are 1. Only two variants of XOR notation are given: the first using the function name,
and the second used with Boolean algebra. Mathematics rarely uses this function.
Finally, we have included the meaning of the word or in English as a separate function entry to enable you
to compare that meaning with the Boolean logic functions easily. Note that many people refer to English
use of the word or as exclusive because one
true value excludes all others from being true. Do
not let this human language ambiguity confuse you about XOR! For all logic design purposes, XOR is
the odd function.

The truth table below provides values illustrating these functions operating on three inputs. The AND,
OR, and XOR functions are all associative ((A op B) op C = A op (B op C)) and commutative
(A op B = B op A), as you may have already realized from their definitions.

    inputs      outputs
    A B C    ABC    A + B + C    NOT A    A XOR B XOR C
    0 0 0     0         0          1            0
    0 0 1     0         1          1            1
    0 1 0     0         1          1            1
    0 1 1     0         1          1            0
    1 0 0     0         1          0            1
    1 0 1     0         1          0            0
    1 1 0     0         1          0            0
    1 1 1     1         1          0            1
Logical Completeness
Why do we feel that such a short list of functions is enough? If you think about the number of possible
functions on N bits, you might think that we need many more functions to be able to manipulate bits.
With 10 bits, for example, there are 2^1024 such functions. Obviously, some of them have never been used in
any computer system, but maybe we should define at least a few more logic operations? In fact, we do not
even need XOR. The functions AND, OR, and NOT are sufficient, even if we only allow two input operands
for AND and OR!
The theorem below captures this idea, called logical completeness. In this case, we claim that the set of
functions {AND, OR, NOT} is sufficient to express any operation on any finite number of variables, where
each variable is a bit.
Theorem: Given enough 2-input AND, 2-input OR, and 1-input NOT functions, one can express any
Boolean logic function on any finite number of variables.
The proof of our theorem is by construction. In other words, we show a systematic approach for transforming an arbitrary Boolean logic function on an arbitrary number of variables into a form that uses only
AND, OR, and NOT functions on one or two operands. As a first step, we remove the restriction on the
number of inputs for the AND and OR functions. For this purpose, we state and prove two lemmas, which
are simpler theorems used to support the proof of a main theorem.
Lemma 1: Given enough 2-input AND functions, one can express an AND function on any finite number
of variables.
Proof: We prove the Lemma by induction. Denote the number of inputs to a particular AND function
by N .
The base case is N = 2. Such an AND function is given.
To complete the proof, we need only show that, given any
number of AND functions with up to N inputs, we can express an AND function with N + 1 inputs. To do so, we need
merely use one 2-input AND function to join together the
result of an N -input AND function with an additional input,
as illustrated below.

[Figure: inputs 1 through N feed an N-input AND function, whose result is joined with input N+1 by a
2-input AND function.]
Lemma 2: Given enough 2-input OR functions, one can express an OR function on any finite number of
variables.
Proof: The proof of Lemma 2 is identical in structure to that of Lemma 1, but uses OR functions instead
of AND functions.
Let's now consider a small subset of functions on N variables. For any such function, you can write out the
truth table for the function. The output of a logic function is just a bit, either a 0 or a 1. Lets consider the
set of functions on N variables that produce a 1 for exactly one combination of the N variables. In other
words, if you were to write out the truth table for such a function, exactly one row in the truth table would
have output value 1, while all other rows had output value 0.
Lemma 3: Given enough AND functions and 1-input NOT functions, one can express any Boolean logic
function that produces a 1 for exactly one combination of any finite number of variables.
Proof: The proof of Lemma 3 is by construction. Let N be the number of variables on which the function
operates. We construct a minterm on these N variables, which is an AND operation on each variable or its
complement. The minterm is specified by looking at the unique combination of variable values that produces
a 1 result for the function. Each variable that must be a 1 is included as itself, while each variable that must
be a 0 is included as the variable's complement (using a NOT function). The resulting minterm produces the
desired function exactly. When the variables all match the values for which the function should produce 1,
the inputs to the AND function are all 1, and the function produces 1. When any variable does not match
the value for which the function should produce 1, that variable (or its complement) acts as a 0 input to the
AND function, and the function produces a 0, as desired.
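For single-bit values stored in C variables, a minterm is simply an AND of each variable or its complement. The sketch below (not from the original notes) evaluates the minterm that produces 1 only for A = 1, B = 0, C = 1.

#include <stdio.h>

/* The minterm AB'C: 1 exactly when a == 1, b == 0, and c == 1
   (a, b, and c are assumed to hold only the values 0 or 1).    */
static int minterm_a_notb_c (int a, int b, int c)
{
    return a & (1 ^ b) & c;     /* (1 ^ b) complements the single bit b */
}

int
main ()
{
    int a, b, c;

    for (a = 0; 2 > a; a = a + 1)
        for (b = 0; 2 > b; b = b + 1)
            for (c = 0; 2 > c; c = c + 1)
                printf ("%d %d %d -> %d\n", a, b, c, minterm_a_notb_c (a, b, c));
    return 0;
}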
The table below shows all eight minterms for three variables (here a prime denotes a variable's complement:
A' means NOT A).

    inputs      outputs (the eight minterms)
    A B C    A'B'C'  A'B'C  A'BC'  A'BC  AB'C'  AB'C  ABC'  ABC
    0 0 0      1       0      0      0     0      0     0     0
    0 0 1      0       1      0      0     0      0     0     0
    0 1 0      0       0      1      0     0      0     0     0
    0 1 1      0       0      0      1     0      0     0     0
    1 0 0      0       0      0      0     1      0     0     0
    1 0 1      0       0      0      0     0      1     0     0
    1 1 0      0       0      0      0     0      0     1     0
    1 1 1      0       0      0      0     0      0     0     1
    inputs    outputs
    A  B      A NAND B    A NOR B
    0  0         1           1
    0  1         1           0
    1  0         1           0
    1  1         0           0

(A NAND B is the complement of AB, and A NOR B is the complement of A + B.)
    inputs               outputs
    A1 A0  B1 B0       C    S1 S0
     0  0   0  0       0     0  0
     0  0   0  1       0     0  1
     0  0   1  0       0     1  0
     0  0   1  1       0     1  1
     0  1   0  0       0     0  1
     0  1   0  1       0     1  0
     0  1   1  0       0     1  1
     0  1   1  1       1     0  0
     1  0   0  0       0     1  0
     1  0   0  1       0     1  1
     1  0   1  0       1     0  0
     1  0   1  1       1     0  1
     1  1   0  0       0     1  1
     1  1   0  1       1     0  0
     1  1   1  0       1     0  1
     1  1   1  1       1     1  0

Summing the minterms for each output column (again writing a prime for a variable's complement) gives

C  = A1' A0 B1 B0 + A1 A0' B1 B0' + A1 A0' B1 B0 +
     A1 A0 B1' B0 + A1 A0 B1 B0' + A1 A0 B1 B0

S1 = A1' A0' B1 B0' + A1' A0' B1 B0 + A1' A0 B1' B0 +
     A1' A0 B1 B0' + A1 A0' B1' B0' + A1 A0' B1' B0 +
     A1 A0 B1' B0' + A1 A0 B1 B0

S0 = A1' A0' B1' B0 + A1' A0' B1 B0 + A1' A0 B1' B0' +
     A1' A0 B1 B0' + A1 A0' B1' B0 + A1 A0' B1 B0 +
     A1 A0 B1' B0' + A1 A0 B1 B0'
        A   01010101
    AND B   11110000
            01010000
int
main ()
{
    int answer = 42;    /* the Answer! */

    /* Print a line of output for the user, then terminate. */
    printf ("The answer is %d.\n", answer);
    return 0;
}
For our purposes, a C program consists of a set of variable declarations and a sequence of statements.
Both of these parts are written into a single C function called main, which executes when the program starts.
A simple example appears to the right. The program uses one variable called answer, which it initializes
to the value 42. The program prints a line of output to the monitor for the user, then terminates using
the return statement. Comments for human readers begin with the characters /* (a slash followed by an
asterisk) and end with the characters */ (an asterisk followed by a slash). The C language ignores white
space in programs, so we encourage you to use blank lines and extra spacing to make your programs easier
to read.
The variables defined in the main function allow a programmer to associate arbitrary symbolic names
(sequences of English characters, such as sum or product or highScore) with specific types of data,
such as a 16-bit unsigned integer or a double-precision floating-point number. In the example program above,
the variable answer is declared to be a 32-bit 2's complement number.
Those with no programming experience may at first find the difference between variables in algebra and
variables in programs slightly confusing. As a program executes, the values of variables can change from step
to step of execution.
The statements in the main function are executed one by one until the program terminates. Programs are
not limited to simple sequences of statements, however. Some types of statements allow a programmer to
specify conditional behavior. For example, a program might only print out secret information if the user's
name is lUmeTTa. Other types of statements allow a programmer to repeat the execution of a group of
statements until a condition is met. For example, a program might print the numbers from 1 to 10, or ask
for input until the user types a number between 1 and 10. The order of statement execution is well-defined
in C, but the statements in main do not necessarily make up an algorithm: we can easily write a C program
that never terminates.
If a program terminates, the main function returns an integer to the operating system, usually by executing
a return statement, as in the example program. By convention, returning the value 0 indicates successful
completion of the program, while any non-zero value indicates a program-specific error. However, main is
not necessarily a function in the mathematical sense because the value returned from main is not necessarily
unique for a given set of input values to the program. For example, we can write a program that selects a
number from 1 to 10 at random and returns the number to the operating system.
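For example, a program along these lines (an illustrative sketch, not from the original notes) returns a pseudo-random number from 1 to 10 to the operating system.

#include <stdlib.h>
#include <time.h>

int
main ()
{
    srand ((unsigned int)time (NULL));   /* seed the pseudo-random generator */
    return 1 + rand () % 10;             /* a number from 1 to 10            */
}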
Data Types
As you know, modern digital computers represent all information with binary digits (0s and 1s), or bits.
Whether you are representing something as simple as an integer or as complex as an undergraduate thesis,
the data are simply a bunch of 0s and 1s inside a computer. For any given type of information, a human
selects a data type for the information. A data type (often called just a type) consists of both a size
in bits and a representation, such as the 2's complement representation for signed integers, or the ASCII
representation for English text. A representation is a way of encoding the things being represented as a
set of bits, with each bit pattern corresponding to a unique object or thing.
A typical ISA supports a handful of data types in hardware in the sense that it provides hardware support for
operations on those data types. The arithmetic logic units (ALUs) in most modern processors, for example,
support addition and subtraction of both unsigned and 2's complement representations, with the specific
data type (such as 16- or 64-bit 2's complement) depending on the ISA. Data types and operations not
supported by the ISA must be handled in software using a small set of primitive operations, which form the
instructions available in the ISA. Instructions usually include data movement instructions such as loads
and stores and control instructions such as branches and subroutine calls in addition to arithmetic and logic
operations. The last quarter of our class covers these concepts in more detail and explores their meaning
using an example ISA from the textbook.
In class, we emphasized the idea that digital systems such as computers do not interpret the meaning of
bits. Rather, they do exactly what they have been designed to do, even if that design is meaningless. If,
for example, you store a sequence of ASCII characters in a computer's memory and then write computer
instructions to add consecutive groups of four characters as 2's complement integers and to print the result
to the screen, the computer will not complain about the fact that your code produces meaningless garbage.
In contrast, high-level languages typically require that a programmer associate a data type with each datum
in order to reduce the chance that the bits making up an individual datum are misused or misinterpreted
accidentally. Attempts to interpret a set of bits differently usually generate at least a warning message, since
such re-interpretations of the bits are rarely intentional and thus rarely correct. A compiler (a program
that transforms code written in a high-level language into instructions) can also generate the proper type
conversion instructions automatically when the transformations are intentional, as is often the case with
arithmetic.
Some high-level languages, such as Java, prevent programmers from changing the type of a given datum. If
you define a type that represents one of your favorite twenty colors, for example, you are not allowed to turn
a color into an integer, despite the fact that the color is represented as a handful of bits. Such languages are
said to be strongly typed.
The C language is not strongly typed, and programmers are free to interpret any bits in any manner they
see fit. Taking advantage of this ability in any but a few exceptional cases, however, results in arcane and
non-portable code, and is thus considered to be bad programming practice. We discuss conversion between
types in more detail later in these notes.
Each high-level language defines a number of primitive data types, which are always available. Most
languages, including C, also provide ways of defining new types in terms of primitive types, but we leave
that part of C for ECE 220. The primitive data types in C include signed and unsigned integers of various
sizes as well as single- and double-precision IEEE floating-point numbers.
The primitive integer types in C include both unsigned and 2's complement representations. These types
were originally defined so as to give reasonable performance when code was ported. In particular, the int
type is intended to be the native integer type for the target ISA. Using data types supported directly in
hardware is faster than using larger or smaller integer types. When C was standardized in 1989, these types
were defined so as to include a range of existing C compilers rather than requiring all compilers to produce
uniform results. At the time, most workstations and mainframes were 32-bit machines, while most personal
computers were 16-bit machines, thus flexibility was somewhat desirable. For the GCC compiler on Linux,
the C integer data types are defined in the table below. Although the int and long types are usually the
same, there is a semantic difference in common usage. In particular, on most architectures and most
compilers, a long has enough bits to identify a location in the computer's memory, while an int may not.
When in doubt, the size in bytes of any type or variable can be found using the built-in C sizeof operator.

                 2's complement              unsigned
    8 bits       char                        unsigned char
    16 bits      short, short int            unsigned short, unsigned short int
    32 bits      int                         unsigned, unsigned int
    32 or        long, long int              unsigned long, unsigned long int
    64 bits
    64 bits      long long, long long int    unsigned long long, unsigned long long int
Over time, the flexibility of size in C types has become less important (except for the embedded markets,
where one often wants even more accurate bit-width control), and the fact that the size of an int can vary
from machine to machine and compiler to compiler has become more a source of headaches than a helpful
feature. In the late 1990s, a new set of fixed-size types were recommended for inclusion in the C library,
reflecting the fact that many companies had already developed and were using such definitions to make their
programs platform-independent. We encourage you to make use of these types, which are shown in the table
below. In Linux, they can be made available by including the stdint.h header file.

                 2's complement    unsigned
    8 bits       int8_t            uint8_t
    16 bits      int16_t           uint16_t
    32 bits      int32_t           uint32_t
    64 bits      int64_t           uint64_t
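For example, a program that needs exact widths might be written as follows (a sketch, not from the original notes):

#include <stdint.h>
#include <stdio.h>

int
main ()
{
    int8_t   small = -100;        /* always exactly 8 bits, 2's complement */
    uint32_t big   = 4000000000u; /* always exactly 32 bits, unsigned      */

    printf ("sizeof (small) = %zu, sizeof (big) = %zu\n",
            sizeof (small), sizeof (big));
    printf ("small = %d, big = %u\n", small, (unsigned int)big);
    return 0;
}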
Floating-point types in C include float and double, which correspond respectively to single- and double-precision
IEEE floating-point values. Although the 32-bit float type can save memory compared with use
of 64-bit double values, C's math library works with double-precision values, and single-precision data are
uncommon in scientific and engineering codes. In contrast, single-precision floating-point operations dominated
the graphics industry until recently, and are still well-supported even on today's graphics processing
units.
Variable Declarations
The function main executed by a program begins with a list of variable declarations. Each declaration
consists of two parts: a data type specification and a comma-separated list of variable names. Each variable
declared can also be initialized by assigning an initial value. A few examples appear below. Notice that
one can initialize a variable to have the same value as a second variable.
int     x = 42;
int     y = x;
int     z;
double  a, b, c, pi = 3.1416;
What happens if a programmer declares a variable but does not initialize it? Remember that bits can only
be 0 or 1. An uninitialized variable does have a value, but its value is unpredictable. The compiler tries
to detect uses of uninitialized variables, but sometimes it fails to do so, so until you are more familiar with
programming, you should always initialize every variable.
Variable names, also called identifiers, can include both letters and digits in C. Good programming style
requires that programmers select variable names that are meaningful and are easy to distinguish from one
another. Single letters are acceptable in some situations, but longer names with meaning are likely to help
people (including you!) understand your program. Variable names are also case-sensitive in C, which allows programmers to use capitalization to differentiate behavior and meaning, if desired. Some programs,
for example, use identifiers with all capital letters to indicate variables with values that remain constant
for the program's entire execution. However, the fact that identifiers are case-sensitive also means that a
programmer can declare distinct variables named variable, Variable, vaRIable, vaRIabLe, and VARIABLE.
We strongly discourage you from doing so.
[Example elided: the original notes list a series of C expressions and assignments together with each
resulting value, shown in both decimal and hexadecimal.]
Arithmetic operators in C include addition (+), subtraction (-), negation (a minus sign not preceded by
another expression), multiplication (*), division (/), and modulus (%). No exponentiation operator exists;
instead, library routines are defined for this purpose as well as for a range of more complex mathematical
functions.
C also supports bitwise operations on integer types, including AND (&), OR (|), XOR (^), NOT (~), and
left (<<) and right (>>) bit shifts. Right shifting a signed integer results in an arithmetic right shift
(the sign bit is copied), while right shifting an unsigned integer results in a logical right shift (0 bits are
inserted).
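The short program below (an illustrative sketch, not from the original notes) applies several of these operators and gives each result in a comment.

#include <stdio.h>

int
main ()
{
    int i = 42;
    int j = 1000;

    printf ("%d\n", i + j);        /* 1042                              */
    printf ("%d\n", i - j);        /* -958                              */
    printf ("%d\n", j / i);        /* 23  (integer division truncates)  */
    printf ("%d\n", j % i);        /* 34                                */
    printf ("%d\n", i & j);        /* 40   (bitwise AND)                */
    printf ("%d\n", i | j);        /* 1002 (bitwise OR)                 */
    printf ("%d\n", i ^ j);        /* 962  (bitwise XOR)                */
    printf ("%d\n", i << 1);       /* 84   (left shift by one bit)      */
    printf ("%d\n", i == j);       /* 0    (false)                      */
    printf ("%d\n", i < j);        /* 1    (true)                       */
    return 0;
}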
A range of relational or comparison operators are available, including equality (==), inequality (!=), and
relative order (<, <=, >=, and >). All such operations evaluate to 1 to indicate a true relation and 0 to
indicate a false relation. Any non-zero value is considered to be true for the purposes of tests (for example,
in an if statement or a while loop) in C; these statements are explained later in these notes.
Assignment of a new value to a variable uses a single equal sign (=) in C. For example, the expression
A = B copies the value of variable B into variable A, overwriting the bits representing the previous value of A.
The use of two equal signs for an equality check and a single equal sign for assignment is a common source
of errors, although modern compilers generally detect and warn about this type of mistake. Assignment
in C does not solve equations, even simple equations. Writing A-4=B, for example, generates a compiler
error. You must solve such equations yourself to calculate the desired new value of a single variable, such
as A=B+4. For the purposes of our class, you must always write a single variable on the left side of an
assignment, and can write an arbitrary expression on the right side.
Many operators can be combined into a single expression. When an expression has more than one operator,
which operator is executed first? The answer depends on the operators precedence, a well-defined order on
operators that specifies how to resolve the ambiguity. In the case of arithmetic, the C languages precedence
specification matches the one that you learned in elementary school. For example, 1+2*3 evaluates to 7, not
to 9, because multiplication has precedence over addition. For non-arithmetic operators, or for any case in
which you do not know the precedence specification for a language, do not look it up; other programmers
will not remember the precedence ordering, either! Instead, add parentheses to make your expressions clear
and easy to understand.
Basic I/O
The main function returns an integer to the operating system. Although we do not discuss how additional
functions can be written in our class, we may sometimes make use of functions that have been written in
advance by making calls to those functions. A function call is a type of expression in C, but we leave further
description for ECE 220. In our class, we make use of only two additional functions to enable our programs
to receive input from a user via the keyboard and to write output to the monitor for a user to read.
Let's start with output. The printf function allows a program to print output to the monitor using a
programmer-specified format. The f in printf stands for formatted.1 When we want to use printf, we
write an expression with the word printf followed by a parenthesized, comma-separated list of expressions.
The expressions in this list are called the arguments to the printf function.

The first argument to the printf function is a format string (a sequence of ASCII characters between
quotation marks) which tells the function what kind of information we want printed to the monitor as well
as how to format that information. The remaining arguments are C expressions that give printf a copy of
any values that we want printed.

How does the format string specify the format? Most of the characters in the format string are simply
printed to the monitor. In the first example shown on the next page, we use printf to print a hello
message followed by an ASCII newline character to move to the next line on the monitor.
1 The original, unformatted variant of printing was never available in the C language. Go learn Fortran.
[Examples elided: the original notes show sample printf and scanf calls here, together with the declarations
of the example variables a, b, c, u, d, and f used with them.]
For each conversion in the format string, the scanf function tries to convert input from the user into the appropriate
result, then stores the result in memory at the address given by the next argument. The programmer
is responsible for ensuring that the number of conversions in the format string matches the number of
arguments provided (not counting the format string itself). The programmer must also ensure that the type
of information produced by each conversion can be stored at the address passed for that conversion; in other
words, the address of a variable with the correct type must be provided. Modern compilers often detect
missing & operators and incorrect variable types, but many only give warnings to the programmer. The
scanf function itself cannot tell whether the arguments given to it are valid or not.
If a conversion fails (for example, if a user types "hello" when scanf expects an integer), scanf does not
overwrite the corresponding variable and immediately stops trying to convert input. The scanf function
returns the number of successful conversions, allowing a programmer to check for bad input from the user.
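A minimal sketch combining printf and scanf (illustrative; not the example from the original notes) looks like this:

#include <stdio.h>

int
main ()
{
    int number = 0;

    printf ("Please type a number from 1 to 10: ");
    if (1 != scanf ("%d", &number)) {
        /* scanf returns the count of successful conversions, so anything
           other than 1 here means that the input was not an integer.     */
        printf ("That was not a number.\n");
        return 1;
    }
    printf ("You typed %d.\n", number);
    return 0;
}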
Types of Statements in C
Each statement in a C program specifies a complete operation. There are three types of statements, but
two of these types can be constructed from additional statements, which can in turn be constructed from
additional statements. The C language specifies no bound on this type of recursive construction, but code
readability does impose a practical limit.
The three types are shown below. They are the null statement, simple statements, and compound
statements. A null statement is just a semicolon, and a compound statement is just a sequence of
statements surrounded by braces.

A = B;                            /* examples of simple statements */
printf ("Hello, world!\n");

{                                 /* a compound statement          */
    C = D;                        /* (a sequence of statements     */
    N = 4;                        /*  between braces)              */
    L = D - N;
}
Remember that after variable declarations, the main function contains a sequence of statements. These
statements are executed one at a time in the order given in the program, as shown in the flow chart below
for two statements. We say that the statements are executed in sequential order.

[Flow chart: sequential execution. A first subtask is followed by a second subtask.]

A program must also be able to execute statements only when some condition holds. In the C language,
such a condition can be an arbitrary expression. The expression is first evaluated. If the result is 0, the
condition is considered to be false. Any result other than 0 is considered to be true. The C statement for
conditional execution is called an if statement. Syntactically, we put the expression for the condition in
parentheses after the keyword if and follow the parenthesized expression with a compound statement
containing the statements that should be executed when the condition is true. Optionally, we can append
the keyword else and a second compound statement containing statements to be executed when the
condition evaluates to false. The corresponding flow chart appears below.

[Flow chart: conditional (if) execution. If the answer to "does some condition hold?" is yes (Y), execution
performs the "then" subtask (the subtask when the condition holds); otherwise it performs the "else"
subtask (the subtask when the condition does not hold).]
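For example, an if statement with both a then block and an else block might assign the absolute value of a variable x to a second variable y (an illustrative sketch; the variable names here are not from the original notes):

/* Set y to the absolute value of x. */
if (0 > x) {          /* Is x less than 0?                 */
    y = -x;           /* Then block: y gets the negation.  */
} else {
    y = x;            /* Else block: y gets x unchanged.   */
}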
If instead we chose to assign the absolute value of variable x to itself, we can do so without an else block:
/* Set the variable x to its absolute value. */
if (0 > x) {            /* Is x less than 0?                   */
    x = -x;             /* Then block: assign negative x to x. */
}
/* No else block is given--no work is needed. */
[Flow chart: iterative (for) execution. The loop first performs initialization for the first iteration; then,
as long as the condition holds for iterating, it performs the subtask for one iteration followed by the
update for the next iteration.]
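As a concrete example of the iteration pattern in the flow chart, a for loop that prints the numbers from 1 to 10 can be written as follows (a sketch, not from the original notes):

int count;

for (count = 1; 10 >= count; count = count + 1) {
    /* initialization: count = 1; condition: 10 >= count;
       update after each iteration: count = count + 1       */
    printf ("%d\n", count);
}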
Program Execution
We are now ready to consider
the execution of a simple program, illustrating how variables change value from step
to step and determine program behavior.
int
main ()
{
    int check;       /* the first number  */
    int friend;      /* the second number */

    for (check = 0; 8 > check; check = check + 1) {
        for (friend = 0; 8 > friend; friend = friend + 1) {
            if (0 != (check & friend)) {
                /* We have friendship! */
                printf ("%d and %d are friends.\n", check, friend);
            }
        }
    }
    return 0;
}
The program uses two integer variables, one for each of the numbers that we consider. We use a for loop
to iterate over all values of our first number, which we call check. The loop initializes check to 0, continues
until check reaches 8, and adds 1 to check after each loop iteration. We use a similar for loop to iterate
over all possible values of our second number, which we call friend. For each pair of numbers, we determine
whether they are friends using a bitwise AND operation. If the result is non-zero, they are friends, and we
print a message. If the two numbers are not friends, we do nothing, and the program moves on to consider
the next pair of numbers.
Now let's think about what happens when this program executes. The table below shows the value of each
variable after each step of execution.

    after executing...                check is...          and friend is...
    (variable declarations)           unpredictable bits   unpredictable bits
    check = 0                         0                    unpredictable bits
    8 > check                         0                    unpredictable bits
    friend = 0                        0                    0
    8 > friend                        0                    0
    if (0 != (check & friend))        0                    0
    friend = friend + 1               0                    1
    8 > friend                        0                    1
    if (0 != (check & friend))        0                    1
    friend = friend + 1               0                    2
    (repeat the last three lines six more times; the number 0 has no friends!)
    8 > friend                        0                    8
    check = check + 1                 1                    8
    8 > check                         1                    8
    friend = 0                        1                    0
    8 > friend                        1                    0
    if (0 != (check & friend))        1                    0
    friend = friend + 1               1                    1
    8 > friend                        1                    1
    if (0 != (check & friend))        1                    1
    printf ...                        1                    1
    (our first friend!?)

When the program starts, both variables are filled with random bits, so their values are unpredictable. The
first step is the initialization of the first for loop, which sets check to 0. The condition for that loop
is 8 > check, which is true, so execution enters the loop body and starts to execute the first statement,
which is our second for loop. The next step is then the initialization code for the second for loop, which
sets friend to 0. The condition for the second loop is 8 > friend, which is true, so execution enters the loop
body and starts to execute the first statement, which
is the if statement. Since both variables are 0, the if condition is false, and nothing is printed. Having
finished the loop body for the inner loop (on friend), execution continues with the update rule for that
loop (friend = friend + 1), then returns to check the loop's condition again. This process repeats, always
finding that the number 0 (in check) is not friends (0 has no friends!) until friend reaches 8, at which
point the inner loop condition becomes false. Execution then moves to the update rule for the first for loop,
which increments check. Check is then compared with 8 to see if the loop is done. Since it is not, we once
again enter the loop body and start the second for loop over. The initialization code again sets friend to 0,
and we move forward as before. As you see above, the first time that we find our if condition to be true is
when both check and friend are equal to 1.
Is that result what you expected? To learn that the number 1 is friends with itself? If so, the program
works. If you assumed that numbers could not be friends with themselves, perhaps we should fix the bug?
We could, for example, add another if statement to avoid printing anything when check == friend.
Our program, you might also realize, prints each pair of friends twice. The numbers 1 and 3, for example,
are printed in both possible orders. To eliminate this redundancy, we can change the initialization in the
second for loop, either to friend = check or to friend = check + 1, depending on how we want to define
friendship (the same question as before: can a number be friends with itself?).
[Figure: the flow of compilation. C source code and C header files feed into the C preprocessor, which
produces preprocessed source code; the compiler (in the strict sense) translates that code into an
intermediate representation (IR), which is then handled by an ISA-dependent back end.]
The C Preprocessor*
The C language uses a preprocessor to support inclusion of common information (stored in header files) into
multiple source files. The most frequent use of the preprocessor is to enable the unique definition of new
data types and operations within header files that can then be included by reference within source files that
make use of them. This capability is based on the include directive, #include, as shown here:
#include <stdio.h>
#include "my header.h"
The preprocessor also supports integration of compile-time constants into source files before compilation.
For example, many software systems allow the definition of a symbol such as NDEBUG (no debug) to compile
without additional debugging code included in the sources. Two directives are necessary for this purpose:
the define directive, #define, which provides a text-replacement facility, and conditional inclusion (or
exclusion) of parts of a file within #if/#else/#endif directives. These directives are also useful in allowing
a single header file to be included multiple times without causing problems, as C does not allow redefinition
of types, variables, and so forth, even if the redundant definitions are identical. Most header files are thus
wrapped as shown below.

#if !defined(MY_HEADER_H)
#define MY_HEADER_H

/* actual header file material goes here */

#endif /* MY_HEADER_H */
The preprocessor performs a simple linear pass on the source and does not parse or interpret any C syntax.
Definitions for text replacement are valid as soon as they are defined and are performed until they are
undefined or until the end of the original source file. The preprocessor does recognize spacing and will not
replace part of a word, thus #define i 5 will not wreak havoc on your if statements, but will cause
problems if you name any variable i.
Using the text replacement capabilities of the preprocessor does have drawbacks, most importantly in that
almost none of the information is passed on for debugging purposes.
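As an example of conditional inclusion based on a compile-time symbol, debugging output might be wrapped as follows (a sketch; the NDEBUG convention is mentioned above, but the exact usage shown here is illustrative):

#if !defined(NDEBUG)
    /* This statement is compiled only when NDEBUG has not been defined
       (for example, compiling with gcc -DNDEBUG removes the line below). */
    printf ("debug: x = %d\n", x);
#endif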
Changing Types in C*
Changing the type of a datum is necessary from time to time, but sometimes a compiler can do the work
for you. The most common form of implicit type conversion occurs with binary arithmetic operations.
Integer arithmetic in C always uses types of at least the size of int, and all floating-point arithmetic uses
double. If either or both operands have smaller integer types, or differ from one another, the compiler
implicitly converts them before performing the operation, and the type of the result may be different from
those of both operands. In general, the compiler selects the final type according to some preferred ordering
in which floating-point is preferred over integers, unsigned values are preferred over signed values, and more
bits are preferred over fewer bits. The type of the result must be at least as large as either argument, but is
also at least as large as an int for integer operations and a double for floating-point operations.
Modern C compilers always extend an integer type's bit width before converting from signed to unsigned.
The original C specification interleaved bit width extensions to int with sign changes, thus older compilers
may not be consistent, and implicitly requiring both types of conversion in a single operation may lead to
portability bugs.
The implicit extension to int can also be confusing in the sense that arithmetic that seems to work on
smaller integers fails with larger ones. For example, multiplying two 16-bit integers set to 1000 and printing
the result works with most compilers because the 32-bit int result is wide enough to hold the right answer.
In contrast, multiplying two 32-bit integers set to 100,000 produces the wrong result because the high bits
of the result are discarded before it can be converted to a larger type. For this operation to produce the
correct result, one of the integers must be converted explicitly (as discussed later) before the multiplication.
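The sketch below (not from the original notes) illustrates both behaviors just described: the 16-bit product is rescued by the implicit extension to int, while the 32-bit product must be widened explicitly.

#include <stdint.h>
#include <stdio.h>

int
main ()
{
    int16_t a16 = 1000, b16 = 1000;
    int32_t a32 = 100000, b32 = 100000;

    /* The 16-bit operands are implicitly extended to int before multiplying,
       so the result 1,000,000 fits and is printed correctly.                */
    printf ("%d\n", a16 * b16);

    /* Writing a32 * b32 directly would multiply in 32 bits and lose the high
       bits of the 34-bit result (10,000,000,000); converting one operand
       explicitly first forces a 64-bit multiplication instead.              */
    printf ("%lld\n", (long long)a32 * b32);
    return 0;
}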
Implicit type conversions also occur due to assignments. Unlike arithmetic conversions, the final type must
match the left-hand side of the assignment (for example, a variable to which a result is assigned), and the
compiler simply performs any necessary conversion. Since the desired type may be smaller than the type of
the value assigned, information can be lost. Floating-point values are truncated when assigned to integers,
and high bits of wider integer types are discarded when assigned to narrower integer types. Note that a
positive number may become a negative number when bits are discarded in this manner.
Passing arguments to functions can be viewed as a special case of assignment. Given a function prototype,
the compiler knows the type of each argument and can perform conversions as part of the code generated
to pass the arguments to the function. Without such a prototype, or for functions with variable numbers of
arguments, the compiler lacks type information and thus cannot perform necessary conversions, leading to
unpredictable behavior. By default, however, the compiler extends any integer smaller than an int to the
width of an int and converts float to double.
Occasionally it is convenient to use an explicit type cast to force conversion from one type to another.
Such casts must be used with caution, as they silence many of the warnings that a compiler might otherwise
generate when it detects potential problems. One common use is to promote integers to floating-point before
an arithmetic operation, as shown below.

int
main ()
{
    int numerator = 10;
    int denominator = 20;

    printf ("%f\n", numerator / (double)denominator);
    return 0;
}
The type to which a value is to be converted is placed in parentheses in front of the value. In most cases,
additional parentheses should be used to avoid confusion about the precedence of type conversion over other
operations.
You should recognize all of these terms and be able to explain what they mean. Note that we are not
saying that you should, for example, be able to write down the ASCII representation from memory. In that
example, knowing that it is a 7-bit representation used for English text is sufficient. You can always look up
the detailed definition in practice.
universal computational devices /
computing machines
undecidable
the halting problem
information storage in computers
bits
representation
data type
unsigned representation
2's complement representation
IEEE floating-point representation
ASCII representation
operations on bits
1's complement operation
carry (from addition)
overflow (on any operation)
Boolean logic and algebra
logic functions/gates
truth table
AND/conjunction
OR/disjunction
NOT/logical complement/
(logical) negation/inverter
XOR
logical completeness
minterm
mathematical terms
modular arithmetic
implication
contrapositive
proof approaches: by construction,
by contradiction, by induction
without loss of generality (w.l.o.g.)
Defining Optimality
In the notes on logic operations, you learned how to express an arbitrary function on bits as an OR of
minterms (ANDs with one input per variable on which the function operates). Although this approach
demonstrates logical completeness, the results often seem inefficient, as you can see by comparing the
following expressions for the carry out C from the addition of two 2-bit unsigned numbers, A = A1 A0
and B = B1 B0 .
C = A1 B1 + (A1 + B1) A0 B0                                       (1)
  = A1 B1 + A1 A0 B0 + A0 B1 B0                                   (2)
  = A1' A0 B1 B0 + A1 A0' B1 B0' + A1 A0' B1 B0 +
    A1 A0 B1' B0 + A1 A0 B1 B0' + A1 A0 B1 B0                     (3)

(A prime denotes a variable's complement.)
These three expressions are identical in the sense that they have the same truth tables; they are the same
mathematical function. Equation (1) is the form that we gave when we introduced the idea of using logic
to calculate overflow. In this form, we were able to explain the terms intuitively. Equation (2) results from
distributing the parenthesized OR in Equation (1). Equation (3) is the result of our logical completeness
construction.
Since the functions are identical, does the form actually matter at all? Certainly either of the first two
forms is easier for us to write than is the third. If we think of the form of an expression as a mapping from
the function that we are trying to calculate into the AND, OR, and NOT functions that we use as logical
building blocks, we might also say that the first two versions use fewer building blocks. That observation
does have some truth, but let's try to be more precise by framing a question. For any given function, there
are an infinite number of ways that we can express the function (for example, given one variable A on which
the function depends, you can OR together any number of copies of AA', which always evaluates to 0, without
changing the function).
What exactly makes one expression better than another?
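One way to confirm that two such expressions are the same function is simply to compare their truth tables. The sketch below (not from the original notes) checks Equations (1) and (3) against each other for all sixteen input combinations.

#include <stdio.h>

int
main ()
{
    int a1, a0, b1, b0;
    int mismatches = 0;

    for (a1 = 0; 2 > a1; a1 = a1 + 1)
    for (a0 = 0; 2 > a0; a0 = a0 + 1)
    for (b1 = 0; 2 > b1; b1 = b1 + 1)
    for (b0 = 0; 2 > b0; b0 = b0 + 1) {
        /* Equation (1): A1 B1 + (A1 + B1) A0 B0 */
        int eq1 = (a1 & b1) | ((a1 | b1) & a0 & b0);

        /* Equation (3): the OR of the six minterms for which C = 1 */
        int eq3 = ((1 ^ a1) & a0 & b1 & b0) |
                  (a1 & (1 ^ a0) & b1 & (1 ^ b0)) |
                  (a1 & (1 ^ a0) & b1 & b0) |
                  (a1 & a0 & (1 ^ b1) & b0) |
                  (a1 & a0 & b1 & (1 ^ b0)) |
                  (a1 & a0 & b1 & b0);

        if (eq1 != eq3) {
            mismatches = mismatches + 1;
        }
    }
    printf ("mismatches: %d\n", mismatches);   /* prints 0 */
    return 0;
}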
In 1952, Edward Veitch wrote an article on simplifying truth functions. In the introduction, he said, "This
general problem can be very complicated and difficult. Not only does the complexity increase greatly with
the number of inputs and outputs, but the criteria of the best circuit will vary with the equipment involved."
Sixty years later, the answer is largely the same: the criteria depend strongly on the underlying technology
(the gates and the devices used to construct the gates), and no single metric, or way of measuring, is
sufficient to capture the important differences between expressions in all cases.
Three high-level metrics commonly used to evaluate chip designs are cost, power, and performance. Cost
usually represents the manufacturing cost, which is closely related to the physical silicon area required for the
design: the larger the chip, the more expensive the chip is to produce. Power measures energy consumption
over time. A chip that consumes more power means that a user's energy bill is higher and, in a portable
device, either that the device is heavier or has a shorter battery life. Performance measures the speed at
which the design operates. A faster design can offer more functionality, such as supporting the latest games,
or can just finish the same work in less time than a slower design. These metrics are sometimes related: if
a chip finishes its work, the chip can turn itself off, saving energy.
How do such high-level metrics relate to the problem at hand? Only indirectly in practice. There are
too many factors involved to make direct calculations of cost, power, or performance at the level of logic
expressions. Finding an optimal solutionthe best formulation of a specific logic function for a given
metricis often impossible using the computational resources and algorithms available to us. Instead, tools
typically use heuristic approaches to find solutions that strike a balance between these metrics. A heuristic
approach is one that is believed to yield fairly good solutions to a problem, but does not necessarily find an
optimal solution. A human engineer can typically impose constraints, such as limits on the chip area or
limits on the minimum performance, in order to guide the process. Human engineers may also restructure
the implementation of a larger design, such as a design to perform floating-point arithmetic, so as to change
the logic functions used in the design.
Today, manipulation of logic expressions for the purposes of optimization is performed almost entirely by
computers. Humans must supply the logic functions of interest, and must program the acceptable transformations between equivalent forms, but computers do the grunt work of comparing alternative formulations
and deciding which one is best to use in context.
Although we believe that hand optimization of Boolean expressions is no longer an important skill for
our graduates, we do think that you should be exposed to the ideas and metrics historically used for such
optimization. The rationale for retaining this exposure is threefold. First, we believe that you still need to be
able to perform basic logic reformulations (slowly is acceptable) and logical equivalence checking (answering
the question, "Do two expressions represent the same function?"). Second, the complexity of the problem is
a good way to introduce you to real engineering. Finally, the contextual information will help you to develop
a better understanding of finite state machines and higher-level abstractions that form the core of digital
systems and are still defined directly by humans today.
Towards that end, we conclude this introduction by discussing two metrics that engineers traditionally used
to optimize logic expressions. These metrics are now embedded in computer-aided design (CAD) tools
and tuned to specific underlying technologies, but the reasons for their use are still interesting.
The first metric of interest is a heuristic for the area needed for a design. The measurement is simple: count
the number of variable occurrences in an expression. Simply go through and add up how many variables
you see. Using our example function C, Equation (1) gives a count of 6, Equation (2) gives a count of 8,
and Equation (3) gives a count of 24. Smaller numbers represent better expressions, so Equation (1) is
the best choice by this metric. Why is this metric interesting? Recall how gates are built from transistors.
An N -input gate requires roughly 2N transistors, so if you count up the number of variables in the expression,
you get an estimate of the number of transistors needed, which is in turn an estimate for the area required
for the design.
A variation on variable counting is to add the number of operations, since each gate also takes space for
wiring (within as well as between gates). Note that we ignore the number of inputs to the operations, so
a 2-input AND counts as 1, but a 10-input AND also counts as 1. We do not usually count complementing
variables as an operation for this metric because the complements of variables are sometimes available at
no extra cost in gates or wires. If we add the number of operations in our example, we get a count of 10
for Equation (1) (two ANDs, two ORs, and 6 variables), a count of 12 for Equation (2) (three ANDs, one
OR, and 8 variables), and a count of 31 for Equation (3) (six ANDs, one OR, and 24 variables). The relative
differences between these equations are reduced when one counts operations.
A second metric of interest is a heuristic for the performance of a design. Performance is inversely related
to the delay necessary for a design to produce an output once its inputs are available. For example, if you
know how many seconds it takes to produce a result, you can easily calculate the number of results that
can be produced per second, which measures performance. The measurement needed is the longest chain of
operations performed on any instance of a variable. The complement of a variable is included if the variable's
complement is not available without using an inverter. The rationale for this metric is that gate outputs do
not change instantaneously when their inputs change. Once an input to a gate has reached an appropriate
voltage to represent a 0 or a 1, the transistors in the gate switch (on or off) and electrons start to move.
Only when the output of the gate reaches the appropriate new voltage can the gates driven by the output
start to change. If we count each function/gate as one delay (we call this time a gate delay), we get an
estimate of the time needed to compute the function. Referring again to our example equations, we find
that Equation (1) requires 3 gate delays, Equation (2) requires 2 gate delays, Equation (3) requires 2 or 3
gate delays, depending on whether we have variable complements available. Now Equation (2) looks more
attractive: better performance than Equation (1) in return for a small extra cost in area.
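The counting heuristics above are mechanical enough to automate. The sketch below (Python, our own illustration; the tuple representation and the function names are ours) represents an expression as a nested tuple and computes the literal count, the literal-plus-operation count, and the gate-delay estimate for Equations (1) and (2).

```python
# An expression is either a string (a literal, possibly complemented, e.g. "A1'")
# or a tuple ("AND", e1, e2, ...) / ("OR", e1, e2, ...).

def literal_count(expr):
    if isinstance(expr, str):
        return 1
    return sum(literal_count(arg) for arg in expr[1:])

def operation_count(expr):
    if isinstance(expr, str):
        return 0
    return 1 + sum(operation_count(arg) for arg in expr[1:])

def gate_delays(expr):
    # One gate delay per AND/OR; complemented literals are treated as free.
    if isinstance(expr, str):
        return 0
    return 1 + max(gate_delays(arg) for arg in expr[1:])

# Equation (1): A1 B1 + (A1 + B1) A0 B0
eq1 = ("OR", ("AND", "A1", "B1"),
             ("AND", ("OR", "A1", "B1"), "A0", "B0"))
# Equation (2): A1 B1 + A1 A0 B0 + A0 B1 B0
eq2 = ("OR", ("AND", "A1", "B1"),
             ("AND", "A1", "A0", "B0"),
             ("AND", "A0", "B1", "B0"))

for name, e in (("Equation (1)", eq1), ("Equation (2)", eq2)):
    print(name, literal_count(e),                      # 6 and 8 literals
          literal_count(e) + operation_count(e),       # 10 and 12 counting gates
          gate_delays(e))                              # 3 and 2 gate delays
```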
Heuristics for estimating energy use are too complex to introduce at this point, but you should be aware that
every time electrons move, they generate heat, so we might favor an expression that minimizes the number
of bit transitions inside the computation. Such a measurement is not easy to calculate by hand, since you
need to know the likelihood of input combinations.
Terminology
We use many technical terms when we talk about simplification of logic expressions, so we now introduce
those terms so as to make the description of the tools and processes easier to understand.
Let's assume that we have a logic function F(A, B, C, D) that we want to express concisely. A literal in an
expression of F refers to either one of the variables or its complement. In other words, for our function F,
the following is a complete set of literals: A, A', B, B', C, C', D, and D'.
When we introduced the AND and OR functions, we also introduced notation borrowed from arithmetic,
using multiplication to represent AND and addition to represent OR. We also borrow the related terminology,
so a sum in Boolean algebra refers to a number of terms ORed together (for example, A + B, or AB + CD),
and a product in Boolean algebra refers to a number of terms ANDed together (for example, AD, or
AB(C + D)). Note that the terms in a sum or product may themselves be sums, products, or other types of
expressions (for example, A ⊕ B).
The construction method that we used to demonstrate logical completeness made use of minterms for each
input combination for which the function F produces a 1. We can now use the idea of a literal to give
a simpler definition of minterm: a minterm for a function on N variables is a product (AND function)
of N literals in which each variable or its complement appears exactly once. For our function F , examples
of minterms include ABCD, AB'CD', and A'B'C'D. As you know, a minterm produces a 1 for exactly one
combination of inputs.
When we sum minterms for each output value of 1 in a truth table to express a function, as we did to obtain
Equation (3), we produce an example of the sum-of-products form. In particular, a sum-of-products (SOP)
is a sum composed of products of literals. Terms in a sum-of-products need not be minterms, however.
Equation (2) is also in sum-of-products form. Equation (1), however, is not, since the last term in the sum
is not a product of literals.
Analogously to the idea of a minterm, we define a maxterm for a function on N variables as a sum (OR
function) of N literals in which each variable or its complement appears exactly once. Examples for F
include (A + B + C + D), (A + B' + C + D'), and (A' + B' + C' + D). A maxterm produces a 0 for exactly one
combination of inputs. Just as we did with minterms, we can multiply a maxterm corresponding to each
input combination for which a function produces 0 (each row in a truth table that produces a 0 output)
to create an expression for the function. The resulting expression is in a product-of-sums (POS) form:
a product of sums of literals. The carry out function that we used to produce Equation (3) has 10 input
combinations that produce 0, so the expression formed in this way is unpleasantly long:
C = (A1 + A0 + B1 + B0)(A1 + A0 + B1 + B0')(A1 + A0 + B1' + B0)(A1 + A0 + B1' + B0')
    (A1 + A0' + B1 + B0)(A1 + A0' + B1 + B0')(A1 + A0' + B1' + B0)
    (A1' + A0 + B1 + B0)(A1' + A0 + B1 + B0')(A1' + A0' + B1 + B0)
However, the approach can be helpful with functions that produce mostly 1s. The literals in maxterms are
complemented with respect to the literals used in minterms. For example, the maxterm (A1' + A0' + B1 + B0)
in the equation above produces a zero for input combination A1 = 1, A0 = 1, B1 = 0, B0 = 0.
An implicant G of a function F is defined to be a second function operating on the same variables for which
the implication G → F is true. In terms of logic functions that produce 0s and 1s, if G is an implicant of F,
the input combinations for which G produces 1s are a subset of the input combinations for which F produces
1s. Any minterm for which F produces a 1, for example, is an implicant of F .
In the context of logic design, the term implicant is used to refer to a single product of literals. In other
words, if we have a function F (A, B, C, D), examples of possible implicants of F include AB, BC, ABCD,
and A. In contrast, although they may technically imply F , we typically do not call expressions such as
(A + B), C(A + D), or AB + C implicants.
Let's say that we have expressed function F in sum-of-products form. All of the individual product terms
in the expression are implicants of F . As a first step in simplification, we can ask: for each implicant, is it
possible to remove any of the literals that make up the product? If we have an implicant G for which the
answer is no, we call G a prime implicant of F . In other words, if one removes any of the literals from a
prime implicant G of F , the resulting product is not an implicant of F .
Prime implicants are the main idea that we use to simplify logic expressions, both algebraically and with
graphical tools (computer tools use algebra internally; by graphical here we mean drawings on paper).
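To make the definitions concrete, here is a small Python sketch (our own illustration; the example function F and the helper names are ours) that tests whether a product of literals is an implicant of a function given as a truth table, and then whether it is a prime implicant.

```python
from itertools import product as cartesian

VARS = ("A", "B", "C", "D")

def inputs():
    # All input combinations as dictionaries, e.g. {"A": 0, "B": 1, ...}.
    for bits in cartesian((0, 1), repeat=len(VARS)):
        yield dict(zip(VARS, bits))

def product_value(term, assignment):
    # term maps each variable it uses to the value that makes its literal true,
    # e.g. {"A": 1, "B": 0} represents the product A B'.
    return int(all(assignment[v] == val for v, val in term.items()))

def is_implicant(term, f):
    # term is an implicant of f if every input that makes the term 1 also makes f 1.
    return all(f(x) for x in inputs() if product_value(term, x))

def is_prime_implicant(term, f):
    if not is_implicant(term, f):
        return False
    # Removing any single literal must break the implication.
    return all(not is_implicant({v: val for v, val in term.items() if v != drop}, f)
               for drop in term)

# Example function for illustration: F = A B + B C'.
def F(x):
    return int((x["A"] and x["B"]) or (x["B"] and not x["C"]))

print(is_implicant({"A": 1, "B": 1, "C": 0}, F))        # True: A B C' implies F
print(is_prime_implicant({"A": 1, "B": 1, "C": 0}, F))  # False: B C' already suffices
print(is_prime_implicant({"A": 1, "B": 1}, F))          # True
```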
[Figure: the domains of logic functions viewed as hypercubes. A function on one variable corresponds to a line segment with ends A=0 and A=1; a function on two variables to a square with corners AB=00, 01, 11, and 10; and a function on three variables to a cube with corners ABC=000 through 111. Dividing lines mark the faces on which individual literals evaluate to 1.]
By viewing a function's domain in this way, we can make a connection between a product of literals and
the structure of the domain. Let's use the 3-dimensional version as an example. We call the variables A, B,
and C, and note that the cube has 2^3 = 8 corners corresponding to the 2^3 possible combinations of A, B,
and C. The simplest product of literals in this case is 1, which is the product of 0 literals. Obviously, the
product 1 evaluates to 1 for any variable values. We can thus think of it as covering the entire domain of
the function. In the case of our example, the product 1 covers the whole cube. In order for the product 1 to
be an implicant of a function, the function itself must be the function 1.
What about a product consisting of a single literal, such as A or C? The dividing lines in the diagram
illustrate the answer: any such product term evaluates to 1 on a face of the cube, which includes 2^2 = 4 of
the corners. If a function evaluates to 1 on any of the six faces of the cube, the corresponding product term
(consisting of a single literal) is an implicant of the function.
Continuing with products of two literals, we see that any product of two literals, such as AB or BC,
corresponds to an edge of our 3-dimensional cube. The edge includes 2^1 = 2 corners. And, if a function
evaluates to 1 on any of the 12 edges of the cube, the corresponding product term (consisting of two literals)
is an implicant of the function.
Finally, any product of three literals, such as ABC, corresponds to a corner of the cube. But for a function
on three variables, these are just the minterms. As you know, if a function evaluates to 1 on any of the 8
corners of the cube, that minterm is an implicant of the function (we used this idea to construct the function
to prove logical completeness).
How do these connections help us to simplify functions? If we're careful, we can map cubes onto paper in
such a way that product terms (the possible implicants of the function) usually form contiguous groups of 1s,
allowing us to spot them easily. Let's work upwards starting from one variable to see how this idea works.
The end result is called a Karnaugh map.
The first drawing shown to the right replicates our view of the 1-dimensional hypercube, corresponding to the domain of a function on one variable, in this case the variable A. To the right of the hypercube (line segment) are two variants of a Karnaugh map on one variable. The middle variant clearly indicates the column corresponding to the product A (the other column corresponds to A'). The right variant simply labels the column with values for A.
The three drawings shown to the right illustrate the three possible
product terms on one variable. The functions shown in these Karnaugh
maps are arbitrary, except that we have chosen them such that each
implicant shown is a prime implicant for the illustrated function.
Let's now look at two-variable functions. We have replicated our drawing of the 2-dimensional hypercube (square)
to the right along with two variants of Karnaugh maps on
two variables. With only two variables (A and B), the
extension is fairly straightforward, since we can use the
second dimension of the paper (vertical) to express the
second variable (B).
[Figures: three one-variable K-maps illustrating the three possible product terms (1, A', and A), each drawn as a prime implicant of the illustrated function; the two-variable square with corners AB=00, 01, 11, and 10, shown beside two variants of a two-variable K-map (columns A'/A and rows B'/B, or simply labeled with the values of A and B); and the three-variable cube with corners ABC=000 through 111, redrawn for reference.]
With three variables, we have 27 possible products of literals. You may have noticed that the count scales
as 3^N for N variables; can you explain why? We illustrate several product terms below. Note that we
sometimes need to wrap around the end of the K-map, but that if we account for wrapping, the squares
covered by all product terms are contiguous. Also notice that both the width and the height of all product
terms are powers of two. Any square or rectangle that meets these two constraints corresponds to a product term! And any such square or rectangle that is filled with 1s is an implicant of the function in the K-map.
[Figure: five three-variable K-maps (columns AB in Gray-code order 00, 01, 11, 10; rows C), each illustrating one example implicant: the constant 1 (the whole map), a single literal of the form C, two products of two literals (one of the form AB and one of the form AC), and a three-literal minterm of the form ABC.]
Let's keep going. With a function on four variables (A, B, C, and D), we can use a Gray code order on two
of the variables in each dimension. Which variables go with which dimension in the grid really doesn't matter,
so we'll assign AB to the horizontal dimension and CD to the vertical dimension. A few of the 81 possible
product terms are illustrated below. Notice that while wrapping can now occur in both
dimensions, we have exactly the same rule for finding implicants of the function: any square or rectangle (allowing for wrapping) that is filled with 1s and has both height and width equal to (possibly different) powers
of two is an implicant of the function. Furthermore, unless such a square or rectangle is part of a larger
square or rectangle that meets these criteria, the corresponding implicant is a prime implicant of the function.
[Figure: six four-variable K-maps (columns AB and rows CD, both in Gray-code order 00, 01, 11, 10), each illustrating one example implicant: a single literal of the form D, the constant 1, two-literal products of the forms AB and BD, a three-literal product of the form ACD, and a four-literal minterm of the form ABCD.]
Finding a simple expression for a function using a K-map then consists of solving the following problem:
pick a minimal set of prime implicants such that every 1 produced by the function is covered by at least one
prime implicant. The metric that you choose to minimize the set may vary in practice, but for simplicity,
let's say that we minimize the number of prime implicants chosen.
Let's try a few! The table below reproduces (from Notes Set 1.4) the truth table for addition of
two 2-bit unsigned numbers, A1 A0 and B1 B0 , to produce a sum S1 S0 and a carry out C. K-maps for each
output bit appear to the right. The colors are used only to make the different prime implicants easier to
distinguish. The equations produced by summing these prime implicants appear below the K-maps.
        inputs    |  outputs
    A1 A0 B1 B0   |  C  S1 S0
     0  0  0  0   |  0  0  0
     0  0  0  1   |  0  0  1
     0  0  1  0   |  0  1  0
     0  0  1  1   |  0  1  1
     0  1  0  0   |  0  0  1
     0  1  0  1   |  0  1  0
     0  1  1  0   |  0  1  1
     0  1  1  1   |  1  0  0
     1  0  0  0   |  0  1  0
     1  0  0  1   |  0  1  1
     1  0  1  0   |  1  0  0
     1  0  1  1   |  1  0  1
     1  1  0  0   |  0  1  1
     1  1  0  1   |  1  0  0
     1  1  1  0   |  1  0  1
     1  1  1  1   |  1  1  0
[Figure: K-maps for C, S1, and S0 (columns A1A0 = 00, 01, 11, 10; rows B1B0 = 00, 01, 11, 10), with the chosen prime implicants marked in color.]

C  = A1 B1 + A1 A0 B0 + A0 B1 B0
S1 = A1 B1' B0' + A1' A0' B1 + A1 A0' B1' + A1' B1 B0' + A1' A0 B1' B0 + A1 A0 B1 B0
S0 = A0' B0 + A0 B0'
In theory, K-maps extend to an arbitrary number of variables. Certainly Gray codes can be extended. An
N -bit Gray code is a sequence of N -bit patterns that includes all possible patterns such that any two
adjacent patterns differ in only one bit. The code is actually a cycle: the first and last patterns also differ
in only one bit. You can construct a Gray code recursively as follows: for an (N + 1)-bit Gray code, write
the sequence for an N -bit Gray code, then add a 0 in front of all patterns. After this sequence, append a
second copy of the N -bit Gray code in reverse order, then put a 1 in front of all patterns in the second copy.
The result is an (N + 1)-bit Gray code. For example, the following are Gray codes:
1-bit: 0, 1
2-bit: 00, 01, 11, 10
3-bit: 000, 001, 011, 010, 110, 111, 101, 100
4-bit: 0000, 0001, 0011, 0010, 0110, 0111, 0101, 0100, 1100, 1101, 1111, 1110, 1010, 1011, 1001, 1000
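The recursive construction just described translates directly into a short program; here is a sketch in Python (our own illustration).

```python
def gray_code(n):
    # Recursive construction: prefix an N-bit Gray code with 0, then append the
    # reversed N-bit Gray code prefixed with 1.
    if n == 1:
        return ["0", "1"]
    shorter = gray_code(n - 1)
    return ["0" + p for p in shorter] + ["1" + p for p in reversed(shorter)]

print(gray_code(3))
# ['000', '001', '011', '010', '110', '111', '101', '100']

# Adjacent patterns (including the wrap-around pair) differ in exactly one bit.
for n in range(1, 6):
    code = gray_code(n)
    assert all(sum(a != b for a, b in zip(code[i], code[(i + 1) % len(code)])) == 1
               for i in range(len(code)))
```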
Unfortunately, some of the beneficial properties of K-maps do not extend beyond two variables in a dimension. Once you have three variables in one dimension, as is necessary if a function operates on five or
more variables, not all product terms are contiguous in the grid. The terms still require a total number of
rows and columns equal to a power of two, but they don't all need to be a contiguous group. Furthermore,
some contiguous groups of appropriate size do not correspond to product terms. So you can still make use of
K-maps if you have more variables, but their use is a little trickier.
Canonical Forms
What if we want to compare two expressions to determine whether they represent the same logic function?
Such a comparison is a test of logical equivalence, and is an important part of hardware design. Tools
today provide help with this problem, but you should understand the problem.
You know that any given function can be expressed in many ways, and that two expressions that look quite
different may in fact represent the same function (look back at Equations (1) to (3) for an example). But
what if we rewrite the function using only prime implicants? Is the result unique? Unfortunately, no.
In general, a sum of products is not unique (nor is a product of sums), even if the sum
contains only prime implicants.
For example, consensus terms may or may not be included in our expressions. (They
are necessary for reliable design of certain types of systems, as you will learn in a
later ECE class.) The green ellipse in the K-map to the right represents the consensus
term BC.
Z = A'C + AB + BC
Z = A'C + AB

[Figure: K-maps for Z showing the two sums of prime implicants above; the green ellipse marks the consensus term BC, which may be included or omitted without changing the function.]
When we need to compare two things (such as functions), we need to transform them into what in mathematics is known as a canonical form, which simply means a form that is defined so as to be unique for
each thing of the given type. What can we use for logic functions? You already know two answers! The
canonical sum of a function (sometimes called the canonical SOP form) is the sum of minterms. The
canonical product of a function (sometimes called the canonical POS form) is the product of maxterms.
These forms technically only meet the mathematical definition of canonical if we agree on an order for the
min/maxterms, but that problem is solvable. However, as you already know, the forms are not particularly
convenient to use. In practice, people and tools in the industry use more compact approaches when comparing functions, but those solutions are a subject for a later class (such as ECE 462).
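For small functions, the canonical sum can be represented simply as the set of minterm indices, which makes the equivalence check a one-line comparison. A minimal Python sketch (ours; real tools use more compact representations, as noted above), using the consensus example from earlier:

```python
from itertools import product

def canonical_sum(f, nvars):
    # The canonical SOP form, represented as the set of minterm indices
    # (input combinations for which the function produces 1).
    return {i for i, bits in enumerate(product((0, 1), repeat=nvars)) if f(*bits)}

# The consensus example: Z = A'C + AB + BC versus Z = A'C + AB.
z_with    = lambda a, b, c: ((not a) and c) or (a and b) or (b and c)
z_without = lambda a, b, c: ((not a) and c) or (a and b)

print(canonical_sum(z_with, 3) == canonical_sum(z_without, 3))   # True: same function
```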
Two-Level Logic
Two-level logic is a popular way of expressing logic functions. The two levels refer
simply to the number of functions through which an input passes to reach an output, and
both the SOP and POS forms are examples of two-level logic. In this section, we illustrate
one of the reasons for this popularity and show you how to graphically manipulate
expressions, which can sometimes help when trying to understand gate diagrams.
We begin with one of DeMorgan's laws, which we can illustrate both algebraically and graphically: C = A' + B' = (A B)'.

[Figure: an OR gate with inverted (bubbled) inputs A and B is equivalent to a NAND gate with inputs A and B.]
Let's say that we have a function expressed in SOP form, such as Z = ABC + DE + FGHJ. The diagram
on the left below shows the function constructed from three AND gates and an OR gate. Using DeMorgan's
law, we can replace the OR gate with a NAND with inverted inputs. But the bubbles that correspond to
inversion do not need to sit at the input to the gate. We can invert at any point along the wire, so we slide
each bubble down the wire to the output of the first column of AND gates. Be careful: if the wire splits,
which does not happen in our example, you have to replicate the inverter onto the other output paths as you
slide past the split point! The end result is shown on the right: we have not changed the function, but now
we use only NAND gates. Since CMOS technology only supports NAND and NOR directly, using two-level
logic makes it simple to map our expression into CMOS gates.
[Figure: on the left, Z = ABC + DE + FGHJ built from three AND gates feeding an OR gate, with the annotation "first, we replace this OR gate using DeMorgan's law"; on the right, the same function built entirely from NAND gates.]
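The claim that the NAND-only version computes the same function is easy to verify exhaustively; the following Python sketch (ours) compares the SOP form of Z with the NAND-NAND form on all 2^9 input combinations.

```python
from itertools import product

def nand(*args):
    return int(not all(args))

def z_sop(a, b, c, d, e, f, g, h, j):
    return int((a and b and c) or (d and e) or (f and g and h and j))

def z_nand(a, b, c, d, e, f, g, h, j):
    # Second-level NAND applied to the outputs of the first-level NANDs.
    return nand(nand(a, b, c), nand(d, e), nand(f, g, h, j))

assert all(z_sop(*bits) == z_nand(*bits) for bits in product((0, 1), repeat=9))
print("SOP and NAND-NAND forms of Z agree on all inputs.")
```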
You may want to make use of DeMorgan's other law, illustrated graphically to the right,
to perform the same transformation on a POS expression. What do you get?

[Figure: an AND gate with inverted (bubbled) inputs A and B is equivalent to a NOR gate.]
Multi-Metric Optimization
As engineers, almost every real problem that you encounter will admit multiple metrics for evaluating possible
designs. Becoming a good engineer thus requires not only that you be able to solve problems creatively so
as to improve the quality of your solutions, but also that you are aware of how people might evaluate those
solutions and are able both to identify the most important metrics and to balance your design effectively
according to them. In this section, we introduce some general ideas and methods that may be of use to you
in this regard. We will not test you on the concepts in this section.
When you start thinking about a new problem, your first step should be to think carefully about metrics
of possible interest. Some important metrics may not be easy to quantify. For example, compatibility of
a design with other products already owned by a customer has frequently defined the success or failure of
computer hardware and software solutions. But how can you compute the compatibility of your approach as
a number?
Humans, including engineers, are not good at comparing multiple metrics simultaneously. Thus, once you
have a set of metrics that you feel is complete, your next step is to get rid of as many as you can. Towards
this end, you may identify metrics that have no practical impact in current technology, set threshold values
for other metrics to simplify reasoning about them, eliminate redundant metrics, calculate linear sums to
reduce the count of metrics, and, finally, make use of the notion of Pareto optimality. All of these ideas are
described in the rest of this section.
Lets start by considering metrics that we can quantify as real numbers. For a given metric, we can divide
possible measurement values into three ranges. In the first range, all measurement values are equivalently
useful. In the second range, possible values are ordered and interesting with respect to one another. Values
in the third range are all impossible to use in practice. Using power consumption as our example, the first
range corresponds to systems in which the processor's power consumption is extremely
low relative to the power consumption of the rest of the digital system. For example, the processor in a computer might use
less than 1% of the total used by the system including the disk drive, the monitor, the power supply, and so
forth. One power consumption value in this range is just as good as any other, and no one cares about the
power consumption of the processor in such cases. In the second range, power consumption of the processor
makes a difference. Cell phones use most of their energy in radio operation, for example, but if you own
a phone with a powerful processor, you may have noticed that you can turn off the phone and drain the
battery fairly quickly by playing a game. Designing a processor that uses half as much power lengthens the
battery life in such cases. Finally, the third region of power consumption measurements is impossible: if you
use so much power, your chip will overheat or even burst into flames. Consumers get unhappy when such
things happen.
As a first step, you can remove any metrics for which all solutions are effectively equivalent. Until a little
less than a decade ago, for example, the power consumption of a desktop processor actually was in the first
range that we discussed. Power was simply not a concern to engineers: all designs of interest consumed so
little power that no one cared. Unfortunately, at that point, power consumption jumped into the third range
rather quickly. Processors hit a wall, and products had to be cancelled. Given that the time spent designing
a processor has historically been about five years, a lot of engineering effort was wasted because people had
not thought carefully enough about power (since it had never mattered in the past). Today, power is an
important metric that engineers must take into account in their designs.
However, in some areas, such as desktop and high-end server processors, other metrics (such as performance)
may be so important that we always want to operate at the edge of the interesting range. In such cases,
we might choose to treat a metric such as power consumption as a threshold: stay below 150 Watts for a
desktop processor, for example. One still has to make a coordinated effort to ensure that the system as a
whole does not exceed the threshold, but reasoning about threshold values, a form of constraint, is easier
than trying to think about multiple metrics at once.
Some metrics may only allow discrete quantification. For example, one could choose to define compatibility
with previous processor generations as binary: either an existing piece of software (or operating system) runs
out of the box on your new processor, or it does not. If you want people who own that software to make use
of your new processor, you must ensure that the value of this binary metric is 1, which can also be viewed
as a threshold.
In some cases, two metrics may be strongly correlated, meaning that a design that is good for one of the
metrics is frequently good for the other metric as well. Chip area and cost, for example, are technically
distinct ways to measure a digital design, but we rarely consider them separately. A design that requires a
larger chip is probably more complex, and thus takes more engineering time to get right (engineering time
costs money). Each silicon wafer costs money to fabricate, and fewer copies of a large design fit on one
wafer, so large chips mean more fabrication cost. Physical defects in silicon can cause some chips not to
work. A large chip uses more silicon than a small one, and is thus more likely to suffer from defects (and
not work). Cost thus goes up again for large chips relative to small ones. Finally, large chips usually require
more careful testing to ensure that they work properly (even ignoring the cost of getting the design right,
we have to test for the presence of defects), which adds still more cost for a larger chip. All of these factors
tend to correlate chip area and chip cost, to the point that most engineers do not consider both metrics.
After you have tried to reduce your set of metrics as much as possible, or simplified them by turning them into
thresholds, you should consider turning the last few metrics into a weighted linear sum. All remaining metrics
must be quantifiable in this case. For example, if you are left with three metrics for which a given design has
values A, B, and C, you might reduce these to one metric by calculating D = wA A + wB B + wC C. What
are the w values? They are weights for the three metrics. Their values represent the relative importance of
the three metrics to the overall evaluation. Here we've assumed that larger values of A, B, and C are either
all good or all bad. If you have metrics with different senses, use the reciprocal values: for example, if a
large value of A is good, then a small value of 1/A is equally good, so 1/A can be summed with metrics for which smaller values are better.
The difficulty with linearizing metrics is that not everyone agrees on the weights. Is using less power more
important than having a cheaper chip? The answer may depend on many factors.
When you are left with several metrics of interest, you can use the idea of Pareto optimality to identify
interesting designs. Let's say that you have two metrics. If a design D1 is better than a second design D2
for both metrics, we say that D1 dominates D2 . A design D is then said to be Pareto optimal if no other
design dominates D. Consider the figure on the left below, which illustrates seven possible designs measured
with two metrics. The design corresponding to point B dominates the designs corresponding to points A
and C, so neither of the latter designs is Pareto optimal. No other point in the figure dominates B, however,
so that design is Pareto optimal. If we remove all points that do not represent Pareto optimal designs, and
instead include only those designs that are Pareto optimal, we obtain the version shown on the right. These
are points in a two-dimensional space, not a line, but we can imagine a line going through the points, as
illustrated in the figure: the points that make up the line are called a Pareto curve, or, if you have more
than two metrics, a Pareto surface.
[Figure: two plots of the seven designs, with metric 1 on the horizontal axis and metric 2 on the vertical axis (both increasing from bad to good). In the left plot, points A and C fall in the region dominated by point B; in the right plot, only the Pareto-optimal designs remain, connected by a Pareto curve.]
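Identifying the Pareto-optimal designs in a list of measured points is straightforward to automate. Here is a small Python sketch (ours), assuming larger metric values are better and using the notion of domination defined above; the design values are made up for illustration.

```python
def dominates(p, q):
    # p dominates q if p is strictly better than q in every metric
    # (larger values are better in this sketch).
    return all(a > b for a, b in zip(p, q))

def pareto_optimal(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Seven hypothetical designs, each measured with two metrics.
designs = [(1, 9), (3, 8), (2, 7), (5, 5), (6, 6), (7, 2), (9, 1)]
print(pareto_optimal(designs))   # [(1, 9), (3, 8), (6, 6), (7, 2), (9, 1)]
```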
Logic Properties
Table 1 (on the next page) lists a number of properties of Boolean logic. Most of these are easy to derive
from our earlier definitions, but a few may be surprising to you. In particular, in the algebra of real numbers,
multiplication distributes over addition, but addition does not distribute over multiplication. For example,
3 × (4 + 7) = (3 × 4) + (3 × 7), but 3 + (4 × 7) ≠ (3 + 4) × (3 + 7). In Boolean algebra, both operators
distribute over one another, as indicated in Table 1. The consensus properties may also be nonintuitive.
Drawing a K-map may help you understand the consensus property on the right side of the table. For the
consensus variant on the left side of the table, consider that since either A or A' must be 0, either B or C
or both must be 1 for the first two factors on the left to be 1 when ANDed together. But in that case, the
third factor is also 1, and is thus redundant.
As mentioned previously, Boolean algebra has an elegant symmetry known as a duality, in which any logic
statement (an expression or an equation) is related to a second logic statement. To calculate the dual
form of a Boolean expression or equation, replace 0 with 1, replace 1 with 0, replace AND with OR, and
replace OR with AND. Variables are not changed when finding the dual form. The dual form of a dual form is
the original logic statement. Be careful when calculating a dual form: our convention for ordering arithmetic
operations is broken by the exchange, so you may want to add explicit parentheses before calculating the
dual. For example, the dual of AB + C is not A + BC. Rather, the dual of AB + C is (A + B)C. Add
parentheses as necessary when calculating a dual form to ensure that the order of operations does not change.
Duality has several useful practical applications. First, the principle of duality states that any theorem
or identity has the same truth value in dual form (we do not prove the principle here). The rows of Table 1
are organized according to this principle: each row contains two equations that are the duals of one another.
Second, the dual form is useful when designing certain types of logic, such as the networks of transistors
connecting the output of a CMOS gate to high voltage and ground. If you look at the gate designs in the
textbook (and particularly those in the exercises), you will notice that these networks are duals. A function/expression is neither a theorem nor an identity, so the principle of duality does not apply to the dual
of an expression. However, if you treat the value 0 as true, the dual form of an expression has the same
truth values as the original (operating with value 1 as true). Finally, you can calculate the complement
of a Boolean function (any expression) by calculating the dual form and then complementing each variable.
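Both the dual-form calculation and the complement rule just mentioned are mechanical, and can be illustrated with a tiny expression-tree sketch in Python (ours; the tuple representation is an assumption made for illustration).

```python
from itertools import product

# An expression is a variable name, the constant 0 or 1, or a tuple
# ("AND", e1, e2, ...), ("OR", e1, e2, ...), or ("NOT", e).

def dual(e):
    if e in (0, 1):
        return 1 - e                  # swap 0 and 1
    if isinstance(e, str):
        return e                      # variables are unchanged
    op = e[0]
    if op == "NOT":
        return ("NOT", dual(e[1]))
    swapped = "OR" if op == "AND" else "AND"
    return (swapped,) + tuple(dual(arg) for arg in e[1:])

def evaluate(e, env):
    if e in (0, 1):
        return e
    if isinstance(e, str):
        return env[e]
    if e[0] == "NOT":
        return 1 - evaluate(e[1], env)
    vals = [evaluate(arg, env) for arg in e[1:]]
    return int(all(vals)) if e[0] == "AND" else int(any(vals))

def complement_literals(e):
    if isinstance(e, str):
        return ("NOT", e)
    if e in (0, 1):
        return e
    return (e[0],) + tuple(complement_literals(arg) for arg in e[1:])

# Complementing a function = taking the dual and complementing each variable.
expr = ("OR", ("AND", "A", "B"), "C")         # A B + C
comp = complement_literals(dual(expr))        # (A' + B') C'
for a, b, c in product((0, 1), repeat=3):
    env = {"A": a, "B": b, "C": c}
    assert evaluate(comp, env) == 1 - evaluate(expr, env)
print("dual(A B + C) with complemented variables equals the complement of A B + C")
```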
0 · A = 0                                    1 + A = 1
0 + A = A                                    1 · A = A
A · A = A                                    A + A = A
A + A' = 1                                   A · A' = 0
(A B)' = A' + B'                             (A + B)' = A' B'                   DeMorgan's laws
A B + C = (A + C)(B + C)                     (A + B) C = A C + B C              distribution
(A + B)(A' + C)(B + C) = (A + B)(A' + C)     A B + A' C + B C = A B + A' C      consensus

Table 1: Boolean logic properties. The two columns are dual forms of one another.
When we develop combinational logic designs, we may also choose to leave some aspects unspecified. In
particular, the value of a Boolean logic function to be implemented may not matter for some input combinations. If we express the function as a truth table, we may choose to mark the function's value for some
input combinations as "don't care," which is written as x (no quotes).
What is the benefit of using don't care values? Using don't care values allows you to choose from
among several possible logic functions, all of which produce the desired results (as well as some combination
of 0s and 1s in place of the don't care values). Each input combination marked as don't care doubles
the number of functions that can be chosen to implement the design, often enabling the logic needed for
implementation to be simpler.
For example, the K-map below specifies a function F(A, B, C) with two don't care entries. If you are asked to design combinational logic for this function, you can choose any values for the two don't care entries. When identifying prime implicants, each x can either be a 0 or a 1.

[K-map for F (columns AB = 00, 01, 11, 10; rows C = 0, 1): F = 0 for ABC = 000, 001, 101; F = 1 for ABC = 010, 011, 111; F = x for ABC = 100 and 110.]

Depending on the choices made for the x's, we obtain one of the following four functions:
F = A'B + BC
F = A'B + BC + AB'C'
F = B
F = B + AC'
[K-map for the choice F = B: the former x entries become 1 at ABC = 110 and 0 at ABC = 100.]
Given this set of choices, a designer typically chooses the third: F = B, which corresponds to the K-map
shown to the right of the equations. The design then produces F = 1 when A = 1, B = 1, and C = 0
(ABC = 110), and produces F = 0 when A = 1, B = 0, and C = 0 (ABC = 100). These differences are
marked with shading and green italics in the new K-map. No implementation ever produces an x.
Let's start with the assumption that the user only presses one button at a time. In this case, we can treat
input combinations in which more than one button is pressed as don't care values in the truth tables for
the outputs. K-maps for all four output bits appear below. The x's indicate don't care values.
[K-maps for CL[1], CL[0], CM[1], and CM[0] (columns LB = 00, 01, 11, 10; rows M = 0, 1). The single-button input combinations specify CL = 00, CM = 00 for LBM = 000; CL = 10, CM = 00 for LBM = 100; CL = 01, CM = 01 for LBM = 010; and CL = 00, CM = 10 for LBM = 001. Every combination with more than one button pressed is marked x.]
When we calculate the logic function for an output, each don't care value can be treated as either 0 or 1,
whichever is more convenient in terms of creating the logic. In the case of CM[1], for example, we can treat
the three x's in the ellipse as 1s, treat the x outside of the ellipse as a 0, and simply use M (the implicant
represented by the ellipse) for CM[1]. The other three output bits are left as an exercise, although the result
appears momentarily.
The implementation at right takes full advantage of the dont care
parts of our specification. In this case, we require no logic at all; we
need merely connect the inputs to the correct outputs. Let's verify the
operation. We have four cases to consider. First, if none of the buttons
are pushed (LBM = 000), we get no ice cream, as desired (CM = 00
and CL = 00). Second, if we request lychee ice cream (LBM = 100),
the outputs are CL = 10 and CM = 00, so we get a full serving of
lychee and no mango. Third, if we request a blend (LBM = 010), the
outputs are CL = 01 and CM = 01, giving us half a serving of each
flavor. Finally, if we request mango ice cream (LBM = 001), we get
no lychee but a full serving of mango.
[Figure: the gate-free implementation. The lychee button L drives CL[1], the blend button B drives both CL[0] and CM[0], and the mango button M drives CM[1] (CL and CM are the lychee and mango output controls).]
The K-maps for this implementation appear below. Each of the don't care x's from the original design
has been replaced with either a 0 or a 1 and highlighted with shading and green italics. Any implementation
produces either 0 or 1 for every output bit for every possible input combination.
[K-maps for this implementation: CL[1] = L, CL[0] = B, CM[1] = M, and CM[0] = B. Each former x is now a 0 or a 1, highlighted with shading and green italics.]
As you can see, leveraging don't care output bits can sometimes significantly simplify our logic. In the
case of this example, we were able to completely eliminate any need for gates! Unfortunately, the resulting
implementation may sometimes produce unexpected results. Based on the implementation, what happens if
a user presses more than one button? The ice cream cup overflows!
Let's see why. Consider the case LBM = 101, in which we've pressed both the lychee and mango buttons.
Here CL = 10 and CM = 10, so our dispenser releases a full serving of each flavor, or two servings total.
Pressing other combinations may have other repercussions as well. Consider pressing lychee and blend
(LBM = 110). The outputs are then CL = 11 and CM = 01. Hopefully the dispenser simply gives us one
and a half servings of lychee and a half serving of mango. However, if the person who designed the dispenser
assumed that no one would ever ask for more than one serving, something worse might happen. In other
words, giving an input of CL = 11 to the ice cream dispenser may lead to other unexpected behavior if its
designer decided that that input pattern was a don't care.
The root of the problem is that while we don't care about the value of any particular output marked x for
any particular input combination, we do actually care about the relationship between the outputs.
What can we do? When in doubt, it is safest to make choices and to add the new decisions to the specification
rather than leaving output values specified as don't care. For our ice cream dispenser logic, rather than
leaving the outputs unspecified whenever a user presses more than one button, we could choose an acceptable
outcome for each input combination and replace the x's with 0s and 1s. We might, for example, decide to
produce lychee ice cream whenever the lychee button is pressed, regardless of other buttons (LBM = 1xx,
which means that we don't care about the inputs B and M, so LBM = 100, LBM = 101, LBM = 110,
or LBM = 111). That decision alone covers three of the four unspecified input patterns. We might also decide that when the blend and mango buttons are pushed together (but without the lychee button, LBM = 011),
our logic produces a blend. The resulting K-maps are shown below, again with shading and green italics
identifying the combinations in which our original design specified don't care.
[K-maps for the revised specification: CL = 10 and CM = 00 whenever L = 1; CL = 01 and CM = 01 for LBM = 010 or 011; CL = 00 and CM = 10 for LBM = 001; and CL = CM = 00 for LBM = 000. The formerly unspecified combinations are shaded and in green italics.]
16
For completeness, the K-maps corresponding to this implementation are given here.
[K-maps for the implementation of the revised specification; every entry is now a 0 or a 1.]
[Figure: a worked example of adding two binary numbers A and B column by column. The carry bits C produced in each column are written above the next column to the left, and the sum bits S appear below; information flows from the least significant column toward the most significant column.]
Focus now on the addition of a single column. Except for the first
and last bits, which we might choose to handle slightly differently, the
addition process is identical for any column. We add a carry in bit
(possibly 0) with one bit from each of our numbers to produce a sum
bit and a carry out bit for the next column. Column addition is the
task that our bit slice logic must perform.
[Figure: the adder bit slice M, with inputs AM, BM, and carry in C^M arriving from the right, and outputs SM and carry out C^(M+1).]

The diagram to the right shows an abstract model of our adder bit slice. The inputs from the next least significant bit come in from the right. We include arrowheads because figures are usually drawn with inputs coming from the top or left and outputs going to the bottom or right. Outside of the bit slice logic, we index the carry bits using the
bit number. The bit slice has C^M provided as an input and produces C^(M+1) as an output. Internally, we
use Cin to denote the carry input, and Cout to denote the carry output. Similarly, the bits AM and BM
from the numbers A and B are represented internally as A and B, and the bit SM produced for the sum S is
represented internally as S. The overloading of meaning should not confuse you, since the context (designing
the logic block or thinking about the problem as a whole) should always be clear.
The abstract device for adding three input bits and producing two output bits is called a full adder. You
may also encounter the term half adder, which adds only two input bits. To form an N-bit adder, we
integrate N copies of the full adder (the bit slice that we design next), as shown below. The result is called
a ripple carry adder because the carry information moves from the low bits to the high bits slowly, like a
ripple on the surface of a pond.
[Figure: an N-bit ripple carry adder built from N copies of the adder bit slice. Slice 0 receives A0, B0, and the external carry in; the carry out of each slice feeds the carry in of the next more significant slice; the slices produce the sum bits S0 through SN-1, and slice N-1 produces the final carry out.]
A B Cin | Cout S
0 0  0  |  0   0
0 0  1  |  0   1
0 1  0  |  0   1
0 1  1  |  1   0
1 0  0  |  0   1
1 0  1  |  1   0
1 1  0  |  1   0
1 1  1  |  1   1
[Figure: K-maps for Cout and S (columns AB = 00, 01, 11, 10; rows Cin = 0, 1), with the implicants marked.]

Cout = A B + A Cin + B Cin
S    = A' B' Cin + A' B Cin' + A B' Cin' + A B Cin
     = A ⊕ B ⊕ Cin
The equation for Cout implements a majority function on three bits. In particular, a carry is produced
whenever at least two out of the three input bits (a majority) are 1s. Why do we mention this name?
Although we know that we can build any logic function from NAND gates, common functions such as those
used to add numbers may benefit from optimization. Imagine that in some technology, creating a majority
function directly may produce a better result than implementing such a function from logic gates. In such
a case, we want the person designing the circuit to know that they can make use of such an improvement. We
rewrote the equation for S to make use of the XOR operation for a similar reason: the implementation of
XOR gates from transistors may be slightly better than the implementation of XOR based on NAND gates.
If a circuit designer provides an optimized variant of XOR, we want our design to make use of the optimized
version.
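The bit-slice equations above map directly onto a bit-serial software model. Here is a Python sketch (our own bit-level model, not a gate-level simulation) of the full adder and of an N-bit ripple carry adder built by chaining N copies.

```python
def full_adder(a, b, cin):
    # Cout is the majority of the three inputs; S is their XOR (parity).
    cout = (a & b) | (a & cin) | (b & cin)
    s = a ^ b ^ cin
    return cout, s

def ripple_carry_add(a_bits, b_bits, cin=0):
    # a_bits and b_bits list each number's bits from least to most significant.
    s_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        carry, s = full_adder(a, b, carry)
        s_bits.append(s)
    return carry, s_bits

# 4-bit example: 0110 (6) + 0111 (7) = 1101 (13) with no carry out.
cout, s = ripple_carry_add([0, 1, 1, 0], [1, 1, 1, 0])
print(cout, list(reversed(s)))   # 0 [1, 1, 0, 1]
```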
[Figure: two gate-level implementations of the adder bit slice, one drawn with AND and OR gates and one with NAND gates; both use an XOR gate to produce S from A, B, and Cin.]
The gate diagrams above implement a single bit slice for an adder. The version on the left uses AND and
OR gates (and an XOR for the sum), while the version on the right uses NAND gates, leaving the XOR as
an XOR.
Let's discuss the design in terms of area and speed. As an estimate of area, we can count gates, remembering
that we need two transistors per input on a gate. For each bit, we need three 2-input NAND gates, one
3-input NAND gate, and a 3-input XOR gate (a big gate; around 30 transistors). For speed, we make rough
estimates in terms of the amount of time it takes for a CMOS gate to change its output once its input has
changed. This amount of time is called a gate delay. We can thus estimate our design's speed by simply
counting the maximum number of gates on any path from input to output. For this measurement, using a
NAND/NOR representation of the design is important to getting the right answer. Here we have two gate
delays from any of the inputs to the Cout output. The XOR gate may be a little slower, but none of its
inputs come from other gates anyway. When we connect multiple copies of our bit slice logic together to
form an adder, the delay from the A and B inputs to the outputs is not as important as the delay from Cin to the outputs.
The latter delay adds to the total delay of our adder on a per-bit-slice basis; this propagation delay
gives rise to the name ripple carry. Looking again at the diagram, notice that we have two gate delays
from Cin to Cout. The total delay for an N-bit adder based on this implementation is thus two gate
delays per bit, for a total of 2N gate delays.
[Figure: the N-bit adder drawn as a single block with an N-bit sum output S, a carry in, and a carry out.]
You may already know that most computers have a word size specified as part of the Instruction Set
Architecture. The word size specifies the number of bits in each operand when the computer adds two
numbers, and is often used widely within the microarchitecture as well (for example, to decide the number of
wires to use when moving bits around). Most desktop and laptop machines now have a word size of 64 bits,
but many phone processors (and desktops/laptops a few years ago) use a 32-bit word size. Embedded
microcontrollers may use a 16-bit or even an 8-bit word size.
Having seen how we can build an N -bit adder from simple chunks
of logic operating on each pair of bits, you should not have much
difficulty in understanding the diagram to the right. If we start with
a design for an N-bit adder (even if that design is not built from
bit slices, but is instead optimized for that particular size), we can
create a 2N -bit adder by simply connecting two copies of the N -bit
adder. We give the adder for the less significant bits (the one on the
right in the figure) an initial carry of 0, and pass the carry produced
by the adder for the less significant bits into the carry input of the
adder for the more significant bits. We calculate overflow based on
the results of the adder for more significant bits (the one on the left
in the figure), using the method appropriate to the type of operands
we are adding (either unsigned or 2's complement).
[Figure: a 2N-bit adder built from two N-bit adders. The carry in of the adder for the less significant bits is 0, and its carry out feeds the carry in of the adder for the more significant bits.]
You should also realize that this connection need not be physical. In other words, if a computer has an N -bit
adder, it can handle operands with 2N bits (or 3N , or 10N , or 42N ) by using the N -bit adder repeatedly,
starting with the least significant bits and working upward until all of the bits have been added. The
computer must of course arrange to have the operands routed to the adder a few bits at a time, and must
ensure that the carry produced by each addition is then delivered to the carry input (of the same adder!) for
the next addition. In the coming months, you will learn how to design hardware that allows you to manage
bits in this way, so that by the end of our class, you will be able to design a simple computer on your own.
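That reuse is easy to picture in software. The loop below (a Python sketch of the idea, assuming a hypothetical 4-bit adder word size) adds arbitrarily wide operands four bits at a time, feeding each carry out back in as the next carry in.

```python
MASK4 = 0xF   # a hypothetical 4-bit adder word size

def add4(a, b, cin):
    # Model of a 4-bit adder: returns (carry out, 4-bit sum).
    total = (a & MASK4) + (b & MASK4) + cin
    return total >> 4, total & MASK4

def wide_add(a, b, nbits):
    # Add two nbits-wide operands by using the 4-bit adder repeatedly,
    # starting with the least significant chunk.
    result, carry = 0, 0
    for shift in range(0, nbits, 4):
        carry, chunk = add4(a >> shift, b >> shift, carry)
        result |= chunk << shift
    return carry, result

print(wide_add(0xABCD, 0x1234, 16))   # (0, 48641), i.e., 0xBE01 with no carry out
```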
As humans, we typically start comparing at the most significant bit. After all,
if we find a difference in that bit, we are done, saving ourselves some time. In
the example to the right, we know that A < B as soon as we reach bit 4 and
observe that A4 < B4 . If we instead start from the least significant bit, we
must always look at all of the bits.
[Figure: an 8-bit comparison example. The bits of A (A7 ... A0) and B = 00010001 (B7 ... B0) are compared; an arrow labeled "let's design logic that compares in this direction" points from the least significant bit toward the most significant bit.]

When building hardware to compare all of the bits at once, however, hardware for comparing each bit must exist, and the final result must be able to consider all of the bits. Our choice of direction should thus instead depend on how effectively we can build the corresponding functions. For a single bit slice, the two directions are almost identical. Let's develop a bit slice for comparing from least to most significant.
An Abstract Model
Comparison of two numbers, A and B, can produce three possible answers: A < B, A = B, or A > B (one
can also build an equality comparator that combines the A < B and A > B cases into a single answer).
As we move from bit to bit in our design, how much information needs to pass from one bit to the next? Here
you may want to think about how you perform the task yourself. And perhaps to focus on the calculation
for the most significant bit. You need to know the values of the two bits that you are comparing. If those
two are not equal, you are done. But if the two bits are equal, what do you do? The answer is fairly simple:
pass along the result from the less significant bits. Thus our bit slice logic for bit M needs to be able to
accept three possible answers from the bit slice logic for bit M-1 and must be able to pass one of three
possible answers to the logic for bit M+1. Since ⌈log2(3)⌉ = 2, we need two bits of input and two bits of
output in addition to our input bits from numbers A and B.
[Figure: the comparator bit slice M, with inputs AM and BM and the comparison bits C1^(M-1) and C0^(M-1) arriving from the right, and outputs C1^M and C0^M.]

The diagram to the right shows an abstract model of our comparator bit slice. The inputs from the next least significant bit come in from the right. We include arrowheads because figures are usually drawn with inputs coming from the top or left and outputs going to the bottom or right. Outside of the bit slice logic, we index these comparison bits using the bit number. The bit slice has C1^(M-1) and C0^(M-1) provided as inputs and produces C1^M and C0^M as outputs. Internally, we use C1 and C0 to denote these inputs, and Z1 and Z0 to denote the outputs. Similarly, the bits AM and BM from the numbers A and B are represented internally simply as A and B. The overloading of meaning should not confuse you, since the context (designing the logic block or thinking about the problem as a whole) should always be clear.
C1 C0 | meaning
 0  0 | A = B
 0  1 | A < B
 1  0 | A > B
 1  1 | not used

A B | Z1 Z0
0 0 |  0  0
0 1 |  0  1
1 0 |  1  0
1 1 |  0  0

Z1 = A B'
Z0 = A' B
These forms should also be intuitive, given the representation that we chose: A > B if and only if A = 1
and B = 0; A < B if and only if A = 0 and B = 1.
Implementation diagrams for our one-bit functions appear to the right. The diagram to the immediate right shows the implementation as we might initially draw it, and the diagram on the far right shows the implementation converted to NAND/NOR gates for a more accurate estimate of complexity when implemented in CMOS.

[Figure: two versions of the logic producing Z1 and Z0 from inputs A and B.]

The exercise of designing the logic for bit 0 is also useful in the sense that the logic structure illustrated forms the core of the full design in that it identifies the two cases that matter: A < B and A > B.
Now we are ready to design the full function. Let's start by writing a full truth table, as shown below.
Full truth table:

A B C1 C0 | Z1 Z0
0 0  0  0 |  0  0
0 0  0  1 |  0  1
0 0  1  0 |  1  0
0 0  1  1 |  x  x
0 1  0  0 |  0  1
0 1  0  1 |  0  1
0 1  1  0 |  0  1
0 1  1  1 |  x  x
1 0  0  0 |  1  0
1 0  0  1 |  1  0
1 0  1  0 |  1  0
1 0  1  1 |  x  x
1 1  0  0 |  0  0
1 1  0  1 |  0  1
1 1  1  0 |  1  0
1 1  1  1 |  x  x

Short form:

A B C1 C0 | Z1 Z0
0 0  0  0 |  0  0
0 0  0  1 |  0  1
0 0  1  0 |  1  0
0 1  0  0 |  0  1
0 1  0  1 |  0  1
0 1  1  0 |  0  1
1 0  0  0 |  1  0
1 0  0  1 |  1  0
1 0  1  0 |  1  0
1 1  0  0 |  0  0
1 1  0  1 |  0  1
1 1  1  0 |  1  0
x x  1  1 |  x  x

Short form with an "other" row:

A B C1 C0 | Z1 Z0
0 0  0  0 |  0  0
0 0  0  1 |  0  1
0 0  1  0 |  1  0
0 1  0  0 |  0  1
0 1  0  1 |  0  1
0 1  1  0 |  0  1
1 0  0  0 |  1  0
1 0  0  1 |  1  0
1 0  1  0 |  1  0
1 1  0  0 |  0  0
1 1  0  1 |  0  1
1 1  1  0 |  1  0
  other   |  x  x
In the truth table, we marked the outputs as don't care (x's) whenever C1 C0 = 11. You might recall that
we ran into problems with our ice cream dispenser control in Notes Set 2.2. However, in that case we could
not safely assume that a user did not push multiple buttons. Here, our bit slice logic only accepts inputs
from other copies of itself (or a fixed value for bit 0), and, assuming that we design the logic correctly, our
bit slice never generates the 11 combination. In other words, that input combination is impossible (rather
than undesirable or unlikely), so the result produced on the outputs is irrelevant.
It is tempting to shorten the full truth table by replacing groups of rows. For example, if AB = 01, we
know that A < B, so the less significant bits (for which the result is represented by the C1 C0 inputs) don't
matter. We could write one row with input pattern ABC1C0 = 01xx and output pattern Z1 Z0 = 01. We
might also collapse our don't care output patterns: whenever the input matches ABC1C0 = xx11, we don't
care about the output, so Z1 Z0 = xx. But these two rows overlap in the input space! In other words, some
input patterns, such as ABC1C0 = 0111, match both of our suggested new rows. Which output should take
precedence? The answer is that a reader should not have to guess. Do not use overlapping rows to shorten
a truth table. In fact, the first of the suggested new rows is not valid: we don't need to produce output 01 if
we see C1 C0 = 11. Two valid short forms of this truth table appear after the full table above. If you have
an "other" entry, as shown in the last table, this entry should always appear as the last row. Normal
rows, including rows representing multiple input patterns, are not required to be in any particular order.
Use whatever order makes the table easiest to read for its purpose (usually by treating the input pattern as
a binary number and ordering rows in increasing numeric order).
In order to translate our design into algebra, we transcribe the
truth table into a K-map for each output variable, as shown to the
right. You may want to perform this exercise yourself and check
that you obtain the same solution. Implicants for each output
are marked in the K-maps, giving the following equations:
Z1 = A B' + A C1 + B' C1
Z0 = A' B + A' C0 + B C0

[Figure: K-maps for Z1 and Z0 (columns C1C0 = 00, 01, 11, 10; rows AB = 00, 01, 11, 10), with the prime implicants marked.]
[Figure: an N-bit unsigned comparator composed of bit slices. Bit slice 0 receives A0, B0, and the initial values C1 = 0 and C0 = 0; each slice's Z1 and Z0 outputs drive the C1 and C0 inputs of the next more significant slice, and the outputs of slice N-1 give the final answer.]
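The bit-slice equations and the chain structure above translate directly into a bit-serial software model. The sketch below (Python, our own illustration) chains the slices from the least significant bit upward, starting with C1 C0 = 00 (A equal to B so far).

```python
def comparator_slice(a, b, c1, c0):
    # Z1 = A B' + A C1 + B' C1   (A greater so far)
    # Z0 = A' B + A' C0 + B C0   (A less so far)
    na, nb = 1 - a, 1 - b
    z1 = (a & nb) | (a & c1) | (nb & c1)
    z0 = (na & b) | (na & c0) | (b & c0)
    return z1, z0

def compare_unsigned(a_bits, b_bits):
    # Bits are listed from least to most significant; returns '<', '=', or '>'.
    c1, c0 = 0, 0
    for a, b in zip(a_bits, b_bits):
        c1, c0 = comparator_slice(a, b, c1, c0)
    return {(0, 0): "=", (0, 1): "<", (1, 0): ">"}[(c1, c0)]

# 5 (101) compared with 6 (110), least significant bit first:
print(compare_unsigned([1, 0, 1], [0, 1, 1]))   # <
```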
Z1 = A B' + A C1 + B' C1
   = A B' + (A + B') C1
   = A B' + (A' B)' C1
Similarly,
Z0 = A' B + (A B')' C0
Notice that the second term in each equation now includes the complement of the first term from the other
equation. For example, the Z1 equation includes the complement of the A'B product that we need to
compute Z0. We may be able to improve our design by combining these computations.
An implementation based on our
new algebraic formulation appears
to the right. In this form, we
seem to have kept the same number of gates, although we have replaced the 3-input gates with inverters. However, the middle inverters disappear when we convert
to NAND/NOR form, as shown below to the right. Our new design requires only two inverters and
six 2-input gates, a substantial reduction relative to the original implementation.
Is there a disadvantage? Yes, but
only a slight one. Notice that the
path from the A and B inputs to
the outputs is now four gates (maximum) instead of three. Yet the path
from C1 and C0 to the outputs is
still only two gates. Thus, overall,
we have merely increased our N-bit comparator's delay from 2N + 1 gate delays to 2N + 2 gate delays.
[Figure: the optimized bit slice drawn with the inverters explicit, and the same design converted to NAND/NOR form, producing Z1 and Z0 from A, B, C1, and C0.]
Extending to 2's Complement
What about comparing 2's complement numbers? Can we make use of the unsigned comparator that we
just designed?
Let's start by thinking about the sign of the numbers A and B. Recall that 2's complement records a
number's sign in the most significant bit. For example, in the 8-bit numbers shown in the first diagram in
this set of notes, the sign bits are A7 and B7. Let's denote these sign bits in the general case by As and Bs.
Negative numbers have a sign bit equal to 1, and non-negative numbers have a sign bit equal to 0. The table
below outlines an initial evaluation of the four possible combinations of sign bits.
As Bs | interpretation      | solution
 0  0 | A ≥ 0 AND B ≥ 0     | use unsigned comparator on remaining bits
 0  1 | A ≥ 0 AND B < 0     | A > B
 1  0 | A < 0 AND B ≥ 0     | A < B
 1  1 | A < 0 AND B < 0     | unknown
[Example: the 4-bit 2's complement numbers A = 1100 (-4) and B = 1110 (-2). Dropping the sign bits leaves the remaining bits 100 (4) and 110 (6), so A = 4 - 8 and B = 6 - 8.]
Let's define Ar = A + 2^(N-1) as the value of the remaining bits for A and Br similarly for B. What happens
if we just go ahead and compare Ar and Br using an (N-1)-bit unsigned comparator? If we find that
Ar < Br, we know that Ar - 2^(N-1) < Br - 2^(N-1) as well, but that means A < B! We can do the same with
either of the other possible results. In other words, simply comparing Ar with Br gives the correct answer
for two negative numbers as well.
All we need to design is a logic block for the sign bits.
At this point, we might write out a K-map, but instead
let's rewrite our high-level table with the new information, as shown below.
As
0
0
1
1
Bs
0
1
0
1
solution
pass result from less significant bits
A>B
A<B
pass result from less significant bits
Looking at the table, notice the similarity to the highlevel design for a single bit of an unsigned value. The
only difference is that the two A 6= B cases are reversed. If we swap As and Bs , the function is identical.
We can simply use another bit slice but swap these two inputs. Implementation of an N -bit 2s complement
comparator based on our bit slice comparator is shown below. The blue circle highlights the only change
from the N -bit unsigned comparator, which is to swap the two inputs on the sign bit.
an Nbit 2s complement comparator composed of bit slices
AN1
N1
C1
N1
0
BN1
A
B
Z1 comparator C 1
Z0
bit
slice N1 C 0
AN2
N2
C1
N2
0
BN2
A
B
Z1 comparator C 1
Z0
bit
slice N2 C 0
A1
...
C1
1
0
A0
B1
A
B
Z1 comparator C 1
Z0
bit
slice 1
C0
C1
0
0
B0
A
B
Z1 comparator C 1
Z0
bit
slice 0
C0
26
Further Optimization
Lets return to the topic of optimization. To what extent did the
representation of the three outcomes affect our ability to develop a
good bit slice design? Although selecting a good representation can
be quite important, for this particular problem most representations
lead to similar implementations.
C1
0
0
1
1
C0
0
1
0
1
original
A=B
A<B
A>B
not used
alternate
A=B
A>B
not used
A<B
B
Z0
C0
Why didnt it work? Should we consider still other representations? In fact, none of the possible representations that we might choose for a bit slice can cut the delay down to one gate delay per bit. The problem
is fundamental, and is related to the nature of CMOS. For a single bit slice, we define the incoming and
outgoing representations to be the same. We also need to have at least one gate in the path to combine
the C1 and C0 inputs with information from the bit slices A and B inputs. But all CMOS gates invert the
sense of their inputs. Our choices are limited to NAND and NOR. Thus we need at least two gates in the
path to maintain the same representation.
One simple answer is to use different representations for odd and even bits. Instead, we optimize a logic
circuit for comparing two bits. We base our design on the alternate representation. The implementation is
shown below. The left shows an implementation based on the algebra, and the right shows a NAND/NOR
implementation. Estimating by gate count and number of inputs, the two-bit design doesnt save much over
two single bit slices in terms of area. In terms of delay, however, we have only two gate delays from C1
and C0 to either output. The longest path from the A and B inputs to the outputs is five gate delays. Thus,
for an N -bit comparator built with this design, the total delay is only N + 3 gate delays. But N has to be
even.
a comparator 2bit slice (alternate representation, NAND/NOR)
C1
A1
A1
Z1
Z1
B1
B1
A0
A0
B0
B0
Z0
Z0
C0
C0
As you can imagine, continuing to scale up the size of our logic block gives us better performance at
the expense of a more complex design. Using the alternate representation may help you to see how one
can generalize the approach to larger groups of bitsfor example, you may have noticed the two bitwise
comparator blocks on the left of the implementations above.
27
Subtraction
Our discussion of arithmetic implementation has focused so far on addition. What about other operations,
such as subtraction, multiplication, and division? The latter two require more work, and we will not discuss
them in detail until later in our class (if at all).
Subtraction, however, can be performed almost trivially using logic that we have already designed. Lets
say that we want to calculate the difference D between two N -bit numbers A and B. In particular, we
want to find D = A B. For now, think of A, B, and D as 2s complement values. Recall how we defined
the 2s complement representation: the N -bit pattern that we use to represent B is the same as the base 2
bit pattern for (2N B), so we can use an adder if we first calculate the bit pattern for B, then add the
resulting pattern to A. As you know, our N -bit adder always produces a result that is correct modulo 2N ,
so the result of such an operation, D = 2N + A B, is correct so long as the subtraction does not overflow.
How can we calculate 2N B? The same way that we do by hand! Calculate
the 1s complement, (2N 1) B, then add 1. The diagram to the right
shows how we can use the N -bit adder that we designed in Notes Set 2.3 to
build an N -bit subtracter. New elements appear in blue in the figurethe
rest of the logic is just an adder. The box labeled 1s comp. calculates
the 1s complement of the value B, which together with the carry in value of 1
correspond to calculating B. Whats in the 1s comp. box? One inverter
per bit in B. Thats all we need to calculate the 1s complement. You might
now ask: does this approach also work for unsigned numbers? The answer is
yes, absolutely. However, the overflow conditions for both 2s complement and
unsigned subtraction are different than the overflow condition for either type
of addition. What does the carry out of our adder signify, for example? The
answer may not be immediately obvious.
B
N
1s comp.
N
Nbit adder
Cout
What does the
carry out mean?
C in
S
N
D=AB
Lets start with the overflow condition for unsigned subtraction. Overflow means that we cannot represent
the result. With an N -bit unsigned number, we have A B 6 [0, 2N 1]. Obviously, the difference cannot
be larger than the upper limit, since A is representable and we are subtracting a non-negative (unsigned)
value. We can thus assume that overflow occurs only when A B < 0. In other words, when A < B.
28
To calculate the unsigned subtraction overflow condition in terms of the bits, recall that our adder is calculating 2N + A B. The carry out represents the 2N term. When A B, the result of the adder is at
least 2N , and we see a carry out, Cout = 1. However, when A < B, the result of the adder is less than 2N ,
and we see no carry out, Cout = 0. Overflow for unsigned subtraction is thus inverted from overflow for
unsigned addition: a carry out of 0 indicates an overflow for subtraction.
What about overflow for 2s complement subtraction? We can use arguments similar to those that we used
to reason about overflow of 2s complement addition to prove that subtraction of one negative number from
a second negative number can never overflow. Nor can subtraction of a non-negative number from a second
non-negative number overflow.
If A 0 and B < 0, the subtraction overflows iff A B 2N 1 . Again using similar arguments as
before, we can prove that the difference D appears to be negative in the case of overflow, so the product
AN 1 BN 1 DN 1 evaluates to 1 when this type of overflow occurs (these variables represent the most
significant bits of the two operands and the difference; in the case of 2s complement, they are also the sign
bits). Similarly, if A < 0 and B 0, we have overflow when A B < 2N 1 . Here we can prove that D 0
on overflow, so AN 1 BN 1 DN 1 evaluates to 1.
Our overflow condition for N -bit 2s complement subtraction is thus given by the following:
AN 1 BN 1 DN 1 + AN 1 BN 1 DN 1
If we calculate all four overflow conditionsunsigned and 2s complement, addition and subtractionand
provide some way to choose whether or not to complement B and to control the Cin input, we can use the
same hardware for addition and subtraction of either type.
C2
0
0
0
0
1
1
1
1
0
0
0
0
1
1
1
1
C1
0
0
1
1
0
0
1
1
0
0
1
1
0
0
1
1
C0
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
T4
0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
T5
1
1
1
1
1
1
1
1
1
1
1
0
0
0
0
0
T4
C3 C2
00
01
11
10
00
01
11
10
00
01
11
10
00
01
11
10
C1C0
T5
T4
= C3 + C2 + C1 + C0
T5
= C3 + C2 C1 + C2 C0
C3 C2
C1C0
29
As shown to the right of the truth tables, we can then draw simpler K-maps for T4 and T5 , and can solve
the K-maps to find equations for each, as shown to the right (check that you get the same answers).
How do we merge these results to form our final expression for L? We AND each of the term functions (T4
and T5 ) with the appropriate minterm for the high bits of C, then OR the results together, as shown here:
C6 C5 C4 T4 + C6 C5 C4 T5
C6 C5 C4 (C3 + C2 + C1 + C0 ) + C6 C5 C4 (C3 + C2 C1 + C2 C0 )
Rather than trying to optimize by hand, we can at this point let the CAD tools take over, confident that we
have the right function to identify an upper-case ASCII letter.
Breaking the truth table into pieces and using simple logic to reconnect the pieces is one way to make use of
abstraction when solving complex logic problems. In fact, recruiters for some companies often ask questions
that involve using specific logic elements as building blocks to implement other functions. Knowing that you
can implement a truth table one piece at a time will help you to solve this type of problem.
Lets think about other ways to tackle the problem of calculating L. In Notes Sets 2.3 and 2.4, we developed
adders and comparators. Can we make use of these as building blocks to check whether C represents an
upper-case letter? Yes, of course we can: by comparing C with the ends of the range of upper-case letters,
we can check whether or not C falls in that range.
The idea is illustrated on the left below using two 7-bit comparators constructed as discussed in Notes Set 2.4.
The comparators are the black parts of the drawing, while the blue parts represent our extensions to calculate L. Each comparator is given the value C as one input. The second value to the comparators is either
the letter A (0x41) or the letter Z (0x5A). The meaning of the 2-bit input and result to each comparator is
given in the table on the right below. The inputs on the right of each comparator are set to 0 to ensure that
equality is produced if C matches the second input (B). One output from each comparator is then routed
to a NOR gate to calculate L. Lets consider how this combination works. The left comparator compares C
with the letter A (0x41). If C 0x41, the comparator produces Z0 = 0. In this case, we may have a letter.
On the other hand, if C < 0x41, the comparator produces Z0 = 1, and the NOR gate outputs L = 0, since we
do not have a letter in this case. The right comparator compares C with the letter Z (0x5A). If C 0x5A,
the comparator produces Z1 = 0. In this case, we may have a letter. On the other hand, if C > 0x5A, the
comparator produces Z1 = 1, and the NOR gate outputs L = 0, since we do not have a letter in this case.
Only when 0x41 C 0x51 does L = 1, as desired.
0x41
7
discard
0x5A
Z1
C1
7bit
comparator
Z0
C0
Z1
C1
7bit
comparator
Z0
C0
discard
Z1
0
0
1
1
Z0
0
1
0
1
meaning
A=B
A<B
A>B
not used
30
What if we have only 8-bit adders available for our use,
such as those developed in Notes Set 2.3? Can we still calculate L? Yes. The diagram shown to the right illustrates
the approach, again with black for the adders and blue
for our extensions. Here we are actually using the adders
as subtracters, but calculating the 1s complements of the
constant values by hand. The zero extend box simply
adds a leading 0 to our 7-bit ASCII letter. The left adder
subtracts the letter A from C: if no carry is produced, we
know that C < 0x41 and thus C does not represent an
upper-case letter, and L = 0. Similarly, the right adder
subtracts 0x5B (the letter Z plus one) from C. If a carry
is produced, we know that C 0x5B, and thus C does
not represent an upper-case letter, and L = 0. With the
right combination of carries (1 from the left and 0 from the
right), we obtain L = 1.
0xBE
0xA4
zero extend
8
8bit adder
Cout
C in
S
8bit adder
1
Cout
C in
S
discard
discard
Looking carefully at this solution, however, you might be struck by the fact that we are calculating two sums
and then discarding them. Surely such an approach is inefficient?
We offer two answers. First, given the design shown above, a good CAD tool recognizes that the sum outputs
of the adders are not being used, and does not generate logic to calculate them. The logic for the two carry
bits used to calculate L can then be optimized. Second, the design shown, including the calculation of the
sums, is similar in efficiency to what happens at the rate of about 1015 times per second, 24 hours a day, seven
days a week, inside processors in data centers processing HTML, XML, and other types of human-readable
Internet traffic. Abstraction is a powerful tool.
Later in our class, you will learn how to control logical connections between hardware blocks so that you
can make use of the same hardware for adding, subtracting, checking for upper-case letters, and so forth.
31
Q
0
1
P
1
0
first row
What if Q = 1, though? In this case, the lower gate forces P to 0, and the upper gate in turn forces Q to 1.
Another stable state! The Q = 1 state appears as the second row of the truth table.
We have identified all of the stable states.1 Notice that our cross-coupled inverters can store a bit. Unfortunately, we have no way to specify which value should be stored, nor to change the bits value once the gates
have settled into a stable state. What can we do?
1 Most
logic families also allow unstable states in which the values alternate rapidly between 0 and 1. These metastable
states are beyond the scope of our class, but ensuring that they do not occur in practice is important for real designs.
32
Lets add an input to the upper gate, as shown
The S
to the right. We call the input S.
stands for setas you will see, our new input
allows us to set our stored bit Q to 1. The use
of a complemented name for the input indicates
that the input is active low. In other words,
the input performs its intended task (setting Q
to 1) when its value is 0 (not 1).
S
1
1
0
Q
0
1
1
P
1
0
0
Think about what happens when the new input is not active, S = 1. As you know, ANDing any value with 1
produces the same value, so our new input has no effect when S = 1. The first two rows of the truth table
are simply a copy of our previous table: the circuit can store either bit value when S = 1. What happens
when S = 0? In that case, the upper gates output is forced to 1, and thus the lower gates is forced to 0.
This third possibility is reflected in the last row of the truth table.
Now we have the ability to force bit Q to
have value 1, but if we want Q = 0, we
just have to hope that the circuit happens
to settle into that state when we turn on
the power. What can we do?
As you probably guessed, we add an input
to the other gate, as shown to the right.
the inputs purWe call the new input R:
pose is to reset bit Q to 0, and the input
is active low. We extend the truth table
= 0 and S = 1,
to include a row with R
which forces Q = 0 and P = 1.
R
1
1
1
0
0
R
the complement markings
indicate that the inputs
are active low
S
1
1
0
1
0
Q
0
1
1
0
1
P
1
0
0
1
1
S
latch. One can also build R-S latches (with active
The circuit that we have drawn has a name: an R S
latch (labeled incorrectly). Can you figure out
high set and reset inputs). The textbook also shows an Rhow to build an R-S latch yourself?
S
latch. What happens if we set S = 0 and R
= 0 at the same time?
Lets think a little more about the RNothing bad happens immediately. Looking at the design, both gates produce 1, so Q = 1 and P = 1. The
back to 1 at around the same time, the stored bit may end
bad part happens later: if we raise both S and R
up in either state.2
from ever being 1
We can avoid the problem by adding gates to prevent the two control inputs (S and R)
at the same time. A single inverter might technically suffice, but lets build up the structure shown below,
have no practical effect at the moment. A
noting that the two inverters in sequence connecting D to R
is forced to 0, and the bit is reset.
truth table is shown to the right of the logic diagram. When D = 0, R
S
Q
D
0
1
R
0
1
S
1
0
Q
0
1
P
1
0
P
R
Unfortunately, except for some interesting timing characteristics, the new design has the same functionality
as a piece of wire. And, if you ask a circuit designer, thin wires also have some interesting timing characteristics. What can we do? Rather than having Q always reflect the current value of D, lets add some extra
inputs to the new NAND gates that allow us to control when the value of D is copied to Q, as shown on the
next page.
2 Or,
33
S
Q
P
WE
WE
1
1
0
0
0
0
D
0
1
0
1
0
1
The W E (write enable) input controls whether or not Q mirrors the value of D.
The first two rows in the truth table are replicated from our wire design: a value
of W E = 1 has no effect on the first two NAND gates, and Q = D. A value of W E = 0
= 1, S = 1, and the bit Q can
forces the first two NAND gates to output 1, thus R
occupy either of the two possible states, regardless of the value of D, as reflected in
the lower four lines of the truth table.
R
0
1
1
1
1
1
S
1
0
1
1
1
1
Q
0
1
0
0
1
1
P
1
0
1
1
0
0
WE
The circuit just shown is called a gated D latch, and is an important mechanism
for storing state in sequential logic. (Random-access memory uses a slightly different technique to connect
the cross-coupled inverters, but latches are used for nearly every other application of stored state.) The D
stands for data, meaning that the bit stored is matches the value of the input. Other types of latches
(including S-R latches) have been used historically, but D latches are used predominantly today, so we omit
discussion of other types. The gated qualifier refers to the presence of an enable input (we called it W E)
to control when the latch copies its input into the stored bit. A symbol for a gated D latch appears to the
since P = Q
in a gated D latch.
right. Note that we have dropped the name P in favor of Q,
34
35
Consider the circuit shown below, for which the output is given by the equation S = AB + B
A
B
A
B
C
S
B goes high
B goes low
a glitch in S
The timing diagram on the right shows a glitch in the output when the input shifts from ABC = 110 to 100,
that is, when B falls. The problem lies in the possibility that the upper AND gate, driven by B, might go
goes high. In such a case, the OR gate output S falls until the
low before the lower AND gate, driven by B,
second AND gate rises, and the output exhibits a glitch.
A circuit that might exhibit a glitch in an output that functionally remains stable at 1 is said to have a
static-1 hazard. The qualifier static here refers to the fact that we expect the output to remain static,
while the 1 refers to the expected value of the output.
The presence of hazards in circuits can be problematic in certain cases. In domino logic, for example, an
output is precharged and kept at 1 until the output of a driving circuit pulls it to 0, at which point it stays
low (like a domino that has been knocked over). If the driving circuit contains static-1 hazards, the output
may fall in response to a glitch.
Similarly, hazards can lead to unreliable behavior in sequential feedback circuits. Consider the addition of
a feedback loop to the circuit just discussed, as shown in the figure below. The output of the circuit is now
CS,
where S denotes the state after S feeds back through the lower
given by the equation S = AB + B
AND gate. In the case discussed previously, the transition from ABC = 110 to 100, the glitch in S can
break the feedback, leaving S low or unstable. The resulting sequential feedback circuit is thus unreliable.
A
B
C
A
B
C
S
unknown/unstable
Eliminating static hazards from two-level circuits is fairly straightforward. The Karnaugh map to the right corresponds to our original circuit; the solid lines indicate the
implicants selected by the AND gates. A static-1 hazard is present when two adjacent 1s
in the K-map are not covered by a common implicant. Static-0 hazards do not occur in
two-level SOP circuits.
AB
00 01 11 10
0
1
1 0 1 1
0 0 1 0
Eliminating static hazards requires merely extending the circuit with consensus terms in order to ensure that
some AND gate remains high through every transition between input states with output 1.3 In the K-map
shown, the dashed line indicates the necessary consensus term, AC.
3 Hazard
36
Dynamic Hazards*
Consider an input transition for which we expect to see a change in an output. Under certain timing
conditions, the output may not transition smoothly, but instead bounce between its original value and its
new value before coming to rest at the new value. A circuit that might exhibit such behavior is said to
contain a dynamic hazard. The qualifier dynamic refers to the expected change in the output.
Dynamic hazards appear only in more complex circuits, such as the one shown below. The output of this
+ AC + B
C + BD.
circuit is defined by the equation Q = AB
A
B
f
j
g
Q
h
Consider the transition from the input state ABCD = 1111 to 1011, in
which B falls from 1 to 0. For simplicity, assume that each gate has a
T f g h i j Q
delay of 1 time unit. If B goes low at time T = 0, the table shows the
0 0 0 0 1 1 1
progression over time of logic levels at several intermediate points in the
1 1 1 1 1 1 1
circuit and at the output Q. Each gate merely produces the appropriate
2 1 1 1 0 0 0
output based on its inputs in the previous time step. After one delay,
3 1 1 1 0 1 1
the three gates with B as a direct input change their outputs (to stable,
4 1 1 1 0 1 0
final values). After another delay, at T = 2, the other three gates respond to the initial changes and flip their outputs. The resulting changes induce another set of changes at
T = 3, which in turn causes the output Q to change a final time at T = 4.
The output column in the table illustrates the possible impact of a dynamic hazard: rather than a smooth
transition from 1 to 0, the output drops to 0, rises back to 1, and finally falls to 0 again. The dynamic hazard
in this case can be attributed to the presence of a static hazard in the logic that produces intermediate value j.
37
Essential Hazards*
Essential hazards are inherent to the function of a circuit and may appear in any implementation. In
sequential feedback circuit design, they must be addressed at a low level to ensure that variations in logic
path lengths (timing skew) through a circuit do not expose them. With clocked synchronous circuits,
essential hazards are abstracted into a single form: clock skew, or disparate clock edge arrival times at a
circuits flip-flops.
An example demonstrates the possible effects: consider the construction of a clocked synchronous circuit to
recognize 0-1 sequences on an input IN . Output Q should be held high for one cycle after recognition, that
is, until the next rising clock edge. A description of states and a state diagram for such a circuit appear below.
S1 S0
00
01
10
11
state
A
B
C
unused
1/0
meaning
nothing, 1, or 11 seen last
0 seen last
01 recognized (output high)
0/0
0/0
1/0
0/1
1/1
For three states, we need two (= log2 3) flip-flops. Denote the internal state S1 S0 . The specific internal
state values for each logical state (A, B, and C) simplify the implementation and the example. A state table
and K-maps for the next-state logic appear below. The state table uses one line per state with separate
columns for each input combination, making the table more compact than one with one line per state/input
combination. Each column contains the full next-state information, including output. Using this form of the
state table, the K-maps can be read directly from the table.
S1 S0
00
01
11
10
IN
0
1
01/0 00/0
01/0 10/0
x
x
01/1 00/1
S1+
S1 S0
S0+
S1 S0
IN
0
1
0 0 x 0
0 1 x 0
S1 S0
00 01 11 10
00 01 11 10
IN
0
1
1 1 x 1
0 0 x 0
00 01 11 10
IN
0
1
0 0 x 1
0 0 x 1
Examining the K-maps, we see that the excitation and output equations are S1+ = IN S0 , S0+ = IN , and
Q = S1 . An implementation of the circuit using two D flip-flops appears below. Imagine that mistakes in
routing or process variations have made the clock signals path to flip-flop 1 much longer than its path into
flip-flop 0, as illustrated.
IN
D0 S0
D1 S1
CLK
a long, slow wire
Due to the long delays, we cannot assume that rising clock edges arrive at the flip-flops at the same time.
The result is called clock skew, and can make the circuit behave improperly by exposing essential hazards.
In the logical B to C transition, for example, we begin in state S1 S0 = 01 with IN = 1 and the clock edge
rising. Assume that the edge reaches flip-flop 0 at time T = 0. After a flip-flop delay (T = 1), S0 goes low.
After another AND gate delay (T = 2), input D1 goes low, but the second flip-flop has yet to change state!
Finally, at some later time, the clock edge reaches flip-flop 1. However, the output S1 remains at 0, leaving
the system in state A rather than state C.
Fortunately, in clocked synchronous sequential circuits, all essential hazards are related to clock skew. This
fact implies that we can eliminate a significant amount of complexity from circuit design by doing a good
job of distributing the clock signal. It also implies that, as a designer, you should avoid specious addition of
logic in a clock path, as you may regret such a decision later, as you try to debug the circuit timing.
38
state
low
high
pulse low
pulse high
L
H
PL
PH
clock
clock
clock
clock
state
00
CLK D
01
11
10
PH
PL
PL
PL
PH
PH
PL
PL
PH
PH
Consider the sequential feedback state table for a positive edge-triggered D flip-flop, shown above. In
designing and analyzing such circuits, we assume that only one input bit changes at a time. The state
table consists of one row for each state and one column for each input combination. Within a row, input
combinations that have no effect on the internal state of the circuit (that is, those that do not cause any
change in the state) are said to be stable; these states are circled. Other states are unstable, and the circuit
changes state in response to changes in the inputs.
For example, given an initial state L with low output, low clock, and high input D, the solid arcs trace the
reaction of the circuit to a rising clock edge. From the 01 input combination, we move along the column to
the 11 column, which indicates the new state, PH. Moving down the column to that states row, we see that
the new state is stable for the input combination 11, and we stop. If PH were not stable, we would continue
to move within the column until coming to rest on a stable state.
An essential hazard appears in such a table as a difference between the final state when flipping a bit once
and the final state when flipping a bit thrice in succession. The dashed arcs in the figure illustrate the
concept: after coming to rest in the PH state, we reset the input to 01 and move along the PH row to find
a new state of PL. Moving up the column, we see that the state is stable. We then flip the clock a third
time and move back along the row to 11, which indicates that PH is again the next state. Moving down
the column, we come again to rest in PH, the same state as was reached after one flip. Flipping a bit three
times rather than once evaluates the impact of timing skew in the circuit; if a different state is reached after
two more flips, timing skew could cause unreliable behavior. As you can verify from the table, a D flip-flop
has no essential hazards.
A group of flip-flops, as might appear in a clocked synchronous circuit, can and usually does have essential
hazards, but only dealing with the clock. As you know, the inputs to a clocked synchronous sequential
circuit consist of a clock signal and other inputs (either external of fed back from the flip-flops). Changing
an input other than the clock can change the internal state of a flip-flop (of the master-slave variety), but
flip-flop designs do not capture the number of input changes in a clock cycle beyond one, and changing an
input three times is the same as changing it once. Changing the clock, of course, results in a synchronous
state machine transition.
The detection of essential hazards in a clocked synchronous design based on flip-flops thus reduces to examination of the state machine. If the next state of the machine has any dependence on the current state, an
essential hazard exists, as a second rising clock edge moves the system into a second new state. For a single
D flip-flop, the next state is independent of the current state, and no essential hazards are present.
39
Registers
This set of notes introduces registers, an abstraction used for storage of groups of bits in digital systems.
We introduce some terminology used to describe aspects of register design and illustrate the idea of a shift
register. The registers shown here are important abstractions for digital system design. In the Fall 2012
offering of our course, we will cover this material on the third midterm.
Registers
A register is a storage element composed from one or more
flip-flops operating on a common clock. In addition to the flipflops, most registers include logic to control the bits stored by
the register. For example, the D flip-flops described previously
copy their inputs at the rising edge of each clock cycle, discarding whatever bits they have stored during that cycle. To
enable a flip-flop to retain its value, we might try to hide the
rising edge of the clock from the flip-flop, as shown to the right.
IN
LOAD
CLK
IN
LOAD
CLK
Q
IN3
LOAD
Q
IN
CLK
IN2
IN1
IN0
LOAD
CLK
Q3
Q2
Q1
Q0
40
Shift Registers
Certain types of registers include logic
D
D
D
D
SI
SO
to manipulate data held within the register. A shift register is an important
example of this type. The simplest
CLK
shift register is a series of D flip-flops,
with the output of each attached to
Q3
Q2
Q1
Q0
the input of the next, as shown to the
right. In the circuit shown, a serial input SI accepts a single bit of data per cycle and delivers the bit four
cycles later to a serial output SO. Shift registers serve many purposes in modern systems, from the obvious
uses of providing a fixed delay and performing bit shifts for processor arithmetic to rate matching between
components and reducing the pin count on programmable logic devices such as field programmable gate
arrays (FPGAs), the modern form of the programmable logic array mentioned in the textbook.
An example helps to illustrate the rate matching problem: historical I/O buses used fairly slow clocks, as they
had to drive signals and be arbitrated over relatively long distances. The Peripheral Control Interconnect
(PCI) standard, for example, provided for 33 and 66 MHz bus speeds. To provide adequate data rates, such
buses use many wires in parallel, either 32 or 64 in the case of PCI. In contrast, a Gigabit Ethernet (local
area network) signal travelling over a fiber is clocked at 1.25 GHz, but sends only one bit per cycle. Several
layers of shift registers sit between the fiber and the I/O bus to mediate between the slow, highly parallel
signals that travel over the I/O bus and the fast, serial signals that travel over the fiber. The latest variant
of PCI, PCIe (e for express), uses serial lines at much higher clock rates.
Returning to the figure above, imagine that the outputs Qi feed into logic clocked at 1/4th the rate of the
shift register (and suitably synchronized). Every four cycles, the flip-flops fill up with another four bits, at
which point the outputs are read in parallel. The shift register shown can thus serve to transform serial
data to 4-bit-parallel data at one-quarter the clock speed. Unlike the registers discussed earlier, the shift
register above does not support parallel load, which prevents it from transforming a slow, parallel stream
of data into a high-speed serial stream. The use of serial load requires N cycles for an N-bit register, but
can reduce the number of wires needed
SHIFT
to support the operation of the shift regSI
SO
ister. How would you add support for
parallel load? How many additional inputs would be necessary?
The shift register shown above is also incapable of storing a value rather than
continuously shifting. The addition of
the same structure that we used to control register loading can be applied to
control shifting, as shown to the right.
CLK
Q3
Q2
Q1
Q0
SI
C1
C0
41
IN3
IN2
IN1
IN0
Q i+1 INi Q i1
C1
Qi
C0
Q i+1 INi Q i1
C1
Qi
C0
Q i+1 INi Q i1
C1
Qi
C0
Q i+1 INi Q i1
C1
Qi
C0
bidirectional
shift register bit
bidirectional
shift register bit
bidirectional
shift register bit
bidirectional
shift register bit
CLK
SO
Q3
Q2
Q1
Q0
At each rising clock edge, the action specified by C1 C0 is taken. When C1 C0 = 00, the register holds its
current value, with the register value appearing on Q[3 : 0] and each flip-flop feeding its output back into
its input. For C1 C0 = 01, the shift register shifts left: the serial input, SI, is fed into flip-flop 0, and Q3 is
passed to the serial output, SO. Similarly, when C1 C0 = 11, the shift register shifts right: SI is fed into
flip-flop 3, and Q0 is passed to SO. Finally, the case C1 C0 = 10 causes all flip-flops to accept new values
from IN [3 : 0], effecting a parallel load.
Several specialized shift operations are used to support data manipulation in modern processors (CPUs).
Essentially, these specializations dictate the form of the glue logic for a shift register as well as the serial
input value. The simplest is a logical shift, for which SI and SO are hardwired to 0; incoming bits are
always 0. A cyclic shift takes SO and feeds it back into SI, forming a circle of register bits through which
the data bits cycle.
Finally, an arithmetic shift treats the shift register contents as a number in 2s complement form. For
non-negative numbers and left shifts, an arithmetic shift is the same as a logical shift. When a negative
number is arithmetically shifted to the right, however, the sign bit is retained, resulting in a function similar
to division by two. The difference lies in the rounding direction. Division by two rounds towards zero in most
processors: 5/2 gives 2. Arithmetic shift right rounds away from zero for negative numbers (and towards
zero for positive numbers): 5 >> 1 gives 3. We transform our previous shift register into one capable of
arithmetic shifts by eliminating the serial input and feeding the most significant bit, which represents the
sign in 2s complement form, back into itself for right shifts, as shown below.
IN3
IN2
IN1
IN0
Q i+1 INi Q i1
C1
Qi
C0
Q i+1 INi Q i1
C1
Qi
C0
Q i+1 INi Q i1
C1
Qi
C0
Q i+1 INi Q i1
C1
Qi
C0
bidirectional
shift register bit
bidirectional
shift register bit
bidirectional
shift register bit
bidirectional
shift register bit
C1
C0
CLK
SO
Q3
Q2
Q1
Q0
42
43
You should recognize all of these terms and be able to explain what they mean. For the specific circuits,
you should be able to draw them and explain how they work. Actually, we dont care whether you can draw
something from memorya full adder, for exampleprovided that you know what a full adder does and
can derive a gate diagram correctly for one in a few minutes. Higher-level skills are much more valuable.
Boolean functions and logic gates
- NOT/inverter
- AND
- OR
- XOR
- NAND
- NOR
- XNOR
- majority function
specific logic circuits
- full adder
- ripple carry adder
- N-to-M multiplexer (mux)
- N-to-2N decoder
S
latch
- R- R-S latch
- gated D latch
- master-slave implementation of a
positive edge-triggered D flip-flop
- (bidirectional) shift register
- register supporting parallel load
design metrics
- metric
- optimal
- heuristic
- constraints
- power, area/cost, performance
- computer-aided design (CAD) tools
- gate delay
general math concepts
- canonical form
- N -dimensional hypercube
tools for solving logic problems
- truth table
- Karnaugh map (K-map)
- implicant
- prime implicant
- bit-slicing
- timing diagram
device technology
- complementary metal-oxide
semiconductor (CMOS)
- field effect transistor (FET)
- transistor gate, source, drain
Boolean logic terms
- literal
- algebraic properties
- dual form, principle of duality
- sum, product
- minterm, maxterm
- sum-of-products (SOP)
- product-of-sums (POS)
- canonical sum/SOP form
- canonical product/POS form
- logical equivalence
digital systems terms
- word size
- N -bit Gray code
- combinational/combinatorial logic
- two-level logic
- dont care outputs (xs)
- sequential logic
- state
- active low inputs
- set a bit (to 1)
- reset a bit (to 0)
- master-slave implementation
- positive edge-triggered
- clock signal
- square wave
- rising/positive clock edge
- falling/negative clock edge
- clock gating
- clocked synchronous sequential circuits
- parallel/serial load of registers
- logical/arithmetic/cyclic shift
44
We can transform this bit-sliced design to a serial design with a single copy of the bit slice logic, M + Q
flip-flops, and M gates (and sometimes an inverter). The strategy is illustrated on the right below. A
single copy of the bit slice operates on one set of P external input bits and produces one set of Q external
output bits each clock cycle. In the design shown, these output bits are available during the next cycle, after
they have been stored in the flip-flops. The M bits to be passed to the next bit slice are also stored in
flip-flops, and in the next cycle are provided back to the same physical bit slice as inputs. The first cycle of a
multi-cycle operation must be handled slightly differently, so we add selection logic and an control signal, F .
For the first cycle, we apply F = 1, and the initial values are passed into the bit slice. For all other bits,
we apply F = 0, and the values stored in the flip-flops are returned to the bit slices inputs. After all bits
have passed through the bit sliceafter N cycles for an N -bit designthe final M bits are stored in the
flip-flops, and the results are calculated by the output logic.
a serialized bitsliced design
perslice inputs
F
F
Bi
P
CLK
initial
values
M
M
select
logic
bit
slice
M
flip
flops
Q
flip
flops
output
logic
results
perslice outputs
The selection logic merits explanation. Given that the original design initialized the bits to constant values
(0s or 1s), we need only simple logic for selection. The two drawings on the left above illustrate how Bi ,
the complemented flip-flop output for a bit i, can be combined with the first-cycle signal F to produce an
appropriate input for the bit slice. Selection thus requires one extra gate for each of the M inputs, and we
need an inverter for F if any of the initial values is 1.
state
LOCKED
DRIVER
UNLOCKED
ALARM
drivers door
locked
unlocked
unlocked
locked
other doors
locked
locked
unlocked
locked
alarm on
no
no
no
yes
Another tool used with FSMs is the next-state table (sometimes called a state transition table, or just
a state table), which maps the current state and input combination into the next state of the FSM. The
abstract variant shown below outlines desired behavior at a high level, and is often ambiguous, incomplete,
and even inconsistent. For example, what happens if a user pushes two buttons? What happens if they
push unlock while the alarm is sounding? These questions should eventually be considered. However, we
can already start to see the intended use of the design: starting from a locked car, a user can push unlock
once to gain entry to the drivers seat, or push unlock twice to open the car fully for passengers. To lock
the car, a user can push the lock button at any time. And, if a user needs help, pressing the panic
button sets off an alarm.
state
LOCKED
DRIVER
(any)
(any)
action/input
push unlock
push unlock
push lock
push panic
next state
DRIVER
UNLOCKED
LOCKED
ALARM
push "unlock"
push
"lock"
LOCKED
DRIVER
push "lock"
push
"panic"
push
"panic"
push
"unlock"
push
"lock"
Implementing an FSM using digital logic requires that we translate the design into bits,
eliminate any ambiguity, and complete the
specification. How many internal bits should
we use? What are the possible input values, and how are their meanings represented
in bits? What are the possible output values, and how are their meanings represented
in bits? We will consider these questions for
several examples in the coming weeks.
push
"lock"
ALARM
push "panic"
UNLOCKED
push
"panic"
For now, we simply define answers for our example design, the keyless entry system. Given four states, we
need at least log2 (4) = 2 bits of internal state, which we store in two flip-flops and call S1 S0 . The table
below lists input and output signals and defines their meaning.
outputs
inputs
D
R
A
U
L
P
We can now choose a representation for our states and rewrite the list of states, using bits both for the states
and for the outputs. We also include the meaning of each state for clarity in our example. Note that we can
choose the internal representation in any way. Here we have matched the D and R outputs when possible to
simplify the output logic needed for the implementation. The order of states in the list is not particularly
important, but should be chosen for convenience and clarity (including transcribing bits into to K-maps, for
example).
meaning
vehicle locked
driver door unlocked
all doors unlocked
alarm sounding
state
LOCKED
DRIVER
UNLOCKED
ALARM
S1 S0
00
10
11
01
drivers door
D
0
1
1
0
other doors
R
0
0
1
0
alarm on
A
0
0
0
1
We can also rewrite the next-state table in terms of bits. We use Gray code order on both axes, as these
orders make it more convenient to use K-maps. The values represented in this table are the next FSM
state given the current state S1 S0 and the inputs U , L, and P . Our symbols for the next-state bits are S1+
and S0+ . The + superscript is a common way of expressing the next value in a discrete series, here induced
by the use of clocked synchronous logic in implementing the FSM. In other words, S1+ is the value of S1 in
the next clock cycle, and S1+ in an FSM implemented as a digital system is a Boolean expression based on
the current state and the inputs. For our example problem, we want to be able to write down expressions
for S1+ (S1 , S0 , U, L, P ) and S1+ (S1 , S0 , U, L, P ), as well as expressions for the output logic U (S1 , S0 ), L(S1 , S0 ),
and P (S1 , S0 ).
current state
S1 S0
00
01
11
10
In the process of writing out the
next-state table, we have made decisions for all of the questions that
we asked earlier regarding the abstract state table. These decisions
are also reflected in the complete
state transition diagram shown to
the right. The states have been
extended with state bits and output bits, as S1 S0 /DRA. You
should recognize that we can also
leave some questions unanswered
by placing xs (dont cares) into
our table. However, you should
also understand at this point that
any implementation will produce
bits, not xs, so we must be careful not to allow arbitrary choices
unless any of the choices allowed
is indeed acceptable for our FSMs
purpose. We will discuss this process and the considerations necessary as we cover more FSM design
examples.
000
00
01
11
10
001
01
01
01
01
011
01
01
01
01
U LP
010 110
00
00
00
00
00
00
00
00
111
01
01
01
01
101
01
01
01
01
100
10
01
11
11
ULP=000
ULP=000,010,
or 110
LOCKED
00/000
ULP=100
ULP=010 or 110
ULP=001,011,
101, or 111
DRIVER
10/100
ULP=100
ULP=001,011,
101, or 111
ULP=
010 or
110
ULP=010
or 110
ALARM
01/001
ULP=001,011,101, or 111
UNLOCKED
ULP=000
or 100
11/110
ULP=000,001,011,
111,101, or 100
We have deliberately omitted calculation of expressions for the next-state variables S1+ and S0+ , and for the
outputs U , L, and P . We expect that you are able to do so from the detailed state table above, and may
assign such an exercise as part of your homework.
Synchronous Counters
A counter is a clocked sequential circuit with a state diagram consisting of a single logical cycle. Not all
counters are synchronous. In other words, not all flip-flops in a counter are required to use the same clock
signal. A counter in which all flip-flops do utilize the same clock signal is called a synchronous counter.
Except for a brief introduction to other types of counters in the next section, our class focuses entirely on
clocked synchronous designs, including counters.
000
111
S1
0
0
1
1
0
0
1
1
S0
0
1
0
1
0
1
0
1
S2+
0
0
0
1
1
1
1
0
S1+
0
1
1
0
0
1
1
0
S0+
1
0
1
0
1
0
1
0
S2
S1
S1S0
0
S2
1
00
01
11
10
S0
00
01
11
10
S2+ = S2 S1 S0 + S2 S1 + S2 S0
S+ =
S1 S0 + S1 S0
1
S0+
S0
011
100
S1S0
00
01
11
10
S2
S2
010
101
S1S0
0
3bit
binary
counter
cycle
110
The cycle of states shown to the right corresponds to the states of a 3-bit binary
counter. The numbers in the states represent both internal state bits S2 S1 S0
and output bits Z2 Z1 Z0 . We transcribe this diagram into the next-state table
shown on the left below. We then write out K-maps for the next state bits S2+ ,
S1+ , and S0+ , as shown to the right, and use the K-maps to find expressions for
these variables in terms of the current state.
S2
0
0
0
0
1
1
1
1
001
= S2 (S1 S0 )
= S1 S0
= S0 1
The first form of the expression for each next-state variable is taken directly from the corresponding K-map.
We have rewritten each expression to make the emerging pattern more obvious. We can also derive the pattern intuitively by asking the following: given a binary counter in state SN 1 SN 2 . . . Sj+1 Sj Sj1 . . . S1 S0 ,
when does Sj change in the subsequent state? The answer, of course, is that Sj changes when all of the bits
below Sj are 1. Otherwise, Sj remains the same in the next state. We thus write Sj+ = Sj (Sj1 . . . S1 S0 )
and implement the counter as shown below for a 4-bit design. Note that the usual order of output bits along
the bottom is reversed in the figure, with the most significant bit at the right rather than the left.
a 4bit synchronous binary counter with serial gating
S0
S1
Q
S2
Q
S3
Q
CLK
Z0
Z1
Z2
Z3
The calculation of the left inputs to the XOR gates in the counter shown above is performed with a series of
two-input AND gates. Each of these gates ANDs another flip-flop value into the product. This approach,
called serial gating, implies that an N -bit counter requires more than N 2 gate delays to settle into the
next state. An alternative approach, called parallel gating, calculates each input independently with a
single logic gate, as shown below. The blue inputs to the AND gate for S3 highlight the difference from the
previous figure (note that the two approaches differ only for bits S3 and above). With parallel gating, the
fan-in of the gates (the number of inputs) and the fan-out of the flip-flop outputs (number of other gates
into which an output feeds) grow with the size of the counter. In practice, large counters use a combination
of these two approaches.
a 4bit synchronous binary counter with parallel gating
S0
S1
Q
S2
Q
S3
Q
CLK
Z0
Z1
Z2
Z3
Ripple Counters
A second class of counter drives some of its flip-flops with a clock signal and feeds flip-flop outputs into
the clock inputs of its remaining flip-flops, possibly through additional logic. Such a counter is called a
ripple counter, because the effect of a clock edge ripples through the flip-flops. The delay inherent to the
ripple effect, along with the complexity of ensuring that timing issues do not render the design unreliable,
are the major drawbacks of ripple counters. Compared with synchronous counters, however, ripple counters
consume less energy, and are sometimes used for devices with restricted energy supplies.
General ripple counters can be tricky because of timing issues, but certain types
are easy. Consider the design of binary ripple counter. The state diagram for
a 3-bit binary counter is replicated to the right. Looking at the states, notice
that the least-significant bit alternates with each state, while higher bits flip
whenever the next smaller bit (to the right) transitions from one to zero. To
take advantage of these properties, we use positive edge-triggered D flip-flops
outputs wired back to their inputs. The clock
with their complemented (Q)
input is fed only into the first flip-flop, and the complemented output of each
flip-flop is also connected to the clock of the next.
000
111
110
001
3bit
binary
counter
cycle
101
010
011
100
1 Recall
that flip-flops record the clock state internally. The logical activity required to record such state consumes energy.
Beginning with the state 0000, at the rising clock edge, the left (S0 ) flip-flop toggles to 1. The second (S1 )
flip-flop sees this change as a falling clock edge and does nothing, leaving the counter in state 0001. When
the next rising clock edge arrives, the left flip-flop toggles back to 0, which the second flip-flop sees as a rising
clock edge, causing it to toggle to 1. The third (S2 ) flip-flop sees the second flip-flops change as a falling
edge and does nothing, and the state settles as 0010. We leave verification of the remainder of the cycle as
an exercise.
Timing Issues*
Ripple counters are a form of a more general strategy known as clock gating.2 Clock gating uses logic to
control the visibility of a clock signal to flip-flops (or latches). Historically, digital system designers rarely
used clock gating techniques because of the complexity introduced for the circuit designers, who must ensure
that clock edges are delivered with little skew along a dynamically changing set of paths to flip-flops. Today,
however, the power benefits of hiding the clock signal from flip-flops have made clock gating an attractive
strategy. Nevertheless, digital logic designers and computer architects still almost never use clock gating
strategies directly. In most of the industry, CAD tools insert logic for clock gating automatically. A handful
of companies (such as Intel and Apple/Samsung) design custom circuits rather than relying on CAD tools
to synthesize hardware designs from standard libraries of elements. In these companies, clock gating is used
widely by the circuit design teams, and some input is occasionally necessary from the higher-level designers.
More aggressive gating strategies are also used in modern designs, but these usually require more time to
transition between the on and off states and can be more difficult to get right automatically (with the tools),
hence hardware designers may need to provide high-level information about their designs. A flip-flop that
does not see any change in its clock input still has connections to high voltage and ground, and thus allows
a small amount of leakage current. In contrast, with power gating, the voltage difference is removed,
and the circuit uses no power at all. Power gating can be trickyas you know, for example, when you turn
the power on, you need to make sure that each latch settles into a stable state. Latches may need to be
initialized to guarantee that they settle, which requires time after the power is restored.
If you want a deeper understanding of gating issues, take ECE482, Digital Integrated Circuit Design, or
ECE527, System-on-a-Chip Design.
Machine Models
Before we dive fully into FSM design, we must point out that we have placed a somewhat artificial restriction
on the types of FSMs that we use in our course. Historically, this restriction was given a name, and machines
of the type that we have discussed are called Moore machines. However, outside of introductory classes,
almost no one cares about this name, nor about the name for the more general model used almost universally
in hardware design, Mealy machines.
What is the difference? In a Moore machine, outputs depend only on the internal state bits of the FSM
(the values stored in the flip-flops). In a Mealy machine, outputs may be expressed as functions both
of internal state and FSM inputs. As we illustrate shortly, the benefit of using input signals to calculate
outputs (the Mealy machine model) is that input bits effectively serve as additional system state, which
means that the number of internal state bits can be reduced. The disadvantage of including input signals in
the expressions for output signals is that timing characteristics of input signals may not be known, whereas
an FSM designer may want to guarantee certain timing characteristics for output signals.
In practice, when such timing guarantees are needed, the designer simply adds state to the FSM to accommodate the need, and the problem is solved. The coin-counting FSM that we designed for our class lab
assignments, for example, required that we use a Moore machine model to avoid sending the servo controlling
the coins path an output pulse that was too short to enforce the FSMs decision about which way to send
the coin. By adding more states to the FSM, we were able to hold the servo in place, as desired.
2 Fall
2012 students: This part may seem a little redundant, but were going to remove the earlier mention of clock gating in
future semesters.
10
Why are we protecting you from the model used in practice? First, timing issues add complexity to a topic
that is complex enough for an introductory course. And, second, most software FSMs are Moore machines,
so the abstraction is a useful one in that context, too.
In many design contexts, the timing issues implied by a Mealy model can be relatively simple to manage.
When working in a single clock domain, all of the input signals come from flip-flops in the same domain, and
are thus stable for most of the clock cycle. Only rarely does one need to keep additional state to improve
timing characteristics in these contexts. In contrast, when interacting across clock domains, more care is
sometimes needed to ensure correct behavior.
We now illustrate the state reduction benefit of the Mealy machine model with a simple example, an FSM
that recognizes the pattern of a 0 followed by a 1 on a single input and outputs a 1 when it observes the
pattern. As already mentioned, Mealy machines often require fewer flip-flops. Intuitively, the number of
combinations of states and inputs is greater than the number of combinations of states alone, and allowing
a function to depend on inputs reduces the number of internal states needed.
A Mealy implementation of the FSM appears on the left below, and an example timing diagram illustrating
the FSMs behavior is shown on the right. The machine shown below occupies state A when the last bit
seen was a 0, and state B when the last bit seen was a 1. Notice that the transition arcs in the state
diagram are labeled with two values instead of one. Since outputs can depend on input values as well as
state, transitions in a Mealy machine are labeled with input/output combinations, while states are labeled
only with their internal bits (or just their names, as shown below). Labeling states with outputs does not
make sense for a Mealy machine, since outputs may vary with inputs. Notice that the outputs indicated on
any given transition hold only until that transition is taken (at the rising clock edge), as is apparent in the
timing diagram. When inputs are asynchronous, that is, not driven by the same clock signal, output pulses
from a Mealy machine can be arbitrarily short, which can lead to problems.
1/1
0/0
CLK
IN
OUT
1/0
0/0
For a Moore machine, we must create a special state in which the output is high. Doing so requires that we
split state B into two states, a state C in which the last two bits seen were 01, and a state D in which the
last two bits seen were 11. Only state C generates output 1. State D also becomes the starting state for the
new state machine. The state diagram on the left below illustrates the changes, using the transition diagram
style that we introduced earlier to represent Moore machines. Notice in the associated timing diagram that
the output pulse lasts a full clock cycle.
1
A/0
C/1
0
1
0
D/0
CLK
IN
OUT
OUT rises with CLK
11
In Step 1, we translate our description in human language into a model with states and desired behavior.
At this stage, we simply try to capture the intent of the description and are not particularly thorough nor
exact.
Step 2 begins to formalize the model, starting with its input and output behavior. If we eventually plan
to develop an implementation of our FSM as a digital system (which is not the only choice, of course!), all
input and output must consist of bits. Often, input and/or output specifications may need to match other
digital systems to which we plan to connect our FSM. In fact, most problems in developing large digital
systems today arise because of incompatibilities when composing two or more separately designed pieces (or
modules) into an integrated system.
Once we know the I/O behavior for our FSM, in Step 3 we start to make any implicit assumptions clear
and to make any other decisions necessary to the design. Occasionally, we may choose to leave something
undecided in the hope of simplifying the design with dont care entries in the logic formulation.
In Step 4, we select an internal representation for the bits necessary to encode the state of our FSM. In
practice, for small designs, this representation can be selected by a computer in such a way as to optimize the
implementation. However, for large designs, such as the LC-3 instruction set architecture that we study later
in this class, humans do most of the work by hand. In the later examples in this set of notes, we show how
even a small design can leverage meaningful information from the design when selecting the representation,
leading to an implementation that is simpler and is easier to build correctly. We also show how one can use
abstraction to simplify an implementation.
By Step 5, our design is a complete specification in terms of bits, and we need merely derive logic expressions
for the next-state variables and the output signals. This process is no different than for combinational logic,
and should already be fairly familiar to you.
Finally, in Step 6, we translate our logic expressions into gates, inserting flip-flops (or registers) to hold the
internal state bits of the FSM. In later notes, we will use more complex building blocks when implementing
an FSM, building up abstractions in order to simplify the design process in much the same way that we have
shown for combinational logic.
[Figure: design of a two-bit Gray code counter: transition diagram (COUNT A 00/00, COUNT B 01/01, COUNT C 11/11, COUNT D 10/10), K-maps for S1+ and S0+, and an implementation with two flip-flops driving outputs Z1 and Z0.]

[Figure: transition diagram for a three-bit Gray code counter with states COUNT A through COUNT H in the sequence 000, 001, 011, 010, 110, 111, 101, 100 (each labeled with its output, e.g., COUNT A 000/000), together with K-maps for S2+, S1+, and S0+.]
S2+ = S2 S0 + S1 S0'
S1+ = S2' S0 + S1 S0'
S0+ = S2' S1' + S2 S1

(A prime denotes a complemented variable.)
Notice that the equations for S2+ and S1+ share a common term, S1 S0'. This design does not allow much choice in developing good equations for the next-state logic, but some designs may enable you to reduce the design complexity by explicitly identifying and making use of common algebraic terms and sub-expressions for different outputs. In modern design processes, identifying such opportunities is generally performed by a computer program, but it's important to understand how they arise. Note that the common term becomes a single AND gate in the implementation of our counter, as shown to the right.
[Figure: implementation of the three-bit Gray code counter: three flip-flops holding S2, S1, and S0 drive outputs Z2, Z1, and Z0, and the shared term S1 S0' is produced by a single AND gate feeding both the S2+ and S1+ logic.]
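As a quick check on these equations, the C sketch below (an illustration only, not part of the notes; the function name is arbitrary) applies them repeatedly, starting from COUNT A. The shared product term appears once in the code, just as it becomes a single shared AND gate in hardware.

#include <stdio.h>

/* Next state of the three-bit Gray code counter, computed bit by bit
   from the equations above. */
static unsigned next_state(unsigned s)
{
    unsigned s2 = (s >> 2) & 1, s1 = (s >> 1) & 1, s0 = s & 1;
    unsigned shared = s1 & ~s0 & 1;               /* S1 S0' (shared term)     */
    unsigned s2n = (s2 & s0) | shared;            /* S2+ = S2 S0 + S1 S0'     */
    unsigned s1n = (~s2 & s0 & 1) | shared;       /* S1+ = S2' S0 + S1 S0'    */
    unsigned s0n = (~s2 & ~s1 & 1) | (s2 & s1);   /* S0+ = S2' S1' + S2 S1    */
    return (s2n << 2) | (s1n << 1) | s0n;
}

int main(void)
{
    unsigned s = 0;                               /* COUNT A = 000            */
    for (int i = 0; i < 9; i++) {
        printf("%u%u%u\n", (s >> 2) & 1, (s >> 1) & 1, s & 1);
        s = next_state(s);                        /* prints 000 001 011 010
                                                     110 111 101 100 000      */
    }
    return 0;
}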
color     RGB
black     000
blue      001
green     010
cyan      011
red       100
violet    101
yellow    110
white     111
You immediately recognize that you merely need a counter with five states.
How many flip-flops will we need? At least three, since ⌈log2(5)⌉ = 3. Given that we'll need three flip-flops, and that the colors we'll need to produce as outputs are all unique bit patterns, we can again choose to use the counter's internal state directly as our output values.
A fully-specified transition diagram for our
color sequencer appears to the right. The
states again form a loop, and are marked
with the internal state value S2 S1 S0 and
the output RGB.
As before, we can use the transition diagram to fill in K-maps for the next-state values S2+, S1+, and S0+, as shown to the right. For each of the three states not included in our transition diagram (CYAN, RED, and WHITE), we can fill the corresponding K-map entries with don't cares (x's).
[Figure: fully-specified transition diagram for the color sequencer, with states BLACK 000/000, BLUE 001/001, GREEN 010/010, VIOLET 101/101, and YELLOW 110/110 forming a loop, together with K-maps for S2+, S1+, and S0+.]
The resulting next-state expressions are

S2+ = S2 S1 + S1' S0'
S1+ = S2 S0 + S1' S0'
S0+ = S1

[Figure: implementation of the color sequencer: the next-state logic above drives three flip-flops holding S2, S1, and S0, clocked by CLOCK.]
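It is easy to check where the unused patterns go under these expressions. The short C sketch below (an illustration only; the names are this sketch's own) evaluates the next-state equations for all eight patterns.

#include <stdio.h>

/* Apply the chosen next-state expressions for the color sequencer to
   every possible pattern S2 S1 S0, including the three unused states. */
int main(void)
{
    const char *name[8] = { "BLACK", "BLUE", "GREEN", "CYAN",
                            "RED", "VIOLET", "YELLOW", "WHITE" };
    for (unsigned s = 0; s < 8; s++) {
        unsigned s2 = (s >> 2) & 1, s1 = (s >> 1) & 1, s0 = s & 1;
        unsigned s2n = (s2 & s1) | (~s1 & ~s0 & 1);   /* S2+ = S2 S1 + S1' S0' */
        unsigned s1n = (s2 & s0) | (~s1 & ~s0 & 1);   /* S1+ = S2 S0 + S1' S0' */
        unsigned s0n = s1;                            /* S0+ = S1              */
        unsigned n = (s2n << 2) | (s1n << 1) | s0n;
        printf("%-6s (%u%u%u) -> %-6s (%u%u%u)\n", name[s], s2, s1, s0,
               name[n], s2n, s1n, s0n);
    }
    return 0;   /* WHITE (111) maps back to 111 */
}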
[Figure: the transition behavior of all eight states under the chosen implementation: the five sequencer states plus CYAN 011/011, RED 100/100, and WHITE 111/111.]
Notice that the FSM does not move out of the WHITE state (ever). You may at this point wonder whether
more careful decisions in selecting our next-state expressions might address this issue. To some extent, yes.
For example, if we replace the S2 S1 term in the equation for S2+ with S2 S0', a decision allowed by the don't-care boxes in the K-map for our design, the resulting transition diagram does not suffer from the problem that we've found. However, even if we do change our implementation slightly, we need to address another aspect of the problem: how can the FSM ever get into the unexpected states?
What is the initial state of the three flip-flops in our implementation? The initial state may not even be 0s
and 1s unless we have an explicit mechanism for initialization. Initialization can work in two ways. The
first approach makes use of the flip-flop design. As you know, a flip-flop is built from a pair of latches, and
we can make use of the internal reset lines on these latches to force each flip-flop into the 0 state (or the
1 state) using an additional input.
Alternatively, we can add some extra logic to our design. Consider adding a few AND gates and a RESET
input (active low), as shown in the dashed box in the figure below. In this case, when we assert RESET
by setting it to 0, the FSM moves to state 000 in the next cycle, putting it into the BLACK state. The
approach taken here is for clarity; one can optimize the design, if desired. For example, we could simply
connect RESET as an extra input into the three AND gates on the left rather than adding new ones, with
the same effect.
[Figure: the color sequencer with the added RESET logic (dashed box): an AND gate between each next-state signal and the corresponding flip-flop input forces the next state to 000 when RESET = 0.]
The Multiplexer
We may sometimes want a more powerful initialization mechanism, one that allows us to force the FSM into any specific state in the next cycle. In such a case, we can add the logic block shown in the dashed boxes in the figure at the top of the next page to each of our flip-flop inputs. The block has two inputs from the left and one from the top. The top input allows us to choose which of the left inputs is forwarded to the output. In our design, the top input comes from INIT. When INIT = 0, the top AND gate in each of the three blocks outputs a 0, and the bottom AND gate forwards the corresponding next-state input from our design. The OR gate thus also forwards the next-state input, and the system moves into the next state for our FSM whenever INIT = 0.
What happens when INIT = 1? In this case, the bottom AND gate in each of the blocks in the dashed boxes produces a 0, and the top AND gate as well as the OR gate forwards one of the Ix signals. The state of our FSM in the next cycle is then given by I2 I1 I0. In other words, we can put the FSM into any desired state by applying that state to the I2 I1 I0 inputs, setting INIT = 1, and waiting for the next cycle.
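Each block is just a 2-to-1 multiplexer. A minimal C model of one block appears below (illustrative only; the function name is this sketch's own).

#include <stdio.h>

/* One initialization block: a 2-to-1 multiplexer.  When init is 1, the
   flip-flop input comes from the applied bit i; when init is 0, it comes
   from the normal next-state logic.  All arguments are single bits. */
static unsigned mux_bit(unsigned init, unsigned i, unsigned next)
{
    return (init & i) | (~init & next & 1);   /* top AND, bottom AND, then OR */
}

int main(void)
{
    printf("%u %u\n", mux_bit(0, 1, 0), mux_bit(1, 1, 0));   /* prints 0 1 */
    return 0;
}

With one such block per flip-flop and INIT = 1, the state applied on I2 I1 I0 becomes the FSM state at the next rising clock edge.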
[Figure: the color sequencer extended with an initialization block on each flip-flop input; the inputs I2 I1 I0 and INIT select between the applied state and the normal next-state logic.]

[Figure: a multiplexer with data inputs D3, D2, D1, and D0, select inputs S1 and S0, and output Q.]
Specific configurations of multiplexers are often referred to as N-to-M multiplexers. Here the value N refers to the number of inputs, and M refers to the number of outputs. The number of select bits can then be calculated as log2(N/M) (N/M is generally a power of two), and one way to build such a multiplexer is to use M copies of an (N/M)-to-1 multiplexer. For example, an 8-to-2 multiplexer has log2(4) = 2 select bits and can be built from two 4-to-1 multiplexers.
[Figure: abstract transition diagram for the counter: the counting states COUNT A through COUNT D form a loop; a "press halt" arc takes each counting state to the corresponding halted state HALT A through HALT D, and a "press go" arc returns to the counting loop.]
[Figure: the same transition diagram with outputs: COUNT A /00, COUNT B /01, COUNT C /11, COUNT D /10 and HALT A /00, HALT B /01, HALT C /11, HALT D /10; the counting arcs are labeled H', and the remaining arcs are labeled in terms of H and G.]

In this figure, the states are marked with output values Z1 Z0, and transition arcs are labeled in terms of our two input buttons, G and H. The uninterrupted counting cycle is labeled with H' to indicate that it continues until we press H.
state      description
COUNT A    first counting state: counting, output Z1 Z0 = 00
HALT A     first halted state: halted, output Z1 Z0 = 00

            HG = 00     HG = 01        HG = 11        HG = 10
COUNT A     COUNT B     unspecified    unspecified    HALT A
HALT A      HALT A      COUNT B        unspecified    unspecified
Let's start with the COUNT A state. We know that if neither button is pressed (HG = 00), we want the
counter to move to the COUNT B state. And, if we press the halt button (HG = 10), we want the counter
to move to the HALT A state. What should happen if a user presses the go button (HG = 01)? Or if
the user presses both buttons (HG = 11)? Answering these questions is part of fully specifying our design.
We can choose to leave some parts unspecified, but any implementation of our system will imply answers,
and thus we must be careful. We choose to ignore the go button while counting, and to have the halt
button override the go button. Thus, if HG = 01 when the counter is in state COUNT A, the counter
moves to state COUNT B. And, if HG = 11, the counter moves to state HALT A.
Use of explicit bit patterns for the inputs HG may help you to check that all four possible input values are
covered from each state. If you choose to use a transition diagram instead of a state table, you might even
want to add four arcs from each state, each labeled with a specific value of HG. When two arcs connect the
same two states, we can either use multiple labels or can indicate bits that do not matter using a don't-care
symbol, x. For example, the arc from state COUNT A to state COUNT B could be labeled HG = 00, 01 or
HG = 0x. The arc from state COUNT A to state HALT A could be labeled HG = 10, 11 or HG = 1x. We
can also use logical expressions as labels, but such notation can obscure unspecified transitions.
Now consider the state HALT A. The transitions specified so far are that when we press go (HG = 01), the
counter moves to the COUNT B state, and that the counter remains halted in state HALT A if no buttons
are pressed (HG = 00). What if the halt button is pressed (HG = 10), or both buttons are pressed
(HG = 11)? For consistency, we decide that halt overrides go, but does nothing special if it alone is
pressed while the counter is halted. Thus, input patterns HG = 10 and HG = 11 also take state HALT A
back to itself. Here the arc could be labeled HG = 00, 10, 11 or, equivalently, HG = 00, 1x or HG = x0, 11.
[Figure: the fully-specified transition diagram: each COUNT state moves to the next COUNT state on HG=0x and to its corresponding HALT state on HG=1x; each HALT state moves to the next COUNT state on HG=01 and back to itself on HG=x0,11.]
[Figure: the transition diagram annotated with the chosen encoding: COUNT A 000/00, COUNT B 001/01, COUNT C 011/11, COUNT D 010/10, HALT A 100/00, HALT B 101/01, HALT C 111/11, HALT D 110/10; counting arcs are labeled H', halt arcs H, resume arcs H'G, and each HALT state's self-loop H + G'.]
The equivalent state listing and state table appear below. We have ordered the rows of the state table in
Gray code order to simplify transcription of K-maps.
state      S2 S1 S0   description
COUNT A    000        counting, output Z1 Z0 = 00
COUNT B    001        counting, output Z1 Z0 = 01
COUNT C    011        counting, output Z1 Z0 = 11
COUNT D    010        counting, output Z1 Z0 = 10
HALT A     100        halted, output Z1 Z0 = 00
HALT B     101        halted, output Z1 Z0 = 01
HALT C     111        halted, output Z1 Z0 = 11
HALT D     110        halted, output Z1 Z0 = 10

state      S2 S1 S0   HG = 00   HG = 01   HG = 11   HG = 10
COUNT A    000        001       001       100       100
COUNT B    001        011       011       101       101
COUNT C    011        010       010       111       111
COUNT D    010        000       000       110       110
HALT D     110        110       000       110       110
HALT C     111        111       010       111       111
HALT B     101        101       011       101       101
HALT A     100        100       001       100       100
Having chosen a representation, we can go ahead and implement our design in the usual way. As shown to the right, K-maps for the next-state logic are complicated, since we have five variables and must consider implicants that are not contiguous in the K-maps. The S2+ logic is easy enough: we only need two terms, as shown.
[Figure: K-maps for S2+, S1+, and S0+ in the five variables S2, S1, S0, H, and G.]

To simplify the problem, let's define a signal HOLD that indicates when the counter should stop counting: the counter should hold if it is currently counting (S2 = 0) and the halt button is pressed, or if it is already halted (S2 = 1) and either halt is pressed or go is not pressed. Written algebraically,

HOLD = S2' H + S2 (H + G')
     = S2' H + S2 H + S2 G'
     = H + S2 G'
In other words, the counter should hold its current value (stop counting) if we press the halt button or if the counter was already halted and we didn't press the go button. As desired, the current value of the counter (S1 S0) has no impact on this decision. You may have noticed that the expression we derived for HOLD also matches S2+, the next-state value of S2 in the K-map above.
Now let's rewrite our state transition table in terms of HOLD. The first version below uses state names for clarity; the second uses state values to help us transcribe K-maps.
state      S2 S1 S0   HOLD = 0   HOLD = 1
COUNT A    000        COUNT B    HALT A
COUNT B    001        COUNT C    HALT B
COUNT C    011        COUNT D    HALT C
COUNT D    010        COUNT A    HALT D
HALT A     100        COUNT B    HALT A
HALT B     101        COUNT C    HALT B
HALT C     111        COUNT D    HALT C
HALT D     110        COUNT A    HALT D

state      S2 S1 S0   HOLD = 0   HOLD = 1
COUNT A    000        001        100
COUNT B    001        011        101
COUNT C    011        010        111
COUNT D    010        000        110
HALT A     100        001        100
HALT B     101        011        101
HALT C     111        010        111
HALT D     110        000        110
S1+ = HOLD' S0 + HOLD S1
S0+ = HOLD' S1' + HOLD S0
[Figure: K-maps for S2+, S1+, and S0+ in the variables HOLD, S2, S1, and S0.]
An implementation appears below. By using semantic meaning in our choice of representation, in particular the use of S2 to record whether the counter is currently halted (S2 = 1) or counting (S2 = 0), we have enabled ourselves to separate out the logic for deciding whether to advance the counter fairly cleanly from the logic for advancing the counter itself. Only the HOLD bit in the diagram is used to determine whether or not the counter should advance in the current cycle.
Let's check that the implementation matches our original design. Start by verifying that the HOLD variable is calculated correctly, HOLD = H + S2 G', then look back at the K-map for S2+ in the low-level design to verify that the expression we used does indeed match. Next, verify that S1+ and S0+ are correctly implemented.
[Figure: implementation of the counter: the halt (H) and go (G) buttons feed the HOLD logic; the S2 flip-flop records whether or not the counter is currently halted, and the S1 and S0 flip-flops drive the outputs Z1 and Z0.]
Finally, we check our abstraction. When HOLD = 1, the next-state logic for S1+ and S0+ reduces to S1+ = S1 and S0+ = S0; in other words, the counter stops counting and simply stays in its current state. When HOLD = 0, these equations become S1+ = S0 and S0+ = S1', which produces the repeating sequence for S1 S0 of 00, 01, 11, 10, as desired. You may want to look back at our two-bit Gray code counter design to compare the next-state equations.
We can now verify that the implementation produces the correct transition behavior. In the counting states, S2 = 0, and the HOLD value simplifies to HOLD = H. Until we push the halt button, S2 remains 0, and the counter continues to count in the correct sequence. When H = 1, HOLD = 1, and the counter stops at its current value (S2+ S1+ S0+ = 1 S1 S0, which is shorthand for S2+ = 1, S1+ = S1, and S0+ = S0). In any of the halted states, S2 = 1, and we can reduce HOLD to HOLD = H + G'. Here, so long as we press the halt button or do not press the go button, the counter stays in its current state, because HOLD = 1. If we release halt and press go, we have HOLD = 0, and the counter resumes counting (S2+ S1+ S0+ = 0 S0 S1', which is shorthand for S2+ = 0, S1+ = S0, and S0+ = S1'). We have now verified the implementation.
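These checks can also be scripted. The C sketch below (an illustration of the equations above, not lab code; names and inputs are this sketch's own) steps the counter through counting, halting, and resuming.

#include <stdio.h>

/* One clock cycle of the halt/go counter.  h and g are the button
   inputs; the state is packed as S2 S1 S0. */
static unsigned step(unsigned s, unsigned h, unsigned g)
{
    unsigned s2 = (s >> 2) & 1, s1 = (s >> 1) & 1, s0 = s & 1;
    unsigned hold = h | (s2 & ~g & 1);                /* HOLD = H + S2 G'        */
    unsigned s2n  = hold;                             /* S2+ matches HOLD        */
    unsigned s1n  = (~hold & s0 & 1) | (hold & s1);   /* HOLD' S0 + HOLD S1      */
    unsigned s0n  = (~hold & ~s1 & 1) | (hold & s0);  /* HOLD' S1' + HOLD S0     */
    return (s2n << 2) | (s1n << 1) | s0n;
}

int main(void)
{
    /* count for three cycles, press halt, wait, then press go */
    unsigned h[] = { 0, 0, 0, 1, 0, 0, 0 }, g[] = { 0, 0, 0, 0, 0, 1, 0 };
    unsigned s = 0;                                   /* start in COUNT A (000)  */
    for (int i = 0; i < 7; i++) {
        printf("S=%u%u%u  Z=%u%u\n", (s >> 2) & 1, (s >> 1) & 1, s & 1,
               (s >> 1) & 1, s & 1);                  /* Z1 Z0 = S1 S0           */
        s = step(s, h[i], g[i]);
    }
    return 0;
}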
[Table and figure: an alternative state assignment for the counting and halted states, together with the corresponding implementation.]
This set of notes explains the process that Prof. Jones used to develop the FSM for the lab. The lab simulates a vending machine mechanism for automatically identifying coins (dimes and quarters only), tracking
the amount of money entered by the user, accepting or rejecting coins, and emitting a signal when a total
of 35 cents has been accepted. In the lab, we will only drive a light with the "paid in full" signal. Sorry, neither candy nor Dew will be distributed!
The signal A in the timing diagram is an output from the FSM, and indicates whether or not the coin should
be accepted. This signal controls the servo that drives the gate, and thus determines whether the coin is
accepted (A = 1) as payment or rejected (A = 0) and returned to the user.
Looking at the timing diagram, you should note that our FSM makes a decision based on its current state
and the input T and enters a new state at the rising clock edge. The value of A in the next cycle thus
determines the position of the gate when the coin eventually rolls to the end of the slope. As we said earlier,
our FSM is thus a Moore machine: the output A does not depend on the input T , but only on the current
internal state bits of the FSM. However, you should also now realize that making A depend on T is not adequate for this lab. If A were to rise with T and fall with the rising clock edge (on entry to the next state), or even fall with the falling edge of T, the gate would return to the reject position by the time the coin reached the gate, regardless of our FSM's decision!
1 The full system actually allows four sensors to differentiate four types of coins, but our lab uses only two of these sensors.
An Abstract Model
We start by writing down states for a user's expected behavior. Given the fairly tight constraints that we have placed on our lab, few combinations are possible. For a total of 35 cents, a user should either insert a dime followed by a quarter, or a quarter followed by a dime. We begin in a START state, which transitions to states DIME or QUARTER when the user inserts the first coin. With no previous coin, we need not specify a value for A. No money has been deposited, so we set output P = 0 in the START state. We next create DIME and QUARTER states corresponding to the user having entered one coin. The first coin should be accepted, but more money is needed, so both of these states output A = 1 and P = 0. When a coin of the opposite type is entered, each state moves to a state called PAID, which we use for the case in which a total of 35 cents has been received. For now, we ignore the possibility that the same type of coin is deposited more than once. Finally, the PAID state accepts the second coin (A = 1) and indicates that the user has paid the full price of 35 cents (P = 1). The resulting table appears below.

state      dime (T = 0)   quarter (T = 1)   accept? (A)   paid? (P)
START      DIME           QUARTER           -             no
DIME       -              PAID              yes           no
QUARTER    PAID           -                 yes           no
PAID       -              -                 yes           yes
We next extend our design to handle user mistakes. If a user enters a second dime in the DIME state, our FSM should reject the coin. We create a REJECTD state and add it as the next state from DIME when a dime is entered. The REJECTD state rejects the dime (A = 0) and continues to wait for a quarter (P = 0). What should we use as next states from REJECTD? If the user enters a third dime (or a fourth, or a fifth, and so on), we want to reject the new dime as well. If the user enters a quarter, we want to accept the coin, at which point we have received 35 cents (counting the first dime). We use this reasoning to complete the description of REJECTD. We also create an analogous state, REJECTQ, to handle a user who inserts more than one quarter. The extended table appears below.

state      dime (T = 0)   quarter (T = 1)   accept? (A)   paid? (P)
START      DIME           QUARTER           -             no
DIME       REJECTD        PAID              yes           no
REJECTD    REJECTD        PAID              no            no
QUARTER    PAID           REJECTQ           yes           no
REJECTQ    PAID           REJECTQ           no            no
PAID       -              -                 yes           yes
What should happen after a user has paid 35 cents and bought one item? The FSM at that point is in the
PAID state, which delivers the item by setting P = 1. Given that we want the FSM to allow the user to
purchase another item, how should we choose the next states from PAID? The behavior that we want from
PAID is identical to the behavior that we defined from START. The 35 cents already deposited was used
to pay for the item delivered, so the machine is no longer holding any of the user's money. We can thus
simply set the next states from PAID to be DIME when a dime is inserted and QUARTER when a quarter
is inserted.
At this point, we make a decision intended primarily to simplify the logic needed to build the lab. Without a physical item delivery mechanism with a specification for how its input must be driven, the behavior of the output signal P can be fairly flexible. For example, we could build a delivery mechanism that used the rising edge of P to open a chute. In this case, the output P = 0 in the start state is not relevant, and we can merge the state START with the state PAID. The way that we handle P in the lab, we might find it strange to have a "paid" light turn on before inserting any money, but keeping the design simple enough for a first lab exercise is more important. Our final abstract state table appears below.

state      dime (T = 0)   quarter (T = 1)   accept? (A)   paid? (P)
PAID       DIME           QUARTER           yes           yes
DIME       REJECTD        PAID              yes           no
REJECTD    REJECTD        PAID              no            no
QUARTER    PAID           REJECTQ           yes           no
REJECTQ    PAID           REJECTQ           no            no
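Before settling on a bit-level representation, the abstract table can be prototyped directly in software. The C sketch below (illustrative only, not part of the lab; the enum and function names are this sketch's own) encodes the final table above as a switch statement.

#include <stdio.h>

enum state { PAID, DIME, REJECTD, QUARTER, REJECTQ };

/* Next state for one coin: t = 0 means a dime, t = 1 means a quarter. */
static enum state next_state(enum state s, int t)
{
    switch (s) {
    case PAID:    return t ? QUARTER : DIME;
    case DIME:    return t ? PAID    : REJECTD;
    case REJECTD: return t ? PAID    : REJECTD;
    case QUARTER: return t ? REJECTQ : PAID;
    case REJECTQ: return t ? REJECTQ : PAID;
    }
    return PAID;
}

/* Outputs of each state, as in the final abstract table. */
static void outputs(enum state s, int *a, int *p)
{
    *a = (s != REJECTD && s != REJECTQ);   /* accept the coin?    */
    *p = (s == PAID);                      /* 35 cents received?  */
}

int main(void)
{
    int coins[] = { 0, 0, 1, 1, 1, 0 };    /* dime, dime, quarter, ... */
    enum state s = PAID;                   /* PAID doubles as the start state */
    for (int i = 0; i < 6; i++) {
        int a, p;
        s = next_state(s, coins[i]);
        outputs(s, &a, &p);
        printf("coin %d: A=%d P=%d\n", coins[i], a, p);
    }
    return 0;
}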
These meanings are not easy to apply to all of our states. For example, in the PAID state, the last coin inserted may have been of either type, or of no type at all, since we decided to start our FSM in that state as well. However, for the other four states, the meanings provide a clear and unique set of bit pattern assignments, as shown to the right. We can choose any of the remaining four bit patterns (010, 011, 101, or 111) for the PAID state. In fact, we can choose all of the remaining patterns for the PAID state: we can always represent a state with more than one pattern if we have spare patterns available. Prof. Jones used this flexibility to simplify the logic design.
state      S2 S1 S0
PAID       ???
DIME       000
REJECTD    001
QUARTER    100
REJECTQ    110
This particular example is slightly tricky. The four free patterns do not share any single bit in common, so we cannot simply insert x's into all K-map entries for which the next state is PAID. For example, if we insert an x into the K-map for S2+, and then choose a function for S2+ that produces a value of 1 in place of the don't care, we must also produce a 1 in the corresponding entry of the K-map for S0+. Our options for PAID include 101 and 111, but not 100 nor 110. These latter two states have other meanings.
Let's begin by writing a next-state table consisting mostly of bits, as shown below. We use this table to write out a K-map for S2+ as follows: any of the patterns that may be used for the PAID state obey the next-state rules for PAID, and any next state marked as PAID is marked as a don't care in the K-map, since we can choose patterns starting with either or both values of S2 to represent our PAID state.

state      S2 S1 S0   T = 0   T = 1
PAID       ???        000     100
DIME       000        001     PAID
REJECTD    001        001     PAID
QUARTER    100        PAID    110
REJECTQ    110        PAID    110

The resulting K-map for S2+ appears below (columns are S2 S1; rows are S0 T).

          00   01   11   10
   00     0    0    x    x
   01     x    1    1    1
   11     x    1    1    1
   10     0    0    0    0

As shown, we simply set S2+ = T, which matches our original meaning for S2. That is, S2 is the type of the last coin inserted.
Based on our choice for S2+, we can rewrite the K-map as shown to the right, with green italics and shading marking the values produced for the x's in the specification. Each of these boxes corresponds to one transition into the PAID state. By specifying the S2 value, we cut the number of possible choices from four to two in each case. For those combinations in which the implementation produces S2+ = 0, we must choose S1+ = 1, but are still free to leave S0+ marked as a don't care. Similarly, for those combinations in which the implementation produces S2+ = 1, we must choose S0+ = 1, but are still free to leave S1+ marked as a don't care.
[Figure: the K-map for S2+ rewritten with the values produced by the choice S2+ = T filled in for the x entries.]
[Figure: K-maps for S1+ and S0+, with the chosen replacements for the don't-care entries highlighted.]

The K-maps for S1+ and S0+ are shown above. We have not given algebraic expressions for either, but have indicated our choices by highlighting the resulting replacements of don't-care entries with the values produced by our expressions. At this point, we can review the state patterns actually produced by each of the four next-state transitions into the PAID state. From the DIME state, we move into the 101 state when the user inserts a quarter. The result is the same from the REJECTD state. From the QUARTER state, however, we move into the 010 state when the user inserts a dime. The result is the same from the REJECTQ state. We must thus classify both patterns, 101 and 010, as PAID states. The remaining two patterns, 011 and 111, cannot be reached from any of the other states; they appear in the table below as EXTRA1 and EXTRA2, with unspecified outputs.
state      S2 S1 S0   A   P
PAID1      010        1   1
PAID2      101        1   1
DIME       000        1   0
REJECTD    001        0   0
QUARTER    100        1   0
REJECTQ    110        0   0
EXTRA1     011        x   x
EXTRA2     111        x   x

[Figure: the final transition diagram, with states DIME 000/10, REJECTD 001/00, PAID1 010/11, PAID2 101/11, QTR 100/10, REJECTQ 110/00, EXTRA1 011/xx, and EXTRA2 111/xx, and arcs labeled T=0 and T=1.]
[Figure: transition diagram for the lock control FSM: states LOCKED 00/000, DRIVER 10/100, UNLOCKED 11/110, and ALARM 01/001, with arcs labeled by ULP input patterns (for example, ULP=xx1 enters ALARM and ULP=x10 returns to LOCKED); a timeout will be added to the ALARM state.]
We expand the ALARM state into T separate states based on the value
of the counter. As shown to the right, we name the states ALARM(1)
through ALARM(T). All of these alarm states use S1 S0 = 01, but they can
be differentiated using a timer (the counter value).
We need to make design decisions about how the arcs entering and leaving the ALARM state in our original design should be used once we have
incorporated the timeout. As a first step, we decide that all arcs entering
ALARM from other states now enter ALARM(1). Similarly, if the user
presses the panic button P in any of the ALARM(t) states, the system
returns to ALARM(1). Effectively, pressing the panic button resets the
timer.
[Figure: the ALARM state expanded into states ALARM(1) (timer = T-1), ALARM(2) (timer = T-2), ..., ALARM(T) (timer = 0), all using S1 S0 = 01; countdown arcs on ULP=x00 connect successive alarm states, and outgoing arcs to LOCKED (on ULP=x10) are replicated for every alarm state.]

The only arc leaving the ALARM state goes to the LOCKED state on ULP = x10. We replicate this arc for all ALARM(t) states: the user can push the lock button at any time to silence the alarm. Finally, the self-loop back to the ALARM state on ULP = x00 becomes the countdown arcs in our expanded states, taking ALARM(t) to ALARM(t+1), and ALARM(T) to LOCKED.
Now that we have a complete specification for the extended design, we can implement it. We want to reuse our original design as much as possible, but we have three new features that must be considered. First, when we enter the ALARM(1) state, we need to set the counter value to T-1. Second, we need the counter value to count downward while in the ALARM state. Finally, we need to move back to the LOCKED state when a timeout occurs, that is, when the counter reaches zero.
The first problem is fairly easy. Our counter supports parallel load, and the only value that we need to load is T-1, so we apply the constant bit pattern for T-1 to the load inputs and raise the LD input whenever we enter the ALARM(1) state. In our original design, we chose to enter the ALARM state whenever the user pressed P, regardless of the other buttons. Hence we can connect P directly to our counter's LD input.
The second problem is handled by the counter's countdown functionality. In the ALARM(t) states, the counter will count down each cycle, moving the system from ALARM(t) to ALARM(t+1).
The last problem is slightly trickier, since we need to change S1 S0. Notice that S1 S0 = 01 for the ALARM state and S1 S0 = 00 for the LOCKED state. Thus, we need only force S0 to 0 when a timeout occurs. We can use a single 2-to-1 multiplexer for this purpose. The 0 input of the mux comes from the original S0+ logic, and the 1 input is a constant 0. All other state logic remains unchanged. When does a timeout occur? First, we must be in the ALARM(T) state, so S1 S0 = 01 and the counter's Z output is raised. Second, the input combination must be ULP = xx0 (notice that both ULP = x00 and ULP = x10 return to LOCKED from ALARM(T)). A single, four-input AND gate thus suffices to obtain the timeout signal, S1' S0 Z P', which we connect to the select input of the mux between the S0+ logic and the S0 flip-flop. The extension thus requires only a counter, a mux, and a gate, as shown below.
[Figure: the extended lock implementation: the original S1+ and S0+ logic, a down counter loaded with T-1 (LD driven by P), and a 2-to-1 mux that forces the S0 flip-flop input to 0 when the timeout signal S1' S0 Z P' is raised.]
Memory
A computer memory is a group of storage elements and the logic necessary to move data in and out of the
elements. The size of the elements in a memory, called the addressability of the memory, varies from a
single binary digit, or bit, to a byte (8 bits) or more. Typically, we refer to data elements larger than a
byte as words, but the size of a word depends on context.
Each element in a memory is assigned a unique name, called an address, that allows an external circuit
to identify the particular element of interest. These addresses are not unlike the street addresses that you
use when you send a letter. Unlike street addresses, however, memory addresses usually have little or no
redundancy; each possible combination of bits in an address identifies a distinct set of bits in the memory. The
figure on the right below illustrates the concept. Each house represents a storage element and is associated
with a unique address.
[Figure: on the left, a generic 2^k x N memory with an N-bit DATA_IN, a k-bit ADDR input, control inputs R/W and CS, and an N-bit DATA_OUT; on the right, a street of houses with addresses 000 through 111, each representing one storage element.]
The memories that we consider in this class have several properties in common. These memories support
two operations: write places a word of data into an element, and read retrieves a copy of a word of data
from an element. The memories are also volatile, which means that the data held by a memory are erased
when electrical power is turned off or fails. Non-volatile forms of memory include magnetic and optical
storage media such as DVDs, CD-ROMs, disks, and tapes, as well as some programmable logic devices,
such as ROMs. Finally, the memories considered in this class are random access memories (RAMs),
which means that the time required to access an element in the memory is independent of the element being
accessed. In contrast, serial memories such as magnetic tape require much less time to access data near
the current location in the tape than data far away from the current location.
The figure on the left above shows a generic RAM structure. The memory contains 2k elements of N bits
each. A k-bit address input, ADDR, identifies the memory element of interest for any particular operation.
The read/write input, R/W , selects the operation to be performed: if R/W is high, the operation is a read;
if it is low, the operation is a write. Data to be written into an element are provided through N inputs at
the top, and data read from an element appear on N outputs at the bottom. Finally, a chip select input,
CS, functions as an enable control for the memory; when CS is low, the memory neither reads nor writes
any location.
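A small behavioral model can help keep this interface straight. The C sketch below is only an illustration (the sizes K and N and the function name are arbitrary choices of this sketch), with each call modeling one complete operation.

#include <stdint.h>

#define K 4                       /* address bits                    */
#define N 8                       /* bits per element                */

static uint8_t cells[1 << K];     /* 2^k elements of N bits each     */

/* One access to the 2^k x N memory.  When cs is 0 the memory does
   nothing.  When r_w is 1 the element at addr is read and returned;
   when r_w is 0, data_in is written into that element. */
static int access(unsigned addr, unsigned r_w, unsigned cs, uint8_t data_in)
{
    if (!cs)
        return -1;                /* chip not selected               */
    addr &= (1u << K) - 1;        /* keep k address bits             */
    if (r_w)
        return cells[addr];       /* read                            */
    cells[addr] = data_in;        /* write                           */
    return -1;
}

int main(void)
{
    access(3, 0, 1, 0x5A);                        /* write 0x5A to address 3 */
    return access(3, 1, 1, 0) == 0x5A ? 0 : 1;    /* read it back            */
}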
Random access memory further divides into two important types: static RAM, or SRAM, and dynamic
RAM, or DRAM. SRAM employs active logic in the form of a two-inverter loop to maintain stored values.
DRAM uses a charged capacitor to store a bit; the charge drains over time and must be replaced, giving rise
to the qualifier dynamic. Static thus serves only to differentiate memories with active logic elements
from those with capacitive elements. Both types are volatile, that is, both lose all data when the power
supply is removed. We study both SRAM and DRAM in some detail in this course.
[Figure: an SRAM cell. On the left, the physical implementation: an inverter loop connected to the BIT and BIT' lines through transistors controlled by SELECT. On the right, the logical implementation: an S-R latch gated by SELECT.]
Two diagrams of an SRAM cell (a single bit) appear above. On the left is the physical implementation: a
dual-inverter loop hooked to opposing BIT lines through transistors controlled by a SELECT line. On the
right is a logical implementation2 modeled after that given by Mano & Kime.
The physical cell works as follows. When SELECT is high, the transistors connect the inverter loop to
the bit lines. When writing a cell, the lines are held at opposite logic values, forcing the inverters to match
the values on the lines and storing the value from the BIT input. When reading a cell, the bit lines are
disconnected from other logic, allowing the inverter loop to drive the lines to their current values. The value
stored previously is thus copied onto the BIT line as an output, and the opposite value is placed on the BIT' line. When SELECT is low, the transistors effectively disconnect the inverters from the bit lines, and
the cell holds its current value until SELECT goes high again.
The logical cell retains the SELECT line, replaces the inverter loop with an S-R latch, and splits the bit lines into bit write lines (B and B') and bit read lines (C and C'). When SELECT is low, all AND gates output 0, and the isolated cell holds its value. When SELECT is high, the cell can be written by raising the B or B' input signals, which set or reset the latch, as appropriate. Similarly, the latched value appears on C and C'. Recall that the markers represent connections to many-input gates, which appear in the read logic described below.
[Figure: a bit slice of 16 SRAM cells (cell 0 through cell 15) sharing the bit lines B, B', C, and C'; a 4-to-16 decoder driven by the 4-bit ADDR input raises one cell's SELECT line, and shared read logic and write logic sit at the end of the bit lines.]
A number of cells are combined into a bit slice, as shown above. The cells share bit lines and read/write
logic, which appears to the right in the figure. Based on the ADDR input, a decoder sets one cell's SELECT
line high to enable a read or write operation to the cell. Details of the read and write logic are shown below,
to the left and right, respectively.
1 Chips combining both DRAM and processor logic are available, and are used by some processor manufacturers (such as IBM). Research is underway to couple such logic types more efficiently by building 3D stacks of chips.
2 Logical implementation here implies that the functional behavior of the circuit is equivalent to that of the real circuit. The real circuit is that shown on the left of the figure.
[Figure: the read logic (left), which latches the value from the bit read lines and drives DATA_OUT when CS is high, and the write logic (right), which drives the bit write lines from DATA_IN when CS is high and R/W is low.]
The read logic requires only the bit read lines and the chip select signal as inputs. The read lines function
logically as many-input OR gates, and the results of these lines are used to set or reset the S-R latch in the
read logic. When CS is high, the value held in the latch is then placed on the DATA_OUT line.
The write logic requires two enable inputs, CS and R/W , as well as a data input. When CS is high and a
write operation is requested, that is, when R/W is low, the output of the AND gate in the lower left of the
diagram goes high, enabling the AND gates to the right of the diagram to place the data input on the bit
write lines.
[Figure: four bit slices of 16 cells each (cells 0 through 63) sharing one set of read and write logic; a 4-to-16 decoder driven by ADDR(3:0) selects a cell within each slice, and a 2-to-4 decoder driven by ADDR(5:4) enables one slice's tri-state connections to the shared bit lines.]
The outputs of the cell selection decoder can be used to control multiple bit slices, as shown above. Selection
between bit slices is then based on other bits from the ADDRESS input. In the figure above, a 2-to-4
decoder enables one of four sets of tri-state buffers that connect the bit read and write lines to the read
and write logic. The many-input OR gates, a fictional construct of the logical representation, have been
replicated for each bit slice. In a real implementation, the transistor-gated connections to the bit lines
eliminate the need for the OR gates, and the extra logic amounts to only a pair of transistors per bit slice.
The approach shown above, in which one or more cells are selected through a two-dimensional indexing
scheme, is known as coincident selection. The qualifier "coincident" arises from the notion that the desired cell coincides with the intersection of the active row and column select lines.
The benefit of coincident selection is easily calculated in terms of the number of gates required for the
decoders. Decoder complexity is roughly equal to the number of outputs, as each output is a minterm and
requires a unique gate to calculate it. Fanout trees for input terms and inverted terms add relatively few
gates. Consider a 1M x 8b RAM chip. The number of addresses is 2^20. One option is to use a single bit slice and a 20-to-1048576 decoder, or about 2^20 gates. Alternatively, we can use 8,192 bit slices of 1,024 cells (remember that we must output eight bits). For this implementation, we need two 10-to-1024 decoders, or about 2^11 gates. As chip area is roughly proportional to the number of gates, the savings are substantial. Other schemes are possible as well: if we want a more square chip area, we might choose to use 4,096 bit slices of 2,048 cells along with one 11-to-2048 decoder and one 9-to-512 decoder. This approach requires roughly 25% more decoder gates than our previous example, but is still far superior to the single bit slice implementation.
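The comparison is easy to reproduce. The C sketch below (illustrative only) counts decoder outputs, roughly one gate per output, for the three organizations just described.

#include <stdio.h>

/* Decoder cost is roughly one gate per output, so an a-to-2^a decoder
   costs about (1 << a) gates. */
static unsigned long dec(unsigned a) { return 1ul << a; }

int main(void)
{
    /* 1M x 8b RAM: 2^20 addresses, 8 bits per element. */
    printf("single slice:       %lu gates\n", dec(20));            /* ~1,048,576 */
    printf("8192 slices x 1024: %lu gates\n", dec(10) + dec(10));  /* ~2,048     */
    printf("4096 slices x 2048: %lu gates\n", dec(11) + dec(9));   /* ~2,560     */
    return 0;
}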
Memories are typically unclocked devices. However, as you have seen, the circuits are highly structured, which
enables engineers to cope with the complexity of sequential feedback design. Devices used to control memories
are typically clocked, and the interaction between the two can be fairly complex. Timing diagrams for reads
and writes to SRAM are shown at the top of the next page. A write operation appears on the left. In the first
cycle, the controller raises the chip select signal and places the memory address to be written on the address
inputs.
[Figure: SRAM timing diagrams. Left, a write: CLK, ADDR (ADDR valid), CS, R/W, and DATA_IN (DATA valid) over a write cycle. Right, a read: CLK, ADDR (ADDR valid), CS, R/W, and DATA_OUT (DATA valid) over a read cycle.]
Once the memory has had time to set up the appropriate select lines internally, the R/W input is lowered and
data are placed on the data inputs. The delay, which is specified by the memory manufacturer, is necessary
to avoid writing data to the incorrect element within the memory. In the diagram, the delay is one cycle, but
delay logic can be used to tune the timing to match the memorys specification, if desired. At some point
after new data have been delivered to the memory, the write operation completes within the memory. The
time from the application of the address until the (worst-case) completion of the write operation is called
the write cycle of the memory, and is also specified by the manufacturer. Once the write cycle has passed,
the controlling logic raises R/W , waits for the change to settle within the memory, then removes the address
and lowers the chip select signal. The reason for the delay is the same: to avoid mistakenly overwriting
another memory location.
A read operation is quite similar. As shown on the right, the controlling logic places the address on the
input lines and raises the chip select signal. No races need be considered, as read operations on SRAM do
not affect the stored data. After a delay called the read cycle, the data can be read from the data outputs.
The address can then be removed and the chip select signal lowered.
For both reads and writes, the number of cycles required for an operation depends on a combination of the
clock cycle of the controller and the cycle time of the memory. For example, with a 25 nanosecond write
cycle and a 10 nanosecond clock cycle, a write requires three cycles. In general, the number of cycles required is given by the formula ⌈memory cycle time / clock cycle time⌉ (the ratio rounded up to a whole number of cycles).
Bidirectional Signals
We have on several previous occasions discussed the utility of tri-state buffers in gating outputs and constructing multiplexers. With shift registers, we also considered using tri-state buffers to share the same lines between reading the register and loading it in parallel. In this section, we consider in general the
application of tri-state buffers to reduce pin count and examine the symbols used to denote their presence.
[Figure: three groups of equivalent circuits. First, a generic circuit with N inputs (IN) and N outputs (OUT). Second, the same circuit with its outputs gated by EN-controlled tri-state buffers, drawn either externally or implicitly with the inverted-triangle symbol on the OUT pins. Third, the gated outputs tied back to the inputs, externally or internally, to form bidirectional IN/OUT signals.]
The figure above shows three groups of equivalent circuits. We begin with the generic circuit on the left,
with N inputs and N outputs. The circuit may have additional inputs and outputs beyond those shown, but
it will be convenient to restrict this discussion to an equal number of inputs and outputs. The second group
in the figure extends the first by using an enable input, EN , to gate the circuit outputs with N tri-state
buffers. The left member of the group adds the buffers externally, while the right member (third from the
left in the overall figure) adds them implicitly, as indicated by the inverted triangle symbol near the OUT
pins. This symbol is not meant to point towards the pins, but is rather always drawn in the orientation
shown, regardless of output direction in a figure. The third group further extends the circuit by connecting
its gated outputs to its inputs, either externally or internally (fourth and fifth from the left, respectively).
The resulting connections are called bidirectional signals, as information can flow either into or out of
the circuit. Bidirectional signals are important for memory devices, for which the number of logical inputs
and outputs can be quite large. Data inputs and outputs, for example, are typically combined into a single
set of bidirectional signals. The arrowheads in the figure are not a standard part of the representation, but
are sometimes provided to clarify the flow of information. The labels provide complete I/O information and
allow you to identify bidirectional signals.
With bidirectional signals, and with all outputs gated by tri-state buffers, it is important to ensure that multiple circuits are not simultaneously allowed to drive a set of wires, as attempting to drive wires to different
logic values creates a short circuit from high voltage to ground, which can easily destroy the system.
[Figure: a DRAM cell. On the left, the physical implementation: a capacitor attached to the BIT line through a transistor controlled by SELECT. On the right, the logical implementation: a gated D latch whose output drives the bit read line through a tri-state buffer.]
Two diagrams of a DRAM cell appear above. On the left is the physical implementation: a capacitor attached
to a BIT line through a transistor controlled by a SELECT line. On the right is a logical implementation
modeled after that given by Mano & Kime.
The logical implementation employs a D latch to record the value on the bit write line, B, whenever the
SELECT line is high. The output of the latch is also placed on the bit read line, C, when SELECT is
high. Rather than many-input gates, a tri-state buffer controls the output gating to remind you that DRAM
cells are read only when selected.
As illustrated by the physical cell structure, DRAM storage is capacitive: a bit is stored by charging or not charging a capacitor. When SELECT is low, the capacitor is isolated, and it holds its charge. However, the
resistance across the transistor is finite, and some charge leaks out onto the bit line. Charge also leaks into
the substrate on which the device is constructed. After some amount of time, all of the charge dissipates,
and the bit is lost. To avoid such loss, the cell must be refreshed periodically by reading the contents and
writing them back with active logic.
When the SELECT line is high during a write operation, logic driving the bit line forces charge onto the
capacitor or removes all charge from it. For a read operation, the bit line is first brought to an intermediate
voltage level (a voltage level between 0 and 1), then SELECT is raised, allowing the capacitor to either
pull a small amount of charge from the bit line or to push a small amount of charge onto the bit line. The
resulting change in voltage is then detected by a sense amplifier3 at the end of the bit line. A sense amp
is analogous to a marble on a mountaintop: a small push causes the marble to roll rapidly downhill in the
direction of the push. Similarly, a small change in voltage causes a sense amp's output to move rapidly to
a logical 0 or 1, depending on the direction of the small change. Sense amplifiers also appear in SRAM
implementations. While not technically necessary, as they are with DRAM, the use of a sense amp to react
to small changes in voltage makes reads faster.
Each read operation on a DRAM cell brings the voltage on its capacitor closer to the intermediate voltage
level, in effect destroying the data in the cell. DRAM is thus said to have destructive reads. To preserve
data during a read, the data read must be written back into the cells. For example, the output of the sense
amplifiers can be used to drive the bit lines, rewriting the cells with the appropriate data.
At the chip level, typical DRAM inputs and outputs differ from those of SRAM. Due to the large size and
high density of many DRAMs, addresses are split into row and column components and provided through a
common set of pins. The DRAM stores the components in registers to support this approach. Additional inputs, known as the row and column address strobesRAS and CAS, respectivelyare used to indicate
when address components are available. These control signals are also used to manage the DRAM refresh
process (see Mano & Kime for details). As you might guess from the structure of coincident selection, DRAM
refresh occurs on a row-by-row basis; raising the SELECT line for a row destructively reads the contents of
all cells on that row, forcing the cells to be rewritten and effecting a refresh. The row is thus a natural basis
for the refresh cycle. The DRAM data pins provide bidirectional signals for reading and writing elements of
the DRAM. An output enable input, OE, controls tri-state buffers within the DRAM to determine whether
or not the DRAM drives the data pins. The R/W input, which controls the type of operation, is also present.
[Figure: DRAM timing diagrams. Left, a write: the ROW and then COL components of the address are applied while RAS and then CAS are raised, with R/W lowered and valid DATA supplied. Right, a read: after the row and column addresses are strobed, the DATA pins go from hi-Z to valid and are read before the control signals are released.]
Timing diagrams for DRAM writes and reads appear above. In both cases, the row component of the address
is first applied to the address pins, then RAS is raised.4 In the next cycle of the controlling logic, the column
component is applied to the address pins, and CAS is raised.
For a write, as shown on the left, the R/W signal and the data can also be applied in the second cycle.
The DRAM has internal timing and control logic that prevent races from overwriting an incorrect element
(remember that the row and column addresses have to be stored in registers). The DRAM again specifies
a write cycle, after which the operation is guaranteed to be complete. In order, the R/W signal is then
raised, the CAS signal lowered, and the RAS signal lowered. Other orders of signal removal have different
meanings, such as initiation of a refresh.
3 The implementation of a sense amplifier lies outside the scope of this class, but you should understand the role that sense amplifiers play in memory.
4 In practice, RAS, CAS, and OE are active low signals, and are thus usually written with overbars.
For a read operation, the output enable signal, OE, is lowered after CAS is raised. The DATA pins, which should be floating (in other words, not driven by any logic), are then driven by the DRAM. After the read cycle, valid data appear on the DATA pins, and OE, CAS, and RAS are lowered in order after the data are read.
A typical DRAM implementation provides several approaches to managing refresh, but does not initiate any
refreshes internally. Refresh requirements are specified, but managing the refresh itself is left to a DRAM
controller. The duties of this controller also include mapping addresses into row and column components,
managing timing for signals to and from the DRAM, and providing status indicators on the state of the
DRAM.
As an example of refresh rates and requirements for modern DRAMs, I obtained a few specifications for a
16Mx4b EDO DRAM chip manufactured by Micron Semiconductor. The cells are structured into 4,096 rows,
each of which must be refreshed every 64 milliseconds. Using a certain style of refresh (CAS-before-RAS, or
CBR), the process of refreshing a single row takes roughly 100 nanoseconds. The most common approach
to managing refresh, termed distributed refresh, cycles through rows one at a time over a period of the
required refresh time, in this case 64 milliseconds. Row refreshes occur regularly within this period, or about
every 16 microseconds. The refreshes keep the DRAM busy 0.64% of the time; at other times, it can be
used for reads and writes. Alternatively, we might choose a burst refresh approach, in which we refresh all
rows in a burst. A burst refresh requires roughly 410 microseconds for the DRAM under discussion, as all
4,096 rows must be refreshed, and each row requires about 100 nanoseconds. A delay of 410 microseconds is
a long delay by processor standards, thus burst refresh is rarely used.
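These numbers follow directly from the chip parameters. The short C program below (an illustration only) reproduces the arithmetic.

#include <stdio.h>

int main(void)
{
    double rows = 4096.0;          /* rows in the example DRAM           */
    double period_s = 64e-3;       /* each row refreshed every 64 ms     */
    double row_refresh_s = 100e-9; /* one CBR row refresh takes ~100 ns  */

    double interval = period_s / rows;        /* time between row refreshes  */
    double busy = row_refresh_s / interval;   /* fraction of time refreshing */
    double burst = rows * row_refresh_s;      /* one full burst refresh      */

    printf("distributed: one row every %.1f us, busy %.2f%% of the time\n",
           interval * 1e6, busy * 100.0);     /* ~15.6 us, ~0.64%            */
    printf("burst: %.0f us for all rows\n", burst * 1e6);   /* ~410 us       */
    return 0;
}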
int values[10];  /* the set of 10 values (initialized elsewhere) */
int idx;         /* loop index                                   */
int min;         /* smallest value found so far                  */

min = values[0];
for (idx = 1; 10 > idx; idx = idx + 1) {
    if (min > values[idx]) {
        min = values[idx];
    }
}
/* The minimum value from the array is now in min. */
The code uses array notation, which we have not used previously in our class, so let's first discuss the meaning
of the code.
The code uses three variables. The variable values represents the 10 values in our set. The suffix [10]
after the variable name tells the compiler that we want an array of 10 integers (int) indexed from 0 to 9.
These integers can be treated as 10 separate variables, but can be accessed using the single name values
along with an index (again, from 0 to 9 in this case). The variable idx holds a loop index that we use to
examine each of the values one by one in order to find the minimum value in the set. Finally, the variable
min holds the smallest known value as the program examines each of the values in the set.
The program body consists of two statements. We assume that some other piece of code (one not shown here) has initialized the 10 values in our set before the code above executes. The first statement initializes
the minimum known value (min) to the value stored at index 0 in the array (values[0]). The second
statement is a loop in which the variable idx takes on values from 1 to 9. For each value, an if statement
compares the current known minimum with the value stored in the array at index given by the idx variable.
If the stored value is smaller, the current known value (again, min) is updated to reflect the program's having
found a smaller value. When the loop finishes all nine iterations, the variable min holds the smallest value
among the set of 10 integers stored in the values array.
[Figure: flow chart for the code: START; min = values[0] and idx = 1; test 10 > idx (END when false); test min > values[idx]; when true, min = values[idx]; increment idx and repeat.]
1 We technically only need a 10 x 32-bit memory, but we round up the size of the address space to reflect more realistic memory designs; one can always optimize later.
Now let's go through the flow chart and identify states. Initialization of min and idx need not occur serially,
and the result of the first comparison between idx and the constant 10 is known in advance, so we can merge
all three operations into a single state, which we call INIT.
We can also merge the updates of min and idx into a second FSM state, which we call COPY. However, the
update to min occurs only when the comparison (min > values[idx]) is true. We can use logic to predicate
execution of the update. In other words, we can use the output of the comparator, which is available after
the comparator has finished comparing the two values (in a high-level FSM state that we have yet to define),
to determine whether or not the register holding min loads a new value in the COPY state.
Our model of use for this FSM involves external logic filling the memory (the array of integer values),
executing the FSM code, and then checking the answer. To support this use model, we create an FSM state
called WAIT for cycles in which the FSM has no work to do. Later, we also make use of an external input
signal START to start the FSM execution. The WAIT state logically corresponds to the START bubble in
the flow chart.
[Figure: the flow chart annotated with the high-level FSM states WAIT, INIT, PREP, COMPARE, and COPY.]

Only the test for the if statement remains. Using a serial comparator to compare two 32-bit values requires 32 cycles. However, we need an additional cycle to move values into our shift registers so that the comparator can see the first bit. Thus our single comparison operation breaks into two high-level states. In the first state, which we call PREP, we copy min to one of the shift registers, copy values[idx] to the other shift register, and reset the counter that measures the cycles needed for our serial comparator. We then move to a second high-level state, which we call COMPARE, in which we feed one bit per cycle from each shift register to the serial comparator. The COMPARE state executes for 32 cycles, after which the comparator produces the one-bit answer that we need, and we can move to the COPY state. The association between the flow chart and the high-level FSM states is illustrated in the figure above.
We can now also draw an abstract state diagram for our FSM, as shown below. The FSM begins in the WAIT state. After external logic fills the values array, it signals the FSM to begin by raising the START signal. The FSM transitions into the INIT state, and in the next cycle into the PREP state. From PREP, the FSM always moves to COMPARE, where it remains for 32 cycles while the serial comparator executes a comparison. After COMPARE, the FSM moves to the COPY state, where it remains for one cycle. The transition from COPY depends on how many loop iterations have executed. If more loop iterations remain, the FSM moves to PREP to execute the next iteration. If the loop is done, the FSM returns to WAIT to allow external logic to read the result of the computation.

[Figure: abstract state diagram: WAIT moves to INIT on the START signal; INIT always moves to PREP; PREP always moves to COMPARE; COMPARE moves to COPY after 32 cycles; and COPY returns to PREP when the loop is not finished or to WAIT at the end of the loop.]
[Figure: the datapath for the design: the memory holding VALUES, the MIN register, shift registers A and B feeding the serial comparator, the IDX and CNT counters, and comparators producing the DONE and LAST signals.]
The last major component is the serial comparator, which is based on the design developed in Notes Set 3.1.
The two bits to be compared in a cycle come from shift registers A and B. The first bit indicator comes
from the zero indicator of counter CNT. The comparator actually produces two outputs (Z1 and Z0), but the
meaning of the Z1 output by itself is A > B. In the diagram, this signal has been labeled THEN.
There are two additional elements in the figure that we have yet to discuss. Each simply compares the value
in a register with a fixed constant and produces a 1-bit signal. When the FSM finishes an iteration of the
loop in the COPY state, it must check the loop condition (10 > idx) and move either to the PREP state or,
when the loop finishes, to the WAIT state to let the external logic read the answer from the MIN register. The
loop is done when the current iteration count is nine, so we compare IDX with nine to produce the DONE
signal. The other constant comparison is between the counter CNT and the value 31 to produce the LAST
signal, which indicates that the serial comparator is on its last cycle of comparison. In the cycle after LAST
is high, the THEN output of the comparator indicates whether or not A > B.
signal     meaning
IDX.RST    reset IDX counter to 0
IDX.CNT    increment IDX counter
MIN.LD     load new value into MIN register
A.LD       load new value into shift register A
B.LD       load new value into shift register B
CNT.RST    reset CNT counter

datapath output   meaning                                   based on
DONE              last loop iteration finished              IDX = 9
LAST              serial comparator executing last cycle    CNT = 31
THEN              if statement condition true               A > B
Using the datapath control signals and outputs, we can now write a more formal state transition table for the FSM, as shown below. The actions column of the table lists the changes to register and counter values that are made in each of the FSM states. The notation used to represent the actions is called register transfer language (RTL). The meaning of an individual action is similar to the meaning of the corresponding statement from our C code or from the flow chart. For example, in the WAIT state, IDX ← 0 means the same thing as idx = 0;. In particular, both mean that the value currently stored in the IDX counter is overwritten with the number 0 (all 0 bits).
state      actions (simultaneous)                    condition   next state
WAIT       IDX ← 0 (to read VALUES[0] in INIT)       START       INIT
                                                     START'      WAIT
INIT       MIN ← VALUES[IDX], IDX ← IDX + 1          (always)    PREP
PREP       A ← MIN, B ← VALUES[IDX], CNT ← 0         (always)    COMPARE
COMPARE    (none)                                    LAST        COPY
                                                     LAST'       COMPARE
COPY       THEN: MIN ← VALUES[IDX],                  DONE        WAIT
           IDX ← IDX + 1                             DONE'       PREP

The meaning of RTL is slightly different from the usual interpretation of high-level programming languages, however, in terms of when the actions happen. A list of C statements is generally executed one at a time. In contrast, the entire list of RTL actions
for an FSM state is executed simultaneously, at the end of the clock cycle. As you know, an FSM moves
from its current state into a new state at the end of every clock cycle, so actions during different cycles
usually are associated with different states. We can, however, change the value in more than one register
at the end of the same clock cycle, so we can execute more than one RTL action in the same state, so long
as the actions do not exceed the capabilities of our datapath (the components must be able to support the
simultaneous execution of the actions). Some care must be taken with states that execute for more than one
cycle to ensure that repeating the RTL actions is appropriate. In our design, only the WAIT and COMPARE
states execute for more than one cycle. The WAIT state resets the IDX counter repeatedly, which causes
no problems. The COMPARE state has no RTL actions; all of the shifting, comparison, and counting
activity needed to do its work occurs within the datapath itself.
One additional piece of RTL syntax needs explanation. In the COPY state, the first action begins with
THEN:, which means that the prefixed RTL action occurs only when the THEN signal is high. Recall that
the THEN signal indicates that the comparator has found A > B, so the equivalent C code is if (A > B)
{min = values[idx]}.
state      S4 S3 S2 S1 S0   IDX.RST   IDX.CNT   MIN.LD   A.LD   B.LD   CNT.RST
WAIT       1  0  0  0  0    1         0         0        0      0      0
INIT       0  1  0  0  0    0         1         1        0      0      0
PREP       0  0  1  0  0    0         0         0        1      1      1
COMPARE    0  0  0  1  0    0         0         0        0      0      0
COPY       0  0  0  0  1    0         1         THEN     0      0      0
The WAIT state needs to set IDX to 0 but need not affect other register or counter values, so WAIT produces
a 1 only for IDX.RST. The INIT state needs to load values[0] into the MIN register while simultaneously
incrementing the IDX counter (from 0 to 1), so INIT produces 1s for IDX.CNT and MIN.LD. The PREP state
loads both shift registers and resets the counter CNT by producing 1s for A.LD, B.LD, and CNT.RST. The
COMPARE state does not change any register values, so it produces all 0s. Finally, the COPY state increments
the IDX counter while simultaneously loading a new value into the MIN register. The COPY state produces 1
for IDX.CNT, but must use the signal THEN coming from the datapath to decide whether or not MIN is loaded.
The advantage of a one-hot encoding becomes obvious when we write equations for the six control signals
and the next-state logic, as shown below. Implementing the logic to complete our design now requires
only a handful of small logic gates.

IDX.RST = S4
IDX.CNT = S3 + S0
MIN.LD  = S3 + S0·THEN
A.LD    = S2
B.LD    = S2
CNT.RST = S2

S4+ = S4·START' + S0·DONE
S3+ = S4·START
S2+ = S3 + S0·DONE'
S1+ = S2 + S1·LAST'
S0+ = S1·LAST
Notice that the terms in each control signal can be read directly from the rows of the state table and OR'd
together. The terms in each of the next-state equations represent the incoming arcs for the corresponding
state. For example, the WAIT state has one self-loop (the first term) and a transition arc coming from the
COPY state when the loop is done. These expressions complete our design.
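To make the equations concrete, here is a rough C sketch of the same one-hot logic (the struct, field, and
function names are ours, not part of the design; all signal values are assumed to be 0 or 1):

typedef struct {
    unsigned s4, s3, s2, s1, s0;       /* one-hot state bits: exactly one is 1     */
    unsigned start, done, last, then_; /* external START plus the datapath outputs */
} fsm_t;

/* Control signals, read directly from the rows of the control-signal table. */
unsigned idx_rst(const fsm_t *f) { return f->s4; }
unsigned idx_cnt(const fsm_t *f) { return f->s3 | f->s0; }
unsigned min_ld (const fsm_t *f) { return f->s3 | (f->s0 & f->then_); }
unsigned a_ld   (const fsm_t *f) { return f->s2; }
unsigned b_ld   (const fsm_t *f) { return f->s2; }
unsigned cnt_rst(const fsm_t *f) { return f->s2; }

/* Next-state bits: each term corresponds to one incoming arc of that state. */
void next_state(const fsm_t *f, fsm_t *n) {
    n->s4 = (f->s4 & !f->start) | (f->s0 & f->done);   /* WAIT    */
    n->s3 =  f->s4 & f->start;                         /* INIT    */
    n->s2 =  f->s3 | (f->s0 & !f->done);               /* PREP    */
    n->s1 =  f->s2 | (f->s1 & !f->last);               /* COMPARE */
    n->s0 =  f->s1 & f->last;                          /* COPY    */
}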
memory
design as a collection of latches
number of addresses
addressability
read/write logic
serial/random access memory (RAM)
volatile/non-volatile (N-V)
static/dynamic RAM (SRAM/DRAM)
SRAM cell
DRAM cell
bit lines and sense amplifiers
von Neumann model
processing unit
register file
arithmetic logic unit (ALU)
word size
control unit
program counter (PC)
instruction register (IR)
implementation as FSM
input and output units
memory
memory address register (MAR)
memory data register (MDR)
tri-state buffer
meaning of Z/hi-Z output
use in distributed mux
Sparse Representations
Representations used by computers must avoid ambiguity: a single bit pattern in a representation cannot be
used to represent more than one value. However, the converse need not be true. A representation can have
several bit patterns representing the same value, and not all bit patterns in a representation need be used
to represent values.
Let's consider a few examples of representations with unused patterns. Historically, one common class of
representations of this type was those used to represent individual decimal digits. We examine three examples
from this class.
The first is Binary-coded Decimal (BCD), in which decimal digits are encoded individually using their representations in the unsigned (binary) representation. Since we have 10 decimal digits, we need 10 patterns, and
four bits for each digit. But four bits allow 2^4 = 16 bit patterns. In BCD, the patterns 1010, 1011, ..., 1111
are unused. It is important to note that BCD is not the same as the unsigned representation. The decimal
number 732, for example, requires 12 bits when encoded as BCD: 0111 0011 0010. When written using a
12-bit unsigned representation, 732 is written 001011011100. Operations on BCD values were implemented
in early processors, including the 8086, and are thus still available in the x86 instruction set architecture
today!
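As a side note, a short C sketch (our own helper, not from the notes) shows one way to pack a decimal
number into BCD one digit at a time; printing the result in hexadecimal makes the 4-bit digit groups visible:

#include <stdio.h>
#include <stdint.h>

/* Pack a non-negative decimal number into BCD, four bits per decimal digit. */
uint32_t to_bcd(uint32_t n) {
    uint32_t bcd = 0;
    int shift = 0;
    do {
        bcd |= (n % 10) << shift;   /* low decimal digit fills the next 4-bit field */
        n /= 10;
        shift += 4;
    } while (n != 0);
    return bcd;
}

int main(void) {
    printf("%X\n", to_bcd(732));    /* prints 732: the BCD bits 0111 0011 0010 */
    return 0;
}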
The second example is an Excess-3 code, in which each decimal digit d is represented by the pattern corresponding to the 4-bit unsigned pattern for d + 3.
For example, the digit 4 is represented as 0111, and the digit 7 is represented
as 1010. The Excess-3 code has some attractive aspects when using simple
hardware. For example, we can use a 4-bit binary adder to add two digits c
and d represented in the Excess-3 code, and the carry out signal produced by
the adder is the same as the carry out for the decimal addition, since c + d ≥ 10
is equivalent to (c + 3) + (d + 3) ≥ 16.
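The equivalence is easy to check exhaustively. The sketch below (ours, modeling the 4-bit adder with
ordinary integer arithmetic) verifies the carry-out claim for every pair of decimal digits:

#include <assert.h>

int main(void) {
    for (int c = 0; c <= 9; c++) {
        for (int d = 0; d <= 9; d++) {
            int sum = (c + 3) + (d + 3);          /* the two Excess-3 adder inputs */
            int carry_out = (sum >= 16);          /* carry out of a 4-bit adder    */
            assert(carry_out == (c + d >= 10));   /* matches the decimal carry     */
        }
    }
    return 0;
}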
The third example of decimal digit representations is a 2-out-of-5 code. In
such a code, five bits are used to encode each digit. Only patterns with
exactly two 1s are used. There are exactly ten such patterns, and an example
representation is shown below (more than one assignment of values to
patterns has been used in real systems).
digit    a 2-out-of-5 representation
1        00011
2        00101
3        00110
4        01001
5        01010
6        01100
7        10001
8        10010
9        10100
0        11000
Error Detection
Errors in digital systems can occur for many reasons, ranging from cosmic ray strikes to defects in chip
fabrication to errors in the design of the digital system. As a simple model, we assume that an error takes
the form of changes to some number of bits. In other words, a bit that should have the value 0 instead has
the value 1, or a bit that should have the value 1 instead has the value 0. Such an error is called a bit error.
Digital systems can be designed with or without tolerance to errors. When an error occurs, no notification
or identification of the error is provided. Rather, if error tolerance is needed, the system must be designed
to be able to recognize and identify errors automatically. Often, we assume that each of the bits may be in
error independently of all of the others, each with some low probability. With such an assumption, multiple
bit errors are much less likely than single bit errors, and we can focus on designs that tolerate a single bit
error. When a bit error occurs, however, we must assume that it can happen to any of the bits.
The use of many patterns to represent a smaller number of values, as is the case in a 2-out-of-5 code,
enables a system to perform error detection. Let's consider what happens when a value represented using
a 2-out-of-5 code is subjected to a single bit error. Imagine that we have the digit 7. In the table on the
previous page, notice that the digit 7 is represented with the pattern 10001.
As we mentioned, we must assume that the bit error can occur in any of the five bits, thus we have five
possible bit patterns after the error occurs. If the error occurs in the first bit, we have the pattern 00001. If
the error occurs in the second bit, we have the pattern 11001. The complete set of possible error patterns
is 00001, 11001, 10101, 10011, and 10000.
Notice that none of the possible error patterns has exactly two 1s, and thus none of them is a meaningful
pattern in our 2-out-of-5 code. In other words, whenever a digital system represents the digit 7 and a single
bit error occurs, the system will be able to detect that an error has occurred.
What if the system needs to represent a different digit? Regardless of which digit is represented, the pattern
with no errors has exactly two 1s, by the definition of our representation. If we then flip one of the five bits
by subjecting it to a bit error, the resulting error pattern has either one 1 (if the bit error changes a 1 to
a 0) or three 1s (if the bit error changes a 0 to a 1). In other words, regardless of which digit is represented,
and regardless of which bit has an error, the resulting error pattern never has a meaning in the 2-out-of-5
code. So this representation enables a digital system to detect any single bit error!
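A checker for this property simply counts the 1s in a 5-bit pattern; the C sketch below (the function name
is ours) returns 1 only for valid code words:

/* Returns 1 when the 5-bit pattern is a valid 2-out-of-5 code word. */
int valid_2_of_5(unsigned pattern) {
    int ones = 0;
    for (int i = 0; i < 5; i++)
        ones += (pattern >> i) & 1;
    return ones == 2;
}
/* Example: valid_2_of_5(0x11) is 1 (10001, the digit 7); flipping any single
   bit of 10001 leaves one or three 1s, so the checker then returns 0. */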
Parity
The ability to detect any single bit error is certainly useful. However, so far we have only shown how to
protect ourselves when we want to represent decimal digits. Do we need to develop a separate error-tolerant
representation for every type of information that we might want to represent? Or can we instead come up
with a more general approach? The answer to the second question is yes: we can, in fact, systematically
transform any representation into a representation that allows detection of a single bit error. The key to
this transformation is the idea of parity.
Consider an arbitrary representation for some type of information. For each pattern used in the
representation, we can count the number of 1s. The resulting count is either odd or even. By adding an
extra bit, called a parity bit, to the representation, and selecting the parity bit's value appropriately
for each bit pattern, we can ensure that the count of 1s is odd (called odd parity) or even (called even
parity) for all values represented. The idea is illustrated in the table below for the 3-bit unsigned
representation. The parity bit is the rightmost bit of each pattern in the last two columns.
value          3-bit       number    with odd    with even
represented    unsigned    of 1s     parity      parity
0              000         0         0001        0000
1              001         1         0010        0011
2              010         1         0100        0101
3              011         2         0111        0110
4              100         1         1000        1001
5              101         2         1011        1010
6              110         2         1101        1100
7              111         3         1110        1111
Either approach to selecting the parity bits ensures that any single bit error can be detected. For example,
if we choose to use odd parity, a single bit error changes either a 0 into a 1 or a 1 into a 0. The number
of 1s in the resulting error pattern thus differs by exactly one from the original pattern, and the parity of
the error pattern is even. But all valid patterns have odd parity, so any single bit error can be detected by
simply counting the number of 1s.
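In software, a parity bit can be produced by XORing the data bits together; the C sketch below (our own
helper name) computes either an even or an odd parity bit for an n-bit value:

/* Even-parity bit: the XOR of all data bits, so that the (nbits+1)-bit
   pattern has an even number of 1s.  For odd parity, invert the result. */
unsigned parity_bit(unsigned value, int nbits, int odd) {
    unsigned p = 0;
    for (int i = 0; i < nbits; i++)
        p ^= (value >> i) & 1;
    return odd ? (p ^ 1) : p;
}
/* Example: parity_bit(5, 3, 1) returns 1, so the 3-bit pattern 101 becomes
   1011 with odd parity, matching the table above. */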
Hamming Distance
Next, let's think about how we might use representations (we might also think of them as codes) to protect
a system against multiple bit errors. As we have seen with parity, one strategy that we can use to provide
such error tolerance is the use of representations in which only some of the patterns actually represent values.
Let's call such patterns code words. In other words, the code words in a representation are those patterns
that correspond to real values of information. Other patterns in the representation have no meaning.
As a tool to help us understand error tolerance, let's define a measure of the distance between code words
in a representation. Given two code words X and Y , we can calculate the number NX,Y of bits that must
change to transform X into Y . Such a calculation merely requires that we compare the patterns bit by bit
and count the number of places in which they differ. Notice that this relationship is symmetric: the same
number of changes are required to transform Y into X, so NY,X = NX,Y . We refer to this number NX,Y
as the Hamming distance between code word X and code word Y . The metric is named after Richard
Hamming, a computing pioneer and an alumnus of the UIUC Math department.
The Hamming distance between two code words tells us how many bit errors are necessary in order for
a digital system to mistake one code word for the other. Given a representation, we can calculate the
minimum Hamming distance between any pair of code words used by the representation. The result is called
the Hamming distance of the representation, and represents the minimum number of bit errors that must occur
before a system might fail to detect errors in a stored value.
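Computing these distances is straightforward: XOR the two patterns and count the 1s. The C sketch below
(helper names are ours) also finds the Hamming distance of a whole representation by minimizing over all
pairs of code words:

#include <limits.h>

/* Number of bit positions in which patterns x and y differ. */
int hamming_distance(unsigned x, unsigned y) {
    unsigned diff = x ^ y;
    int count = 0;
    while (diff != 0) {
        count += diff & 1;
        diff >>= 1;
    }
    return count;
}

/* Minimum distance over all pairs of code words: the Hamming
   distance of the representation itself. */
int code_distance(const unsigned *words, int n) {
    int best = INT_MAX;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            int d = hamming_distance(words[i], words[j]);
            if (d < best)
                best = d;
        }
    return best;
}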
The Hamming distance for nearly all of the representations that we introduced in earlier sections is 1. Since
more than half of the patterns (and often all of the patterns!) correspond to meaningful values, some pairs
of code words must differ in only one bit, and these representations cannot tolerate any errors. For example,
the decimal value 42 is stored as 101010 using a 6-bit unsigned representation, but any bit error in that
pattern produces another valid pattern corresponding to one of the following decimal numbers: 10, 58, 34,
46, 40, 43. Note that the Hamming distance between any two patterns is not necessarily 1. Rather, the
Hamming distance of the unsigned representation, which corresponds to the minimum distance over all pairs of
valid patterns, is 1.
In contrast, the Hamming distance of the 2-out-of-5 code that we discussed earlier is 2. Similarly, the
Hamming distance of any representation extended with a parity bit is at least 2.
Now let's think about the problem slightly differently. Given a particular representation, how many bit
errors can we detect in values using that representation? A representation with Hamming distance d can
detect up to d − 1 bit errors. To understand this claim, start by selecting a code word from the representation
and changing up to d − 1 of the bits. No matter how one chooses to change the bits, these changes cannot
result in another code word, since we know that any other code word has to require at least d changes from
our original code word, by the definition of the representation's Hamming distance. A digital system using
the representation can thus detect up to d − 1 errors. However, if d or more errors occur, the system might
sometimes fail to detect any error in the stored value.
Error Correction
Detection of errors is important, but may sometimes not be enough. What can a digital system do when it
detects an error? In some cases, the system may be able to find the original value elsewhere, or may be able
to re-compute the value from other values. In other cases, the value is simply lost, and the digital system
may need to reboot or even shut down until a human can attend to it. Many real systems cannot afford such
a luxury. Life-critical systems such as medical equipment and airplanes should not turn themselves off and
wait for a human's attention. Space vehicles face a similar dilemma, since no human may be able to reach
them.
Can we use a strategy similar to the one that we have developed for error detection in order to try to
perform error correction, recovering the original value? Yes, but the overhead (the extra bits that we
need to provide such functionality) is higher.
Let's start by thinking about a code with Hamming distance 2, such as 4-bit 2's complement with odd parity.
We know that such a code can detect one bit error. Can it correct such a bit error, too?
Imagine that a system has stored the decimal value 6 using the pattern 01101, where the last bit is the odd
parity bit. A bit error occurs, changing the stored pattern to 01111, which is not a valid pattern, since it has
an even number of 1s. But can the system know that the original value stored was 6? No, it cannot. The
original value may also have been 7, in which case the original pattern was 01110, and the bit error occurred
in the final bit. The original value may also have been -1, 3, or 5. The system has no way of resolving this
ambiguity. The same problem arises if a digital system uses a code with Hamming distance d to detect up
to d − 1 errors.
Error correction is possible, however, if we assume that fewer bit errors occur (or if we instead use a
representation with a larger Hamming distance). As a simple example, let's create a representation for the
numbers 0 through 3 by making three copies of the 2-bit unsigned representation, as shown below.

value          three-copy
represented    code
0              000000
1              010101
2              101010
3              111111

The Hamming distance of the resulting code is 3, so any two bit errors can be detected. However, this code
also enables us to correct a single bit error. Intuitively, think of the three copies as voting on the right
answer. Since a single bit error can only corrupt one copy, a majority vote always gives the right answer!
Tripling the number of bits needed in a representation is not a good general strategy, however. Notice also
that correcting a pattern with two bit errors can produce the wrong result.
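A minimal C sketch of majority-vote decoding for this three-copy code (the function is ours; the six bits
are assumed to be stored as three consecutive 2-bit copies):

/* Decode a 6-bit pattern holding three 2-bit copies of a value by taking a
   majority vote in each bit position.  A single bit error corrupts only one
   copy, so the vote still recovers the original value. */
unsigned decode_three_copy(unsigned pattern) {
    unsigned c0 =  pattern       & 0x3;
    unsigned c1 = (pattern >> 2) & 0x3;
    unsigned c2 = (pattern >> 4) & 0x3;
    unsigned value = 0;
    for (int bit = 0; bit < 2; bit++) {
        int ones = ((c0 >> bit) & 1) + ((c1 >> bit) & 1) + ((c2 >> bit) & 1);
        value |= (unsigned)(ones >= 2) << bit;   /* majority of the three copies */
    }
    return value;
}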
Let's think about the problem in terms of Hamming distance. Assume that we use a code with Hamming
distance d and imagine that up to k bit errors affect a stored value. The resulting pattern then falls within a
neighborhood of distance k from the original code word. This neighborhood contains all bit patterns within
Hamming distance k of the original pattern. We can define such a neighborhood around each code word.
Now, since d bit errors are needed to transform a code word into any other code word, these neighborhoods
are disjoint so long as 2k ≤ d − 1. In other words, if the inequality holds, any bit pattern in the representation
can be in at most one code word's neighborhood. The digital system can then correct the errors by selecting
the unique value identified by the associated neighborhood. Note that patterns encountered as a result of
up to k bit errors always fall within the original code word's neighborhood; the inequality ensures that the
neighborhood identified in this way is unique. We can manipulate the inequality to express the number
of errors k that can be corrected in terms of the Hamming distance d of the code. A code with Hamming
distance d allows up to ⌊(d − 1)/2⌋ errors to be corrected, where ⌊x⌋ represents the integer floor function on x,
or rounding x down to the nearest integer.
Hamming Codes
Hamming also developed a general and efficient approach for extending an arbitrary representation to allow
correction of a single bit error. The approach yields codes with Hamming distance 3. To understand how
a Hamming code works, think of the bits in the representation as being numbered starting from 1. For
example, if we have seven bits in the code, we might write a bit pattern X as x7 x6 x5 x4 x3 x2 x1 .
The bits with indices that are powers of two are parity check bits. These include x1 , x2 , x4 , x8 , and so forth.
The remaining bits can be used to hold data. For example, we could use a 7-bit Hamming code and map the
bits from a 4-bit unsigned representation into bits x7 , x6 , x5 , and x3 . Notice that Hamming codes are not
so useful for small numbers of bits, but require only logarithmic overhead for large numbers of bits. That is,
in an N -bit Hamming code, only log2 (N + 1) bits are used for parity checks.
How are the parity checks defined? Each parity bit is used to provide even parity for those bits with indices
for which the index, when written in binary, includes a 1 in the single position in which the parity bit's index
contains a 1. The x1 bit, for example, provides even parity on all bits with odd indices. The x2 bit provides
even parity on x2 , x3 , x6 , x7 , x10 , and so forth.
In a 7-bit Hamming code, for example, x1 is chosen so that it has even parity together with x3 , x5 , and x7 .
Similarly, x2 is chosen so that it has even parity together with x3 , x6 , and x7 . Finally, x4 is chosen so that
it has even parity together with x5 , x6 , and x7 .
value          4-bit unsigned                    7-bit Hamming
represented    (x7 x6 x5 x3)     x4   x2   x1    code
0              0000              0    0    0     0000000
1              0001              0    1    1     0000111
2              0010              1    0    1     0011001
3              0011              1    1    0     0011110
4              0100              1    1    0     0101010
5              0101              1    0    1     0101101
6              0110              0    1    1     0110011
7              0111              0    0    0     0110100
8              1000              1    1    1     1001011
9              1001              1    0    0     1001100
10             1010              0    1    0     1010010
11             1011              0    0    1     1010101
12             1100              0    0    1     1100001
13             1101              0    1    0     1100110
14             1110              1    0    0     1111000
15             1111              1    1    1     1111111
Let's do a couple of examples based on the pattern for the decimal number 9, 1001100. First, assume that
no error occurs. We calculate check bit c4 by checking whether x4 , x5 , x6 , and x7 together have even parity.
Since no error occurred, they do, so c4 = 0. Similarly, for c2 we consider x2 , x3 , x6 , and x7 . These also have
even parity, so c2 = 0. Finally, for c1 , we consider x1 , x3 , x5 , and x7 . As with the others, these together
have even parity, so c1 = 0. Writing c4 c2 c1 , we obtain 000, and conclude that no error has occurred.
Next assume that bit 3 has an error, giving us the pattern 1001000. In this case, we have again that c4 = 0,
but the bits corresponding to both c2 and c1 have odd parity, so c2 = 1 and c1 = 1. Now when we write the
check bits c4 c2 c1 , we obtain 011, and we are able to recognize that bit 3 has been changed.
A Hamming code can only correct one bit error, however. If two bit errors occur, correction will produce
the wrong answer. Let's imagine that both bits 3 and 5 have been flipped in our example pattern for the
decimal number 9, producing the pattern 1011000. Calculating the check bits as before and writing them
as c4 c2 c1 , we obtain 110, which leads us to incorrectly conclude that bit 6 has been flipped. As a result, we
correct the pattern to 1111000, which represents the decimal number 14.
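As a sketch of how the encoding and the check-bit calculation might be written in C (the function names
are ours; the code word is stored with x1 in the least significant bit of the returned value):

/* Encode a 4-bit value into the 7-bit Hamming code described above, with
   data in x7, x6, x5, x3 and even-parity check bits in x4, x2, x1. */
unsigned hamming7_encode(unsigned data) {
    unsigned x3 =  data       & 1;
    unsigned x5 = (data >> 1) & 1;
    unsigned x6 = (data >> 2) & 1;
    unsigned x7 = (data >> 3) & 1;
    unsigned x1 = x3 ^ x5 ^ x7;     /* covers the odd indices 1, 3, 5, 7 */
    unsigned x2 = x3 ^ x6 ^ x7;     /* covers indices 2, 3, 6, 7         */
    unsigned x4 = x5 ^ x6 ^ x7;     /* covers indices 4, 5, 6, 7         */
    return (x7 << 6) | (x6 << 5) | (x5 << 4) | (x4 << 3) |
           (x3 << 2) | (x2 << 1) | x1;
}

/* Compute the check bits c4 c2 c1 and correct a single bit error: a nonzero
   result is the index of the flipped bit, which we simply flip back. */
unsigned hamming7_correct(unsigned code) {
    unsigned x[8];
    for (int i = 1; i <= 7; i++)
        x[i] = (code >> (i - 1)) & 1;
    unsigned c1 = x[1] ^ x[3] ^ x[5] ^ x[7];
    unsigned c2 = x[2] ^ x[3] ^ x[6] ^ x[7];
    unsigned c4 = x[4] ^ x[5] ^ x[6] ^ x[7];
    unsigned syndrome = (c4 << 2) | (c2 << 1) | c1;
    if (syndrome != 0)
        code ^= 1u << (syndrome - 1);   /* flip the erroneous bit */
    return code;
}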
SEC-DED Codes
We now consider one final extension of Hamming codes to enable a system to perform single error correction
while also detecting any two bit errors. Such codes are known as Single Error Correction, Double Error
Detection (SEC-DED) codes. Creating such a code from a Hamming code is trivial: add a parity bit
covering the entire Hamming code. The extra parity bit increases the Hamming distance to 4. A Hamming
distance of 4 still allows only single bit error correction, but avoids the problem of Hamming distance 3 codes
when two bit errors occur, since patterns at Hamming distance 2 from a valid code word cannot be within
distance 1 of another code word, and thus cannot be corrected to the wrong result.
In fact, one can add a parity bit to any representation with an odd Hamming distance to create a new
representation with Hamming distance one greater than the original representation. To prove this convenient
fact, begin with a representation with Hamming distance d, where d is odd. If we choose two code words
from the representation, and their Hamming distance is already greater than d, their distance in the new
representation will also be greater than d. Adding a parity bit cannot decrease the distance. On the other
hand, if the two code words are exactly distance d apart, they must have opposite parity, since they differ
by an odd number of bits. Thus the new parity bit will be a 0 for one of the code words and a 1 for the
other, increasing the Hamming distance to d + 1 in the new representation. Since all pairs of code words
have Hamming distance of at least d + 1, the new representation also has Hamming distance d + 1.
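A sketch of the resulting SEC-DED decision logic, assuming the 7-bit code word layout from the earlier
sketch plus one overall even-parity bit p stored alongside it (the function name and return convention are ours):

/* Returns 0 for no error, 1 for a single bit error (corrected in *code when
   it lies among the seven code bits), and 2 when a double error is detected. */
int secded_check(unsigned *code, unsigned p) {
    unsigned x[8];
    unsigned parity = p;                        /* parity of all eight stored bits */
    for (int i = 1; i <= 7; i++) {
        x[i] = (*code >> (i - 1)) & 1;
        parity ^= x[i];
    }
    unsigned c1 = x[1] ^ x[3] ^ x[5] ^ x[7];
    unsigned c2 = x[2] ^ x[3] ^ x[6] ^ x[7];
    unsigned c4 = x[4] ^ x[5] ^ x[6] ^ x[7];
    unsigned syndrome = (c4 << 2) | (c2 << 1) | c1;

    if (parity == 0 && syndrome == 0)
        return 0;                               /* nothing went wrong               */
    if (parity == 1) {                          /* odd number of flips: assume one  */
        if (syndrome != 0)
            *code ^= 1u << (syndrome - 1);      /* syndrome 0: p itself was flipped */
        return 1;
    }
    return 2;   /* parity even but syndrome nonzero: two bit errors detected */
}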
Moving to the last of the three questions posed for instruction format definition, we explore a range of answers developed over the last few decades. Answers are usually chosen based on the number of bits necessary,
and we use this metric to organize the possibilities. The figure below separates approaches into two dimensions: the vertical dimension divides addressing into registers and memory, and the horizontal dimension
into varieties within each type.
               (fewer bits)                                                                     (more bits)
register:   implicit      special-purpose registers      general-purpose registers
memory:     implicit      "zero page" memory      relative addresses      segmented memory      full addresses
As a register file contains fewer registers than a memory does words, the use of register operands rather than
memory addresses reduces the number of bits required to specify an operand. Our example architecture
used only register operands to stay within the limit imposed by the decision to use only 16-bit instructions.
Both register and memory addresses, however, admit a wide range of implementations.
Implicit operands of either type require no additional bits for the implicit address. A typical procedure
call instruction, for example, pushes a return address onto the stack, but the stack pointer can be named
implicitly, without the use of bits in the instruction beyond the opcode bits necessary to specify a procedure
call. Similarly, memory addresses can be implicitly equated to other memory addresses; an increment
instruction operating on a memory address, for example, implicitly writes the result back to the same
address. The opposite extreme provides full addressing capabilities, either to any register in the register file
or to any address in the memory. As addressing decisions are usually made for classes of instructions rather
than individual operations, I have used the term general-purpose registers to indicate that the registers
are used in any operation.
Special-purpose registers, in contrast, split the register file and allow only certain registers to be used in
each operation. For example, the Motorola 680x0 series, used until recently in Apple Macintosh computers,
provides distinct sets of address and data registers. Loads and stores use the address registers; ALU
operations use the data registers. As a result, each instruction selects from a smaller set of registers and
thus requires fewer bits in the instruction to name the register for use.
As full memory addresses require many more bits than full register addresses, a wider range of techniques
has been employed to reduce the length. Zero page addresses, as defined in the 6510 (6502) ISA used by
Commodore PETs, C64s, and VIC 20s, prefixed a one-byte address with a zero byte, allowing shorter
instructions when memory addresses fell within the first 256 memory locations. Assembly and machine
language programmers made heavy use of these locations to produce shorter programs.
Relative addressing appeared in the context of control flow instructions of our example architecture, but
appears in many modern architectures as well. The Alpha, for example, has a relative form of procedure
call with a 21-bit offset (plus or minus a megabyte). The x86 architecture has a short form of branch
instructions that uses an 8-bit offset.
Segmented memory is a form of relative addressing that uses a register (usually implicit) to provide the high
bits of an address and an explicit memory address (or another register) to provide the low bits. In the x86
architecture, for example, 20-bit addresses are found by adding a 16-bit segment register extended with four
zero bits to a 16-bit offset.
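As a concrete sketch of that arithmetic, written as ordinary C (the helper name is ours):

#include <stdint.h>

/* 20-bit physical address from a 16-bit segment and a 16-bit offset: the
   segment is extended with four zero bits (multiplied by 16), then added. */
uint32_t physical_address(uint16_t segment, uint16_t offset) {
    return (((uint32_t)segment << 4) + offset) & 0xFFFFF;
}
/* Example: segment 0x1234 and offset 0x0010 give physical address 0x12350. */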
Addressing Architectures
One question remains for the definition of instruction formats: how many addresses are needed for each
instruction, and how many of the addresses can be memory addresses? The first part of this question usually
ranges from zero to three, and is very rarely allowed to go beyond three. The answer to the second part
determines the addressing architecture implemented by an ISA. We now illustrate the tradeoffs between
five distinct addressing architectures through the use of a running example, the assignment X = A × B + C / D.
A binary operator requires two source operands and one destination operand, for a total of three addresses.
The ADD instruction, for example, has a 3-address format:
ADD A,B,C        or        ADD R1,R2,R3
If all three addresses can be memory addresses, the ISA is dubbed a memory-to-memory architecture.
Such architectures may have small register sets or even lack a register file completely. To implement the
assignment, we assume the availability of two memory locations, T1 and T2, for temporary storage:
MUL T1,A,B     ; T1 ← M[A] × M[B]
DIV T2,C,D     ; T2 ← M[C] / M[D]
ADD X,T1,T2    ; X ← M[T1] + M[T2]
The assignment requires only three instructions to implement, but each instruction contains three full memory
addresses, and is thus very long.
At the other extreme is the load-store architecture used by the ISA that we developed earlier. In a
load-store architecture, only loads and stores can use memory addresses; all other operations use only registers.
As most instructions use only registers, this type of addressing architecture is also called a
register-to-register architecture. The example assignment translates to the code below, which
assumes that R1, R2, and R3 are free for use.
LD  R1,A        ; R1 ← M[A]
LD  R2,B        ; R2 ← M[B]
MUL R1,R1,R2    ; R1 ← R1 × R2
LD  R2,C        ; R2 ← M[C]
LD  R3,D        ; R3 ← M[D]
DIV R2,R2,R3    ; R2 ← R2 / R3
ADD R1,R1,R2    ; R1 ← R1 + R2
ST  R1,X        ; M[X] ← R1
Eight instructions are necessary, but no instruction requires more than one full memory address, and several
use only register addresses, allowing the use of shorter instructions. The need to move data in and out of
memory explicitly, however, also requires a reasonably large register set, as is available in the Sparc, Alpha,
and IA-64 architectures.
Architectures that use other combinations of memory and register addresses with 3-address formats are not
named. Unary operators and transfer operators require only one source operand, thus can use a 2-address
format (for example, NOT A,B). Binary operations can also use 2-address format if one operand is implicit,
as in the following instructions:
ADD A,B        or        ADD R1,B
The second instruction, in which one address is a register and the second is a memory address, defines a
register-memory architecture. As shown on the next page, such architectures strike a balance between
the two architectures just discussed.
LD  R1,A      ; R1 ← M[A]
MUL R1,B      ; R1 ← R1 × M[B]
LD  R2,C      ; R2 ← M[C]
DIV R2,D      ; R2 ← R2 / M[D]
ADD R1,R2     ; R1 ← R1 + R2
ST  R1,X      ; M[X] ← R1
The assignment requires six instructions using at most one memory address each; like memory-to-memory
architectures, register-memory architectures use relatively few registers. Note that two-register operations
are also allowed. Intel's x86 ISA is a register-memory architecture.
Several ISAs of the past3 used a special-purpose register called the accumulator for ALU operations, and
are called accumulator architectures. The accumulator in such architectures is implicitly both a source
and the destination for any such operation, allowing a 1-address format for instructions, as shown below.
ADD B        or        ST E
Accumulator architectures strike the same balance as register-memory architectures, but use fewer registers.
Note that memory location X is used as a temporary storage location as well as the final storage location in
the following code:
LD  A     ; ACC ← M[A]
MUL B     ; ACC ← ACC × M[B]
ST  X     ; M[X] ← ACC
LD  C     ; ACC ← M[C]
DIV D     ; ACC ← ACC / M[D]
ADD X     ; ACC ← ACC + M[X]
ST  X     ; M[X] ← ACC
The last addressing architecture that we discuss is rarely used for modern general-purpose processors, but
is perhaps the most familiar to you because of its use in scientific and engineering calculators for the last
fifteen to twenty years. A stack architecture maintains a stack of values and draws all ALU operands
from this stack, allowing these instructions to use a 0-address format. A special-purpose stack pointer (SP)
register points to the top of the stack in memory, and operations analogous to load (push) and store (pop)
are provided to move values on and off the stack. To implement our example assignment, we first transform
it into postfix notation (also called reverse Polish notation):
A B * C D / +
The resulting sequence of symbols transforms on a one-to-one basis into instructions for a stack architecture:
PUSH A    ; SP ← SP − 1, M[SP] ← M[A]                           A
PUSH B    ; SP ← SP − 1, M[SP] ← M[B]                           B  A
MUL       ; M[SP+1] ← M[SP+1] × M[SP], SP ← SP + 1              A×B
PUSH C    ; SP ← SP − 1, M[SP] ← M[C]                           C  A×B
PUSH D    ; SP ← SP − 1, M[SP] ← M[D]                           D  C  A×B
DIV       ; M[SP+1] ← M[SP+1] / M[SP], SP ← SP + 1              C/D  A×B
ADD       ; M[SP+1] ← M[SP+1] + M[SP], SP ← SP + 1              A×B+C/D
POP X     ; M[X] ← M[SP], SP ← SP + 1
The values at the right of each instruction are the values on the stack after that instruction completes,
starting with the top of the stack on the left and progressing downward into the stack.
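The same one-to-one translation is easy to mimic in software. The C sketch below (ours; its array stack
grows upward rather than downward in memory) evaluates A B * C D / + for arbitrary sample operand values:

#include <stdio.h>

int main(void) {
    double stack[8];
    int sp = 0;                               /* index of the next free slot */
    double A = 6, B = 7, C = 8, D = 2;        /* arbitrary operand values    */

    stack[sp++] = A;                                        /* PUSH A */
    stack[sp++] = B;                                        /* PUSH B */
    stack[sp - 2] = stack[sp - 2] * stack[sp - 1]; sp--;    /* MUL    */
    stack[sp++] = C;                                        /* PUSH C */
    stack[sp++] = D;                                        /* PUSH D */
    stack[sp - 2] = stack[sp - 2] / stack[sp - 1]; sp--;    /* DIV    */
    stack[sp - 2] = stack[sp - 2] + stack[sp - 1]; sp--;    /* ADD    */
    printf("X = %g\n", stack[--sp]);                        /* POP X: prints 46 */
    return 0;
}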
3 The 6510/6502 as well, if memory serves, as the 8080, Z80, and Z8000, which used to drive parlor video games.
loop:          CALL DoSomeWork
               CMP  R6,#1
               BEQ  loop

DoSomeWork:    ...
               RETN
The procedure also places a return value in R6, which the instruction following the call compares with the
immediate value 1. Until the two become unequal (that is, until all of the work is done), the branch returns
control to the call and executes the procedure again.
As you may recall, the call and return use the stack pointer to keep track of nested calls. Sample RTL for
these operations appears below.
call RTL                       return RTL
SP ← SP − 1                    PC ← M[SP]
M[SP] ← PC                     SP ← SP + 1
PC ← procedure start
While an ISA provides the call and return instructions necessary to support procedures, it does not specify
how information is passed to or returned from a procedure. A standard for such decisions is usually developed
and included in descriptions of the architecture, however. This calling convention specifies how information
is passed between a caller and a callee. In particular, it specifies the following: where arguments must be
placed, either in registers or in specific stack memory locations; which registers can be used or changed by
the procedure; and where any return value must be placed.
The term calling convention is also used in the programming language community to describe the convention for deciding what information is passed for a given call operation. For example, are variables passed
by value, by pointers to values, or in some other way? However, once the things to be sent are decided, the
architectural calling convention that we discuss in this class is used to determine where to put the data in
order for the callee to be able to find it.
5 The architecture that you used in the labs allowed limited use of procedures in its microprogram.
Calling conventions for architectures with large register sets typically pass arguments
in registers, and nearly all conventions place the return value in a register. A calling
convention also divides the register set into caller saved and callee saved registers.
Caller saved registers can be modified arbitrarily by the called procedure, whereas any
value in a callee saved register must be preserved. Similarly, before calling a procedure,
a caller must preserve the values of any caller saved registers that are needed after the
call. Registers of both types are usually saved on the stack by the appropriate code (caller or callee).
A typical stack structure appears in the figure below. In preparation for a call, a
caller first stores any caller saved registers on the stack. Arguments to the procedure to
be called are pushed next. The procedure is called next, implicitly pushing the return
address (the address of the instruction following the call instruction). Finally, the called
procedure may allocate space on the stack for storage of callee saved registers as well as
local variables.
[Call stack figure: starting at the current SP and moving toward older entries, the stack holds storage for
the current procedure, the return address, any extra arguments, and the caller's saved values; the last
procedure's SP marks the start of the caller's own storage, with storage for still earlier calls beyond it.]
As an example, the following calling convention can be applied to our example architecture: the first three
arguments must be placed in R0 through R2 (in order), with any remaining arguments on the stack; the
return value must be placed in R6; R0 through R2 are caller saved, as is R6, while R3 through R5 are callee
saved; R7 is used as the stack pointer. The code fragments below use this calling convention to implement
a procedure and a call of that procedure.
int add3 (int n1, int n2, int n3) {
    return (n1 + n2 + n3);
}
...
printf ("%d", add3 (10, 20, 30));

by convention:
    n1 is in R0
    n2 is in R1
    n3 is in R2
    return value is in R6

add3:   ADD   R0,R0,R1
        ADD   R6,R0,R2
        RETN
        ...
        PUSH  R4
        LDI   R0,#10
        LDI   R1,#20
        LDI   R2,#30
        CALL  add3
        MOV   R1,R6
        LDI   R0,%d
        CALL  printf
        POP   R4
The add3 procedure takes three integers as arguments, adds them together, and returns the sum. The
procedure is called with the constants 10, 20, and 30, and the result is printed. By the calling convention,
when the call is made, R0 must contain the value 10, R1 the value 20, and R2 the value 30. We assume that
the caller wants to preserve the value of R4, but does not care about R3 or R5. In the assembly language
version on the right, R4 is first saved to the stack, then the arguments are marshaled into position, and
finally the call is made. The procedure itself needs no local storage and does not change any callee saved
registers, thus must simply add the numbers together and place the result in R6. After add3 returns, its
return value is moved from R6 to R1 in preparation for the call to printf. After loading a pointer to the
format string into R0, the second call is made, and R4 is restored, completing the translation.
System calls are almost identical to procedure calls. As with procedure calls, a calling convention is used:
before invoking a system call, arguments are marshaled into the appropriate registers or locations in the
stack; after a system call returns, any result appears in a pre-specified register. The calling convention used
for system calls need not be the same as that used for procedure calls. Rather than a call instruction, system
calls are usually initiated with a trap instruction, and system calls are also known as traps. With many
architectures, a system call places the processor in privileged or kernel mode, and the instructions that implement the call are considered to be part of the operating system. The term system call arises from this fact.
type                 generated by                        example                           asynchronous    unexpected
interrupt            external device                     packet arrived at network card    yes             yes
exception            invalid opcode or operand           divide by zero                    no              yes
system call (trap)   deliberate, via trap instruction    print character to console        no              no
Interrupts occur asynchronously with respect to the program. Most designs only recognize interrupts between
instructions. In other words, the presence of interrupts is checked only after completing an instruction rather
than in every cycle. In pipelined designs, however, instructions execute simultaneously, and the decision as
to which instructions occur before an interrupt and which occur after must be made by the processor.
Exceptions are not asynchronous in the sense that they occur for a particular instruction, thus no decision
need be made as to instruction ordering. After determining which instructions were before an interrupt, a
pipelined processor discards the state of any partially executed instructions that occur after the interrupt
and completes all instructions that occur before. The terminated instructions are simply restarted after
the interrupt completes. Handling the decision, the termination, and the completion, however, significantly
increases the design complexity of the system.
The code associated with an interrupt, an exception, or a system call is a form of procedure called a
handler, and is found by looking up the interrupt number, exception number, or trap number in a table
of functions called a vector table. Separate vector tables exist for each type (interrupts, exceptions, and
system calls). Interrupts and exceptions share a need to save all registers and status bits before execution
of the corresponding handler code (and to restore those values afterward). Generally, the valuesincluding
the status word registerare placed on the stack. With system calls, saving and restoring any necessary
state is part of the calling convention. A special return from interrupt instruction is used to return control
from the interrupt handler to the interrupted code; a similar instruction forces the processor back into user
mode when returning from a system call.
Interrupts are also interesting in the sense that typical computers often have many interrupt-generating
devices but only a few interrupts. Interrupts are prioritized by number, and only an interrupt with higher
priority can interrupt another interrupt. Interrupts with equal or lower priority are blocked while an interrupt
executes. Some interrupts can also be blocked in some architectures by setting bits in a special-purpose
register called an interrupt mask. While an interrupt number is masked, interrupts of that type are blocked,
and can not occur.
As several devices may generate interrupts with the same interrupt number, interrupt handlers can be
chained together. Each handler corresponds to a particular device. When an interrupt occurs, control is
passed to the handler for the first device, which accesses device registers to determine whether or not that
device generated an interrupt. If it did, the appropriate service is provided. If not, or after the service is
complete, control is passed to the next handler in the chain, which handles interrupts from the second device,
and so forth until the last handler in the chain completes. At this point, registers and processor state are
restored and control is returned to the point at which the interrupt occurred.
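A rough C sketch of such a chain (the types and names are invented for illustration; a real handler would
also acknowledge and clear its device's interrupt):

/* One handler per device sharing the same interrupt number.  Each handler
   checks its own device registers and returns nonzero if it serviced the
   device. */
typedef int (*handler_t)(void);

#define MAX_HANDLERS 8
static handler_t chain[MAX_HANDLERS];
static int       chain_len;

void register_handler(handler_t h) {
    if (chain_len < MAX_HANDLERS)
        chain[chain_len++] = h;
}

/* Called when the shared interrupt occurs: walk the whole chain so that every
   device needing service gets it before registers and state are restored. */
void shared_interrupt(void) {
    for (int i = 0; i < chain_len; i++)
        (void)chain[i]();
}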
CMP  R2,R3
BLE  R1
The status bits are not always implemented as special-purpose registers; instead, they may be kept in
general-purpose registers or not kept at all. For example, the Alpha ISA stores the results of comparisons
in general-purpose registers, and the same branch is instead implemented as follows:
CMPLE R4,R2,R3    ; R2 ≤ R3 : R4 ← 1, R2 > R3 : R4 ← 0
BNE   R4,R1       ; R4 ≠ 0 : PC ← R1
Finally, status bits can be calculated, used, and discarded within a single instruction, in which case the
branch is written as follows:
BLE R1,R2,R3      ; R2 ≤ R3 : PC ← R1
The three approaches have advantages and disadvantages similar to those discussed in the section on addressing architectures: the first has the shortest instructions, the second is the most general and simplest to
implement, and the third requires the fewest instructions.
Stack Operations
Two types of stack operations are commonly supported. Push and pop are the basic operations in many
older architectures, and values can be placed upon or removed from the stack using these instructions. In
more modern architectures, in which the SP becomes a general-purpose register, push and pop are replaced
with indexed loads and stores, that is, loads and stores using the stack pointer and an offset as the address
for the memory operation. Stack updates are performed using the ALU, subtracting and adding immediate
values from the SP as necessary to allocate and deallocate local storage.
Stack operations serve three purposes in a typical architecture. The first is to support procedure calls, as
illustrated in a previous section. The second is to provide temporary storage during interrupts, as mentioned
earlier.
The third use of stack operations is to support spill code generated by compilers. Compilers first translate
high-level languages into an intermediate representation much like assembly code but with an extremely large
(theoretically infinite) register set. The final translation step translates this intermediate representation into
assembly code for the target architecture, assigning architectural registers as necessary. However, as real
ISAs support only a finite number of registers, the compiler must occasionally spill values into memory. For
example, if ten values are in use at some point in the code, but the architecture has only eight registers, spill
code must be generated to store the remaining two values on the stack and to restore them when they are
needed.
I/O
As a final topic for the course, we now consider how a processor connects to other devices to allow input
and output. We have already discussed interrupts, which are a special form of I/O in which only the signal
requesting attention is conveyed to the processor. Communication of data occurs through instructions similar
to loads and stores. A processor is designed with a number of I/O ports, usually read-only or write-only
registers to which devices can be attached with opposite semantics. That is, a port is usually written by the
processor and read by a device or written by a device and read by the processor.
The question of exactly how I/O ports are accessed is an interesting one. One option is to create special
instructions, such as the in and out instructions of the x86 architecture. Port addresses can then be specified
in the same way that memory addresses are specified, but use a distinct address space. Just as two sets
of special-purpose registers can be separated by the ISA, such an independent I/O system separates I/O
ports from memory addresses by using distinct instructions for each class of operation.
Alternatively, device registers can be accessed using the same load and store instructions as are used to
access memory. This approach, known as memory-mapped I/O, requires no new instructions for I/O,
but demands that a region of the memory address space be set aside for I/O. The memory words with those
addresses, if they exist, can not be accessed during normal processor operations.
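As a sketch of what memory-mapped I/O looks like to software (the address and register layout below are
invented for illustration):

#include <stdint.h>

/* Hypothetical console device mapped at address 0xFFFF0000: a write-only
   data port followed by a read-only status port. */
#define CONSOLE_DATA   (*(volatile uint32_t *)0xFFFF0000u)
#define CONSOLE_STATUS (*(volatile uint32_t *)0xFFFF0004u)
#define CONSOLE_READY  0x1u

/* Ordinary load and store instructions perform the I/O; 'volatile' keeps the
   compiler from caching or reordering the device accesses. */
void console_putc(char c) {
    while ((CONSOLE_STATUS & CONSOLE_READY) == 0)
        ;                        /* spin until the device can accept a byte */
    CONSOLE_DATA = (uint32_t)c;
}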