Unit Notes
(TJD 2023)
Unit Coordinator:
Dr Philip Schrader
School Mathematics & Statistics, Chemistry and Physics
Revised by Philip Schrader 2022. Based on earlier versions prepared by various colleagues including Dr Amy Glen, A/Prof
Gerd Schröder-Turk, Dr Mark Lukas, and others.
This publication is copyright. Except as permitted by the Copyright Act, no part of it may in any form or by any electronic,
mechanical, photocopying, recording or any other means be reproduced, stored in a retrieval system or be broadcast or
transmitted without the prior written permission of the publisher.
PREVIOUS UNIT COORDINATORS
CONTENTS
1 Recurrence Relations
    1.1 Sequences
    1.2 Solutions to exercises
2 Introduction to MATLAB
    2.1 Using MATLAB
    2.2 Scalars, vectors and matrices
    2.3 Plotting in two dimensions
    2.4 Plotting in three dimensions
    2.5 For loop
    2.6 Selection statements
    2.7 User-defined functions
3 Computer Arithmetic
    3.1 Numbers and their representations
        3.1.1 Base conversions
        3.1.2 Addition in other bases
    3.2 Scientific notation
    3.3 Computer storage of numbers
    3.4 Subtractive cancellation
    3.5 Solutions to exercises
5 Matrices
    5.1 Introduction
    5.2 Matrix arithmetic
        5.2.1 Addition and subtraction
        5.2.2 Multiplication by scalars
        5.2.3 Matrix multiplication
        5.2.4 Matrix operations in MATLAB
    5.3 Some special matrices
    5.4 Further matrix operations
        5.4.1 Inverses
        5.4.2 Division
        5.4.3 Powers
        5.4.4 Transposes
        5.4.5 Matrix functions in MATLAB
    5.5 Solutions to exercises
Chapter 1
Recurrence Relations
1.1 Sequences
A sequence can be defined informally as a list of objects arranged in some definite order,
for example
1, 3, 5, 7, 9, 11, 13, . . .
The objects in a sequence are called the terms, and are usually denoted by symbols
such as t1 , t2 , t3 , and so on. We shall be dealing mainly with numerical sequences,
that is sequences whose terms are numbers, but there are other types of sequences. For
example, it makes sense to consider sequences of sets, sequences of triangles, or sequences
of functions.
The subscripts 1, 2, 3, . . . indicate the position of that particular term in the sequence.
Thus t1 is the first term, t2 is the second term, and so on. The symbol tn denotes the nth
term, commonly called the general term of the sequence. The sequence whose nth term
is tn is usually denoted by (tn ), or by t1 , t2 , t3 , . . . .
Sequences may be specified in one of three ways. We illustrate each way with reference
to Example 1.1.
(a) One way is simply to write out the first few terms followed by three dots, for example,
1, 3, 5, 7, 9, . . . , but this method is only useful if these first few terms are sufficient
to establish the pattern of the sequence. Otherwise, they just look like a bunch of
puzzling numbers.
(b) Another method is to give a formula for the general term, for example, tn =
2n − 1. This explains how to calculate any term in the sequence. For example,
t10 = 2 × 10 − 1 = 19.
(c) A third method is to describe how to construct the list of terms, by starting with
the first one t1 , and working up the sequence progressively. This requires a formula
for calculating any particular term from the ones that come before it. For example,
t1 = 1
tn = tn−1 + 2 (for n ≥ 2).
This means that for each n ≥ 2, we compute tn by adding 2 to tn−1 . This formula
doesn’t work for t1 because there is no t0 ; that’s why we have to specify t1 explicitly.
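Methods (b) and (c) can be compared numerically. The following is a minimal sketch in MATLAB (the package introduced in Chapter 2); the number of terms N and the variable names are arbitrary choices made for illustration.

```matlab
% Sketch: first N odd numbers computed two ways (N is an arbitrary choice)
N = 10;
n = 1:N;
t_formula = 2*n - 1;                 % method (b): formula for the general term

t_recursive = zeros(1,N);            % method (c): starting term plus recurrence
t_recursive(1) = 1;                  % t1 = 1
for k = 2:N
    t_recursive(k) = t_recursive(k-1) + 2;   % tn = t(n-1) + 2
end
disp(t_recursive)
```

Both vectors should contain the same terms, which is one way of checking that the two descriptions define the same sequence.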
The main purpose of this chapter is to study the third method of defining a sequence.
This method is an example of a recursive definition. A function that is defined recursively
is recognisable because its own definition refers back to earlier values of itself. For
example, the function f which computes the sequence of Example 1.1 can be recursively
defined by:
f (1) = 1;
f (n) = f (n − 1) + 2 (for n ≥ 2).
Recursive functions are very common in computer programming, and can be very
complicated. However, in this unit we will usually only deal with sequences. Here is the
definition that we’ll use:
It is important to realise that a recursive definition of a sequence has two parts, and
that neither part is sufficient on its own. The first part (B) is called the basis, and the
second part (R) is known as the recurrence relation or recurrence formula. The
basis describes how the sequence starts, and the recurrence relation describes how the
sequence continues once it is started. For this reason the specifications in the basis are
also known as the initial conditions of the recurrence relation.
◮ Example 1.2 In 1202 Leonardo of Pisa, also known simply as Fibonacci, introduced the
sequence
1, 1, 2, 3, 5, 8, 13, 21, 34, . . .
where each term is the sum of the previous two. This is now called the Fibonacci
sequence and has become very famous because it arises so often in mathematics and
(B) F1 = 1, F2 = 1;
(R) Fn = Fn−1 + Fn−2 (n ≥ 3).
We can use (R) to calculate F3 , then F4 , then F5 and so on. In this way we could calculate
as many terms of the Fibonacci sequence as we want. Although we are not given a specific
formula for particular terms such as F1000 , we accept that we could calculate this, or any
other term, by using (B) and repeated use of (R). ◭
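The repeated use of (R) can be left to a computer. A minimal MATLAB sketch, assuming we want only the first N terms (N and the vector name F are illustrative choices):

```matlab
% Sketch: first N Fibonacci numbers from the basis (B) and recurrence (R)
N = 10;
F = zeros(1,N);
F(1) = 1;  F(2) = 1;            % basis (B)
for n = 3:N
    F(n) = F(n-1) + F(n-2);     % recurrence (R)
end
disp(F)
```

Increasing N gives as many terms as desired, at the cost of one addition per new term.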
It is also useful to note that the recursive part of any recursive definition can be
expressed in more than one way. For example, the recursive part of the definition of the
Fibonacci sequence could be written as
$F_{n+2} = F_{n+1} + F_n \quad (n \ge 1).$
It is important to understand that both this formula and the one given earlier say exactly
the same thing, namely that each term in the Fibonacci sequence (from the third term
onwards) is the sum of the two before it. All that differs is the value of n that we must
use to get a specific instance of this rule.
The recursive method of defining sequences often turns out to be more efficient than
direct methods. The next example shows this.
1! = 1 = 1
2! = 1×2 = 2
3! = 1×2×3 = 6
4! = 1×2×3×4 = 24
5! = 1×2×3×4×5 = 120
However it soon becomes clear to anyone wanting to compile a list of values of n! (up to
n = 10, say) that the recursive definition is much quicker, because to go from (n−1)! to n!,
only one multiplication is required. Factorials will be further discussed in Section 4.2.2. ◭
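The saving can be seen in a short MATLAB sketch. It assumes the usual recursive definition 1! = 1 and n! = n × (n − 1)! for n ≥ 2; the names N and fact are illustrative.

```matlab
% Sketch: table of factorials, one multiplication per new entry
N = 10;
fact = zeros(1,N);
fact(1) = 1;                    % basis: 1! = 1
for n = 2:N
    fact(n) = n*fact(n-1);      % n! = n x (n-1)!
end
disp(fact)
```

Each pass of the loop reuses the previous entry, so the whole table up to 10! needs only nine multiplications.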
We often wish to find formulas for the general terms of recursively defined sequences.
Sometimes this is easy to do, but not always. Some apparently simple recurrence formulas
define sequences which appear to be quite erratic and unpredictable.
t1 = 3; tn = tn−1 + 5 (n ≥ 2).
According to the recursive formula for (tn ), the difference between successive terms in the
sequence t1 , t2 , t3 , . . . is always 5. This is an example of an arithmetic progression.
In fact the recursive definition of any arithmetic progression has the form
(B) t1 = a, and
(R) tn = tn−1 + d (n ≥ 2),
where a is the first term and d is the common difference. As you probably learnt at school,
the formula for the general term of an arithmetic sequence is
tn = d × (n − 1) + a.
Arithmetic progressions arise in situations where a fixed amount is being added to a total
at successive stages, for example in bank balances where simple interest is being added
periodically. ◭
t1 = 3; tn = 5tn−1 (n ≥ 2).
$t_n = a \cdot r^{n-1}$ $(n \ge 1)$.
◮ Example 1.7 Write the first three terms of the sequence defined by t0 = 3 and tn =
2tn−1 − 2 for n ≥ 1.
Solution. t0 = 3, t1 = 2 · 3 − 2 = 4, and t2 = 2 · 4 − 2 = 6. ◭
Sometimes it is useful to have a recurrence formula for a sequence even though you
know a formula for the general term. (Such a general formula could be complicated
whereas a recurrence formula might be simple and show up something important.) The
next examples show two methods for finding a recurrence formula.
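◮ Example 1.8 Suppose we want a recurrence formula for the sequence (sn ) where
$s_n = 2n^2 - 5$ for $n \ge 0$. We take the difference of successive terms:
$$
\begin{aligned}
s_n - s_{n-1} &= 2n^2 - 5 - \left(2(n-1)^2 - 5\right) \\
&= 2n^2 - 5 - 2(n^2 - 2n + 1) + 5 \\
&= 2n^2 - 2n^2 + 4n - 2 \\
&= 4n - 2.
\end{aligned}
$$
Therefore a recurrence relation is $s_n = s_{n-1} + 4n - 2$ for $n \ge 1$, with $s_0 = -5$. ◭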
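◮ Example 1.9 Suppose we want a recurrence formula for the sequence (sn ) where
$s_n = \frac{n}{n+1}$ for $n \ge 1$. We take the quotient of successive terms:
$$
\begin{aligned}
\frac{s_n}{s_{n-1}} &= \frac{n}{n+1} \Big/ \frac{n-1}{n} \\
&= \frac{n}{n+1} \cdot \frac{n}{n-1} \\
&= \frac{n^2}{n^2-1}.
\end{aligned}
$$
Therefore a recurrence relation is $s_n = \frac{n^2}{n^2-1}\, s_{n-1}$. ◭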
Note that generally one of these methods provides a “nicer” answer than the other,
but either method produces a correct answer.
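Both recurrence formulas can be checked against the general terms they came from. The sketch below is one way to do this in MATLAB; it uses the difference 4n − 2 from Example 1.8 (so that sn = sn−1 + 4n − 2) and the quotient from Example 1.9. The names s, q and N are illustrative, and because MATLAB indices start at 1, s(k) stores the term with n = k − 1.

```matlab
% Example 1.8: s_n = 2n^2 - 5 for n = 0,...,N, rebuilt from the recurrence
N = 8;
s = zeros(1,N+1);
s(1) = -5;                          % s_0 = 2*0^2 - 5
for n = 1:N
    s(n+1) = s(n) + 4*n - 2;        % s_n = s_(n-1) + 4n - 2
end
err_s = max(abs(s - (2*(0:N).^2 - 5)));

% Example 1.9: q_n = n/(n+1) for n = 1,...,N, rebuilt from the recurrence
q = zeros(1,N);
q(1) = 1/2;                         % first term, 1/(1+1)
for n = 2:N
    q(n) = n^2/(n^2-1)*q(n-1);      % quotient form of the recurrence
end
err_q = max(abs(q - (1:N)./((1:N)+1)));

disp([err_s err_q])                 % both should be zero or rounding-level
```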
We shall soon be looking at ways a computer can be programmed to add a set of
numbers. If these numbers occur as terms of a sequence, then a recursive approach will
be most appropriate.
Suppose that we have a sequence of numbers (tn ), and we construct another sequence
(sn ) recursively as follows:
(B) s0 = 0, and
(R) sn = sn−1 + tn for n ≥ 1.
It is easy to check using the recursive formula repeatedly that
s1 = s0 + t1 = t1 ,
s2 = s1 + t2 = t1 + t2 ,
s3 = s2 + t3 = t1 + t2 + t3 ,
s4 = s3 + t4 = t1 + t2 + t3 + t4 ,
s5 = s4 + t5 = t1 + t2 + t3 + t4 + t5 ,
and it is clear that the general rule is that sn is always the sum of the first n terms of the
sequence (tn ).
sn = 1 + 2 + 3 + · · · + n
for each n ≥ 1. ◭
◮ Example 1.11 Consider the sequence (pn ) defined recursively in terms of a given sequence
(tn ) by the rule
(B) p0 = 1, and
(R) pn = pn−1 × tn for n ≥ 1.
It is easy to check that each pn is the product of the first n terms of the sequence (tn ).
In particular, if tn = n for each n ≥ 1, then (pn ) is just the sequence of factorials (n!). ◭
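The two recursive constructions (running sum and running product) translate directly into a loop. A minimal MATLAB sketch, assuming tn = n and an arbitrary number of terms N:

```matlab
% Sketch: running sum and running product of t_1,...,t_N with t_n = n
N = 6;
t = 1:N;
s = 0;                   % basis s_0 = 0
p = 1;                   % basis p_0 = 1
for n = 1:N
    s = s + t(n);        % s_n = s_(n-1) + t_n
    p = p * t(n);        % p_n = p_(n-1) x t_n
end
disp([s p])              % sum 1+2+...+6 and product 6!
```

Only the most recent value of each sequence is kept here, which is all the recurrence needs.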
In many relevant applied problems, a sequence will be defined in words and from the
given information it is important to be able to formulate its recursive definition. To help
in this formulation, it is useful to draw a time-line showing when the given events occur.
◮ Example 1.12 A bank pays compound interest of 5% per year (calculated daily but compounded
yearly) on a certain type of account. Suppose that Joan opens such an account
with an initial amount of $1000 and deposits $100 at the end of each year thereafter (just
after the interest is compounded). Find a recursive definition for the amount An dollars
in the account at the beginning of year n, n ≥ 1. Find the values of A1 , A2 and A3 .
A time-line representing the problem is given below.
[Time-line: amounts A1 , A2 , A3 , . . . , An−1 , An at the beginning of years 1, 2, 3, . . . , n − 1, n, with 5% interest added at the end of each year.]
Clearly, by definition, A1 = 1000. At the end of year 1, the bank will pay $0.05 × 1000
in interest and Joan will deposit $100, so the amount A2 at the beginning of year 2 will
be A2 = 1000 + (0.05)1000 + 100 = (1.05)1000 + 100 = 1150. Now, considering year n − 1
and year n, we find that (see the time-line)
$$A_n = (1.05)A_{n-1} + 100 \quad (n \ge 2).$$
Together with A1 = 1000, this is the recursive definition for An . Using this definition, we
get A1 = 1000, A2 = 1150 and A3 = 1307.50. ◭
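These values can be reproduced in MATLAB. The sketch below assumes the recurrence An = (1.05)An−1 + 100 for n ≥ 2, which is the pattern used above to obtain A2 from A1; N and the vector name A are illustrative.

```matlab
% Sketch: account balance at the beginning of years 1,...,N
N = 3;
A = zeros(1,N);
A(1) = 1000;                     % initial amount
for n = 2:N
    A(n) = 1.05*A(n-1) + 100;    % add 5% interest, then the $100 deposit
end
disp(A)
```

Changing N shows how the balance grows in later years.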
Exercises
Some of the questions for this section ask you to find a formula for the general term
of a sequence. You have not been taught any methods for doing this yet, so you
should play with these questions in any ways you can think of. They are designed
to build up your intuition about patterns of numbers. (The best way to see patterns
is to write out the first few terms of a sequence.)
1. Find a formula for the nth term tn of the sequences defined as follows:
2. Give a recursive definition of (tn ), if the formula for the general term tn is:
(a) $t_n = 3n$ $(n \ge 1)$.
(b) $t_n = \dfrac{1 \times 3 \times 5 \times \cdots \times (2n - 1)}{2 \times 4 \times 6 \times \cdots \times (2n)}$ $(n \ge 1)$.
(c) $t_n = n^2$ $(n \ge 0)$.
(d) $t_n = \dfrac{x^n}{n!}$ $(n \ge 1)$.
3. Find a formula for the general term an , if
4. In a certain video game there is a spaceship in the middle of a screen, and a number
of inter-galactic space invaders spread across the rest of the screen. The player
tries to eliminate the invaders by shooting them with a laser gun attached to the
spaceship. However, at the end of every 5 second period, each of the remaining
invaders splits into two invaders; so if the player is not very good at shooting, the
screen is eventually filled with invaders.
Let In , for n ≥ 1, denote the number of invaders on the screen at the beginning of
the nth time period, and let A denote the number of invaders on the screen at the
start of the game. (So I1 = A.)
(a) Draw a time-line representing the problem and write down a recurrence relation
for In if the player never eliminates any invader.
(b) How long does it take before there are over 1 000 000 invaders on the screen if
A = 3?
Now suppose that the player is able to eliminate invaders, and becomes more skillful
as the game progresses. In fact, in the nth period the player can eliminate as many
as 2n invaders.
(c) Draw a time-line and write down the recursive formula for In in this case.
(d) Find, by trial and error, the smallest value of A which leads to the screen
eventually being filled by invaders (i.e. over 1 000 000).
(e) Repeat parts (c) and (d) if in the nth period the player is able to eliminate
$n^2$ invaders.
(a) For each A between 1 and 10, evaluate the terms in the sequence until you
reach the cyclical pattern.
(b) Find, by trial and error, the smallest value of A for which some term in the
sequence (tn ) is greater than 100.
(c) Can you find a value for A which gives a sequence (tn ) that does not eventually
collapse to . . . , 1, 4, 2, 1, 4, 2, . . .? There is a large reward for the first person
who finds such an A or proves that there is none! (This is a famous unsolved
problem that has been around since the late 1930s.)
5. (a) 1, 4, 2, 1, . . .
2, 1, 4, 2, . . .
3, 10, 5, 16, 8, 4, 2, 1, 4, . . .
4, 2, 1, 4, . . .
5, 16, 8, 4, 2, 1, 4, . . .
6, 3, 10, 5, 16, 8, 4, 2, 1, 4, . . .
7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1, 4, . . .
8, 4, 2, 1, 4, . . .
9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1, 4, . . .
10, 5, 16, 8, 4, 2, 1, 4, . . .
(b) A = 15: 15, 46, 23, 70, 35, 106, . . .
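The lists in 5(a) can be regenerated by computer. The sketch below assumes the rule suggested by the listed terms (halve an even term; triple an odd term and add one), since the statement of the rule is not reproduced above. It uses a while loop and an if statement; the starting value A = 7 is an arbitrary choice.

```matlab
% Sketch: generate the sequence from a starting value A until it reaches 1
% (assumed rule: even term -> half of it, odd term -> 3 times it plus 1)
A = 7;
t = A;
while t(end) ~= 1
    if mod(t(end),2) == 0
        t(end+1) = t(end)/2;        % even: halve
    else
        t(end+1) = 3*t(end) + 1;    % odd: triple and add one
    end
end
disp(t)
```

The loop stops at the first 1, after which the pattern 4, 2, 1 repeats.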
Chapter 2
Introduction to MATLAB
Objectives:
• To understand the basic features of the MATLAB package, in particular the representation
of scalars, vectors and matrices, operations between them, and elementary
MATLAB functions.
Read Chapters 1–5 of the textbook Introduction to MATLAB by Etter (2nd or 3rd
edition), excluding Section 4.1.5 concerning other types of two-dimensional plots, and do
a selection of the problems at the end of Chapters 2–5. (Keep in mind that if you have
the older 2nd edition of the textbook, the MATLAB user interface has changed since it
was published.) Also read Section 7.1 of the textbook, up to and including the command
ezplot. Section 3.6 on user-defined functions can be read after the other sections. Note
that in Section 3.3 titled “Trigonometric Functions”, the only functions that will be used
in this unit are sin(x) and cos(x). When you read about a new command, try it out in
MATLAB by typing in an example from the textbook. The textbook applies MATLAB in
many different applications, but it is not necessary to go through all of these applications.
In parallel to the textbook, also read and work through the following notes.
MATLAB prior to the 8th (R2012a or earlier), please keep in mind such differences in
the user interface when reading these notes.
To start MATLAB, select and load the MATLAB application, as you would normally
open any application on your computer (e.g., double click the MATLAB icon). The default
screen when opening MATLAB is made up of three windows: Current Folder (on
the left), Workspace (on the right) and the central Command Window (which is the most
important). The Current Folder window shows the chosen folder containing your MATLAB
files (if you’ve created and saved any such files – see below). By default, the current
folder will be the MATLAB subfolder within your Documents folder and this is where
we recommend saving all of your MATLAB files (as explained shortly). The Workspace
window (which used to be the Command History window in earlier versions of MATLAB)
shows the variables/output from the current session. From that window, you can view,
manipulate, save, or clear variables from the current session. The Command Window is
where MATLAB commands or MATLAB programs (called script M-files with the extension
.m) are run and output is displayed. This window has the MATLAB prompt >> (or
EDU>> for the Student Edition) which means it is ready for you to enter a command.
If you want to execute just one or two MATLAB commands, it is simplest to enter
each command at the prompt in the Command Window (one at a time), followed by
the Enter or Return key each time. However, a MATLAB program (which consists of a
sequence of commands over several lines) should be entered and saved as a script M-file.
This allows you to modify the program easily by simply editing the file. All M-files must
have a .m extension.
To create a script M-file, select the New Script button on the MATLAB toolstrip
under the HOME tab (or alternatively, click on New (Script) from the toolstrip under the
EDITOR tab). This brings up an Editor window above the Command Window (if it wasn’t
already open) where you can type the commands (i.e., lines of code) that constitute your
MATLAB program. As shown in examples that follow, the first one or more lines of a
MATLAB program should be a short description of the purpose of the program and there
should also be adequate comments throughout the program. On each line, the description
and comments must be preceded by the % symbol. To save the program as a .m file, select
Save/Save As on the toolstrip under the EDITOR tab and give the program an appropriate
and legal filename, i.e., one that starts with a letter and contains only letters, numbers or
the underscore (_), without any spaces, and is not the same name as a built-in function or
command (e.g., not something like plot). The same rules apply to variable names within
a MATLAB program. The .m extension should be automatically added to your filename
(e.g. program1.m), but if not, you need to add the .m extension yourself before saving.
As mentioned above, we recommend that you save all of your MATLAB programs (script
M-files) in the MATLAB subfolder within your Documents folder and make sure that the
contents of this MATLAB folder are showing in the Current Folder window.
To execute a program (called program1.m, for example), first make sure that the M-
file is showing up in the list of files in the Current Folder window and then simply
type program1 (without the .m extension) at the prompt in the Command Window and
press Enter or Return on your keyboard. (Alternatively, with your MATLAB program
showing in the Editor window, click the Run button on the toolstrip under the EDITOR
tab.) Executing/running a program has the same effect as if all the commands (in your
program) had been typed in the Command Window. If you want to make any changes to
the program and the Editor window is still open, then simply make the changes, save the
file and again hit the Run button or enter the name of the program (e.g. program1) at the
prompt in the Command Window. (Note that, within the Command Window, you can
repeatedly click the up arrow key ↑ on your keyboard to scroll back through previously
typed commands or filenames.) To open an existing M-file, click the Open button on the
toolstrip and select the file. Alternatively, double click the M-file from the list of files in
the Current Folder window. For some more information about M-files, see the section
entitled “Script Files” in the MATLAB User Guide.
The online help facility in MATLAB is very useful. The command help provides a
help menu for all topics. However, if you know the name of the topic or command that you
want to check, it is quicker to type help commandname (e.g. help plot) at the prompt in
the Command Window. You can also obtain help on the help command itself by entering
help help at the prompt. The help facility can also be accessed via MATLAB’s Help
menu or the Help button on the toolstrip under the HOME tab.
To exit MATLAB, choose Quit MATLAB from the MATLAB drop-down menu. Alternatively,
type quit or exit at the prompt (in the Command Window) and then press Enter
or Return on your keyboard.
◮ Example 2.1
$C = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$ is a 3 × 2 matrix (3 rows, 2 columns).
$B = \begin{bmatrix} 7 & 8 & 9 & 9.5 \end{bmatrix}$ is a 1 × 4 matrix (1 row, 4 columns), called a row vector.
A = [2.3] is a 1 × 1 matrix, which is equivalent to the scalar 2.3.
The elements of a matrix are identified by their row and column numbers in that
order. For example, in the first matrix $C_{3,2} = 6$.
The matrices above can be entered in MATLAB in explicit list form as follows.
(Note the % symbol is used before a comment.)
C=[1 2; 3 4; 5 6] % semicolons separate the rows
B=[7,8,9,9.5] % or
B=[7 8 9 9.5] % the commas are replaced by spaces
A=2.3
Having entered a vector or matrix, an element can be found using round brackets,
for example
Two (or more) scalars, vectors or matrices (of appropriate size) can be concatenated
easily by simply enclosing them in square brackets, for example
Note the second command adds three elements to the vector, the 8th being the
one specified (2), and the 6th and 7th assigned the default value zero (since they
were not specified). ◭
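Element access, concatenation and extension of a vector can be tried out with the matrices of Example 2.1. The following is a minimal sketch rather than the original listing; the names c32, b4 and v are hypothetical and chosen only for illustration.

```matlab
A = 2.3;
B = [7 8 9 9.5];
C = [1 2; 3 4; 5 6];

c32 = C(3,2)        % element in row 3, column 2 of C, which is 6
b4  = B(4)          % fourth element of the row vector B, which is 9.5

v = [B 10]          % concatenation: a row vector with 5 elements
v(8) = 2            % extends v to 8 elements; v(6) and v(7) are set to 0
```

The last statement shows the behaviour described in the note above: unspecified elements are filled with zeros.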
Exercise With A, B and C entered as above, write down the output of each of the
MATLAB statements:
P=[0.7;0.8;0.9]
[P(1) B]
[C,P]
[A;C(2,2)]
Colon operator
The colon operator is a very important operator in MATLAB. It is used in several ways
– to define vectors with equal increments, to specify elements of a vector or matrix, and
in the for loop statement. These uses are illustrated in the following example.
◮ Example 2.2
B=[7 8 9 9.5]
x=B(1:3) % defines x=[7 8 9]
C=[1 2; 3 4; 5 6]
C(2,:) % gives second row 3 4 (here : means any column position)
E=[1 2 3; 4 5 6; 7 8 9]
F=E(1:2,2:3) % defines F=top right 2 by 2 matrix
Here
$$E = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \quad\text{and}\quad F = \begin{bmatrix} 2 & 3 \\ 5 & 6 \end{bmatrix}.$$
The following program uses a for loop to compute and list all the even numbers
from 2 to 20.
for n=1:10
x=2*n
end
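The first use of the colon operator mentioned above, defining vectors with equal increments, has the general form start:increment:end. A short sketch with illustrative variable names:

```matlab
v = 0:0.25:1        % start:increment:end gives [0 0.25 0.5 0.75 1]
w = 10:-2:2         % a negative increment counts down: [10 8 6 4 2]
evens = 2*(1:10)    % the even numbers from 2 to 20, without a for loop
```

If the increment is omitted, as in 1:10, MATLAB uses an increment of 1.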
Transpose operator
The transpose D of an m × n matrix C is the n × m matrix in which the elements of the
ith row of D are the corresponding elements of the ith column of C. For example,
$$\text{if } C = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \text{ its transpose is } D = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}.$$
Note also that the columns of D are the corresponding rows of C. Clearly the transpose
of a row vector is just the corresponding column vector, and the transpose of a column
vector is the corresponding row vector. In MATLAB the transpose operator is denoted
by the prime symbol, as in
C=[1 2; 3 4; 5 6]
D=C' % D is the transpose of C
It gives
table =
0 5
1 10
2 12
3 10
4 5
Input
Sometimes it is useful to allow the user of a program to enter the value of some variable
(possibly a scalar, vector or matrix) using the keyboard. This can be done with the input
command as follows:
n=input('Enter value of n ')
This displays the text
Enter value of n
in the Command Window and waits for a response. If you type say 25 and press
Enter/Return, then n has the value 25.
in the Command Window and waits for a response. If you type [2,3,5.5] and press
Enter/Return, then x is defined to be the vector [2,3,5.5].
Output
The format of the numerical output in MATLAB can be specified using one of the following
format statements. In each case the output is rounded to the given number of decimal
places.
format short % default, 4 dec places and <=3 whole digits e.g. 15.2345
format long % 14 decimal places e.g. 15.23453170892113
format bank % 2 decimal places e.g. 15.23
format short e % 4 decimal places e.g. 1.5235e+001 (e+001 means x10^1)
format long e % 15 decimal places e.g. 1.523453170892113e+001
format compact % suppress blank lines
format loose % default
The value of a variable (scalar, vector or matrix) can be displayed using the disp
command. This command can also be used to display text strings as follows.
temp=35;
disp('Temperature is')
disp(temp)
disp('degrees C')
This gives
Temperature is
35
degrees C
If more control of the output format is desired, one can use the command fprintf
(but this is not essential in this unit). It is used as follows
temp=28.53;
fprintf('Temperature is %4.1f degrees C',temp)
where we have 4 places for the numerical output (including decimal point) with 1 decimal
place. For more information on fprintf see the MATLAB help facility.
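The effect of the format code can be inspected with sprintf, a related built-in function that returns the formatted text instead of printing it. This is a small sketch; the variable names msg1 and msg2 are illustrative.

```matlab
temp = 28.53;
fprintf('Temperature is %4.1f degrees C\n', temp)  % \n ends the output line
msg1 = sprintf('%4.1f', temp);   % width 4, 1 decimal place: '28.5'
msg2 = sprintf('%6.2f', temp);   % width 6, 2 decimal places: ' 28.53'
disp(msg1)
disp(msg2)
```

When the field width is larger than needed, the number is padded with spaces on the left.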
Note that some of the array operations are denoted by two symbols with . at the front
(which is essential). The reason for this is that the single symbols *, / and ^ have been
reserved for other operations between arrays or matrices; in particular, A*B means matrix
multiplication of matrices A and B .
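The difference between the array operations and the matrix operations is easiest to see on small vectors. A minimal sketch, with vectors a and b chosen only for illustration:

```matlab
a = [1 2 3];
b = [4 5 6];
prod_elem = a.*b    % element-by-element product: [4 10 18]
quot_elem = b./a    % element-by-element quotient: [4 2.5 2]
pow_elem  = a.^2    % each element squared: [1 4 9]
inner     = a*b'    % matrix product of a 1x3 and a 3x1 matrix: 32
```

Typing a*b without the transpose gives an error here, because a 1 × 3 matrix cannot be multiplied by another 1 × 3 matrix.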
plot(x,y,'o')
xlabel('x')
ylabel('y')
title('Function y(x)=x^2+3x-4')
The plot command can produce plots of both point type (using one of the plot symbols
'o', '+' or '*') or line type (using the default of solid or one of '--' (dashed) or ':'
(dotted)). The line type plots are produced simply by joining the points defined by the
vectors x and y in plot(x,y). Therefore, to create an accurate line graph of a function,
generally one needs to use a small increment for the x values, say 0.01.
The axis command is used to control the limits and scaling of a plot. It is included
after the plot command in the following possible ways.
A grid can be added to a plot using grid on (after the plot command) and it can be
removed using grid off.
◮ Example 2.6 Write a MATLAB program to plot the graphs of the functions
$f(x) = (x^3 - 3x^2)(2x + 5)$ and $g(x) = (x + 3)^2$
together using values of x between −3 and 3 in increments of 0.01. Set the y-axis
limits to be −50 and 50, and include a grid.
% Plot graphs of two functions
clear
x=-3:0.01:3;
f=(x.^3 - 3*x.^2).*(2*x + 5); % note element-by-element multiplication
% between the brackets
g=(x+3).^2;
plot(x,f,x,g,'--')
xlabel('x')
ylabel('y')
title('Graphs of f(x)=(x^3-3x^2)(2x+5) (solid) and g(x)=(x+3)^2 (dashed)')
axis([-3,3,-50,50])
grid on
Execute this program. ◭
Another way to plot more than one graph in the same figure is to use the hold on
command, which holds the figure ready for subsequent plots. It is used as follows:
x=-3:0.01:3;
f=(x.^3 - 3*x.^2).*(2*x + 5);
plot(x,f)
hold on
g=(x+3).^2;
plot(x,g,'--')
hold off
Exercise Write a MATLAB program to compute and plot the functions
$$f(x) = \frac{2 - 3x}{4x^2 + 5x} \quad\text{and}\quad g(x) = (x/5)^4$$
together for $1 \le x \le 3$.
MATLAB has built-in functions to evaluate the elementary functions of mathematics;
in particular sqrt(x) for $\sqrt{x}$, sin(x) for sin(x), cos(x) for cos(x), exp(x) for the
exponential function $e^x$, where e is the special number e = 2.71828 · · · , and log(x) for
the natural log function ln(x) (defined to be the number y that satisfies the equation
$e^y = x$). Note that if x is a vector, then the built-in MATLAB function produces a vector
of function values. If the function is part of a larger expression, be careful to use the
correct array operations.
x=0:0.1:3;
f=(sin(x)).^2 + 3*x./(1+exp(x));
Exercise Write down the appropriate MATLAB statements to generate the values of
the function
$g(x) = (3 + x\cos(x))e^{2x}$
for x between 0 and 1 in increments of 0.1. Execute the statements in MATLAB.
◮ Example 2.8 The following MATLAB program generates the values of f (x) =
1 + sin(2πx) for x between 0 and 1 in increments of 0.01, plots the function and
sums its values.
Execute this program in MATLAB. Can you explain why the sum of the values of
f equals 101, the number of elements in the vector x? ◭
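A program of the kind described in Example 2.8 might look like the sketch below. It is an illustration under stated assumptions, not the original listing; the variable name total is hypothetical, and sum is the built-in function that adds the elements of a vector.

```matlab
% Sketch: generate, plot and sum the values of f(x) = 1 + sin(2*pi*x)
x = 0:0.01:1;
f = 1 + sin(2*pi*x);
plot(x,f)
xlabel('x')
ylabel('f')
title('Graph of f(x)=1+sin(2\pi x)')
total = sum(f)      % very close to 101, the number of elements of x
```

The displayed total may differ from 101 in the last few decimal places because of rounding in computer arithmetic.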
◮ Example 2.9 Plot the graphs of $f(x) = \sqrt{x}/(x + 1)$ for $0 \le x \le 9$ and
$g(x) = x/2 + 1$ for $-5 \le x \le 1$ together in the same figure. Note that we can’t plot both
functions over $-5 \le x \le 9$ because $\sqrt{x}$ is undefined for $x < 0$.
Note that we can use the large increment 1 to plot g(x) because the graph is a
straight line (and MATLAB just joins the defined points by straight line segments).
Execute this program in MATLAB. ◭
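One possible program for Example 2.9 is sketched below, assuming f(x) = sqrt(x)/(x + 1) and g(x) = x/2 + 1 as described in the example. The names x1 and x2 for the two sets of x values are illustrative.

```matlab
% Sketch: plot two functions defined on different intervals
x1 = 0:0.01:9;              % small increment for the curved graph of f
f = sqrt(x1)./(x1 + 1);     % note ./ for element-by-element division
x2 = -5:1:1;                % increment 1 is enough for a straight line
g = x2/2 + 1;
plot(x1,f,x2,g,'--')
xlabel('x')
ylabel('y')
```

Each function is paired with its own vector of x values in the plot command, so the two intervals need not agree.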
The ezplot command is an easy way to plot an algebraically defined function. Note
that the function is entered as an algebraic expression, without using any array operations.
The interval domain of the independent variable is entered in square brackets. If the
interval is not present, the default domain is [−2π, 2π]. For example
ezplot('exp(-x^2)')
plots $e^{-x^2}$ on [−2π, 2π] while
ezplot('exp(-x^2)',[-1,1])
plots $e^{-x^2}$ on [−1, 1].
A useful plotting command is zoom. As its name suggests, it allows you to zoom in on
any part of a 2-D graph and see it enlarged in the plot window. The statement zoom on
turns on the zoom mode. Then, to zoom in about a particular point on the screen, move
the cursor to the point and click on the mouse button (left one for a PC). This expands
the plot by a factor of 2 centred on the point. The process can be repeated over and over
again to zoom in closer. You can also click and drag the mouse to zoom in on a particular
rectangular area. To zoom out by a factor of 2, click the right mouse button on a PC or
shift and click on an Apple Mac. The statement zoom out returns the plot to its initial
state and zoom off turns off the zoom mode. In MATLAB 5.3 and later versions, the
zoom (in or out) mode can also be activated using the menu bar of the figure window.
◮ Example 2.10 The following MATLAB program allows you to zoom in on the
graph of $f(x) = x^2$.
% Zoom in on a graph
x=-2:0.01:2;
f=x.^2;
plot(x,f)
xlabel('x')
ylabel('f')
title('Graph of f(x)=x^2')
zoom on
Execute the program and zoom in (and out) on a few different points of the graph.
Notice that as you zoom in, the section of graph looks more and more like a line
segment. In fact this is true for any point of the graph. This means that near
any point $(x_1, y_1)$ of the graph, the graph can be approximated well by a line
segment. If you have studied some calculus, you will know that this property holds
because the function $f(x) = x^2$ is differentiable, that the line segment is part of the
tangent line at the point, and the slope of the line is the derivative $f'(x_1) = 2x_1$.
The equation of the line is $y = 2x_1(x - x_1) + y_1$. Choose a point on the graph (say
$(x_1, y_1) = (1, 1)$) and plot this line together with the graph of $f(x) = x^2$. ◭
◮ Example 2.11 Use the following program to zoom in on the graphs of the two
functions f (x) = sin(x) and g(x) = x near the origin.
Notice that as you zoom in, the graphs become closer and closer. This means that
for x near 0, sin(x) is close to x, and the nearer x is to 0, the closer sin(x) is to
x. This is an important fact about the sin function. What function do you think
$h(x) = 3\sin(x^2)$ is close to near x = 0? (Answer: $3x^2$) Modify the program to
verify this graphically. ◭
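A minimal sketch of a program of the kind Example 2.11 describes, assuming a plotting range of [−1, 1] and a step of 0.01 (both are illustrative choices, not taken from the notes). It plots sin(x) and x on the same axes and turns on zoom mode; the last lines give a numerical measure of how close the two graphs are for |x| ≤ 0.1.

```matlab
% Compare sin(x) and x near the origin (illustrative sketch)
x=-1:0.01:1;              % assumed range and step
f=sin(x);
g=x;
plot(x,f,x,g)             % both graphs on the same axes
xlabel('x')
ylabel('y')
title('Graphs of sin(x) and x')
zoom on
near=abs(x)<=0.1+1e-12;   % small tolerance guards against roundoff in x
gap=max(abs(f(near)-g(near)));   % largest vertical gap for |x| <= 0.1
disp(gap)
```

The displayed gap is small compared with 0.1, which is the numerical counterpart of the graphs merging as you zoom in.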
Section 2.3. Plotting in two dimensions 25
Another useful command to interact with a 2-D graph window is ginput. This command
records the coordinates of any desired position of the cursor, which can then be
printed or operated on in any way. In its simplest form, the command is used as:
[xin yin]=ginput
where the variable names (xin and yin) can be chosen as desired. Then, with each click
of the mouse button, the x and y coordinates of the cursor (cross) are saved in the column
vectors xin and yin. To stop this process, press the Enter or Return key. For more
information on the ginput command, look it up using the help facility.
A common mathematical problem is to solve two simultaneous equations in two vari-
ables, say x and y. A simple example is: solve y = 2x and y = 1 − x. This simple example
can be solved algebraically by writing
y = 2x = 1 − x
so 3x = 1 and therefore x = 1/3. Substituting this into the first (or second) equation
gives y = 2 × 1/3 = 2/3. Thus the solution is x = 1/3, y = 2/3. This solution can also be
estimated graphically by plotting the lines y = 2x and y = 1 − x, and finding (accurately)
their point of intersection.
For more complicated equations it is usually impossible to find the solution(s)
algebraically, but one can still approximate the solution(s) graphically. The MATLAB
commands plot, zoom and ginput are very useful for this purpose.
◮ Example 2.12 Find the positive solution of the two equations y = cos(x) and
$y = x^2$. This is equivalent to finding the intersection point of the graphs of the
functions cos(x) and $x^2$ in the first quadrant (i.e., in the positive quadrant of the
X-Y plane). The following program plots the graphs and allows the user to zoom
in on the intersection point to approximate the solution.
Execute the program and zoom in several times by clicking on the intersection
point in the plot window. Press Enter/Return to stop the zoom and ginput
procedures. Then the estimated coordinates of the intersection point, i.e. the
approximate solution values, will be displayed in the Command Window. If a
certain accuracy is required, it is important to use a sufficiently small increment
for the plots and to zoom in to the intersection point sufficiently many times. ◭
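As a rough numerical cross-check on the value obtained by zooming, the following sketch searches a fine grid for the point where the two graphs are closest. The grid range [0, 1.5], the step 0.0001 and the variable names are assumptions made for illustration; this is not the interactive program of the example.

```matlab
% Coarse numerical estimate of the intersection of cos(x) and x^2
x=0:0.0001:1.5;          % assumed grid in the first quadrant
d=abs(cos(x)-x.^2);      % vertical gap between the two graphs
[dmin,k]=min(d);         % grid point where the gap is smallest
xest=x(k);
yest=xest^2;
disp([xest yest])
```

As with zooming, the estimate can be no more accurate than the increment used for x.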
and note the difference in the plot. Also on the last line use each of the commands
contour(x,y,z) and meshc(xgrid,ygrid,z) to plot contours of this function.
Exercise Use the MATLAB command meshc to plot the surface defined by
Also use the command contour to draw the contours of f . Note that you can specify the
number n of contour lines by adding this as an option in the form contour(x,y,f,n). If
you want to draw the contours corresponding to specific values of the function f , this can
be done using contour(x,y,f,vec), where vec is a vector containing the desired values.
Experiment with each of these options in the contour plot of the given function f . Use
ginput with the contour plot to estimate the positions of the local maximum and local
minimum of f . What do you think the exact positions are?
Section 2.5. For loop 27
$t_1 = 3$; $t_n = t_{n-1} + 2n$ for $n \ge 2$.
The following MATLAB program implements this recursive definition and
computes 10 terms of the sequence.
% Compute a sequence
clear
t=3 % initialize variable as first term
for n=2:10
t=t+2*n % update variable: new value = old value + 2*n
end
The statement t=t+2*n in this program may look odd at first sight (because it
is incorrect as an equation). However, such statements are common in computer
programming and have a logical meaning. For this statement, the result of the
computer addition of the values of t and 2*n is assigned to and stored in the
variable t, i.e. the old value of this variable is replaced by the new value. In this
way, at the end of each pass of the loop, the variable t has the value of the new
term of the sequence. Run this program and check the first few terms by hand. ◭
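A variant of this program, sketched under the assumption that all ten terms should be kept, stores them in a vector. The closed form $t_n = n^2 + n + 1$ used for comparison is not in the notes; it follows by adding $t_1 = 3$ to the sum of 2k for k = 2, …, n.

```matlab
% Store all terms of the sequence t(1)=3, t(n)=t(n-1)+2n
N=10;
t=zeros(1,N);        % allocate the vector once
t(1)=3;
for n=2:N
    t(n)=t(n-1)+2*n; % recursive definition
end
n=1:N;
closed=n.^2+n+1;     % closed form, used only as a check
disp([t; closed])
```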
Execute this program. Extend the program to compute the total interest $I_n$ paid
by the bank up to the beginning of year n for n = 1, . . . , 10. ◭
end
disp('Fibonacci sequence: ')
disp(f)
plot(f,'o')
xlabel('i')
ylabel('f')
title('Fibonacci number f(i)')
pause
plot(ratio,'o')
xlabel('i-1')
ylabel('ratio')
title('Ratios f(i)/f(i-1) of Fibonacci numbers')
Execute this program and observe the behaviour of the ratios. It is known that
the ratios approach the golden ratio $(1 + \sqrt{5})/2 = 1.618\ldots$ as i approaches ∞.
Note that the purpose of the statements initializing f and ratio is to allocate the
correct final size to these vectors. Without these statements, MATLAB must resize
the vectors at each pass of the for loop, which is not as efficient. ◭
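For reference, a self-contained sketch of a program with the structure described above. The number of terms N = 20 and the starting values f(1) = f(2) = 1 are assumptions; only the allocation of f and ratio and the ratio computation are taken from the description.

```matlab
% Fibonacci numbers and ratios of successive terms (illustrative sketch)
N=20;                      % assumed number of terms
f=zeros(1,N);              % allocate the final size of f
ratio=zeros(1,N-1);        % allocate the final size of ratio
f(1)=1;
f(2)=1;
ratio(1)=f(2)/f(1);
for i=3:N
    f(i)=f(i-1)+f(i-2);    % each term is the sum of the previous two
    ratio(i-1)=f(i)/f(i-1);
end
golden=(1+sqrt(5))/2;
disp(abs(ratio(N-1)-golden))   % distance of the last ratio from the golden ratio
```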
A common task arising in many applications is to find the sum of a set of numbers. If
the numbers can be generated easily as a vector v, the most efficient way of doing this in
MATLAB is to use the command sum(v) (as in Example 2.8). Sometimes it is difficult
or inconvenient to generate a vector of values and then a for loop can be used.
◮ Example 2.16 The following MATLAB program computes the sum of the series
$$S = 1 + \frac{1}{1} + \frac{1}{1 \times 2} + \frac{1}{1 \times 2 \times 3} + \cdots + \frac{1}{1 \times 2 \times \cdots \times 20}$$
% Sum a finite series using for loop
clear
format long
N=20;
s=1;
term=1; % first term
for n=1:N
term=term/n; % new term is defined recursively
s=s+term;
end
disp('sum of series is')
disp(s)
Note that this program implements two recursive definitions - one for the terms
of the series (computed in the variable term) and one to accumulate the terms in a
partial sum (computed in the variable s). Then at the completion of the for loop,
the variable s has the value of the whole sum.
Run this program to see the result. If you have studied the exponential function,
you may recognize the above series. The sum of the corresponding infinite series
is the number e = 2.71828182845905 · · · . ◭
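The same sum can also be formed with array operations. In this sketch, cumprod(1:N) generates the factorials 1!, 2!, …, N! and sum adds their reciprocals; the comparison with exp(1) is added here only as a check and is not part of the example.

```matlab
% Sum the series using array operations (illustrative sketch)
format long
N=20;
fact=cumprod(1:N);        % [1!, 2!, ..., N!]
s=1+sum(1./fact);         % the leading 1 is the first term of the series
disp(s)
disp(abs(s-exp(1)))       % difference from MATLAB's value of e
```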
In MATLAB long for loops should be avoided where possible. In particular, a for
loop should not be used to generate simple function values. It is much more efficient to
use the basic array operations in MATLAB as demonstrated in the following example.
◮ Example 2.17 The following two programs both compute the logarithms of the
integers from 1 to 800000 and time the total computation.
Run the two programs and compare the elapsed times. Which one is more efficient
and by what multiple? Also run the second program without the statement that
initializes the size of y and compare the elapsed time. ◭
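A sketch of what two such programs might look like, using tic and toc for the timing (an assumption; the exercise below uses etime). The first uses an array operation and the second a for loop with the result vector allocated in advance. The variable names are hypothetical.

```matlab
% Program 1: array operation
n=800000;
tic
y1=log(1:n);
time1=toc;
% Program 2: for loop with the size of y2 initialized
tic
y2=zeros(1,n);
for i=1:n
    y2(i)=log(i);
end
time2=toc;
disp([time1 time2])       % elapsed times in seconds
```

The elapsed times depend on the machine and the MATLAB version, so only their ratio is meaningful.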
Exercise Write two MATLAB programs to compute the sum of the finite geometric
series
$$S = 1 + r + r^2 + \cdots + r^n$$
where the common ratio is r = 0.9 and n = 2000, one using array operations and the
other using a for loop. Use the etime command to time the two programs. Which one
is more efficient and by how much?
Consider the problem of finding a root z of an equation f (x) = 0, i.e. a value z such
that f (z) = 0. Graphically, z is the x-coordinate of a point at which the graph of f (x)
crosses the x-axis.
A well-known simple iterative method for root finding is the Newton-Raphson method.
Given the equation f (x) = 0 and an initial estimate x0 of the root z, this method generates
a sequence of estimates recursively by the formula
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \quad n \ge 0,$$
where f′ is the derivative of the function f. From the following diagram it is easy to see
how the method works. It simply uses the tangent line determined by the current point
$x_n$ to generate the next point $x_{n+1}$ as the point where the line crosses the x-axis. Since
the slope of the tangent line at $(x_n, f(x_n))$ is $f'(x_n)$, the equation of the line is
$$y = f(x_n) + f'(x_n)(x - x_n),$$
and the next point is defined by setting y = 0 and solving for x. This yields (check this) the
expression for $x_{n+1}$ above.
[Figure: the graph of y = f(x) near the root z, together with the tangent line y = f(x_n) + f′(x_n)(x − x_n) at x_n, which crosses the x-axis at x_{n+1}.]
It can be shown that if the starting estimate x0 is sufficiently close to the root z and
the slope of y = f (x) at z is not zero, then the iterates xn , n = 1, 2, . . . , of the Newton-
Raphson method converge quickly to z. In fact, after each step the number of correct
decimal places is approximately doubled. However, if the starting estimate is not close
enough to a root, the iterates may not even converge.
Run this program with 10 iterations and starting estimates of 1.4, 1, 0.1, 0.001
and 0.00001. In the last case the final estimate is a long way from $\sqrt{2}$. Explain
this using a graph of $f(x) = x^2 - 2$ and the above geometric interpretation of the
Newton-Raphson method. ◭
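A minimal sketch of the Newton-Raphson iteration for $f(x) = x^2 - 2$, with $f'(x) = 2x$. The starting estimate 1.4 and the 10 iterations come from the instructions above; the variable names and layout are hypothetical.

```matlab
% Newton-Raphson method for f(x)=x^2-2 (illustrative sketch)
format long
x=1.4;                  % starting estimate x0
niter=10;               % number of iterations
for n=1:niter
    fx=x^2-2;           % f(x_n)
    dfx=2*x;            % f'(x_n)
    x=x-fx/dfx;         % next estimate x_{n+1}
    disp(x)
end
```

Changing the first assignment to another starting estimate is enough to repeat the experiment.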
Section 2.6. Selection statements 33
if y > 0.5
... % some statement(s)
end
which means that the statement(s) are executed if and only if the value of y is > 0.5.
◮ Example 2.19 Work through the following MATLAB program. What do you
think the final value of count represents?
clear
count=0;
for x=-1:0.1:1
y=x^2;
if y > 0.5
count=count+1;
disp([count y])
end
end
Execute the program and check that it gives what you expect. ◭
Instead of > in y > 0.5 above, we could have any one of:
relational operator meaning
< less than
<= less than or equal
> greater than
>= greater than or equal
== equal
~= not equal
A statement of the form y > 0.5 is called a logical expression. Depending on the value
of the variable y, the expression y > 0.5 has value 0 (false) or 1 (true). For example the
statements
y=0.8;
y > 0.5
give the result ans = 1 (true). Similarly, logical expressions involving arrays are treated
element-by-element. For example the statements
a=[2 4 6];
b=[3 5 1];
a < b
give the result ans = 1 1 0, since 2 < 3 and 4 < 5 are true but 6 < 1 is false.
Nested if
If statements can be nested as in the following.
◮ Example 2.20
clear
count=0;
for x=-1:0.1:1
    y=x^2;
    if y > 0.5
        count=count+1;
        disp([count y])
        if x > 0.5
            disp('Both x and y are > 0.5')
        end
    end
end
Execute the program and check that it gives what you expect. ◭
◮ Example 2.21
Execute the program several times with different values of x and check that it gives
what you expect. ◭
When there are more than two cases to consider, elseif is used as follows.
◮ Example 2.22
Execute the program several times with different values of temp and check that it
gives what you expect. ◭
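A sketch of an if, elseif, else chain of the kind this example describes. The temperature thresholds and the messages are hypothetical; only the variable name temp comes from the text.

```matlab
% if-elseif-else statement (illustrative sketch)
temp=18;                 % try different values of temp
if temp < 10             % assumed threshold
    msg='cold';
elseif temp < 25         % assumed threshold; reached only if temp >= 10
    msg='mild';
else                     % all remaining cases
    msg='hot';
end
disp(msg)
```

The conditions are tested in order and only the first one that is true has its statements executed.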
◮ Example 2.23 Write a MATLAB program using the find command to find the
values of $y = x^2$ for x = −1, −0.9, . . . , 1 that are greater than 0.5, and the number
of these values. Then redefine these values to equal 0.5 and plot the resulting
points.
Note that the length command returns the length of a vector. Run this program
and note that the value of number is the same as the final value of count in
Example 2.19. ◭
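One way such a program might look, as a sketch. The variable name number is taken from the note above; the name index and the plot labels are assumptions.

```matlab
% Use find to locate the values of y=x^2 greater than 0.5
x=-1:0.1:1;
y=x.^2;
index=find(y > 0.5);     % positions of the values exceeding 0.5
number=length(index);    % how many such values there are
y(index)=0.5;            % redefine these values to equal 0.5
plot(x,y,'o')
xlabel('x')
ylabel('y')
disp(number)
```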
has the interesting property that the terms appear to settle down to the simple
cyclical pattern 1,4,2,1,4,2,. . . no matter what the starting positive integer a = t1 .
(See the corresponding exercise at the end of Section 1.1.) It is unknown if this
is true in general and is a famous unsolved problem. The following MATLAB
program generates the terms for any starting integer. Read through the program
and make sure you understand the if-else statements, and the remainder function
rem. Note we have used the command break to jump out of the for loop when
the terms start cycling.
t(n)=t(n-1)/2;
else
t(n)=3*t(n-1) + 1;
end
if t(n)==1 % if terms start cycling
break % jump out of for loop
end
end
disp('Terms of recurrence formula: ')
disp(t)
Execute the program a few times with different starting integers. Extend the
program to also display the number of iterations of the formula used before cycling
starts. (Hint: Try using the statement nt=length(t).) ◭
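A self-contained sketch of a program of this form. The starting integer a = 6 and the maximum number of terms N = 200 are assumptions, and the first lines are a guess at how such a program might begin; the body of the loop follows the fragment above.

```matlab
% Terms of the recurrence: halve if even, otherwise 3t+1 (illustrative sketch)
a=6;                          % assumed starting positive integer
N=200;                        % assumed maximum number of terms
t=a;                          % t(1)=a; t grows as terms are added
for n=2:N
    if rem(t(n-1),2)==0       % previous term is even
        t(n)=t(n-1)/2;
    else                      % previous term is odd
        t(n)=3*t(n-1) + 1;
    end
    if t(n)==1                % if terms start cycling
        break                 % jump out of for loop
    end
end
disp(t)
```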
Exercise Write MATLAB programs to solve parts (d) and (e) of Exercise 4 from Section
1.1 (about the space invaders game).
◮ Example 2.25 A certain parcel delivery company has size limits on the parcels it
will accept. The parcel must be no more than 1 metre in length and no more than
0.3 cubic metres in volume. In a particular load, the parcels have the following
sizes:
length volume
0.8 0.15
1.1 0.35
0.75 0.25
1.2 0.2
0.9 0.4
1.15 0.2
0.6 0.15
The size data (numbers) are stored in this format in a file called parcels.dat. Write
a MATLAB program to classify the parcels and for each parcel, print out its length,
volume, and whether its length and volume are acceptable or too big. Also find
and print out: the number of parcels, the number of parcels with length acceptable,
the number of parcels with volume acceptable, the number of parcels with both
length and volume acceptable.
Run this program and check that it gives what you expect. Extend the program
using the find command to find the same numbers as above, i.e. the number of
parcels with length acceptable, the number of parcels with volume acceptable and
the number of parcels with both length and volume acceptable, and check that the
results are the same as found above. ◭
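A sketch of a program for the first part of this example, using a for loop and if statements. To keep it self-contained the data are typed in directly rather than read from parcels.dat; all variable names and the output layout are hypothetical.

```matlab
% Classify parcels by length and volume (illustrative sketch)
data=[0.8 0.15; 1.1 0.35; 0.75 0.25; 1.2 0.2; 0.9 0.4; 1.15 0.2; 0.6 0.15];
nparcels=size(data,1);        % number of parcels (rows)
nlen=0; nvol=0; nboth=0;      % counters
for i=1:nparcels
    len=data(i,1);
    vol=data(i,2);
    lenok=(len <= 1);         % length limit: 1 metre
    volok=(vol <= 0.3);       % volume limit: 0.3 cubic metres
    if lenok
        nlen=nlen+1; lenmsg='acceptable';
    else
        lenmsg='too big';
    end
    if volok
        nvol=nvol+1; volmsg='acceptable';
    else
        volmsg='too big';
    end
    if lenok && volok
        nboth=nboth+1;
    end
    fprintf('%5.2f %5.2f  length %s, volume %s\n',len,vol,lenmsg,volmsg)
end
disp([nparcels nlen nvol nboth])
```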
Section 2.7. User-defined functions 39
◮ Example 2.26 The following user-defined function (saved as the function M-file
areacirc.m) has input argument r equal to the radius or vector of radii of some
circles and output argument equal to the areas of these circles.
function A=areacirc(r)
% Compute areas of circles
A=pi*r.^2; % the variable name A is the same as in first line
The following MATLAB program (saved as a separate M-file, say progcirc.m) uses
this function M-file to evaluate the area of a circle of given radius and to plot the
area of a circle against its radius for values less than or equal to the given radius.
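A sketch of what such a program might contain. So that it runs on its own, the first line defines areacirc as an anonymous function standing in for the function M-file areacirc.m; with the M-file saved, that line would be deleted. The radius 2 and the plot labels are assumptions.

```matlab
% Area of a circle and plot of area against radius (illustrative sketch)
areacirc=@(r) pi*r.^2;     % stand-in for the function M-file areacirc.m
rad=2;                     % assumed given radius
A=areacirc(rad);           % area of the circle of radius rad
disp(A)
r=0:0.01:rad;              % radii up to the given radius
plot(r,areacirc(r))        % the function accepts a vector of radii
xlabel('radius')
ylabel('area')
title('Area of a circle against its radius')
```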
Consider the problem of finding a zero of a function f (x) (or a root of the equation
f (x) = 0), i.e., a value z such that f (z) = 0. Graphically, the value of z is the x-coordinate
of a point at which the graph of f (x) crosses the x-axis.
MATLAB has a command called fzero that implements a sophisticated iterative
algorithm for finding the zero of a function near a given point. Provided it can locate an
interval on which the function changes sign, the algorithm is guaranteed to converge to a
zero in that interval, and it converges quickly once the iterates become sufficiently close
to the zero. Note that the computed zero is not the exact zero but a very good estimate
of it. The form of the command is
z=fzero('func',x0)
where func is either a built-in MATLAB function (e.g. sin) or a user-defined function,
and x0 is a rough estimate of the zero.
◮ Example 2.27 Find the smallest positive zero z of the function f (x) = x sin(x).
First get a rough estimate of z using a hand-drawn or MATLAB plot. This gives
z ≈ 3. Then create the function M-file called (say) fxsin.m containing
function y=fxsin(x)
y=x.*sin(x);
Run the program to see the result. The exact value of the zero is z = π. How
accurate is the computed value? ◭
In MATLAB 6 and on, one can also enter the expression for a simple function directly
in the fzero command as
z=fzero('x*sin(x)',0.8) (using the variable name x) or
z=fzero(inline('t*sin(t)'),0.8) (using a variable name other than x).
This may be suitable for a simple expression, but in general one should use a user-defined
function as above.
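In MATLAB 7 and later, an anonymous function handle is a further way to pass a simple expression to fzero. A sketch for the function of Example 2.27, with the rough estimate 3 as the starting point; the variable names are assumptions.

```matlab
% fzero with an anonymous function (illustrative sketch)
f=@(x) x.*sin(x);      % f(x) = x sin(x)
z=fzero(f,3);          % rough estimate z = 3 from a plot
disp(z)
disp(abs(z-pi))        % error relative to the exact zero pi
```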
◮ Example 2.28 Consider Example 2.12 again, i.e. find the positive solution of
the equations y = cos(x) and $y = x^2$. If x and y satisfy these equations, then
$y = x^2 = \cos(x)$, so $x^2 - \cos(x) = 0$. This means the x value of the solution is
the zero of $f(x) = x^2 - \cos(x)$ for x > 0. A simple sketch shows that this zero is
near 0.8. To find the zero (and hence the solution of the equations) in MATLAB,
create a function M-file called func.m containing
function y=func(x)
y=x.^2 - cos(x);
Using fzero on this function with the starting estimate 0.8 gives the solution
x = 0.8241, y = 0.6792.
To find a local minimum point (a local trough in the graph) or local maximum point
(a local peak in the graph) of a function, one can plot the function in MATLAB and use
zoom and ginput (as before) to estimate the point. Clearly this also applies to a global
minimum point (deepest trough) and a global maximum point (highest peak).
Exercise Write a MATLAB program to plot the graph of y = x sin(6x) for x ∈ [0, 2π]
and use the zoom and ginput commands to find the coordinates of the global minimum
and global maximum points.
There is a special command fminbnd in MATLAB to accurately compute the location
of a local minimum of a function of one variable. The command is used as:
xmin=fminbnd('funcm',x1,x2)
where funcm is either a built-in MATLAB function (e.g. sin) or a user-defined function.
Then xmin has a value that locally minimizes the function in the interval between x1 and
x2. Note that xmin is only a local minimizer of the function in the interval [x1,x2]; it is not
necessarily a global minimizer in the interval. In other words, if the graph of the function
has several troughs in the interval, fminbnd will find the location of one of the troughs,
but not necessarily the deepest trough. To find a particular local minimum (perhaps the
global minimum), it is best to plot the function first and narrow down the interval so
it contains only the one desired local minimum. With this new interval entered in the
fminbnd command, the location of the desired minimum will be found. The minimum
value of the function is then found using
ymin=funcm(xmin)
◮ Example 2.29 The following MATLAB program computes a local minimum point
of a function contained in a specified subinterval.
We can use the program on the function f (x) = x sin(6x), which is defined in a
function M-file called funcm.m as follows:
function y=funcm(x)
y=x.*sin(6*x);
Run the program with the domain equal to [0, 2*pi]. Use the graph to identify
a small subinterval containing the global minimum (and no other local minima).
After entering this subinterval, you should obtain the values
xmin=6.0260
ymin=-6.0237
Run the program with both the domain interval and the subinterval equal to
[0, 2*pi]. Do you get the same results as before? ◭
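A sketch of a program along these lines. The first line defines funcm as an anonymous function standing in for the M-file funcm.m, and the subinterval [5.8, 6.2] is an assumed choice read off the graph; the program in the example may obtain these differently.

```matlab
% Local minimum of x*sin(6x) in a subinterval (illustrative sketch)
funcm=@(x) x.*sin(6*x);    % stand-in for the function M-file funcm.m
x=0:0.01:2*pi;
plot(x,funcm(x))           % graph over the whole domain
x1=5.8;                    % assumed subinterval [x1,x2] around the deepest trough
x2=6.2;
xmin=fminbnd(funcm,x1,x2); % location of the minimum in [x1,x2]
ymin=funcm(xmin);          % minimum value
disp([xmin ymin])
```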
In MATLAB 6 and on, it is possible to enter the expression for a simple function
directly in the fminbnd command, for example
xmin=fminbnd('x^4-2*x',-2,2) (using the variable name x) or
xmin=fminbnd(inline('t^4-2*t'),-2,2) (using a variable name other than x).
This may be suitable for a simple expression, but in general one should use a user-defined
function as above.
The fminbnd command can also be used to compute a local maximum point of a
function f (x). To do this, let g(x) = −f (x) and compute a local minimizer xm of g(x)
using the fminbnd command with a function M-file defining g(x). Then xm also defines a
local maximum of f (x). To convince yourself of this, draw a simple graph of a function
f (x) having a local maximum at x =xm. Then draw the graph of g(x) = −f (x) by simply
reflecting the graph of f (x) about the x-axis. Notice that g(x) has a local minimum at
x =xm.
Exercise Use the program in Example 2.29 with a new function M-file to find the global
maximum point of $f(x) = x\sin(6x)$ for $0 \le x \le 2\pi$.
Chapter 3
Computer Arithmetic
$$x = (\pm)\, D_n \ldots D_1 D_0 . d_1 d_2 d_3 \ldots,$$
(Recall that $10^0 = 1$, $10^{-1} = 1/10$, $10^{-2} = 1/10^2 = 1/100$, and so on.)
Included in the set of real numbers are all whole or natural numbers 1, 2, 3, . . . , all
integers, 0, ±1, ±2, ±3, . . . , and all fractions or rational numbers of the form p/q where p
and q are integers with q ≠ 0. In the decimal expansion of any whole number, the digits
d1 , d2 , d3 to the right of the decimal point are all equal to 0, and are usually not included.
The decimal expansions of some fractions terminate in the same manner. All others are
repeating, and the repeating parts are often indicated by lines or dots.
◮ Example 3.1 The decimal expansions of the rational numbers 198/125 and 125/198
are
$$198/125 = 1.584 \quad\text{and}\quad 125/198 = 0.63131313131\ldots = 0.6\overline{31}$$
The set of real numbers also includes all irrational numbers. Their decimal expansions
neither terminate nor repeat.
◮ Example 3.2 The decimal expansions of π and $\sqrt{2}$ begin
$$\pi = 3.14159265\ldots \quad\text{and}\quad \sqrt{2} = 1.41421356\ldots.$$
Section 3.1. Numbers and their representations 45
Binary numbers The fact that 10 is used as the base for the everyday number system
is most likely a product of human anatomy. If people had 6 fingers on each hand then
numbers would probably most commonly be expressed in the duodecimal system (which
uses 12 as the base). The binary system uses the number 2 as the base and only two
digits, 0 and 1 are needed in binary expansions of real numbers. It is very suitable for
mechanical or electronic devices with components for which there are just two natural
states, for example switches that are either on (1) or off (0), or capacitors that are either
charged (1) or not charged (0).
The binary expansion of a real number x has the form
$$x = (\pm)\, B_m \ldots B_1 B_0 . b_1 b_2 b_3 \ldots,$$
(Recall that $2^0 = 1$, $2^{-1} = 1/2$, $2^{-2} = 1/2^2 = 1/4$, $2^{-3} = 1/2^3 = 1/8$, and so on.)
The disadvantage with this system is that longer strings of digits are needed to represent
integers or fractions to the same accuracy. For example, the sixteen digit binary number
1001000011111101 has the five decimal digit expansion 37117, and the binary expansion
of π to twenty places, namely, 11.00100100001111110110, approximates π no better than
its decimal expansion to six places. In general, about 10 binary digits are required to
match every three decimal digits, and the reason for this is that
$$2^{10} = 1024 \approx 1000 = 10^3.$$
This approximation is commonly used in engineering and science and is well worth
remembering. In fact when computer scientists speak of kilobytes or megabytes, they are
usually talking of $2^{10} = 1024$ or $2^{20} = 1048576$ bytes rather than a thousand or a million
bytes. (In computer jargon a byte is a small group of bits, nowadays almost always 8.)
Octal and hexadecimal numbers To reduce the tedium and higher likelihood of
error associated with long sequences of bits, it is often convenient to use a system with a
base that is a power of 2. The numbers 8 and 16 are the bases of the popular octal and
hexadecimal systems, respectively.
The 8 octal digits are normally written as 0, 1, 2, 3, 4, 5, 6, and 7. The representation of
numbers in octal is more compact than it is in binary because every octal digit corresponds
to three binary digits according to:
octal 0 1 2 3 4 5 6 7
binary 000 001 010 011 100 101 110 111
46 Chapter 3. Computer Arithmetic
The fact that the octal digits are the same symbols that we use in decimal expansions
has the potential for confusion. To avoid this we write the base as a subscript.
◮ Example 3.3 $(367.35)_8$ is an octal number whose decimal expansion can be
obtained as follows:
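The expansion can be checked numerically. In this sketch each octal digit is multiplied by the power of 8 belonging to its position (2, 1, 0 to the left of the point and −1, −2 to the right) and the products are added; the variable names are assumptions.

```matlab
% Decimal value of the octal number (367.35)_8 (illustrative sketch)
d=[3 6 7 3 5];            % octal digits, left to right
p=[2 1 0 -1 -2];          % corresponding powers of 8
value=sum(d.*8.^p);       % 3*64 + 6*8 + 7 + 3/8 + 5/64
disp(value)
```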
There are 16 hexadecimal digits and for these we need another 6 symbols apart from
the 10 decimal digits. For these it is usual to use A, B, C, D, E and F . The decimal
equivalents of these hexadecimal digits are given by
A = 10 B = 11 C = 12 D = 13 E = 14 F = 15.
0 1 2 3 4 5 6 7 8 9 A B C D E F
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
To make it clear that we are using hexadecimal representation we write the base 16
as a subscript.
Conversions from decimal There is a very simple method for doing such conversions
of decimal numbers to other bases. We illustrate it by considering conversions to binary.
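Suppose that $B_m B_{m-1} \ldots B_1 B_0$ is the binary expansion of a whole number N. This
means that
$$\begin{aligned} N &= B_m \times 2^m + B_{m-1} \times 2^{m-1} + \ldots + B_1 \times 2^1 + B_0 \times 2^0 \\ &= 2\,(B_m \times 2^{m-1} + B_{m-1} \times 2^{m-2} + \ldots + B_1 \times 2^0) + B_0 \end{aligned} \tag{3.1}$$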
◮ Example 3.5 To convert the decimal number 6023 to binary, divide repeatedly by
2 and note the remainders. Since
6023 = 2 × 3011 + 1
3011 = 2 × 1505 + 1
1505 = 2 × 752 + 1
752 = 2 × 376 + 0
376 = 2 × 188 + 0
188 = 2 × 94 + 0
94 = 2 × 47 + 0
47 = 2 × 23 + 1
23 = 2 × 11 + 1
11 = 2 × 5 + 1
5 = 2 × 2 + 1
2 = 2 × 1 + 0
1 = 2 × 0 + 1
we see, reading the remainders from bottom to top, that $(6023)_{10} = (1011110000111)_2$. ◭
The same approach works for other bases. So, for example, to convert numbers to
hexadecimal, we divide repeatedly by 16, noting the remainders along the way, and to
convert to octal we repeatedly divide by 8.
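The repeated division can be sketched as a short MATLAB program. The variable names and the use of a while loop are assumptions; each remainder is placed in front of the digits found so far because the last remainder is the leading digit.

```matlab
% Convert a whole number to another base by repeated division (illustrative sketch)
N=6023;
base=2;                          % try 8 or 16 as well
symbols='0123456789ABCDEF';      % digit symbols for bases up to 16
str='';
q=N;
while q > 0
    r=rem(q,base);               % remainder is the next digit
    str=[symbols(r+1) str];      % prepend it to the digits found so far
    q=(q-r)/base;                % quotient for the next division
end
disp(str)
```

MATLAB's built-in dec2bin and dec2base commands perform the same conversion and can be used to check the result.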
◮ Example 3.6 To convert the decimal number 6023 to octal, divide repeatedly by
8 and note the remainders. Since
6023 = 8 × 752 + 7
752 = 8 × 94 + 0
94 = 8 × 11 + 6
11 = 8 × 1 + 3
1 = 8 × 0 + 1
we see that $(6023)_{10} = (13607)_8$. Notice how the digits in the octal expansion
can be obtained simply by grouping the binary digits in threes, starting from the
right. ◭
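◮ Example 3.7 To convert the octal number 13607 to decimal, note that
$$\begin{aligned} (13607)_8 &= 1 \times 8^4 + 3 \times 8^3 + 6 \times 8^2 + 0 \times 8^1 + 7 \times 8^0 \\ &= 1 \times 4096 + 3 \times 512 + 6 \times 64 + 7 \times 1 \\ &= 4096 + 1536 + 384 + 7 \\ &= 6023. \end{aligned}$$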
Binary, octal and hexadecimal conversions. Conversions between binary and octal
and between binary and hexadecimal are easy because we simply group the binary digits
in threes or fours respectively.
◮ Example 3.8
2. Octal to binary: $(34217)_8 = (11\,100\,010\,001\,111)_2$ (The leading 0 is omitted)
4. Hexadecimal to binary: $(5E3B)_{16} = (101\,1110\,0011\,1011)_2$ (The leading 0 is omitted)
6. Hexadecimal to octal: $(F2C)_{16} = (1111\,0010\,1100)_2 = (111\,100\,101\,100)_2 = (7454)_8$
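hexadecimal digit    A   B   C   D   E   F
decimal value       10  11  12  13  14  15

      A B 2 F 1
  +     5 B C 7     (a 1 is carried into the third and fifth columns from the right)
  -------------
      B 0 E B 8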
Note that with the aid of the table, the addition in each column can be done
mentally and the sum is then represented in hexadecimal, e.g. $(F + C)_{16}$ is the
number twenty seven, which is $16 + 11 = (1B)_{16}$, so we write B and carry the 1. ◭
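Hexadecimal additions can be checked in MATLAB with the built-in commands hex2dec and dec2hex. The sketch below checks the column sum F + C and then a complete addition; the operands AB2F1 and 5BC7 are read from the worked addition above and should be treated as an assumption about its layout.

```matlab
% Check hexadecimal additions by converting to decimal and back (illustrative sketch)
col=dec2hex(hex2dec('F')+hex2dec('C'));   % single column: F + C
disp(col)
a=hex2dec('AB2F1');                       % assumed first operand
b=hex2dec('5BC7');                        % assumed second operand
s=dec2hex(a+b);                           % sum written in hexadecimal
disp(s)
```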
So the only new arithmetic you need to learn is addition in a general base.
Exercises
1. Make the following base conversions :
4. Write out a multiplication table which shows the products of all pairs of hexadecimal
digits.
◮ Example 3.12 The speed of light c in a vacuum is approximately 299,792,500 m/s. The
floating point form of c is
$$c = 0.2997925 \times 10^9,$$
in which 0.2997925 is the mantissa and 9 is the exponent. ◭
We move from one form to another by shifting the decimal point left or right and adjusting
the exponent accordingly. However we shall adopt the convention that the floating point
decimal representation of a non-zero real number x has the form
$$x = (\pm)\, 0.\delta_1 \delta_2 \delta_3 \ldots \times 10^{\varepsilon},$$
where the exponent ε is an integer and where $\delta_1, \delta_2, \delta_3, \ldots$ are decimal digits with $\delta_1 \ne 0$. In
other words, the leading non-zero digit in the expansion is to occur immediately to the
right of the decimal point. This digit $\delta_1$ is often called the most significant digit, $\delta_2$ is
called the second-most significant digit and so on.
The (decimal) mantissa of any non-zero number x lies between 0.1 and 1, and so the
size or magnitude of x is reflected mainly by its exponent. If |x| is very large then the
exponent is large and positive, whereas if |x| is very small (i.e. x is very close to 0), then
the exponent is large and negative.
$$h = 0.6626196\ldots \times 10^{-33}$$
The large but negative exponent indicates that this is an extremely small
number. ◭
The same idea can be applied to other systems besides decimals. For example, the
floating point binary representation of a non-zero real number x has the form
$$x = (\pm)\, 0.\beta_1 \beta_2 \beta_3 \ldots \times 2^{\gamma},$$
where the binary exponent γ is an integer, and where $\beta_1, \beta_2, \beta_3, \ldots$ are binary digits with
$\beta_1 \ne 0$. Similarly, octal floating point representations of non-zero numbers have the form
$$x = (\pm)\, 0.\mu_1 \mu_2 \mu_3 \ldots \times 8^{\nu},$$
where the octal exponent ν is an integer, and where $\mu_1, \mu_2, \mu_3, \ldots$ are octal digits with
$\mu_1 \ne 0$.
$$0.41360342 \times 8^{-(12)_8}.$$
The first 2 bits represent the signs of the number and its exponent respectively
(we assume that 0 ↔ + and 1 ↔ −), the next 6 bits (001 010) represent the octal
exponent (12)8 and the last 24 bits (100 001 011 110 000 011 100 010) represent the
mantissa. ◭
Section 3.3. Computer storage of numbers 53
Overflow and underflow errors: The largest positive number that can be stored in
the form described above is, in octal form,
$$0.77777777 \times 8^{(77)_8}.$$
Roundoff errors: Since our computer uses a mantissa with 8 octal digits, it cannot
store accurately any number with more than 8 significant digits in its floating point octal
form. Such numbers must be approximated by numbers having no more than 8 octal
digits in the mantissa. One common and easy way to obtain such an approximation is to
chop all but the first few significant digits (8 in this case).
An alternative to chopping is so-called symmetric rounding. When symmetric
rounding is used, the last digit kept is adjusted up by one if the next digit is at least half
of the base. So for example, if only 3 octal digits are being kept, 4.233 would be rounded
to 4.23, but 4.236 would be rounded to 4.24. Symmetric rounding is more familiar with
decimals. For example, if we are rounding to 3 significant decimal figures, 5.231 is rounded
to 5.23 but 5.237 is rounded to 5.24.
The errors associated with these approximations are called roundoff errors. The round-
off error when using symmetric rounding is, on average, half the roundoff error when using
chopping.
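Chopping and symmetric rounding to K significant decimal digits can be sketched in MATLAB as follows, using the decimal example above (5.237 with K = 3). The variable names are assumptions; fix discards the fractional part and round rounds to the nearest integer.

```matlab
% Chop and round x to K significant decimal digits (illustrative sketch)
x=5.237;
K=3;
e=floor(log10(abs(x)))+1;      % exponent in the form 0.d1d2... x 10^e
scale=10^(K-e);                % moves K significant digits left of the point
xchop=fix(x*scale)/scale;      % chopping: discard the remaining digits
xround=round(x*scale)/scale;   % symmetric rounding
disp([xchop xround])
```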
There are two ways of looking at the error involved in approximating one number by
another. If the number x∗ is used to approximate another number x, then the absolute
error in the approximation is defined by
$$\varepsilon_{\mathrm{abs}} = |x^* - x|.$$
On the other hand, the relative error compares the magnitude of the error with the
magnitude of the numbers in question, and is defined by
$$\varepsilon_{\mathrm{rel}} = \frac{|x^* - x|}{|x|}.$$
◮ Example 3.15 If we keep just the first three significant digits in the decimal ex-
pansion of c, the speed of light (as in Example 3.12) then the absolute error is
Roundoff error and the size of the mantissa: The results in Example 3.15
illustrate the fundamental property that when we chop or round numbers to K significant
decimal digits, the relative roundoff error (in practically all cases) is approximately $10^{-K}$.
Similar results hold for other bases. For example, in the computer described at the
beginning of this section, where 24 binary digits are set aside for storing the 8 octal
digit mantissa of a non-zero number, the relative roundoff error will be (in most cases)
approximately $8^{-8}$. Since $8^{-8} \approx 10^{-7}$, such a computer stores numbers correct to about
7 significant decimal digits.
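The two error measures can be computed directly from their definitions. This sketch uses the value of c from Example 3.12 chopped to 3 significant digits; the variable names are assumptions.

```matlab
% Absolute and relative error when c is chopped to 3 significant digits (illustrative sketch)
c=299792500;               % value of c used in Example 3.12
cstar=299000000;           % c chopped to 3 significant digits
eabs=abs(cstar-c);         % absolute error
erel=eabs/abs(c);          % relative error
disp([eabs erel])
```

The absolute error is large because c is large, while the relative error is of the order $10^{-3}$, in line with the property for K = 3.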
IEEE floating point standard: Since 1985, most computers have used the ANSI/IEEE
(American National Standards Institute/Institute of Electrical and Electronics Engineers)
Standard 754-1985 for floating point arithmetic. This standard was superseded in 2008
by IEEE 754-2008 (itself revised in 2019 as IEEE 754-2019), which includes nearly all of
the original IEEE 754-1985 standard as well as the IEEE Standard for Radix-Independent
Floating-Point Arithmetic (IEEE 854-1987). For the usual double precision representation,
the standard has a 64-bit word and uses base 2. The Standard involves an extension
of the ideas just discussed, and although the details will not be discussed further here,
you can find more information about this in appropriate books and on the web.
Section 3.4. Subtractive cancellation 55
Notice that the mantissa of the product 0.1986482 is the product of the mantissas (0.2997925
and 0.6626196), and the exponents 9 and −33 are added to give the exponent of the
product. (Sometimes it is necessary to adjust the exponent down by one.)
If we chop both c and h to 3 significant figures and multiply, we get
$$f'(x) \approx \frac{f(x + h) - f(x)}{h}.$$
This difference quotient must give a better and better approximation of f ′ (x) as
h → 0. However if the values of f (x) are known only to a certain number of
significant digits, then there is a definite limit to the accuracy to which we can
approximate f ′ (x).
Consider the case $f(x) = \sqrt{x}$, where we want to estimate $f'(2)$, the derivative
at x = 2, using only 5 decimal figure accuracy. The following table shows the value
of the difference quotient for various choices of h.
h          sqrt(2+h)   sqrt(2)   sqrt(2+h) − sqrt(2)   (sqrt(2+h) − sqrt(2))/h
1          1.7321      1.4142    0.3179                0.3179
10^(−1)    1.4491      1.4142    0.0349                0.349
10^(−2)    1.4177      1.4142    0.0035                0.35
10^(−3)    1.4146      1.4142    0.0004                0.4
10^(−4)    1.4142      1.4142    0                     0
None of these is a particularly good estimate of the exact answer, which we can show
using calculus is $1/(2\sqrt{2})$, which is 0.35355 to 5 significant figures.
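The table can be reproduced with a short MATLAB sketch. It assumes that "5 decimal figure accuracy" is modelled by rounding each stored square root to 4 decimal places; the helper `rnd` is a hypothetical name introduced here, not a built-in function.

```matlab
% Difference quotient for f(x) = sqrt(x) at x = 2, with each square
% root rounded to 4 decimal places before the subtraction.
rnd = @(v) round(v*1e4)/1e4;      % hypothetical rounding helper
h = [1 1e-1 1e-2 1e-3 1e-4];
q = zeros(size(h));
for k = 1:numel(h)
    q(k) = (rnd(sqrt(2 + h(k))) - rnd(sqrt(2))) / h(k);
end
exact = 1/(2*sqrt(2));            % exact derivative at x = 2
disp([h' q' abs(q' - exact)]);    % columns: h, quotient, absolute error
```

The error first shrinks and then grows again as h decreases, because the subtraction cancels almost all of the retained digits.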
Exercises
5. Find the absolute and relative errors in the following calculations, given that the
numbers 7.351 and 19.163 are to be chopped to 3 significant digits:
6. The sum
$$\sum_{n=1}^{20} \frac{1}{n^2} = \frac{1}{1} + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots + \frac{1}{361} + \frac{1}{400}$$
can be calculated by adding the terms in decreasing order of magnitude, as indicated
here, or in increasing order of magnitude, as in
$$\sum_{n=1}^{20} \frac{1}{n^2} = \frac{1}{400} + \frac{1}{361} + \cdots + \frac{1}{9} + \frac{1}{4} + \frac{1}{1}.$$
(a) Carry out these summations, using only 2 significant figures at each stage of
the calculations.
(b) Which answer is more likely to be correct? Justify your claim.
Execute the program with b = 5, 10, 100 and 1000. Why is the second computed
root more accurate than the first, especially for large b?
8. Consider the problem of computing the sum of the finite geometric series
$$S = 1 + r + r^2 + \cdots + r^n,$$
where the common ratio satisfies 0 < r < 1 and n is large. Is it better to compute the
sum in ascending or descending order to obtain the most accurate value? (Remember
there will be accumulated roundoff errors in the computed values.) Since we know
a formula for the exact value of the sum, we can just compute the sum in both ways
and compare the errors. This is implemented in the following MATLAB program,
with a range of r from 1/50 to 49/50 in increments of 1/50.
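% The setup lines and outer loop header below are reconstructed from the exercise text.
n=100;                       % change to 500 for the second run
rr=(1:49)/50;                % common ratios 1/50, 2/50, ..., 49/50
errd=zeros(size(rr)); erra=zeros(size(rr));
for j=1:length(rr)
    r=rr(j); sumd=0; suma=0;
    for i=0:n
        sumd=sumd+r^i;       % descending order: largest term first
        suma=suma+r^(n-i);   % ascending order: smallest term first
    end
    sumex=(1-r^(n+1))/(1-r); % formula for exact sum
    errd(j)=sumd-sumex;
    erra(j)=suma-sumex;
end
plot(rr,errd,rr,erra,'--')
xlabel('Common ratio r')
ylabel('Errors in finite sum of geometric series')
title('Sum order: descending (solid), ascending (--)')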
Execute the above program with n = 100. What do you observe about the errors
in the two computed sums? Execute the program again with n = 500.
Note that care must be taken in the use of the exact formula for sumex. If r is very
close to 1, subtractive cancellation in both the numerator and denominator will lead
to a large roundoff error in the computed value.
2. (a) $(110000011)_2$
(b) $(11210104)_5$
(c) $(\mathrm{BDA8890})_{16}$
(d) $(\mathrm{A497B})_{16}$
7. Note that for a = c = 1 and large b, subtractive cancellation occurs in the term
-b+sqrt(b^2-4*a*c) because $\sqrt{b^2 - 4ac}$ is close to b. This makes the computed
root r less accurate than s.
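The effect can be made visible with a minimal MATLAB sketch. It deliberately uses b = 1e8, which is larger than the values in the exercise, and it adds an alternative formula for the small root (using the fact that the product of the roots is c/a); that alternative and the name `r2` are assumptions for illustration and are not part of the exercise program.

```matlab
% Roots of a*x^2 + b*x + c = 0 for a = c = 1 and a large b.
a = 1; b = 1e8; c = 1;
d = sqrt(b^2 - 4*a*c);
r = (-b + d)/(2*a);      % subtractive cancellation: -b and d nearly cancel
s = (-b - d)/(2*a);      % no cancellation: both terms have the same sign
r2 = c/(a*s);            % small root recovered from the product r*s = c/a
fprintf('r = %.6e, r2 = %.6e, s = %.6e\n', r, r2, s);
```

The small root is close to -1e-8, so comparing `r` and `r2` against that value shows how many digits the cancellation destroys.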
Chapter 4
Counting & Discrete Probability
4.1 Basic Set Theory
(There is disagreement among mathematicians about whether or not the number zero
should belong to N, the so-called natural numbers. Consequently, some textbooks will
have 0 ∈ N while others will have 0 ∉ N. Watch out for this when you are looking at
other textbooks.)
A ⊆ B.
• Two sets are equal if they have exactly the same elements as each other, or equivalently
if each set is contained in the other. That is, A = B if and only if A ⊆ B and B ⊆ A.
For example, the three sets A = {x | $3x^2 + 4x + 1 = 0$}, B = {−1, −1/3}, and
C = {−1, −1/3, −1} are equal.
i) A ⊆ A;
There is only one set with no elements, which is called the empty set and is denoted
by ∅. That is, if S and T are both empty sets then S = T = ∅, since S and T have
exactly the same elements (none). The empty set is regarded as a proper subset of every
non-empty set. More formally, for any set A, we have ∅ ⊆ A ⊆ U. This statement
is best proved to be true by showing that it cannot be false. We observe that ∅ ⊆ A is
false only if there is an element in ∅ which is not in the set A. Since ∅ has no elements
at all, this is absurd. We must conclude that ∅ ⊆ A for every set A.
◮ Example 4.1 For any set A, the power set P(A) is defined to be the set of all
subsets of A. Some examples are below.
Exercises
1. Which of the following statements are true, and which are false, given that
A = {p, q, r}?
(a) p ∈ A
(b) p ⊂ A
(c) {p} ∈ A
(d) {p} ⊂ A
(e) p ∈ {A}
(f ) {p} ⊂ {A}
2. Which of the following sets are equal: ∅, {0}, {∅}, and {∅, ∅}?
A ∪ B = {x | x ∈ A or x ∈ B}.
Example: R = Q ∪ I.
Intersection: The intersection of two sets A and B, denoted by A ∩ B, is the set of all
elements that belong to both A and B, i.e.,
A ∩ B = {x | x ∈ A and x ∈ B}.
Disjoint Union: Two sets A and B are said to be disjoint (or non-intersecting) if A
and B have no elements in common, i.e., A and B are disjoint if A ∩ B = ∅, in
which case A ∪ B is called the disjoint union of A and B.
Example: The set of all real numbers R is the disjoint union of the rational numbers
(fractions) Q and the irrational numbers I.
Difference: The difference of two sets A and B, denoted by A \ B (or sometimes A − B),
is the set of elements that belong to A but do not belong to B, i.e.,
A \ B = {x | x ∈ A and x ∉ B}.
A ⊕ B = (A ∪ B) \ (A ∩ B) or A ⊕ B = (A \ B) ∪ (B \ A).
Complement: The complement of a set A with respect to a given universal set U,
denoted by Ā (or $A^C$), is the set of elements that belong to U but which do not
belong to A. That is,
Ā = {x | x ∈ U, x ∉ A} = U \ A.
64 Chapter 4. Counting & Discrete Probability
Direct Product: The direct product (or Cartesian product) of two sets A and B, denoted
by A × B, is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B, i.e.,
A × B = {(a, b) | a ∈ A, b ∈ B}.
For example, {x, y} × {1, 2, 3} = {(x, 1), (x, 2), (x, 3), (y, 1), (y, 2), (y, 3)}. Note that
the order of the elements in a pair matters; for instance, (x, 1) is not the same as
(1, x), and (1, x) ∉ {x, y} × {1, 2, 3}.
◮ Example 4.2 If U = {1, 2, . . . , 10}, A = {1, 2, 4, 8}, and B = {1, 3, 5, 7, 9}, then
(a) A ∪ B = {1, 2, 3, 4, 5, 7, 8, 9}
(b) A ∩ B = {1}
(c) A ⊕ B = {2, 3, 4, 5, 7, 8, 9}
(d) A \ B = {2, 4, 8}
(e) B \ A = {3, 5, 7, 9}
(f ) Ā = {3, 5, 6, 7, 9, 10} ◭
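The results of Example 4.2 can be checked with MATLAB's built-in set functions, treating each set as a row vector. This is only a sketch; the variable names are illustrative.

```matlab
U = 1:10;  A = [1 2 4 8];  B = [1 3 5 7 9];
AuB  = union(A, B);        % A union B
AnB  = intersect(A, B);    % A intersect B
AxB  = setxor(A, B);       % symmetric difference of A and B
AmB  = setdiff(A, B);      % A \ B
BmA  = setdiff(B, A);      % B \ A
Abar = setdiff(U, A);      % complement of A relative to U
disp(AuB); disp(AnB); disp(AxB); disp(AmB); disp(BmA); disp(Abar);
```

Each function returns its result sorted and without duplicates, which matches the convention that a set has no repeated elements.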
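• If B ⊆ A, then the circle representing the set B will lie entirely within the circle
representing the set A.
[Figure: Venn diagram in U with the circle B drawn inside the circle A]
• If A and B are disjoint sets, then the circles representing these sets do not overlap.
[Figure: Venn diagram in U with non-overlapping circles B and A]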
• For any two sets A and B, it is possible that some elements are in A but not in B,
some elements are in B but not in A, some elements are in both A and B, and some
elements are in neither A nor B. This more general situation is depicted below.
[Figure: Venn diagram in U with overlapping circles A and B]
Idempotent Laws
• A ∪ A = A
• A ∩ A = A
Associative Laws
• (A ∪ B) ∪ C = A ∪ (B ∪ C)
• (A ∩ B) ∩ C = A ∩ (B ∩ C)
Commutative Laws
• A ∪ B = B ∪ A
• A ∩ B = B ∩ A
Distributive Laws
• A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
• A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Identity Laws
• A ∪ ∅ = A
• A ∪ U = U
• A ∩ U = A
• A ∩ ∅ = ∅
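Involution Law
• $\overline{\overline{A}} = A$
Complement Laws
• $A \cup \overline{A} = U$
• $A \cap \overline{A} = \emptyset$
• $\overline{U} = \emptyset$
• $\overline{\emptyset} = U$
DeMorgan’s Laws
• $\overline{A \cup B} = \overline{A} \cap \overline{B}$
• $\overline{A \cap B} = \overline{A} \cup \overline{B}$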
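A numerical check is not a proof, but it can help build confidence in laws such as DeMorgan's. The sketch below tests both laws on the sets U = {1, ..., 10}, A = {1, 2, 4, 8} and B = {1, 3, 5, 7, 9}; the helper `comp` is a hypothetical name for the complement relative to U.

```matlab
U = 1:10;  A = [1 2 4 8];  B = [1 3 5 7 9];
comp = @(S) setdiff(U, S);                  % complement relative to U
lhs1 = comp(union(A, B));                   % complement of the union
rhs1 = intersect(comp(A), comp(B));         % intersection of the complements
lhs2 = comp(intersect(A, B));               % complement of the intersection
rhs2 = union(comp(A), comp(B));             % union of the complements
ok = isequal(lhs1, rhs1) && isequal(lhs2, rhs2);
fprintf('DeMorgan laws hold for these sets: %d\n', ok);
```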
Exercises
4. Let U = {a, b, c, d, . . . , x, y, z} (the set of letters of the English alphabet) and let us
define the following four sets:
A = {a, j, k}
E = {b, c, d, e, g, p, t, v}
F = {a, b, c, d, e, f, g, h, i, j, k, l, m}
V = {a, e, i, o, u}
(a) A ∪ E
(b) V ∩ F
(c) F̄
(d) E \ V
(e) A ⊕ V
(f ) A ∪ (E ∩ F )
5. Prove that for any sets A and B, we have A∩B ⊆ A ⊆ A∪B and A∩B ⊆ B ⊆ A∪B.
6. Given any two sets A and B with A ∩ B = ∅, what are the sets A \ B and B \ A?
(A ∪ B) \ (A ∩ B) = (A \ B) ∪ (B \ A).
This shows that the LHS or the RHS can be used to define A ⊕ B.
9. Given any two sets A and B such that A ∩ B̄ = ∅, what are the sets A ∩ B and
A ∪ B?
• The set of denominations of Australian banknotes, namely {5, 10, 20, 50, 100}, is
clearly finite.
• The set of prime numbers {2, 3, 5, 7, 11, 13, . . .} is infinite (proved by Euclid).
• The unit interval on the real line, namely [0, 1] = {x ∈ R | 0 ≤ x ≤ 1}, is an infinite
set.
• It is well-known that the unit interval [0,1] on the real line is uncountable.
Proposition 4.2 Suppose A and B are disjoint finite sets. Then A ∪ B is finite and
|A ∪ B| = |A| + |B|.
That is to say, if S is the disjoint union of finite sets A and B, then |S| = |A| + |B|.
For any finite sets A and B, the set A is the disjoint union of A \ B and A ∩ B; hence
we deduce the following result from Proposition 4.2.
Corollary 4.3 Suppose A and B are finite sets. Then |A \ B| = |A| − |A ∩ B|.
Exercises
12. Prove Proposition 4.2.
13. Verify Proposition 4.2 and Corollaries 4.3–4.4 for the sets U = {1, 2, 3, 4, 5, 6},
A = {2, 4, 6}, and B = {1, 3, 5}.
Theorem 4.5 (Inclusion-Exclusion Principle) Suppose A and B are finite sets. Then
the sets A ∪ B and A ∩ B are finite, and we have
|A ∪ B| = |A| + |B| − |A ∩ B|.
The name comes from the idea that the principle is based on over-generous inclusion,
followed by compensating with exclusion. That is, to find the number of elements in A or
B (or both), i.e., the number of elements in the union of A and B, we count the number
of elements in A and the number of elements in B (inclusion) and then we subtract
the number of elements that are in both A and B (exclusion) since these elements were
counted twice.
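The counting argument just described, namely |A ∪ B| = |A| + |B| − |A ∩ B|, can be checked on a small case. The sketch below uses two illustrative sets that share exactly one element.

```matlab
A = [1 2 4 8];  B = [1 3 5 7 9];           % the element 1 lies in both sets
nUnion = numel(union(A, B));               % counts each element once
nFormula = numel(A) + numel(B) - numel(intersect(A, B));   % include, then exclude
fprintf('|A u B| = %d, formula gives %d\n', nUnion, nFormula);
```

Without the subtraction the shared element would be counted twice, giving 9 instead of 8.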
Using mathematical induction (see Chapter 9, later), one can generalise the preceding
theorem to any number of finite sets. For instance, in the case of three finite sets A, B,
and C, we have
|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|. (4.1)
Exercises
14. Convince yourself that formula (4.1) is true.
15. For a group of students undergoing a Mathematics degree, the following facts were
observed.
Out of a total of 50 students, 30 were taking a course in Statistics, 18 were taking
a course in Computer Science, and 26 were taking a course in Engineering. More-
over, 9 students were taking courses in both Statistics and Computer Science, 16
students were taking courses in both Statistics and Engineering, 8 students were
taking courses in both Computer Science and Engineering, and 47 students were
taking courses in at least one of the three areas: Statistics, Computer Science, and
Engineering.
Find the number of students who were:
(a) not taking courses in any of the three areas (Statistics, Computer Science, and
Engineering);
(b) taking courses in all three of the areas.
More generally, let E1 , E2 , . . . , Ek be independent events and suppose that each event
Ei can occur in ni ways. Then all of the events together can occur in n1 n2 · · · nk different
ways.
Sum Rule: If A and B are events such that A can occur in m different ways and B can
occur in n different ways, then there are m + n different ways in which either A or B (but
not both) can occur.
Exercises
16. How many ordered pairs of integers (x, y) are there such that 0 < |xy| ≤ 5?
17. A cycling club consists of 17 male members and 12 female members. The club
needs to decide on which of its members will compete in a forthcoming cycling
championship. Assuming all members are available and capable of competing in
any event at the championship, determine the number of ways that the club can
choose:
n! = 1 · 2 · · · (n − 1) · n = n · (n − 1) · · · 2 · 1.
For example, 3! = 3 · 2 · 1 = 6, 4! = 4 · 3 · 2 · 1 = 24, and 5! = 5 · 4! = 5(24) = 120.
Note: By convention 0! = 1. This is consistent with the combinatorial interpretation
of there being exactly one way to arrange 0 objects, i.e., only one permutation of zero
elements (namely, the empty set ∅) – see Corollary 4.9 later.
Binomial Coefficients: For any positive integers n, r with r ≤ n, the symbol $\binom{n}{r}$, read
as “n choose r”, is defined by
$$\binom{n}{r} = \frac{n!}{r!(n-r)!} = \frac{n(n-1)\cdots(n-r+1)}{r(r-1)\cdots 2 \cdot 1}.$$
Section 4.2. Basic Counting Techniques 71
By convention,
$$\binom{n}{0} = \frac{n!}{0!\,n!} = 1 \quad\text{and}\quad \binom{0}{0} = \frac{0!}{0!\,0!} = 1.$$
Fact: $\binom{n}{r}$ has exactly r factors in both the numerator and denominator.
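Examples:
$$\binom{7}{3} = \frac{7\cdot6\cdot5}{3\cdot2\cdot1} = 35, \qquad \binom{10}{2} = \frac{10\cdot9}{2\cdot1} = 45, \qquad \binom{8}{5} = \frac{8\cdot7\cdot6\cdot5\cdot4}{5\cdot4\cdot3\cdot2\cdot1} = 56$$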
Note: Proposition 4.6 is easily verified using the definition of a binomial coefficient. The
identity is also evident from the left/right symmetry of Pascal’s Triangle (discussed in
the next subsection).
To compute $\binom{11}{9}$ in the same way as in the preceding examples, there will be 9 factors
in both the numerator and denominator. But 11 − 9 = 2, so by Proposition 4.6 we have
$$\binom{11}{9} = \binom{11}{2} = \frac{11\cdot10}{2\cdot1} = 55.$$
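These values can be confirmed in MATLAB with the built-in functions `nchoosek` and `factorial`. The variable names in this sketch are illustrative.

```matlab
c1 = nchoosek(7, 3);      % 35
c2 = nchoosek(10, 2);     % 45
c3 = nchoosek(8, 5);      % 56
c4 = nchoosek(11, 9);     % 55
c5 = nchoosek(11, 2);     % 55, equal to c4 by the symmetry n choose r = n choose (n-r)
byFactorials = factorial(11)/(factorial(9)*factorial(2));   % definition via n!/(r!(n-r)!)
fprintf('%d %d %d %d %d %d\n', c1, c2, c3, c4, c5, byFactorials);
```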
This triangle has many interesting properties (Google it!). Of particular interest is
the fact that the numbers in Pascal’s Triangle satisfy the following two properties.
• Every other number in the triangle can be obtained by adding the two numbers
directly above it.
          1
        1   1
      1   2   1
    1   3   3   1
  1   4   6   4   1
1   5  10  10   5   1
          ⋮
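The addition rule gives a simple way to generate the triangle. The sketch below assumes, as in the rows displayed above, that every row starts and ends with 1; it stores row i in the first i entries of row i of a matrix `P` (an illustrative name).

```matlab
nrows = 6;
P = zeros(nrows);            % unused entries to the right of each row stay 0
P(:, 1) = 1;                 % left edge of the triangle
for i = 2:nrows
    for j = 2:i
        P(i, j) = P(i-1, j-1) + P(i-1, j);   % sum of the two entries above
    end
end
disp(P);
```

The right-hand edge comes out as 1 automatically, because the entry above and to the right of it is a stored zero.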
Since the numbers in Pascal’s Triangle are binomial coefficients, we can state the above
properties more succinctly in the form of an identity, as follows.
This is a remarkably handy theorem that crops up all over the place.
Exercises
18. Expand and simplify $(2x^4 - y)^3$.
Roughly speaking:
Most counting problems we shall be dealing with can be classified into one of four
categories. We will explain each of these categories by means of an example.
Consider the set {a, b, c, d}. Suppose we wish to choose two letters from the given set
of four letters. Depending on our interpretation, we may obtain the following answers.
Permutations with repetitions: The order of listing the letters is important, and rep-
etition is allowed. In this case, there are 4 · 4 = 16 possible choices.
aa ab ac ad
ba bb bc bd
ca cb cc cd
da db dc dd
Permutations without repetitions: The order of listing the letters is important, and
repetition is not allowed. In this case there are 4 · 3 = 12 possible choices.
ab ac ad
ba bc bd
ca cb cd
da db dc
Combinations without repetitions: The order of listing the letters is not important,
and repetition is not allowed. In this case there are $\frac{4 \cdot 3}{2} = 6$ possible choices.
ab ac ad
bc bd
cd
Combinations with repetitions: The order of listing the letters is not important, and
repetition is allowed. In this case there are $\frac{4 \cdot 3}{2} + 4 = 10$ possible choices.
aa ab ac ad
bb bc bd
cc cd
dd
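The four counts for choosing r = 2 letters from n = 4 can be computed from standard formulas, as in the following sketch. The formulas are the usual ones for each category; the variable names are illustrative.

```matlab
n = 4; r = 2;                                 % choose r letters from n letters
permRep   = n^r;                              % order matters, repetition allowed
permNoRep = factorial(n)/factorial(n - r);    % order matters, no repetition
combNoRep = nchoosek(n, r);                   % order ignored, no repetition
combRep   = nchoosek(n + r - 1, r);           % order ignored, repetition allowed
fprintf('%d %d %d %d\n', permRep, permNoRep, combNoRep, combRep);
```

The four results agree with the lists of 16, 12, 6 and 10 choices written out above.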
For example:
Theorem 4.8 For any positive integers n, r with r ≤ n, the number of r-permutations
of n distinct objects, denoted by P (n, r), is given by
$$P(n, r) = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}.$$
n! = n · (n − 1) · · · 2 · 1.
Exercises
20. Find the number of 3-letter words using only the six letters a, b, c, d, e, f without
repetition.
22. Find the number of ways a judge can award first, second, and third places in a
contest with 18 contestants.
23. Find the number of ways that 9 people can arrange themselves:
Often we would like to know the number of permutations of a multiset, i.e., of a set of
objects, some of which are alike.
Theorem 4.10 Suppose a set of n objects can be sorted into k types: $n_1$ objects of type
1, $n_2$ objects of type 2, . . . , $n_k$ objects of type k. Then the number of permutations of
these n (= $n_1 + n_2 + \cdots + n_k$) objects is
$$\frac{n!}{n_1!\,n_2!\cdots n_k!}.$$
For example, let us find the number of 9-letter words that can be formed using the
letters of the word unusually. We seek the number of permutations of 9 letters of which
3 are u’s and 2 are l’s and no other letters are repeated. This number is
$$\frac{9!}{3!\,2!\,1!\,1!\,1!\,1!} = 30240,$$
by Theorem 4.10.
Note: The expression appearing in Theorem 4.10 is often denoted by $\binom{n}{n_1, n_2, \ldots, n_k}$, called
a multinomial coefficient.
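A multinomial coefficient is easy to evaluate numerically. The sketch below recomputes the count for the word unusually; the vector `counts` (an illustrative name) lists how often each distinct letter occurs.

```matlab
counts = [3 2 1 1 1 1];      % occurrences of u, l, n, s, a, y in "unusually"
n = sum(counts);             % 9 letters in total
nperm = factorial(n)/prod(factorial(counts));   % 9!/(3! 2! 1! 1! 1! 1!)
fprintf('%d distinct arrangements\n', nperm);
```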
$$\underbrace{n \cdot n \cdots n}_{r \text{ times}} = n^r.$$
Exercises
25. Find the number of different “words” that can be formed from all the letters in each
of the following words:
(a) maths
(b) running
(c) committee
26. In a small box of Christmas lights, there are 4 identical red lights, 2 identical green
lights, and 3 identical yellow lights. The lights are to be placed in a row along a
window sill. Find the number of different arrangements of the 9 lights.
27. A box contains 12 Cadbury™ dairy milk chocolate bars. Find the number of ordered
samples of size 3 if:
(a) you eat each chocolate bar before selecting the next one;
(b) you are indecisive and put each chocolate bar back in the box before selecting
the next one.
Theorem 4.11 For any positive integers n, r with r ≤ n, the number of r-combinations
of n distinct objects is given by
$$C(n, r) = \frac{P(n, r)}{r!} = \frac{n!}{r!(n-r)!}.$$
Exercises
28. A jar contains 8 red jellybeans and 6 purple jellybeans (yum!). Find the number of
ways 2 jellybeans can be taken from the jar if:
29. In a Discrete Mathematics class, consisting of 12 students, the lecturer has prepared
3 different final examinations. Find the number of different ways in which the
lecturer can allocate 4 students to each of the three exam papers, assuming each
student sits only one exam.
Let us consider a particular problem that I faced the other day. A certain supermarket
sells k = 4 different flavours of yoghurt: blueberry (b), plain (p), strawberry (s), and
vanilla (v). In how many ways can I buy r = 8 tubs of yoghurt?
Since order does not matter, this is an example of counting combinations with repe-
titions. Each combination can be listed with the b’s first, followed by p’s, then s’s, and
lastly v’s. For instance, we have combinations of the form:
Now we can easily count these “codewords” since each codeword contains r + k − 1 = 11
digits where r = 8 are 0’s and k−1 = 3 are 1’s. Hence the number of possible combinations
is
$$C(11, 8) = C(11, 3) = \frac{11 \cdot 10 \cdot 9}{3 \cdot 2 \cdot 1} = 165.$$
Too many different possibilities for me to worry about, so I just bought one tub of each
flavour of yoghurt!
Using similar reasoning as above, we can prove the following more general result.
Theorem 4.12 Suppose there are k different kinds of objects. Then the number of
combinations of r such objects is
C(r + k − 1, r) = C(r + k − 1, k − 1).
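For small cases the formula can be compared with a direct enumeration. The sketch below counts the ways to buy r = 8 tubs from k = 4 flavours by looping over the numbers of blueberry, plain and strawberry tubs; the number of vanilla tubs is then forced. The loop variables are illustrative.

```matlab
k = 4; r = 8;                          % 4 flavours, 8 tubs
formula = nchoosek(r + k - 1, r);      % value given by Theorem 4.12
count = 0;
for b = 0:r
    for p = 0:(r - b)
        for s = 0:(r - b - p)
            count = count + 1;         % v = r - b - p - s is determined
        end
    end
end
fprintf('formula: %d, enumeration: %d\n', formula, count);
```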
Exercises
30. (a) In how many ways can 100 be written as a sum of four non-negative integers?
That is, find the number of non-negative integer solutions to the equation
x + y + w + z = 100.
(b) In how many ways can 100 be written as a sum of four positive integers?
31. (a) Prove the following theorem of De Moivre: For any positive integer n, the
number of positive integer solutions to the equation x1 + · · · + xr = n is
C(n − 1, r − 1).
(b) Using De Moivre’s theorem, prove that the number of non-negative integer
solutions to the equation x1 + · · · + xr = n is C(n + r − 1, r − 1).
• In a selection of 677 people from a telephone directory, there will be at least two
people whose first names and last names begin with the same letters.
More generally, we observe that if n pigeonholes are occupied by kn+1 or more pigeons
for some positive integer k, then at least one pigeonhole is occupied by k + 1 or more
pigeons.
Theorem 4.14 (Generalised Pigeonhole Principle) If m pigeons are placed into k
pigeonholes, then at least one pigeonhole will contain more than $\lfloor (m-1)/k \rfloor$ pigeons
(where $\lfloor x \rfloor$ denotes the greatest integer less than or equal to x).
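Since "more than $\lfloor (m-1)/k \rfloor$" means "at least $\lfloor (m-1)/k \rfloor + 1$", the guaranteed occupancy is simple to compute. In the sketch below, `minMax` is a hypothetical helper name; the first call revisits the telephone directory example (677 people, 26 × 26 = 676 pairs of initials).

```matlab
minMax = @(m, k) floor((m - 1)/k) + 1;   % guaranteed size of the fullest pigeonhole
a = minMax(677, 676);    % two people must share the same pair of initials
b = minMax(25, 12);      % 25 pigeons in 12 pigeonholes
fprintf('%d %d\n', a, b);
```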
Section 4.3. Discrete Probability Theory 79
Exercises
32. Prove that any set of m + 1 distinct elements taken from the set {1, 2, . . . , 2m}
contains two consecutive numbers.
33. Suppose five points are chosen from the interior of a square with each side of length 2.
Show that the distance between at least two of the points must be no more than $\sqrt{2}$.
34. Determine the minimum number of people needed to ensure that at least three of
them are born in the same month.
35. An employee’s time-clock shows that she worked 81 hours over a period of 10 days.
Show that on some pair of consecutive days, the employee worked at least 17 hours.
• On a roulette wheel, there are 37 possible slots where the ball might come to rest.
We cannot predict the outcome each time, but we can record the distribution of the
number of times each slot occurs over a long period of time.
There are two ways to get probabilities.
• Note that the numerator ≤ denominator, with the numerator non-negative and the
denominator positive; hence we have 0 ≤ p ≤ 1.
We can represent the sample space S and the events of a random phenomenon using set
theory and Venn diagrams, as follows.
• Disjoint Events: Two events A and B that have no outcomes in common are said
to be disjoint events, i.e., A and B are disjoint events if they cannot both occur at
the same time.
• Union of Events A ∪ B: The union of two events A and B is the event consisting
of all outcomes which are in A or B or both – it is the shaded area in the following
figure.
Along with the terminology and diagrammatical representations that we have just
considered, we have some mathematical notation too.
• Since probabilities can be thought of as long term relative frequencies, all rules for
probabilities must work for proportions as well.
RULE 1: For any event A, the probability of A occurring lies between 0 and 1, i.e.,
0 ≤ P (A) ≤ 1.
RULE 2: The collection S of all possible outcomes has probability 1, i.e., P (S) = 1.
Remarks:
RULE 3: For any event A, the probability that A does not occur is $P(A^c) = 1 - P(A)$.
RULE 4: For any events A and B, we have
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
Conditional Probability: If a card is drawn from a deck and not replaced, the outcome
for the first card could influence the outcome for the second card. Thus, the probability
of getting an Ace on the second card depends on the result for the first card. So, knowing
the result for event A may change the probability of another event B occurring.
Notation: P (B occurs given that A has occurred) ≡ P (B | A)
RULE 5: Provided P (A) > 0, the conditional probability of B given A is
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$
Notes:
• Since we know A has occurred, this is the same as restricting the sample space to
just those outcomes which make up event A.
P (A ∩ B) = P (A)P (B | A) = P (B)P (A | B)
◮ Example 4.3 Suppose two cards are drawn from a deck without replacement. Let
A be the event that the first card is an Ace and let B be the event that the second
card is an Ace. The probability that the two cards are both Aces is
P(both cards are Aces) = P (A ∩ B) = P (A)P (B | A)
where
P (A) = 4/52 and P (B | A) = 3/51
and hence
P (A ∩ B) = (4/52) × (3/51) ≈ 0.0045. ◭
Independent Events: Two events A and B are independent if knowledge about whether
A occurs does not alter the probability that B also occurs.
Hence, for independent events, we have
P (B | A) = P (B)
P (A | B) = P (A)
P (A ∩ B) = P (A)P (B)
◮ Example 4.4 Now suppose two cards are drawn from a deck with replacement.
Let A be the event that the first card is an Ace and let B be the event that the
second card is an Ace. Clearly the two events are independent and we have
P (A) = 4/52 and P (B | A) = P (B) = 4/52.
So the probability of both cards being Aces is
P (A ∩ B) = P (A)P (B) = (4/52) × (4/52) = $(4/52)^2$ ≈ 0.0059. ◭
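The two card calculations differ only in the second factor, which the following sketch makes explicit. The variable names are illustrative.

```matlab
pWithout = (4/52)*(3/51);   % Example 4.3: P(A)P(B|A), drawing without replacement
pWith    = (4/52)^2;        % Example 4.4: P(A)P(B), drawing with replacement
fprintf('%.4f  %.4f\n', pWithout, pWith);
```

Replacing the first card leaves all four Aces in the deck, so the second probability is slightly larger.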
◮ Example 4.5 A certain brand of lawn seed contains 35% Couch and 65% Rye
Grass. When the seed is sown 90% of the Couch germinates and 83% of the Rye
Grass germinates.
For a randomly chosen seed, let C be the event that the seed is Couch, let R
be the event that it is Rye Grass and let G be the event that it germinates.
P (C) = 0.35
P (R) = 0.65
P (G | C) = 0.90
P (G | R) = 0.83
(c) Find P (C ∩ G) and P (R ∩ G) and hence find P (G), i.e., the probability that the
seed germinates.
Using Rule 5, we have
$$P(G) = P(G \cap R) + P(G \cap R^c)$$
where $R^c = C$, so
$$P(C \mid G) = \frac{P(G \cap C)}{P(G)} = \frac{0.315}{0.8545} \approx 0.3686.$$ ◭
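The arithmetic in Example 4.5 can be laid out step by step as follows. This is a sketch with illustrative variable names; `pG_C` stands for P(G | C) and `pC_G` for P(C | G).

```matlab
pC = 0.35; pR = 0.65;        % seed mix
pG_C = 0.90; pG_R = 0.83;    % germination rates given the seed type
pCG = pC*pG_C;               % P(C and G) = 0.315
pRG = pR*pG_R;               % P(R and G) = 0.5395
pG  = pCG + pRG;             % P(G) = 0.8545
pC_G = pCG/pG;               % P(C | G), approximately 0.3686
fprintf('P(G) = %.4f, P(C|G) = %.4f\n', pG, pC_G);
```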
Exercises
36. The American Red Cross says that 45% of the US population has Type O blood,
40% Type A, 11% Type B, and the rest Type AB.
For a randomly selected blood donor:
(a) What is the probability that this donor has Type AB blood?
(b) What is the probability that this donor has Type A or Type B blood?
(c) What is the probability that this donor does not have Type O blood?
(d) It is also true that 85% of the population has Rh+ blood, and that this is
independent of blood group. What is the probability that this donor has Type
O, Rh+ blood?
• A discrete random variable is a random variable X that can take one of only a
countable (often finite) number of possible different values $x_1, x_2, \ldots$
P (X = xi ) = pi
where pi = P (X = xi ).
◮ Example 4.6 Suppose that in a particular local council area the number of dogs
that can be kept by a householder is limited to a maximum of three. Let D denote
the number of dogs in a household.
Records show the following probability distribution for D.
d 0 1 2 3
P (D = d) 0.5 0.3 0.15 0.05
= 0 × P (D = 0) + 1 × P (D = 1) + 2 × P (D = 2) + 3 × P (D = 3)
= 0 × 0.5 + 1 × 0.3 + 2 × 0.15 + 3 × 0.05
= 0.75
So the average number of dogs per household in the local council area is 0.75. ◭
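The same weighted sum is a one-line computation in MATLAB using elementwise multiplication. The sketch also checks that the probabilities sum to 1; the variable names are illustrative.

```matlab
d = [0 1 2 3];               % possible numbers of dogs
p = [0.5 0.3 0.15 0.05];     % P(D = d) for each value
total = sum(p);              % a probability distribution must sum to 1
mu = sum(d .* p);            % mean: each value weighted by its probability
fprintf('sum of p = %.2f, mean = %.2f\n', total, mu);
```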
◮ Example 4.8 A technician charges a call-out fee of $60 plus an additional $2 for
each minute that he spends on a job. He has found that the time (T ) taken on a
job has a mean (µT ) of 18 minutes and a standard deviation (σT ) of 3.5 minutes.
Let C be the total cost, excluding parts, of the job.
(a) Write down the formula for C in terms of T , and hence the values of a and b.
We have
C = 60 + 2T
and so a = 60 and b = 2 in this case.
(b) Hence find the mean and standard deviation of the cost C.
We have
µC = 60 + 2µT = 60 + 2 × 18 = $96
and
σC = 2σT = 2 × 3.5 = $7. ◭
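For a cost of the form C = a + bT, the mean shifts and scales while the standard deviation only scales by |b|. The sketch below applies this standard rule to the technician's charges; the variable names are illustrative.

```matlab
a = 60; b = 2;               % call-out fee and charge per minute, in dollars
muT = 18; sigmaT = 3.5;      % mean and standard deviation of the time, in minutes
muC = a + b*muT;             % mean cost in dollars
sigmaC = abs(b)*sigmaT;      % standard deviation of the cost in dollars
fprintf('mean cost = $%g, standard deviation = $%g\n', muC, sigmaC);
```

The call-out fee a does not affect the spread, since adding a constant moves every cost by the same amount.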
2. The first three sets are all different. The third and fourth are equal.
3. (a) ∅, {a}, {b}, {c}, {d}, {a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}, {a, b, c},
{a, b, d}, {a, c, d}, {b, c, d}, {a, b, c, d}
(b) $2^4 = 16$
(c) 8
4. (a) {a, j, k, b, c, d, e, g, p, t, v}
(b) {a, e, i}
(c) {n, o, p, q, r, s, t, u, v, w, x, y, z}
(d) {b, c, d, g, p, t, v}
Section 4.4. Solutions to exercises 89
(e) {j, k, e, i, o, u}
(f ) {a, j, k, b, c, d, e, g}
7. A ∩ (B ∪ C) = {x | x ∈ A, x ∈ B ∪ C}
= {x | x ∈ A, x ∈ B or x ∈ A, x ∈ C}
= {x | x ∈ (A ∩ B) or x ∈ (A ∩ C)}
= (A ∩ B) ∪ (A ∩ C)
8. (a) B \ A = {x | x ∈ B, x ∉ A}
= {x | x ∈ B, x ∈ Ā}
= {x | x ∈ B ∩ Ā}
= B ∩ Ā
10. (a) The sets A = {1, 2}, B = {2, 3}, C = {2, 4} satisfy A ∩ B = A ∩ C with A ≠ C
and B ≠ C.
(b) The sets A = {1, 2}, B = {2, 3}, C = {1, 3} satisfy A ∪ B = A ∪ C with A ≠ C
and B ≠ C.
11. (a) By definition, A × ∅ = {(x, y) | x ∈ A, y ∈ ∅}. But there does not exist any
element y ∈ ∅, so there cannot exist any pair (x, y) with x ∈ A and y ∈ ∅.
Therefore, A × ∅ contains no elements, i.e., A × ∅ = ∅. Similarly, ∅ × A = ∅.
12. Proof of Proposition 4.2: Suppose A and B are disjoint finite sets. To count
the number of elements in A ∪ B, first count the number of elements in A, namely
|A|; the only other elements of A ∪ B are those in B but not in A. Since
A ∩ B = ∅, no element of B is in A, so there are |B| elements in B \ A. Hence
|A ∪ B| = |A| + |B| and this number is finite because both |A| and |B| are finite.
14. Shade in a Venn diagram to convince yourself that formula (4.1) is true.
The following information is given: |U| = |M | = 50, |S| = 30, |C| = 18, |E| = 26,
|S ∩ C| = 9, |S ∩ E| = 16, |C ∩ E| = 8, |S ∪ C ∪ E| = 47.
47 = 30 + 18 + 26 − 9 − 16 − 8 + |S ∩ C ∩ E|
and hence |S ∩ C ∩ E| = 6.
17. (a) 17 + 12 = 29
(b) 17 · 12 = 204
(c) 29 · 28 = 812
18. Using the Binomial Theorem with $a = 2x^4$ and $b = -y$, the (simplified) expansion
is $8x^{12} - 12x^8 y + 6x^4 y^2 - y^3$.
19. $831409920x^8$
20. 120
22. 18 · 17 · 16 = 4896
(a) There are five letters and no repetitions, so the answer is 5! = 120.
(b) There are 7 letters of which 3 are n and no other letter is repeated, so the
answer is 7!/3! = 840.
(c) There are 9 letters of which 2 are m, 2 are t, and 2 are e, so the answer is
9!/(2!2!2!1!1!1!) = 45,360.
27. (a) This is sampling without replacement, so the answer is P (12, 3) = 12!/9! =
1320.
(b) This is sampling with replacement, so the answer is $12^3 = 1728$.
29. There are C(12, 4) = 495 ways to choose 4 of the 12 students to take the first exam,
and then C(8, 4) = 70 ways of choosing 4 of the remaining 8 students to take the
second exam. The remaining students take the third exam. Thus the answer is
495 · 70 = 34,650.
30. (a) We can view each of the solutions (say x = 25, y = 5, w = 20, z = 50) as a
combination of r = 100 objects consisting of 25 a’s, 5 b’s, 20 c’s, and 50 d’s,
where there are k = 4 kinds of objects. By Theorem 4.12, the number of non-negative
integer solutions is
$$C(100 + 3, 3) = C(103, 3) = \frac{103 \cdot 102 \cdot 101}{3 \cdot 2 \cdot 1} = 176{,}851.$$
(b) We wish to find the number of positive integer solutions to x + y + w + z = 100.
Consider writing 100 = 1 + 1 + · · · + 1 where there are 100 1’s and 99 + signs.
To decompose 100 into 4 summands, we need only choose 3 of the 99 + signs.
Hence, the number of positive integer solutions to x + y + w + z = 100 is
$$C(99, 3) = \binom{99}{3} = \frac{99 \cdot 98 \cdot 97}{3 \cdot 2 \cdot 1} = 156{,}849.$$
31. The proofs of parts (a) and (b) generalise what was done in the preceding exercise.
32. Let the following m pairs represent pigeonholes: (1, 2), (3, 4), . . . , (2m − 1, 2m). If
A is a set of m + 1 elements taken from the set {1, 2, . . . , 2m}, then for each element
j ∈ A, put j in pigeonhole (i, i + 1) if j = i or j = i + 1. Since A contains m + 1
elements, one of the pigeonholes (pairs) contains two distinct elements of A and
these two elements are consecutive.
33. Partition the square into 4 sub-squares of equal size, each with sides of
length 1. By the Pigeonhole Principle (Theorem 4.13), two of the points lie in
one of the sub-squares. The length of the diagonal of each sub-square is $\sqrt{2}$, so the
distance between the two points is at most $\sqrt{2}$.
34. By the Generalised Pigeonhole Principle (Theorem 4.14), if the pigeonholes are the
12 months in a year (k = 12), then the minimum number m of people needed to
ensure that at least 3 of them are born in the same month satisfies ⌊(m−1)/12⌋ = 2.
Thus m = 2 × 12 + 1 = 25.
35. Let the 81 hours be pigeons and let the pigeonholes be the following 5 sets of
two consecutive days: {1, 2}, {3, 4}, {5, 6}, {7, 8}, {9, 10}. By the Generalised
Pigeonhole Principle (Theorem 4.14), the employee worked at least ⌊(81−1)/5⌋+1 =
16 + 1 = 17 hours.
Matrices
5.1 Introduction
A matrix is a rectangular array of numbers. These numbers are called the terms,
elements, or entries of the matrix.
It is usual to use upper case letters to denote matrices and lower case letters to denote
their elements. We use subscripts to indicate the position of an element within a matrix.
Suppose T is an m × n matrix. We write
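$$
T = \begin{bmatrix}
t_{11} & t_{12} & t_{13} & \cdots & t_{1n} \\
t_{21} & t_{22} & t_{23} & \cdots & t_{2n} \\
t_{31} & t_{32} & t_{33} & \cdots & t_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
t_{m1} & t_{m2} & t_{m3} & \cdots & t_{mn}
\end{bmatrix},
$$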
where tij is the element of T which appears in the ith row and the jth column. Instead of
writing down some of the elements of a matrix in their correct position as we have done
above, it is often convenient to use the following more compact form: T = [tij ]. These
equations are both meant to associate the elements tij with the matrix T .
Section 5.2. Matrix arithmetic 95
◮ Example 5.2 If [aij ] is the matrix A in Example 5.1, then a11 = 2, a12 = 4,
a21 = 3, a22 = −6, a31 = 4, and a32 = 0. ◭
Some types of matrices are given special names. A matrix which has the same number
of rows and columns is called a square matrix. Of the matrices in Example 5.1, only C
is square.
A matrix which has just a single row is usually called a row vector, and a matrix
with just one column is a column vector. The matrix B in Example 5.1 is a column
vector.
Thus two matrices are equal if and only if they are the same in every respect, that is, one
is a duplicate of the other.
In certain circumstances we can add, subtract, multiply and divide matrices. The rules
are similar in many respects to those for ordinary numbers, but there are some important
differences.
◮ Example 5.3 Of the matrices in Example 5.1, only A and D have the same size.
So it is possible to form the sum A + D, and the differences A − D and D − A.
We compute A + D and A − D below. The entries in D − A are the same as the
96 Chapter 5. Matrices
kA = A + A + · · · + A (k terms).
c = a1 b1 + a2 b2 + · · · + an bn .
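◮ Example 5.4 If $A = \begin{bmatrix} 3 & 5 & -1 & 4 \end{bmatrix}$, and B is the column vector given in Example 5.1,
then $AB = \begin{bmatrix} 3 \times 1 + 5 \times 2 + (-1) \times 5 + 4 \times 3 \end{bmatrix} = \begin{bmatrix} 20 \end{bmatrix}$. ◭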
In order to understand how to form the product AB of more general matrices A and
B, it helps to think of A as a collection of row vectors and B as a collection of column
vectors. Suppose that A = [aij ], B = [bij ], A has size m × n and B has size p × q. The
product AB exists if and only if n = p, that is, the number of columns of A equals the
number of rows of B. If this is true, then AB is an m × q matrix, and AB = [cij ], where
\[ c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}. \]
According to this formula, the element cij in the ith row and the jth column of the product
AB is obtained by multiplying the ith row of A with the jth column of B. The condition
n = p ensures that these row and column vectors have the same number of elements. The
following diagram shows the location of the ith row of A, and the jth column of B. These
must be multiplied as row and column vectors to obtain cij .
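\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{in} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix}
b_{11} & \cdots & b_{1j} & \cdots & b_{1q} \\
b_{21} & \cdots & b_{2j} & \cdots & b_{2q} \\
\vdots & & \vdots & & \vdots \\
b_{n1} & \cdots & b_{nj} & \cdots & b_{nq}
\end{bmatrix}
=
\begin{bmatrix}
c_{11} & \cdots & \cdots & \cdots & c_{1q} \\
\vdots & & \vdots & & \vdots \\
\vdots & \cdots & c_{ij} & \cdots & \vdots \\
\vdots & & \vdots & & \vdots \\
c_{m1} & \cdots & \cdots & \cdots & c_{mq}
\end{bmatrix}
\]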
It is important to realise that the order of the factors counts. If A and B are matrices,
it may be possible to form neither, just one, or both of the products AB and BA. To
emphasise the importance of the order of the factors, we say that in the product AB, A
premultiplies B, or that B postmultiplies A. When the product AB does exist we
say that A and B are conformable for multiplication (in that order).
◮ Example 5.5 Refer to the matrices in Example 5.1. The products which do exist
amongst these matrices are CC, which is 3 × 3, and CA and CD, which are both
3 × 2. All other combinations are not conformable. We compute CA below:
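\[
\begin{aligned}
CA &= \begin{bmatrix} 3 & 5 & -7 \\ 4 & 2 & 0 \\ 0 & -1 & 6 \end{bmatrix}
      \begin{bmatrix} 2 & 4 \\ 3 & -6 \\ 4 & 0 \end{bmatrix} \\
   &= \begin{bmatrix}
      3 \times 2 + 5 \times 3 + (-7) \times 4 & 3 \times 4 + 5 \times (-6) + (-7) \times 0 \\
      4 \times 2 + 2 \times 3 + 0 \times 4 & 4 \times 4 + 2 \times (-6) + 0 \times 0 \\
      0 \times 2 + (-1) \times 3 + 6 \times 4 & 0 \times 4 + (-1) \times (-6) + 6 \times 0
      \end{bmatrix} \\
   &= \begin{bmatrix} 6 + 15 - 28 & 12 - 30 + 0 \\ 8 + 6 + 0 & 16 - 12 + 0 \\ 0 - 3 + 24 & 0 + 6 + 0 \end{bmatrix} \\
   &= \begin{bmatrix} -7 & -18 \\ 14 & 4 \\ 21 & 6 \end{bmatrix}
\end{aligned}
\]
◭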
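The row-by-column rule can also be checked numerically. The following MATLAB sketch rebuilds the product CA of Example 5.5 one entry at a time and compares it with the built-in * operator. The names P, m, n, p and q are chosen here for illustration only.

```matlab
% Matrices C and A from Example 5.5
C = [3 5 -7; 4 2 0; 0 -1 6];
A = [2 4; 3 -6; 4 0];

[m, n] = size(C);        % C is m x n
[p, q] = size(A);        % A is p x q
assert(n == p, 'C and A are not conformable for multiplication');

% Entry (i,j) of the product is (row i of C) times (column j of A)
P = zeros(m, q);
for i = 1:m
    for j = 1:q
        P(i,j) = C(i,:) * A(:,j);
    end
end

disp(P)                  % product built entry by entry
disp(isequal(P, C*A))    % 1 if it agrees with the built-in product
```

The loop is only meant to mirror the definition; in practice one would simply write C*A.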
The rule for multiplying matrices may look peculiar at first, but the reason why matrix
multiplication is defined this way will become clearer in the next chapter.
As you know, multiplication of numbers is commutative, i.e., if A and B are numbers
then AB = BA. Matrix multiplication, however, does not satisfy this law, as the next
example shows.
Clearly, AB ≠ BA. ◭
If A and B are particular matrices for which AB = BA, then we say that A and
B commute. The above example shows that there is no commutative law for matrix
multiplication. However, there are some familiar laws that do apply, and these are shown
in Table 5.2 on the next page.
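A quick way to test whether two particular matrices commute is to compute both products in MATLAB. The sketch below uses a pair of illustrative matrices chosen here (they are not taken from the examples in these notes).

```matlab
% Illustrative matrices (an assumption for this sketch)
A = [1 2; 3 4];
B = [0 1; 1 0];

AB = A*B;    % B postmultiplies A: the columns of A are swapped
BA = B*A;    % B premultiplies A: the rows of A are swapped

disp(AB)
disp(BA)
commute = isequal(AB, BA);   % false for this pair
disp(commute)
```

Both products exist here because A and B are square of the same size, yet they differ.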
Section 5.2. Matrix arithmetic 99
In the table, A, B, and C are any matrices conformable for multiplication in the orders
shown in the table and λ is any number.
Exercises
1. If $\begin{bmatrix} w & 3 - x \\ y^2 & 3z \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ 16 & 1 \end{bmatrix}$, solve for w, x, y and z.
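2. Compute $7 \begin{bmatrix} 2 & -1 & 3 \\ -1 & 3 & 2 \end{bmatrix} - 2 \begin{bmatrix} -2 & 1 & 3/2 \\ -5 & 1 & 1/2 \end{bmatrix} - 3 \begin{bmatrix} 3 & 0 & 11 \\ 1 & -4 & 7 \end{bmatrix}$.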
3. Let $A = \begin{bmatrix} 1 & -3 \\ 2 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & -1 & 5 \\ 0 & 2 & -3 \end{bmatrix}$.
Compute the products AB and BA, if they exist.
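4. Let $A = \begin{bmatrix} 8 & -1 & 2 \\ 2 & 0 & -5 \end{bmatrix}$, $B = \begin{bmatrix} -1 & 7 \\ 3 & -2 \\ 1 & 5 \end{bmatrix}$, and $C = \begin{bmatrix} 2 & 1 \\ 3 & 5 \end{bmatrix}$.
Verify that (AB)C = A(BC).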
(a) Find BA and show that it is made up of the 2 × 2 matrix BA1 on the left and
the 2 × 1 matrix BA2 on the right.
(b) Verify the above result using MATLAB.
(c) Show that the result is true in general for any 2 × 2 matrix B and 2 × 3 matrix
A, which is partitioned into a 2 × 2 matrix A1 on the left and a 2 × 1 matrix
A2 on the right.
where d1 , d2 , . . . , dn are the diagonal elements. To describe a diagonal matrix, we often use
a shorter form of notation in which only the diagonal elements are shown. The expression
diag(d1 , d2 , . . . , dn ) is another way of representing the diagonal matrix shown above.
A scalar matrix is a diagonal matrix whose diagonal elements are all equal. If
S = diag(λ, λ, . . . , λ) then S is a scalar matrix, and if A is any other matrix of the same
size, then SA = AS = λA. It is possible to show that the converse is also true: if S is a
square matrix which commutes with all other matrices of the same size, then S must be
a scalar matrix.
Section 5.3. Some special matrices 101
The number 1 has the special property that 1 · a = a · 1 = a for any number a.
There are matrices which have similar properties. Let In denote the n × n scalar matrix
diag(1, 1, . . . , 1). Then if A is any n × n matrix, In A = AIn = A. More generally, if A is
any m × n matrix, then
Im A = AIn = A.
We call In the identity (or unit) matrix of size n. Where there is no confusion about
size, we often ignore the subscript and write I for the identity matrix of the appropriate
size.
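The identities above are easy to check numerically. The sketch below uses the standard MATLAB functions eye and diag; the matrices A and M and the value of lambda are illustrative choices made here.

```matlab
A  = [1 2 3; 4 5 6];          % an illustrative 2 x 3 matrix
I2 = eye(2);                  % identity matrix of size 2
I3 = eye(3);                  % identity matrix of size 3
leftOK  = isequal(I2*A, A);   % Im*A = A with m = 2
rightOK = isequal(A*I3, A);   % A*In = A with n = 3

lambda = 4;
S = diag([lambda lambda lambda]);    % the scalar matrix diag(4, 4, 4)
M = [1 2 0; 3 -1 5; 2 2 7];          % an illustrative 3 x 3 matrix
scalarOK = isequal(S*M, M*S) && isequal(S*M, lambda*M);

disp([leftOK rightOK scalarOK])
```

Note that two different identity matrices are needed when A is not square: one of size m on the left and one of size n on the right.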
In MATLAB, some special matrices are defined as follows:
Notice how we needed to be careful about our left and right because the commutative
law doesn’t hold for matrix multiplication.
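◮ Example 5.8 Let $A = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & 0 \end{bmatrix}$. You can easily check that each of the following
matrices is a right inverse of A:
\[
R_1 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \\ -1 & 1 \end{bmatrix}
\quad \text{and} \quad
R_2 = \begin{bmatrix} 1 & 0 \\ -1 & 1 \\ 0 & 0 \end{bmatrix}.
\]
You can also check that A doesn’t have any left inverses by trying to solve the
equations arising from
\[
\begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix}
\begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & 0 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]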
The above example shows that the story for matrix inverses is much more complicated
than it is for number inverses. We have a non-zero matrix that has more than one right
inverse, and yet no left inverses! Fortunately, the situation for square matrix inverses is
not so complicated, as the next two theorems show.
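The claims about right inverses in Example 5.8 can be confirmed with a few lines of MATLAB. This is only a sketch: the flag names are chosen here, and the last two checks show merely that R1 and R2 themselves are not left inverses, which is weaker than showing that no left inverse exists.

```matlab
A  = [1 0 -1; 1 1 0];
R1 = [0 1; 0 0; -1 1];
R2 = [1 0; -1 1; 0 0];

isRight1 = isequal(A*R1, eye(2));   % A*R1 = I (size 2)
isRight2 = isequal(A*R2, eye(2));   % A*R2 = I (size 2)
isLeft1  = isequal(R1*A, eye(3));   % R1*A is 3 x 3 but is not I
isLeft2  = isequal(R2*A, eye(3));   % R2*A is 3 x 3 but is not I

disp([isRight1 isRight2 isLeft1 isLeft2])
```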
A proof of this theorem is beyond the scope of this course, so you will just have to
believe it for now. The theorem shows that for square matrices the concepts of left inverse
and right inverse are the same. So we don’t use these words for square matrices: If A
is a square matrix and B is a matrix such that AB = I or BA = I, then we call B an
inverse of A.
Section 5.4. Further matrix operations 103
Proof. Let A be a square matrix, and suppose that both B and C are inverses of A. To
prove the theorem we must show that B = C.
B = BI property of I
= B(AC) because C is an inverse of A
= (BA)C associative law
= IC because B is an inverse of A
= C property of I
Since a square matrix has only one inverse (if it has one at all), we can give it a special
notation: we write the inverse of A as $A^{-1}$. Two simple rules of inversion are shown in
Table 5.3 below.
\[ (A^{-1})^{-1} = A \]
\[ (AB)^{-1} = B^{-1} A^{-1} \]
Every number except 0 has an inverse, or reciprocal. The situation is not so straight-
forward for matrices. In many of the applications of matrix theory the problem of finding
inverses arises. For now, we will just learn the formula for 2 × 2 inverses.
Inverses of 2 × 2 matrices: Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. The determinant of A, written det(A),
is defined by:
\[ \det(A) = ad - bc. \]
It can be shown that A has an inverse if and only if det(A) ≠ 0, in which case
\[ A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \]
The above example shows that there are square matrices which have no inverses. Such
matrices are called singular or non-invertible. A matrix which does have an inverse is
called non-singular or invertible.
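The 2 × 2 formula translates directly into MATLAB. The sketch below applies it to an illustrative matrix chosen here and then evaluates the determinant of a singular matrix; all variable names are assumptions of this sketch.

```matlab
A = [4 7; 2 6];                      % illustrative non-singular matrix
a = A(1,1); b = A(1,2); c = A(2,1); d = A(2,2);

detA = a*d - b*c;                    % determinant ad - bc
if detA ~= 0
    Ainv = (1/detA) * [d -b; -c a];  % formula for the 2 x 2 inverse
else
    Ainv = [];                       % A is singular: no inverse
end

disp(detA)
disp(Ainv)
disp(A*Ainv)                         % should be close to the identity

S = [1 2; 2 4];                      % an illustrative singular matrix
detS = S(1,1)*S(2,2) - S(1,2)*S(2,1);
disp(detS)                           % zero, so S has no inverse
```

Because of rounding, A*Ainv may differ from the identity by a tiny amount, so it is safer to compare with a tolerance than with exact equality.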
5.4.2 Division
Dividing a number b by a non-zero number a is the same as multiplying b by 1/a, that is,
by $a^{-1}$. So we can think of dividing a matrix B by a non-singular matrix A as multiplying
B by the inverse $A^{-1}$. But remember that we cannot divide B by A unless $A^{-1}$ exists.
Furthermore, we need to be careful about the order of multiplication. Premultiplying
and postmultiplying by $A^{-1}$ do not necessarily give the same result. That is, $BA^{-1}$ is
not necessarily equal to $A^{-1}B$. Because of these complications, we rarely speak about a
matrix operation called division. It is better to speak about premultiplication or
postmultiplication by an inverse.
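MATLAB reflects this distinction by providing two division operators: B/A corresponds to postmultiplying B by the inverse of A, and A\B corresponds to premultiplying B by the inverse of A. The sketch below, with illustrative matrices chosen here, shows that the two results differ.

```matlab
A = [4 7; 2 6];      % non-singular (determinant 10)
B = [1 2; 3 4];

post = B / A;        % solves X*A = B, that is, X = B*inv(A)
pre  = A \ B;        % solves A*X = B, that is, X = inv(A)*B

disp(post)
disp(pre)
disp(norm(post - pre))   % non-zero: the two results are different
```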
5.4.3 Powers
The nth power of a square matrix A is obtained by taking the product of n copies of A.
That is,
\[ A^n := A \times A \times \cdots \times A \quad (n \text{ terms}). \]
This makes sense when n is a positive integer. However, we can extend the definition of
$A^n$ to cover other integer values of n. First we define
\[ A^0 := I. \]
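The definition of a positive integer power can be sketched as a loop that starts from the identity matrix. The matrix A and the exponent n below are illustrative choices; the ^ operator is MATLAB's built-in matrix power.

```matlab
A = [1 1; 0 1];          % illustrative square matrix
n = 5;

P = eye(size(A));        % start from A^0 = I
for k = 1:n
    P = P * A;           % after k passes, P holds A^k
end

disp(P)
disp(isequal(P, A^n))         % agrees with the built-in matrix power
disp(isequal(A^0, eye(2)))    % A^0 is the identity matrix
```

Note that A^n is the matrix power, whereas A.^n would raise each element of A to the power n separately.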
5.4.4 Transposes
Interchanging the rows and columns of an m × n matrix A gives an n × m matrix called
the transpose of A, which we denote by $A^T$. We can think of $A^T$ as the reflection of A
across its main diagonal. Formally, if $A = [a_{ij}]$ and $A^T = [c_{ij}]$, then $a_{ij} = c_{ji}$, for each i
and each j.
◮ Example 5.10 If $A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \end{bmatrix}$, then $A^T = \begin{bmatrix} 1 & 5 \\ 2 & 6 \\ 3 & 7 \\ 4 & 8 \end{bmatrix}$. ◭
Some rules for matrix transposition are shown in Table 5.4 (where λ is any number).
\[
\begin{aligned}
(A^T)^T &= A \\
(A + B)^T &= A^T + B^T \\
(\lambda A)^T &= \lambda A^T \\
(AB)^T &= B^T A^T \\
(A^T)^{-1} &= (A^{-1})^T
\end{aligned}
\]
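Several of these transposition rules can be spot-checked in MATLAB, where the operator .' gives the transpose. The matrices and the value of lambda below are illustrative choices made here, and a check on one example is of course not a proof.

```matlab
A = [1 2 3; 4 5 6];        % illustrative 2 x 3 matrix
B = [1 0; 2 -1; 0 3];      % illustrative 3 x 2 matrix
lambda = 3;

rule1 = isequal((A.').', A);                  % transposing twice gives A back
rule2 = isequal((A + B.').', A.' + B);        % sum rule applied to A and B.'
rule3 = isequal((lambda*A).', lambda*A.');    % scalar multiples
rule4 = isequal((A*B).', B.' * A.');          % note the reversed order

disp([rule1 rule2 rule3 rule4])
```

For real matrices the operators ' and .' give the same result; they differ only for matrices with complex entries.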
Note that the inverse matrix computed by inv(A) becomes more and more inaccurate
as c decreases until, for $c = 10^{-20}$, MATLAB prints the message
Exercises
9. If $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, show that $A^2 + I_2 = 0$.
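10. Let $A = \begin{bmatrix} 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{bmatrix}$, and $C = \begin{bmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}$.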
11. Suppose that $A = \begin{bmatrix} 3 & 5 \\ 7 & 11 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & 11 \\ 10 & 37 \end{bmatrix}$.
12. Find a counterexample to the following assertion: “If A is singular and B is singular,
then A + B is singular.”
13. For the matrices $A = \begin{bmatrix} 2 & 1 \\ 3 & 5 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 3 & -7 \\ 5 & 1 & -2 \end{bmatrix}$, verify that $(AB)^T = B^T A^T$.
14. Explain why the products $AA^T$ and $A^T A$ exist for any matrix A.
17. (a) Write a MATLAB program to define the n × n Hilbert matrix H = [hij ],
hij = 1/(i + j − 1), where n is entered by the user, and the diagonal matrix
D = diag(1, 3, . . . , 2n − 1).
(b) Extend the program to compute, for n = 5, the products DH and HD.
(c) Note that multiplying H by D on the left has the effect of multiplying each
row of H by the corresponding diagonal element of D, whereas multiplying
H by D on the right has the effect of multiplying each column of H by the
corresponding diagonal element of D. Show that this is true in general for DA
and AD, where A is an arbitrary square matrix and D is an arbitrary diagonal
matrix of the same size.
18. Write a MATLAB program using the inv command to compute the inverse of the
Hilbert matrix of size n defined above. Run the program with n equal to 5, 10 and
20. Is the Hilbert matrix ill-conditioned?
19. (a) Let $A = \begin{bmatrix} 1 & 2 \\ 0 & 1/2 \end{bmatrix}$.
Write a MATLAB program to find the matrix powers $A^2, A^3, \ldots, A^n$, where
n = 10. What do these matrices converge to as n → ∞?
(b) Repeat part (a) for the matrix $A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1/2 & 1/3 & 1/4 \\ 0 & 0 & 1/3 & 1/4 \\ 0 & 0 & 0 & 1/4 \end{bmatrix}$.
5. A must be square.
6. (a) (i) Choose any pair of matrices which do not commute. For example, choose
A and B as in Example 5.6.
(ii) Same as (i).
(b) If either statement holds, then A and B commute.
7. $ABAB + 2C = \begin{bmatrix} -391 & -1358 \\ 146 & -345 \end{bmatrix}$.
8. $BA = \begin{bmatrix} 11 & -1 & 17 \\ 14 & 2 & 22 \end{bmatrix}$.
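15. Proof.
\[
AB = BA \iff (AB)^T = (BA)^T \iff B^T A^T = A^T B^T \quad \text{(see Table 5.4).} \qquad \square
\]
16. If $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$, then $AB = \begin{bmatrix} 0 & 2 \\ 1 & 0 \end{bmatrix}$. So A and B are symmetric
but AB is not.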
Section 5.5. Solutions to exercises 109
6.1 Introduction
We are all familiar with the following diagram:
[Diagram: a horizontal X-axis and a vertical Y-axis intersecting at the origin (0, 0).]
The mathematical concept depicted in this diagram is usually called “the X–Y plane”,
“the Euclidean plane” or “real 2-space”, and is denoted by $\mathbb{R}^2$. The notation reminds us
that the plane is really a set:
\[ \mathbb{R}^2 = \mathbb{R} \times \mathbb{R} = \{(x, y) \mid x \in \mathbb{R},\ y \in \mathbb{R}\}. \]
Each point in the plane is an ordered pair (x, y), where x measures how far the point
is from the origin (0, 0) in the X, or horizontal, direction, and y measures how far the
point is from the origin in the Y , or vertical, direction. We have the convention that X is
positive to the right and negative to the left, and that Y is positive upwards and negative
downwards.
There is a strong link between linear transformations and matrices, which we now
develop. First, we represent points in the plane by matrices, as follows:
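The point (x, y) will be represented by the matrix $\begin{bmatrix} x \\ y \end{bmatrix}$.
Now let’s compute a certain matrix product:
\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}.
\]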
This shows that the linear transformation T can be thought of in terms of a matrix
multiplication. And it works the other way too: if we start with a matrix, we can define
a linear transformation based on it.
which is the second column of A. This means that, given the matrix A, we can read off
the images of the points (1, 0) and (0, 1) from the columns of A. It also means that, if
we know the images of the points (1, 0) and (0, 1) under a linear transformation TA (for
instance if they are easily identified geometrically), then we can immediately write down
the matrix A. We will be using this fact in the next section.
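The link between the columns of A and the images of (1, 0) and (0, 1) can be seen directly in MATLAB. The matrix below is an illustrative choice, and the names e1, e2, img1, img2 and Arebuilt are introduced here for the sketch.

```matlab
A  = [1 0; 1 1];         % an illustrative 2 x 2 matrix
e1 = [1; 0];             % the point (1, 0) written as a column
e2 = [0; 1];             % the point (0, 1) written as a column

img1 = A*e1;             % image of (1, 0)
img2 = A*e2;             % image of (0, 1)

disp(isequal(img1, A(:,1)))   % the image is the first column of A
disp(isequal(img2, A(:,2)))   % the image is the second column of A

% Going the other way: rebuild the matrix from the two images
Arebuilt = [img1 img2];
disp(isequal(Arebuilt, A))
```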
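◮ Example 6.1 Let $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. Then
\[
T_A\!\left( \begin{bmatrix} x \\ y \end{bmatrix} \right)
= A \begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} x \\ x + y \end{bmatrix}
\]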
So TA is the mapping that sends (x, y) to (x, x+y) no matter what x and y are. For
example, (1, 2) is sent to (1, 1 + 2) = (1, 3). Note that some points aren’t shifted
at all by TA , for example, (0, 1) is sent to (0, 0 + 1) = (0, 1), namely itself! ◭
Like any mapping, TA is just shuffling everything around a bit. Some points stay where
they are, while others get moved to different points. However, unlike many possible maps
we could define from R2 to R2 , a matrix transformation is easy to visualise. That is, there
is a way of looking at TA that enables us to see exactly what kind of shuffling is going on.
This special way of looking at TA involves looking at what TA does to geometric figures.
The figure that we will be using mostly is depicted in Figure 6.2 on the next page. It is
a square in the first quadrant with one corner at the origin and of side length 1. We call
this a unit square and denote it by □1 . If we need to refer to the unit squares in the
other quadrants we will use the notation □i for the unit square in the ith quadrant. We
label the corners of □1 as O, P , Q and R, and we label their images under TA with a
dash, for example, we denote TA (P ) as P ′ .
[Figure 6.2: the unit square □1, with corners O(0, 0), P (1, 0), Q(1, 1) and R(0, 1).]
◮ Example 6.2 Let’s see what happens to □1 under the mapping TA of Example 6.1:
(0, 0) is sent to (0, 0)
(1, 0) is sent to (1, 1)
(0, 1) is sent to (0, 1)
(1, 1) is sent to (1, 2)
So TA deforms □1 into the parallelogram depicted in Figure 6.3 below.
[Figure 6.3: the image of □1 under TA, a parallelogram with corners O′(0, 0), P′(1, 1), Q′(1, 2) and R′(0, 1).]
If you look at what TA does to □3 , you will begin to see the overall pattern:
Points with positive x-coordinate are dragged upwards, and points with negative
x-coordinate are dragged downwards, while points on the Y -axis are not shifted at
all. ◭
◮ Example 6.3 The following MATLAB program plots the four unit squares □1 , . . . , □4 ,
and their images under the linear transformation used in the examples above.
Asq2x=Asq2(1,:);
Asq2y=Asq2(2,:);
Asq3=A*[sq3x;sq3y];
Asq3x=Asq3(1,:);
Asq3y=Asq3(2,:);
Asq4=A*[sq4x;sq4y];
Asq4x=Asq4(1,:);
Asq4y=Asq4(2,:);
% Plot the images of the four squares
subplot(2,2,1), plot(Asq2x,Asq2y)
axis([-2,2,-2,2]), axis square
subplot(2,2,2), plot(Asq1x,Asq1y)
axis([-2,2,-2,2]), axis square
subplot(2,2,3), plot(Asq3x,Asq3y)
axis([-2,2,-2,2]), axis square
subplot(2,2,4), plot(Asq4x,Asq4y)
axis([-2,2,-2,2]), axis square
Read through the program and make sure you understand how it works. Run the
program and verify the description given above of the effect of the transforma-
tion. ◭
To finish this section, we introduce a little more notation: the symbol ↦ means
“is mapped to” or “is sent to” or “goes to”. So the statement “the function f , where
f (x, y) = (x + 1, y²)” can also be written as “the function f , where (x, y) ↦ (x + 1, y²)”.
Exercises
1. Find the images of the points on □1 under the linear transformations whose matrices
are given below and give geometrical descriptions of these transformations. Verify
your results by entering the matrices into the MATLAB program in Example 6.3.
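(a) $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$
(b) $\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$
(c) $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$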
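4. (a) Find the image of □1 under the linear transformation whose matrix is $A = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}$
and find its area.
(b) Compare your result with det(A), the determinant of A.
(c) Repeat parts (a) and (b) using $\begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}$, $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, $\begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}$
and $\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$ in place of A.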
(d) Use your discoveries to guess the general result.
6.2.1 Dilations
A dilation is a linear transformation whose matrix is $\begin{bmatrix} \alpha & 0 \\ 0 & \beta \end{bmatrix}$, where α and β are the
horizontal and vertical dilation factors, respectively. Note that both α and β must be
positive. Such a transformation has the effect of stretching or shrinking coordinates along
the two axes. Stretching occurs when the dilation factor is greater than 1, and shrinking
occurs when the dilation factor is less than 1. Nothing happens if the dilation factor
equals 1.
◮ Example 6.5 Consider the transformation T whose matrix is $\begin{bmatrix} 1/3 & 0 \\ 0 & 2 \end{bmatrix}$. We
can tell from the defining matrix that the x-coordinates will be shrunk to one
third of their original values and the y-coordinates will be doubled. The effect of T on □1 is shown in
Figure 6.4 below.
[Figure 6.4: the image of □1 under T, a rectangle with corners (0, 0), (1/3, 0), (1/3, 2) and (0, 2).]
Enter the matrix of T into the MATLAB program given in Example 6.3 to see the
effect of T on all four unit squares. ◭
118 Chapter 6. Linear Transformations of the Plane
6.2.2 Shears
A shear is a linear transformation whose matrix is $\begin{bmatrix} 1 & \alpha \\ 0 & 1 \end{bmatrix}$ (a horizontal shear) or
$\begin{bmatrix} 1 & 0 \\ \alpha & 1 \end{bmatrix}$ (a vertical shear). The number α, which can be positive or negative, is called
the shear factor.
We have already seen a vertical shear in Example 6.2. This transformation had the
effect of dragging the 1st and 4th quadrants upwards and the 2nd and 3rd quadrants
downwards, while the Y -axis was left fixed. The shear factor determines by how much
these quadrants are dragged, with larger factors resulting in larger drags. The sign of the
shear factor determines the direction of the drag. A negative vertical shear means that
the 1st and 4th quadrants go down instead of up, and the 2nd and 3rd quadrants go up
instead of down.
In a positive horizontal shear the 1st and 2nd quadrants are dragged to the right, the
3rd and 4th quadrants are dragged to the left, and the X-axis is left fixed. In a negative
horizontal shear, these pairs of quadrants are dragged the other way.
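The effect of the two kinds of shear on the unit square can be computed by applying the matrix to all four corners at once. In the sketch below the shear factor 2 is an illustrative choice, and the corners O, P, Q, R are stored as the columns of the matrix sq (a name introduced here).

```matlab
alpha = 2;                         % illustrative shear factor
H = [1 alpha; 0 1];                % horizontal shear
V = [1 0; alpha 1];                % vertical shear

% Corners O, P, Q, R of the unit square, one per column
sq = [0 1 1 0;
      0 0 1 1];

Himg = H*sq;    % y-coordinates unchanged; x increases by alpha*y
Vimg = V*sq;    % x-coordinates unchanged; y increases by alpha*x

disp(Himg)
disp(Vimg)
```

In the horizontal shear the corners on the X-axis (O and P) stay where they are, while in the vertical shear the corners on the Y-axis (O and R) stay where they are.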
6.2.3 Rotations
Among the most important linear transformations are those which cause points to rotate
around the origin. The angle of rotation measures how much rotation occurs. By
convention, we regard anticlockwise as the positive direction for rotation, and clockwise
as the negative direction. So a rotation of 30◦ means 30◦ anticlockwise. A clockwise
rotation of this amount would be called a rotation of −30◦ .
Theorem. The transformation which causes every point to rotate anticlockwise about
the origin through an angle of θ is the linear transformation whose matrix is
\[ \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \]
We won’t give a proof of this theorem. You will probably come to believe it once you
have seen some examples.
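A numerical experiment can also make the theorem plausible. The sketch below builds the rotation matrix for an angle of 30° using the MATLAB functions cosd and sind (which take angles in degrees) and checks that the distance of each corner of the unit square from the origin is unchanged. The variable names are chosen here for illustration.

```matlab
theta = 30;                                  % angle in degrees
R = [cosd(theta) -sind(theta);
     sind(theta)  cosd(theta)];

P  = [1; 0];
Pp = R*P;                                    % image of the point (1, 0)
disp(Pp)

% A rotation should not change distances from the origin
sq = [0 1 1 0; 0 0 1 1];                     % corners of the unit square
lenBefore = sqrt(sum(sq.^2, 1));
lenAfter  = sqrt(sum((R*sq).^2, 1));
disp(max(abs(lenBefore - lenAfter)))         % close to zero
```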
◮ Example 6.6 Let’s start with θ = 30°. Note that cos 30° = √3/2 and sin 30° = 1/2.
So consider the transformation TA where $A = \begin{bmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{bmatrix}$. Let’s see
what TA does to □1 :
(0, 0) ↦ (0, 0)
(1, 0) ↦ (√3/2, 1/2)
(1, 1) ↦ ((√3 − 1)/2, (1 + √3)/2)
(0, 1) ↦ (−1/2, √3/2)
Section 6.2. Special types of transformations 119
Looking at Figure 6.5 (below) we see that TA has caused □1 to rotate anticlockwise
by 30◦ .
Enter the matrix A into the MATLAB program given in Example 6.3 to see
the effect of TA on all four unit squares. ◭
◮ Example 6.7 Now let’s try θ = 90°. In this case, it is easy to determine the matrix
of the transformation from the images of (1, 0) and (0, 1). Clearly, this rotation
maps (1, 0) to (0, 1) and maps (0, 1) to (−1, 0), so it is the transformation TA where
$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$. (Note this agrees with the result from the formula above since
cos 90° = 0 and sin 90° = 1.) Here’s the action of TA on □1 :
(0, 0) ↦ (0, 0)
(1, 0) ↦ (0, 1)
(1, 1) ↦ (−1, 1)
(0, 1) ↦ (−1, 0)
Looking at Figure 6.6 (on the next page) we see that TA (□1 ) occupies the same
place in the plane as □2 .
[Figure 6.6: the image TA (□1 ), with corners O′(0, 0), P′(0, 1), Q′(−1, 1) and R′(−1, 0).]
We might well conclude from this that TA has rotated □1 by 90◦ , as expected.
However, □1 would also finish on top of □2 if TA were a reflection across the Y -axis
(to be defined in the next section). How can we tell which of these two transfor-
mations TA is? Or is it even another transformation? In ambiguous situations like
this, the way to determine the geometric effect of TA is to find the images of some
specific points, rather than a shape. For example, look at the corners of □1 . The
point P (1, 0) has been sent to (0, 1), and R(0, 1) has been sent to (−1, 0). This
certainly means that TA is not a reflection across the Y -axis, and should be enough
to convince you that TA is indeed a rotation of 90◦ . ◭
6.2.4 Reflections
The last special type of linear transformation that we will look at is the type that causes
points to be reflected across a line (which passes through the origin). The critical infor-
mation we need about the line is the angle θ it makes with the X-axis (measured from
the positive end of the X-axis travelling anticlockwise towards the line). If we start with
the equation y = mx of the line, we can compute θ using the formula tan θ = m. (And
the vertical line x = 0 yields the angle θ = 90◦ .)
Theorem. The transformation which causes every point to reflect across the line hitting
the X-axis at an angle of θ is the linear transformation whose matrix is
\[ \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}. \]
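The reflection formula can be tried out in MATLAB. The sketch below uses θ = 45° (the line y = x) with the degree-based functions cosd and sind, reflects an illustrative point, and checks that reflecting twice brings every point back to where it started. The names F and img are chosen here.

```matlab
theta = 45;                                   % the line y = x
F = [cosd(2*theta)  sind(2*theta);
     sind(2*theta) -cosd(2*theta)];
disp(F)

img = F*[3; 1];                               % reflect the point (3, 1)
disp(img)

% Reflecting twice should return every point to its starting position
disp(norm(F*F - eye(2)))                      % close to zero
```

For this choice of θ the point (3, 1) is sent to (1, 3), as expected for a reflection across y = x.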
◮ Example 6.8 Suppose that θ = 45°, so we are reflecting across the line y = x
(because tan 45° = 1). Clearly, the image of (1, 0) is (0, 1) and the image of (0, 1)
is (1, 0), so the matrix of this reflection is $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. (This also follows from
the general formula since cos 90° = 0 and sin 90° = 1.) It is clear from this matrix
that the point (x, y) is mapped to (y, x). If you now compute TA (□1 ) you will find
that it remains in the same place in the plane, but that the bottom right corner
P (1, 0) swaps positions with the top left corner R(0, 1) while the other two corners
stay fixed. Thus TA is indeed a reflection across the line y = x. Enter the matrix
A into the MATLAB program given in Example 6.3 to see the effect of TA on all
four unit squares. ◭
◮ Example 6.9 Now let’s try θ = 120°, so that we reflect across the line y = −√3 x.
Note that cos 240° = −1/2 and sin 240° = −√3/2, so the matrix we use is
$A = \begin{bmatrix} -1/2 & -\sqrt{3}/2 \\ -\sqrt{3}/2 & 1/2 \end{bmatrix}$. Let’s see how TA affects □1 :
(0, 0) ↦ (0, 0)
(1, 0) ↦ (−1/2, −√3/2)
(1, 1) ↦ ((−1 − √3)/2, (−√3 + 1)/2)
(0, 1) ↦ (−√3/2, 1/2)
Figure 6.7 (on the next page) shows that □1 has indeed been reflected across the
line y = −√3 x. ◭
Exercises
5. Write down the matrices which accomplish the following transformations. Check
your answers by entering the matrices into the MATLAB program given in Exam-
ple 6.3.
6. Describe the effects of the linear transformations with the following matrices. Check
your answers by entering the matrices into the MATLAB program given in
Example 6.3.
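(a) $\begin{bmatrix} 11 & 0 \\ 0 & 1/6 \end{bmatrix}$
(b) $\begin{bmatrix} 1 & -3 \\ 0 & 1 \end{bmatrix}$
(c) $\begin{bmatrix} 1 & 0 \\ 5 & 1 \end{bmatrix}$
(d) $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$
(e) $\begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$
(f) $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$
(g) $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$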
(0, 0) ↦ (0, 0)
(1, 0) ↦ (1, 2)
(1, 1) ↦ (3, 6)
(0, 1) ↦ (2, 4)
If you don’t study the image points too carefully, everything looks normal. However, if
you plot these points you get what is shown in Figure 6.8 (on the next page), which is
not normal at all!
[Figure 6.8: the image points O′(0, 0), P′(1, 2), R′(2, 4) and Q′(3, 6), which all lie on a single straight line through the origin.]
Theorem. A linear transformation is degenerate if and only if its matrix is singular, that
is, its determinant is zero.
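The theorem can be illustrated numerically. The matrix in the sketch below is an assumption made here: it is chosen so that its columns are (1, 2) and (2, 4), which reproduces the image points plotted in Figure 6.8. Its determinant is zero, and the unit square collapses onto the line y = 2x.

```matlab
A = [1 2; 2 4];                               % columns are the images of (1,0) and (0,1)
detA = A(1,1)*A(2,2) - A(1,2)*A(2,1);         % ad - bc
disp(detA)                                    % zero, so A is singular

sq  = [0 1 1 0; 0 0 1 1];                     % corners O, P, Q, R of the unit square
img = A*sq;
disp(img)

% Every image point satisfies y = 2x, so the square collapses onto a line
onLine = all(img(2,:) == 2*img(1,:));
disp(onLine)
```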
Exercises
7. Which of the following matrices represent degenerate transformations?
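(a) $\begin{bmatrix} 0 & 0 \\ 3 & 4 \end{bmatrix}$
(b) $\begin{bmatrix} 0 & 1 \\ 2 & 3 \end{bmatrix}$
(c) $\begin{bmatrix} 3 & 6 \\ 2 & 4 \end{bmatrix}$
(d) $\begin{bmatrix} 43 & 5 \\ 215 & 25 \end{bmatrix}$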
8. Let $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be the matrix of a degenerate transformation. Show that if a ≠ 0,
then the entire plane is mapped to the line $y = \frac{c}{a} x$. Check this result by entering
some matrices of degenerate transformations into the MATLAB program given in
Example 6.3.
This theorem also shows how TB TA and TA TB can be different linear transformations:
they will be the same if and only if A and B happen to commute.
◮ Example 6.10 Let TS be the horizontal dilation with matrix $S = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$, and
let TR be the 90° rotation with matrix $R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$. Is “stretch then rotate”
the same as “rotate then stretch”? The geometric way of finding out is to compare
TR TS (□1 ) with TS TR (□1 ). Instead, let’s do it the arithmetic way by comparing
RS and SR. You can check that these products are as follows:
\[
RS = \begin{bmatrix} 0 & -1 \\ 2 & 0 \end{bmatrix}
\quad \text{and} \quad
SR = \begin{bmatrix} 0 & -2 \\ 1 & 0 \end{bmatrix}.
\]
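Both comparisons in this example, the arithmetic one and the geometric one, can be carried out in MATLAB. In the sketch below the corners of □1 are stored as the columns of a matrix named sq, a name chosen here for illustration.

```matlab
S = [2 0; 0 1];      % horizontal dilation by a factor of 2
R = [0 -1; 1 0];     % rotation by 90 degrees

RS = R*S;            % stretch first, then rotate
SR = S*R;            % rotate first, then stretch
disp(RS)
disp(SR)
disp(isequal(RS, SR))          % 0: the order matters

sq = [0 1 1 0; 0 0 1 1];       % corners O, P, Q, R of the unit square
disp(RS*sq)                    % image of the square under TR TS
disp(SR*sq)                    % image of the square under TS TR
```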
◮ Example 6.11 Let TA be “rotation by 90◦ ” and let TB be “reflection across the
Y -axis”. Use matrix multiplication to find the geometric effect of “TA followed by
TB ”.
By drawing a diagram, it is clear that TA maps the basic point (1, 0) to (0, 1) and
it maps the other basic point (0, 1) to (−1, 0). Hence the corresponding matrix A
has first column $(0, 1)^T$ and second column $(-1, 0)^T$, and so it is $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$.
Using another diagram, it is clear that TB maps the basic point (1, 0) to (−1, 0)
and it maps the other basic point (0, 1) to (0, 1) (i.e. it is fixed). Hence the
corresponding matrix B is $B = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$. Now “TA followed by TB ” is TB TA ,
which has corresponding matrix
\[
BA = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
   = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
\]
From the first and second columns of BA, we know that TB TA maps (1, 0) to (0, 1)
and it maps (0, 1) to (1, 0). By drawing a diagram to represent this, it is clear that
TB TA must be “reflection across the line y = x”.
function Pnew=rotat(P,theta)
% Rotate point(s) P (size 2 times n) by theta radians
% anticlockwise about origin
R=[cos(theta) -sin(theta); sin(theta) cos(theta)];
Pnew= R*P;
Note that the program and function M-file use radian measure for the angle of
rotation. This is the default measure for the argument of the sin and cos functions
in MATLAB (and other programming languages). Radian measure can be defined
as follows: a ray from the origin of angle θ with the positive X-axis defines a point
on the unit circle, and the radian measure of the angle is the signed (positive for
anticlockwise, negative for clockwise) length of the corresponding arc on the unit
circle from (1, 0). Therefore, 360◦ = 2π radians (circumference of the unit circle),
180◦ = π radians, 90◦ = π/2 radians, and in general d◦ = d × π/180 radians.
Enter the M-files above and run the program. What is the final dilation and
rotation of the basic curve x = cos t, y = sin 3t? ◭
Exercises
9. Consider the horizontal shear matrices $A = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix}$. Arguing
geometrically, is there a difference between applying A then B, and applying B then
A? Check your answer by calculating AB and BA.
10. (a) Write down the matrices corresponding to clockwise and anticlockwise rota-
tions of 90◦ about the origin.
(b) What is the combined effect of one rotation followed by the other?
(c) Does the order of the rotations make a difference?
(d) Use matrix multiplication to check your answers.
11. (a) Write down the matrices corresponding to the transformations “rotate by 90◦ ”
and “reflect across Y -axis”.
(b) Use matrix multiplication to describe the products of these transformations,
taken in the two possible orders.
(c) Check your results by testing the effects of the products on □1 .
12. Show that a linear transformation whose matrix is of the form $\begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$, where a
and d are positive, can be accomplished by a dilation followed by a shear.
13. Modify the MATLAB program in Example 6.12 to dilate and rotate the curve
defined by x = 2 + cos t, y = sin t, for 0 ≤ t ≤ 2π. Use steps of 2π/60 up to a final
dilation (in both x and y directions) of 2π and final rotation of 2π radians. Modify
the program again to rotate the curve in the opposite direction.
Propositional Logic
7.1 Propositions
A proposition is a statement that can be classified as either true or false, but not both.
That is, it has a truth value (true or false) which is not ambiguous in any sense.
◮ Example 7.4 The following are open propositions over the set of positive integers:
(a) n is even.
(b) n is a power of 3.
(c) n² − 8n + 12 = 0.
(d) 1² + 2² + · · · + n² = (1/6) n(n + 1)(2n + 1). ◭
This chapter is devoted to the study of ordinary propositions.
Conjunction (∧). The conjunction of two propositions p and q is the single proposition
“p and q”, and is denoted by p ∧ q. The truth value of p ∧ q reflects the normal meaning
of the word “and”; p ∧ q will be true when both p is true and q is true. In all other cases
p ∧ q will be false.
Disjunction (∨). The disjunction of two propositions p and q is the single proposition
“p or q”, and is denoted by p ∨ q. The truth value of p ∨ q reflects the meaning of the
inclusive or; p ∨ q will be true when p is true, or when q is true, or when both are true.
Section 7.2. The basic connectives 133
Note that in ordinary English, “or” has two meanings. Sometimes it means one or
other or both, and at other times it means one or other but not both. The former is known
as the inclusive or, and is the way “or” is meant to be interpreted in a sentence such
as: “The orchestra needs someone who can play the flute or the piccolo”. (Presumably
someone who can play both instruments would be acceptable.) In mathematics “or” is
usually taken to mean the inclusive or.
The other meaning of “or” is known as the exclusive or, and is the way “or” is meant
to be interpreted in a sentence such as: “That large, shining object in the sky is either
the sun or the moon”. (It couldn’t be both.) As we shall see later, a different symbol is
used to denote the exclusive or.
Recall from Section 2.6 that in MATLAB, the logical operator not is denoted by the
symbol ~. Also recall that the logical operator and has the symbol &, and the logical
operator or has the symbol |.
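These operators work elementwise on vectors, so all combinations of truth values can be examined at once. The sketch below is illustrative; the variable names are chosen here, and xor is the built-in MATLAB function for the exclusive or.

```matlab
p = [false false true true];    % all combinations of truth values,
q = [false true false true];    % one combination per column

notP  = ~p;             % not p
pAndQ = p & q;          % p and q
pOrQ  = p | q;          % inclusive or
pXorQ = xor(p, q);      % exclusive or

% Rows: p, q, not p, p and q, p or q, p xor q
disp(double([p; q; notP; pAndQ; pOrQ; pXorQ]))
```

The inclusive and exclusive or differ only in the last column, where both p and q are true.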
Exercises
1. Let i, w, e and r denote the following propositions.
i “Sarah is inside”
w “Sarah is watching TV”
e “Sarah is eating her dinner”
r “Sarah is riding her bicycle”
p | ¬p
0 |  1
1 |  0

p  q | p∧q
0  0 |  0
0  1 |  0
1  0 |  0
1  1 |  1

p  q | p∨q
0  0 |  0
0  1 |  1
1  0 |  1
1  1 |  1
◮ Example 7.6 Let p, q and r be propositions. Draw the truth table of the com-
pound proposition P := q ∧ ¬(¬p ∨ r).
Solution.
p  q  r  |  q    ∧    ¬   (¬    p    ∨    r)
0  0  0  |  0    0    0    1    0    1    0
0  0  1  |  0    0    0    1    0    1    1
0  1  0  |  1    0    0    1    0    1    0
0  1  1  |  1    0    0    1    0    1    1
1  0  0  |  0    0    1    0    1    0    0
1  0  1  |  0    0    0    0    1    1    1
1  1  0  |  1    1    1    0    1    0    0
1  1  1  |  1    0    0    0    1    1    1
         | (1) ∗(7)∗ (6)  (3)  (2)  (5)  (4)
Explanation. We regard P as a function of the 3 variables p, q and r. Each of
the variables can take the values 0 and 1, and thus there are 2³ = 8 possibilities.
Section 7.3. Truth tables 135
We list all these possibilities to the left of the vertical line in the table. Then we
fill in the columns to the right of the vertical line in the order indicated by the
numbers in parentheses in the last row:
(1) This is just the column of values of q copied from the left.
(2) This is the column of values of p copied from the left.
(3) This is ¬ of column (2), i.e., ¬p.
(4) This is r.
(5) This is (3) ∨ (4), i.e., ¬p ∨ r.
(6) This is ¬(5).
(7) This is (1) ∧ (6).
The final column (7) is the column of values of the proposition P . We write ∗(7)∗
to help highlight this fact. The column (7) shows that P is almost always a false
proposition. It is only true in the one case when p and q are true and r is false. ◭
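The final column of this truth table can also be computed directly in MATLAB. The sketch below assumes the rows are listed in the same order as in the table above; the names P and true_rows are ours.

```matlab
% Rows of the truth table in the same order as Example 7.6
p = logical([0 0 0 0 1 1 1 1]);
q = logical([0 0 1 1 0 0 1 1]);
r = logical([0 1 0 1 0 1 0 1]);

P = q & ~(~p | r);       % values of the compound proposition, column (7)

disp([p' q' r' P'])      % columns: p, q, r, P
true_rows = find(P);     % indices of the rows in which P is true
```

Only one row index is returned, which agrees with the observation that P is true in exactly one case.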
% Program truth.m
% Construct the truth table for proposition(s) defined in propos.m
clear
format compact
global num % number of basic variables p(j)
% Find value of num defined in propos.m
propos(-1);
tot=2^num; % number of rows in truth table
% Print heading including a separator column
separt=888;
disp(' Basic variables 888 Proposition value(s)')
% Define truth values of basic variables p(j), j=1,...,num,
function prop=propos(p)
% Define proposition(s) for truth table constructed by truth.m
% Here p(1), p(2), p(3) play the roles of p, q, r in Example 7.6.
global num
if p<0
    % Define number of basic variables p(j)
    num=3;
else
    % Enter the proposition(s)
    prop(1) = ~p(1) | p(3);              % column (5)
    prop(2) = ~(~p(1) | p(3));           % column (6)
    prop(3) = p(2) & ~(~p(1) | p(3));    % column (7)
end
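For readers who want to experiment without the two M-files, the same table can be produced by a short stand-alone script. This is only a sketch under our own assumptions, not the truth.m program itself: the anonymous function propfun stands in for propos.m, and the rows are generated from the binary digits of the row index.

```matlab
num = 3;                         % number of basic variables
% Stand-in for propos.m: the same three propositions as above
propfun = @(p) [(~p(1) | p(3)), (~(~p(1) | p(3))), (p(2) & ~(~p(1) | p(3)))];

tot = 2^num;                     % number of rows in the truth table
tt = zeros(tot, num + 3);        % basic variables followed by proposition values
for k = 0:tot-1
    % Binary digits of k, most significant first, give the truth values
    p = logical(bitget(k, num:-1:1));
    tt(k+1, :) = [p, propfun(p)];
end
disp(tt)
```

The first three columns of tt hold the basic variables and the last three hold the proposition values.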
Enter and save the program and function M-files in MATLAB, and run the program
to obtain the following output.
Note that the first three columns are the same as in the truth table given in Exam-
ple 7.6, and the last three columns are the same as those numbered (5), (6) and (7),
Section 7.4. More connectives 137
Exercises
2. Construct truth tables for the following compound propositions. Check your answers
using the MATLAB program truth.m together with a modified version of propos.m.
(a) ¬(p ∨ q)
(b) ¬(p ∧ q) ∨ r
(c) (p ∧ ¬q) ∨ q
(d) ¬((p ∧ q) ∨ r)
p q p→q
0 0 1
0 1 1
1 0 0
1 1 1
“p implies q”,
“if p then q”,
“q if p”,
“p only if q”,
“p is sufficient for q”,
“q is necessary for p”.
The first two lines of the truth table for p → q may look strange, because they seem to
be saying that a false proposition implies anything. But really we are simply defining the
compound proposition p → q to be true if p is false. The following example may make
this seem more acceptable.
Surely I should not be regarded as a liar if I fail to give presents after not winning
the lottery. ◭
◮ Example 7.8 (a) The proposition “If 2 + 2 = 5, then the moon is a balloon” is
vacuously true, because “2 + 2 = 5” is false.
(b) The proposition “If 2 + 2 = 4, then Canberra is the capital of Australia” is
trivially true, because “Canberra is the capital of Australia” is true. ◭
The conditional proposition p → q is true in every logical possibility except the one
in which p is true and q is false. So in order to prove that p → q is true, it is enough
to prove that this possibility does not occur. That is, it is sufficient to prove that if p is
true, then q is true. Such a proof is called a direct proof.
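MATLAB has no separate symbol for the conditional in these notes, but its truth table can be reproduced in two ways. The sketch below uses the standard equivalence of p → q with ¬p ∨ q, and the relational operator <= applied to 0/1 values; the variable names are ours.

```matlab
p = logical([0 0 1 1]);
q = logical([0 1 0 1]);

cond1 = ~p | q;    % "p is false or q is true"
cond2 = p <= q;    % false only when p = 1 and q = 0

disp([p' q' cond1' cond2'])   % columns: p, q, and the two versions of p -> q
```

Both columns are 0 only in the row where p is true and q is false.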
It is important to realise that the conditionals p → q and q → p are not the same.
Table 7.3 shows that their truth tables are different. This table also shows two other
related conditionals ¬p → ¬q and ¬q → ¬p.
p q ¬p ¬q p → q q → p ¬p → ¬q ¬q → ¬p
0 0 1 1 1 1 1 1
0 1 1 0 1 0 0 1
1 0 0 1 0 1 1 0
1 1 0 0 1 1 1 1
For the conditional p → q,
    the converse is q → p,
    the inverse is ¬p → ¬q,
    the contrapositive is ¬q → ¬p.
Notice that the truth tables of p → q and its contrapositive ¬q → ¬p are the same,
and the truth tables of the converse q → p and the inverse ¬p → ¬q are the same.
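These observations can be confirmed numerically. In the following sketch the helper implies and the four variable names are our own choices.

```matlab
p = logical([0 0 1 1]);
q = logical([0 1 0 1]);
implies = @(a, b) ~a | b;          % truth values of a -> b

orig_form   = implies(p, q);       % p -> q
conv_form   = implies(q, p);       % converse
inv_form    = implies(~p, ~q);     % inverse
contra_form = implies(~q, ~p);     % contrapositive

same_as_contrapositive = isequal(orig_form, contra_form);
converse_same_as_inverse = isequal(conv_form, inv_form);
same_as_converse = isequal(orig_form, conv_form);
```

The first two comparisons succeed and the third fails, matching Table 7.3.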
a breeze, then the first two propositions would be true and the last two would be
false. ◭
p q p↔q
0 0 1
0 1 0
1 0 0
1 1 1
Observe that p ↔ q is true if both p and q are true or both p and q are false. That is,
p ↔ q is true if p and q have the same truth value, and false if they have different truth
values.
English language synonyms for p ↔ q are “p if and only if q” (this is often shortened
to “p iff q”), and “p is necessary and sufficient for q”. Compare these synonyms to those
given for p → q.
The biconditional connective is represented in MATLAB as the relational operator: ==
This is clear from Table 7.4 by considering the value of the MATLAB statement p == q
for each row.
There is one further logical connective, but it is not used very often:
Notice that p ⊕ q and p ↔ q have opposite truth values: p ⊕ q is true if and only if
p ↔ q is false.
English language synonyms for p ⊕ q are “p or q, but not both”, and “exactly one of
p and q”.
The exclusive or connective is represented in MATLAB as the relational operator: ~=
This is clear from Table 7.5; p ⊕ q is true precisely when p and q have different truth
values.
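A short sketch showing both relational operators side by side (variable names are illustrative; xor is MATLAB's built-in function for the exclusive or):

```matlab
p = logical([0 0 1 1]);
q = logical([0 1 0 1]);

bicond  = (p == q);    % biconditional: true when truth values agree
excl_or = (p ~= q);    % exclusive or: true when truth values differ

disp([p' q' bicond' excl_or'])
agrees_with_builtin = isequal(excl_or, xor(p, q));
```

The two result columns are complements of each other, as noted above.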
The following table summarizes the connectives that have been introduced.
Exercises
3. Let i, w, e and r denote the propositions in Exercise 1.
4. Construct truth tables for the following compound propositions. Check your answers
using the MATLAB program truth.m together with a modified version of propos.m.
(a) ¬p → (p ∨ q)
(b) (p ↔ q) ↔ (¬p ↔ ¬q)
(c) (p ⊕ q) ⊕ r
(d) p ⊕ (q ⊕ r)
(e) (p ⊕ p) ⊕ p
6. Give the converses, inverses and contrapositives of the following compound
propositions. Notice how the contrapositive has the same meaning as the original
conditional, and how the inverse has the same meaning as the converse.
p q (p ∧ ¬ q) ∧ (p → q)
0 0 0 0 1 0 0 0 1 0
0 1 0 0 0 1 0 0 1 1
1 0 1 1 1 0 0 1 0 0
1 1 1 0 0 1 0 1 1 1
(1) (4) (3) (2) ∗(8)∗ (5) (7) (6)
Each value in column (8) is a “0”, which proves that the proposition is a
contradiction. ◭
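The same test is easy to automate: a proposition is a tautology if its column is all 1s and a contradiction if it is all 0s. A minimal sketch for the proposition above, with names chosen by us:

```matlab
p = logical([0 0 1 1]);
q = logical([0 1 0 1]);

P = (p & ~q) & (p <= q);       % (p and not q) and (p -> q)

is_tautology     = all(P);     % every row true
is_contradiction = ~any(P);    % every row false
```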
Exercises
7. Show that the following propositions are tautologies:
(a) p ∨ ¬p
(b) p → p
(c) (p → q) ↔ (¬q → ¬p)
8. Show that the following propositions are contradictions:
(a) p ∧ ¬p
(b) (p ∨ q) ∧ (¬p ∧ ¬q)
(c) (p ⊕ q) ∧ (p ↔ q)
(d) p ⊕ p
Proofs of the contrapositive are preferred to direct proofs if ¬q is easier to deal with
than p. In the above example ¬q denotes “√8 is rational”, whereas p denotes “√2 is
irrational”. Since irrational numbers are defined only negatively, that is, as numbers that
are not rational, it is easier to start the proof with the assumption that √8 is rational,
rather than the assumption that √2 is irrational.
Some logical equivalences are listed in Table 7.6 below. In this table T stands for any
tautology and F stands for any contradiction. In each case we can show that P ⇔ Q by
showing (with a truth table) that P ↔ Q is a tautology. If the truth table for P ↔ Q
has a 0 in any row of its final column, then P ⇔ Q is not valid, and the values of the
basic propositions corresponding to this row make up a counterexample.
Section 7.6. Logical equivalence 145
We can use Table 7.6 to simplify, or replace, long and complicated compound
propositions by shorter, logically equivalent ones. These are usually easier to understand.
p ∨ (p ∧ q) ⇔ (p ∧ T) ∨ (p ∧ q) identity 5(b)
⇔ p ∧ (T ∨ q) distributivity 3(b)
⇔ p ∧ (q ∨ T) commutativity 1(a)
⇔ p∧T nullity 6(a)
⇔ p identity 5(b) ◭
There are other logical equivalences which involve the conditional and biconditional
connectives. These are shown in Table 7.7 below.
Once you have become familiar with a few of these basic logical equivalences, proofs
of this type are very easy. However, it is important to remember that proofs via
truth tables are just as valid. So in this case it is enough to compute the truth
table of ((p → q) ∧ (p → ¬q)) ↔ ¬p and show that it is a tautology. ◭
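A truth-table check of this kind takes only a few lines in MATLAB. The sketch below tests two equivalences from this section, p ∨ (p ∧ q) ⇔ p and (p → q) ∧ (p → ¬q) ⇔ ¬p, by confirming that the corresponding biconditionals are true in every row. The helper imp and the use of dec2bin to list the rows are our own choices.

```matlab
rows = dec2bin(0:3) - '0';       % all combinations of two truth values
p = logical(rows(:, 1));
q = logical(rows(:, 2));
imp = @(a, b) ~a | b;            % truth values of a -> b

absorption = (p | (p & q)) == p;
example    = (imp(p, q) & imp(p, ~q)) == ~p;

absorption_is_tautology = all(absorption);
example_is_tautology    = all(example);
```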
Exercises
Many students are uncomfortable with the algebra of propositions and prefer to use
truth tables when proving logical equivalences. However, sometimes it is actually
quicker to work algebraically, especially if there is no computer handy and you have
to do all the calculations yourself. For example, an equivalence involving 4 basic
propositions needs 2^4 = 16 rows (and probably many columns) in its truth table,
whereas an algebraic proof might only take 6 short lines. So don’t be scared of the
algebra of propositions. It is not that hard, and the more you practice, the better
you will become. Also, you will find the following sections less difficult if you can
master the algebra of propositions.
(a) p → T ⇔ T
(b) T → p ⇔ p
(c) p → F ⇔ ¬p
(d) F → p ⇔ T
12. Use truth tables to discover whether the following “equivalences” are valid or not:
(a) p → (q → r) ⇔ (p → q) → r.
(b) p ↔ (q ↔ r) ⇔ (p ↔ q) ↔ r.
(c) (p ∨ r) ∧ (q → r) ⇔ (p → q) → r.
This section is devoted to the study of arguments just like the one in the above example.
We will learn how to analyse such arguments and determine whether they are correct or
not. Propositional calculus is perfect for this study because an argument is really just
a sequence of propositions. All but the last proposition are considered to be true (for
whatever reason), and the last proposition is claimed to be also true, by deduction.
We say that a proposition P logically implies another Q, and we write P =⇒ Q,
if Q is true whenever P is true. So just like logical equivalence, we can express logical
implication in terms of a tautology.
(H1 ∧ H2 ∧ · · · ∧ Hn ) =⇒ C.
H1
H2
⋮
Hn
∴ C
The symbol ∴ denotes “therefore”, or “it logically follows that”. Each of the propositions
H1 , H2 , . . . , Hn is called an hypothesis or premise, and the proposition C is called the
conclusion.
The inference is said to be valid if the conditional proposition
(H1 ∧ H2 ∧ · · · ∧ Hn ) → C
is a tautology.
◮ Example 7.17 Let us now analyse the argument in Example 7.16. The argument
can be written as:
James is a policeman or James is a footballer.
If James is a policeman, then James has big feet.
James doesn’t have big feet.
p∨f
p→b
¬b
∴ f
Section 7.7. Logical implication 149
(We won’t print the truth table here, but you may care to compute it yourself,
either by hand or using a computer.) From the truth table we find that this last
proposition is a tautology. Hence the argument is valid. That is, we can conclude
that James is a footballer from the information given. ◭
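The truth table mentioned here can be computed as follows. This is a sketch only; p, f and b are the symbols of the example, while rows, imp, hyp and valid are names introduced for the illustration.

```matlab
rows = dec2bin(0:7) - '0';       % the 8 combinations of p, f, b
p = logical(rows(:, 1));
f = logical(rows(:, 2));
b = logical(rows(:, 3));
imp = @(x, y) ~x | y;            % truth values of x -> y

hyp = (p | f) & imp(p, b) & ~b;  % conjunction of the three hypotheses
inference = imp(hyp, f);         % hypotheses -> conclusion

valid = all(inference);          % true exactly when the conditional is a tautology
```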
You should note that the inference in the previous example will always be valid, no
matter what p, f and b stand for. To analyse the argument, we threw away the context and
just dealt with the underlying logical structure. It is important in any field of knowledge
to be able to separate the underlying logical structure of an argument from the actual
context, because often the context makes the analysis harder rather than easier. (Just
think of the tricks that politicians use to sway their audiences!) Here are some examples
of inference that are context-free.
p∨q
p→r
∴ q∨r
To decide whether or not this inference is valid, we examine the truth table of the
corresponding conditional:
p q r ((p ∨ q) ∧ (p → r)) → (q ∨ r)
0 0 0 0 0 0 0 0 1 0 1 0 0 0
0 0 1 0 0 0 0 0 1 1 1 0 1 1
0 1 0 0 1 1 1 0 1 0 1 1 1 0
0 1 1 0 1 1 1 0 1 1 1 1 1 1
1 0 0 1 1 0 0 1 0 0 1 0 0 0
1 0 1 1 1 0 1 1 1 1 1 0 1 1
1 1 0 1 1 1 0 1 0 0 1 1 1 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1
(1) (3) (2) (7) (4) (6) (5) ∗(11)∗ (8) (10) (9)
p → (q ∨ r)
¬q
∴ r
To decide whether or not this inference is valid, we examine the truth table of the
corresponding conditional:
This time we use the MATLAB program truth.m given in Section 7.3 to construct
the truth table. The appropriate function M-file propos.m is given below, together
with the output of the program.
function prop=propos(p)
% Define proposition(s) for truth table constructed by truth.m
% Here p(1), p(2), p(3) play the roles of p, q, r.
global num
if p<0
    % Define number of basic variables p(j)
    num=3;
else
    % Enter the proposition(s)
    prop(1) = p(1) <= (p(2) | p(3));     % H1: p -> (q or r)
    prop(2) = ~p(2);                     % H2: not q
    prop(3) = prop(1) & prop(2);         % H1 and H2
    prop(4) = p(3);                      % C: r
    prop(5) = prop(3) <= prop(4);        % (H1 and H2) -> C
end
The first column on the right is the truth column for the first hypothesis H1 := p → (q∨r).
The second column on the right is the truth column for the second hypothesis H2 := ¬q.
The third column on the right is the truth column for H1 ∧ H2 .
The fourth column on the right is the truth column for the conclusion C := r.
The final column is the truth column for the inference (H1 ∧ H2 ) → C.
Since a 0 appears in the final column, the logical implication is not a tautology, and the
inference is invalid. So r is not a logical conclusion from the hypotheses p → (q ∨ r)
and ¬q. ◭
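The rows that make the inference fail can also be picked out directly. The following sketch lists every assignment in which both hypotheses are true and the conclusion is false; the names rows, bad and counterexamples are ours.

```matlab
rows = dec2bin(0:7) - '0';       % the 8 combinations of p, q, r
p = logical(rows(:, 1));
q = logical(rows(:, 2));
r = logical(rows(:, 3));

% Both hypotheses true, conclusion false
bad = (p <= (q | r)) & ~q & ~r;

counterexamples = rows(bad, :);  % columns: p q r
disp(counterexamples)
```

Any row listed is a counterexample; an empty result would mean the inference is valid.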
p→q
q
∴ p
This is invalid because ((p → q) ∧ q) → p is not a tautology, as can be demonstrated
by its truth table. (Look at the row in which p = 0, q = 1.) Yet we know that
the conclusion is true, so what has gone wrong?! What has happened is that we’ve
used outside knowledge to deduce that the conclusion is true. Look again at the
information given: Socrates could be somebody’s pet lizard for all we know! It is
not valid to conclude from the two hypotheses that Socrates is a man. The moral
of this example is: The fact that the conclusion is true does not make the inference
valid. The conclusion is true because of the meanings of the words “Socrates” and
“man”, not by inference. ◭
Exercises
14. Analyse the following arguments for logical validity.
1. p =⇒ p ∨ q addition
2. p ∧ q =⇒ p simplification
3. p ∧ (p → q) =⇒ q modus ponens
4. ¬q ∧ (p → q) =⇒ ¬p modus tollens
5. (p ∨ q) ∧ ¬p =⇒ q disjunctive syllogism
6. (p → q) ∧ (q → r) =⇒ p → r hypothetical syllogism
This inference is valid because the logical implication is an example of modus
ponens (sometimes called the rule of detachment). To see this let p denote “Socrates
is a man”, and let q denote “Socrates is mortal”. With these symbols, the inference
is formulated as
p→q
p
∴ q
This is valid because, by Table 7.8(3), p∧(p → q) =⇒ q is a logical implication. ◭
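Each of the six rules in Table 7.8 is a logical implication, so the corresponding conditional must be a tautology. A brute-force check, written as a sketch with our own helper imp and variable names:

```matlab
rows = dec2bin(0:7) - '0';       % the 8 combinations of p, q, r
p = logical(rows(:, 1));
q = logical(rows(:, 2));
r = logical(rows(:, 3));
imp = @(a, b) ~a | b;            % truth values of a -> b

r1 = imp(p, p | q);                            % addition
r2 = imp(p & q, p);                            % simplification
r3 = imp(p & imp(p, q), q);                    % modus ponens
r4 = imp(~q & imp(p, q), ~p);                  % modus tollens
r5 = imp((p | q) & ~p, q);                     % disjunctive syllogism
r6 = imp(imp(p, q) & imp(q, r), imp(p, r));    % hypothetical syllogism

rules = [r1 r2 r3 r4 r5 r6];     % one column per rule
all_valid = all(rules(:));
```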
Section 7.8. Rules of inference 153
◮ Example 7.22 The following inferences can be shown to be valid by using a form
of logical implication known as modus tollens (Table 7.8, rule 4).
(a) If the rains come, then the crops will grow.
The crops did not grow.
Therefore the rains did not come.
(b) The program will stop if there is no infinite loop.
The program did not stop.
Therefore the program has an infinite loop.
(c) The dog will bark if an intruder comes.
The dog did not bark.
Therefore no intruder came.
Each of these inferences is a logical implication of the form
(p → q) ∧ ¬q =⇒ ¬p. ◭
Now here are some more complicated examples of valid inferences. Don’t forget that
when proving them we are allowed to use the logical equivalences in Tables 7.6 and 7.7
to simplify intermediate propositions.
◮ Example 7.23 Write the following argument in symbolic form and prove that it
is logically valid:
“If the batteries are flat or if the connections are loose, then the flash doesn’t work.
If the flash doesn’t work, then the photo is too dark. The photo is not too dark.
Therefore the connections are not loose.”
Denote propositions as follows (using letters suggestive of the statement):
b: “The batteries are flat”;
c: “The connections are loose”;
f: “The flash doesn’t work”;
p: “The photo is too dark”.
The argument in symbolic form is:
(b ∨ c) → f
f →p
¬p
∴ ¬c
Proof.
(f → p) ∧ ¬p                          H2 ∧ H3
    =⇒ ¬f              . . . (1)      modus tollens
((b ∨ c) → f) ∧ ¬f                    H1 ∧ (1)
    =⇒ ¬(b ∨ c)                       modus tollens
    ⇔ ¬b ∧ ¬c                         De Morgan
    =⇒ ¬c              C              simplification
Explanation. We are given three hypotheses to work with. We may use any or
all of these as many times as we wish, together with the logical equivalences in
Tables 7.6 and 7.7, and the rules of logical implication in Table 7.8, in order to
arrive at the conclusion.
First we apply modus tollens to the second and third hypotheses to deduce
a useful intermediate conclusion ¬f . We then use this conclusion with the first
hypothesis in another application of modus tollens, deducing ¬(b ∨ c). We replace
this with the logically equivalent proposition ¬b ∧ ¬c. Finally, we use the rule of
simplification to arrive at the desired conclusion. ◭
Obviously this method is much shorter than constructing by hand the truth table for
(((b ∨ c) → f) ∧ (f → p) ∧ ¬p) → ¬c.
However, using the rules of inference takes some getting used to, and some students are
frightened into always doing a truth table. It will be far more beneficial to you if you try
to master the rules of inference. And as always, practice makes perfect.
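If a computer is handy, the 16-row truth table for Example 7.23 can still serve as an independent check of the proof. In this sketch b, c, f and p are the symbols of the example; rows, imp, hyp and valid are our own names.

```matlab
rows = dec2bin(0:15) - '0';      % the 16 combinations of b, c, f, p
b = logical(rows(:, 1));
c = logical(rows(:, 2));
f = logical(rows(:, 3));
p = logical(rows(:, 4));
imp = @(x, y) ~x | y;            % truth values of x -> y

hyp = imp(b | c, f) & imp(f, p) & ~p;   % H1 and H2 and H3
valid = all(imp(hyp, ~c));              % is (H1 and H2 and H3) -> not c a tautology?
```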
(p ∨ q) → r
¬p → r
r → ¬s
s∨q
∴ r
Proof.
(¬p → r) ∧ (r → ¬s)                       H2 ∧ H3
    =⇒ ¬p → ¬s                            hypothetical syllogism
    ⇔ s → p                . . . (1)      contrapositive
(s ∨ q) ∧ (s → p)                         H4 ∧ (1)
    ⇔ (s ∧ (s → p)) ∨ (q ∧ (s → p))       distributivity
    =⇒ p ∨ (q ∧ (s → p))                  modus ponens
Although all four hypotheses were used in the above proof, the last two are
in fact redundant, i.e. the conclusion follows logically from just the first two
hypotheses. Can you prove this? As a hint, try to get the intermediate conclusion
p → r. ◭
Exercises
15. Write the following argument in symbolic form and then prove that it is valid.
“If I study law, then I will make a lot of money. If I study archaeology, then I will
travel a lot. If I make a lot of money or travel a lot, then I will not be disappointed.
I am disappointed. Therefore I studied neither law nor archaeology.”
q→p
¬q → r
¬p → ¬r
∴ p
Next is a new style of question. You are given hypotheses, but no conclusion!
You should study the hypotheses (in symbolic form), and try to apply the rules of
inference to them in order to deduce one or more simple conclusions.
17. What simple logical conclusions, if any, follow from the following sets of hypotheses?
P := (H1 ∧ H2 ∧ · · · ∧ Hn ) → C
The inference is invalid if and only if we are able to choose values for the variables
so that each hypothesis is true and yet the conclusion is false.
◮ Example 7.25 Test the following inference for validity. If it is valid, give a proof
using the rules of inference, and if it is invalid, give a counterexample.
¬p → q
(q ∨ s) → t
p∨s
∴ t
Solution. We try to falsify the inference by making each of the hypotheses true
but the conclusion false. This is indicated by the 1s and 0s in the following working.
   1
¬p → q
        1
(q ∨ s) → t
  1
p ∨ s
  0
∴ t
0
∴ t
Now look at the information we have about the second hypothesis: ((q ∨ s) → 0) =
1. This forces q ∨ s = 0, because (1 → 0) ≠ 1.
   1
¬p → q
   0    1 0
(q ∨ s) → t
  1
p ∨ s
  0
∴ t
   1 0
¬p → q
 0 0 0  1 0
(q ∨ s) → t
  1 0
p ∨ s
  0
∴ t
The third hypothesis says p ∨ 0 = 1. Thus p = 1. But does this conflict with
the first hypothesis? It says ¬p → 0 = 1, i.e., ¬p = 0, i.e., p = 1. So these two
hypotheses agree.
01 1 0
¬p → q
 0 0 0  1 0
(q ∨ s) → t
1 1 0
p ∨ s
  0
∴ t
The working is now complete. We have successfully assigned values to the variables
so that each hypothesis is true and yet the conclusion is false. So the inference is
invalid, and a counterexample is p = 1, q = s = t = 0. ◭
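The hand search above can be cross-checked by brute force over all 16 assignments. This is a sketch rather than part of the method: p, q, s and t are the symbols of the example, and rows, imp, bad and counterexamples are names we introduce.

```matlab
rows = dec2bin(0:15) - '0';      % the 16 combinations of p, q, s, t
p = logical(rows(:, 1));
q = logical(rows(:, 2));
s = logical(rows(:, 3));
t = logical(rows(:, 4));
imp = @(x, y) ~x | y;            % truth values of x -> y

% All three hypotheses true and the conclusion t false
bad = imp(~p, q) & imp(q | s, t) & (p | s) & ~t;

counterexamples = rows(bad, :);  % columns: p q s t
disp(counterexamples)
```

Each row printed is an assignment that falsifies the inference.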
Section 7.9. Testing validity of inferences 159
◮ Example 7.26 Test the following inference for validity. If it is valid, give a proof
using the rules of inference, and if it is invalid, give a counterexample.
(¬p ∨ q) → r
p→s
¬s ∨ q
∴ r
Solution. We try to falsify the inference by making each of the hypotheses true
but the conclusion false.
         1
(¬p ∨ q) → r
  1
p → s
   1
¬s ∨ q
  0
∴ r
This forces r = 0.
         1 0
(¬p ∨ q) → r
  1
p → s
   1
¬s ∨ q
  0
∴ r
The first hypothesis says ((¬p ∨ q) → 0) = 1. Thus ¬p ∨ q = 0, i.e., ¬p = q = 0,
i.e., p = 1 and q = 0.
 01 0 0  1 0
(¬p ∨ q) → r
1 1
p → s
   1 0
¬s ∨ q
  0
∴ r
The second hypothesis says 1 → s = 1. Thus s = 1. But the third hypothesis
says ¬s ∨ 0 = 1, i.e., ¬s = 1, i.e., s = 0. So the second and third hypotheses give
conflicting information. Thus we have failed to find an assignment of values to the
variables so that each hypothesis is true and the conclusion is false, that is, there
is no counterexample to this inference. Hence this inference is valid. Now we give
0
∴ t
Section 7.10. Solutions to exercises 161
H1 := ¬p → q = ¬1 → 0 = 0 → 0 = 1.
H2 := (q ∨ s) → t = (0 ∨ 0) → 0 = 0 → 0 = 1.
H3 := p ∨ s = 1 ∨ 0 = 1.
C := t = 0. ◭
Exercises
In each of the following questions, analyse the inference for logical validity. If
the inference is valid, give a proof using the rules of inference. Otherwise, give a
counterexample.
18.
q → ¬t
s↔r
¬q ∨ s
∴ t∨r
19.
p → (q → ¬w)
¬s → q
¬t
¬p ∨ t
∴ w→s
(iii) Sarah is inside or riding her bicycle, and if she is inside then she is watching
TV.
(iv) If Sarah is not riding her bicycle, then she is inside and either watching
TV or eating her dinner.
(b) (i) ¬w → ¬i
(ii) (i ∧ e) → ¬r
4. (a)
p q ¬ p → (p ∨ q)
0 0 1 0 0 0 0 0
0 1 1 0 1 0 1 1
1 0 0 1 1 1 1 0
1 1 0 1 1 1 1 1
(2) (1) ∗(6)∗ (3) (5) (4)
(b)
p q (p ↔ q) ↔ (¬ p ↔ ¬ q)
0 0 0 1 0 1 1 0 1 1 0
0 1 0 0 1 1 1 0 0 0 1
1 0 1 0 0 1 0 1 0 1 0
1 1 1 1 1 1 0 1 1 0 1
(1) (3) (2) ∗(9)∗ (5) (4) (8) (7) (6)
(e)
p (p ⊕ p) ⊕ p
0 0 0 0 0 0
1 1 0 1 1 1
(1) (3) (2) ∗(5)∗ (4)
7. (c)
p q (p → q) ↔ (¬q → ¬p)
0 0 1 1 1 1 1
0 1 1 1 0 1 1
1 0 0 1 1 0 0
1 1 1 1 0 1 0
(1) ∗(5)∗ (2) (4) (3)
8. (c)
p q (p ∨ q) ∧ (¬p ∧ ¬q)
0 0 0 0 1 1 1
0 1 1 0 1 0 0
1 0 1 0 0 0 1
1 1 1 0 0 0 0
(1) ∗(5)∗ (2) (4) (3)
10. (a)
¬(¬p ∨ ¬q) ⇔ ¬¬p ∧ ¬¬q     De Morgan
           ⇔ p ∧ q         double negation □
11. (a)
p → T ⇔ ¬p ∨ T     implication
      ⇔ T          nullity □
(b)
T → p ⇔ ¬T ∨ p     implication
      ⇔ F ∨ p
      ⇔ p          identity □
14. (a) Valid. ((m ∨ t) ∧ ¬m) → t is a tautology.
(b) Invalid! ((a → u) ∧ ¬a) → ¬u is not a tautology. This argument is
invalid because, from the information given, it could be that Sunday follows
another day besides Saturday. Remember that you need to separate the logic
from its context.
l→m
a→t
(m ∨ t) → ¬d
d
∴ ¬l ∧ ¬a
Proof.
((m ∨ t) → ¬d) ∧ d                    H3 ∧ H4
    =⇒ ¬(m ∨ t)                       modus tollens
    ⇔ ¬m ∧ ¬t          . . . (1)      De Morgan
    =⇒ ¬m              . . . (2)      simplification
(l → m) ∧ ¬m                          H1 ∧ (2)
    =⇒ ¬l              . . . (3)      modus tollens
¬m ∧ ¬t                               (1)
    =⇒ ¬t              . . . (4)      simplification
(a → t) ∧ ¬t                          H2 ∧ (4)
    =⇒ ¬a              . . . (5)      modus tollens
¬l ∧ ¬a                               (3) ∧ (5)
    ⇔ ¬l ∧ ¬a          C              □
16.
¬p → ¬r                               H3
    ⇔ r → p            . . . (1)      implication
(¬q → r) ∧ (r → p)                    H2 ∧ (1)
    =⇒ ¬q → p          . . . (2)      hypothetical syllogism
(q → p) ∧ (¬q → p)                    H1 ∧ (2)
    ⇔ (¬q ∨ p) ∧ (q ∨ p)              implication
    ⇔ (¬q ∧ q) ∨ p                    distributivity
    ⇔ F ∨ p                           excluded middle
    ⇔ p                C              identity □
Predicate Logic
8.2 Quantifiers
The logical connectives of propositional calculus allow us to deal with two, three, or any
finite combination of propositions. For example, the fact that {−5, 2, 8} is the truth set of
the predicate P1 in Example 8.1 could be summarised by the true compound proposition
P1 (−5) ∧ P1 (2) ∧ P1 (8). However, the fact that the predicate P2 (x) is true for all real
numbers x cannot be expressed in this way because there are infinitely many real numbers.
To express the truth set of P2 (x) as a proposition we need to introduce a symbol which
means “for all”.
The expression “for all” is known as the universal quantifier, and is denoted by ∀,
which is an inverted A. Other translations of ∀ are “for each” and “for every”. Another
common translation is “for any” but this can be confused with “for any one” so you
should avoid it. The universal quantifier is used with a predicate such as P (x) to build
the compound proposition ∀x P (x) which is read as “for all x, P (x)”. It is important to
realise that ∀x P (x) is a proposition which has a truth value:
    ∀x P (x) is  { true   if P (x) is true for every x ∈ U;
                 { false  otherwise.
The other quantifier of predicate logic is the existential quantifier, which is denoted
by ∃, a reversed E. This stands for “there exists”, or “there is”. The expression ∃x P (x)
is a proposition which has a truth value:
    ∃x P (x) is  { true   if P (x) is true for at least one x ∈ U;
                 { false  otherwise.
“x^5 = x + 1” (P4 (x) of Example 8.1) is a predicate which is false for most
values of x. However there is one value of x, approximately equal to 1.167303, for which
“x^5 = x + 1” is true. For this reason ∃x (x^5 = x + 1) is a true proposition.
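The value of x referred to here can be located numerically. The sketch below uses MATLAB's built-in root finder fzero; the function handle f and the bracketing interval [1, 2] are our own choices, made because f changes sign on that interval.

```matlab
f = @(x) x.^5 - x - 1;     % f(x) = 0 exactly when x^5 = x + 1
x0 = fzero(f, [1 2]);      % f(1) < 0 < f(2), so a root lies between 1 and 2
residual = abs(f(x0));     % how closely x0 satisfies the equation
disp(x0)
```

A numerical root only supports the existence claim to within rounding error; it does not replace a proof.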
If the universal set U is finite, say U = {x1 , x2 , x3 , . . . , xn }, then the quantifiers ∀ and
∃ behave like “large” conjunctions and disjunctions. In fact, if P (x) is any predicate over
U, then
∀x P (x) ⇔ P (x1 ) ∧ P (x2 ) ∧ · · · ∧ P (xn )
and ∃x P (x) ⇔ P (x1 ) ∨ P (x2 ) ∨ · · · ∨ P (xn ).
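For a finite universal set these “large” conjunctions and disjunctions correspond to the MATLAB functions all and any. A minimal sketch, in which the set U and the predicate P are invented for illustration:

```matlab
U = 1:10;                     % a small finite universal set (our choice)
P = @(x) mod(x, 2) == 0;      % predicate "x is even" (our choice)

forall_P = all(P(U));         % conjunction of P(x1), ..., P(xn)
exists_P = any(P(U));         % disjunction of P(x1), ..., P(xn)
```

Here forall_P is false because U contains odd numbers, and exists_P is true because it contains at least one even number.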
The quantifiers ∀ and ∃ can be combined with the connectives of the propositional
calculus.
◮ Example 8.5 There are many propositions and theorems in mathematics of the
form ∀x (P (x) → Q(x)). Some of these are listed below, and you will be able to
think of many more.
(a) If x^2 − 2x + 1 = 0, then x = 1.
(b) If n is even, then n^2 is even.
(c) If C is a circle, then its circumference divided by its diameter equals π. ◭
◮ Example 8.6 Let P (n) be the predicate on N defined by “n is even”. Naturally,
¬P (n) means “n is odd”. The following English translations show how ¬ is to be
interpreted when used with quantifiers.
∀n ¬P (n) Every n ∈ N is odd.
¬∀n P (n) Not every n ∈ N is even.
∃n ¬P (n) There is an n ∈ N that is odd.
¬∃n P (n) There is no n ∈ N that is even. ◭
In the previous example, look at the first and fourth sentences: they have the same
meaning. And look at the second and third sentences: they have the same meaning
too. This is no coincidence. Although there are four ways of combining one ¬ with one
quantifier, there are only two meanings.
If the universal set is finite, say U = {x1 , x2 , . . . , xn }, then these logical equivalences
can be reformulated as:
(a) ¬P (x1 ) ∧ ¬P (x2 ) ∧ · · · ∧ ¬P (xn ) ⇔ ¬(P (x1 ) ∨ P (x2 ) ∨ · · · ∨ P (xn )),
(b) ¬P (x1 ) ∨ ¬P (x2 ) ∨ · · · ∨ ¬P (xn ) ⇔ ¬(P (x1 ) ∧ P (x2 ) ∧ · · · ∧ P (xn )).
This is why these equivalences are known as the Generalised De Morgan’s laws.
Proof of the laws. (a) ∀x ¬P (x) is true if and only if ¬P is a tautology, that is, P
is a contradiction. Similarly, ¬∃x P (x) is true if and only if ∃x P (x) is false, that is,
P is a contradiction. Since both ∀x ¬P (x) and ¬∃x P (x) are logically equivalent to the
statement “P is a contradiction”, they are logically equivalent to each other.
(b) By a similar argument, we can show that ∃x ¬P (x) and ¬∀x P (x) are both logically
equivalent to the statement “P is not a tautology”, and are therefore logically equivalent
to each other. □
De Morgan’s laws allow us to simplify expressions that contain two or more negations.
For example,
¬∀x ¬P (x) ⇔ ¬¬∃x P (x) De Morgan
⇔ ∃x P (x) double negation
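On a finite universal set the Generalised De Morgan's laws can be checked with all and any. In this sketch the set U and the predicate P are illustrative assumptions only.

```matlab
U = 1:10;                     % a small finite universal set (our choice)
P = @(x) x.^2 > 50;           % an arbitrary predicate on U (our choice)

% (a) "for all x, not P(x)"  versus  "not (there exists x with P(x))"
lhs_a = all(~P(U));
rhs_a = ~any(P(U));

% (b) "there exists x with not P(x)"  versus  "not (for all x, P(x))"
lhs_b = any(~P(U));
rhs_b = ~all(P(U));

laws_hold = (lhs_a == rhs_a) && (lhs_b == rhs_b);
```

The two sides of each law agree whatever predicate is substituted for P.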
Exercises
1. Consider the following predicates over the set of all birds.
F (x) denotes “x can fly” W (x) denotes “x has wings”
N (x) denotes “x builds a nest” E(x) denotes “x lays eggs”
is to use De Morgan. By his laws, ∃x ∈ N ¬(x^3 < 42) ⇔ ¬∀x ∈ N (x^3 < 42). This
second proposition is the negation of (a), and so must have the opposite truth
value, i.e., it must be true. ◭
Exercises
3. Give a counterexample to the proposition “For each natural number n, 4^n + 1 is
prime.”
3. n = 3
Mathematical Induction
9.1 Induction
Now we will combine some of the ideas from the previous chapters to learn a recursive
method for proving propositions of the form ∀n ∈ N P (n).
The principle of mathematical induction has two parts. The first part (B) is known
as the basis for induction, and the second part (R) is known as the recursive or inductive
step.
Why should we accept that these two parts together constitute a proof that the propo-
sitions P (0), P (1), P (2), . . . are all true? The reason is that together they can be used to
show that any one of these propositions is true. For suppose that n is a natural number.
To prove that P (n) is true we would argue as follows:
The basis (B) tells us that P (0) is true.
The recursive step (R) with n = 0 tells us that P (0) → P (1) is true.
P (0) ∧ (P (0) → P (1)) =⇒ P (1) by modus ponens.
Thus P (1) is true.
The recursive step (R) with n = 1 tells us that P (1) → P (2) is true.
P (1) ∧ (P (1) → P (2)) =⇒ P (2) by modus ponens.
Thus P (2) is true.
And so on . . .
By repeated use of (R) and modus ponens we will eventually conclude that P (n) is true.
1 + 2 + 4 + · · · + 2^(n+1) = (1 + 2 + 4 + · · · + 2^n) + 2^(n+1)
                           = (2^(n+1) − 1) + 2^(n+1)     by inductive hypothesis
                           = 2(2^(n+1)) − 1
                           = 2^(n+2) − 1                 as required.
So ∀n ∈ N (P (n) → P (n + 1)) is true. This completes part (R) of the proof.
Since both the basis (B) and the recursive step (R) have been proved, by the
principle of mathematical induction, ∀n ∈ N P (n) is true. ◭
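Before attempting an induction proof it is often worth testing the formula numerically for the first few values of n. The sketch below assumes, as the working above indicates, that P(n) is the statement 1 + 2 + 4 + · · · + 2^n = 2^(n+1) − 1; the range 0 to 20 is an arbitrary choice.

```matlab
n_max = 20;                       % largest n tested (arbitrary)
ok = true(1, n_max + 1);
for n = 0:n_max
    lhs = sum(2.^(0:n));          % 1 + 2 + 4 + ... + 2^n
    rhs = 2^(n+1) - 1;            % closed formula
    ok(n+1) = (lhs == rhs);
end
all_ok = all(ok);
```

Agreement for finitely many n is evidence only; the induction argument is what proves the statement for every n ∈ N.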
Note that instead of writing ∀n ∈ I P (n), where I is the set just defined, we can also
write ∀n ≥ i P (n).
◮ Example 9.3 Exponential functions grow faster than power functions. By this we
mean that if a > 1 and if b > 0, then a^n > n^b, for all “sufficiently large” natural
numbers n.
To illustrate what is meant by “for all sufficiently large natural numbers n”,
we shall consider the particular case in which a and b are both equal to 2. Values
of 2^n and n^2 for the first few natural numbers n appear below.
n     0   1   2   3   4   5   6    7    8
2^n   1   2   4   8  16  32  64  128  256
n^2   0   1   4   9  16  25  36   49   64
This table makes it clear that “2^n > n^2” is not true for all numbers n. However
it does suggest that “2^n > n^2” is true for n ≥ 5, that is, ∀n ≥ 5 (2^n > n^2) is true.
We can prove this using the modified principle of induction.
Proof. Let P (n) be the predicate on N defined by “2^n > n^2”.
BASE STEP: Because 2^5 = 32 and 5^2 = 25, P (5) is true. This establishes (B).
INDUCTIVE STEP: Now suppose that P (n) is true for some n ≥ 5. We have
to prove that “2^(n+1) > (n + 1)^2” is true.
2^(n+1) = 2(2^n)
        > 2(n^2)            by inductive hypothesis
        = n^2 + n^2
        = n^2 + n · n
        ≥ n^2 + 5 · n       because n ≥ 5
        = n^2 + 2n + 3n
        > n^2 + 2n + 1      because n ≥ 5 so 3n ≥ 15 > 1
        = (n + 1)^2
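The choice of 5 as the starting value can be explored numerically. This sketch tests the inequality for n from 0 to 20 (an arbitrary range) and lists the values of n for which it fails; the variable names are ours.

```matlab
n = 0:20;                     % values of n to test (arbitrary range)
holds = 2.^n > n.^2;          % truth value of P(n) for each n
failing = n(~holds);          % values of n for which P(n) is false
disp(failing)
```

Every failing value is smaller than 5, which is consistent with starting the induction at n = 5.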
Our last example of induction shows how to prove that a formula for the general term
defined by a simple recurrence relation is correct.
Thus P (n + 1) is true.
So, by induction, ∀n ∈ N+ P (n) is true. ◭
Exercises
Note that if a question asks you to prove something by induction, it could mean the
ordinary principle of induction or the modified principle.
1. Use mathematical induction to prove that each of the following statements is true
for all n ∈ N.
2. Let r be any real number different from 1. Use induction to prove the following
formula for all n ∈ N:
1 + r + r^2 + · · · + r^n = (1 − r^(n+1)) / (1 − r).
3. Use induction to prove that each of the following statements is true for all n ∈ N.
4. Use modified induction to prove that each of the following statements is true for all
n ∈ N+ .
(a) n^2 ≥ n + 2 for n ≥ 2.
(b) n^3 ≥ n^2 + 3 for n ≥ 2.
(c) 3^n ≥ n^3 for n ≥ 5.
6. Below are recursive definitions of sequences, together with a suggested general for-
mula. Use induction to prove that the formula is correct.
3. (a) Proof. Let P (n) be the predicate on N defined by “6 is a factor of n(n^2 + 5)”.
BASE STEP: P (0) says “6 is a factor of 0”, which is true. (Explanation: Every
n ∈ N+ is a factor of zero because 0/n is a whole number.)
INDUCTIVE STEP: Assume P (n) is true for some n ∈ N, that is, assume that
6 is a factor of n(n^2 + 5). This means that n(n^2 + 5) = 6k for some whole
number k. We will use this k later in our working. Now we have to prove
P (n + 1), that is, “6 is a factor of (n + 1)((n + 1)^2 + 5)”.
5^(n+1) − 1 = 5 · 5^n − 1
            = (4 + 1)5^n − 1
            = 4 · 5^n + 5^n − 1
            = 4 · 5^n + (5^n − 1)
            = 4 · 5^n + 4k          by ind. hyp.
            = 4(5^n + k)
4. (a) Hint. Show that (n + 1)(2(n + 1) − 1)(2(n + 1) + 1)/3 and n(2n − 1)(2n +
1)/3 + (2(n + 1) − 1)^2 both multiply out to the same thing.
(n + 1)^2 = n^2 + 2n + 1
          ≥ (n + 2) + 2n + 1      by ind. hyp.
          = 3n + 3
          ≥ n + 3                 since n ≥ 2
Section 9.2. Solutions to exercises 181
(n + 1)^3 = n^3 + 3n^2 + 3n + 1
          ≥ (n^2 + 3) + 3n^2 + 3n + 1         by ind. hyp.
          = (n^2 + 2n + 1) + 3n^2 + n + 3
          = (n + 1)^2 + 3n^2 + n + 3
          ≥ (n + 1)^2 + 3                     since n ≥ 2
(c) Hint. Assuming 3^n ≥ n^3 is true for some n ≥ 5, we must show that 3^(n+1) ≥
(n + 1)^3 i.e. 3^(n+1) ≥ n^3 + 3n^2 + 3n + 1.
3^(n+1) = 3 · 3^n
        ≥ 3n^3                    by ind. hyp.
        = n^3 + 2n^3
        > n^3 + n^3
        ≥ n^3 + 5n^2              since n ≥ 5
        = n^3 + 3n^2 + n^2 + n^2
        > n^3 + 3n^2 + 3n + 1
7. p_n = (n + 1)/(2n)   (n ≥ 2).
Boolean Algebra
Definition. A Boolean algebra is a set B, which has two special elements denoted by
0 and 1, together with three operations denoted by ′ , ·, and +, satisfying the properties
listed in Table 10.1.
Section 10.1. Definition and examples 183
The elements of B are sometimes called terms. Terms of the Boolean algebra can be
combined using the operations ′ , ·, and + to produce new terms.
The operation ′ acts on one variable at a time: x′ (read as “x dash”) is called the
complement of x.
The operation + acts on two variables at a time: x + y is called the sum of x and y.
The operation · also acts on two variables at a time: x · y is called the product of x
and y. Normally we omit the · and just write xy for the product: x · y is used only when
we wish to emphasise the multiplication.
To show that our abstract Boolean algebras have practical applications, we give some
examples.
◮ Example 10.1 Let U be any set, and let P(U) be its power set, the set of all
subsets of U (see Example 4.1). The sets U and ∅ are the special elements 1 and
0, respectively. The operations that make P(U) a Boolean algebra are the usual
set operations of complement, ∪ (union) and ∩ (intersection), corresponding
to ′ , + and ·, respectively. It is easy to check that the axioms of a Boolean algebra
are satisfied by P(U), with these operations and these special elements. ◭
◮ Example 10.2 Let F (210) denote the set of all positive integer factors of 210.
That is,
F (210) := {1, 2, 3, 5, 6, 7, 10, 14, 15, 21, 30, 35, 42, 70, 105, 210}.
We can make this finite set a Boolean algebra by defining a complement operation
and operations ∨ and ∧, as follows.
We could define F(n), and the complement, ∨, and ∧ operations on F(n), for any positive
integer n. However F (n) is a Boolean algebra only for special positive integers n. We
shall look more closely at this in one of the problems at the end of the section.
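The definitions of the three operations on F(210) are not reproduced above. The MATLAB sketch below therefore rests on an assumption: it takes the usual construction, in which the complement of m is n/m, m ∨ k is the least common multiple and m ∧ k is the greatest common divisor, so that n plays the role of 1 and the number 1 plays the role of 0. Under that assumption it checks the two complement axioms for every factor of n. The variable names are illustrative only.

```matlab
% Sketch: test the complement axioms on F(n), ASSUMING
%   complement(m) = n/m,  m v k = lcm(m, k),  m ^ k = gcd(m, k).
% These definitions are an assumption, not quoted from the notes.
n = 210;
F = find(mod(n, 1:n) == 0);      % all positive factors of n
comp = n ./ F;                   % assumed complement of each factor
joinOK = lcm(F, comp) == n;      % m v m' should be the "1" element, n
meetOK = gcd(F, comp) == 1;      % m ^ m' should be the "0" element, 1
fprintf('n = %d: complement axioms hold for every factor? %d\n', ...
        n, all(joinOK & meetOK));
```

Changing n to 12 makes the test fail, for instance at m = 2, where lcm(2, 6) = 6 rather than 12.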
The Boolean algebra F (210), and many others like it, are really just curiosities, with
little or no practical value. We shall return to more serious matters. The next example of
a Boolean algebra is very simple, but of great importance in logic and computer science.
◮ Example 10.3 The simplest of all Boolean algebras has just two elements, namely
0 and 1. We denote this Boolean algebra by B2 . The complements, products and
sums of the elements 0 and 1 are as shown in Table 10.2 below.
x | x′        · | 0  1        + | 0  1
--+----      ---+------      ---+------
0 | 1         0 | 0  0        0 | 0  1
1 | 0         1 | 0  1        1 | 1  1
The 0–1 Boolean algebra B2 is important in logic and computer science precisely
because it has only two elements. In logic, the elements 1 and 0 are used to represent
the states “true” and “false”, respectively, which we associate with propositions. The
operations ′ , · and + on B2 defined in Table 10.2 correspond precisely to the logical
connectives ¬, ∧ and ∨ between propositions. This is clear by comparing Table 10.2 with
the truth tables for ¬, ∧ and ∨ given in Table 7.1.
In computer science the elements of B2 can be treated as binary digits, which in turn
can be regarded as the basic units of information, or as digits of the binary number
system. The addition and multiplication tables for B2 have a close relationship with
binary arithmetic. In fact, the products in Table 10.2 correspond exactly to the products
of the binary digits 0 and 1. However the sums in Table 10.2 do not correspond exactly
to the sums of the binary digits 0 and 1—there are different results for the sum 1 + 1.
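The comparison with binary arithmetic can be tabulated directly. The following MATLAB sketch (variable names are illustrative) lists, for each pair of digits, the Boolean product and sum next to the ordinary product and sum. The products agree on every row, while the sums differ only on the row x = y = 1.

```matlab
% Compare the Boolean operations on B2 with ordinary arithmetic on the digits 0 and 1.
rows = dec2bin(0:3) - '0';         % the four pairs [x y]
x = rows(:, 1);  y = rows(:, 2);
boolProd  = double(x & y);         % Boolean product x . y
boolSum   = double(x | y);         % Boolean sum x + y
arithProd = x .* y;                % ordinary product of the digits
arithSum  = x + y;                 % ordinary sum (1 + 1 = 2, i.e. binary 10)
disp('     x     y   x.y  prod   x+y   sum');
disp([x y boolProd arithProd boolSum arithSum]);
```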
There are many other systems where the two elements of B2 are used to represent the
two possible states. Examples of such systems include electric circuits in which a current
may or may not be flowing, capacitors which may be charged or not charged, switches that
may be open or closed, and conducting coils which may be magnetised in one direction
or the other.
From now on we shall assume that all Boolean variables that we encounter can take
only the values 0 or 1. With this simplification we shall be able to use truth tables, as
well as the general axioms of Boolean algebra, to prove results such as the equality of
Boolean expressions.
Exercises
1. The set of real numbers has no zero-divisors. That is, if x and y are real numbers,
and if xy = 0, then x = 0 or y = 0. Show by counterexample that there is no such
general rule for Boolean algebras. That is, give an example of a Boolean algebra B,
and non-zero elements x and y in B, such that xy = 0.
2. The cancellation law also holds for real numbers. That is, if x, y and z are
real numbers, and if xz = yz, and if z ≠ 0, then x = y. Show, by giving a
counterexample, that the cancellation law fails in Boolean algebras.
(a) Write 210, and then each of the elements of F (210), as a product of prime
numbers.
(b) For each element m in F(210) describe how the prime factors of m and the
prime factors of the complement of m are related.
(c) Describe how the prime factors of m ∧ n and m ∨ n are related to the prime
factors of m and n.
(d) Describe the correspondence between F (210) and the Boolean algebra of sub-
sets of {2, 3, 5, 7}.
For any positive integer n, we can define the complement, ∧, and ∨ operations on F(n), the
set of all positive factors of n, as in Example 10.2.
These theorems (except 10(a)–(b)) correspond to the parts of Table 7.6 that are not
already listed in Table 10.1. We will see proofs of some of these theorems in the following
examples.
You may have noticed a kind of symmetry in the axioms and properties in Tables 10.1
and 10.3. The following definition explains this so-called duality.
Principle of duality. If two Boolean expressions can be shown to be equal, then the
dual expressions are also equal.
Note that the axioms and properties listed in Tables 10.1 and 10.3 are in dual pairs,
except for property 8, which is its own dual.
Notice that (a) and (b) are duals, and that each line in the proof of (b) is
the dual of the corresponding line in the proof of (a). This is typical of this type
of proof, and instead of writing dual proofs as we have done here, we can simply
appeal to the duality principle. ◭
◮ Example 10.6 We shall establish the following dominance laws.
(a) x + 1 = 1, and (b) x · 0 = 0.
Proof. (a) x + 1 = (x + 1) · 1          axiom 4(b)
                 = (x + 1)(x + x′)      axiom 5(a)
                 = x + 1x′              axiom 3(a)
                 = x + x′1              axiom 1(b)
                 = x + x′               axiom 4(b)
                 = 1                    axiom 5(a)
Since (b) is the dual of (a) and (a) is true, the principle of duality tells us that
(b) is also true. ◭
We can use theorems already proved to prove new ones; it is not necessary to go back
to the axioms at every step.
◮ Example 10.7 We shall prove the following absorption laws:
(a) x + xy = x and (b) x(x + y) = x.
Proof. (a) x + xy = x1 + xy axiom 4(b)
= x(1 + y) axiom 3(b)
= x(y + 1) axiom 1(a)
= x1 dominance
= x axiom 4(b)
Since (b) is the dual of (a) and (a) is true, the principle of duality tells us that
(b) is also true. ◭
Section 10.2. Proofs in Boolean algebra 189
Because of the correspondence between Boolean algebra and the algebra of logical
propositions, the MATLAB programs truth.m and propos.m given in Section 7.3 can be
used to verify that two Boolean expressions are equal. To verify that y + (x′ + y)′ = x + y
from the previous example, we show that q ∨ ¬(¬p ∨ q) ↔ p ∨ q is a tautology. The
appropriate function M-file propos.m and the output of the program are given below.
function prop=propos(p)
% Define the proposition(s) for the truth table
global num
if p<0
% Define number of basic variables p(j)
num=2;
else
% Enter the proposition(s)
prop(1) = p(2) | ~(~p(1) | p(2));
prop(2) = p(1) | p(2);
prop(3) = prop(1) == prop(2);
end
Since all the entries in the last column are equal to 1, the result is verified.
◮ Example 10.9 Prove that (x + y = y) ⟹ (x′ + y = 1). This is a common type of
statement. It says “If the first equation is true, then so is the second”. Therefore,
in order to prove the second equation, we must use the first one somewhere in the
proof. At the point where we use it we write “given” to indicate that this is neither
an axiom nor a theorem, but something we are assuming to be true for this particular x
and y.
Proof.
x′ + y = x′ + (x + y) given
= (x′ + x) + y associativity
= (x + x′ ) + y commutativity
= 1+y complementation
= y+1 commutativity
= 1 dominance ◭
The above proof contains two steps that use the commutativity axiom, namely to get
from lines 2 to 3 and from 4 to 5. Steps like this are considered so trivial that they are
not normally written. So the proof would usually be presented as follows:
x′ + y = x′ + (x + y) given
= (x′ + x) + y associativity
= 1+y complementation
= 1 dominance
You may use this abbreviated style of proof in your assignments, tests and exams.
To verify the result in the previous example using MATLAB, we show that (p ∨ q ↔
q) → (¬p ∨ q ↔ T ) is a tautology. The appropriate function M-file propos.m, and the
output of the program truth.m are given below.
function prop=propos(p)
% Define the proposition(s) for the truth table
global num
if p<0
% Define number of basic variables p(j)
num=2;
else
% Enter the proposition(s)
prop(1) = (p(1) | p(2)) == p(2);
prop(2) = (~p(1) | p(2)) == 1;
prop(3) = prop(1) <= prop(2);
end
Since all the entries in the last column are equal to 1, the result is verified.
Exercises
4. Prove each of the following Boolean identities. Use MATLAB to verify the results.
(a) x + x′ y = x + y
(b) (x + y + z)(x′ + y ′ + z ′ ) = (xyz + x′ y ′ z ′ )′
(a) (x + y)(x′ + y ′ )′ = xy
(b) x + x′ y + x′ y ′ = 1
(c) (x + y)(x + y ′ ) = x
(d) xy ′ z ′ + xyz + x′ yz + xyz ′ = yz + xz ′
6. We saw in Exercise 1 that the zero-divisors law for numbers doesn’t work in Boolean
algebra. However, there is a zero-divisors law for Boolean algebra: it is just a bit
more complicated than the one for numbers. Here it is: If xy = 0 and xy ′ = 0, then
x = 0. Prove this law.
7. There is also a cancellation law for Boolean algebra (see Exercise 2): If xz = yz and
xz ′ = yz ′ , then x = y. Prove this law.
9. Show that each element in a Boolean algebra has just one complement. That is,
show that if xy = 0 and x + y = 1, then y = x′ .
10. Which of the following propositions are always true in a Boolean algebra? Justify
your answer with either a proof or a counterexample. Check your answer using
MATLAB.
Symmetric difference. The first one we will look at is one that we’ve used before in
logic: the “symmetric difference” or “exclusive or” operation. In logic we defined ⊕ in
words, but we cannot do this in the abstract setting of Boolean algebra: we may only
refer to the three inherent operations. It turns out that the correct way to define the
exclusive or is:
x ⊕ y := xy ′ + x′ y.
You should check that this definition matches our idea of ⊕ from logic. That is, check on
a truth table that p ⊕ q = (p ∧ ¬q) ∨ (¬p ∧ q). Note that x ⊕ y is also written x XOR y.
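The truth-table check suggested here can also be done in a few lines of MATLAB. This is a minimal sketch, separate from the truth.m and propos.m programs used elsewhere in these notes. It compares MATLAB's built-in xor with the expression xy′ + x′y on all four rows.

```matlab
% Check x XOR y = xy' + x'y on every row of the two-variable truth table.
rows = dec2bin(0:3) - '0';         % the four pairs [x y]
x = rows(:, 1);  y = rows(:, 2);
lhs = xor(x, y);                   % built-in exclusive or
rhs = (x & ~y) | (~x & y);         % the definition xy' + x'y
fprintf('Definition matches exclusive or on all rows: %d\n', isequal(lhs, rhs));
```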
NOR. The third and final new operation we consider is the dual of ↑. It is the NOR
operation, denoted by x ↓ y or x NOR y, and defined by
x ↓ y := (x + y)′ .
NOR is short for NOT OR: If you write (x + y)′ in the notation of propositional logic
you will see where this name comes from.
NAND and NOR are examples of universal operations. This means that the three
basic operations of Boolean algebra can be expressed using only NAND or only NOR.
Here are the relevant expressions using only NAND:
x′ = x ↑ x;
x + y = (x ↑ x) ↑ (y ↑ y);
xy = (x ↑ y) ↑ (x ↑ y).
Section 10.4. Boolean expressions 193
Here are the relevant expressions using only NOR:
x′ = x ↓ x;
x + y = (x ↓ y) ↓ (x ↓ y);
xy = (x ↓ x) ↓ (y ↓ y).
Notice how these expressions are dual to those given for NAND.
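Since every variable is assumed to take only the values 0 and 1, these six identities can be confirmed by checking all four input pairs. The sketch below does this in MATLAB. The handles nandOp and norOp are hypothetical helper names, defined here as (ab)′ and (a + b)′ respectively.

```matlab
% Verify that NOT, OR and AND can each be written with NAND only, and with NOR only.
nandOp = @(a, b) ~(a & b);         % a NAND b = (ab)'   (helper name is illustrative)
norOp  = @(a, b) ~(a | b);         % a NOR b  = (a + b)'
rows = logical(dec2bin(0:3) - '0');
x = rows(:, 1);  y = rows(:, 2);
okNand = isequal(~x,    nandOp(x, x)) && ...
         isequal(x | y, nandOp(nandOp(x, x), nandOp(y, y))) && ...
         isequal(x & y, nandOp(nandOp(x, y), nandOp(x, y)));
okNor  = isequal(~x,    norOp(x, x)) && ...
         isequal(x | y, norOp(norOp(x, y), norOp(x, y))) && ...
         isequal(x & y, norOp(norOp(x, x), norOp(y, y)));
fprintf('NAND identities hold: %d, NOR identities hold: %d\n', okNand, okNor);
```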
Exercises
11. Show that for any elements x, y and z in a Boolean algebra B,
(a) x ⊕ y = y ⊕ x.
(b) (x ⊕ y) ⊕ z = x ⊕ (y ⊕ z).
(c) x + y = (x ⊕ y) + xy.
(d) x(y ⊕ z) = xy ⊕ xz.
◮ Example 10.10 The following formulas define five 3-variable Boolean functions.
(a) f1 (x, y, z) := xy ′ + yz ′ + zx′ .
(b) f2 (x, y, z) := xyz ′ + xy ′ z + x′ yz + x′ y ′ z + x′ yz ′ + xy ′ z ′ .
(c) f3 (x, y, z) := (x + y + z)(x′ + y ′ + z ′ ).
(d) f4 (x, y, z) := x + y ′ .
(e) f5 (x, y, z) := 1. ◭
The expressions on the right hand side of these formulas can all be regarded as ex-
amples of 3-variable Boolean expressions. The first three involve all three variables, but
the last two don’t. However it does not normally matter whether we regard x + y ′ as
an expression in two variables or in more. If it does matter, then we need to indicate
carefully just how we are to regard such an expression. In the same manner, 0 and 1 can
be regarded as Boolean expressions in any number of variables.
Every Boolean expression determines a Boolean function and every Boolean function
can be defined by a Boolean expression. However the correspondence is not 1–1. There are
many different Boolean expressions for each Boolean function. We say that two Boolean
expressions are equivalent if one can be shown to equal the other by applying the laws
of Boolean algebra.
(x + y′z′)′ = x′(y′z′)′ = x′(y″ + z″) = x′(y + z) ◭
The problem of deciding whether two Boolean expressions are equivalent is very im-
portant because of its application to logic and to the design of electronic circuits. In these
two cases the 0–1 Boolean algebra B2 is the underlying algebraic system. The decision
problem can be solved systematically for B2 by using truth tables, but because the truth
table for a Boolean expression with n variables has 2^n rows, this becomes very slow as n
increases, even for computers! No fast and systematic method for testing the equivalence
of Boolean expressions in many variables has yet been discovered.
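The truth-table test can be sketched as a loop over all 2^n rows. In the MATLAB fragment below, f and g are hypothetical function handles that take one row of 0–1 values. They encode the two expressions (x + y′z′)′ and x′(y + z) from the calculation above. The loop stops at the first row on which the values differ, but in the worst case it has to visit every row, which is exactly the exponential cost just described.

```matlab
% Sketch: decide equivalence over B2 by checking all 2^n rows of the truth table.
% f and g are illustrative handles; v is one row [x y z] of 0/1 values.
f = @(v) ~(v(1) | (~v(2) & ~v(3)));    % (x + y'z')'
g = @(v) ~v(1) & (v(2) | v(3));        % x'(y + z)
n = 3;
rows = dec2bin(0:2^n - 1) - '0';       % 2^n-by-n matrix, one row per assignment
equivalent = true;
for r = 1:size(rows, 1)
    if f(rows(r, :)) ~= g(rows(r, :))
        equivalent = false;            % found a row where the values differ
        break
    end
end
fprintf('Rows in table: %d, equivalent: %d\n', size(rows, 1), equivalent);
```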
◮ Example 10.12 Here are four minterms in 3 variables: xyz, x′ yz, xy ′ z ′ and x′ y ′ z ′ .
There are eight altogether. You shouldn’t have any trouble writing down the other
four. ◭
Section 10.5. Disjunctive normal form 195
Any Boolean expression that is the sum of different minterms in the same variables is
said to be in disjunctive normal form (DNF) or in minterm canonical form. It is
also convenient to regard 0 as a disjunctive normal form.
◮ Example 10.13 In Example 10.10 the expression in (b) is in DNF because each
summand is different and involves all three variables. The expressions in (a), (d)
and (e) are not in DNF because there are summands that do not involve all three
variables. The expression in (c) is not even a sum of products and so cannot be
in DNF. ◭
The following theorem (which we won’t prove in this unit) shows why the DNF is so
important.
Theorem 10.1 Every Boolean expression is equivalent to a disjunctive normal form,
with equivalent Boolean expressions having the same DNF.
The theorem gives us a way to test whether two Boolean expressions are equivalent
or not:
Method for testing equivalence. Convert each expression to its DNF. If the DNFs
are the same, then the original expressions are equivalent. If the DNFs are different, then
the original expressions are not equivalent.
The next example shows how we convert a Boolean expression to its DNF.
◮ Example 10.14 To convert xy ′ + yz ′ + x′ y to its DNF, we first introduce the third
variable into each summand, and then expand and simplify.
xy′ + yz′ + x′y = xy′(z + z′) + (x + x′)yz′ + x′y(z + z′)
                = xy′z + xy′z′ + xyz′ + x′yz′ + x′yz + x′yz′
                = xy′z + xy′z′ + xyz′ + x′yz′ + x′yz        (DNF) ◭
There are algorithms for converting a Boolean expression to its DNF, but we will not
study them here. We shall just give a further example that illustrates another aspect of
the procedure.
◮ Example 10.15 To convert xy ′ + (x + yz)′ to DNF, we first use De Morgan’s laws
to simplify the expression to a sum of products.
xy′ + (x + yz)′ = xy′ + x′(yz)′
                = xy′ + x′(y′ + z′)
                = xy′ + x′y′ + x′z′
To complete the conversion to DNF, we introduce a third variable into each term:
xy′ + x′y′ + x′z′ = xy′(z + z′) + x′y′(z + z′) + x′(y + y′)z′
                  = xy′z + xy′z′ + x′y′z + x′y′z′ + x′yz′ + x′y′z′
                  = xy′z + xy′z′ + x′y′z + x′y′z′ + x′yz′        (DNF) ◭
Exercises
13. Obtain the DNF of each of the following:
(a) x + y.
(b) (x + y + z)(x′ + y ′ + z ′ ).
(c) (x + yz ′ )′ (y + zx′ )′ (z + xy ′ )′ .
(d) (x′ + y ′ z)x(y + z ′ ).
14. Which of the following Boolean expressions are equivalent? Check your answers
using MATLAB.
(a) xy + yz + zx.
(b) x.
(c) (x + y)(y + z)(z + x).
(d) (x + y)(x + y ′ ).
(e) x(y + z) + yz.
(f ) xy + (x′ + y)′ .
In the next chapter we will see how to build the logic network for a Boolean expression.
From this an electronic circuit can be constructed which models the behaviour of the
Boolean expression as a function. Boolean operations are represented in logic networks as
gates. Since logic networks with many gates are expensive to make and slow to operate,
it is useful to simplify the Boolean expressions as much as possible. One way of doing
this is to find equivalent Boolean expressions which are sums of products and which use
the least possible number of terms. We call such expressions minimal sums of products.
Section 10.6. Minimal sum of products form 197
To say precisely what is meant by a minimal sum of products, we need more termi-
nology. A literal is any variable x, or its complement x′ , and a fundamental product
is any product of literals. The constants 0 and 1 can also be regarded as fundamental
products. As a matter of interest, fundamental products can be defined recursively as
follows.
(B) 0 and 1 are fundamental products, and every literal is a fundamental product;
(R) If P1 and P2 are fundamental products, then P1 P2 is a fundamental product.
algebra analogue of truth tables for propositions. Before we can explain the method itself
we need to introduce the framework, sort of like an empty truth table.
A Karnaugh table for a Boolean expression in n variables is a rectangular table with
2^n entries, one for each of the possible minterms. The table is arranged so that entries
corresponding to any two minterms which differ in just one variable are adjacent (i.e.,
they share a common edge). For example, the minterms xy ′ z and x′ y ′ z differ in just the
x variable, whereas xyz and x′ yz ′ differ in both x and z.
A Karnaugh table for Boolean expressions in two variables is shown in Figure 10.1. It
is not usual to label the entries themselves, but the rows and columns instead. The label
of any entry in the table is the “product” of the corresponding row and column labels.
y y′
x
x′
A Karnaugh table for Boolean expressions in three variables is shown in Figure 10.2.
It is a 2 × 4 rectangular table with each square corresponding to one of the eight possible
minterms. The labels on the long side are the four possible minterms in two of the
variables, and the labels on the short side are the minterms in the remaining variable. It
does not matter which two variables we group together. Note that to satisfy all of the
adjacency conditions we imagine that the short sides at each end are joined together, as if
the rectangle is wrapped around into a cylinder. That is, we consider the first and fourth
columns to be adjacent, because their labels only differ by one literal.
        yz   yz′  y′z′  y′z
x
x′
A Karnaugh table for Boolean expressions in four variables is shown in Figure 10.3.
It is a 4 × 4 rectangular table. The labels on one side are the minterms in two of the
variables, and the labels on the other side are the minterms in the remaining two variables.
Again, in order to satisfy the adjacency requirements, we must imagine that the top edge
is joined to the bottom edge, and that the left edge is joined to the right edge. It may
help to imagine that the table is drawn on a doughnut (or torus).
Now that we have the framework we can discuss Karnaugh’s method for finding an
        yz   yz′  y′z′  y′z
wx
wx′
w′x′
w′x
MSP form of a Boolean expression. Each square or entry in a Karnaugh table corresponds
to a minterm. So we can regard a collection of squares in a Karnaugh table as a sum of
minterms. The Karnaugh map of a Boolean expression is the collection of squares in
a Karnaugh table corresponding to its disjunctive normal form. We mark each of these
squares with a + (or some other symbol) to highlight them. So there is one + symbol for
each minterm in the DNF.
However, you do not need to convert an expression to DNF before you can draw its
Karnaugh map. All you need is for the expression to be a sum of products. The procedure
for drawing the map from such an expression is illustrated in the next example.
◮ Example 10.21 Suppose we wish to draw the Karnaugh map of the expression
E = x+(y ↓ z). First we convert E to a sum of products: E = x+(y+z)′ = x+y ′ z ′ .
Here is the map—the explanation is given below it.
        yz   yz′  y′z′  y′z
x       +    +    +     +
x′                +
Karnaugh maps are useful for finding MSP forms because the Karnaugh maps of fundamental
products are easy to recognise. As we saw in the previous example, each fundamental product
(i.e., summand in a sum of products form) is a rectangular subset of the Karnaugh table in
which the number of squares is a power of 2. So the Karnaugh map of an expression in sum of
products form is just the superposition of the Karnaugh maps of each of the summands.
The Karnaugh map of a fundamental product is called a fundamental rectangle, or an s-
rectangle if it contains s squares. Some fundamental rectangles are shown in Figures 10.4, 10.5
and 10.6 below. Note that the size of the rectangle for a fixed fundamental product depends
on how many variables there are. Also notice the “wrap around” properties of some of the
rectangles. This results from the adjacency conditions we mentioned earlier.
The procedure for finding an MSP form of a Boolean expression is to cover its Kar-
naugh map with the fewest and the largest fundamental rectangles. We use the fewest
number of fundamental rectangles to minimise the number of summands, and we use large
rectangles to minimise the total number of literals.
We start with the largest fundamental rectangles and work down to the smallest. In
the case of a Boolean expression in four variables, for example, we try to cover the Kar-
naugh maps with 16-rectangles, then 8-rectangles, then 4-rectangles, then 2-rectangles,
and finally with 1-rectangles. The procedure is described in detail below.
3. Determine which, if any, of the uncovered squares in the Karnaugh map can be
covered by 2^k-rectangles which are subsets of the Karnaugh map.
4. Choose the fewest number of 2^k-rectangles required to cover all such squares. (There
may be more than one way to do this.)
5. Decrease k by 1. Go to step 6.
6. Repeat steps 3, 4 and 5 until the entire Karnaugh map is covered by fundamental
rectangles. Then go to step 7.
7. Remove any “unnecessary rectangles” (i.e., rectangles that, if removed, do not un-
cover any marked squares in the Karnaugh map).
8. The minimal sum of products form is the sum of the fundamental products corre-
sponding to the chosen fundamental rectangles.
The Karnaugh map contains no 8-rectangles and no 4-rectangles, but can be cov-
ered by 2-rectangles. This can be done in two ways, which is why we have depicted
two maps. The squares corresponding to xyz ′ and x′ y ′ z ′ are contained in just one
2-rectangle each, namely the 2-rectangles corresponding to xy and x′ y ′ respectively.
Thus any MSP form of f2 incorporates xy and x′ y ′ which together cover xyz, xyz ′ ,
x′ y ′ z ′ and x′ y ′ z. To cover xy ′ z we then have the choice of using either y ′ z or xz.
So the two MSP forms for f2 are xy + x′ y ′ + y ′ z and xy + x′ y ′ + xz. Note that
xy + x′ y ′ + xy ′ z is also equivalent to f2 , and that the three corresponding funda-
mental rectangles cover the Karnaugh map without overlapping. However it is not
minimal as it involves a total of 7 literals, rather than 6.
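The claims about these three expressions can be checked by evaluating each of them on all eight rows of the truth table. The MATLAB sketch below is only a verification aid (the variable names are illustrative). It confirms that the two MSP forms and the seven-literal alternative all define the same Boolean function.

```matlab
% Evaluate the three sum-of-products expressions on all eight rows [x y z].
rows = logical(dec2bin(0:7) - '0');
x = rows(:, 1);  y = rows(:, 2);  z = rows(:, 3);
msp1 = (x & y) | (~x & ~y) | (~y & z);        % xy + x'y' + y'z   (6 literals)
msp2 = (x & y) | (~x & ~y) | (x & z);         % xy + x'y' + xz    (6 literals)
alt  = (x & y) | (~x & ~y) | (x & ~y & z);    % xy + x'y' + xy'z  (7 literals)
fprintf('All three expressions agree on every row: %d\n', ...
        isequal(msp1, msp2, alt));
```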
f3 = x + wx′z + (x′y ⊕ z)
   = x + wx′z + (x′y)z′ + (x′y)′z
   = x + wx′z + x′yz′ + (x″ + y′)z
   = x + wx′z + x′yz′ + (x + y′)z
   = x + wx′z + x′yz′ + xz + y′z
The Karnaugh map of f3 can be covered in two ways by one 8-rectangle and
three 4-rectangles. The corresponding MSP forms are x + yz ′ + y ′ z + wz and
x + yz ′ + y ′ z + wy. ◭
as a factor, the value would be 0), and moreover x′ y ′ z ′ = 0 for any other assignment of
values. Similarly, the only minterm that equals 1 for x = 0, y = 1 and z = 1 is x′ yz,
which can be identified easily by including as factors x′ (since x = 0), y (since y = 1) and
z (since z = 1). Furthermore, clearly x′ yz equals 0 for every other assignment of values.
Now, given the truth table of a Boolean expression E, consider the set of minterms
that equal 1 for the rows where E = 1. If we add these minterms, the resulting Boolean
sum is equivalent to the given Boolean expression E. This is because the sum has value:
0 + ··· + 0 = 0, when E = 0 (since each minterm equals 0), and value:
0 + ··· + 0 + 1 + 0 + ··· + 0 = 1 (where the 1 corresponds to a single minterm), when E = 1.
Therefore this sum is precisely the disjunctive normal form of the Boolean expression E.
Thus the truth table of a Boolean expression can be used to write down the DNF of
the expression. Since the Karnaugh map also represents the DNF, there is a one-to-one
correspondence between each marked square in the Karnaugh map and a row in the truth
table for which the expression has value 1. This connection between truth tables and
Karnaugh maps will be used in the next chapter.
◮ Example 10.23 Draw the truth table for the Boolean expression f1 = y + xy ′ z.
You will find there are five rows for which f1 = 1, namely when x, y and z are:
[0 1 0], [0 1 1], [1 0 1], [1 1 0] and [1 1 1]. These correspond to the five minterms:
x′ yz ′ , x′ yz, xy ′ z, xyz ′ and xyz, respectively. Hence the DNF of f1 is
f1 = x′ yz ′ + x′ yz + xy ′ z + xyz ′ + xyz.
These five minterms are precisely the same as those represented in the Karnaugh
map of f1 in Example 10.22(a). ◭
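The procedure of reading the DNF off a truth table is mechanical, so it can be sketched in MATLAB. The fragment below is an illustration only: the handle f1, the cell array names and the output format (an apostrophe for the complement) are choices made for this sketch. For each row on which f1 equals 1 it builds the minterm that uses a variable where its value is 1 and the complemented variable where its value is 0, and then joins the minterms with + signs.

```matlab
% Sketch: read the DNF of f1 = y + xy'z off its truth table.
names = {'x', 'y', 'z'};
f1 = @(v) v(2) | (v(1) & ~v(2) & v(3));   % v is one row [x y z]
rows = dec2bin(0:7) - '0';                % the eight rows of the truth table
terms = {};
for r = 1:size(rows, 1)
    v = rows(r, :);
    if f1(v)
        m = '';
        for j = 1:3
            if v(j)
                m = [m names{j}];         % value 1: use the variable itself
            else
                m = [m names{j} ''''];    % value 0: use its complement
            end
        end
        terms{end + 1} = m;               % one minterm per row with f1 = 1
    end
end
dnf = strjoin(terms, ' + ');
disp(dnf);
```

The minterms come out in the order of the truth-table rows, which is the same order as in the example.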
Exercises
15. Find MSP forms for the Boolean functions with these Karnaugh maps.
(a)
yz yz ′ y′z′ y′z
x + +
x′ +
(b)
yz yz ′ y′z′ y′z
x + + +
x′ + + +
Section 10.7. Solutions to exercises 205
(c)
yz yz ′ y′z′ y′z
wx + + +
wx′ + + +
w ′ x′ + + + +
w′ x + + +
(d)
yz yz ′ y′z′ y′z
wx + +
wx′ + +
w ′ x′ + +
w′ x +
16. Draw the Karnaugh map of f := (wx′ y ′ + yz + w′ x)′ without doing any algebraic
simplification.
(a) x′ (y + z) + xy ′ z
(b) (x + z ′ )(y + x′ )(y ′ + z)
(c) (w′ ⊕ y) + xyz ′ + wxz ′
(d) wx ↑ (xy ↓ yz)
2. Let U := {1, 2} and let B be the Boolean algebra P(U). Put x := {1, 2}, y := {2}
and z := {2}. Then x ∩ z = y ∩ z and z ≠ ∅ but x ≠ y.
3. (a) 210 = 2 · 3 · 5 · 7
(b) They have no prime factors in common, and together they have all the prime
factors of 210.
(c) The prime factors of m ∧ n are those that are common to both m and n. The
prime factors of m ∨ n are those that are factors of m or of n (or both).
(d) ∧ is like “intersection of common primes” and ∨ is like “union of common
primes”.
(f) Some of the axioms are not satisfied for F(12). For example, 2 ∨ 2′ (where 2′ is the
complement of 2) is supposed to equal 12, by axiom 5(a), but in reality 2 ∨ 2′ = 2 ∨ 6 = 6.
(g) 66 = 2 · 3 · 11 and 12 = 2^2 · 3.
(h) F(n) is a Boolean algebra if and only if n does not have any factor p^2 where p
is a prime.
4. (a) Proof.
x + x′y = (x + x′)(x + y)    distributivity
        = 1(x + y)           complementation
        = x + y              identity □
(b) Proof.
6. Proof.
0 = 0 + 0          identity
  = xy + xy′       given
  = x(y + y′)      distributivity
  = x1             complementation
  = x              identity □
7. Proof.
x = x1             identity
  = x(z + z′)      complementation
  = xz + xz′       distributivity
  = yz + yz′       given
  = y(z + z′)      distributivity
  = y1             complementation
  = y              identity □
8. (c) Proof.
x = x1             identity
  = x(y + y′)      complementation
  = xy + xy′       distributivity
  = xy + 0         given
  = xy             identity □
9. Proof.
y = 1y             identity
  = (x + x′)y      complementation
  = xy + x′y       distributivity
  = 0 + x′y        given
  = x′x + x′y      complementation
  = x′(x + y)      distributivity
  = x′1            given
  = x′             identity □
10. (a) False: Put U := {1, 2} and consider the Boolean algebra P(U ) with x :=
{1, 2}, y := {1, 2}, z := {1} and w := {2}. This is a counterexample to the
proposition.
(b) True: If z = w, then z ′ = w′ .
13. (a) xy + xy ′ + x′ y.
(b) xyz ′ + xy ′ z ′ + xy ′ z + x′ yz + x′ yz ′ + x′ y ′ z.
(c) x′ y ′ z ′ .
(d) 0.
14. (a), (c) and (e) are equivalent, and (b), (d) and (f ) are equivalent.
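This grouping can be confirmed in MATLAB by tabulating all six expressions over the eight rows of the truth table and then collecting identical columns. The sketch below is one possible way to do the check and is not the truth.m program. The variable names E and grp are illustrative. Expressions with the same group number are equivalent.

```matlab
% Tabulate the six expressions of Exercise 14 on all eight rows [x y z].
rows = logical(dec2bin(0:7) - '0');
x = rows(:, 1);  y = rows(:, 2);  z = rows(:, 3);
E = [ (x & y) | (y & z) | (z & x), ...    % (a) xy + yz + zx
      x, ...                              % (b) x
      (x | y) & (y | z) & (z | x), ...    % (c) (x + y)(y + z)(z + x)
      (x | y) & (x | ~y), ...             % (d) (x + y)(x + y')
      (x & (y | z)) | (y & z), ...        % (e) x(y + z) + yz
      (x & y) | ~(~x | y) ];              % (f) xy + (x' + y)'
% Identical columns of E are equivalent expressions; grp labels each column.
[~, ~, grp] = unique(double(E'), 'rows');
disp(grp');
```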
15. (a)
(b)
There are two MSP forms. The one shown here is z ′ + xy + x′ y ′ + w′ x′ . The
other is z ′ + xy + x′ y ′ + w′ y.
(d)
16. Recall that f + f ′ = 1. Thus the Karnaugh maps of f and f ′ are “opposite” or
“complementary” in the sense that they have no squares in common and together
cover every square. So if we already had the Karnaugh map of f ′ we could obtain
the map of f simply by taking the complement of the first map. Now note that f ′
is already in a sum of products form and so its map can be drawn without doing
any algebra. In the Karnaugh table below, the map of f ′ is marked with − symbols
and the map of f with + symbols.
        yz   yz′  y′z′  y′z
wx      −    +    +     +
wx′     −    +    −     −
w′x′    −    +    +     +
w′x     −    −    −     −
The lesson of this exercise is that if you have a sum of products form for f ′ , then
you don’t need to do any more algebra to get the Karnaugh map of f .
17. (a) x′ y + y ′ z
(b) xyz + x′ y ′ z ′
(c) wy + w′ y ′ + xz ′
(d) You should have been able to simplify the given expression to (wxy ′ + wxy ′ z ′ )′ .
The Karnaugh map is then the complement of the (easy to draw) map for this
expression. (See previous exercise if this puzzles you.) From the map, the MSP
form is w′ + x′ + y.
However, if you look carefully, (wxy ′ + wxy ′ z ′ )′ simplifies further, by the ab-
sorption law, to just (wxy ′ )′ . We can then apply the generalised De Morgan’s
law to get the MSP form immediately, without drawing the Karnaugh map at
all! (Unfortunately, though, we don’t know that this is an MSP form until we
draw the map and cover it as usual.)
Chapter 11
Logic Networks
11.1 Introduction
Digital computers are built mainly out of “binary” components, that is, components that
have only two possible states. Examples of binary components are electric switches and
light bulbs, which can be either “on” or “off”; electric circuits in which currents are
either flowing or not flowing; magnetic cores which can be magnetised either clockwise
or anticlockwise; and locations on a card which can be either punched or unpunched.
Binary components are used because they are cheaper to make, faster to operate, and
more reliable than non-binary components.
We can regard the various functional units in a computer as logic networks or
circuits which accept a collection of inputs and generate a collection of outputs. Each
input and output is “binary”, that is, capable of assuming only two distinct values which,
for convenience, we label 0 and 1. Let x1 , x2 , . . . , xm denote the binary inputs and let z1 ,
z2 , . . . , zn denote the binary outputs. Since the outputs are determined by the inputs,
we can think of a logic network as a representation of a transmission function f which
maps B2^m to B2^n (where B2 is the 0–1 Boolean algebra). This is shown in Figure 11.1.
[Figure 11.1: a logic network f with binary inputs x1, x2, ..., xm on the left and binary outputs z1, z2, ..., zn on the right.]
The transmission function can be specified by a truth table, which lists the values of
the outputs z1, z2, ..., zn for each of the 2^m combinations of the input values x1, x2, ...,
xm.
The simplest logic networks, which are the basic building blocks of all networks, are
called gates. These have just a single binary output, and correspond to simple Boolean
operations. The six most elementary gates and the special symbols used to represent them
are shown in Figure 11.2. In each of these symbols, the inputs are shown on the left, and
the Boolean expressions corresponding to the outputs are underneath. The truth tables
of these six gates are shown in Table 11.1. Note that a NOT gate is more commonly
called an inverter.
[Figure 11.2: symbols for the six elementary gates, with the inputs x (and y) on the left and the output z on the right.]
Inverter: z = x′          AND gate: z = xy          OR gate: z = x + y
NAND gate: z = (xy)′      NOR gate: z = (x + y)′    XOR gate: z = x ⊕ y
       x′    xy    x + y  x ↑ y  x ↓ y  x ⊕ y
x  y   NOT   AND   OR     NAND   NOR    XOR
0  0   1     0     0      1      1      0
0  1   1     0     1      1      0      1
1  0   0     0     1      1      0      1
1  1   0     1     1      0      0      0
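The six truth tables can be regenerated with MATLAB's logical operators, which is a quick way to check a hand-drawn table. In this sketch the matrix T is an illustrative name. Its columns are x, y, NOT x, AND, OR, NAND, NOR and XOR, in that order.

```matlab
% Rebuild the truth tables of the six elementary gates.
rows = dec2bin(0:3) - '0';         % the four input pairs [x y]
x = rows(:, 1);  y = rows(:, 2);
T = [x, y, ~x, x & y, x | y, ~(x & y), ~(x | y), xor(x, y)];
disp('     x     y   NOT   AND    OR  NAND   NOR   XOR');
disp(T);
```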
Gates can be combined to form more complicated networks. The output(s) of such
networks can be described with Boolean expressions.
A signal line may be “split” so that the signal can be used as input to several gates.
◮ Example 11.2 The expression (x′ + y)(x + z′) can be implemented by the circuit
shown below.
[Figure: circuit implementing (x′ + y)(x + z′).]
◮ Example 11.4 Let’s do some algebra on the expression we had in Example 11.2:
(x′ + y)(x + z′) = x′x + x′z′ + yx + yz′
                 = x′z′ + yx + yz′          (since x′x = 0)
                 = (x ↓ z) + y(x + z′)      (since x′z′ = (x + z)′ by De Morgan).
Since the two expressions are equivalent, the two networks (Figures 11.4 and
11.6) are also equivalent. That is, each network computes the same Boolean
function of its inputs. ◭
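Equivalence of two networks can also be confirmed by brute force, since there are only 2^3 input combinations to try. The sketch below is an illustration only; the helper names equivalent, example_11_2 and example_11_4 are hypothetical and do not come from the notes.

```python
from itertools import product

def NOT(a):
    return 1 - a

def NOR(a, b):
    return 1 - (a | b)

def equivalent(f, g, n):
    """True if f and g agree on all 2**n combinations of n binary inputs."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=n))

def example_11_2(x, y, z):
    # (x' + y)(x + z')
    return (NOT(x) | y) & (x | NOT(z))

def example_11_4(x, y, z):
    # (x NOR z) + y(x + z')
    return NOR(x, z) | (y & (x | NOT(z)))

print(equivalent(example_11_2, example_11_4, 3))  # prints True
```

An exhaustive check like this only confirms that the two expressions agree; the algebra is still what tells us how to get from one form to the other.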
Since there may be many equivalent networks that compute a given Boolean function,
how should we choose the final design? There are two general principles that dictate this
choice: we want the circuit to be as cheap as possible to manufacture, and as fast as
possible at performing its task.
Generally speaking, cost of manufacture is measured in number of gates. Therefore,
the fewer the gates the cheaper the circuit. (It isn’t really as simple as this because some
gates cost more than others, but we’ll ignore this fact.) On the other hand, the speed of
a circuit depends on so-called “levels of gating”. Each gate causes a slight delay in the
signal on the line, so the more gates a signal has to go through, the slower the circuit is.
For example, the circuit in Example 11.2 has three levels of gating because some of its
inputs (though not all) have to go through three gates. Now look at the equivalent circuit
in Example 11.4. This network also has five gates, but there are four levels of gating. So
these two networks cost the same to make, but the first is faster.
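The two counts can be computed automatically once a circuit is written down as a nested expression. The following sketch assumes a representation invented here for illustration: an input is a string, and a gate is a tuple whose first entry is the gate name and whose remaining entries are its inputs. It treats the circuit as a tree, so it does not model a gate whose output is split and shared between several later gates.

```python
def gates(e):
    """Number of gates in the expression tree e."""
    if isinstance(e, str):          # a bare input contributes no gate
        return 0
    return 1 + sum(gates(child) for child in e[1:])

def levels(e):
    """Largest number of gates that any input signal passes through."""
    if isinstance(e, str):
        return 0
    return 1 + max(levels(child) for child in e[1:])

# (x' + y)(x + z') from Example 11.2
circuit_11_2 = ("AND", ("OR", ("NOT", "x"), "y"), ("OR", "x", ("NOT", "z")))

# (x NOR z) + y(x + z') from Example 11.4
circuit_11_4 = ("OR", ("NOR", "x", "z"),
                ("AND", "y", ("OR", "x", ("NOT", "z"))))

print(gates(circuit_11_2), levels(circuit_11_2))  # prints 5 3
print(gates(circuit_11_4), levels(circuit_11_4))  # prints 5 4
```

Both circuits use five gates, but the second forces the input z through four gates (NOT, OR, AND, OR), which agrees with the comparison made above.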
As a rule of thumb, the fewer the gates, the fewer the levels of gating. However,
this is not always the case. It can easily happen that network N1 is faster than network
N2 but N1 has more gates (they are just arranged in a better way). So sometimes the
cheapest circuit is not the fastest! In this unit we are not going to worry too much
about this practical problem. The main thing is that we learn how to take an expression,
manipulate it algebraically, and come up with a circuit that uses as few gates as possible.
In some cases this might not be the fastest, but we won’t worry about that. If you can
see that there are two or more circuits with the fewest gates, then you should choose one
with the fewest levels of gating. However, this generally takes too much time, so you
don’t have to do this in your assignments, tests and exams. You get full marks for just
using the fewest gates possible. So remember, whenever you are asked to design a logic
network you must use as few gates as possible, but don’t have to pay attention to the levels
of gating.
The process of circuit design is illustrated in the next few examples. Note that a key
step is obtaining the MSP form of the expression.
and so the MSP form is xy + yz′. In this expression there are only four operations,
a big improvement on the original seven gates. However, we can do even better
than four gates simply by writing the expression as y(x + z′). Now there are only
three operations, and this is the best possible. So the most efficient network for
the expression is:
[Figure: network for y(x + z′).]
What happened in this example is very typical. The expression we begin with
is very inefficient, so we convert to MSP, which is a big improvement. However,
it may be possible to manipulate the MSP further to reach an optimal design. ◭
Instead of being given a Boolean expression to design from, it is also very common
to be given a description in words of what the output of the circuit should be. In this
situation we have to convert the description to a Boolean expression.
◮ Example 11.6 Consider MAJ(x, y, z), the majority function on 3 variables, defined by:
MAJ(x, y, z) = 0 if at most one of x, y and z has the value 1,
MAJ(x, y, z) = 1 if two or three of x, y and z have the value 1.
So the MSP form is yz + xy + xz. This can be simulated with three AND gates
all feeding into a 3-input OR gate, totalling 4 gates and 2 levels. Alternatively, we
can re-write the MSP as xy + (x + y)z. This can also be simulated in 4 gates, but
needs 3 levels so is a little slower. So the optimal circuit is:
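[Figure: optimal circuit for MAJ, with inputs x, y and z: three AND gates feeding a 3-input OR gate whose output is MAJ.]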
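Converting a description in words into a Boolean expression can be checked by coding the description directly and comparing it with the candidate expressions on every input. This is a minimal sketch; the function names maj, msp and factored are chosen here for illustration.

```python
from itertools import product

def maj(x, y, z):
    """Majority straight from the word description: 1 when two or three inputs are 1."""
    return 1 if x + y + z >= 2 else 0

def msp(x, y, z):
    # yz + xy + xz: three AND gates into one 3-input OR gate (2 levels)
    return (y & z) | (x & y) | (x & z)

def factored(x, y, z):
    # xy + (x + y)z: also 4 gates, but 3 levels
    return (x & y) | ((x | y) & z)

agree = all(maj(*bits) == msp(*bits) == factored(*bits)
            for bits in product((0, 1), repeat=3))
print(agree)  # prints True
```

Both forms compute the same function, so the choice between them rests only on the number of levels, as discussed in the example.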
◮ Example 11.7 Design a network with four inputs w, x, y and z and one output u
so that
u = 1 ⇔ (wx = 0 and y + z = 1).
The truth table for u is shown in Table 11.2.
Table 11.2: Truth table for u.
w  x  y  z   wx = 0   y + z = 1   u
0  0  0  0     1          0       0
0  0  0  1     1          1       1
0  0  1  0     1          1       1
0  0  1  1     1          1       1
0  1  0  0     1          0       0
0  1  0  1     1          1       1
0  1  1  0     1          1       1
0  1  1  1     1          1       1
1  0  0  0     1          0       0
1  0  0  1     1          1       1
1  0  1  0     1          1       1
1  0  1  1     1          1       1
1  1  0  0     0          0       0
1  1  0  1     0          1       0
1  1  1  0     0          1       0
1  1  1  1     0          1       0
From the table, u = 0 exactly when wx = 1 or y = z = 0, that is, when wx + y′z′ = 1. Therefore
u = (wx + y′z′)′
  = wx ↓ y′z′
  = wx ↓ (y ↓ z)
[Figure: network for u = wx ↓ (y ↓ z), with inputs w, x, y and z.]
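The design in Example 11.7 can be checked against its specification on all 16 input combinations. The sketch below is illustrative only; spec encodes the condition "wx = 0 and y + z = 1" directly, network encodes wx ↓ (y ↓ z), and both names are hypothetical.

```python
from itertools import product

def NOR(a, b):
    return 1 - (a | b)

def spec(w, x, y, z):
    """u = 1 exactly when wx = 0 and y + z = 1 (Boolean product and sum)."""
    return 1 if (w & x) == 0 and (y | z) == 1 else 0

def network(w, x, y, z):
    # u = wx NOR (y NOR z): one AND gate and two NOR gates
    return NOR(w & x, NOR(y, z))

rows = list(product((0, 1), repeat=4))
print(all(spec(*r) == network(*r) for r in rows))  # prints True
print(sum(spec(*r) for r in rows))                 # prints 9
```

The second line counts the rows with u = 1, which matches the nine 1s in the u column of Table 11.2.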
Exercises
Remember to use as few gates as possible when designing a network.
1. For each of the following Boolean expressions sketch a logic network using just NOT,
OR and AND gates.
2. Sketch logic networks, using any of the six standard gates, in which there are three
binary inputs x, y, and z, and one binary output u, where
3. Sketch logic networks, using any of the six standard gates, in which there are four
inputs u, v, w and x, and one output z, where
11.2 Adders
The most basic digital arithmetic function is the addition of two binary digits (bits). A
logic circuit that represents the addition of two bits is called a half-adder. A half-adder
has two inputs x and y, and two outputs, s and c, called the sum and carry digits. The
truth table for s and c is shown below:
x y c s
0 0 0 0
0 1 0 1
1 0 0 1
1 1 1 0
Note that this table is just the base-2 addition table of x and y. We see from the table
that c = xy and s = x ⊕ y. Thus a half-adder can be implemented as shown below:
[Figure: half-adder. The inputs x and y feed an AND gate with output c and an XOR gate with output s.]
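As a quick check that c = xy and s = x ⊕ y really do perform base-2 addition, the half-adder can be simulated. This is a minimal sketch; the name half_adder and the convention of returning the pair (c, s) are choices made here, not taken from the notes.

```python
def half_adder(x, y):
    """Return (c, s): the carry digit and sum digit of x + y for bits x and y."""
    c = x & y   # AND gate
    s = x ^ y   # XOR gate
    return c, s

for x in (0, 1):
    for y in (0, 1):
        c, s = half_adder(x, y)
        # The two-digit binary number "cs" has value 2c + s.
        print(x, y, c, s, 2 * c + s == x + y)
```

Every row prints True in the last column, confirming that the pair of digits c s is the binary representation of x + y.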
Now suppose we want to add two 2-digit binary numbers X and Y whose binary
representations are x1 x0 and y1 y0 , respectively. As we learned in Chapter 3, first we must
add the digits x0 and y0 . This can be performed by a half-adder to give a sum digit s0 and
a carry digit c0 . Then we add the carry digit c0 to the digits x1 and y1 to give a new sum
digit s1 and a new carry digit c1 . Thus the binary representation of X + Y is c1 s1 s0 . Note
that the second stage of this computation cannot be performed by a half-adder because
we need to add three bits at once, namely x1 , y1 and c0 . A logic circuit that performs
this operation is called a full-adder. A full-adder has three inputs x, y and c, and two
outputs s and c̃. The input c represents the “old” carry digit, and the output c̃ represents
the “new” carry digit. The truth table for s and c̃ is shown below:
x y c c̃ s
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1
From this table we see that the DNFs of the outputs are
s = x′y′c + x′yc′ + xy′c′ + xyc,
c̃ = x′yc + xy′c + xyc′ + xyc.
These expressions can then be used to design a logic network for a full-adder. This is left
as an exercise. Now, finally, we have a circuit for adding two 2-digit binary numbers x1 x0
and y1 y0 :
[Figure: 2-bit adder. A half-adder takes x0 and y0 and outputs s0 and the carry c0; a full-adder takes x1, y1 and c0 and outputs s1 and c1.]
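The 2-bit adder can be simulated by chaining the two components. In this sketch the full-adder outputs are written as the DNFs obtained by reading off the rows of the full-adder truth table in which each output equals 1; the function names and the order of the returned digits are assumptions made for illustration.

```python
def NOT(a):
    return 1 - a

def half_adder(x, y):
    """Return (carry, sum) for two bits."""
    return x & y, x ^ y

def full_adder(x, y, c):
    """Return (new_carry, s) for bits x, y and the old carry c, using the DNFs."""
    s = ((NOT(x) & NOT(y) & c) | (NOT(x) & y & NOT(c))
         | (x & NOT(y) & NOT(c)) | (x & y & c))
    new_carry = ((NOT(x) & y & c) | (x & NOT(y) & c)
                 | (x & y & NOT(c)) | (x & y & c))
    return new_carry, s

def add_2bit(x1, x0, y1, y0):
    """Add the 2-digit binary numbers x1x0 and y1y0; return the digits (c1, s1, s0)."""
    c0, s0 = half_adder(x0, y0)
    c1, s1 = full_adder(x1, y1, c0)
    return c1, s1, s0

print(add_2bit(1, 1, 1, 1))  # 11 + 11 = 110 in binary, prints (1, 1, 0)
```

The DNFs are far from the cheapest way to build a full-adder; they are used here only because they follow directly from the truth table.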
Now we are ready to consider the addition of larger binary numbers. Suppose that
X and Y are two natural numbers, and that x_n ... x_1 x_0 and y_n ... y_1 y_0 are their binary
representations. Let S = X + Y, and let s_{n+1} ... s_1 s_0 be its binary representation. The
binary digits s_0, s_1, ..., s_{n+1} are computed as follows:
s_0 = x_0 + y_0              (half-adder)
s_1 = x_1 + y_1 + c_0        (full-adder)
s_2 = x_2 + y_2 + c_1        (full-adder)
  ...
s_n = x_n + y_n + c_{n-1}    (full-adder)
s_{n+1} = c_n
Here each s_k is the sum digit produced by the addition on the right-hand side, and c_k is
the carry digit produced by the same addition.
So this is how computers perform addition! For example, a logic circuit which represents
the addition of two 3-bit numbers x2 x1 x0 and y2 y1 y0 is shown in Figure 11.8 below.
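[Figure 11.8: 3-bit adder. A half-adder adds x0 and y0 to give s0 and the carry c0; a full-adder adds x1, y1 and c0 to give s1 and the carry c1; a second full-adder adds x2, y2 and c1 to give s2 and s3.]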
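The chain of adders described above generalises to any number of bits. The following is a minimal sketch under some assumptions made here for convenience: bits are stored least significant first, the half-adder is modelled as a full-adder whose incoming carry is 0, and the full-adder uses the compact forms s = x ⊕ y ⊕ c and c̃ = xy + xc + yc, which agree with the full-adder truth table (compare Exercise 6). The helper names to_bits, from_bits and ripple_add are hypothetical.

```python
def full_adder(x, y, c):
    """Return (new_carry, s) for bits x, y and incoming carry c."""
    s = x ^ y ^ c
    new_carry = (x & y) | (x & c) | (y & c)
    return new_carry, s

def ripple_add(xs, ys):
    """Add two equal-length bit lists (least significant bit first).

    Returns the sum digits s_0, s_1, ..., s_{n+1}, least significant first.
    """
    carry = 0           # first stage behaves like a half-adder
    out = []
    for x, y in zip(xs, ys):
        carry, s = full_adder(x, y, carry)
        out.append(s)
    out.append(carry)   # the final carry is the top digit s_{n+1}
    return out

def to_bits(value, width):
    """Binary digits of value, least significant first, padded to width digits."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))

print(from_bits(ripple_add(to_bits(5, 3), to_bits(6, 3))))  # prints 11
```

Each stage must wait for the carry from the stage before it, which is one reason faster adder designs exist.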
It should be pointed out that more sophisticated circuits have been designed which
perform binary addition and other arithmetic operations faster than the ones shown above.
Our examples are merely to show how complicated circuits can be built out of simple ones.
Exercises
4. In practice, AND gates and OR gates are actually built from NAND gates. Design
a half-adder that uses only NAND gates and inverters.
6. Show that the output c̃ of a full-adder can be expressed as c̃ = MAJ(x, y, c), where
MAJ is the three-input majority function defined in Example 11.6.
11.3 Solutions to exercises
3. (a)
z = uvwx + uvw′x′ + u′v′wx + u′v′w′x′     (MSP)
  = uv(wx + w′x′) + u′v′(wx + w′x′)
  = (uv + u′v′)(wx + w′x′)
  = (u ⊕ v′)(w ⊕ x′)
[Figure: network for z = (u ⊕ v′)(w ⊕ x′), with inputs u, v, w and x.]
(c)
z = uv + wx + uw + ux + vw + vx     (MSP)
  = uv + wx + u(w + x) + v(w + x)
  = uv + wx + (u + v)(w + x)
[Figure: network for z = uv + wx + (u + v)(w + x).]
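The simplifications in solutions 3(a) and 3(c) can be double-checked by comparing the MSP with the final expression on all 16 inputs. This is an illustrative sketch only; the function names are invented here.

```python
from itertools import product

def NOT(a):
    return 1 - a

def msp_3a(u, v, w, x):
    return ((u & v & w & x) | (u & v & NOT(w) & NOT(x))
            | (NOT(u) & NOT(v) & w & x) | (NOT(u) & NOT(v) & NOT(w) & NOT(x)))

def net_3a(u, v, w, x):
    # (u XOR v')(w XOR x')
    return (u ^ NOT(v)) & (w ^ NOT(x))

def msp_3c(u, v, w, x):
    return (u & v) | (w & x) | (u & w) | (u & x) | (v & w) | (v & x)

def net_3c(u, v, w, x):
    # uv + wx + (u + v)(w + x)
    return (u & v) | (w & x) | ((u | v) & (w | x))

inputs = list(product((0, 1), repeat=4))
print(all(msp_3a(*b) == net_3a(*b) for b in inputs))  # prints True
print(all(msp_3c(*b) == net_3c(*b) for b in inputs))  # prints True
```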
4. c = xy = (xy)′′ = (x ↑ y)′.
s = x ⊕ y = xy′ + x′y = (x ↑ y′)′ + (x′ ↑ y)′ = (x ↑ y′) ↑ (x′ ↑ y).
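The NAND-only expressions can be tested by building the half-adder from nothing but NAND gates and inverters and checking that it still adds. This is a minimal sketch with hypothetical function names.

```python
def NAND(a, b):
    return 1 - (a & b)

def NOT(a):
    return 1 - a

def half_adder_nand(x, y):
    """Half-adder using only NAND gates and inverters; returns (c, s)."""
    c = NOT(NAND(x, y))                           # c = (x NAND y)'
    s = NAND(NAND(x, NOT(y)), NAND(NOT(x), y))    # s = (x NAND y') NAND (x' NAND y)
    return c, s

for x in (0, 1):
    for y in (0, 1):
        c, s = half_adder_nand(x, y)
        print(x, y, c, s, 2 * c + s == x + y)
```

Every row prints True in the last column, so the NAND-based network has the same truth table as the half-adder built from an AND gate and an XOR gate.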
x
c